Patent application number | Description | Published |
20080306987 | BUSINESS INFORMATION WAREHOUSE TOOLKIT AND LANGUAGE FOR WAREHOUSING SIMPLIFICATION AND AUTOMATION - A method for use with an information (or data) warehouse comprises managing the information warehouse with instructions in a declarative language. The instructions specify information warehouse-level tasks to be done without specifying certain details of how the tasks are to be implemented, for example, using databases and text indexers. The details are hidden from the user and include, for example, in an information warehouse having a FACT table that joins two or more dimension tables, details of database level operations when structured data are being handled, including database command line utilities, database drivers, and structured query language (SQL) statements; and details of text-indexing engines when unstructured data are being handled. The information warehouse is managed in a dynamic way in which different tasks—such as data loading tasks and information warehouse construction tasks—may be interleaved (i.e., there is no particular order in which the different tasks must be completed). | 12-11-2008 |
20080307011 | FAILURE RECOVERY AND ERROR CORRECTION TECHNIQUES FOR DATA LOADING IN INFORMATION WAREHOUSES - A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables processing undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery is performed without starting a data load from the beginning but rather from a latest checkpoint for data loading at an information warehouse level using a checkpoint process characterized by a state transition diagram having a multiplicity of states; and tracking state transitions among the states using a system state table. | 12-11-2008 |
20080307255 | FAILURE RECOVERY AND ERROR CORRECTION TECHNIQUES FOR DATA LOADING IN INFORMATION WAREHOUSES - A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables processing undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery is performed without starting a data load from the beginning but rather from a latest checkpoint for data loading at an information warehouse level using a checkpoint process characterized by a state transition diagram having a multiplicity of states; and tracking state transitions among the states using a system state table. | 12-11-2008 |
20080307386 | BUSINESS INFORMATION WAREHOUSE TOOLKIT AND LANGUAGE FOR WAREHOUSING SIMPLIFICATION AND AUTOMATION - A method for use with an information (or data) warehouse comprises managing the information warehouse with instructions in a declarative language. The instructions specify information warehouse-level tasks to be done without specifying certain details of how the tasks are to be implemented, for example, using databases and text indexers. The details are hidden from the user and include, for example, in an information warehouse having a FACT table that joins two or more dimension tables, details of database level operations when structured data are being handled, including database command line utilities, database drivers, and structured query language (SQL) statements; and details of text-indexing engines when unstructured data are being handled. The information warehouse is managed in a dynamic way in which different tasks—such as data loading tasks and information warehouse construction tasks—may be interleaved (i.e., there is no particular order in which the different tasks must be completed). | 12-11-2008 |
20090187582 | EFFICIENT UPDATE METHODS FOR LARGE VOLUME DATA UPDATES IN DATA WAREHOUSES - A system and method for ensuring large and frequent updates to a data warehouse. The process leverages a set of temporary staging tables to track the updates. A set of intermediate steps are performed to accomplish bulk deletions of the outdated changed records, and perform modifications to the map tables for models such as snowflake. Finally, bulk load operations load the updates and insert them into the final dimension tables. The process ensures performance comparable to insertion-only schemes with at most only slight performance degradation. Furthermore, a modified process is applied on the newfact data warehouse dimension model. The process can be readily adapted to handle star schema and other hierarchical data warehouse models. | 07-23-2009 |
20090187602 | Efficient Update Methods For Large Volume Data Updates In Data Warehouses - A system and method for ensuring large and frequent updates to a data warehouse. The process leverages a set of temporary staging tables to track the updates. A set of intermediate steps are performed to accomplish bulk deletions of the outdated changed records, and perform modifications to the map tables for models such as snowflake. Finally, bulk load operations load the updates and insert them into the final dimension tables. The process ensures performance comparable to insertion-only schemes with at most only slight performance degradation. Furthermore, a modified process is applied on the newfact data warehouse dimension model. The process can be readily adapted to handle star schema and other hierarchical data warehouse models. | 07-23-2009 |
20090292704 | ADAPTIVE AGGREGATION: IMPROVING THE PERFORMANCE OF GROUPING AND DUPLICATE ELIMINATION BY AVOIDING UNNECESSARY DISK ACCESS - A method for use with an aggregation operation (e.g., on a relational database table) includes a sorting pass and a merging pass. The sorting pass includes: (a) reading blocks of the table from a storage medium into a memory using an aggregation method until the memory is substantially full or until all the data have been read into the memory; (b) determining a number k of blocks to write back to the storage medium from the memory; (c) selecting k blocks from memory, sorting the k blocks, and then writing the k blocks back to the storage medium as a new sublist; and (d) repeating steps (a), (b), and (c) for any unprocessed tuples in the database table. The merging pass includes: merging all the sublists to form an aggregation result using a merge-sort algorithm. | 11-26-2009 |
20090300038 | Methods and Apparatus for Reuse Optimization of a Data Storage Process Using an Ordered Structure - Techniques for reducing a number of computations in a data storage process are provided. One or more computational elements are identified in the data storage process. An ordered structure of one or more nodes is generated using the one or more computational elements. Each of the one or more nodes represents one or more computational elements. Further, a weight is assigned to each of the one or more nodes. An ordered structure of one or more reusable nodes is generated by deleting one or more nodes in accordance with the assigned weights. The ordered structure of one or more reusable nodes is utilized to reduce the number of computations in the data storage process. The data storage process converts data from a first format into a second format, and stores the data in the second format on a computer readable medium for data analysis purposes. | 12-03-2009 |
20100161576 | DATA FILTERING AND OPTIMIZATION FOR ETL (EXTRACT, TRANSFORM, LOAD) PROCESSES - A method and system are disclosed for use with an ETL (Extract, Transform, Load) process, comprising optimizing a filter expression to select a subset of data and evaluating the filter expression on the data after the extracting, before the loading, but not during the transforming of the ETL process. The method and system optimize the filtering using a pipelined evaluation for single predicate filtering and an adaptive optimization for multiple predicate filtering. The adaptive optimization includes an initial phase and a dynamic phase. | 06-24-2010 |
20100280991 | METHOD AND SYSTEM FOR VERSIONING DATA WAREHOUSES - A method, system, and computer program product are disclosed. Exemplary embodiments of the method, system, and computer program product may include hardware, process steps, and computer program instructions for supporting versioning in a data warehouse. The data warehouse may include a data warehouse engine for creating a data warehouse including a fact table and temporary tables. Updated or new data records may be transferred into the data warehouse and bulk loaded into the temporary tables. The updated or new data records may be evaluated for attributes matching existing data records. A version number may be assigned to data records and data records may be marked as being the most current version. Updated and new data records may be bulk loaded from the temporary tables into the fact table when a version number or a version status is calculated. | 11-04-2010 |
20110113005 | SUPPORTING SET-LEVEL SLICE AND DICE IN DATA WAREHOUSES - A method and system for coping with slice and dice operations in data warehouses are disclosed. An external approach may be utilized, creating queries using structured query language on a computer. An algorithm may be used to rewrite the queries. The resulting predicates may be joined to dimension tables corresponding to fact tables. An internal approach may be utilized, using aggregation functions with early aggregation for creating the queries. The results of the slice and dice operations may be outputted to a user on a computer monitor. | 05-12-2011 |
20110213756 | CONCURRENCY CONTROL FOR EXTRACTION, TRANSFORM, LOAD PROCESSES - Systems and methods manage concurrent ETL processes accessing a database. Exemplary embodiments include a method for concurrency management for ETL processes in a database having database tables and communicatively coupled to a computer, the method including establishing a session lock for the database, determining that a current ETL process is accessing the database at a current time, associating a current expiration time with the session lock, the expiration time being stored in a lock table in the database, sending the session lock to the current ETL process and performing ETL-level locking for the current ETL process. | 09-01-2011 |
20110213801 | EFFICIENT COMPUTATION OF TOP-K AGGREGATION OVER GRAPH AND NETWORK DATA - A method and system for efficiently answering a local neighborhood aggregation query over graph data. A graph which has a plurality of nodes is received and stored in memory. A local neighborhood aggregation query is received. A processing engine applies forward processing with differential index-based pruning, backward processing using partial distribution, or an enhanced backward processing that combines the backward processing and the forward processing. As a result of the forward, backward, or enhanced backward processing, nodes in the graph that have the top-k highest aggregate values over neighbors within h-hops of the nodes are determined. Identities of entities or persons associated with the determined nodes are presented and/or stored. | 09-01-2011 |
20110219038 | SIMPLIFIED ENTITY RELATIONSHIP MODEL TO ACCESS STRUCTURE DATA - A method, system and program product for modeling data as an undirected graph is disclosed. A set of entities and a set of attributes are defined. A set of relationships is defined to represent semantic associations with each association connecting at least two entities. Attributes are associated with entities rather than with relationships. A hierarchical query language with a set of atomic operations on modeled data is employed. The modeled data is displayed on a display unit. | 09-08-2011 |
20110270844 | EFFICIENT AND SCALABLE DATA EVOLUTION WITH COLUMN ORIENTED DATABASES - A method, system and program product for data evolution on column oriented databases is disclosed. For an input evolution operation, reusable and non-reusable attributes are identified. For attributes in a target schema that cannot be reused from the source schema, data and bitmap indexes of those attributes are generated from source data and bitmap indexes. A decompose operation is disclosed for decomposing a table into two tables. A merge operation is disclosed in which only one input table can be reused for mergence. A second merge operation is disclosed in which both input tables cannot be reused for mergence. | 11-03-2011 |
20110270871 | ICEBERG QUERY EVALUATION IMPLEMENTING A COMPRESSED BITMAP INDEX - Exemplary embodiments include an iceberg query method, including processing the iceberg query using a bitmap index having a plurality of bitmap vectors in a database, eliminating any of the plurality of bitmap vectors in the bitmap index that fails to meet a given condition thereby forming a subset of the plurality of bitmap vectors and aligning the vectors in the subset of the plurality of bitmap vectors in the bitmap index according to respective positions of the bitmap vectors in the subset of the plurality of bitmap vectors. | 11-03-2011 |
20110276553 | CLASSIFYING DOCUMENTS ACCORDING TO READERSHIP - One embodiment is a computer-implemented method for classifying documents in a collection of documents according to their intended readerships. The method comprises using a computer to select a document in the collection of documents; and using a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. The method further includes using a computer to classify the selected document as misleading, commercial, or personal according to its determined characteristic; and using a computer to repeat the steps of select document, determine a characteristic of the selected document, and classify the selected document for additional documents in the collection. At least some documents are classified as misleading, at least some documents are classified as commercial, and at least some documents are classified as personal. Other methods and computer program products are also disclosed according to even more embodiments. | 11-10-2011 |
20120209873 | SET-LEVEL COMPARISONS IN DYNAMICALLY FORMED GROUPS - Systems and methods are disclosed of processing a set-level query across one or more attributes, the query being grouped by one or more attributes, whereby groups that satisfy the set-level query may be aggregated over one or more attributes. The systems and methods use bitwise arithmetic to efficiently traverse bitmap and bit-slice vectors and indexes of a data relation to determine groups that solve the set-level query. | 08-16-2012 |
20120226695 | CLASSIFYING DOCUMENTS ACCORDING TO READERSHIP - A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of selecting a document, determining a characteristic of the selected document, and classifying the selected document for additional documents in the collection. At least some documents are classified as misleading, at least some as commercial, and at least some as personal. | 09-06-2012 |
20120254120 | LOGGING SYSTEM USING PERSISTENT MEMORY - A computer program product, including: a computer readable storage device to store a computer readable program, wherein the computer readable program, when executed by a processor within a computer, causes the computer to perform operations for logging. The operations include: receiving a transaction including data and a log record corresponding to the data; writing the data to a data storage device; and writing the log record to a log space on a persistent memory device coupled to the data storage device. | 10-04-2012 |
20120303633 | SYSTEMS AND METHODS FOR QUERYING COLUMN ORIENTED DATABASES - Systems and methods for accessing data stored in a data array, mapping the data using a bitmap index, and processing data queries by determining positions of query attributes in the bitmap index and locating values corresponding to the positions in the data array are described herein. | 11-29-2012 |
20120323867 | SYSTEMS AND METHODS FOR QUERYING COLUMN ORIENTED DATABASES - Systems and methods for accessing data stored in a data array, mapping the data using a bitmap index, and processing data queries by determining positions of query attributes in the bitmap index and locating values corresponding to the positions in the data array are described herein. | 12-20-2012 |
20120323962 | SET-LEVEL COMPARISONS IN DYNAMICALLY FORMED GROUPS - Methods are disclosed of processing a set-level query across one or more attributes, the query being grouped by one or more attributes, whereby groups that satisfy the set-level query may be aggregated over one or more attributes. The methods use bitwise arithmetic to efficiently traverse bitmap and bit-slice vectors and indexes of a data relation to determine groups that solve the set-level query. | 12-20-2012 |
20130006969 | SUPPORTING SET-LEVEL SLICE AND DICE IN DATA WAREHOUSES - A method and system for coping with slice and dice operations in data warehouses are disclosed. An external approach may be utilized, creating queries using structured query language on a computer. An algorithm may be used to rewrite the queries. The resulting predicates may be joined to dimension tables corresponding to fact tables. An internal approach may be utilized, using aggregation functions with early aggregation for creating the queries. The results of the slice and dice operations may be outputted to a user on a computer monitor. | 01-03-2013 |
20130226955 | BI-TEMPORAL KEY VALUE CACHE SYSTEM - Described herein are techniques for supporting bi-temporal data in a key value cache system. An embodiment provides bi-temporal data as the basic functionality of a key value cache system. An embodiment provides a redesign of the core data structures of a key value cache system, adds bi-temporal data storage in the key value hashing structure, and provides a temporality-aware memory space manager. Embodiments can achieve the same performance as current key value cache systems for regular queries (that is, the queries that only access the current versions of data) while supporting bi-temporal data. | 08-29-2013 |
20130290283 | SCM-CONSCIOUS TRANSACTIONAL KEY-VALUE STORE - Embodiments of a method are described. In one embodiment, the method is a method for executing and supporting transactions. The method includes receiving a transaction comprising a command and data. The method includes writing the data to a transaction manager on a persistent memory device. The transaction manager also maintains a status of the transaction and reference to entries within memory that are manipulated by the transaction. The method also includes creating an in-memory log of the transaction in a first hash directory. The method includes committing a copy of the first hash directory to a second hash directory maintained on a persistent memory device. | 10-31-2013 |
20130290655 | SCM-CONSCIOUS TRANSACTIONAL KEY-VALUE STORE - Embodiments of a system are described. In one embodiment, the system is a device for performing operations and supporting transactions. The device is configured to receive a transaction comprising a command and data. The device writes the data to a transaction manager on a persistent memory device. The transaction manager also maintains a status of the transaction and reference to entries within memory that are manipulated by the transaction. The device also creates an in-memory log of the transaction in a first hash directory. The device then commits a copy of the first hash directory to a second hash directory maintained on a persistent memory device. | 10-31-2013 |
20130318090 | SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES - Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output. | 11-28-2013 |
20130318091 | SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES - Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output. | 11-28-2013 |
20140032603 | SIMPLIFIED ENTITY RELATIONSHIP MODEL TO ACCESS STRUCTURE DATA - A system and program product for modeling data as an undirected graph is disclosed. A set of entities and a set of attributes are defined. A set of relationships is defined to represent semantic associations with each association connecting at least two entities. Attributes are associated with entities rather than with relationships. A hierarchical query language with a set of atomic operations on modeled data is employed. The modeled data is displayed on a display unit. | 01-30-2014 |
20140059284 | SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS MEMORY SPACE MANAGEMENT FOR STORAGE CLASS MEMORY - Embodiments of the present invention provide a system, method and computer program products for memory space management for storage class memory. One embodiment comprises a method for information storage in an information technology environment. The method comprises storing data in a storage class memory (SCM) space, and storing storage management metadata corresponding to said data, in the SCM in a first data structure. The method further includes buffering storage management metadata corresponding to said data, in a main memory in a second data structure. | 02-27-2014 |
20140195471 | TECHNOLOGY PREDICTION - Embodiments of the invention relate to technology prediction. A technical dictionary of technical terms is constructed based on a collection of documents. The technical terms are partitioned into equivalence classes. A table is generated that correlates technical terms across equivalence classes based on temporal co-occurrence of the technical terms across the equivalence classes. For a given technical term the table is accessed to determine a first set of technical terms that correlate to the given technical term. The table is accessed again to determine a second set of technical terms that correlate to the first set of technical terms. It is predicted that the second set of technical terms will correlate to the given technical term in the future. | 07-10-2014 |
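
The two-step table lookup described in the last entry (20140195471, TECHNOLOGY PREDICTION) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `build_table` and `predict_future_terms`, the sample terms, and the use of a symmetric in-memory dictionary as the correlation table are all assumptions made for illustration. The abstract's equivalence classes and temporal co-occurrence analysis are not modeled; the sketch starts from term pairs that are already known to correlate. Dropping terms that already correlate with the given term is also an assumption.

```python
from collections import defaultdict


def build_table(correlated_pairs):
    """Build a symmetric correlation table from (term, term) pairs.

    Hypothetical helper: the pairs stand in for the result of the
    temporal co-occurrence analysis, which is not modeled here.
    """
    table = defaultdict(set)
    for a, b in correlated_pairs:
        table[a].add(b)
        table[b].add(a)
    return table


def predict_future_terms(table, term):
    """Return terms predicted to correlate with `term` in the future."""
    # First access: terms that correlate with the given term today.
    first = set(table.get(term, ()))
    # Second access: terms that correlate with any term in the first set.
    second = set()
    for t in first:
        second |= table.get(t, set())
    # Assumption: exclude the term itself and terms already correlated.
    return second - first - {term}


# Illustrative data only; these pairs are not taken from the source.
pairs = [
    ("flash", "ssd"),
    ("ssd", "storage class memory"),
    ("storage class memory", "persistent memory"),
]
table = build_table(pairs)
predicted = predict_future_terms(table, "flash")
print(sorted(predicted))
```

With the sample pairs, "ssd" is the first set for "flash", so the second lookup surfaces "storage class memory" as the predicted future correlate.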