Bin He, San Jose US

Bin He, San Jose, CA US

Patent application number	Title - Abstract	Published (MM-DD-YYYY)
20080306987	BUSINESS INFORMATION WAREHOUSE TOOLKIT AND LANGUAGE FOR WAREHOUSING SIMPLIFICATION AND AUTOMATION - A method for use with an information (or data) warehouse comprises managing the information warehouse with instructions in a declarative language. The instructions specify information warehouse-level tasks to be done without specifying certain details of how the tasks are to be implemented, for example, using databases and text indexers. The details are hidden from the user and include, for example, in an information warehouse having a FACT table that joins two or more dimension tables, details of database level operations when structured data are being handled, including database command line utilities, database drivers, and structured query language (SQL) statements; and details of text-indexing engines when unstructured data are being handled. The information warehouse is managed in a dynamic way in which different tasks—such as data loading tasks and information warehouse construction tasks—may be interleaved (i.e., there is no particular order in which the different tasks must be completed).	12-11-2008
20080307011	FAILURE RECOVERY AND ERROR CORRECTION TECHNIQUES FOR DATA LOADING IN INFORMATION WAREHOUSES - A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables processing undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery is performed without starting a data load from the beginning but rather from a latest checkpoint for data loading at an information warehouse level using a checkpoint process characterized by a state transition diagram having a multiplicity of states; and tracking state transitions among the states using a system state table.	12-11-2008
20080307255	FAILURE RECOVERY AND ERROR CORRECTION TECHNIQUES FOR DATA LOADING IN INFORMATION WAREHOUSES - A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables processing undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery is performed without starting a data load from the beginning but rather from a latest checkpoint for data loading at an information warehouse level using a checkpoint process characterized by a state transition diagram having a multiplicity of states; and tracking state transitions among the states using a system state table.	12-11-2008
20080307386	BUSINESS INFORMATION WAREHOUSE TOOLKIT AND LANGUAGE FOR WAREHOUSING SIMPLIFICATION AND AUTOMATION - A method for use with an information (or data) warehouse comprises managing the information warehouse with instructions in a declarative language. The instructions specify information warehouse-level tasks to be done without specifying certain details of how the tasks are to be implemented, for example, using databases and text indexers. The details are hidden from the user and include, for example, in an information warehouse having a FACT table that joins two or more dimension tables, details of database level operations when structured data are being handled, including database command line utilities, database drivers, and structured query language (SQL) statements; and details of text-indexing engines when unstructured data are being handled. The information warehouse is managed in a dynamic way in which different tasks—such as data loading tasks and information warehouse construction tasks—may be interleaved (i.e., there is no particular order in which the different tasks must be completed).	12-11-2008
20090187582	EFFICIENT UPDATE METHODS FOR LARGE VOLUME DATA UPDATES IN DATA WAREHOUSES - A system and method for ensuring large and frequent updates to a data warehouse. The process leverages a set of temporary staging tables to track the updates. A set of intermediate steps are performed to accomplish bulk deletions of the outdated changed records, and perform modifications to the map tables for models such as snowflake. Finally, bulk load operations load the updates and insert them into the final dimension tables. The process ensures performance comparable to insertion-only schemes with at most only slight performance degradation. Furthermore, a modified process is applied on the newfact data warehouse dimension model. The process can be readily adapted to handle star schema and other hierarchical data warehouse models.	07-23-2009
20090187602	Efficient Update Methods For Large Volume Data Updates In Data Warehouses - A system and method for ensuring large and frequent updates to a data warehouse. The process leverages a set of temporary staging tables to track the updates. A set of intermediate steps are performed to accomplish bulk deletions of the outdated changed records, and perform modifications to the map tables for models such as snowflake. Finally, bulk load operations load the updates and insert them into the final dimension tables. The process ensures performance comparable to insertion-only schemes with at most only slight performance degradation. Furthermore, a modified process is applied on the newfact data warehouse dimension model. The process can be readily adapted to handle star schema and other hierarchical data warehouse models.	07-23-2009
20090292704	ADAPTIVE AGGREGATION: IMPROVING THE PERFORMANCE OF GROUPING AND DUPLICATE ELIMINATION BY AVOIDING UNNECESSARY DISK ACCESS - A method for use with an aggregation operation (e.g., on a relational database table) includes a sorting pass and a merging pass. The sorting pass includes: (a) reading blocks of the table from a storage medium into a memory using an aggregation method until the memory is substantially full or until all the data have been read into the memory; (b) determining a number k of blocks to write back to the storage medium from the memory; (c) selecting k blocks from memory, sorting the k blocks, and then writing the k blocks back to the storage medium as a new sublist; and (d) repeating steps (a), (b), and (c) for any unprocessed tuples in the database table. The merging pass includes: merging all the sublists to form an aggregation result using a merge-sort algorithm.	11-26-2009
20090300038	Methods and Apparatus for Reuse Optimization of a Data Storage Process Using an Ordered Structure - Techniques for reducing a number of computations in a data storage process are provided. One or more computational elements are identified in the data storage process. An ordered structure of one or more nodes is generated using the one or more computational elements. Each of the one or more nodes represents one or more computational elements. Further, a weight is assigned to each of the one or more nodes. An ordered structure of one or more reusable nodes is generated by deleting one or more nodes in accordance with the assigned weights. The ordered structure of one or more reusable nodes is utilized to reduce the number of computations in the data storage process. The data storage process converts data from a first format into a second format, and stores the data in the second format on a computer readable medium for data analysis purposes.	12-03-2009
20100161576	DATA FILTERING AND OPTIMIZATION FOR ETL (EXTRACT, TRANSFORM, LOAD) PROCESSES - A method and system are disclosed for use with an ETL (Extract, Transform, Load) process, comprising optimizing a filter expression to select a subset of data and evaluating the filter expression on the data after the extracting, before the loading, but not during the transforming of the ETL process. The method and system optimizes the filtering using a pipelined evaluation for single predicate filtering and an adaptive optimization for multiple predicate filtering. The adaptive optimization includes an initial phase and a dynamic phase.	06-24-2010
20100280991	METHOD AND SYSTEM FOR VERSIONING DATA WAREHOUSES - A method, system, and computer program product are disclosed. Exemplary embodiments of the method, system, and computer program product may include hardware, process steps, and computer program instructions for supporting versioning in a data warehouse. The data warehouse may include a data warehouse engine for creating a data warehouse including a fact table and temporary tables. Updated or new data records may be transferred into the data warehouse and bulk loaded into the temporary tables. The updated or new data records may be evaluated for attributes matching existing data records. A version number may be assigned to data records and data records may be marked as being the most current version. Updated and new data records may be bulk loaded from the temporary tables into the fact table when a version number or a version status is calculated.	11-04-2010
20110113005	SUPPORTING SET-LEVEL SLICE AND DICE IN DATA WAREHOUSES - A method and system for coping with slice and dice operations in data warehouses is disclosed. An external approach may be utilized, creating queries using structured query language on a computer. An algorithm may be used to rewrite the queries. The resulting predicates may be joined to dimension tables corresponding to fact tables. An internal approach may be utilized, using aggregation functions with early aggregation for creating the queries. The results of the slice and dice operations may be outputted to a user on a computer monitor.	05-12-2011
20110213756	CONCURRENCY CONTROL FOR EXTRACTION, TRANSFORM, LOAD PROCESSES - System and methods manage concurrent ETL processes accessing a database. Exemplary embodiments include a method for concurrency management for ETL processes in a database having database tables and communicatively coupled to a computer, the method including establishing a session lock for the database, determining that a current ETL process is accessing the database at a current time, associating a current expiration time with the session lock, the expiration time being stored in a lock table in the database, sending the session lock to the current ETL process and performing ETL-level locking for the current ETL process.	09-01-2011
20110213801	EFFICIENT COMPUTATION OF TOP-K AGGREGATION OVER GRAPH AND NETWORK DATA - A method and system for efficiently answering a local neighborhood aggregation query over graph data. A graph which has a plurality of nodes is received and stored in memory. A local neighborhood aggregation query is received. A processing engine applies forward processing with differential index-based pruning, backward processing using partial distribution, or an enhanced backward processing that combines the backward processing and the forward processing. As a result of the forward, backward, or enhanced backward processing, nodes in the graph that have the top-k highest aggregate values over neighbors within h-hops of the nodes are determined. Identities of entities or persons associated with the determined nodes are presented and/or stored.	09-01-2011
20110219038	SIMPLIFIED ENTITY RELATIONSHIP MODEL TO ACCESS STRUCTURE DATA - A method, system and program product for modeling data as an undirected graph is disclosed. A set of entities and a set of attributes are defined. A set of relationships is defined to represent semantic associations with each association connecting at least two entities. Attributes are associated with entities rather than with relationships. A hierarchical query language with a set of atomic operations on modeled data is employed. The modeled data is displayed on a display unit.	09-08-2011
20110270844	EFFICIENT AND SCALABLE DATA EVOLUTION WITH COLUMN ORIENTED DATABASES - A method, system and program product for data evolution on column oriented databases is disclosed. For an input evolution operation, reusable and non-reusable attributes are identified. For attributes in a target schema that cannot be reused from the source schema, data and bitmap indexes of those attributes are generated from source data and bitmap indexes. A decompose operation is disclosed for decomposing a table into two tables. A merge operation is disclosed in which only one input table can be reused for mergence. A second merge operation is disclosed in which both input tables cannot be reused for mergence.	11-03-2011
20110270871	ICEBERG QUERY EVALUATION IMPLEMENTING A COMPRESSED BITMAP INDEX - Exemplary embodiments include an iceberg query method, including processing the iceberg query using a bitmap index having a plurality of bitmap vectors in a database, eliminating any of the plurality of bitmap vectors in the bitmap index that fails to meet a given condition thereby forming a subset of the plurality of bitmap vectors and aligning the vectors in the subset of the plurality of bitmap vectors in the bitmap index according to respective positions of the bitmap vectors in the subset of the plurality of bitmap vectors.	11-03-2011
20110276553	CLASSIFYING DOCUMENTS ACCORDING TO READERSHIP - One embodiment is a computer-implemented method for classifying documents in a collection of documents according to their intended readerships. The method comprises using a computer to select a document in the collection of documents; and using a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. The method further includes using a computer to classify the selected document as misleading, commercial, or personal according to its determined characteristic; and using a computer to repeat the steps of select document, determine a characteristic of the selected document, and classify the selected document for additional documents in the collection. At least some documents are classified as misleading, at least some documents are classified as commercial, and at least some documents are classified as personal. Other methods and computer program products are also disclosed according to even more embodiments.	11-10-2011
20120209873	SET-LEVEL COMPARISONS IN DYNAMICALLY FORMED GROUPS - Systems and methods are disclosed of processing a set-level query across one or more attributes, the query being grouped by one or more attributes, whereby groups that satisfy the set-level query may be aggregated over one or more attributes. The systems and methods use bitwise arithmetic to efficiently traverse bitmap and bit-slice vectors and indexes of a data relation to determine groups that solve the set-level query.	08-16-2012
20120226695	CLASSIFYING DOCUMENTS ACCORDING TO READERSHIP - A system for classifying documents in a collection of documents according to their intended readerships includes: a computer configured to select a document in the collection of documents; and a computer to determine a characteristic of the selected document, the characteristic being: misleading when the document includes one or more features that are determined to be for a purpose other than reading the document; commercial when the document includes features that are presented for a commercial purpose; or personal when the document includes features of a personal opinion. A computer classifies the selected document as misleading, commercial, or personal according to its determined characteristic; and a computer repeats the steps of select document, determines a characteristic of the selected document, and classifies the selected document for additional documents in the collection. At least some documents are classified as misleading, some as commercial, and at least some as personal.	09-06-2012
20120254120	LOGGING SYSTEM USING PERSISTENT MEMORY - A computer program product, including: a computer readable storage device to store a computer readable program, wherein the computer readable program, when executed by a processor within a computer, causes the computer to perform operations for logging. The operations include: receiving a transaction including data and a log record corresponding to the data; writing the data to a data storage device; and writing the log record to a log space on a persistent memory device coupled to the data storage device.	10-04-2012
20120303633	SYSTEMS AND METHODS FOR QUERYING COLUMN ORIENTED DATABASES - Systems and methods for accessing data stored in a data array, mapping the data using a bitmap index, and processing data queries by determining positions of query attributes in the bitmap index and locating values corresponding to the positions in the data array are described herein.	11-29-2012
20120323867	SYSTEMS AND METHODS FOR QUERYING COLUMN ORIENTED DATABASES - Systems and methods for accessing data stored in a data array, mapping the data using a bitmap index, and processing data queries by determining positions of query attributes in the bitmap index and locating values corresponding to the positions in the data array are described herein.	12-20-2012
20120323962	SET-LEVEL COMPARISONS IN DYNAMICALLY FORMED GROUPS - Methods are disclosed of processing a set-level query across one or more attributes, the query being grouped by one or more attributes, whereby groups that satisfy the set-level query may be aggregated over one or more attributes. The methods use bitwise arithmetic to efficiently traverse bitmap and bit-slice vectors and indexes of a data relation to determine groups that solve the set-level query.	12-20-2012
20130006969	SUPPORTING SET-LEVEL SLICE AND DICE IN DATA WAREHOUSES - A method and system for coping with slice and dice operations in data warehouses is disclosed. An external approach may be utilized, creating queries using structured query language on a computer. An algorithm may be used to rewrite the queries. The resulting predicates may be joined to dimension tables corresponding to fact tables. An internal approach may be utilized, using aggregation functions with early aggregation for creating the queries. The results of the slice and dice operations may be outputted to a user on a computer monitor.	01-03-2013
20130226955	BI-TEMPORAL KEY VALUE CACHE SYSTEM - Described herein are techniques for supporting bi-temporal data in a key value cache system. An embodiment provides bi-temporal data as the basic functionality of a key value cache system. An embodiment provides a redesign of the core data structures of a key value cache system, adds bi-temporal data storage in the key value hashing structure, and provides a temporality-aware memory space manager. Embodiments can achieve the same performance as current key value cache systems for regular queries (that is, the queries that only access the current versions of data) while supporting bi-temporal data.	08-29-2013
20130290283	SCM-CONSCIOUS TRANSACTIONAL KEY-VALUE STORE - Embodiments of a method are described. In one embodiment, the method is a method for executing and supporting transactions. The method includes receiving a transaction comprising a command and data. The method includes writing the data to a transaction manager on a persistent memory device. The transaction manager also maintains a status of the transaction and reference to entries within memory that are manipulated by the transaction. The method also includes creating an in-memory log of the transaction in a first hash directory. The method includes committing a copy of the first hash directory to a second hash directory maintained on a persistent memory device.	10-31-2013
20130290655	SCM-CONSCIOUS TRANSACTIONAL KEY-VALUE STORE - Embodiments of a system are described. In one embodiment, the system is a device for performing operations and supporting transactions. The device is configured to receive a transaction comprising a command and data. The device writes the data to a transaction manager on a persistent memory device. The transaction manager also maintains a status of the transaction and reference to entries within memory that are manipulated by the transaction. The device also creates an in-memory log of the transaction in a first hash directory. The device then commits a copy of the first hash directory to a second hash directory maintained on a persistent memory device.	10-31-2013
20130318090	SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES - Embodiments of the invention provide a system, method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.	11-28-2013
20130318091	SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR FAST AND SCALABLE PROXIMAL SEARCH FOR SEARCH QUERIES - Embodiments of the invention provide a method and computer program products for information retrieval from multiple documents by proximity searching for search queries. A method includes generating an index for the multiple documents, wherein the index includes words in snippets in the documents. An input search query is processed against the index by searching query terms over the snippets to introduce term proximity information implicitly in the information retrieval. Results of multiple sentence level search operations are combined as output.	11-28-2013
20140032603	SIMPLIFIED ENTITY RELATIONSHIP MODEL TO ACCESS STRUCTURE DATA - A system and program product for modeling data as an undirected graph is disclosed. A set of entities and a set of attributes are defined. A set of relationships is defined to represent semantic associations with each association connecting at least two entities. Attributes are associated with entities rather than with relationships. A hierarchical query language with a set of atomic operations on modeled data is employed. The modeled data is displayed on a display unit.	01-30-2014
20140059284	SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS MEMORY SPACE MANAGEMENT FOR STORAGE CLASS MEMORY - Embodiments of the present invention provide a system, method and computer program products for memory space management for storage class memory. One embodiment comprises a method for information storage in an information technology environment. The method comprises storing data in a storage class memory (SCM) space, and storing storage management metadata corresponding to said data, in the SCM in a first data structure. The method further includes buffering storage management metadata corresponding to said data, in a main memory in a second data structure.	02-27-2014
20140195471	TECHNOLOGY PREDICTION - Embodiments of the invention relate to technology prediction. A technical dictionary of technical terms is constructed based on a collection of documents. The technical terms are partitioned into equivalence classes. A table is generated that correlates technical terms across equivalence classes based on temporal co-occurrence of the technical terms across the equivalence classes. For a given technical term the table is accessed to determine a first set of technical terms that correlate to the given technical term. The table is accessed again to determine a second set of technical terms that correlate to the first set of technical terms. It is predicted that the second set of technical terms will correlate to the given technical term in the future.	07-10-2014
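
The two-step table lookup described in the TECHNOLOGY PREDICTION entry above can be illustrated with a minimal sketch. It is not the method claimed in the application; the dictionary layout, the function name `predict_future_correlations`, the sample terms, and the choice to exclude terms that already correlate with the query term are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical correlation table: each technical term maps to the terms
# it is assumed to have co-occurred with over time.
CorrelationTable = Dict[str, Set[str]]


def predict_future_correlations(table: CorrelationTable, term: str) -> Set[str]:
    """Return terms two lookups away from `term` in the correlation table."""
    # First access: terms that correlate to the given term.
    first_set = set(table.get(term, set()))

    # Second access: terms that correlate to any term in the first set.
    second_set: Set[str] = set()
    for related in first_set:
        second_set |= table.get(related, set())

    # Assumption: terms already correlated with `term` (and `term` itself)
    # are not reported as predictions.
    return second_set - first_set - {term}


# Made-up sample data, not taken from the application.
table: CorrelationTable = {
    "flash memory": {"solid state drive"},
    "solid state drive": {"flash memory", "key value store"},
    "key value store": {"solid state drive", "persistent memory"},
}

predicted = predict_future_correlations(table, "flash memory")
print(sorted(predicted))
```

With this sample table, the only term reached through the second lookup that is not already linked to "flash memory" is "key value store".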
