Patent application number | Description | Published |
20090006331 | ENTITY-BASED BUSINESS INTELLIGENCE - A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table. | 01-01-2009 |
20090006349 | ENTITY-BASED BUSINESS INTELLIGENCE - A method is disclosed for conducting a query to transform data in a pre-existing database, the method comprising: collecting database information from the pre-existing database, the database information including inconsistent dimensional tables and fact tables; running an entity discovery process on the inconsistent dimensional tables and the fact tables to produce entity mapping tables; using the entity mapping tables to resolve the inconsistent dimensional tables into resolved dimensional tables; and running the query on a resolved database to obtain a query result, the resolved database including the resolved dimensional table. | 01-01-2009 |
20090125805 | METHODS FOR OBTAINING IMPROVED TEXT SIMILARITY MEASURES - The embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the electronic documents. This includes basing similarity between the similar terms on patterns, wherein the patterns can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns. The identifying of the similar terms also includes identifying multiple pattern types between the electronic documents. Moreover, the basing of the similarity on patterns identifies terms within the electronic documents that are within a category of a hierarchy. Specifically, the identifying of the terms reviews a hierarchical data tree, wherein nodes of the tree represent terms within the electronic documents. Lower nodes of the tree have specific terms, and higher nodes of the tree have general terms. | 05-14-2009 |
20090192980 | Method for Estimating the Number of Distinct Values in a Partitioned Dataset - The task of estimating the number of distinct values (DVs) in a large dataset arises in a wide variety of settings in computer science and elsewhere. The present invention provides synopses for DV estimation in the setting of a partitioned dataset, as well as corresponding DV estimators that exploit these synopses. Whenever an output compound data partition is created via a multiset operation on a pair of (possibly compound) input partitions, the synopsis for the output partition can be obtained by combining the synopses of the input partitions. If the input partitions are compound partitions, it is not necessary to access the synopses for all the base partitions that were used to construct the input partitions. Superior (in certain cases near-optimal) accuracy in DV estimates is maintained, especially when the synopsis size is small. The synopses can be created in parallel, and can also handle deletions of individual partition elements. | 07-30-2009 |
20090216799 | DISCOVERING TOPICAL STRUCTURES OF DATABASES - A system and method for automatically discovering topical structures of databases includes a model builder adapted to compute various kinds of representations for the database based on schema information and data values of the database. A plurality of base clusterers is also provided, one for each representation. Each base clusterer is adapted to perform, for the representation, preliminary topical clustering of tables within the database to produce a plurality of clusters, such that each of the clusters corresponds to a set of tables on the same topic. A meta-clusterer aggregates results of the clusterers into a final clustering, such that the final clustering comprises a plurality of the clusters. A representative finder identifies representative tables from the clusters in the final clustering. The representative finder identifies at least one representative table for each of the clusters in the final clustering. The representative finder also arranges the representative tables by topic as a topical directory and outputs the topical directory. | 08-27-2009 |
20100223266 | SCALING DYNAMIC AUTHORITY-BASED SEARCH USING MATERIALIZED SUBGRAPHS - According to one embodiment of the present invention, a method for processing a query is provided. The method includes generating a set of pre-computed materialized sub-graphs from a dataset and receiving a search query having one or more search query terms. A particular one of the pre-computed materialized sub-graphs is accessed and a dynamic authority-based keyword search is executed on the particular one of the pre-computed materialized sub-graphs. Nodes in the dataset are then retrieved based on the executing, and a response to the search query is provided which includes the retrieved nodes. | 09-02-2010 |
20110047159 | SYSTEM, METHOD, AND APPARATUS FOR MULTIDIMENSIONAL EXPLORATION OF CONTENT ITEMS IN A CONTENT STORE - A computer-implemented method for accessing content items in a content store is described. In one embodiment, the computer-implemented method includes maintaining a text index of content items in a content store to enable a keyword search on the content items, receiving a query having a keyword and generating a hit list from the text index using the keyword, and extracting frequent phrases from text within content items of the hit list. The computer-implemented method also includes assigning a relative relevance to the frequent phrases and grouping content items into topics based on the presence of relevant phrases within the content items of the hit list. The hit list includes one or more content items of the content store. The frequent phrases having a relatively high relevance are relevant phrases. | 02-24-2011 |
20120226639 | Systems and Methods for Processing Machine Learning Algorithms in a MapReduce Environment - Systems and methods for processing Machine Learning (ML) algorithms in a MapReduce environment are described. In one embodiment of a method, the method includes receiving an ML algorithm to be executed in the MapReduce environment. The method further includes parsing the ML algorithm into a plurality of statement blocks in a sequence, wherein each statement block comprises a plurality of basic operations (hops). The method also includes automatically determining an execution plan for each statement block, wherein at least one of the execution plans comprises one or more low-level operations (lops). The method further includes implementing the execution plans in the sequence of the plurality of the statement blocks. | 09-06-2012 |
20130139172 | CONTROLLING THE USE OF COMPUTING RESOURCES IN A DATABASE AS A SERVICE - A method and apparatus control the use of a computing resource by multiple tenants in a DBaaS service. The method includes intercepting a task that is to access a computing resource, the task being an operating system process or thread; identifying, from the multiple tenants, a tenant that is associated with the task; determining other tasks of the tenant that access the computing resource; and controlling the use of the computing resource by the task, so that the total amount of usage of the computing resource by the task and the other tasks does not exceed the limit of usage of the computing resource for the tenant. | 05-30-2013 |
20130325906 | PLACING A DATABASE - A method and system for placing a database. The method includes: receiving a request to create a new database; determining whether there is a need to migrate a current database among current virtual machines based on resource demand and free resources in the current virtual machines; determining a database placement plan based on the resource demand, the migration strategy, and the migration cost associated with the migration strategy, depending on whether there is a need to migrate the database; and executing the database placement plan. The invention can help a database service provider optimize database layout during database provisioning through database migration. | 12-05-2013 |
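
The placement steps described in the last entry (20130325906) can be illustrated with a minimal Python sketch. It is not the patented method: the `VM` class, the `plan_placement` function, the single scalar resource, the best-fit rule, the limit of one migration, and the use of database size as migration cost are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    """Hypothetical virtual machine with one scalar resource (e.g. memory units)."""
    name: str
    capacity: int
    databases: dict = field(default_factory=dict)  # database name -> resource demand

    @property
    def free(self):
        return self.capacity - sum(self.databases.values())


def plan_placement(vms, new_db, demand):
    """Return a list of plan steps, or None if no plan is found.

    Steps are ("migrate", db, source_vm, target_vm) or ("place", db, vm).
    """
    # Case 1: some VM already has enough free resource, so no migration is needed.
    fits = [vm for vm in vms if vm.free >= demand]
    if fits:
        target = min(fits, key=lambda vm: vm.free - demand)  # assumed best-fit rule
        return [("place", new_db, target.name)]

    # Case 2: look for the cheapest single migration that frees enough room.
    best = None
    for src in vms:
        for db, size in src.databases.items():
            if src.free + size < demand:
                continue  # moving this database would not free enough room
            for dst in vms:
                if dst is src or dst.free < size:
                    continue  # target cannot host the migrated database
                cost = size  # assumed cost model: cost grows with database size
                if best is None or cost < best[0]:
                    best = (cost, db, src.name, dst.name)

    if best is None:
        return None
    _, db, src_name, dst_name = best
    return [("migrate", db, src_name, dst_name), ("place", new_db, src_name)]


def example_vms():
    """Two hypothetical VMs used in the usage example."""
    return [VM("vm1", 10, {"a": 4, "b": 3}), VM("vm2", 10, {"c": 5})]
```

With the two example VMs, a request for 4 units fits directly on `vm2`, while a request for 6 units fits nowhere until database `b` is moved from `vm1` to `vm2`. A real provider would presumably track several resource types and consider multi-step migrations, which this sketch leaves out.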