Olston, CA

Chris Olston, Mountain View, CA US

Patent application number	Description	Published
20090216718	System for Query Scheduling to Maximize Work Sharing - A system of query scheduling to maximize work sharing. The system schedules queries to account for future queries possessing a sharability component. Included in the system are operations for assigning an incoming query to a query queue based on a sharability characteristic of the incoming query, and evaluating a priority function for each member of a plurality of query queues to identify one highest priority query queue. The priority function accounts for the probability that a future incoming query will contain the sharability characteristic common to a member of the plurality of query queues. The system of query scheduling to maximize work sharing selects a batch of queries from the highest priority query queue, and dispatches the batch to one or more query execution units.	08-27-2009
20090307329	ADAPTIVE FILE PLACEMENT IN A DISTRIBUTED FILE SYSTEM - In a distributed system that includes multiple machines, a scheduler attempts to schedule a task on a machine that is not currently overloaded with work. If a task is scheduled on a machine that does not yet have copies of the portions of the data set on which the task needs to operate, then that machine obtains copies of those portions from other machines that already have them. Whenever a “source” machine ships a copy of a portion to another “destination” machine in the distributed system, the destination machine persistently stores that copy on the destination machine's persistent storage mechanism. The copy also remains on the source machine. Thus, portions of the data set are automatically replicated whenever those portions are shipped between machines of the distributed system. Each machine in the distributed system has access to “global” information that indicates which machines have which portions of the data set.	12-10-2009
20090319476	ADAPTIVE MATERIALIZED VIEW SELECTION FOR DATABASES - Techniques described herein adaptively select materialized view fragments for persistent maintenance. During an interval of time, the selected fragments are persistently maintained in the database system, while the other non-selected fragments are not persistently maintained as materialized view fragments. Over time, the composition of the set of selected fragments may change. As queries are executed in the database system over an interval of time, statistics including the frequency of access of each currently selected fragment during that interval are generated. At the start of the next interval of time, based on these statistics, some currently selected fragments may be unselected. Some currently non-selected fragments of one or more candidate materialized views may be selected based on the statistics. For the next interval, the newly unselected fragments cease to be persistently maintained as materialized view fragments, while the newly selected fragments begin to be persistently maintained as materialized view fragments.	12-24-2009
20120203782	METHOD AND SYSTEM FOR DATA PROVENANCE MANAGEMENT IN MULTI-LAYER SYSTEMS - Method, system, and programs for heterogeneous data management. Information from multiple data sources is first obtained. Data/metadata from each of the data sources is modeled based on the source and/or granularity information of the data/metadata to generate data/metadata models. The data/metadata from multiple data sources are integrated, by applying one or more processes to the data/metadata from different data sources based on the data/metadata models, to generate integrated data/metadata. A provenance representation for the integrated data/metadata is created tracing sources, granularities, and/or processes applied and archived for enabling a query associated with the integrated data/metadata.	08-09-2012
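
As an illustration of the query scheduling idea in application 20090216718 above, the following is a minimal sketch, not the patented method. It groups queries into queues by a sharability key, scores each non-empty queue, and dispatches a batch from the highest-scoring one. The class name, the `arrival_prob` table, and the priority formula (pending count discounted by the chance that another sharable query arrives soon) are assumptions made for this example only.

```python
from collections import defaultdict, deque


class SharingScheduler:
    """Toy scheduler: one queue per sharability key (hypothetical design)."""

    def __init__(self, arrival_prob):
        # arrival_prob: assumed estimate, per key, that a future query
        # sharing work with that queue will arrive soon.
        self.arrival_prob = arrival_prob
        self.queues = defaultdict(deque)

    def submit(self, query, share_key):
        # Assign the incoming query to a queue by its sharability key.
        self.queues[share_key].append(query)

    def priority(self, share_key):
        # Assumed priority function: more pending queries raise priority,
        # while a high chance of further sharable arrivals lowers it,
        # since waiting would let more work be shared.
        pending = len(self.queues[share_key])
        p = self.arrival_prob.get(share_key, 0.0)
        return pending * (1.0 - p)

    def next_batch(self, batch_size):
        # Pick the highest-priority non-empty queue and take a batch from it.
        candidates = [key for key, q in self.queues.items() if q]
        if not candidates:
            return None, []
        best = max(candidates, key=self.priority)
        q = self.queues[best]
        batch = [q.popleft() for _ in range(min(batch_size, len(q)))]
        return best, batch


sched = SharingScheduler({"orders": 0.9, "users": 0.1})
sched.submit("q1", "orders")
sched.submit("q2", "orders")
sched.submit("q3", "users")
first = sched.next_batch(10)
second = sched.next_batch(10)
third = sched.next_batch(10)
```

In this toy run the "users" queue is dispatched first even though it is shorter, because the assumed arrival probability for "orders" is high enough that holding that queue back is favored.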

Christopher Olston, Mountain View, CA US

Patent application number	Description	Published
20090182706	Generating Example Data for Testing Database Queries - Computer-implemented methods, modules and clients relate to an expanded, pruned sample table for testing database queries against a base table. The expanded, pruned sample table is formed from the base table by a process of initial sampling, synthesis, and pruning.	07-16-2009
20090204575	MODULAR WEB CRAWLING POLICIES AND METRICS - A web crawler loads a policy from a customizable stored module that is separate and distinct from the web crawler's source code. The web crawler follows these policies in determining the order in which the web crawler will visit and index web pages in an index used by an Internet search engine. As a result, the web crawler's behavior can be modified more easily. The web crawler's behavior can be finely tuned to be more efficient and/or to accommodate the particular needs of the search engine. Multiple different policies may be maintained concurrently in separate stored modules, and the web crawler can be instructed to use different modules' policies at different specified times or under different specified circumstances.	08-13-2009
20100114867	Virtual Environment Spanning Desktop and Cloud - A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.	05-06-2010
20100121847	Exploring Large Textual Data Sets Via Interactive Aggregation - A method and a system are provided for exploring a large textual data set via interactive aggregation. In one example, the method includes receiving the large textual data set and an original query template, building an index for the query template, wherein the building the index comprises ordering the index a particular way to optimize query time, receiving one or more bindings for the original query template, computing an answer to the original query template using the index and the one or more bindings, and anticipating one or more future queries that a user may submit and that are related to the original query template.	05-13-2010
20110218991	SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF NEEDY QUERIES - The present invention relates to methods, systems, and computer readable media comprising instructions for identifying needy queries for which additional responsive content is needed. The method of the present invention comprises receiving a query comprising one or more terms and retrieving one or more content items identified as responsive to the query, the one or more content items ranked according to one or more ranking techniques. A score is generated for the one or more ranked content items identified as responsive to the query. A determination is thereafter made as to whether the query is needy based upon a comparison of the one or more scores associated with the one or more content items identified as responsive to the query and a needy query score threshold.	09-08-2011
20140324822	VIRTUAL ENVIRONMENT SPANNING DESKTOP AND CLOUD - A method and system are given for providing a virtual environment spanning a desktop and a cloud. In one example, the method includes receiving a query template over a data set that resides in the cloud, optimizing the query template to segment the query template into an offline phase and an online phase, executing the offline phase on the cloud to build one or more indexes, and sending the one or more indexes to the desktop.	10-30-2014
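
The needy-query test described in application 20110218991 above compares scores of the ranked results against a threshold. The sketch below shows one possible reading of that comparison. The function name, the `top_k` parameter, and the choice to average the top scores are assumptions for illustration; the abstract does not specify how the comparison is made.

```python
def is_needy(scores, threshold, top_k=3):
    """Return True if a query looks 'needy' under an assumed rule.

    scores: relevance scores of the content items returned for the query.
    threshold: hypothetical needy query score threshold.
    top_k: hypothetical number of best results to consider.
    """
    top = sorted(scores, reverse=True)[:top_k]
    if not top:
        # No responsive content at all: treat the query as needy.
        return True
    # Assumed rule: needy when the mean of the best scores is below threshold.
    return sum(top) / len(top) < threshold


well_served = is_needy([0.9, 0.8, 0.7, 0.1], threshold=0.5)
poorly_served = is_needy([0.2, 0.1], threshold=0.5)
no_results = is_needy([], threshold=0.5)
```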

Christopher Olston, Los Altos, CA US

Patent application number	Description	Published
20090164425	System and method for crawl ordering by search impact - An improved system and method for crawl ordering of a web crawler by impact upon search results of a search engine is provided. Content-independent features of uncrawled web pages may be obtained, and the impact of uncrawled web pages may be estimated for queries of a workload using the content-independent features. The impact of uncrawled web pages may be estimated for queries by computing an expected impact score for uncrawled web pages that match needy queries. Query sketches may be created for a subset of the queries by computing an expected impact score for crawled web pages and uncrawled web pages matching the queries. Web pages may then be selected to fetch using a combined query-based estimate and query-independent estimate of the impact of fetching the web pages on search query results.	06-25-2009
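
The last step of the abstract above, selecting pages to fetch from a combined query-based and query-independent impact estimate, might look like the following sketch. The linear blend, the `alpha` weight, and the function name are assumptions for this example; the abstract does not say how the two estimates are combined.

```python
import heapq


def select_pages(estimates, n, alpha=0.5):
    """Pick the n uncrawled pages with the highest combined impact.

    estimates: dict mapping URL -> (query_based, query_independent) scores.
    alpha: hypothetical weight given to the query-based estimate.
    """
    def combined(item):
        _url, (query_based, query_independent) = item
        # Assumed combination: a simple weighted average of both estimates.
        return alpha * query_based + (1.0 - alpha) * query_independent

    best = heapq.nlargest(n, estimates.items(), key=combined)
    return [url for url, _scores in best]


estimates = {
    "a": (0.9, 0.1),
    "b": (0.2, 0.2),
    "c": (0.5, 0.9),
}
to_fetch = select_pages(estimates, n=2)
```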

Christopher A. Olston, Mountain View, CA US

Patent application number	Description	Published
20110252427	MODELING AND SCHEDULING ASYNCHRONOUS INCREMENTAL WORKFLOWS - Disclosed are methods and apparatus for scheduling an asynchronous workflow having a plurality of processing paths. In one embodiment, one or more predefined constraint metrics that constrain temporal asynchrony for one or more portions of the workflow may be received or provided. Input data is periodically received or intermediate or output data is generated for one or more of the processing paths of the workflow, via one or more operators, based on a scheduler process. One or more of the processing paths for generating the intermediate or output data are dynamically selected based on received input data or generated intermediate or output data and the one or more constraint metrics. The selected one or more processing paths of the workflow are then executed so that each selected processing path generates intermediate or output data for the workflow.	10-13-2011