| Patent application number | Description | Published |
| 20090052448 | METHODS AND SYSTEMS TO STORE STATE USED TO FORWARD MULTICAST TRAFFIC - Methods and systems are described to store state used to forward multicast traffic. The system includes a receiving module to receive a request to add a first node to a membership tree. The membership tree includes a first plurality of nodes associated with a multicast group. The system further includes a processing module to identify a second node in the first plurality of nodes and to communicate a node identifier that identifies the first node over a network to the second node. The node identifier is to be stored at the second node to add the first node to the membership tree. The node identifier is further to be stored in the membership tree exclusively at the second node to enable the second node to forward the multicast traffic to the first node. | 02-26-2009 |
| 20090052449 | MULTICAST WITH ADAPTIVE DUAL-STATE - A method and system are described to multicast with an adaptive dual state. The system receives multicast traffic over a membership tree including a first plurality of nodes connected in a first topology destined for a plurality of multicast members of a first multicast group. Next, the system determines a rate of multicast traffic that exceeds a predetermined threshold based on receiving the multicast traffic. Next, the system generates a dissemination tree including a second plurality of nodes connected in a second topology to reduce a number of hops to communicate the multicast traffic to the plurality of multicast members of the first multicast group. Finally, the system forwards the multicast traffic to the plurality of multicast members of the first multicast group over the dissemination tree. | 02-26-2009 |
| 20090063681 | Systems and methods for distributing video on demand - A method of providing content comprises making the content available on a central server, and surveying a plurality of peers for a portion of the content. The portion of the content from one of the peers is obtained when the portion of the content is available from the one of the peers, and obtained from the central server when the portion of the content is not available from the plurality of peers. | 03-05-2009 |
| 20090106417 | Method and apparatus for packet analysis in a network - A method and system for monitoring traffic in a data communication network and for extracting useful statistics and information is disclosed. | 04-23-2009 |
| 20090138469 | METHOD OF PATTERN SEARCHING - Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided. | 05-28-2009 |
| 20090138470 | METHOD OF PATTERN SEARCHING - Structural join mechanisms provide efficient query pattern matching. In one embodiment, tree-merge mechanisms are provided. In another embodiment, stack-tree mechanisms are provided. | 05-28-2009 |
| 20090150339 | METHOD AND SYSTEM FOR PATTERN MATCHING HAVING HOLISTIC TWIG JOINS - A method of query pattern matching uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. | 06-11-2009 |
| 20090171944 | Set Similarity selection queries at interactive speeds - The similarity between a query set comprising query set tokens and a database set comprising database set tokens is determined by a similarity score. The database sets belong to a data collection set, which contains all database sets from which information may be retrieved. If the similarity score is greater than or equal to a user-defined threshold, the database set has information relevant to the query set. The similarity score is calculated with an inverse document frequency method (IDF) similarity measure independent of term frequency. The document frequency is based at least in part on the number of database sets in the data collection set and the number of database sets which contain at least one query set token. The length of the query set and the length of the database set are normalized. | 07-02-2009 |
| 20090287721 | Generating conditional functional dependencies - Techniques are disclosed for generating conditional functional dependency (CFD) pattern tableaux having the desirable properties of support, confidence and parsimony. These techniques include both a greedy algorithm for generating a tableau and, for large data sets, an “on-demand” algorithm that outperforms the basic greedy algorithm in running time by an order of magnitude. In addition, a range tableau, as a generalization of a pattern tableau, can achieve even more parsimony. | 11-19-2009 |
| 20090292726 | System and Method for Identifying Hierarchical Heavy Hitters in Multi-Dimensional Data - A method including receiving a plurality of elements of a data stream, storing a multi-dimensional data structure in a memory, said multi-dimensional data structure storing the plurality of elements as a hierarchy of nodes, each node having a frequency count corresponding to the number of elements stored therein, comparing the frequency count of each node to a threshold value based on a total number of the elements stored in the nodes, identifying each node for which the frequency count is at least as great as the threshold value as a hierarchical heavy hitter (HHH) node, and propagating the frequency count of each non-HHH node to its corresponding parent nodes. | 11-26-2009 |
| 20100023512 | Methods and Systems for Content Access and Distribution - Distribution of content between publishers and consumers is accomplished using an overlay network that may make use of XML language to facilitate content identification. The overlay network includes a plurality of routers that may be in communication with each other and the publishers and consumers on the Internet. Content and queries are identified by content descriptors that are routed from the originator to a nearest router in the overlay network. The nearest router, for each unique content descriptor, generates a hash identification of the content descriptor which is used by remaining routers in the overlay network to provide the appropriate functions with respect to the content descriptor. In particular, this allows all routers in the overlay network except the nearest router to properly route content without processing every content descriptor. | 01-28-2010 |
| 20100042581 | JOIN PATHS ACROSS MULTIPLE DATABASES - Methods, systems and computer instructions on computer readable media are disclosed for optimizing a query, including a first join path, a second join path, and an optimizer, to efficiently provide high quality information from large, multiple databases. The methods and systems include evaluating a schema graph identifying the join paths between a field X and a field Y, and a value X=x, to identify the top-few values of Y=y that are reachable from a specified X=x value when using the join paths. Each data path that instantiates the schema join paths can be scored and evaluated as to the quality of the data with respect to specified integrity constraints to alleviate data quality problems. Agglomerative scoring methodologies can be implemented to compute high quality information in the form of a top-few answers to a specified problem as requested by the query. | 02-18-2010 |
| 20100042606 | MULTIPLE AGGREGATIONS OVER DATA STREAMS - A system for a data stream management system includes a filter transport aggregate for a high speed input data stream with a plurality of packets, each packet comprising attributes. The system includes an evaluation system to evaluate the high speed input data stream and partition the packets into groups by the attributes, and a table, wherein the table stores the attributes of each packet using a hash function. A phantom query is used to define partitioned groups of packets using attributes other than those used to group the packets for solving user queries without performing the user queries on the high speed input data stream. | 02-18-2010 |
| 20100058405 | Systems and Methods for Distributing Video on Demand - A method of receiving content includes joining an in-progress multicast stream to receive a first portion of a content. The method further includes sending a request to a peer for a catch-up portion of the content, the request including a deadline for delivery of the content, and receiving the catch-up portion of the content from the peer prior to the deadline. | 03-04-2010 |
| 20100100538 | METHOD AND APPARATUS FOR OPTIMIZING QUERIES UNDER PARAMETRIC AGGREGATION CONSTRAINTS - The present invention relates to a method and apparatus for optimizing queries. The present invention discloses an efficient method for providing answers to queries under parametric aggregation constraints. | 04-22-2010 |
| 20100100552 | ROUTING XML QUERIES - A vast amount of information currently accessible over the Web, and in corporate networks, is stored in a variety of databases, and is being exported as XML data. However, querying this totality of information in a declarative and timely fashion is problematic because this set of databases is dynamic, and a common schema is difficult to maintain. The present invention provides a solution to the problem of issuing declarative, ad hoc XPath queries against such a dynamic collection of XML databases, and receiving timely answers. There are proposed decentralized architectures, under the open and the agreement cooperation models between a set of sites, for processing queries and updates to XML data. Each site consists of XML data nodes (which export their data as XML, and also pose queries) and one XML router node (which manages the query and update interactions between sites). The architectures differ in the degree of knowledge individual router nodes have about data nodes containing specific XML data. There is therefore provided a method for accessing data over a wide area network comprising: providing a decentralized architecture comprising a plurality of data nodes each having a database, a query processor and a path index, and a plurality of router nodes each having a routing state, maintaining a routing state in each of the router nodes, broadcasting routing state updates from each of the databases to the router nodes, and routing path queries to each of the databases by accessing the routing state. | 04-22-2010 |
| 20100100553 | METHOD AND APPARATUS FOR RANKED JOIN INDICES - A method and apparatus for ranked join indices includes a solution providing performance guarantees for top-k join queries over two relations, when preprocessing to construct a ranked join index for a specific join condition is permitted. The concepts of ranking join indices presented herein are also applicable in the case of a single relation. In this case, the concepts herein provide a solution to the top-k selection problem with monotone linear functions, having guaranteed worst case search performance for the case of two ranked attributes and arbitrary preference vectors. | 04-22-2010 |
| 20100114840 | SYSTEMS AND ASSOCIATED COMPUTER PROGRAM PRODUCTS THAT DISGUISE PARTITIONED DATA STRUCTURES USING TRANSFORMATIONS HAVING TARGETED DISTRIBUTIONS - A data structure that includes at least one partition containing non-confidential quasi-identifier microdata and at least one other partition containing confidential microdata is formed. The partitioned confidential microdata is disguised by transforming the confidential microdata to conform to a target distribution. The disguised confidential microdata and the quasi-identifier microdata are combined to generate a disguised data structure. The disguised data structure is used to carry out statistical analysis and to respond to a statistical query that is directed to the use of confidential microdata. In this manner, the privacy of the confidential microdata is preserved. | 05-06-2010 |
| 20100114920 | COMPUTER SYSTEMS, METHODS AND COMPUTER PROGRAM PRODUCTS FOR DATA ANONYMIZATION FOR AGGREGATE QUERY ANSWERING - Computer program products are provided for anonymizing a database that includes tuples. A respective tuple includes at least one quasi-identifier and sensitive attributes associated with the quasi-identifier. These computer program products include computer readable program code that is configured to (k,e)-anonymize the tuples over a number k of different values in a range e of values, while preserving coupling at least two of the sensitive attributes to one another in the sets of attributes that are anonymized to provide a (k,e)-anonymized database. Related computer systems and methods are also provided. | 05-06-2010 |
| 20100125559 | SELECTIVITY ESTIMATION OF SET SIMILARITY SELECTION QUERIES - The invention relates to a system and/or methodology for selectivity estimation of set similarity queries. More specifically, the invention relates to a selectivity estimation technique employing hashed sampling. The invention provides for samples constructed a priori that can efficiently and quickly provide accurate estimates for arbitrary queries, and can be updated efficiently as well. | 05-20-2010 |
| 20100132036 | VERIFICATION OF OUTSOURCED DATA STREAMS - Embodiments disclosed herein are directed to verifying query results of an untrusted server. A data owner outsources a data stream to the untrusted server, which is configured to respond to a query from a client with the query result, which is returned to the client. The data owner can maintain a vector associated with query results returned by the server and can generate a verification synopsis using the vector and a seed. The verification synopsis includes a polynomial, where coefficients of the polynomial are determined based on the seed. The data owner outputs the verification synopsis and the seed to a client for verification of the query results. | 05-27-2010 |
| 20100138443 | User-Powered Recommendation System - Recommendation systems are widely used in Internet applications. In current recommendation systems, users only play a passive role and have limited control over the recommendation generation process. As a result, there is often considerable mismatch between the recommendations made by these systems and the actual user interests, which are fine-grained and constantly evolving. With a user-powered distributed recommendation architecture, individual users can flexibly define fine-grained communities of interest in a declarative fashion and obtain recommendations accurately tailored to their interests by aggregating opinions of users in such communities. By combining a progressive sampling technique with data perturbation methods, the recommendation system is both scalable and privacy-preserving. | 06-03-2010 |
| 20100153064 | Methods and Apparatus to Determine Statistical Dominance Point Descriptors for Multidimensional Data - Methods and apparatus to determine statistical dominance point descriptors for multidimensional data are disclosed. An example method disclosed herein comprises determining a first joint dominance value for a first data point in a multidimensional data set, data points in the multidimensional data set comprising multidimensional values, each dimension corresponding to a different measurement of a physical event, the first joint dominance value corresponding to a number of data points in the multidimensional data set dominated by the first data point in every dimension, determining a first skewness value for the first data point, the first skewness value corresponding to a size of a first dimension of the first data point relative to a combined size of all dimensions of the first data point, and combining the first joint dominance and first skewness values to determine a first statistical dominance point descriptor associated with the first data point. | 06-17-2010 |
| 20100153379 | System and Method for Generating Statistical Descriptors for a Data Stream - Described is a system and method for receiving a data stream of multi-dimensional items, collecting a sample of the data stream having a predetermined number of items and dividing the sample into a plurality of subsamples, each subsample corresponding to a single dimension of each of the predetermined number of items. A query is then executed on a particular item in at least two of the subsamples to generate data for the corresponding subsample. This data is combined into a single value. | 06-17-2010 |
| 20100268719 | METHOD AND APPARATUS FOR PROVIDING ANONYMIZATION OF DATA - A method and apparatus for providing an anonymization of data are disclosed. For example, the method receives a communications graph that encodes a plurality of types of interactions between two or more entities. The method partitions the two or more entities into a plurality of classes, and applies a type of anonymization to the communications graph. | 10-21-2010 |
| 20100274785 | Database Analysis Using Clusters - A method for mapping relationships in a database results in a cluster graph. A representative sample of records in each of a plurality of tables in the database is analyzed for nearest neighbor join edges instantiated by the record. Records with corresponding nearest neighbor join edges are grouped into clusters. Cluster pairs which share a join relationship between two tables are identified. A weighting may be applied to cluster pairs based on the number of records for the cluster pair. Meaningful cluster pairs above a weighted threshold may be ordered according to table and displayed as a cluster graph. Analyses of the cluster graph may reveal important characteristics of the database. | 10-28-2010 |
| 20100293129 | DEPENDENCY BETWEEN SOURCES IN TRUTH DISCOVERY - A method and system for truth discovery may implement a methodology that accounts for accuracy of sources and dependency between sources. The methodology may be based on Bayesian probability calculus for determining which data object values published by sources are likely to be true. The method may be recursive with respect to dependency, accuracy, and actual truth discovery for a plurality of sources. | 11-18-2010 |
| 20100318519 | Incremental Maintenance of Inverted Indexes for Approximate String Matching - In embodiments of the disclosed technology, indexes, such as inverted indexes, are updated only as necessary to guarantee answer precision within predefined thresholds which are determined with little cost in comparison to the updates of the indexes themselves. With the present technology, a batch of daily updates can be processed in a matter of minutes, rather than a few hours for rebuilding an index, and a query may be answered with assurances that the results are accurate or within a threshold of accuracy. | 12-16-2010 |
| 20110041184 | METHOD AND APPARATUS FOR PROVIDING ANONYMIZATION OF DATA - A method and apparatus for providing an anonymization of data are disclosed. For example, the method receives a request for anonymizing, wherein the request comprises a bipartite graph for a plurality of associations or a table that encodes the plurality of associations for the bipartite graph. The method places each node in the bipartite graph in a safe group and provides an anonymized graph that encodes the plurality of associations of the bipartite graph, if a safe group for all nodes of the bipartite graph is found. | 02-17-2011 |
| 20110047185 | META-DATA INDEXING FOR XPATH LOCATION STEPS - In accordance with a method of encoding meta-data associated with tree-structured data, a first set of elements of a plurality of elements in the tree-structured data is associated explicitly with explicit meta-data levels, and a second set of elements of the plurality of elements is associated by inheritance with explicit meta-data levels of closest ancestor elements of the first set of elements. The plurality of elements is packed into a plurality of leaf nodes of an index structure. The plurality of leaf nodes is merged into a plurality of non-leaf nodes until a root non-leaf node is generated. The plurality of non-leaf nodes of the index structure is associated with indicators representing ranges of the explicit meta-data levels in the packed first set of elements, such that explicit meta-data level ranges of descendant non-leaf nodes are subsets of explicit meta-data level ranges of ancestor non-leaf nodes. | 02-24-2011 |
| 20110060818 | Method and Apparatus for Packet Analysis in a Network - A method and system for monitoring traffic in a data communication network and for extracting useful statistics and information is disclosed. | 03-10-2011 |
| 20110066600 | FORWARD DECAY TEMPORAL DATA ANALYSIS - A disclosed method for implementing time decay in the analysis of streaming data objects is based on the age, referred to herein as the forward age, of a data object measured from a landmark time in the past to a time associated with the occurrence of the data object, e.g., an object's timestamp. A forward time decay function is parameterized on the forward age. Because a data object's forward age does not depend on the current time, a value of the forward time decay function is determined just once for each data object. A scaling factor or weight associated with a data object may be weighted according to its decay function value. Forward time decay functions are beneficial in determining decayed aggregates, including decayed counts, sums, and averages, decayed minimums and maximums, and for drawing decay-influenced samples. | 03-17-2011 |
| 20110131170 | Processing data using sequential dependencies - The specification describes data processes for analyzing large data streams for target anomalies. “Sequential dependencies” (SDs) are chosen for ordered data and present a framework for discovering which subsets of the data obey a given sequential dependency. Given an interval G, an SD on attributes X and Y, written as X →_G Y, denotes that the distance between the Y-values of any two consecutive records, when sorted on X, is within G. SDs may be extended to Conditional Sequential Dependencies (CSDs), consisting of an underlying SD plus a representation of the subsets of the data that satisfy the SD. The conditional approximate sequential dependencies may be expressed as pattern tableaux, i.e., compact representations of the subsets of the data that satisfy the underlying dependency. | 06-02-2011 |
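
The sequential dependency defined in the last entry (20110131170) can be illustrated with a minimal Python sketch. It is not the patented method, only a direct reading of the definition in the abstract. The function names `sd_violations` and `satisfies_sd`, the representation of records as `(x, y)` pairs, the closed interval `[lo, hi]` standing in for G, and the use of the signed difference `y2 - y1` as the "distance" are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # hypothetical (x, y) pair


def sd_violations(records: Sequence[Record], lo: float, hi: float
                  ) -> List[Tuple[Record, Record]]:
    """Return consecutive record pairs (after sorting on x) whose
    y-gap falls outside the closed interval [lo, hi].

    Assumption: the "distance" between Y-values is taken to be the
    signed difference y2 - y1 of consecutive records.
    """
    ordered = sorted(records, key=lambda r: r[0])  # sort on attribute X
    bad = []
    for (x1, y1), (x2, y2) in zip(ordered, ordered[1:]):
        gap = y2 - y1
        if not (lo <= gap <= hi):
            bad.append(((x1, y1), (x2, y2)))
    return bad


def satisfies_sd(records: Sequence[Record], lo: float, hi: float) -> bool:
    """True when every consecutive pair has its y-gap within [lo, hi]."""
    return not sd_violations(records, lo, hi)


# Hypothetical data: x is a sequence number, y a measured value.
readings = [(3, 19), (1, 10), (2, 14), (4, 40)]
violating_pairs = sd_violations(readings, lo=4, hi=5)
```

With the sample data above, the gaps after sorting on `x` are 4, 5 and 21, so only the last pair falls outside `[4, 5]`. A conditional sequential dependency, as the abstract describes it, would additionally record which subsets of the data satisfy the dependency; this sketch only checks the underlying SD.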