Patent application number | Description | Published (MM-DD-YYYY) |
--- | --- | --- |
20090112945 | DATA PROCESSING APPARATUS AND METHOD OF PROCESSING DATA - Data processing apparatus comprising: a chunk store containing specimen data chunks; a manifest store containing a plurality of manifests, each of which represents at least a part of a data set and each of which comprises at least one reference to at least one of said specimen data chunks; a sparse chunk index containing information on only some specimen data chunks; and a processor operable to: process input data into input data chunks; identify manifests having at least one reference to one of said specimen data chunks that corresponds to one of said input data chunks and about which the sparse chunk index contains information; and prioritize the identified manifests for subsequent operation. | 04-30-2009 |
20100114980 | LANDMARK CHUNKING OF LANDMARKLESS REGIONS - A computer-executed method for forming data chunks from a sequence of data values comprises determining whether processing of the sequence of data values has entered a landmark-free region. If it has, a data chunk is produced using a landmark chunking technique specialized for landmark-free regions. Otherwise, a data chunk is produced using a standard-data landmark chunking technique. | 05-06-2010 |
20100205163 | SYSTEM AND METHOD FOR SEGMENTING A DATA STREAM - A method of limiting redundant storage of data comprises receiving a data stream and partitioning the data stream into a series of data chunks. At least one content hash value for a set of data chunks is generated based on data content of the set of data chunks. One or more data chunks are grouped into a segment with at least one boundary of the segment defined based on an evaluation of content hash values of data chunks. Content hash values of data chunks of the segment are compared to content hash values of data chunks of segments stored on a backup mass storage device. A pointer to a stored data chunk of an existing segment is stored on the backup mass storage device if a content hash value of a data chunk of the segment matches the content hash value of the stored data chunk. | 08-12-2010 |
20100223441 | STORING CHUNKS IN CONTAINERS - Chunks are stored in a container of a data store, where the chunks are produced by dividing input data as part of a deduplication process. In response to determining that the size of the container has reached a predefined size threshold, at least one of the chunks in the container is moved to another container. | 09-02-2010 |
20100235485 | PARALLEL PROCESSING OF INPUT DATA TO LOCATE LANDMARKS FOR CHUNKS - Input data is divided into a plurality of segments, which are processed, in parallel, by respective first processing elements to locate landmarks in the segments. At least one other processing element is used to produce chunks from the input data based on positions of the landmarks provided by the first processing elements. | 09-16-2010 |
20100246709 | PRODUCING CHUNKS FROM INPUT DATA USING A PLURALITY OF PROCESSING ELEMENTS - Input data is divided into multiple segments that are processed by processing elements of a computer. The processing of the segments produces a plurality of tentative sets of chunks. The plurality of tentative sets of chunks are stitched together to produce an output set of chunks. | 09-30-2010 |
20100250480 | IDENTIFYING SIMILAR FILES IN AN ENVIRONMENT HAVING MULTIPLE CLIENT COMPUTERS - To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing. | 09-30-2010 |
20100274888 | GENERATING A SUMMARY OF USERS THAT HAVE ACCESSED A RESOURCE - Information relating to monitored communications between user machines and a resource of a particular machine is received. Group information that identifies groups of the users is received. Based on the monitored communications and the group information, a summary of a subset of users that have accessed the resource is generated. | 10-28-2010 |
20100280997 | COPYING A DIFFERENTIAL DATA STORE INTO TEMPORARY STORAGE MEDIA IN RESPONSE TO A REQUEST - A plurality of differential data stores are stored in persistent storage media. In response to receiving a first request to store a particular data object, one of the differential data stores in the persistent storage media is selected according to a criterion relating to compression of data objects in the differential data stores. The selected differential data store is copied into temporary storage media; after the first request is received, the copying is not delayed to await receipt of more requests. The particular data object is inserted into the copy of the selected differential data store in the temporary storage media without having to retrieve more data from the selected differential data store in the persistent storage media. The selected differential data store in the persistent storage media is replaced with the modified copy from the temporary storage media. | 11-04-2010 |
20100281077 | BATCHING REQUESTS FOR ACCESSING DIFFERENTIAL DATA STORES - Data objects are selectively stored across a plurality of differential data stores, where selection of the differential data stores for storing respective data objects is according to a criterion relating to compression of the data objects in each of the data stores, and where the differential data stores are stored in persistent storage media. Plural requests for accessing the differential data stores are batched, and one of the differential data stores is selected to page into temporary storage from the persistent storage media. The batched plural requests for accessing the selected differential data store that has been paged into the temporary storage are executed. | 11-04-2010 |
20110252217 | CAPPING A NUMBER OF LOCATIONS REFERRED TO BY CHUNK REFERENCES - As part of a deduplication process, chunks are produced from data. The chunks are assigned to locations in a data store, where the assignments are such that a number of locations referenced is capped according to at least one predefined parameter. | 10-13-2011 |
20120036113 | PERFORMING DEDUPLICATION OF INPUT DATA AT PLURAL LEVELS - Deduplication of input data is performed at a first level, where the deduplication at the first level avoids storing an additional copy of at least one of the chunks in a data store. Additional deduplication of the deduplicated input data is performed, wherein the additional deduplication further reduces duplication. | 02-09-2012 |
20120158674 | INDEXING FOR DEDUPLICATION - Systems and methods of indexing for deduplication are disclosed. An example method includes providing a first table in a first storage and a second table in a second storage. The method also includes looking up a key in the first table. If the key is not found in the first table, the key is looked up in the second table. If the key is found in the second table, the key is copied from the second table to the first table. If the key is not found in the second table either, an entry with the key is inserted in the first table. The method also includes applying an operation to the entry associated with the key in the first table. The method also includes merging data of the first table with data of the second table when the first table is full, producing a new version of the second table that replaces the previous version. | 06-21-2012 |
20120226699 | DEDUPLICATION WHILE REBUILDING INDEXES - Systems and methods of deduplicating while loading index entries are disclosed. An example method includes loading a first group of index entries into an index. The example method also includes deduplicating data using the index before loading the first group of index entries is completed. | 09-06-2012 |
20120317359 | PROCESSING A REQUEST TO RESTORE DEDUPLICATED DATA - For a restore request, at least a portion of a recipe that refers to chunks is read. Based on the recipe portion, a container having plural chunks is retrieved. The recipe portion is used to identify which of the container's chunks to save. Some of the identified chunks need not be communicated to the requester at the time they are identified. The identified chunks are stored in a memory area from which chunks are read for the restore operation. | 12-13-2012 |
20140156607 | INDEX FOR DEDUPLICATION - Techniques for deduplication include an index, a receiver module, and an indexer module. The index can store information about data blocks. The receiver module can receive a data block. The indexer module can check whether information about the data block is in the index. If it is not, the indexer module can make a random decision about whether to store information about the data block in the index. If the decision is to store it, the indexer module stores that information in the index. | 06-05-2014 |
20140214768 | REDUCING BACKUP BANDWIDTH BY REMEMBERING DOWNLOADS - Systems and methods of reducing backup bandwidth by remembering downloads to a computing device are disclosed. An example method may include remembering information for a download to a computing device. The method may also include backing up the computing device to a different system. The information remembered for the download is used to provide a backup of the computing device without copying some of the downloaded data present on the computing device from that device. | 07-31-2014 |
20140215001 | REDUCING BANDWIDTH USAGE OF A MOBILE CLIENT - Systems and methods of reducing bandwidth usage of a mobile client are disclosed. An example method may include caching a first version of network content in a mobile client. The method may also include comparing the cached content with a second version of the network content. The method may also include generating a recipe to construct the second version of the network content from the cached network content based on a result of the comparing. The method may also include sending the recipe to the mobile client. | 07-31-2014 |
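
The chunking abstracts above (20100114980, 20100205163) depend on landmarks, meaning content-defined chunk boundaries. Below is a minimal Python sketch of generic content-defined chunking combined with hash-based deduplication. It is not the method claimed in either application. The rolling-hash parameters (`WINDOW`, `DIVISOR`, `MIN_CHUNK`, `MAX_CHUNK`) and the functions `chunk` and `deduplicate` are hypothetical. The forced cut at `MAX_CHUNK` is a simple stand-in for the specialized handling of landmark-free regions that 20100114980 describes.

```python
import hashlib

# Hypothetical parameters for this sketch; not taken from either application.
WINDOW = 16      # bytes covered by the rolling hash
DIVISOR = 64     # a landmark is a position where hash % DIVISOR == DIVISOR - 1
MIN_CHUNK = 32   # ignore landmarks that would produce very small chunks
MAX_CHUNK = 256  # forced cut in landmark-free regions (simple stand-in)
BASE = 257
MOD = 1 << 32


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined landmarks found by a polynomial rolling hash."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        length = i + 1 - start
        is_landmark = i + 1 >= WINDOW and h % DIVISOR == DIVISOR - 1
        if (is_landmark and length >= MIN_CHUNK) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk after the last boundary
    return chunks


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Store each distinct chunk once, keyed by SHA-256; return the digest list."""
    recipe = []
    for c in chunk(data):
        digest = hashlib.sha256(c).hexdigest()
        store.setdefault(digest, c)  # a repeated chunk costs only a reference
        recipe.append(digest)
    return recipe
```

Boundaries depend only on the bytes inside the window. As a result, an insertion early in the input tends to shift only nearby boundaries, and later chunks still match chunks already in `store`.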
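
Application 20090112945 keeps information on only some chunks in a sparse chunk index. It uses that index to find candidate manifests and prioritize them. The minimal sketch below rests on two assumptions. First, the indexed chunks are chosen by sampling their hashes (`is_hook` with a hypothetical `SAMPLE_RATE`). Second, a manifest's priority is simply the number of sampled input chunks it references. Both choices, and every name in the code, are illustrative and not taken from the application.

```python
import hashlib
from collections import Counter

SAMPLE_RATE = 4  # hypothetical: index roughly 1 in 4 chunk hashes ("hooks")


def chunk_hash(chunk: bytes) -> int:
    return int.from_bytes(hashlib.sha256(chunk).digest()[:8], "big")


def is_hook(h: int) -> bool:
    """Only hashes passing this sampling test get an entry in the sparse index."""
    return h % SAMPLE_RATE == 0


class SparseIndex:
    def __init__(self) -> None:
        self.manifests: dict[str, list[int]] = {}  # manifest id -> chunk hashes
        self.hooks: dict[int, set[str]] = {}       # hook hash -> manifests citing it

    def add_manifest(self, manifest_id: str, chunks: list[bytes]) -> None:
        hashes = [chunk_hash(c) for c in chunks]
        self.manifests[manifest_id] = hashes
        for h in hashes:
            if is_hook(h):
                self.hooks.setdefault(h, set()).add(manifest_id)

    def prioritize(self, input_chunks: list[bytes]) -> list[str]:
        """Rank manifests by how many sampled input chunks they reference."""
        votes = Counter()
        for c in input_chunks:
            h = chunk_hash(c)
            if is_hook(h):
                for manifest_id in self.hooks.get(h, ()):
                    votes[manifest_id] += 1
        # Most hook hits first; ties broken by id for a deterministic order.
        return sorted(votes, key=lambda m: (-votes[m], m))
```

The index holds entries only for sampled hashes, so it stays roughly `SAMPLE_RATE` times smaller than a full chunk index. A manifest that shares many chunks with the input still tends to collect many hits.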
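
The steps in 20120158674 fit naturally into a small class. The method looks up the first table, falls back to the second, then copies or inserts, applies an operation, and merges when the first table is full. The sketch below makes several simplifying assumptions. Both storages are modeled as in-memory dicts, and a reference-count increment serves as the example operation. `TwoTableIndex`, `capacity`, `apply`, and `get` are hypothetical names. A real system would presumably keep the second table on slower, larger storage.

```python
from typing import Any, Callable


class TwoTableIndex:
    """Small first table backed by a larger second table (both dicts in this sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.first: dict[Any, Any] = {}   # stands in for the first storage
        self.second: dict[Any, Any] = {}  # stands in for the second storage

    def apply(self, key: Any, op: Callable[[Any], Any], default: Any) -> Any:
        """Bring key into the first table if needed, then apply op to its entry."""
        if key not in self.first:
            if key in self.second:
                self.first[key] = self.second[key]  # copy from second to first
            else:
                self.first[key] = default           # in neither table: insert
        self.first[key] = op(self.first[key])
        result = self.first[key]
        if len(self.first) >= self.capacity:
            self._merge()
        return result

    def get(self, key: Any, default: Any = None) -> Any:
        if key in self.first:
            return self.first[key]
        return self.second.get(key, default)

    def _merge(self) -> None:
        # Build a new version of the second table, then swap it in for the old one.
        merged = dict(self.second)
        merged.update(self.first)  # first-table entries are the most recent
        self.second = merged
        self.first = {}
```

`_merge` builds a fresh dict and replaces the second table wholesale instead of updating it in place. This follows the abstract's wording: "a new version of the second table that replaces a previous version".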
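
In 20140156607, the indexer makes a random decision about whether to index a block it has not seen before. A minimal sketch follows. It assumes a fixed storage probability and a seeded generator so that results are reproducible. `SampledIndex`, `store_probability`, and the digest-to-location mapping are hypothetical.

```python
import hashlib
import random


class SampledIndex:
    """Index that records an unseen block only when a random draw says so."""

    def __init__(self, store_probability: float, seed: int = 0) -> None:
        self.store_probability = store_probability  # hypothetical tuning knob
        self.rng = random.Random(seed)              # seeded for reproducibility
        self.index: dict[str, int] = {}             # block digest -> location

    def receive(self, block: bytes, location: int) -> bool:
        """Return True if the block is already indexed (a known duplicate)."""
        digest = hashlib.sha256(block).hexdigest()
        if digest in self.index:
            return True
        if self.rng.random() < self.store_probability:
            self.index[digest] = location
        return False
```

When `store_probability` is below 1, the index grows more slowly, but some duplicate blocks go undetected because they were never indexed.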
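
In 20140215001, the mobile client receives a recipe for building the new version of some content from its cached version. A sequence diff is one simple way to produce such a recipe. The sketch below uses Python's standard `difflib.SequenceMatcher` with a hypothetical recipe format of `("copy", start, end)` and `("insert", data)` steps. The format and the function names are assumptions, not the application's.

```python
from difflib import SequenceMatcher


def make_recipe(cached: bytes, new: bytes) -> list[tuple]:
    """Describe `new` as copies from `cached` plus literal inserted bytes."""
    recipe: list[tuple] = []
    matcher = SequenceMatcher(None, cached, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            recipe.append(("copy", i1, i2))        # client already has these bytes
        elif j2 > j1:
            recipe.append(("insert", new[j1:j2]))  # "replace"/"insert": send new bytes
        # "delete" needs no step: those cached bytes are simply not copied
    return recipe


def apply_recipe(cached: bytes, recipe: list[tuple]) -> bytes:
    """Client side: rebuild the new version from the cache and the recipe."""
    out = bytearray()
    for step in recipe:
        if step[0] == "copy":
            _, start, end = step
            out += cached[start:end]
        else:
            out += step[1]
    return bytes(out)
```

Only the inserted bytes are sent in full. When content changes little between versions, the recipe is therefore typically much smaller than the content itself.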