Patent application number | Description | Published |
20100332452 | System and method for providing long-term storage for data - A system for storing files comprises a processor and a memory. The processor is configured to break a file into one or more segments; store the one or more segments in a first storage unit; and add metadata to the first storage unit so that the file can be accessed independent of a second storage unit, wherein a single namespace enables access for files stored in the first storage unit and the second storage unit. The memory is coupled to the processor and configured to provide the processor with instructions | 12-30-2010 |
20110016083 | SEEDING REPLICATION - Seeding replication is disclosed. One or more but not all files stored on a deduplicated storage system are selected to be replicated. One or more segments referred to by the selected one or more but not all files are determined. A data structure is created that is used to indicate that at least the one or more segments are to be replicated. In the event that an indication based at least in part on the data structure indicates that a candidate segment stored on the deduplicating storage system is to be replicated, the candidate segment is replicated. | 01-20-2011 |
20110071980 | PERFORMANCE IMPROVEMENT OF A CAPACITY OPTIMIZED STORAGE SYSTEM INCLUDING A DETERMINER - A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether a requested data is stored in the performance storage unit. The determiner determines whether the requested data is stored in the performance segment storage unit in the event that the requested data is not stored in the performance storage unit. | 03-24-2011 |
20110072226 | SNAPSHOTTING OF A PERFORMANCE STORAGE SYSTEM IN A SYSTEM FOR PERFORMANCE IMPROVEMENT OF A CAPACITY OPTIMIZED STORAGE SYSTEM - A system for storing data comprises a performance storage system for storing one or more data items. A data item of the one or more data items comprises a data file or a data block. The system further comprises a segment storage system for storing a snapshot of a stored data item of the one or more data items in the performance storage system. The taking of the snapshot of the stored data item enables recall of the stored data item as stored at the time of the snapshot. At least one newly written segment is stored as a reference to a previously stored segment. | 03-24-2011 |
20110072227 | PERFORMANCE IMPROVEMENT OF A CAPACITY OPTIMIZED STORAGE SYSTEM USING A PERFORMANCE SEGMENT STORAGE SYSTEM AND A SEGMENT STORAGE SYSTEM - A system for storing data comprises a performance storage unit for storing a data stream or a data block in. The data stream or the data block comprises one or more data items. The system further comprises a segment storage system for automatically storing a stored data item of the one or more data items as a set of segments. The system further comprises a performance segment storage unit for storing the set of segments in the event that the stored data item has been stored using the segment storage system. | 03-24-2011 |
20110270887 | CLUSTER STORAGE USING SUBSEGMENTING FOR EFFICIENT STORAGE - Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment smaller than the segment is identified that is a duplicate of a portion of a segment already managed by the cluster node. | 11-03-2011 |
20110302326 | PARTITIONING A DATA STREAM USING EMBEDDED ANCHORS - Selecting a segment boundary within block b is disclosed. A first anchor location j|j+1 is identified wherein a value of f(b[j−A+1 . . . j+B]) satisfies a constraint and wherein A and B are non-negative integers. A segment boundary location k|k+1 is determined wherein k is greater than minimum distance from j. | 12-08-2011 |
20120084333 | TRANSMITTING FILESYSTEM CHANGES OVER A NETWORK - Transmitting filesystem changes over a network is disclosed. A hash of data comprising a chunk of directory elements comprising one or more consecutive directory elements in a set of elements sorted in a canonical order is computed at a client system. One or more directory elements comprising the chunk are sent to a remote server in the event it is determined based at least in part on the computed hash that corresponding directory elements as stored on the remote server are not identical to the directory elements comprising the chunk as stored on the client system. | 04-05-2012 |
20120209820 | GARBAGE COLLECTION FOR MERGED COLLECTIONS - A method of identifying nonreferenced memory elements in a storage system is disclosed. A plurality of lists of referenced elements from a plurality of storage subsystems is input. A union of the lists of referenced elements is compiled. The union of the lists of referenced memory elements is compared to a list of previously referenced memory elements to determine previously referenced elements that are no longer referenced. The previously referenced elements that are no longer referenced is output. | 08-16-2012 |
20120226961 | EFFICIENT REDUNDANT MEMORY UNIT ARRAY - A method of storing data is disclosed. A set of data blocks, including a plurality of proper subsets of data blocks, is stored. A plurality of first-level parity blocks is generated, wherein each first-level parity block is generated from a corresponding proper subset of data blocks within the plurality of proper subsets of data blocks without reference to other data blocks not in the corresponding proper subset. A second-level parity block is generated, wherein the second level parity block is generated from a plurality of data blocks included in at least two of the plurality of proper subsets of data blocks, and wherein recovery of a lost block in a given proper subset of data blocks is possible without reference to any data blocks not in the given proper subset. | 09-06-2012 |
20120317381 | EFFICIENT DATA STORAGE SYSTEM - A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system preliminarily checks in a memory having a relatively low latency whether one of the plurality of data segments may have been stored previously in a data segment repository. The memory having the relatively low latency stores data segment information. In the event that the preliminary check determines that one of the plurality of data segments may have been stored in the data segment repository, a memory having a relatively higher latency is checked to determine whether the data segment has been stored previously in the data segment repository. | 12-13-2012 |
20130304969 | PERFORMANCE IMPROVEMENT OF A CAPACITY OPTIMIZED STORAGE SYSTEM INCLUDING A DETERMINER - A system for storing data comprises a performance storage unit and a performance segment storage unit. The system further comprises a determiner. The determiner determines whether a requested data is stored in the performance storage unit. The determiner determines whether the requested data is stored in the performance segment storage unit in the event that the requested data is not stored in the performance storage unit. | 11-14-2013 |
20140040192 | SEEDING REPLICATION - Seeding replication is disclosed. One or more but not all files stored on a deduplicated storage system are selected to be replicated. One or more segments referred to by the selected one or more but not all files are determined. A data structure is created that is used to indicate that at least the one or more segments are to be replicated. In the event that an indication based at least in part on the data structure indicates that a candidate segment stored on the deduplicating storage system is to be replicated, the candidate segment is replicated. | 02-06-2014 |
20140129790 | EFFICIENT DATA STORAGE SYSTEM - A system and method are disclosed for providing efficient data storage. A plurality of data segments is received in a data stream. The system preliminarily checks in a memory having a relatively low latency whether one of the plurality of data segments may have been stored previously in a data segment repository. The memory having the relatively low latency stores data segment information. In the event that the preliminary check determines that one of the plurality of data segments may have been stored in the data segment repository, a memory having a relatively higher latency is checked to determine whether the data segment has been stored previously in the data segment repository. | 05-08-2014 |
20140181399 | SYSTEM AND METHOD FOR PROVIDING LONG-TERM STORAGE FOR DATA - A system for storing files comprises a processor and a memory. The processor is configured to break a file into one or more segments; store the one or more segments in a first storage unit; and add metadata to the first storage unit so that the file can be accessed independent of a second storage unit, wherein a single namespace enables access for files stored in the first storage unit and the second storage unit. The memory is coupled to the processor and configured to provide the processor with instructions | 06-26-2014 |
20140201430 | SNAPSHOTTING OF A PERFORMANCE STORAGE SYSTEM IN A SYSTEM FOR PERFORMANCE IMPROVEMENT OF A CAPACITY OPTIMIZED STORAGE SYSTEM - A system for storing data comprises a performance storage system for storing one or more data items. A data item of the one or more data items comprises a data file or a data block. The system further comprises a segment storage system for storing a snapshot of a stored data item of the one or more data items in the performance storage system. The taking of the snapshot of the stored data item enables recall of the stored data item as stored at a time of the snapshot. At least one newly stored segment is stored as a reference to a previously stored segment. | 07-17-2014 |
20140244691 | CLUSTER STORAGE USING SUBSEGMENTING FOR EFFICIENT STORAGE - Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment smaller than the segment is identified that is a duplicate of a portion of a segment already managed by the cluster node. | 08-28-2014 |
20140317063 | SYNCHRONIZATION OF STORAGE USING COMPARISONS OF FINGERPRINTS OF BLOCKS - A system for processing data comprises a deduplicating system, an interface, and a processor. The deduplicating system stores a copy of data stored in a data storage system by storing a set of segments that is able to reconstruct the data stored in the data storage system. The interface receives an indication to revert data stored in the data storage system to a state of data at a snapshot time stored in the deduplicating system. The processor is configured to determine a subset of the data stored in the data storage system that has changed between the data stored in the data storage system and the state of data at the snapshot time stored in the deduplicating system using a first list of fingerprints associated with the data stored on the data storage system and a second list of fingerprints associated with the state of data at the snapshot time stored in the deduplicating system. | 10-23-2014 |
20140324796 | STATE-BASED DIRECTING OF SEGMENTS IN A MULTINODE DEDUPLICATED STORAGE SYSTEM - A system for directing for storage comprises a processor and a memory. The processor is configured to determine a segment overlap for each of a plurality of nodes. The processor is further configured to determine a selected node of the plurality of nodes based at least in part on the segment overlap for each of the plurality of nodes and based at least in part on a selection criteria. The memory is coupled to the processor and configured to provide the processor with instructions. | 10-30-2014 |
20140337363 | SUBSEGMENTING FOR EFFICIENT STORAGE, RESEMBLANCE DETERMINATION, AND TRANSMISSION - Transmitting or storing subsegments is disclosed. A data stream or a data block is received and broken into a plurality of segments. For at least one segment, the segment is broken into a plurality of subsegments. A previously stored or transmitted segment similar to the at least one segment is identified. A fingerprint is computed for at least one subsegment. And, using the fingerprint for the at least one subsegment, determining whether the at least one subsegment is identical to a subsegment of the previously stored or transmitted segment without directly comparing the content of the at least one subsegment with the content of the subsegment of the previously stored or transmitted segment. | 11-13-2014 |
Patent application number | Description | Published |
20080256143 | Cluster storage using subsegmenting - Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment smaller than the segment is identified that is a duplicate of a portion of a segment already managed by the cluster node. | 10-16-2008 |
20080256326 | Subsegmenting for efficient storage, resemblance determination, and transmission - Transmitting or storing subsegments is disclosed. A data stream or a data block is received and broken into a plurality of segments. For at least one segment, the segment is broken into a plurality of subsegments. A previously stored or transmitted segment similar to the at least one segment is identified. A fingerprint is computed for at least one subsegment. And, using the fingerprint for the at least one subsegment, determining whether the at least one subsegment is identical to a subsegment of the previously stored or transmitted segment without directly comparing the content of the at leas one subsegment with the content of the subsegment of the previously stored or transmitted segment. | 10-16-2008 |
20080263109 | Seeding replication - Seeding replication is disclosed. One or more but not all files stored on a deduplicated storage system are selected to be replicated. One or more segments referred to by the selected one or more but not all files are determined. A data structure is created that is used to indicate that at least the one or more segments are to be replicated. In the event that an indication based at least in part on the data structure indicates that a candidate segment stored on the deduplicating storage system is to be replicated, the candidate segment is replicated. | 10-23-2008 |
20080270729 | Cluster storage using subsegmenting - Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and a portion of the segment smaller than the segment is identified that is a duplicate of a portion of a segment already managed by the cluster node. | 10-30-2008 |
20080294660 | Cluster storage using delta compression - Cluster storage is disclosed. A data stream or a data block is received. The data stream or the data block is broken into segments. For each segment, a cluster node is selected, and in the event that a similar segment to the segment is identified that is already managed by the selected cluster node, a reference to the similar segment and a delta between the similar segment and the segment is caused to be stored on the selected cluster node. | 11-27-2008 |
20110196869 | CLUSTER STORAGE USING DELTA COMPRESSION - Storage of data segments is disclosed. For each segment, a similar segment to the segment is identified, wherein the similar segment is already managed by a cluster node. In the event the similar segment is identified, a reference to the similar segment and a delta between the similar segment and the segment are caused to be stored instead of the segment. | 08-11-2011 |
20110307530 | INCREMENTAL GARBAGE COLLECTION OF DATA IN A SECONDARY STORAGE - A method and apparatus for different embodiments of incremental garbage collection of data in a secondary storage. In one embodiment, a method comprises locating blocks of data in a log that are referenced and within a range at a tail of the log. The method also includes copying the blocks of data that are referenced and within the range to an unallocated segment of the log. | 12-15-2011 |
20120041957 | EFFICIENTLY INDEXING AND SEARCHING SIMILAR DATA - Techniques for efficiently indexing and searching similar data are described herein. According to one embodiment, in response to a query for one or more terms received from a client, a query index is accessed to retrieve a list of one or more super files. Each super file is associated with a group of similar files. Each super file includes terms and/or sequences of terms obtained from the associated group of similar files. Thereafter, the super files representing groups of similar files are presented to the client, where each of the super files includes at least one of the queried terms. Other methods and apparatuses are also described. | 02-16-2012 |