Patent application number | Description | Published |
20090043967 | Caching of information according to popularity - A system includes logic to cache at least one block in at least one cache if the block has a popularity that compares favorably to the popularity of other blocks in the cache, where the popularity of the block is determined by reads of the block from persistent storage and reads of the block from the cache. | 02-12-2009 |
20120124012 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES AND BY INGESTING DIFFERENCE DATA - Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments, whereby a deduplicated image of the data object for a second temporal state is stored without requiring reception of a complete image of the data object for the second temporal state. | 05-17-2012 |
20120124013 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA STORING NON-LOSSY ENCODINGS OF DATA DIRECTLY IN A CONTENT ADDRESSABLE STORE - Systems and methods are disclosed for storing deduplicated images in which a portion of the image is stored in encoded form directly in a hash table, the method comprising: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; receiving content to be included in the deduplicated image of the data object; determining if the received content may be encoded using a predefined non-lossy encoding technique and in which the encoded value would fit within the field for containing a hash signature; if so, placing the encoding in the field and marking the hash structure to indicate that the field contains encoded content; otherwise, generating a hash signature for the received content and placing the hash signature in the field and placing the received content in a corresponding content segment if it is unique. | 05-17-2012 |
20120124014 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY SENDING DIFFERENCE DATA BETWEEN NEAR-NEIGHBOR TEMPORAL STATES - Systems and methods are disclosed for using a first deduplicating store to update a second deduplicating store with information representing how data objects change over time, said method comprising: at a first and a second deduplicating store, for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object and wherein the logical arrangement of structures is indicative of the changing temporal states of the data object; finding a temporal state that is common to and in temporal proximity to the current state of the first and second deduplicating stores; and compiling and sending a set of hash signatures for the content that has changed from the common state to the current temporal state of the first deduplicating store. | 05-17-2012 |
20120124046 | SYSTEM AND METHOD FOR MANAGING DEDUPLICATED COPIES OF DATA USING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for managing deduplicated images of data objects that change over time. The method includes: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; for each data object, creating an organized arrangement of hash structures, wherein each structure, for a subset of the hash structures, includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object, and wherein each temporal state is associated with the hash structures representing the content of the data object during that temporal state. | 05-17-2012 |
20120124105 | SYSTEM AND METHOD FOR IMPROVED GARBAGE COLLECTION OPERATIONS IN A DEDUPLICATED STORE BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for performing garbage collection to identify content segments no longer referenced in a deduplicating storage system in which redundant mark operations in a mark-and-sweep technique are avoided. An organized arrangement of hash structures is created for each data object, wherein each structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object. Additionally, for each data object, temporal states are maintained over time. Garbage collection iterates over the temporal structures and, for each temporal structure, marks the garbage collection state for the associated content segments for only the content segments that have changed relative to an immediately prior temporal state of the data object. | 05-17-2012 |
20120124306 | SYSTEM AND METHOD FOR PERFORMING BACKUP OR RESTORE OPERATIONS UTILIZING DIFFERENCE INFORMATION AND TIMELINE STATE INFORMATION - Systems and methods for backing-up data from a first storage pool to a second storage pool using difference information between time states are disclosed. The system has a data management engine for performing data management functions, including at least a back-up function to create a back-up copy of data. By executing a sequence of snapshot operations to create point-in-time images of application data on a first storage pool, each successive point-in-time image corresponding to a specific, successive time-state of the application data, a series of snapshots is created. The snapshots are then used to create difference information indicating which application data has changed and the content of the changed application data for the corresponding time state. This difference information is then sent to a second storage pool to create a back-up copy of data for the current time-state. | 05-17-2012 |
20130036091 | INCREMENTAL COPY PERFORMANCE BETWEEN DATA STORES - Systems and methods are disclosed for copying a data object to a target storage pool using a hybrid of storage pools, in which at least one of the storage pools is particularly efficient at identifying data that should be used for copying the data object to the target storage pool, and at least one of the storage pools is particularly efficient at retrieving the data that should be sent to the target storage pool. The system comprises a performance storage pool for storing data and having relatively high performance for retrieving stored data; a deduplicating storage pool for storing deduplicated data and storing metadata about data objects in the system and which has relatively high performance for identifying and specifying differences in a data object over time; and a controller for causing the performance storage pool to retrieve differences and provide the data to the target storage pool. | 02-07-2013 |
20130036097 | DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for efficiently creating a data fingerprint to identify or characterize contents of a data object by using a selection function to select a plurality of non-contiguous regions from the data object, the selected regions each having a small number of bytes relative to the number of bytes in the data object and being distributed throughout the data object so that the selected regions comprise a sparse subset of the data of the data object yet provide a significant probability of including bytes that change if the data object were modified; and performing a hash operation on the data to produce a fingerprint based on the sparse subset of the data object. The data fingerprint thereby efficiently provides an indication of the contents of the data object, so that comparing data fingerprints can determine if the data objects are different if the corresponding fingerprints are different. | 02-07-2013 |
20130036098 | SUCCESSIVE DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for checking the data integrity of a data object copied between storage pools by comparing data fingerprints of data objects, comprising scheduling a series of successive copy operations over time for copying a data object from a source data store to a target data store; generating a partial fingerprint of the data object at the source data store that creates a fingerprint from a subset of the data object; sending the partial fingerprint of the data object to the target data store; sending any new data contents to the target data store; and creating a partial fingerprint of the data object at the target data store and comparing it to the received partial fingerprint to determine if they differ, thereby allowing incremental verification that the copy of the data object at the target data store is the same as at the source data store. | 02-07-2013 |
20130042083 | Data Replication System - Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data. | 02-14-2013 |
20130226884 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY SENDING DIFFERENCE DATA BETWEEN NEAR-NEIGHBOR TEMPORAL STATES - Systems and methods are disclosed for using a first deduplicating store to update a second deduplicating store with information representing how data objects change over time, said method including: at a first and a second deduplicating store, for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object and wherein the logical arrangement of structures is indicative of the changing temporal states of the data object; finding a temporal state that is common to and in temporal proximity to the current state of the first and second deduplicating stores; and compiling and sending a set of hash signatures for the content that has changed from the common state to the current temporal state of the first deduplicating store. | 08-29-2013 |
20130318053 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES USING HIGHER-LEVEL HASH STRUCTURES - Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments. The method also includes determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store. | 11-28-2013 |
20130339319 | SYSTEM AND METHOD FOR CACHING HASHES FOR CO-LOCATED DATA IN A DEDUPLICATION DATA STORE - Systems and methods are provided for caching hashes for deduplicated data. A request to read data from the deduplication data store is received. A persist header stored in a deduplication data store is identified in a first hash structure that is not stored in memory of the computing device. The persist header comprises a set of hashes that includes a hash that is indicative of the data the computing device requested to read. Each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes. The set of hashes is cached in a second hash structure stored in the memory, whereby the computing device can identify the additional data using the second hash structure if the additional data is represented by the persist header. | 12-19-2013 |
20140344216 | Garbage collection predictions - Described herein are systems and methods for garbage collection prediction. A temporal graph is received, the temporal graph including nodes, the nodes including hash references to objects. An accumulated difference count is updated when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph. A divested difference count is updated when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node. The outcome of the garbage collection is predicted based on at least one of the accumulated difference count and the divested difference count. | 11-20-2014 |
20140351214 | Efficient data replication - Described herein are systems and methods for efficient data replication. A set of hashes for a source object to be replicated is sent from the source local deduplication store to the remote server. The remote server generates a set of object hashes representative of data in the source object that is already present on the remote server, and data indicative of source object hashes that are not present on the remote server. The remote server transmits the generated data to the source local deduplication store. The source local deduplication store identifies portions of the source object that are not already present on the remote server based on the received data. The source local deduplication store transmits the identified portions of the source object to the remote server to replicate the source object on the remote server. | 11-27-2014 |
20150019556 | SYSTEM AND METHOD FOR MANAGING DEDUPLICATED COPIES OF DATA USING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for managing deduplicated images of data objects that change over time. The method includes: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; for each data object, creating an organized arrangement of hash structures, wherein each structure, for a subset of the hash structures, includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object, and wherein each temporal state is associated with the hash structures representing the content of the data object during that temporal state. | 01-15-2015 |
20150106580 | SYSTEM AND METHOD FOR PERFORMING BACKUP OR RESTORE OPERATIONS UTILIZING DIFFERENCE INFORMATION AND TIMELINE STATE INFORMATION - Systems and methods for backing-up data from a first storage pool to a second storage pool using difference information between time states are disclosed. The system has a data management engine for performing data management functions, including at least a back-up function to create a back-up copy of data. By executing a sequence of snapshot operations to create point-in-time images of application data on a first storage pool, each successive point-in-time image corresponding to a specific, successive time-state of the application data, a series of snapshots is created. The snapshots are then used to create difference information indicating which application data has changed and the content of the changed application data for the corresponding time state. This difference information is then sent to a second storage pool to create a back-up copy of data for the current time-state. | 04-16-2015 |
20150142739 | DATA REPLICATION SYSTEM - Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data. | 05-21-2015 |
20150143063 | SUCCESSIVE DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for checking data integrity of a data object copied between storage pools in a storage system by comparing data samples copied from data objects. A series of successive copy operations is scheduled over time for copying a data object from a source data store to a target data store. A first data sample is generated based on a sampling scheme comprising an offset and a period. A second data sample is generated using a similar sampling scheme. The blocks of data in the first data sample and the second data sample are compared to determine if they differ to thereby indicate that the data object at the target store differs from the corresponding data object at the source data store. | 05-21-2015 |
20150161159 | SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES USING HIGHER-LEVEL HASH STRUCTURES - Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments. The method also includes determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store. | 06-11-2015 |
20150161194 | SYSTEM AND METHOD FOR RAPID ESTIMATION OF DATA SIMILARITY - Systems and methods for estimating data similarity between an inserted volume of data and a stored volume of data during file backup of a deduplicated data store when the ancestry of the inserted data to previously-stored data is unknown to identify an ancestor of the inserted volume of data in the stored volume so that only incremental data of the inserted volume is stored, the systems and methods comprising ingesting a volume of data, creating a subset of bits for the ingested volume using a filtering process, creating a subset of bits for each volume of stored data using the filtering process, comparing the subset of bits for the ingested volume with the subset of bits for each of the stored volumes, and determining the subset of bits for a stored volume with the most bits in common with the subset of bits for the ingested volume. | 06-11-2015 |
20150178347 | SUCCESSIVE DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for checking the data integrity of a data object copied between storage pools in a storage system by comparing data fingerprints of data objects, by scheduling a series of successive copy operations over time for copying a data object from a source data store to a target data store; generating a partial fingerprint of the data object at the source data store using a data fingerprinting operation that creates a fingerprint from a subset of data of the data object; sending the partial fingerprint of the data object to the target data store; sending any new data contents for the data object to the target data store; and creating a partial fingerprint of the data object at the target data store and comparing it to the partial fingerprint sent to the target data store to determine if they differ. | 06-25-2015 |
20150227600 | VIRTUAL DATA BACKUP - Techniques are disclosed for creating, in a network, a single instance of deduplicated data across a plurality of end user data. A first computing device receives data associated with a plurality of computing devices, the plurality of computing devices being managed by the first computing device. The first computing device aggregates and deduplicates the data associated with each of the plurality of computing devices. The deduplicated aggregated data set is then transmitted to a second computing device for further aggregation and deduplication with one or more additional aggregated data sets generated by other computing devices managing respective sets of computing devices. | 08-13-2015 |
20150227601 | VIRTUAL DATA BACKUP - Techniques are disclosed for remotely backing up data associated with a plurality of storage environments. A first computing device receives a storage type associated with a second computing device managed by the first computing device. Storage parameters are configured based on the storage type to customize a backup process for the second computing device based on the storage type. Data associated with the second computing device is protected using the storage parameters, wherein protecting data associated with the second computing device further includes copying at a first point in time a full copy of data associated with the second computing device, and copying changes to the data associated with the second computing device at a set of points in time later than the first point in time, the set of points in time being based on an end-user policy. | 08-13-2015 |
20150227602 | VIRTUAL DATA BACKUP - Techniques are disclosed for providing content data storage services to a remote device over the internet to enable access of the remote device in the cloud. A content data storage device receives data indicative of a subscription to content data storage services from a remote device. The content data storage device provisions cloud storage to provide the content data storage services subscribed to by the remote device. Data associated with the remote device is replicated to the provisioned cloud storage to provide a replicated device in the cloud. Data indicative of a request to use the replicated device in the cloud is received. The content data storage device executes the replicated device in the cloud, thereby providing access of the remote device in the cloud for the remote device. | 08-13-2015 |
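
As a rough illustration of the sparse fingerprinting idea summarized in the abstract for application 20130036097 above (hashing a small, non-contiguous subset of a data object rather than the whole object), the following Python sketch may help. It is not the patented method. The function name `sparse_fingerprint`, the fixed-stride selection function, the parameters `region_size` and `stride`, the use of SHA-256, and the choice to mix the object length into the hash are all assumptions made for this example.

```python
import hashlib


def sparse_fingerprint(data: bytes, region_size: int = 16, stride: int = 4096) -> str:
    """Return a hex digest computed from a sparse subset of `data`.

    Hypothetical selection function: take `region_size` bytes at every
    `stride`-th offset, so the sampled regions are small, non-contiguous,
    and spread across the whole object.
    """
    if region_size <= 0 or stride < region_size:
        raise ValueError("require 0 < region_size <= stride")

    digest = hashlib.sha256()
    # Assumption for this sketch (not stated in the abstract): include the
    # object length so that truncation or growth changes the fingerprint.
    digest.update(len(data).to_bytes(8, "big"))

    # Hash only the selected regions, never the full object.
    for offset in range(0, len(data), stride):
        digest.update(data[offset:offset + region_size])
    return digest.hexdigest()


if __name__ == "__main__":
    original = bytes(range(256)) * 64          # 16 KiB test object
    changed = bytearray(original)
    changed[4100] ^= 0xFF                      # falls inside a sampled region
    print(sparse_fingerprint(original) != sparse_fingerprint(bytes(changed)))
```

In this sketch, differing fingerprints show that two objects differ, but matching fingerprints do not prove that they are identical: a change that falls entirely outside the sampled regions goes undetected. That matches the abstract's wording that comparing fingerprints determines whether data objects are different when the fingerprints differ.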