Christopher A. Provenzano, Somerville, MA US

Patent application number	Description	Published
20090043967	Caching of information according to popularity - A system includes logic to cache at least one block in at least one cache if the block has a popularity that compares favorably to the popularity of other blocks in the cache, where the popularity of the block is determined by reads of the block from persistent storage and reads of the block from the cache.	02-12-2009
20120124012	SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES AND BY INGESTING DIFFERENCE DATA - Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments, whereby a deduplicated image of the data object for a second temporal state is stored without requiring reception of a complete image of the data object for the second temporal state.	05-17-2012
20120124013	SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA STORING NON-LOSSY ENCODINGS OF DATA DIRECTLY IN A CONTENT ADDRESSABLE STORE - Systems and methods are disclosed for storing deduplicated images in which a portion of the image is stored in encoded form directly in a hash table, the method comprising: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; receiving content to be included in the deduplicated image of the data object; determining if the received content may be encoded using a predefined non-lossy encoding technique and in which the encoded value would fit within the field for containing a hash signature; if so, placing the encoding in the field and marking the hash structure to indicate that the field contains encoded content; otherwise, generating a hash signature for the received content and placing the hash signature in the field and placing the received content in a corresponding content segment if it is unique.	05-17-2012
20120124014	SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY SENDING DIFFERENCE DATA BETWEEN NEAR-NEIGHBOR TEMPORAL STATES - Systems and methods are disclosed for using a first deduplicating store to update a second deduplicating store with information representing how data objects change over time, said method comprising: at a first and a second deduplicating store, for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object and wherein the logical arrangement of structures is indicative of the changing temporal states of the data object; finding a temporal state that is common to and in temporal proximity to the current state of the first and second deduplicating stores; and compiling and sending a set of hash signatures for the content that has changed from the common state to the current temporal state of the first deduplicating store.	05-17-2012
20120124046	SYSTEM AND METHOD FOR MANAGING DEDUPLICATED COPIES OF DATA USING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for managing deduplicated images of data objects that change over time. The method includes: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; for each data object, creating an organized arrangement of hash structures, wherein each structure, for a subset of the hash structures, includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object, and wherein each temporal state is associated with the hash structures representing the content of the data object during that temporal state.	05-17-2012
20120124105	SYSTEM AND METHOD FOR IMPROVED GARBAGE COLLECTION OPERATIONS IN A DEDUPLICATED STORE BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for performing garbage collection to identify content segments no longer referenced in a deduplicating storage system in which redundant mark operations in a mark-and-sweep technique are avoided. An organized arrangement of hash structures is created for each data object, wherein each structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object. Additionally, for each data object, temporal states are maintained over time. Garbage collection iterates over the temporal structures and, for each temporal structure, marks the garbage collection state for the associated content segments for only the content segments that have changed relative to an immediately prior temporal state of the data object.	05-17-2012
20120124306	SYSTEM AND METHOD FOR PERFORMING BACKUP OR RESTORE OPERATIONS UTILIZING DIFFERENCE INFORMATION AND TIMELINE STATE INFORMATION - Systems and methods for backing-up data from a first storage pool to a second storage pool using difference information between time states are disclosed. The system has a data management engine for performing data management functions, including at least a back-up function to create a back-up copy of data. By executing a sequence of snapshot operations to create point-in-time images of application data on a first storage pool, each successive point-in-time image corresponding to a specific, successive time-state of the application data, a series of snapshots is created. The snapshots are then used to create difference information indicating which application data has changed and the content of the changed application data for the corresponding time state. This difference information is then sent to a second storage pool to create a back-up copy of data for the current time-state.	05-17-2012
20130036091	INCREMENTAL COPY PERFORMANCE BETWEEN DATA STORES - Systems and methods are disclosed for copying a data object to a target storage pool using a hybrid of storage pools, in which at least one of the storage pools is particularly efficient at identifying data that should be used for copying the data object to the target storage pool, and at least one of the storage pools is particularly efficient at retrieving the data that should be sent to the target storage pool. The system comprises a performance storage pool for storing data and having relatively high performance for retrieving stored data; a deduplicating storage pool for storing deduplicated data and storing metadata about data objects in the system and which has relatively high performance for identifying and specifying differences in a data object over time; and a controller for causing the performance storage pool to retrieve differences and provide the data to the target storage pool.	02-07-2013
20130036097	DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for efficiently creating a data fingerprint to identify or characterize contents of a data object by using a selection function to select a plurality of non-contiguous regions from the data object, the selected regions each having a small number of bytes relative to the number of bytes in the data object and being distributed throughout the data object so that the selected regions comprise a sparse subset of the data of the data object yet provide a significant probability of including bytes that change if the data object were modified; and performing a hash operation on the data to produce a fingerprint based on the sparse subset of the data object. The data fingerprint thereby efficiently provides an indication of the contents of the data object, so that comparing data fingerprints can determine if the data objects are different if the corresponding fingerprints are different.	02-07-2013
20130036098	SUCCESSIVE DATA FINGERPRINTING FOR COPY ACCURACY ASSURANCE - Systems and methods are disclosed for checking the data integrity of a data object copied between storage pools by comparing data fingerprints of data objects, comprising scheduling a series of successive copy operations over time for copying a data object from a source data store to a target data store; generating a partial fingerprint of the data object at the source data store that creates a fingerprint from a subset of the data object; sending the partial fingerprint of the data object to the target data store; sending any new data contents to the target data store; and creating a partial fingerprint of the data object at the target data store and comparing it to the received partial fingerprint to determine if they differ, thereby allowing incremental verification that the copy of the data object at the target data store is the same as at the source data store.	02-07-2013
20130042083	Data Replication System - Systems and methods are provided for an asynchronous data replication system in which the remote replication reduces bandwidth requirements by copying deduplicated differences in business data from a local storage site to a remote, backup storage site, the system comprising: a local performance storage pool for storing data; a local deduplicating storage pool for storing deduplicated data, said local deduplicating storage pool further storing metadata about data objects in the system and which has metadata analysis logic for identifying and specifying differences in a data object over time; a remote performance storage pool for storing a copy of said data, available for immediate use as a backup copy of said data to provide business continuity to said data; a remote deduplicating storage pool for storing deduplicated data; and a controller for synchronizing the remote performance storage pool to have the second version of the data object using deduplicated data.	02-14-2013
20130226884	SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY SENDING DIFFERENCE DATA BETWEEN NEAR-NEIGHBOR TEMPORAL STATES - Systems and methods are disclosed for using a first deduplicating store to update a second deduplicating store with information representing how data objects change over time, said method including: at a first and a second deduplicating store, for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object and wherein the logical arrangement of structures is indicative of the changing temporal states of the data object; finding a temporal state that is common to and in temporal proximity to the current state of the first and second deduplicating stores; and compiling and sending a set of hash signatures for the content that has changed from the common state to the current temporal state of the first deduplicating store.	08-29-2013
20130318053	SYSTEM AND METHOD FOR CREATING DEDUPLICATED COPIES OF DATA BY TRACKING TEMPORAL RELATIONSHIPS AMONG COPIES USING HIGHER-LEVEL HASH STRUCTURES - Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments. The method also includes determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store.	11-28-2013
20130339319	SYSTEM AND METHOD FOR CACHING HASHES FOR CO-LOCATED DATA IN A DEDUPLICATION DATA STORE - Systems and methods are provided for caching hashes for deduplicated data. A request to read data from the deduplication data store is received. A persist header stored in a deduplication data store is identified in a first hash structure that is not stored in memory of the computing device. The persist header comprises a set of hashes that includes a hash that is indicative of the data the computing device requested to read. Each hash in the set of hashes represents data stored in the deduplication data store after the persist header that is co-located with other data represented by the remaining hashes in the set of hashes. The set of hashes is cached in a second hash structure stored in the memory, whereby the computing device can identify the additional data using the second hash structure if the additional data is represented by the persist header.	12-19-2013
20140344216	Garbage collection predictions - Described herein are systems and methods for garbage collection prediction. A temporal graph is received, the temporal graph including nodes, the nodes including hash references to objects. An accumulated difference count is updated when a node is added to the temporal graph, the accumulated difference count including a number of hash differences between a parent node and its children nodes in the temporal graph. A divested difference count is updated when a node is removed from the temporal graph, the divested difference count including a number of hash differences referenced by the removed node but not by either a parent node of the removed node or any child nodes of the removed node. The outcome of the garbage collection is predicted based on at least one of the accumulated difference count and the divested difference count.	11-20-2014
20140351214	Efficient data replication - Described herein are systems and methods for efficient data replication. A set of hashes for a source object to be replicated is sent from the source local deduplication store to the remote server. The remote server generates a set of object hashes representative of data in the source object that is already present on the remote server, and data indicative of source object hashes that are not present on the remote server. The remote server transmits the generated data to the source local deduplication store. The source local deduplication store identifies portions of the source object that are not already present on the remote server based on the received data. The source local deduplication store transmits the identified portions of the source object to the remote server to replicate the source object on the remote server.	11-27-2014
20150019556	SYSTEM AND METHOD FOR MANAGING DEDUPLICATED COPIES OF DATA USING TEMPORAL RELATIONSHIPS AMONG COPIES - Systems and methods are disclosed for managing deduplicated images of data objects that change over time. The method includes: organizing unique content of each data object as a plurality of content segments and storing the content segments in a data store; for each data object, creating an organized arrangement of hash structures, wherein each structure, for a subset of the hash structures, includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and for each data object, maintaining an organized arrangement of temporal structures to represent a corresponding data object over time, wherein each structure is associated with a temporal state of the data object, and wherein each temporal state is associated with the hash structures representing the content of the data object during that temporal state.	01-15-2015
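
The sparse fingerprinting idea summarized in application 20130036097 can be illustrated with a short sketch. This is not the method claimed in the application: the function name sparse_fingerprint, the evenly spaced selection function, the region count and size, and the choice of SHA-256 are all assumptions made for illustration. It hashes a small set of regions spread through the object, so that differing fingerprints indicate differing objects, while matching fingerprints only suggest (and do not prove) that the objects are the same.

```python
import hashlib


def sparse_fingerprint(data: bytes, regions: int = 16, region_size: int = 8) -> str:
    """Hash a sparse, evenly spaced subset of `data`.

    Hypothetical selection function: `regions` windows of `region_size`
    bytes, spread from the start to the end of the object.
    """
    h = hashlib.sha256()
    # Include the length so that truncation or extension changes the result.
    h.update(len(data).to_bytes(8, "big"))
    if len(data) <= regions * region_size:
        # Small object: hashing everything is cheap enough.
        h.update(data)
        return h.hexdigest()
    # Distance between region starts; the last region ends inside the object.
    stride = (len(data) - region_size) // (regions - 1)
    for i in range(regions):
        start = i * stride
        h.update(data[start:start + region_size])
    return h.hexdigest()


original = bytes(1000)

# A change inside a sampled region is detected.
changed_in_region = bytearray(original)
changed_in_region[0] = 1

# A change between sampled regions is missed by this sparse sample.
changed_outside = bytearray(original)
changed_outside[50] = 1

print(sparse_fingerprint(original) != sparse_fingerprint(bytes(changed_in_region)))
print(sparse_fingerprint(original) == sparse_fingerprint(bytes(changed_outside)))
```

The second comparison shows the trade-off in any sparse sample: a modification that falls entirely between the selected regions leaves the fingerprint unchanged, which is why equal fingerprints are only probabilistic evidence of equal contents.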