Patent application number | Description | Published |
20100161554 | ASYNCHRONOUS DISTRIBUTED DE-DUPLICATION FOR REPLICATED CONTENT ADDRESSABLE STORAGE CLUSTERS - A method is performed by a device of a group of devices in a distributed data replication system. The method includes storing an index of objects in the distributed data replication system, the index being replicated while the objects are stored locally by the plurality of devices in the distributed data replication system. The method also includes conducting a scan of at least a portion of the index and identifying a redundant replica(s) of at least one of the objects based on the scan of the index. The method further includes de-duplicating the redundant replica(s), and updating the index to reflect the status of the redundant replica. | 06-24-2010 |
20100161688 | ASYNCHRONOUS DISTRIBUTED GARBAGE COLLECTION FOR REPLICATED STORAGE CLUSTERS - A method may be performed by a device of a group of devices in a distributed data replication system. The method may include storing objects in a data store, at least one or more of the objects being replicated with the distributed data replication system, and conducting a scan of the objects in the data store. The method may further include identifying one of the objects as not having a reference pointing to the object, storing a delete negotiation message as metadata associated with the one of the objects, and replicating the metadata with the delete negotiation message to one or more other devices of the group of devices. | 06-24-2010 |
20100268902 | ASYNCHRONOUS DISTRIBUTED OBJECT UPLOADING FOR REPLICATED CONTENT ADDRESSABLE STORAGE CLUSTERS - A method is performed by two or more devices of a group of devices in a distributed data replication system. The method includes receiving, at the two or more devices, a group of chunks having a same unique temporary identifier, where the group of chunks comprises an object to be uploaded; creating an entry for the object in a replicated index, where the entry is keyed by the unique temporary identifier, and where the replicated index is replicated at each of the two or more devices; and determining, by an initiating device of the two or more devices, that a union of the group of chunks contains all data of the object. The method also includes calculating a content-based identifier for the object; creating another entry for the object in the replicated index, where the other entry is keyed by the content-based identifier; and updating the replicated index to point from the unique temporary identifier to the content-based identifier. | 10-21-2010 |
20110196664 | Location Assignment Daemon (LAD) Simulation System and Method - A system and method for simulating a state of a distributed storage system is provided. A current state of a distributed storage system and replication policies for the objects in the distributed storage system is obtained. Proposed modifications to the current state of the distributed storage system are received. The state of the distributed storage system is simulated over time based on the current state of the distributed storage system, the replication policies for the objects in the distributed storage system, and the proposed modifications to the current state of the distributed storage system. Then reports relating to the time evolution of the current state of the distributed storage system are generated based on the simulation. | 08-11-2011 |
20110196822 | Method and System For Uploading Data Into A Distributed Storage System - A method for uploading an object into a distributed storage system is implemented at a computing device. The computing device splits an object into one or more chunks and uploads the one or more chunks into the distributed storage system. For each uploaded chunk, the computing device receives a write token from the distributed storage system, inserts an entry into an extents table of the object for the chunk in accordance with the received write token and the chunk ID, chunk offset, and chunk size of the chunk, generates a digest of the extents table, the digest representing the one or more chunks that the client expects to be within the distributed storage system, and sends the digest of the extents table to the distributed storage system. The distributed storage system is configured to use the digest to determine whether it has each of the one or more client-expected chunks. | 08-11-2011 |
20110196828 | Method and System for Dynamically Replicating Data Within A Distributed Storage System - A server computer at a first storage sub-system of a distributed storage system receives from a client a first client request for an object. If the object is not present in the first storage sub-system, the server computer identifies a second storage sub-system of the distributed storage system as having a replica of the requested object, the requested object including content and metadata. The server computer submits an object replication request for the requested object to the second storage sub-system and independently receives the content and metadata of the requested object from the second storage sub-system. The server computer generates a new replica of the object at the first storage sub-system using the received metadata and content and returns the metadata of the new replica of the object to the client. | 08-11-2011 |
20110196830 | System and Method for Managing Replicas of Objects In A Distributed Storage System - A system and method for generating replication requests for objects in a distributed storage system is provided. Replication requests for objects in a distributed storage system are generated based at least in part on replication policies for the objects and a current state of the distributed storage system, wherein a respective replication request for a respective object instructs a respective instance of the distributed storage system to replicate the respective object so as to at least partially satisfy a replication policy for the respective object, wherein a respective replication policy includes criteria specifying at least storage device types on which replicas of the object are to be stored. At least a subset of the replication requests is then distributed to the respective instances of the distributed storage system for execution. | 08-11-2011 |
20110196831 | Pruning of Blob Replicas - A system and method for generating and distributing replica removal requests for objects in a distributed storage system is provided. Replica removal requests for objects in a distributed storage system are generated based at least in part on replication policies for the objects. A respective replica removal request instructs a respective instance of the distributed storage system to remove a respective replica of the respective object so as to at least partially satisfy replication policies for the respective object. Then the replica removal requests for the objects in the distributed storage system are distributed to respective instances of the distributed storage system corresponding to the replica removal requests for execution. | 08-11-2011 |
20110196832 | Location Assignment Daemon (LAD) For A Distributed Storage System - A system and method for generating replication requests for objects in a distributed storage system is provided. For a respective object in a distributed storage system the following is performed. Replication policies for the object that have not been satisfied are determined. Replication requests are ranked for the object whose replication policies have not been satisfied based on a number of replicas of the object that need to be created in order to satisfy the replication policies for the object. Replication requests are generated for the object based at least in part on the replication policies for the object that have not been satisfied and on a current state of the distributed storage system. At least a subset of the replication requests for the objects in the distributed storage system are distributed to respective instances of the distributed storage system corresponding to the replication requests for execution. | 08-11-2011 |
20110196833 | Storage of Data In A Distributed Storage System - A distributed storage system has multiple instances. There is a plurality of local instances, and at least some of the local instances are at physically distinct geographic locations. Each local instance is configured to store data for a non-empty set of blobs in a plurality of data stores having a plurality of distinct data store types. In addition, each local instance stores metadata for the respective set of blobs in a metadata store distinct from the data stores. There is also a plurality of global instances. Each global instance is configured to store data for zero or more blobs in zero or more data stores and store metadata for all blobs stored at any local or global instance. The system selects one global instance to run a replication module that replicates blobs between instances according to blob policies. Some systems also include dynamic replication based on user needs. | 08-11-2011 |
20110196838 | Method and System for Managing Weakly Mutable Data In A Distributed Storage System - A method for managing multiple generations of an object within a distributed storage system is implemented at a computing device. The computing device receives metadata and content of a first generation of an object from a first client connected to the distributed storage system and stores the first generation's metadata and content within a first storage sub-system. The computing device receives metadata and content of a second generation of the object from a second client connected to the distributed storage system and stores the second generation's metadata and content within a second storage sub-system. The computing device independently replicates the first generation's metadata and content from the first storage sub-system to the second storage sub-system and replicates the second generation's metadata and content from the second storage sub-system to the first storage sub-system such that both storage sub-systems include a replica of the object's first and second generations. | 08-11-2011 |
20110196900 | Storage of Data In A Distributed Storage System - A distributed storage system stores data for files. A first blob (binary large object) of data is received. The first blob is split into one or more first chunks of data. Content fingerprints for the first chunks of data are computed. The first chunks of data are stored in a chunk store while their content fingerprints are stored in a store distinct from the chunk store. A second blob of data is received. The second blob is split into one or more second chunks of data. Content fingerprints for the second chunks of data are computed. Then for a second chunk of data whose content fingerprint matches a content fingerprint of a first chunk of data, a second reference to the corresponding first chunk of data that has a matching content fingerprint is stored, but the second chunk of data is not stored. | 08-11-2011 |
20110196901 | System and Method for Determining the Age of Objects in the Presence of Unreliable Clocks - A system and method for determining an age of an object is provided. A first index for a timestamp entry in a sequence of timestamps corresponding to a time at which an object was created is identified. At least one subsequence of timestamps from the sequence of timestamps having indexes for entries in the sequence of timestamps that are between the first index in the sequence of timestamps and a last index for a last timestamp entry in the sequence of timestamps is identified, wherein the at least one subsequence of timestamps conforms to a function of a time interval between storage of consecutive current timestamps reported by a clock of the computer system. Timestamps from the sequence of timestamps that are not included in the at least one subsequence of timestamps are removed. An age of the object is determined based on the at least one subsequence of timestamps. | 08-11-2011 |
20120259948 | ASYNCHRONOUS DISTRIBUTED OBJECT UPLOADING FOR REPLICATED CONTENT ADDRESSABLE STORAGE CLUSTERS - A method performed by two or more devices of a group of devices in a distributed data replication system may include receiving a group of chunks having a same unique temporary identifier, the group of chunks comprising an object to be uploaded; creating an entry for the object in a replicated index, the entry being keyed by the unique temporary identifier, and the replicated index being replicated at each of the two or more devices; and determining, by an initiating device of the two or more devices, that a union of the group of chunks contains all data of the object. The method may also include calculating a content-based identifier for the object; creating another entry for the object in the replicated index, the other entry being keyed by the content-based identifier; and updating the replicated index to point from the unique temporary identifier to the content-based identifier. | 10-11-2012 |
20130124470 | ASYNCHRONOUS DISTRIBUTED GARBAGE COLLECTION FOR REPLICATED STORAGE CLUSTERS - A method may be performed by a device of a group of devices in a distributed data replication system. The method may include storing objects in a data store, at least one or more of the objects being replicated with the distributed data replication system, and conducting a scan of the objects in the data store. The method may further include identifying one of the objects as not having a reference pointing to the object, storing a delete negotiation message as metadata associated with the one of the objects, and replicating the metadata with the delete negotiation message to one or more other devices of the group of devices. | 05-16-2013 |
20130268486 | ASYNCHRONOUS DISTRIBUTED OBJECT UPLOADING FOR REPLICATED CONTENT ADDRESSABLE STORAGE CLUSTERS - A method performed by two or more devices of a group of devices in a distributed data replication system may include receiving a group of chunks having a same unique temporary identifier, the group of chunks comprising an object to be uploaded; creating an entry for the object in a replicated index, the entry being keyed by the unique temporary identifier, and the replicated index being replicated at each of the two or more devices; and determining, by an initiating device of the two or more devices, that a union of the group of chunks contains all data of the object. The method may also include calculating a content-based identifier for the object; creating another entry for the object in the replicated index, the other entry being keyed by the content-based identifier; and updating the replicated index to point from the unique temporary identifier to the content-based identifier. | 10-10-2013 |
20140032200 | Systems and Methods of Simulating the State of a Distributed Storage System - A distributed storage system has a plurality of instances. A computer system simulates the state of the distributed storage system. The system obtains a current state of the distributed storage system and replication policies for objects in the distributed storage system. Each replication policy specifies criteria for placing copies of the relevant objects among the plurality of instances. The system receives proposed modifications to the state of the distributed storage system and simulates the state of the distributed storage system over time based on the current state of the distributed storage system, current statistical trends in the state of the distributed storage system, the replication policies for the objects in the distributed storage system, and the proposed modifications to the state of the distributed storage system. One or more reports are generated relating to time evolution of the state of the distributed storage system based on the simulation. | 01-30-2014 |
20140236888 | Asynchronous Distributed De-Duplication for Replicated Content Addressable Storage Clusters - A method is performed by a device of a group of devices in a distributed data replication system. The method includes storing an index of objects in the distributed data replication system, the index being replicated while the objects are stored locally by the plurality of devices in the distributed data replication system. The method also includes conducting a scan of at least a portion of the index and identifying a redundant replica(s) of at least one of the objects based on the scan of the index. The method further includes de-duplicating the redundant replica(s), and updating the index to reflect the status of the redundant replica. | 08-21-2014 |
20140304240 | Pruning of Blob Replicas - A method allocates object replicas in a distributed storage system. The method identifies a plurality of objects in the distributed storage system. Each object has an associated storage policy that specifies a target number of object replicas stored at distinct instances of the distributed storage system. The method identifies an object of the plurality of objects whose number of object replicas exceeds the target number of object replicas specified by the storage policy associated with the object. The method selects a first replica of the object for removal based on last access times for replicas of the object, and transmits a request to a first instance of the distributed storage system that stores the first replica. The request instructs the first instance to remove the first replica of the object. | 10-09-2014 |
20150026128 | STORAGE OF DATA IN A DISTRIBUTED STORAGE SYSTEM - A distributed storage system has multiple instances. There is a plurality of local instances, and at least some of the local instances are at physically distinct geographic locations. Each local instance is configured to store data for a non-empty set of blobs in a plurality of data stores having a plurality of distinct data store types. In addition, each local instance stores metadata for the respective set of blobs in a metadata store distinct from the data stores. There is also a plurality of global instances. Each global instance is configured to store data for zero or more blobs in zero or more data stores and store metadata for all blobs stored at any local or global instance. The system selects one global instance to run a replication module that replicates blobs between instances according to blob policies. Some systems also include dynamic replication based on user needs. | 01-22-2015 |
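
The chunk-level de-duplication summarized in the abstract of application 20110196900 (split a blob into chunks, fingerprint each chunk, store a chunk only if its fingerprint is new, otherwise store just a reference) can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the class name `ChunkStore`, the fixed-size chunking, the choice of SHA-256 as the content fingerprint, and the in-memory dictionaries are all assumptions made for the example.

```python
import hashlib


class ChunkStore:
    """Hypothetical in-memory sketch of fingerprint-based chunk de-duplication."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # assumed fixed-size chunking
        self.chunks = {}              # chunk store: fingerprint -> chunk bytes
        self.blob_refs = {}           # separate store: blob id -> list of fingerprints

    def put_blob(self, blob_id, data):
        """Split data into chunks; store only chunks with unseen fingerprints."""
        refs = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            # SHA-256 stands in for the "content fingerprint" (an assumption).
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = chunk
            # A matching chunk is not stored again; only a reference is kept.
            refs.append(fingerprint)
        self.blob_refs[blob_id] = refs
        return refs

    def get_blob(self, blob_id):
        """Reassemble a blob from its chunk references."""
        return b"".join(self.chunks[fp] for fp in self.blob_refs[blob_id])


store = ChunkStore(chunk_size=4)
first_refs = store.put_blob("first", b"AAAABBBBCCCC")
second_refs = store.put_blob("second", b"BBBBDDDD")
```

Here the second blob shares the chunk `BBBB` with the first, so the chunk store ends up holding four distinct chunks rather than five, while both blobs can still be reassembled from their reference lists.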