Patent application number | Description | Published |
20080228691 | Concurrent extensible cuckoo hashing - Concurrent cuckoo hashing is performed on a hash table that includes a number of locations; each may hold a value. A plurality of processes may concurrently execute on the table; each process includes a sequence of operations, which are divided into a number of phases. Each phase corresponds to one operation in the sequence. An overflow buffer is provided for each location in the table. Each overflow buffer may hold a value displaced from its corresponding location in the table. A plurality of sequences of operations is concurrently executed. Each phase in a sequence executes by acquiring one or two locks on two locations in the table; a lock acts on a location and its overflow buffer. An operation of a phase is then executed. If, on conclusion of the phase execution, any overflow buffer holds a value, the execution is repeated until all overflow buffers are empty. | 09-18-2008 |
20080256074 | EFFICIENT IMPLICIT PRIVATIZATION OF TRANSACTIONAL MEMORY - Apparatus, methods, and program products are disclosed that provide a technology that implicitly isolates a portion of a transactional memory that is shared between multiple threads for exclusive use by an isolating thread without the possibility of other transactions modifying the isolated portion of the transactional memory. | 10-16-2008 |
20090132563 | SIMPLE OPTIMISTIC SKIPLIST - Apparatus, methods, and computer program products are disclosed for concurrently searching a memory containing a skiplist data structure. The method locates the skiplist data structure in the memory. The skiplist data structure includes a plurality of linked lists related by a skiplist invariant. Furthermore, the plurality of linked lists includes a first-level linked list and one or more higher-level linked lists. The skiplist data structure also includes a plurality of nodes, each of which includes a key field, at least one pointer field, and a lock field, respectively. Each of the plurality of nodes is linked to the first-level linked list through the at least one pointer field and ordered responsive to the key field. The method performs a search operation on the skiplist data structure, while the skiplist data structure is subject to concurrent alteration of the plurality of nodes by a plurality of execution threads that are configured to maintain the skiplist invariant and returns a result of the search operation. | 05-21-2009 |
20090172327 | Optimistic Semi-Static Transactional Memory Implementations - A lock-based software transactional memory (STM) implementation may determine whether a transaction's write-set is static (e.g., known in advance not to change). If so, and if the read-set is not static, the STM implementation may execute, or attempt to execute, the transaction as a semi-static transaction. A semi-static transaction may involve obtaining, possibly after incrementing, a reference version value against which to subsequently validate that memory locations, such as read-set locations, have not been modified concurrently with the semi-static transaction. The read-set locations may be validated while locks are held for the locations to be written (e.g., the write-set locations). After committing the modifications to the write-set locations and as part of releasing the locks, versioned write-locks associated with the write-set locations may be updated to reflect the previously obtained, or newly incremented, reference version value. | 07-02-2009 |
20100042584 | CONCURRENT LOCK-FREE SKIPLIST WITH WAIT-FREE CONTAINS OPERATOR - Apparatus, methods, and computer program products are disclosed for performing a wait-free search of a concurrent, lock-free skiplist to determine existence of a sought-after key. | 02-18-2010 |
20100174875 | System and Method for Transactional Locking Using Reader-Lists - In traditional transactional locking systems, such as TLRW, threads may frequently update lock metadata, causing system performance degradation. A system and method for implementing transactional locking using reader-lists (TLRL) may associate a respective reader-list with each stripe of data in a shared memory system. Before reading a given stripe as part of a transaction, a thread may add itself to the stripe's reader-list, if the thread is not already on the reader-list. A thread may leave itself on a reader-list after finishing the transaction. Before a thread modifies a stripe, the modifying thread may acquire a write-lock for the stripe. The writer thread may indicate to each reader thread on the stripe's reader-list that if the reader thread is executing a transaction, the reader thread should abort. The indication may include setting an invalidation flag for the reader. The writer thread may clear the reader-list of a stripe it modified. | 07-08-2010 |
20100332770 | Concurrency Control Using Slotted Read-Write Locks - A system and method for concurrency control may use slotted read-write locks. A slotted read-write lock is a lock data structure associated with a shared memory area, wherein the slotted read-write lock indicates whether any thread has a read-lock and/or a write-lock for the shared memory area. Multiple threads may concurrently have the read-lock but only one thread can have the write-lock at any given time. The slotted read-write lock comprises multiple slots, each associated with a single thread. To acquire the slotted read-write lock for reading, a thread assigned to a slot performs a store operation to the slot and then attempts to determine that no other thread holds the slotted read-write lock for writing. To acquire the slotted read-write lock for writing, a thread assigned to a slot sets its write-bit and then attempts to determine that the write-lock is not held. | 12-30-2010 |
20100333095 | Bulk Synchronization in Transactional Memory Systems - A method and system for acquiring multiple software locks in bulk is disclosed. When multiple locks need to be acquired, such as for atomic transactions in transactional memory systems, the disclosed techniques may be applied to consolidate computationally expensive memory barrier operations across the lock acquisitions. A system may acquire multiple locks in bulk, at least in part, by modifying values in one or more fields of multiple locks and by then performing a memory barrier operation to ensure that the modified values in the multiple locks are visible to other application threads. The technique may be repeated for locks that the system fails to acquire during earlier iterations until all required locks are acquired. The described technique may be applied to various scenarios including static and/or dynamic transactional locking protocols. | 12-30-2010 |
20100333096 | Transactional Locking with Read-Write Locks in Transactional Memory Systems - A system and method for transactional memory using read-write locks is disclosed. Each of a plurality of shared memory areas is associated with a respective read-write lock, which includes a read-lock portion indicating whether any thread has a read-lock for read-only access to the memory area and a write-lock portion indicating whether any thread has a write-lock for write access to the memory area. A thread executing a group of memory access operations as an atomic transaction acquires the proper read or write permissions before performing a memory operation. To perform a read access, the thread attempts to obtain the corresponding read-lock and succeeds if no other thread holds a write-lock for the memory area. To perform a write-access, the thread attempts to obtain the corresponding write-lock and succeeds if no other thread holds a write-lock or read-lock for the memory area. | 12-30-2010 |
20110138135 | Fast and Efficient Reacquisition of Locks for Transactional Memory Systems - A system and method is disclosed for fast lock acquisition and release in a lock-based software transactional memory system. The method includes determining that a group of shared memory areas are likely to be accessed together in one or more atomic memory transactions executed by one or more threads of a computer program in a transactional memory system. In response to determining this, the system associates the group of memory areas with a single software lock that is usable by the transactional memory system to coordinate concurrent transactional access to the group of memory areas by the threads of the computer program. Subsequently, a thread of the program may gain access to a plurality of the memory areas of the group by acquiring the single software lock. | 06-09-2011 |
20110246727 | System and Method for Tracking References to Shared Objects Using Byte-Addressable Per-Thread Reference Counters - The system described herein may track references to a shared object by concurrently executing threads using a reference tracking data structure that includes an owner field and an array of byte-addressable per-thread entries, each including a per-thread reference counter and a per-thread counter lock. Slotted threads assigned to a given array entry may increment or decrement the per-thread reference counter in that entry in response to referencing or dereferencing the shared object. Unslotted threads may increment or decrement a shared unslotted reference counter. A thread may update the data structure and/or examine it to determine whether the number of references to the shared object is zero or non-zero using a blocking-optimistic or a non-blocking mechanism. A checking thread may acquire ownership of the data structure, obtain an instantaneous snapshot of all counters, and return a value indicating whether the number of references to the shared object is zero or non-zero. | 10-06-2011 |
20120278576 | EFFICIENT NON-BLOCKING K-COMPARE-SINGLE-SWAP OPERATION - The design of nonblocking linked data structures using single-location synchronization primitives such as compare-and-swap (CAS) is a complex affair that often requires severe restrictions on the way pointers are used. One way to address this problem is to provide stronger synchronization operations, for example, ones that atomically modify one memory location while simultaneously verifying the contents of others. We provide a simple and highly efficient nonblocking implementation of such an operation: an atomic k-word-compare single-swap operation (KCSS). Our implementation is obstruction-free. As a result, it is highly efficient in the uncontended case and relies on contention management mechanisms in the contended cases. It allows linked data structure manipulation without the complexity and restrictions of other solutions. Additionally, as a building block of some implementations of our techniques, we have developed the first nonblocking software implementation of load-linked/store-conditional that does not severely restrict word size. | 11-01-2012 |
20120311606 | System and Method for Implementing Hierarchical Queue-Based Locks Using Flat Combining - The system and methods described herein may be used to implement a scalable, hierarchal, queue-based lock using flat combining. A thread executing on a processor core in a cluster of cores that share a memory may post a request to acquire a shared lock in a node of a publication list for the cluster using a non-atomic operation. A combiner thread may build an ordered (logical) local request queue that includes its own node and nodes of other threads (in the cluster) that include lock requests. The combiner thread may splice the local request queue into a (logical) global request queue for the shared lock as a sub-queue. A thread whose request has been posted in a node that has been combined into a local sub-queue and spliced into the global request queue may spin on a lock ownership indicator in its node until it is granted the shared lock. | 12-06-2012 |
20130047011 | System and Method for Enabling Turbo Mode in a Processor - The systems and methods described herein may enable a processor core to run at higher speeds than other processor cores in the same package. A thread executing on one processor core may begin waiting for another thread to complete a particular action (e.g., to release a lock). In response to determining that other threads are waiting, the thread/core may enter an inactive state. A data structure may store information indicating which threads are waiting on which other threads. In response to determining that a quorum of threads/cores are in an inactive state, one of the threads/cores may enter a turbo mode in which it executes at a higher speed than the baseline speed for the cores. A thread holding a lock and executing in turbo mode may perform work delegated by waiting threads at the higher speed. A thread may exit the inactive state when the waited-for action is completed. | 02-21-2013 |
20130290583 | System and Method for NUMA-Aware Locking Using Lock Cohorts - The system and methods described herein may be used to implement NUMA-aware locks that employ lock cohorting. These lock cohorting techniques may reduce the rate of lock migration by relaxing the order in which the lock schedules the execution of critical code sections by various threads, allowing lock ownership to remain resident on a single NUMA node longer than under strict FIFO ordering, thus reducing coherence traffic and improving aggregate performance. A NUMA-aware cohort lock may include a global shared lock that is thread-oblivious, and multiple node-level locks that provide cohort detection. The lock may be constructed from non-NUMA-aware components (e.g., spin-locks or queue locks) that are modified to provide thread-obliviousness and/or cohort detection. Lock ownership may be passed from one thread that holds the lock to another thread executing on the same NUMA node without releasing the global shared lock. | 10-31-2013 |
20130290967 | System and Method for Implementing NUMA-Aware Reader-Writer Locks - NUMA-aware reader-writer locks may leverage lock cohorting techniques to band together writer requests from a single NUMA node. The locks may relax the order in which the lock schedules the execution of critical sections of code by reader threads and writer threads, allowing lock ownership to remain resident on a single NUMA node for long periods, while also taking advantage of parallelism between reader threads. Threads may contend on node-level structures to get permission to acquire a globally shared reader-writer lock. Writer threads may follow a lock cohorting strategy of passing ownership of the lock in write mode from one thread to a cohort writer thread without releasing the shared lock, while reader threads from multiple NUMA nodes may simultaneously acquire the shared lock in read mode. The reader-writer lock may follow a writer-preference policy, a reader-preference policy or a hybrid policy. | 10-31-2013 |
20140181821 | Methods and Systems for Enhancing Hardware Transactions Using Hardware Transactions in Software Slow-Path - Hybrid transaction memory systems and accompanying methods. A transaction to be executed is received, and an initial attempt is made to execute the transaction in a hardware path. Upon a failure to successfully execute the transaction in the hardware path, an attempt is made to execute the transaction in a hardware-software path. The hardware-software path includes a software path and at least one hardware transaction. | 06-26-2014 |