Patent application number | Description | Published |
20080229139 | Space-and Time- Adaptive Nonblocking Algorithms - We explore techniques for designing nonblocking algorithms that do not require advance knowledge of the number of processes that participate, whose time complexity and space consumption both adapt to various measures, rather than being based on predefined worst-case scenarios, and that cannot be prevented from future memory reclamation by process failures. These techniques can be implemented using widely available hardware synchronization primitives. We present our techniques in the context of solutions to the well-known Collect problem. We also explain how our techniques can be exploited to achieve other results with similar properties; these include long-lived renaming and dynamic memory management for nonblocking data structures. | 09-18-2008 |
20090125548 | System and Method for Implementing Shared Scalable Nonzero Indicators - A Scalable NonZero Indicator (SNZI) object in a concurrent computing application may include a shared data portion (e.g., a counter portion) and a shared nonzero indicator portion, and/or may be an element in a hierarchy of SNZI objects that filters changes in non-root nodes to a root node. SNZI objects may be accessed by software applications through an API that includes a query operation to return the value of the nonzero indicator, and arrive (increment) and depart (decrement) operations. Modifications of the data portion and/or the indicator portion may be performed using atomic read-modify-write type operations. Some SNZI objects may support a reset operation. A shared data object may be set to an intermediate value, or an announce bit may be set, to indicate that a modification is in progress that affects its corresponding indicator value. Another process or thread seeing this indication may “help” complete the modification before proceeding. | 05-14-2009 |
20090171962 | System and Method for Implementing Nonblocking Zero-Indirection Transactional Memory - Systems and methods for implementing and using nonblocking zero-indirection software transactional memory (NZSTM) are disclosed. NZSTM systems implement object-based software transactional memory that eliminates all levels of indirection except in the uncommon case of a conflict with an unresponsive thread. Shared data is co-located with a header in an NZObject, and is addressable at a fixed offset from the header. Conflicting transactions are requested to abort themselves without being forced to abort. NZObjects are modified in place when there are no conflicts, and when a conflicting transaction acknowledges the abort request. In the uncommon case, NZObjects are inflated to introduce a locator and some levels of indirection, and are restored to their un-inflated form following resolution of the conflict. In some embodiments, transactions are executed using best effort hardware transactional memory if it is available and effective, and software transactional memory if not, yielding a hybrid transactional memory system, NZTM. | 07-02-2009 |
20090172299 | System and Method for Implementing Hybrid Single-Compare-Single-Store Operations - A hybrid Single-Compare-Single-Store (SCSS) operation may exploit best-effort hardware transactional memory (HTM) for good performance in the case that it succeeds, and may transparently resort to software-mediated transactions if the hardware transactional mechanisms fail. The SCSS operation may compare a value in a control location to a specified expected value, and if they match, may store a new value in a separate data location. The control value may include a global lock, a transaction status indicator, and/or a portion of an ownership record, in different embodiments. If another transaction in progress owns the data location, the SCSS operation may abort the other transaction or may help it complete by copying the other transactions' write set into its own right set before acquiring ownership. A hybrid SCSS operation, which is usually nonblocking, may be applied to building software transactional memories (STMs) and/or hybrid transactional memories (HyTMs), in some embodiments. | 07-02-2009 |
20090172306 | System and Method for Supporting Phased Transactional Memory Modes - A phased transactional memory (PhTM) may support a plurality of transactional memory implementations, including software, hardware, and hybrid implementations, and may provide mechanisms for dynamically transitioning between transactional memory modes in response to changing workload characteristics; upon discovering that the current mode does not perform well, is not suitable, or does not support functionality required for particular transactions; or according to scheduled phases. A system providing PhTM may be configured to transition from a first transactional memory mode to a second transactional memory mode while ensuring that transactions executing in the first transactional memory mode do not interfere with correct execution of transactions in the second transactional memory mode. The system may be configured to abort transactions in progress or to wait for transactions to complete, be aborted, or reach a safe transition point before transitioning to a new mode, and may use a global mode indicator in coordinating transitions. | 07-02-2009 |
20090282386 | System and Method for Utilizing Available Best Effort Hardware Mechanisms for Supporting Transactional Memory - Systems and methods for managing divergence of best effort transactional support mechanisms in various transactional memory implementations using a portable transaction interface are described. This interface may be implemented by various combinations of best effort hardware features, including none at all. Because the features offered by this interface may be best effort, a default (e.g., software) implementation may always be possible without the need for special hardware support. Software may be written to the interface, and may be executable on a variety of platforms, taking advantage of best effort hardware features included on each one, while not depending on any particular mechanism. Multiple implementations of each operation defined by the interface may be included in one or more portable transaction interface libraries. Systems and/or application software may be written as platform-independent and/or portable, and may call functions of these libraries to implement the operations for a targeted execution environment. | 11-12-2009 |
20090282405 | System and Method for Integrating Best Effort Hardware Mechanisms for Supporting Transactional Memory - Systems and methods for integrating multiple best effort hardware transactional support mechanisms, such as Read Set Monitoring (RSM) and Best Effort Hardware Transactional Memory (BEHTM), in a single transactional memory implementation are described. The best effort mechanisms may be integrated such that the overhead associated with support of multiple mechanisms may be reduced and/or the performance of the resulting transactional memory implementations may be improved over those that include any one of the mechanisms, or an un-integrated collection of multiple such mechanisms. Two or more of the mechanisms may be employed concurrently or serially in a single attempt to execute a transaction, without aborting or retrying the transaction. State maintained or used by a first mechanism may be shared with or transferred to another mechanism for use in execution of the transaction. This transfer may be performed automatically by the integrated mechanisms (e.g., without user, programmer, or software intervention). | 11-12-2009 |
20090282410 | Systems and Methods for Supporting Software Transactional Memory Using Inconsistency-Aware Compilers and Libraries - Systems and methods to reduce overhead associated with read set consistency validation in software transactional memory implementations are disclosed. These systems and methods may employ an inconsistency-aware compiler-library technique, in which an inconsistency-aware compiler communicates to various inconsistency-aware library functions knowledge about whether a given transaction has read consistent values to date. The inconsistency-aware library functions may exploit this information to avoid the need to validate the transaction, or portions thereof. If read set values are known to be consistent prior to the function call, the compiler may pass a parameter value to the function indicating as much. Otherwise, it may pass a value indicating that the read set values may be inconsistent. An inconsistency-aware function may determine that it will not perform a dangerous action, even though its parameters may not be consistent. Otherwise, the inconsistency-aware function may invoke a validation operation, or may perform other error avoidance operations. | 11-12-2009 |
20100131953 | Method and System for Hardware Feedback in Transactional Memory - Multi-threaded, transactional memory systems may allow concurrent execution of critical sections as speculative transactions. These transactions may abort due to contention among threads. Hardware feedback mechanisms may detect information about aborts and provide that information to software, hardware, or hybrid software/hardware contention management mechanisms. For example, they may detect occurrences of transactional aborts or conditions that may result in transactional aborts, and may update local readable registers or other storage entities (e.g., performance counters) with relevant contention information. This information may include identifying data (e.g., information outlining abort relationships between the processor and other specific physical or logical processors) and/or tallied data (e.g., values of event counters reflecting the number of aborted attempts by the current thread or the resources consumed by those attempts). This contention information may be accessible by contention management mechanisms to inform contention management decisions (e.g. whether to revert transactions to mutual exclusion, delay retries, etc.). | 05-27-2010 |
20100138836 | System and Method for Reducing Serialization in Transactional Memory Using Gang Release of Blocked Threads - Transactional Lock Elision (TLE) may allow multiple threads to concurrently execute critical sections as speculative transactions. Transactions may abort due to various reasons. To avoid starvation, transactions may revert to execution using mutual exclusion when transactional execution fails. Because threads may revert to mutual exclusion in response to the mutual exclusion of other threads, a positive feedback loop may form in times of high congestion, causing a “lemming effect”. To regain the benefits of concurrent transactional execution, the system may allow one or more threads awaiting a given lock to be released from the wait queue and instead attempt transactional execution. A gang release may allow a subset of waiting threads to be released simultaneously. The subset may be chosen dependent on the number of waiting threads, historical abort relationships between threads, analysis of transactions of each thread, sensitivity of each thread to abort, and/or other thread-local or global criteria. | 06-03-2010 |
20100138841 | System and Method for Managing Contention in Transactional Memory Using Global Execution Data - Transactional Lock Elision (TLE) may allow threads in a multi-threaded system to concurrently execute critical sections as speculative transactions. Such speculative transactions may abort due to contention among threads. Systems and methods for managing contention among threads may increase overall performance by considering both local and global execution data in reducing, resolving, and/or mitigating such contention. Global data may include aggregated and/or derived data representing thread-local data of remote thread(s), including transactional abort history, abort causal history, resource consumption history, performance history, synchronization history, and/or transactional delay history. Local and/or global data may be used in determining the mode by which critical sections are executed, including TLE and mutual exclusion, and/or to inform concurrency throttling mechanisms. Local and/or global data may also be used in determining concurrency throttling parameters (e.g., delay intervals) used in delaying a thread when attempting to execute a transaction and/or when retrying a previously aborted transaction. | 06-03-2010 |
20100153662 | FACILITATING GATED STORES WITHOUT DATA BYPASS - One embodiment of the present invention provides a system that facilitates precise exception semantics for a virtual machine. During operation, the system executes a program in the virtual machine using a processor that includes a gated store buffer that stores values to be written to a memory. This gated store buffer is configured to delay a store to the memory until after a speculatively-optimized region of the program commits. The processor signals an exception when it detects that a load following the store is attempting to access the same memory region being written by the store prior to the commitment of the speculatively-optimized region. | 06-17-2010 |
20100169895 | Method and System for Inter-Thread Communication Using Processor Messaging - In shared-memory computer systems, threads may communicate with one another using shared memory. A receiving thread may poll a message target location repeatedly to detect the delivery of a message. Such polling may cause excessive cache coherency traffic and/or congestion on various system buses and/or other interconnects. A method for inter-processor communication may reduce such bus traffic by reducing the number of reads performed and/or the number of cache coherency messages necessary to pass messages. The method may include a thread reading the value of a message target location once, and determining that this value has been modified by detecting inter-processor messages, such as cache coherence messages, indicative of such modification. In systems that support transactional memory, a thread may use transactional memory primitives to detect the cache coherence messages. This may be done by starting a transaction, reading the target memory location, and spinning until the transaction is aborted. | 07-01-2010 |
20110078385 | System and Method for Performing Visible and Semi-Visible Read Operations In a Software Transactional Memory - The software transactional memory system described herein may implement a revocable mechanism for managing read ownership in a shared memory. In this system, write ownership may be revoked by readers or writers at any time other than when a writer transaction is in a commit state, wherein its write ownership is irrevocable. An ownership record associated with one or more locations in the shared memory may include an indication of whether the memory locations are owned for writing, and an identifier of the latest writer. A read ownership array may record data indicating which, if any, threads currently own the memory locations for reading. The system may provide an efficient read-validation operation, in which a full read-set validation is avoided unless a change in a global read-write conflict counter value indicates a potential conflict. The system may support a wide range of contention management policies, and may provide implicit privatization. | 03-31-2011 |
20110125973 | System and Method for Performing Dynamic Mixed Mode Read Validation In a Software Transactional Memory - The transactional memory system described herein may apply a mix of read validation techniques to validate read operations (e.g., invisible reads and/or semi-visible reads) in different transactions, or to validate different read operations within a single transaction (including reads of the same location). The system may include mechanisms to dynamically determine that a read validation technique should be replaced by a different technique for reads of particular locations or for all subsequent reads, and/or to dynamically adjust the balance between different read validation techniques to manage costs. Some of the read validation techniques may be supported by hardware transactional memory (HTM). The system may delay acquisition of ownership records for reading, and may acquire two or more ownership records back-to-back (e.g., within a single hardware transaction). The user code of a software transaction may be divided into multiple segments, some of which may be executed within a hardware transaction. | 05-26-2011 |
20110202907 | METHOD AND SYSTEM FOR OPTIMIZING CODE FOR A MULTI-THREADED APPLICATION - In modern multi-threaded environments, threads often work cooperatively toward providing collective or aggregate throughput for an application as a whole. Optimizing in the small for “thread local” common path latency is often but not always the best approach for a concurrent system composed of multiple cooperating threads. Some embodiments provide a technique for augmenting traditional code emission with thread-aware policies and optimization strategies for a multi-threaded application. During operation, the system obtains information about resource contention between executing threads of the multi-threaded application. The system analyzes the resource contention information to identify regions of the code to be optimized. The system recompiles these identified regions to produce optimized code, which is then stored for subsequent execution. | 08-18-2011 |
20110246724 | System and Method for Providing Locale-Based Optimizations In a Transactional Memory - The system and methods described herein may reduce read/write fence latencies and cache pressure related to STM metadata accesses. These techniques may leverage locality information (as reflected by the value of a respective locale guard) associated with each of a plurality of data partitions (locales) in a shared memory to elide various operations in transactional read/write fences when transactions access data in locales owned by their threads. The locale state may be disabled, free, exclusive, or shared. For a given memory access operation of an atomic transaction targeting an object in the shared memory, the system may implement the memory access operation using a contention mediation mechanism selected based on the value of the locale guard associated with the locale in which the target object resides. For example, a traditional read/write fence may be employed in some memory access operations, while other access operations may employ an optimized read/write fence. | 10-06-2011 |
20110246725 | System and Method for Committing Results of a Software Transaction Using a Hardware Transaction - The system and methods described herein may exploit hardware transactional memory to improve the performance of a software or hybrid transactional memory implementation, even when an entire user transaction cannot be executed within a hardware transaction. The user code of an atomic transaction may be executed within a software transaction, which may collect read and write sets and/or other information about the atomic transaction. A single hardware transaction may be used to commit the atomic transaction by validating the transaction's read set and applying the effects of the user code to memory, reducing the overhead associated with commitment of software transactions. Because the hardware transaction code is carefully controlled, it may be less likely to fail to commit. Various remedial actions may be taken before retrying hardware transactions following some failures. If a transaction exceeds the constraints of the hardware, it may be committed by the software transactional memory alone. | 10-06-2011 |
20110246993 | System and Method for Executing a Transaction Using Parallel Co-Transactions - The transactional memory system described herein may implement parallel co-transactions that access a shared memory such that at most one of the co-transactions in a set will succeed and all others will fail (e.g., be aborted). Co-transactions may improve the performance of programs that use transactional memory by attempting to perform the same high-level operation using multiple algorithmic approaches, transactional memory implementations and/or speculation options in parallel, and allowing only the first to complete to commit its results. If none of the co-transactions succeeds, one or more may be retried, possibly using a different approach and/or transactional memory implementation. The at-most-one property may be managed through the use of a shared “done” flag. Conflicts between co-transactions in a set and accesses made by transactions or activities outside the set may be managed using lazy write ownership acquisition and/or a priority-based approach. Each co-transaction may execute on a different processor resource. | 10-06-2011 |
20120005461 | System and Method for Performing Incremental Register Checkpointing in Transactional Memory - Systems and methods described herein for performing incremental register checkpointing may employ a special register to indicate which registers have already been checkpointed. This register may include one bit per register. These systems may also include a special pointer register whose value identifies a location in user memory or in dedicated on-chip storage at which a copy of a register's value should be saved by a checkpointing operation. Only registers modified during speculative execution or execution of a transaction may be checkpointed (e.g., when register modifying instructions are encountered) and subsequently restored (e.g., due to misspeculation or transaction abort), rather than all of the registers of the processor. Each register may be checkpointed at most once for a given speculative episode or atomic transaction. Setting a bit in the special register may prevent checkpointing of the corresponding register. Setting all of the bits in the special register may disable checkpointing. | 01-05-2012 |