Patent application number | Description | Published |
--- | --- | --- |
20120254846 | System and Method for Optimizing a Code Section by Forcing a Code Section to be Executed Atomically - Systems and methods for optimizing code may use transactional memory to optimize one code section by forcing another code section to execute atomically. Application source code may be analyzed to identify instructions in one code section that only need to be executed if there exists the possibility that another code section (e.g., a critical section) could be partially executed or that its results could be affected by interference. In response to identifying such instructions, alternate code may be generated that forces the critical section to be executed as an atomic transaction, e.g., using best-effort hardware transactional memory. This alternate code may replace the original code or may be included in an alternate execution path that can be conditionally selected for execution at runtime. The alternate code may elide the identified instructions (which are rendered unnecessary by the transaction) by removing them, or by not including them in the alternate execution path. (Sketch below.) | 10-04-2012 |
20120310987 | System and Method for Performing Memory Management Using Hardware Transactions - The systems and methods described herein may be used to implement a shared dynamic-sized data structure using hardware transactional memory to simplify and/or improve memory management of the data structure. An application (or thread thereof) may indicate (or register) the intended use of an element of the data structure and may initialize the value of the data structure element. Thereafter, another thread or application may use hardware transactions to access the data structure element while confirming that the data structure element is still part of the dynamic data structure and/or that memory allocated to the data structure element has not been freed. Various indicators may be used to determine whether memory allocated to the element can be freed. (Sketch below.) | 12-06-2012 |
20130198499 | System and Method for Mitigating the Impact of Branch Misprediction When Exiting Spin Loops - A computer system may recognize a busy-wait loop in program instructions at compile time and/or may recognize busy-wait looping behavior during execution of program instructions. The system may recognize that an exit condition for a busy-wait loop is specified by a conditional branch type instruction in the program instructions. In response to identifying the loop and the conditional branch type instruction that specifies its exit condition, the system may influence or override a prediction made by a dynamic branch predictor, resulting in a prediction that the exit condition will be met and that the loop will be exited regardless of any observed branch behavior for the conditional branch type instruction. The looping instructions may implement waiting for an inter-thread communication event to occur or for a lock to become available. When the exit condition is met, the loop may be exited without incurring a misprediction delay. (Sketch below.) | 08-01-2013 |
20140089591 | SUPPORTING TARGETED STORES IN A SHARED-MEMORY MULTIPROCESSOR SYSTEM - The present embodiments provide a system for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor in the shared-memory multiprocessor. This eliminates the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second processor. The system includes an interface, such as an application programming interface (API), a system call interface, or an instruction-set architecture (ISA), that provides access to a number of mechanisms for supporting targeted stores. These mechanisms include a thread-location mechanism that determines a location near where a thread is executing in the shared-memory multiprocessor, and a targeted-store mechanism that targets a store to a location (e.g., cache memory) in the shared-memory multiprocessor. (Sketch below.) | 03-27-2014 |
20140181423 | System and Method for Implementing NUMA-Aware Statistics Counters - The systems and methods described herein may be used to implement scalable statistics counters suitable for use in systems that employ a NUMA style memory architecture. The counters may be implemented as data structures that include a count value portion and a node identifier portion. The counters may be accessible within transactions. The node identifier portion may identify a node on which a thread that most recently incremented the counter was executing or one on which a thread that has requested priority to increment the shared counter was executing. Threads executing on identified nodes may have higher priority to increment the counter than other threads. Threads executing on other nodes may delay their attempts to increment the counter, thus encouraging consecutive updates from threads on a single node. Impatient threads may attempt to update the node identifier portion or may update an anti-starvation variable to indicate a request for priority. (Sketch below.) | 06-26-2014 |
20140181473 | System and Method for Implementing Shared Probabilistic Counters Storing Update Probability Values - The systems and methods described herein may implement probabilistic counters and/or update mechanisms for those counters such that they are dependent on the value of a configurable accuracy parameter. The accuracy parameter value may be adjusted to provide fine-grained control over the tradeoff between the accuracy of the counters and the performance of applications that access them. The counters may be implemented as data structures that include a mantissa portion and an exponent portion that collectively represent an update probability value. When updating the counters, the value of the configurable accuracy parameter may affect whether, when, how often, or by what amount the mantissa portion and/or the exponent portion are updated. Updating a probabilistic counter may include multiplying its value by a constant that is dependent on the value of a configurable accuracy parameter. The counters may be accessible within transactions. The counters may have deterministic update policies. (Sketch below.) | 06-26-2014 |
20140181827 | System and Method for Implementing Scalable Contention-Adaptive Statistics Counters - The systems and methods described herein may implement scalable statistics counters that are adaptive to the amount of contention for the counters. The counters may be accessible within transactions. Methods for determining whether or when to increment the counters in response to initiation of an increment operation and/or methods for updating the counters may be selected dependent on current, recent, or historical amounts of contention. Various contention management policies or retry conditions may be applied to select between multiple methods. One counter may include a precise counter portion that is incremented under low contention and a probabilistic counter portion that is updated under high contention. Amounts by which probabilistic counters are incremented may be contention-dependent. Another counter may include a node identifier portion that encourages consecutive increments by threads on a single node only when under contention. Another counter may be inflated in response to contention for the counter. (Sketch below.) | 06-26-2014 |
20140215157 | MONITORING MULTIPLE MEMORY LOCATIONS FOR TARGETED STORES IN A SHARED-MEMORY MULTIPROCESSOR - A system and method for supporting targeted stores in a shared-memory multiprocessor. A targeted store enables a first processor to push a cache line to be stored in a cache memory of a second processor. This eliminates the need for multiple cache-coherence operations to transfer the cache line from the first processor to the second processor. More specifically, the disclosed embodiments provide a system that notifies a waiting thread when a targeted store is directed to monitored memory locations. During operation, the system receives a targeted store which is directed to a specific cache in a shared-memory multiprocessor system. In response, the system examines a destination address for the targeted store to determine whether the targeted store is directed to a monitored memory location which is being monitored for a thread associated with the specific cache. If so, the system informs the thread about the targeted store. (Sketch below.) | 07-31-2014 |
20140258645 | System and Method for Implementing Reader-Writer Locks Using Hardware Transactional Memory - Transactional reader-writer locks may leverage available hardware transactional memory (HTM) to simplify the procedures of the reader-writer lock algorithm and to eliminate a requirement for type-stable memory. An HTM-based reader-writer lock may include an ordered list of client-provided nodes, each of which represents a thread that holds (or desires to acquire) the lock, and a tail pointer. The locking and unlocking procedures invoked by readers and writers may access the tail pointer or particular ones of the nodes in the list using various combinations of transactions and non-transactional accesses to insert nodes into the list or to remove nodes from the list. A reader or writer that owns a node at the head of the list (or a reader whose node is preceded in the list only by other readers' nodes) may access a critical section of code or shared resource. (Sketch below.) | 09-11-2014 |
20150026688 | Systems and Methods for Adaptive Integration of Hardware and Software Lock Elision Techniques - Particular techniques for improving the scalability of concurrent programs (e.g., lock-based applications) may be effective in some environments and for some workloads, but not others. The systems described herein may automatically choose appropriate ones of these techniques to apply when executing lock-based applications at runtime, based on observations of the application in the current environment and with the current workload. In one example, two techniques for improving lock scalability (e.g., transactional lock elision using hardware transactional memory, and optimistic software techniques) may be integrated together. A lightweight runtime library built for this purpose may adapt its approach to managing concurrency by dynamically selecting one or more of these techniques (at different times) during execution of a given application. In this Adaptive Lock Elision approach, the techniques may be selected (based on pluggable policies) at runtime to achieve good performance on different platforms and for different workloads. (Sketch below.) | 01-22-2015 |
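The sketches below, one per application above, are illustrative readings of the abstracts, not the patented implementations; every identifier marked hypothetical is an assumption introduced here.

For 20120254846, a minimal sketch of the "alternate execution path" idea, assuming Intel RTM as the best-effort HTM; `in_critical`, the shared variables, and the fallback lock are hypothetical:

```c
#include <immintrin.h>   /* _xbegin/_xend; compile with -mrtm */

extern void acquire_fallback_lock(void);   /* hypothetical */
extern void release_fallback_lock(void);   /* hypothetical */

volatile int in_critical;          /* indicator elided on the HTM path */
volatile long shared_a, shared_b;  /* state updated by the critical section */

void critical_section_alternate(void) {
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Transactional path: the section cannot be observed partially
         * executed, so the in_critical bookkeeping is elided (not
         * included in this path). */
        shared_a = 1;
        shared_b = 2;
        _xend();
    } else {
        /* Original path, taken when the best-effort transaction aborts:
         * the indicator lets other code detect possible partial
         * execution or interference. */
        acquire_fallback_lock();
        in_critical = 1;
        shared_a = 1;
        shared_b = 2;
        in_critical = 0;
        release_fallback_lock();
    }
}
```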
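For 20120310987, a sketch of reading an element of a dynamic-sized structure inside a hardware transaction so that the membership check and the dereference commit atomically; the list layout is an assumption:

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>

struct node { struct node *next; long value; };  /* illustrative element */
extern struct node *head;                        /* shared structure */

bool read_first_value(long *out) {
    if (_xbegin() != _XBEGIN_STARTED)
        return false;            /* abort: caller retries or takes a slow path */
    struct node *n = head;       /* membership check: head still links n */
    if (n == NULL) { _xend(); return false; }
    *out = n->value;             /* dereference inside the same transaction */
    _xend();
    return true;
}
```

In this reading, a concurrent unlink-and-free of `n` conflicts with (or faults inside) the transaction and aborts it rather than letting the reader touch freed memory, which is what lets the allocator reclaim elements without hazard pointers.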
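For 20130198499, the busy-wait shape the abstract describes; the conditional branch closing this loop is the one whose dynamic prediction the disclosed mechanism would override, and `flag` is an illustrative inter-thread communication variable:

```c
#include <immintrin.h>   /* _mm_pause */
#include <stdatomic.h>

void spin_until_set(atomic_int *flag) {
    /* A predictor trained by many "stay in loop" iterations would
     * mispredict the exit; the disclosed system instead forces an
     * exit prediction for this branch, so no misprediction delay is
     * paid once flag is set. */
    while (atomic_load_explicit(flag, memory_order_acquire) == 0)
        _mm_pause();     /* reduce pipeline and power cost while spinning */
}
```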
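For 20140089591, one possible shape for the described interface; neither function exists in any shipping ISA or OS, and both signatures are assumptions standing in for the thread-location and targeted-store mechanisms the abstract names:

```c
#include <stdint.h>

/* Hypothetical: opaque token for where thread_id is currently executing. */
extern uint64_t thread_location(int thread_id);

/* Hypothetical: push a value's cache line toward the cache at location. */
extern void targeted_store(uint64_t *addr, uint64_t value, uint64_t location);

void notify_consumer(uint64_t *mailbox, int consumer_tid) {
    uint64_t loc = thread_location(consumer_tid);
    /* One push replaces the usual miss/invalidate/transfer sequence of
     * cache-coherence operations between producer and consumer. */
    targeted_store(mailbox, 1, loc);
}
```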
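For 20140181423, a sketch of a counter word with a count portion and a node-identifier portion; the field split, platform query, and backoff are assumptions, and the anti-starvation variable is omitted:

```c
#include <stdatomic.h>
#include <stdint.h>

#define NODE_SHIFT 56   /* assumed split: top byte node id, rest count */
#define COUNT_MASK ((UINT64_C(1) << NODE_SHIFT) - 1)

extern int my_numa_node(void);   /* hypothetical platform query */
extern void brief_delay(void);   /* hypothetical backoff */

void numa_counter_inc(_Atomic uint64_t *c) {
    uint64_t me = (uint64_t)my_numa_node();
    for (;;) {
        uint64_t old = atomic_load(c);
        if ((old >> NODE_SHIFT) != me)
            brief_delay();       /* defer: encourage same-node batching */
        uint64_t next = (me << NODE_SHIFT) | ((old + 1) & COUNT_MASK);
        if (atomic_compare_exchange_weak(c, &old, next))
            return;              /* count bumped; node id now ours */
    }
}
```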
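For 20140181473, a simplified Morris-style reading in which the stored exponent implies the update probability and the accuracy parameter `a > 0` tunes the accuracy/contention tradeoff; the patent's mantissa-plus-exponent encoding and deterministic policies are richer than this sketch:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <math.h>

/* Increment: touch the shared word only with probability (1+a)^-e.
 * (A per-thread RNG would replace rand() in practice.) */
void prob_counter_inc(_Atomic uint32_t *exp, double a) {
    uint32_t e = atomic_load(exp);
    double p = pow(1.0 + a, -(double)e);
    if ((double)rand() / RAND_MAX < p)
        atomic_fetch_add(exp, 1);   /* rare, so contention stays low */
}

/* Read: the value represented by exponent e is ((1+a)^e - 1) / a,
 * the generalized Morris estimate. */
double prob_counter_read(uint32_t e, double a) {
    return (pow(1.0 + a, (double)e) - 1.0) / a;
}
```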
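For 20140181827, a sketch of a contention-adaptive increment: a precise CAS path under low contention, with a probabilistic batched path after repeated CAS failures; the retry limit and batch size are assumptions:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define CAS_RETRY_LIMIT 4   /* assumed contention threshold */
#define BATCH 64            /* assumed contention-dependent amount */

void adaptive_counter_inc(_Atomic uint64_t *c) {
    uint64_t old = atomic_load(c);
    for (int i = 0; i < CAS_RETRY_LIMIT; i++)
        if (atomic_compare_exchange_weak(c, &old, old + 1))
            return;                     /* precise path: low contention */
    /* High contention: add BATCH with probability 1/BATCH, keeping the
     * expected increment at 1 while making updates ~BATCH x rarer.
     * (A per-thread RNG would replace rand() in practice.) */
    if (rand() % BATCH == 0)
        atomic_fetch_add(c, BATCH);
}
```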
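For 20140215157, a hypothetical consumer-side monitoring interface layered on targeted stores; all three externs are assumptions mirroring the abstract's monitor-and-notify flow:

```c
#include <stdint.h>

extern void monitor_location(volatile void *addr);   /* hypothetical */
extern void unmonitor_location(volatile void *addr); /* hypothetical */
extern void wait_for_targeted_store(void);           /* hypothetical: naps
                                                        until notified */

uint64_t receive(volatile uint64_t *mailbox) {
    monitor_location(mailbox);
    while (*mailbox == 0)            /* nothing delivered yet */
        wait_for_targeted_store();   /* woken when a targeted store hits a
                                        monitored location */
    unmonitor_location(mailbox);
    return *mailbox;
}
```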
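For 20140258645, the node-insertion step only: a short hardware transaction swings the tail pointer and links a client-provided node, in place of a multi-step CAS protocol (and, per the abstract, the type-stable-memory requirement). Reader/writer handoff and unlocking are omitted:

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>

struct rwnode {
    struct rwnode *next;
    bool is_writer;   /* role; reader/writer handoff omitted here */
    bool spin;        /* a predecessor clears this to pass the lock on */
};
struct rwlock { struct rwnode *tail; };

bool enqueue_txn(struct rwlock *l, struct rwnode *n) {
    n->next = NULL;
    n->spin = true;
    if (_xbegin() != _XBEGIN_STARTED)
        return false;              /* abort: take a non-transactional path */
    struct rwnode *prev = l->tail; /* read the tail... */
    l->tail = n;                   /* ...and swing it to our node */
    if (prev != NULL)
        prev->next = n;            /* link behind our predecessor */
    else
        n->spin = false;           /* list was empty: we own the lock */
    _xend();
    return true;
}
```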
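For 20150026688, the dispatch at the heart of adaptive lock elision, sketched over a simple test-and-set lock: a pluggable policy, fed by abort/commit statistics, chooses between HTM elision and plain acquisition. The policy interface and statistics struct are assumptions, not the library's actual API:

```c
#include <immintrin.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef _Atomic int tas_lock;   /* 0 = free, 1 = held */
struct ale_stats { unsigned aborts, commits; };
extern bool policy_choose_htm(const struct ale_stats *s);  /* hypothetical */

/* Returns true if the critical section runs transactionally (the caller
 * ends it with _xend()), false if the lock was really acquired (the
 * caller releases it by storing 0). */
bool ale_lock(tas_lock *l, struct ale_stats *s) {
    if (policy_choose_htm(s) && _xbegin() == _XBEGIN_STARTED) {
        /* Elision: read (subscribe to) the lock word; if it is or becomes
         * held, the transaction aborts and control falls to the lock. */
        if (atomic_load_explicit(l, memory_order_relaxed) != 0)
            _xabort(1);
        return true;
    }
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
        ;                       /* fallback: spin on the test-and-set lock */
    return false;
}
```

A real policy would also bound transactional retries and record abort codes into `ale_stats` so the selection adapts over time, as the abstract's "pluggable policies" suggest.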