| Patent application number | Description | Published |
| 20090006767 | USING EPHEMERAL STORES FOR FINE-GRAINED CONFLICT DETECTION IN A HARDWARE ACCELERATED STM - A method and apparatus for fine-grained filtering in a hardware accelerated software transactional memory system is herein described. A data object, which may have any arbitrary size, is associated with a filter word. The filter word is in a first default state when no access, such as a read, from the data object has occurred during a pendancy of a transaction. Upon encountering a first access, such as a first read, from the data object, access barrier operations including an ephemeral/private store operation to set the filter word to a second state are performed. Upon a subsequent/redundant access, such as a second read, the access barrier operations are elided to accelerate the subsequent access, based on the filter word being set to the second state to indicate a previous access occurred. | 01-01-2009 |
| 20090089520 | Hardware acceleration of strongly atomic software transactional memory - In accordance with some embodiments, software transactional memory may be used for both managed and unmanaged environments. If a cache line is resident in a cache and this is not the first time that the cache line has been read since the last write, then the data may be read directly from the cache line, improving performance. Otherwise, a normal read may be utilized to read the information. Similarly, write performance can be accelerated in some instances to improve performance. | 04-02-2009 |
| 20090172292 | ACCELERATING SOFTWARE LOOKUPS BY USING BUFFERED OR EPHEMERAL STORES - A method and apparatus for accelerating lookups in an address based table is herein described. When an address and value pair is added to an address based table, the value is privately stored in the address to allow for quick and efficient local access to the value. In response to the private store, a cache line holding the value is transitioned to a private state, to ensure the value is not made globally visible. Upon eviction of the privately held cache line, the information is not written-back to ensure locality of the value. In one embodiment, the address based table includes a transactional write buffer to hold addresses, which correspond to tentatively updated values during a transaction. Accesses to the tentative values during the transaction may be accelerated through use of annotation bits and private stores as discussed herein. Upon commit of the transaction, the values are copied to the location to make the updates globally visible. | 07-02-2009 |
| 20090172317 | MECHANISMS FOR STRONG ATOMICITY IN A TRANSACTIONAL MEMORY SYSTEM - A method and apparatus for providing efficient strong atomicity is herein described. Optimized strong operations may be inserted at non-transactional read accesses to provide efficient strong atomicity. A global transaction value is copied at a beginning of a non-transational function to a local transaction value; essentially creating a local timestamp of the global transaction value. At a non-transactional memory access within the function, a counter value or version value is compared to the LTV to see if a transaction has started updating memory locations, or specifically the memory location accessed. If memory locations have not been updated by a transaction, execution is accelerated by avoiding a full set of slowpath strong atomic operations to ensure validity of data accessed. In contrast, the slowpath operations may be executed to resolve contention between a transactional and non-transaction access contending for the same memory location. | 07-02-2009 |
| 20100057740 | ACCELERATING A QUIESCENCE PROCESS OF TRANSACTIONAL MEMORY - A method to perform validation of a read set of a transaction is presented. In one embodiment, the method compares a read signature of a transaction to a plurality of write signatures associated with a plurality of transactions. The method determines based on the result of comparison, whether to update a local value of the transaction to a commit value of another transaction from the plurality of the transactions. | 03-04-2010 |
| 20100058344 | ACCELERATING A QUIESCENCE PROCESS OF TRANSACTIONAL MEMORY - A method to perform validation of a read set of a transaction is presented. In one embodiment, the method compares a read signature of a transaction to a plurality of write signatures associated with a plurality of transactions. The method determines based on the result of comparison, whether to update a local value of the transaction to a commit value of another transaction from the plurality of the transactions. | 03-04-2010 |
| 20100118041 | Shared virtual memory - Embodiments of the invention provide a programming model for CPU-GPU platforms. In particular, embodiments of the invention provide a uniform programming model for both integrated and discrete devices. The model also works uniformly for multiple GPU cards and hybrid GPU systems (discrete and integrated). This allows software vendors to write a single application stack and target it to all the different platforms. Additionally, embodiments of the invention provide a shared memory model between the CPU and GPU. Instead of sharing the entire virtual address space, only a part of the virtual address space needs to be shared. This allows efficient implementation in both discrete and integrated settings. | 05-13-2010 |
| 20100122073 | HANDLING EXCEPTIONS IN SOFTWARE TRANSACTIONAL MEMORY SYSTEMS - A method and apparatus for handling exceptions during execution of a transaction is herein described. A compiler associates a transaction exception handler (TEH) with a transaction in program code, such as through insertion of a call to the TEH. The TEH is also associated with an exception data structure, such as an unwind table, that is utilized during runtime to call an appropriate handler in response to an exception. Additionally, the TEH code is generated by the compiler and inserted into the program code. Upon encountering an exception during execution of the transaction, the TEH is capable of dynamically resizing the transaction to the point of the exception through an attempted commit. | 05-13-2010 |
| 20100122264 | Language level support for shared virtual memory - Embodiments of the invention provide language support for CPU-GPU platforms. In one embodiment, code can be flexibly executed on both the CPU and GPU. CPU code can offload a kernel to the GPU. That kernel may in turn call preexisting libraries on the CPU, or make other calls into CPU functions. This allows an application to be built without requiring the entire call chain to be recompiled. Additionally, in one embodiment data may be shared seamlessly between CPU and GPU. This includes sharing objects that may have virtual functions. Embodiments thus ensure the right virtual function gets invoked on the CPU or the GPU if a virtual function is called by either the CPU or GPU. | 05-13-2010 |
| 20100153953 | UNIFIED OPTIMISTIC AND PESSIMISTIC CONCURRENCY CONTROL FOR A SOFTWARE TRANSACTIONAL MEMORY (STM) SYSTEM - A method and apparatus for unified concurrency control in a Software Transactional Memory (STM) is herein described. A transaction record associated with a memory address referenced by a transactional memory access operation includes optimistic and pessimistic concurrency control fields. Access barriers and other transactional operations/functions are utilized to maintain both fields of the transaction record, appropriately. Consequently, concurrent execution of optimistic and pessimistic transactions is enabled. | 06-17-2010 |
| 20100218195 | Software filtering in a transactional memory system - A method and apparatus for utilizing hardware mechanisms of a transactional memory system is herein described. Various embodiments relate to software-based filtering of operations from read and write barriers and read isolation barriers during transactional execution. Other embodiments relate to software-implemented read barrier processing to accelerate strong atomicity. Other embodiments are also described and claimed. | 08-26-2010 |
| 20100332753 | WAIT LOSS SYNCHRONIZATION - Synchronizing threads on loss of memory access monitoring. Using a processor level instruction included as part of an instruction set architecture for a processor, a read, or write monitor to detect writes, or reads or writes respectively from other agents on a first set of one or more memory locations and a read, or write monitor on a second set of one or more different memory locations are set. A processor level instruction is executed, which causes the processor to suspend executing instructions and optionally to enter a low power mode pending loss of a read or write monitor for the first or second set of one or more memory locations. A conflicting access is detected on the first or second set of one or more memory locations or a timeout is detected. As a result, the method includes resuming execution of instructions. | 12-30-2010 |
| 20100332807 | PERFORMING ESCAPE ACTIONS IN TRANSACTIONS - Performing non-transactional escape actions within a hardware based transactional memory system. A method includes at a hardware thread on a processor beginning a hardware based transaction for the thread. Without committing or aborting the transaction, the method further includes suspending the hardware based transaction and performing one or more operations for the thread, non-transactionally and not affected by: transaction monitoring and buffering for the transaction, an abort for the transaction, or a commit for the transaction. After performing one or more operations for the thread, non-transactionally, the method further includes resuming the transaction and performing additional operations transactionally. After performing the additional operations, the method further includes either committing or aborting the transaction. | 12-30-2010 |
| 20100332808 | MINIMIZING CODE DUPLICATION IN AN UNBOUNDED TRANSACTIONAL MEMORY SYSTEM - Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, and that employs hardware filter indicators (when available) to filter redundant applications of TM barriers, and that enables a compiled binary representation of the subject code to run correctly in any of the currently implemented set of transactional memory execution modes, including running the code outside of a transaction, and that enables the same compiled binary to continue to work with future TM implementations which may introduce as yet unknown future TM execution modes. | 12-30-2010 |
| 20110145512 | Mechanisms To Accelerate Transactions Using Buffered Stores - In one embodiment, the present invention includes a method for executing a transactional memory (TM) transaction in a first thread, buffering a block of data in a first buffer of a cache memory of a processor, and acquiring a write monitor on the block to obtain ownership of the block at an encounter time in which data at a location of the block in the first buffer is updated. Other embodiments are described and claimed. | 06-16-2011 |
| 20110145516 | USING BUFFERED STORES OR MONITORING TO FILTER REDUNDANT TRANSACTIONAL ACCESSES AND MECHANISMS FOR MAPPING DATA TO BUFFERED METADATA - A method and apparatus for accelerating a Software Transactional Memory (STM) system is herein described. A data object and metadata for the data object may each be associated with a filter, such as a hardware monitor or ephemerally held filter information. The filter is in a first, default state when no access, such as a read, from the data object has occurred during a pendancy of a transaction. Upon encountering a first access to the metadata, such as a first read, access barrier operations, such as logging of the metadata; setting a read monitor; or updating ephemeral filter information with an ephemeral/buffered store operation, are performed. Upon a subsequent/redundant access to the metadata, such as a second read, access barrier operations are elided to accelerate the subsequent access based on the filter being set to the second state to indicate a previous access occurred. Additionally, mapping of data objects to ephemeral information may be provided by software, such as through a pointer to the ephemeral information associated with the data object; an offset from a base address of the data object to the ephemeral information included associated with the data object; an index into a segment containing the ephemeral information associated with the data object; mapping the data object to the ephemeral information utilizing address arithmetic; and a hash that maps the data object to ephemeral information. | 06-16-2011 |
| 20110145637 | Performing Mode Switching In An Unbounded Transactional Memory (UTM) System - In one embodiment, the present invention includes a method for selecting a first transaction execution mode to begin a first transaction in a unbounded transactional memory (UTM) system having a plurality of transaction execution modes. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest performant of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performant mode can be selected. Other embodiments are described and claimed. | 06-16-2011 |
| 20110153957 | SHARING VIRTUAL MEMORY-BASED MULTI-VERSION DATA BETWEEN THE HETEROGENOUS PROCESSORS OF A COMPUTER PLATFORM - A computer system may comprise a computer platform and input-output devices. The computer platform may include a plurality of heterogeneous processors comprising a central processing unit (CPU) and a graphics processing unit) GPU and a shared virtual memory supported by a physical private memory space of at least one heterogeneous processor or a physical shared memory shared by the heterogeneous processor. The CPU (producer) may create shared multi-version data and store such shared multi-version data in the physical private memory space or the physical shared memory. The GPU (consumer) may acquire or access the shared multi-version data. | 06-23-2011 |