Patent application number | Description | Published |
20090327291 | PRIMITIVES FOR SOFTWARE TRANSACTIONAL MEMORY - Software transactional memory (STM) primitives are provided that allow the results of prior open calls to be used by subsequent open calls either as-is or through another STM primitive that consumes the results of the previous invocation. The STM primitives are configured to ensure that the address of a shadow copy representing a memory location will not changed across a wide range of operations and thereby enable re-use of the shadow copy. | 12-31-2009 |
20090327636 | COMPRESSED TRANSACTIONAL LOCKS IN OBJECT HEADERS - A software transactional memory system is provided that generates and stores compressed transactional locks in a portion of object headers. The software transactional memory system allocates preferred write log memory with a predefined size of memory that corresponds to a number of bits in the compressed transactional locks. The compressed transactional locks identify write log entries in corresponding write logs in the preferred write log memory. If the preferred write log memory becomes full, additional write log memory is allocated for write log entries and subsequent transactional locks are stored uncompressed in an auxiliary memory. A pointer that may be used to locate the uncompressed transactional lock is stored in the header. If an object header with a compressed transactional lock is needed for another use, the compressed transactional lock is uncompressed and stored in the auxiliary memory. A pointer that may be used to locate the uncompressed transactional lock is stored in the header. | 12-31-2009 |
20090328018 | OPTIMIZING PRIMITIVES IN SOFTWARE TRANSACTIONAL MEMORY - A compiler is provided that determines when the use of software transactional memory (STM) primitives may be optimized with respect to a set of collectively dominating STM primitives. The compiler analysis coordinates the use of variables containing possible shadow copy pointers to allow the analysis to be performed for both direct write and buffered write STM systems. The coordination of the variables containing the possible shadow copy pointers ensures that the results of STM primitives are properly reused. The compiler analysis identifies memory accesses where STM primitives may be eliminated, combined, or substituted for lower overhead STM primitives. | 12-31-2009 |
20090328019 | DETECTING RACE CONDITIONS WITH A SOFTWARE TRANSACTIONAL MEMORY SYSTEM - A dynamic race detection system is provided that detects race conditions in code that executes concurrently in a computer system. The dynamic race detection system uses a modified software transactional memory (STM) system to detect race conditions. A compiler converts portions of the code that are not configured to operate with the STM system into pseudo STM code that operates with the STM system. The dynamic race detection system detects race conditions in response to either a pseudo STM transaction in the pseudo STM code failing to validate when executed or an actual STM transaction failing to validate when executed because of conflict with a concurrent pseudo STM transaction. | 12-31-2009 |
20100083257 | ARRAY OBJECT CONCURRENCY IN STM - A software transactional memory system is provided that creates an array of transactional locks for each array object that is accessed by transactions. The system divides the array object into non-overlapping portions and associates each portion with a different transactional lock. The system acquires transactional locks for transactions that access corresponding portions of the array object. By doing so, different portions of the array object can be accessed by different transactions concurrently. The system may use a shared shadow or undo copy for accesses to the array object. | 04-01-2010 |
20100100689 | TRANSACTION PROCESSING IN TRANSACTIONAL MEMORY - A transactional memory processing system provides for the integration of transactional memory concepts at the compiler-level into a higher-level traditional transaction processing system. Atomic blocks at the compiler-level can be specified as atomic block transactions and include the features of atomicity and isolation. Actions within this atomic block transaction include the enlistment of resource managers from a repository. The repository can now include a pre-programmed memory resource manager to manage the transactional memory. As in traditional transactions, a commit protocol can be used to determine if the actions are valid and can be exposed outside of the transaction. Unlike traditional transactions, however, the transaction is not necessarily doomed if all of the actions are not validated. Rather, memory conflicts can cause a rollback and re-execution of the atomic block transaction, which can be repeated as long as necessary, until the memory resource manger votes to commit. | 04-22-2010 |
20100191930 | TRANSACTIONAL MEMORY COMPATIBILITY MANAGEMENT - Transactional memory compatibility type attributes are associated with intermediate language code to specify, for example, that intermediate language code must be run within a transaction, or must not be run within a transaction, or may be run within a transaction. Attributes are automatically produced while generating intermediate language code from annotated source code. Default rules also generate attributes. Tools use attributes to statically or dynamically check for incompatibility between intermediate language code and a transactional memory implementation. | 07-29-2010 |
20100211931 | STM WITH GLOBAL VERSION OVERFLOW HANDLING - A software transactional memory system is provided with overflow handling. The system includes a global version counter with an epoch number and a version number. The system accesses the global version counter prior to and subsequent to memory accesses of transactions to validate read accesses of the transaction. The system includes mechanisms to detect global version number overflow and may allow some or all transactions to execute to completion subsequent to the global version number overflowing. The system also provides publication, privatization, and granular safety properties. | 08-19-2010 |
20100228927 | STM WITH MULTIPLE GLOBAL VERSION COUNTERS - A software transactional memory system is provided with multiple global version counters. The system assigns an affinity to one of the global version counters for each thread that executes transactions. Each thread maintains a local copy of the global version counters for use in validating read accesses of transactions. Each thread uses a corresponding affinitized global version counter to store version numbers of write accesses of executed transactions. The system adaptively changes the affinities of threads when data conflict or global version counter conflict is detected between threads. | 09-09-2010 |
20100228929 | EXPEDITED COMPLETION OF A TRANSACTION IN STM - A software transactional memory system is provided that provides privatization safety. The system identifies situations where the completion of a transaction may be expedited because a privatization artifact will not occur. The system determines whether a privatization artifact may occur using a read and write set intersection test, transactional variables, pessimistic locks, or declared privatizing transactions. If a privatization artifact will not occur for a transaction, then the system may allow the transaction to complete prior to one or more earlier transactions. | 09-09-2010 |
20100332538 | HARDWARE ACCELERATED TRANSACTIONAL MEMORY SYSTEM WITH OPEN NESTED TRANSACTIONS - Hardware assisted transactional memory system with open nested transactions. Some embodiments described herein implement a system whereby hardware acceleration of transactions can be accomplished by implementing open nested transaction in hardware which respect software locks such that a top level transaction can be implemented in software, and thus not be limited by hardware constraints typical when using hardware transactional memory systems. | 12-30-2010 |
20110145304 | EFFICIENT GARBAGE COLLECTION AND EXCEPTION HANDLING IN A HARDWARE ACCELERATED TRANSACTIONAL MEMORY SYSTEM - Handling garbage collection and exceptions in hardware assisted transactions. Embodiments are practiced in a computing environment including a hardware assisted transaction system. Embodiments includes acts for writing to a card table outside of a transaction; handling garbage collection compaction occurring when a hardware transaction is active by using a common global variable and instructing one or more agents to write to the common global variable any time an operation is performed which may change an object's virtual address; acts for managing a thread-local allocation context; acts for handling exceptions while in a hardware assisted transaction. A method includes beginning a hardware assisted transaction, raising an exception while in the hardware assisted transaction, including creating an exception object, determining that the transaction should be rolled back, and as a result of determining that the transaction should be rolled back, marshaling the exception object out of the hardware assisted transaction. | 06-16-2011 |
20110145553 | ACCELERATING PARALLEL TRANSACTIONS USING CACHE RESIDENT TRANSACTIONS - Handling parallelism in transactions. One embodiment includes a method that includes beginning a cache resident transaction. The method further includes encountering a nested structured parallelism construct within the cache resident transaction. A determination is made as to whether the transaction would run faster serially in cache resident mode or faster parallel in software transactional memory mode for the overall transaction. In the software transactional memory mode, cache resident mode is used for one or more hierarchically lower nested transactions. The method further includes continuing the transaction in the mode determined. | 06-16-2011 |
20110145802 | ACCELERATING UNBOUNDED MEMORY TRANSACTIONS USING NESTED CACHE RESIDENT TRANSACTIONS - Using cache resident transaction hardware to accelerate a software transactional memory system. The method includes identifying a plurality of atomic operations intended to be performed by a software transactional memory system as transactional operations as part of a software transaction. The method further includes selecting at least a portion of the plurality of atomic operations. The method further includes attempting to perform the portion of the plurality of atomic operations as hardware transactions using cache resident transaction hardware. | 06-16-2011 |
20110283091 | PARALLELIZING SEQUENTIAL FRAMEWORKS USING TRANSACTIONS - Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads. | 11-17-2011 |
20110314230 | ACTION FRAMEWORK IN SOFTWARE TRANSACTIONAL MEMORY - A software transactional memory system implements a lightweight key-based action framework. The framework includes a set of unified application programming interfaces (APIs) exposed by an STM library that allow clients to implement actions that can be registered, queried, and updated using specific keys by transactions or transaction nests in STM code. Each action includes a key, state information, and a set of one or more callbacks that can be hooked to the validation, commit, abort, and/or re-execution phases of transaction execution. The actions extend the built-in concurrency controls of the STM system with customized control logics, support transaction nesting semantics, and enable integration with garbage collection systems. | 12-22-2011 |
20110314244 | COMPOSITION OF LOCKS IN SOFTWARE TRANSACTIONAL MEMORY - A software transactional memory (STM) system allows the composition of traditional lock based synchronization with transactions in STM code. The STM system acquires each traditional lock the first time that a corresponding traditional lock acquire is encountered inside a transaction and defers all traditional lock releases until a top level transaction in a transaction nest commits or aborts. The STM system maintains state information associated with traditional lock operations in transactions and uses the state information to eliminate deferred traditional lock operations that are redundant. The STM system integrates with systems that implement garbage collection. | 12-22-2011 |
20110314256 | Data Parallel Programming Model - Described herein are techniques for enabling a programmer to express a call for a data parallel call-site function in a way that is accessible and usable to the typical programmer. With some of the described techniques, an executable program is generated based upon expressions of those data parallel tasks. During execution of the executable program, data is exchanged between non-data parallel (non-DP) capable hardware and DP capable hardware for the invocation of data parallel functions. | 12-22-2011 |
20110314444 | Compiler-Generated Invocation Stubs for Data Parallel Programming Model - Described herein are techniques for generating invocation stubs for a data parallel programming model so that a data parallel program written in a statically-compiled high-level programming language may be more declarative, reusable, and portable than traditional approaches. With some of the described techniques, invocation stubs are generated by a compiler and those stubs bridge a logical arrangement of data parallel computations to the actual physical arrangement of a target data parallel hardware for that data parallel computation. | 12-22-2011 |
20110314458 | BINDING DATA PARALLEL DEVICE SOURCE CODE - A compile environment is provided in a computer system that allows programmers to program both CPUs and data parallel devices (e.g., GPUs) using a high level general purpose programming language that has data parallel (DP) extensions. A compilation process translates modular DP code written in the general purpose language into DP device source code in a high level DP device programming language using a set of binding descriptors for the DP device source code. A binder generates a single, self-contained DP device source code unit from the set of binding descriptors. A DP device compiler generates a DP device executable for execution on one or more data parallel devices from the DP device source code unit. | 12-22-2011 |
20120005662 | INDEXABLE TYPE TRANSFORMATIONS - A high level programming language provides an extensible set of transformations for use on indexable types in a data parallel processing environment. A compiler for the language implements each transformation as a map from indexable types to allow each transformation to be applied to other transformations. At compile time, the compiler identifies sequences of the transformations on each indexable type in data parallel source code and generates data parallel executable code to implement the sequences as a combined operation at runtime using the transformation maps. The compiler also incorporates optimizations that are based on the sequences of transformations into the data parallel executable code. | 01-05-2012 |
20120124564 | MAP TRANSFORMATION IN DATA PARALLEL CODE - A high level programming language provides a map transformation that takes a data parallel algorithm and a set of one or more input indexable types as arguments. The map transformation applies the data parallel algorithm to the set of input indexable types to generate an output indexable type, and returns the output indexable type. The map transformation may be used to fuse one or more data parallel algorithms with another data parallel algorithm. | 05-17-2012 |
20120131552 | READ-ONLY COMMUNICATION OPERATOR - A high level programming language provides a read-only communication operator that prevents a computational space from being written. An indexable type with a rank and element type defines the computational space. For an input indexable type, the read-only communication operator produces an output indexable type with the same rank and element type as the input indexable type but ensures that the output indexable type may not be written. The read-only communication operator ensures that any attempt to write to the output indexable type will be detected as an error at compile time. | 05-24-2012 |
20120159458 | RECONSTRUCTING PROGRAM CONTROL FLOW - The present invention extends to methods, systems, and computer program products for reconstructing program control flow. Embodiments include implementing or morphing a control flow graph (“CFG”) into an arbitrary loop structure to reconstruct (preserve) control flow from original source code. Loop structures can be optimized and can adhere to target platform constraints. In some embodiments, C++ source code (a first higher level format) is translated into a CFG (a lower level format). The CFG is then translated into HLSL source code (a second different higher level format) for subsequent compilation into SLSL bytecode (that can then be executed at a Graphical Processing Unit (“GPU”)). The control flow from the C++ source code is preserved in the HLSL source code. | 06-21-2012 |
20120166444 | CO-MAP COMMUNICATION OPERATOR - A high level programming language provides a co-map communication operator that maps an input indexable type to an output indexable type according to a function. The function maps an index space corresponding to the output indexable type to an index space corresponding to the input indexable type. By doing so, the co-map communication operator lifts a function on an index space to a function on an indexable type to allow composability with other communication operators. | 06-28-2012 |
20120167062 | EMULATING POINTERS - The present invention extends to methods, systems, and computer program products for emulating pointers. Pointers can be emulated by replacing the pointers with a pair and replacing each dereference site with a switch on the tag and a switch body that executes the emulated pointer access on the corresponding variable the pointer points to. Data flow optimizations can be used to reduce the number of switches and/or reduce the number of cases which need be considered at each emulated pointer access sites. | 06-28-2012 |
20120317394 | TRANSFORMING ADDRESSING ALIGNMENT DURING CODE GENERATION - The present invention extends to methods, systems, and computer program products for changing addressing mode during code generation. Generally, embodiments of the invention use a compiler transformation to transform lower level code from one address alignment to another address alignment. The transformation can be based upon assumptions of a source programming language. Based on the assumptions, the transformation can eliminate arithmetic operations that compensate for different addressing alignment, resulting in more efficient code. Some particular embodiments use a compiler transformation to transform an Intermediate Representation (“IR”) from one-byte addressing alignment into multi-byte (e.g., four-byte) addressing alignment. | 12-13-2012 |
20120317556 | OPTIMIZING EXECUTION OF KERNELS - The present invention extends to methods, systems, and computer program products for optimizing execution of kernels. Embodiments of the invention include an optimization framework for optimizing runtime execution of kernels. During compilation, information about the execution properties of a kernel are identified and stored alongside the executable code for the kernel. At runtime, calling contexts access the information. The calling contexts interpret the information and optimize kernel execution based on the interpretation. | 12-13-2012 |
20120317558 | BINDING EXECUTABLE CODE AT RUNTIME - The present invention extends to methods, systems, and computer program products for binding executable code at runtime. Embodiments of the invention include late binding of specified aspects of code to improve execution performance. A runtime dynamically binds lower level code based on runtime information to optimize execution of a higher level algorithm. Aspects of a higher level algorithm having a requisite (e.g., higher) impact on execution performance can be targeted for late binding. Improved performance can be achieved with minimal runtime costs using late binding for aspects having the requisite execution performance impact. | 12-13-2012 |
20120324430 | ALIASING BUFFERS - The present invention extends to methods, systems, and computer program products for aliasing buffers. Embodiment of the inventions supporting buffer aliasing through introduction of a level of indirection between a source program's buffer accesses and the target executable physical buffers, and binding the logical buffer accesses to actual physical buffer accesses at runtime. A variety of techniques for can be used supporting runtime aliasing of buffers, in a system which otherwise disallows such runtime aliasing between separately defined buffers in the target executable code. Binding of logical buffer accesses in the source program to the actual physical buffers defined in the target executable code is delayed until runtime. | 12-20-2012 |
20130007712 | DEBUGGING IN A MULTIPLE ADDRESS SPACE ENVIRONMENT - The present invention extends to methods, systems, and computer program products for debugging in a multiple address space environment. Embodiments of the invention include techniques for recording debug information used for translating between an abstract unified address space and multiple address spaces at a target system (e.g., a co-processor, such as, a GPU or other accelerator). A table is stored in the recorded debug information. The table includes one or more entries mapping compiler assigned IDs to address spaces. During debugging within a symbolic debugger, the recorded debug information can be used for viewing program data across multiple address spaces in a live debugging session. | 01-03-2013 |
20130238579 | EFFICIENT GARBAGE COLLECTION AND EXCEPTION HANDLING IN A HARDWARE ACCELERATED TRANSACTIONAL MEMORY SYSTEM - Handling garbage collection and exceptions in hardware assisted transactions. Embodiments are practiced in a computing environment including a hardware assisted transaction system. A method includes beginning a hardware assisted transaction, raising an exception while in the hardware assisted transaction, including creating an exception object, determining that the transaction should be rolled back, and as a result of determining that the transaction should be rolled back, marshaling the exception object out of the hardware assisted transaction. | 09-12-2013 |
20140223131 | OPTIMIZING DATA TRANSFERS BETWEEN HETEROGENEOUS MEMORY ARENAS - Embodiments are directed to optimizing data transfers between heterogeneous memory arenas. In one scenario, a computer system receives an indication that a data chunk is to be transferred from a first memory arena to a third memory arena, and then determines that for the data chunk to be transferred from the first memory arena to the third arena, the data chunk is to be transferred from the first memory arena to a second memory arena, and from the second memory arena to the third memory arena. The computer system divides the data chunk into smaller data portions and copies a first data portion from the first memory arena to the second memory arena. The computer system then copies the first data portion from the second memory arena to the third memory arena and copies a second data portion from the first memory arena to the second memory arena in parallel. | 08-07-2014 |