Patent application number | Description | Published |
20090249026 | Vector instructions to enable efficient synchronization and parallel reduction operations - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed. | 10-01-2009 |
20100005241 | DETECTION OF STREAMING DATA IN CACHE - An apparatus to detect streaming data in memory is presented. In one embodiment, the apparatus uses reuse bits and S-bit status for cache lines, wherein an S-bit status indicates that the data in the cache line are potentially streaming data. To enhance the efficiency of a cache, different measures can be applied to make the streaming data become the next victim during a replacement. | 01-07-2010 |
20100073368 | METHODS AND SYSTEMS TO DETERMINE CONSERVATIVE VIEW CELL OCCLUSION - Methods and systems to determine view cell occlusion, including to project objects of a 3-dimensional graphics environment to a 2-dimensional image plane with respect to the view point, to reduce sizes of corresponding object images, to generate an occluder map from the reduced-size object images, to compare at least a portion of the object images to the occluder map, and to identify an object as occluded with respect to the view cell when pixel depth values of the object image are greater than corresponding pixel depth values of the occluder map. Methods and systems to reduce an object image size include methods and systems to nullify pixel depth values within a radius of an edge pixel, and to determine the radius as a distance from the edge pixel to a second pixel so that a line between the view point and the second pixel is parallel with one or more of a line and a plane that is tangential to a sphere enclosing the view cell and a point on the object that corresponds to the edge pixel. | 03-25-2010 |
20100138607 | Increasing concurrency and controlling replication in a multi-core cache hierarchy - In one embodiment, the present invention includes a directory of a private cache hierarchy to maintain coherency between data stored in the cache hierarchy, where the directory is to enable concurrent cache-to-cache transfer of data to two private caches. Other embodiments are described and claimed. | 06-03-2010 |
20100146209 | METHOD AND APPARATUS FOR COMBINING INDEPENDENT DATA CACHES - Methods, apparatus, computer programs and systems related to combining independent data caches are described. Various implementations can dynamically aggregate multiple level-one (L1) data caches from distinct processors together, change the degree of interleaving (e.g., how much consecutive data is mapped to each participating data cache before addresses go on to the next one) among the cache banks, and retain the ability to subsequently adjust the number of data caches participating as one coherent cache, i.e., the degree of interleaving, such as when the requirements of an application or process change. | 06-10-2010 |
20100153649 | SHARED CACHE MEMORIES FOR MULTI-CORE PROCESSORS - Embodiments of shared cache memories for multi-core processors are presented. In one embodiment, a cache memory comprises a group of sampling cache sets and a controller to determine a number of misses that occur in the group of sampling cache sets. The controller is operable to determine a victim cache line for a cache set based at least in part on the number of misses. | 06-17-2010 |
20110025700 | Using a Texture Unit for General Purpose Computing - An interpolation unit, such as may be found in a texture unit or texture sampler, may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to an interpolation unit. The interpolation unit may use linear interpolators in order to perform the dot product calculations. | 02-03-2011 |
20110078340 | VIRTUAL ROW BUFFERS FOR USE WITH RANDOM ACCESS MEMORY - Methods, apparatuses and systems to decrease the energy consumption of a memory chip while increasing its effective bandwidth during the execution of any workload. Methods, apparatuses and systems may allow a memory chip to utilize a plurality of virtual row buffers to respond to requests for data included in a memory array block. Methods, apparatuses and systems may further eliminate or reduce the cost associated with transferring unnecessary data from a memory array block to row buffers by altering the data transfer size between a memory array block and a row buffer. | 03-31-2011 |
20110134137 | Texture Unit for General Purpose Computing - A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations. | 06-09-2011 |
20110138122 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 06-09-2011 |
20110148896 | Grouping Pixels to be Textured - A region or group of pixels may be textured as a unit, using a range specifier and one or more anchor pixels to define the group. In some embodiments, processing grouped pixels improves efficiency. | 06-23-2011 |
20110161060 | Optimization-Based exact formulation and solution of crowd simulation in virtual worlds - A method of computing a collision-free velocity ( | 06-30-2011 |
20110238680 | Time and space efficient sharing of data structures across different phases of a virtual world application - A method of decreasing a total computation time for a visual simulation loop includes sharing a common data structure across each phase of the visual simulation loop by adapting the common data structure to a requirement for each particular phase prior to performing a computation for that particular phase. | 09-29-2011 |
20120137074 | METHOD AND APPARATUS FOR STREAM BUFFER MANAGEMENT INSTRUCTIONS - A method and system to perform stream buffer management instructions in a processor. The stream buffer management instructions facilitate the creation and usage of a dedicated memory space or stream buffer of the processor in one embodiment of the invention. The dedicated memory space is a contiguous memory space and has a sequential or linear addressing scheme in one embodiment of the invention. The processor has logic to execute a stream buffer management instruction to copy data from a source memory address to a destination memory address that is specified with a desired level of memory hierarchy. | 05-31-2012 |
20120290799 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-15-2012 |
20130086354 | CACHE AND/OR SOCKET SENSITIVE MULTI-PROCESSOR CORES BREADTH-FIRST TRAVERSAL - Methods, apparatuses and storage devices associated with cache and/or socket sensitive breadth-first iterative traversal of a graph by parallel threads are disclosed. In embodiments, a vertices visited array (VIS) may be employed to track graph vertices visited. VIS may be partitioned into VIS sub-arrays, taking into consideration cache sizes of LLC, to reduce likelihood of evictions. In embodiments, potential boundary vertices arrays (PBV) may be employed to store potential boundary vertices for a next iteration, for vertices being visited in a current iteration. The number of PBV generated for each thread may take into consideration a number of sockets, over which the processor cores employed are distributed. In various embodiments, the threads may be load balanced; further, data locality awareness to reduce inter-socket communication may be considered, and/or lock-and-atomic-free update operations may be employed. Other embodiments may be disclosed or claimed. | 04-04-2013 |
20130297878 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-07-2013 |
20130339395 | PARALLEL OPERATION ON B+ TREES - Embodiments of techniques and systems for parallel processing of B+ trees are described. A parallel B+ tree processing module with partitioning and redistribution may include a set of threads executing a batch of B+ tree operations on a B+ tree in parallel. The batch of operations may be partitioned amongst the threads. Next, a search may be performed to determine which leaf nodes in the B+ tree are to be affected by which operations. Then, the threads may redistribute operations between each other such that multiple threads will not operate on the same leaf node. The threads may then perform B+ tree operations on the leaf nodes of the B+ tree in parallel. Subsequent modifications to nodes in the B+ tree may similarly be redistributed and performed in parallel as the threads work up the tree. | 12-19-2013 |
20140068226 | VECTOR INSTRUCTIONS TO ENABLE EFFICIENT SYNCHRONIZATION AND PARALLEL REDUCTION OPERATIONS - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed. | 03-06-2014 |
20140089276 | SEARCH UNIT TO ACCELERATE VARIABLE LENGTH COMPRESSION/DECOMPRESSION - Systems and methods to accelerate compression and decompression with a search unit implemented in the processor core. According to an embodiment, a search unit may be implemented to perform compression or decompression on an input stream of data. The search unit may use a look-up table to identify appropriate compression or decompression symbols. The look-up table may be populated with a table derived using the variable length coding symbols of a sequence of vertices to be compressed or extracted from a received data stream to be decompressed. A comparator and a finite state machine may be implemented in the search unit to facilitate traversal of the look-up table. | 03-27-2014 |
20140149718 | INSTRUCTION AND LOGIC TO PROVIDE PUSHING BUFFER COPY AND STORE FUNCTIONALITY - Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, a second hardware thread or processing core, and a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owners of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owners include said second hardware thread or processing core. | 05-29-2014 |
20140176590 | Texture Unit for General Purpose Computing - A texture unit may be used to perform general purpose mathematical computations such as dot products. This enables some general purpose computations and operations to be offloaded from a central processing unit to the texture unit. The texture unit may use linear interpolators in order to perform the dot product calculations. | 06-26-2014 |
20140337580 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-13-2014 |
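
The texture unit entries above state that linear interpolators may be used to perform dot product calculations, without giving the arithmetic. The sketch below illustrates one algebraic identity that makes this plausible for two elements: x0*w0 + x1*w1 = (w0 + w1) * lerp(x0, x1, w1 / (w0 + w1)). It is only an illustration under stated assumptions, not the method claimed in those applications; the function names `lerp` and `dot2_via_lerp` are hypothetical.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return a + (b - a) * t


def dot2_via_lerp(x, w):
    """Two-element dot product x[0]*w[0] + x[1]*w[1] built from one lerp.

    Assumption (not taken from the listing): the weights are rescaled so that
    they sum to 1 for the interpolator, and the result is scaled back by
    their sum afterwards.
    """
    s = w[0] + w[1]
    if s == 0:
        # The weights cancel, so no interpolation parameter exists;
        # fall back to the direct formula.
        return x[0] * w[0] + x[1] * w[1]
    return s * lerp(x[0], x[1], w[1] / s)


if __name__ == "__main__":
    # Compare the lerp-based result with the direct dot product.
    x, w = (2.0, 3.0), (0.25, 0.75)
    print(dot2_via_lerp(x, w), x[0] * w[0] + x[1] * w[1])
```

Longer dot products could be built by summing such pairs, at the cost of one extra multiply per pair for the rescaling; whether real texture hardware is driven this way is not stated in the abstracts.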