Patent application number | Description | Published |
20090006729 | CACHE FOR A MULTI THREAD AND MULTI CORE SYSTEM AND METHODS THEREOF - According to one embodiment, the present disclosure generally provides a method for improving the performance of a cache of a processor. The method may include storing a plurality of data in a data Random Access Memory (RAM). The method may further include holding information for all outstanding requests forwarded to a next-level memory subsystem. The method may also include clearing information associated with a serviced request after the request has been fulfilled. The method may additionally include determining if a subsequent request matches an address supplied to one or more requests already in-flight to the next-level memory subsystem. The method may further include matching fulfilled requests serviced by the next-level memory subsystem to at least one requester who issued requests while an original request was in-flight to the next level memory subsystem. The method may also include storing information specific to each request, the information including a set attribute and a way attribute, the set and way attributes configured to identify where the returned data should be held in the data RAM once the data is returned, the information specific to each request further including at least one of thread ID, instruction queue position and color. The method may additionally include scheduling hit and miss data returns. Of course, various alternative embodiments are also within the scope of the present disclosure. | 01-01-2009 |
20090248983 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 10-01-2009 |
20090327641 | Dynamic Allocation of a Buffer Across Multiple Clients in a Threaded Processor - A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data. | 12-31-2009 |
20100031268 | Thread ordering techniques - Techniques are described that can be used to ensure ordered computation and/or retirement of threads in a multithreaded environment. Threads may contain bundled instances of work, each with unique ordering restrictions relative to other instances of work packaged in other threads in the system. When applied to 3D graphics, video and image processing domains allow unrestricted processing of threads until reaching their critical sections. Ordering may be required prior to executing critical sections and beyond. | 02-04-2010 |
20100115518 | BEHAVIORAL MODEL BASED MULTI-THREADED ARCHITECTURE - Multiple parallel passive threads of instructions coordinate access to shared resources using “active” and “proactive” semaphores. The active semaphores send messages to execution and/or control circuitry to cause the state of a thread to change. A thread can be placed in an inactive state by a thread scheduler in response to an unresolved dependency, which can be indicated by a semaphore. A thread state variable corresponding to the dependency is used to indicate that the thread is in inactive mode. When the dependency is resolved a message is passed to control circuitry causing the dependency variable to be cleared. In response to the cleared dependency variable the thread is placed in an active state. Execution can proceed on the threads in the active state. A proactive semaphore operates in a similar manner except that the semaphore is configured by the thread dispatcher before or after the thread is dispatched to the execution circuitry for execution. | 05-06-2010 |
20110126208 | Processing Architecture Having Passive Threads and Active Semaphores - Multiple parallel passive threads of instructions coordinate access to shared resources using “active” semaphores. The semaphores are referred to as active because the semaphores send messages to execution and/or control circuitry to cause the state of a thread to change. A thread can be placed in an inactive state by a thread scheduler in response to an unresolved dependency, which can be indicated by a semaphore. A thread state variable corresponding to the dependency is used to indicate that the thread is in inactive mode. When the dependency is resolved a message is passed to control circuitry causing the dependency variable to be cleared. In response to the cleared dependency variable the thread is placed in an active state. Execution can proceed on the threads in the active state. | 05-26-2011 |
20110314479 | THREAD QUEUING METHOD AND APPARATUS - In some embodiments, a method includes receiving a request to generate a thread and supplying a request to a queue in response at least to the received request. The method may further include fetching a plurality of instructions in response at least in part to the request supplied to the queue and executing at least one of the plurality of instructions. In some embodiments, an apparatus includes a storage medium having stored therein instructions that when executed by a machine result in the method. In some embodiments, an apparatus includes circuitry to receive a request to generate a thread and to queue a request to generate a thread in response at least to the received request. In some embodiments, a system includes circuitry to receive a request to generate a thread and to queue a request to generate a thread in response at least to the received request, and a memory unit to store at least one instruction for the thread. | 12-22-2011 |
20120200585 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 08-09-2012 |
20120272032 | Dynamic Allocation of a Buffer Across Multiple Clients in a Threaded Processor - A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data. | 10-25-2012 |
20130117509 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 05-09-2013 |
20130207987 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 08-15-2013 |
20140085302 | TECHNIQUES FOR EFFICIENT GPU TRIANGLE LIST ADJACENCY DETECTION AND HANDLING - An apparatus may include a memory to store a set of triangle vertices in a triangle, a processor circuit coupled to the memory and a cache to cache a set of triangle vertex indices corresponding to triangle vertices most recently transmitted through a graphics pipeline. The apparatus may also include an autostrip vertex processing component operative on the processor circuit to receive from the memory the set of triangle vertices, compare an index for each vertex of the set of triangle vertices to determine matches to the set of cached triangle vertex indices, and shift a single vertex index into the cache, the single vertex index corresponding to a vertex miss in which a given vertex of the set of triangle vertices does not match any vertex index of the set of cached triangle vertex indices when exactly two matches to the set of cached triangle vertex indices are found. | 03-27-2014 |
20140136797 | TECHNIQUE TO SHARE INFORMATION AMONG DIFFERENT CACHE COHERENCY DOMAINS - A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device. | 05-15-2014 |
20140139512 | Recording the Results of Visibility Tests at the Input Geometry Object Granularity - According to some embodiments of the present invention, pixel throughput may be improved by performing depth tests and recording the results on the granularity of an input geometry object. An input geometry object is any object within the depiction represented by a primitive, such as a triangle within an input triangle list or a patch within an input patch list. | 05-22-2014 |
20140176541 | TECHNIQUES FOR IMPROVING MSAA RENDERING EFFICIENCY - Various embodiments are generally directed to techniques for causing the storage of a color data value of a clear color to be deferred as rendered color data values are stored for samples. A device comprises a processor circuit and a storage to store instructions that cause the processor circuit to render a pixel from multiple samples taken of a three-dimensional model of an object, the pixel corresponding to a pixel sample data which comprises multiple color storage locations that are each identified by a numeric identifier, and which comprises multiple sample color indices that each correspond to a sample to point to at least one color storage location; and allocate color storage locations in an order selected to define a subset of possible combinations of binary index values among all of the sample color indices as invalid combinations. Other embodiments are described and claimed. | 06-26-2014 |
20140240328 | TECHNIQUES FOR LOW ENERGY COMPUTATION IN GRAPHICS PROCESSING - Techniques and architecture are disclosed for using a latency first-in/first-out (FIFO) to modally enable and disable a compute block in a graphics pipeline. In some example embodiments, the latency FIFO collects valid accesses for a downstream compute and integrates invalid inputs (e.g., bubbles), while the compute is in an off state (e.g., sleep). Once a sufficient number of valid accesses are stored in the latency FIFO, the compute is turned on, and the latency FIFO drains a burst of valid inputs thereto. In some embodiments, this burst helps to prevent or reduce any underutilization of the compute which otherwise might occur, thus providing power savings for a graphics pipeline or otherwise improving the energy efficiency of a given graphics system. In some instances, throughput demand at the latency FIFO input is maintained over a time window corresponding to the on and off time of the compute block. | 08-28-2014 |
20140306970 | Ordering Threads as Groups in a Multi-Threaded, Multi-Core Graphics Compute System - A scoreboard may keep track of thread dependencies. A set of threads with a common characteristic may be grouped so that if that characteristic is changed, the group of threads can be accessed to account for that change. Examples for such a characteristic include various types of scoreboard address changes. When the characteristic is changed the group of threads are used to identify threads affected by the characteristic change. | 10-16-2014 |
20140347385 | LOSSY COLOR MERGE FOR MULTI-SAMPLING ANTI-ALIASING COMPRESSION - Techniques related to graphics rendering including lossy color merge for multi-sampling anti-aliasing compression are discussed. | 11-27-2014 |
20140359220 | Scatter/Gather Capable System Coherent Cache - In accordance with some embodiments, a scatter/gather memory approach may be enabled that is exposed or backed by system memory and uses conventional tags and addresses. Thus, such a technique may be more amenable to conventional software developers and their conventional techniques. | 12-04-2014 |