Patent application number | Description | Published |
20080282034 | Memory Subsystem having a Multipurpose Cache for a Stream Graphics Multiprocessor - A method and a computing system are provided. The computing system may include a system memory configured to store data in a first data format. The computing system may also include a computational core comprising a plurality of execution units (EU). The computational core may be configured to request data from the system memory and to process data in a second data format. Each of the plurality of EU may include an execution control and datapath and a specialized L1 cache pool. The computing system may include a multipurpose L2 cache in communication with each of the plurality of EU and the system memory. The multipurpose L2 cache may be configured to store data in the first data format and the second data format. The computing system may also include an orthogonal data converter in communication with at least one of the plurality of EU and the system memory. | 11-13-2008 |
20090147017 | Shader Processing Systems and Methods - Various embodiments of shader processing systems and methods are disclosed. One method embodiment, among others, comprises a dependent texture read method executed using a multi-threaded, parallel computational core of a graphics processing unit (GPU). Such a method includes generating a dependent texture read request at logic configured to perform shader computations corresponding to a first thread, and sending shader-calculated, texture-sampling related parameters corresponding to the first thread to a texture pipeline while retaining at the logic all other shader processing related information corresponding to the first thread. | 06-11-2009 |
20090153570 | Triangle Setup and Attribute Setup Integration with Programmable Execution Unit - A system for integrating triangle setup and attribute setup operations into a programmable execution unit of a graphics processing unit is disclosed. A method for integrating triangle setup and attribute setup operations into a programmable execution unit of a graphics processing unit is also disclosed. In one embodiment, at least one execution unit is configured for multi-threaded operation. The at least one execution unit is configured to execute at least one thread for triangle setup operations and attribute setup operations as well as threads for pixel shader, geometry shader and vertex shader operations. | 06-18-2009 |
20090251476 | Constant Buffering for a Computational Core of a Programmable Graphics Processing Unit - Embodiments of systems and methods for managing a constant buffer with rendering context specific data in a multithreaded parallel computational GPU core are disclosed. Briefly described, one method embodiment, among others, comprises responsive to a first shader operation, receiving at a constant buffer a first group of constants corresponding to a first rendering context, and responsive to a second shader operation, receiving at the constant buffer a second group of constants corresponding to a second context without flushing the first group. | 10-08-2009 |
20100123717 | Dynamic Scheduling in a Graphics Processor - Among several systems and methods related to graphics processing as described herein, an embodiment of a graphics processing unit (GPU), which comprises a unified shader device and control device, is disclosed. The unified shader device of the GPU is configured to perform multiple graphics shading functions and includes a plurality of execution units. The execution units are configured to operate in parallel, where each execution unit itself has a plurality of threads also configured to operate in parallel. Each thread is configured to perform multiple graphics shading functions. The control device of the GPU, which is in communication with the shader device, is configured to receive graphics data and allocate portions of the graphics data to at least one thread of at least one execution unit. The control device is adapted to dynamically reallocate the graphics data from threads that are determined to be busy to threads that are determined to be less busy. | 05-20-2010 |
20100201703 | Systems and Methods for Improving Throughput of a Graphics Processing Unit - Systems and methods for improving throughput of a graphics processing unit are disclosed. In one embodiment, a system includes a multithreaded execution unit capable of processing requests to access a constant cache, a vertex attribute cache, at least one common register file, and an execution unit data path substantially simultaneously. | 08-12-2010 |
20110261063 | System and Method for Managing the Computation of Graphics Shading Operations - The present disclosure describes implementations for performing register accesses and operations in a graphics processing apparatus. In one implementation, a graphics processing apparatus comprises an execution unit for processing programmed shader operations, wherein the execution unit is configured for processing operations of a plurality of threads. The apparatus further comprises memory forming a register file that accommodates all register operations for all the threads executed by the execution unit, the memory being organized in a plurality of banks, with a first plurality of banks being allocated to a first plurality of the threads and a second plurality of banks being allocated to the remaining threads. In addition, the apparatus comprises address translation logic configured to translate logical register identifiers into physical register addresses. | 10-27-2011 |
20120069033 | Constant Buffering for a Computational Core of a Programmable Graphics Processing Unit - Embodiments of the present disclosure are directed to graphics processing systems, comprising: a plurality of execution units, wherein one of the execution units is configurable to process a thread corresponding to a rendering context, wherein the rendering context comprises a plurality of constants with a priority level; a constant buffer configurable to store the constants of the rendering context into a plurality of slots in a physical storage space; an execution unit control unit configurable to assign the thread to one of the execution units; and a constant buffer control unit providing a translation table for the rendering context to map the corresponding constants into the slots of the physical storage space. Comparable methods are also disclosed. | 03-22-2012 |
20120092353 | Systems and Methods for Video Processing - A multi-shader system in a programmable graphics processing unit (GPU) for processing video data, includes a first shader stage configured to receive slice data from a frame buffer and perform variable length decoding (VLD), wherein the first shader stage outputs data to a first buffer within the frame buffer; a second shader stage configured to receive the output data from the first shader stage and perform transformation and motion compensation on the slice data, wherein the second shader stage outputs decoded slice data to a second buffer within the frame buffer; a third shader stage configured to receive the decoded slice data and perform in-loop deblocking filtering (IDF) on the frame buffer; a fourth shader stage configured to perform post-processing on the frame buffer; and a scheduler configured to schedule execution of the shader stages, the scheduler comprising a plurality of counter registers; wherein execution of the shader stages is synchronized utilizing the counter registers. | 04-19-2012 |
20120092356 | Systems and Methods for Performing Shared Memory Accesses - Various systems and methods are described for accessing a shared memory in a graphics processing unit (GPU). One embodiment comprises determining whether data to be read from a shared memory aligns to a boundary of the shared memory, wherein the data comprises a plurality of data blocks, and wherein the shared memory comprises a plurality of banks and a plurality of offsets. A swizzle pattern in which the data blocks are to be arranged for processing is determined. Based on whether the data aligns with a boundary of the shared memory and based on the determined swizzle pattern, an order for performing one or more wrapping functions is determined. The shared memory is accessed by performing the one or more wrapping functions and reading the data blocks to construct the data according to the swizzle pattern. | 04-19-2012 |
20120096474 | Systems and Methods for Performing Multi-Program General Purpose Shader Kickoff - Systems and methods for thread group kickoff and thread synchronization are described. One method is directed to synchronizing a plurality of threads in a general purpose shader in a graphics processor. The method comprises determining an entry point for execution of the threads in the general purpose shader, performing a fork operation at the entry point, whereby the plurality of threads are dispatched, wherein the plurality of threads comprise a main thread and one or more sub-threads. The method further comprises performing a join operation whereby the plurality of threads are synchronized upon the main thread reaching a synchronization point. Upon completion of the join operation, a second fork operation is performed to resume parallel execution of the plurality of threads. | 04-19-2012 |
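
The fork, join, and second-fork sequence summarized in the abstract of application 20120096474 can be loosely illustrated with ordinary CPU threads. The following is a minimal sketch only, not the patented mechanism: the function `run_fork_join`, its parameters, and the use of `threading.Barrier` as the synchronization point are all assumptions made for illustration, and the abstract does not specify how the join is implemented on the GPU.

```python
import threading


def run_fork_join(num_sub_threads, phase_one, phase_two):
    """Hypothetical CPU analogy of fork -> join -> second fork.

    Thread 0 stands in for the main thread; the rest are sub-threads.
    """
    total = num_sub_threads + 1
    # The barrier plays the role of the synchronization point (the join).
    barrier = threading.Barrier(total)
    first = [None] * total
    second = [None] * total

    def worker(idx):
        # First fork: every thread runs its share of phase one in parallel.
        first[idx] = phase_one(idx)
        # Join: no thread continues until all threads have arrived.
        barrier.wait()
        # Second fork: parallel execution resumes, and each thread can now
        # safely read every phase-one result.
        second[idx] = phase_two(idx, list(first))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(total)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return first, second


if __name__ == "__main__":
    first, second = run_fork_join(
        3,
        lambda i: i * i,
        lambda i, snapshot: sum(snapshot) + i,
    )
    print(first, second)
```

A barrier is used here because it guarantees that all phase-one results are visible before any thread starts phase two, which is the property the join operation is described as providing.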