Patent application number | Description | Published |
20110080404 | Redistribution Of Generated Geometric Primitives - One embodiment of the present invention sets forth a technique for redistributing geometric primitives generated by tessellation and geometry shaders for per-vertex by multiple graphics pipelines. Geometric primitives that are generated in a first processing stage are collected and redistributed more evenly and in smaller batches to the multiple graphics pipelines for vertex processing in a second processing stage. The smaller batches do not exceed the resource limits of a graphics pipeline and the per-vertex processing workloads of the graphics pipelines in the second stage are balanced. Therefore, the performance of the tessellation and geometry shaders is improved. | 04-07-2011 |
20110080406 | CALCULATION OF PLANE EQUATIONS AFTER DETERMINATION OF Z-BUFFER VISIBILITY - One embodiment of the present invention sets forth a technique for computing plane equations for primitive shading after non-visible pixels are removed by z culling operations and pixel coverage has been determined. The z plane equations are computed before the plane equations for non-z primitive attributes are computed. The z plane equations are then used to perform screen-space z culling of primitives during and following rasterization. Culling of primitives is also performed based on pixel sample coverage. Consequently, primitives that have visible pixels after z culling operations reach the primitive shading unit. The non-z plane equations are only computed for geometry that is visible after the z culling operations. The primitive shading unit does not need to fetch vertex attributes from memory and does not need to compute non-z plane equations for the culled primitives. | 04-07-2011 |
20110090220 | ORDER-PRESERVING DISTRIBUTED RASTERIZER - One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock. | 04-21-2011 |
20110102448 | VERTEX ATTRIBUTE BUFFER FOR INLINE IMMEDIATE ATTRIBUTES AND CONSTANTS - One embodiment of the present invention sets forth a technique for providing primitives and vertex attributes to the graphics pipeline. A primitive distribution unit constructs the batches of primitives and writes inline attributes and constants to a vertex attribute buffer (VAB) rather than passing the inline attributes directly to the graphics pipeline. A batch includes indices to attributes, where the attributes for each vertex are stored in a different VAB. The same VAB may be referenced by all of the vertices in a batch or different VABs may be referenced by different vertices in one or more batches. The batches are routed to the different processing engines in the graphics pipeline and each of the processing engines reads the VABs as needed to process the primitives. The number of parallel processing engines may be changed without changing the width or speed of the interconnect used to write the VABs. | 05-05-2011 |
20110141122 | DISTRIBUTED STREAM OUTPUT IN A PARALLEL PROCESSING UNIT - A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit. | 06-16-2011 |
20130038620 | TIME SLICE PROCESSING OF TESSELLATION AND GEOMETRY SHADERS - One embodiment of the present invention sets forth a technique for redistributing geometric primitives generated by tessellation and geometry shaders for processing by multiple graphics pipelines. Geometric primitives that are generated in a first processing cycle are collected and redistributed more evenly and in smaller tasks to the multiple graphics pipelines for vertex processing in a second processing cycle. The smaller tasks do not exceed the resource limits of a graphics pipeline and the per-vertex processing workloads of the graphics pipelines in the second cycle are balanced and make full use of resources. Therefore, the performance of the tessellation and geometry shaders is improved. | 02-14-2013 |
20140118347 | TWO-PASS CACHE TILE PROCESSING FOR VISIBILITY TESTING IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a graphics processing system. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline is configured to perform visibility testing and fragment shading. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to first transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a z-only mode, and then transmit the first set of primitives to the screen-space pipeline with a command configured to cause the screen-space pipeline to process the first set of primitives in a normal mode. In the z-only mode, at least some fragment shading operations are disabled in the screen-space pipeline. In the normal mode, fragment shading operations are enabled. | 05-01-2014 |
20140118348 | HIGHER ACCURACY Z-CULLING IN A TILE-BASED ARCHITECTURE - A graphics processing pipeline configured for z-cull operations. The graphics processing pipeline comprising a screen-space pipeline and a tiling unit. The screen-space pipeline includes a z-cull unit configured to perform z-culling operations. The tiling unit is configured to determine that a first set of primitives overlaps a first cache tile. The tiling unit is also configured to transmit the first set of primitives to the screen-space pipeline for processing. The tiling unit is further configured to select between processing the first set of primitives in a full-surface z-cull mode or processing the first set of primitives in a partial-surface z-cull mode. The tiling unit is also configured to cause the z-cull unit to process the first set of primitives in the full-surface z-cull mode or to process the first set of primitives in the partial-surface z-cull mode. | 05-01-2014 |
20140118352 | ON-CHIP ANTI-ALIAS RESOLVE IN A CACHE TILING ARCHITECTURE - One embodiment of the present invention includes a graphics subsystem for processing multi-sample anti-aliasing work. The graphics subsystem includes a cache unit, a tiling unit, and a screen-space pipeline coupled to the cache unit and to the tiling unit. The tiling unit is configured to organize multi-sample anti-aliasing commands into cache tile batches. The screen-space pipeline includes a pixel shader and a raster operations unit, and receives cache tile batches from the tiling unit. The pixel shader is configured to generate sample data based on a set of primitives and to generate resolved data based on the sample data. The raster operations unit is configured to store the sample data in the cache unit and to invalidate the sample data after the pixel shader generates the resolved data. | 05-01-2014 |
20140118361 | DISTRIBUTED TILED CACHING - One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed tiled caching. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization. | 05-01-2014 |
20140118362 | BARRIER COMMANDS IN A CACHE TILING ARCHITECTURE - One embodiment of the present invention includes a graphics subsystem. The graphics subsystem includes a first processing entity and a second processing entity. Both the first processing entity and the second processing entity are configured to receive first and second batches of primitives, and a barrier command in between the first and second batches of primitives. The barrier command may be either a tiled or a non-tiled barrier command. A tiled barrier command is transmitted through the graphics subsystem for each cache tile. A non-tiled barrier command is transmitted through the graphics subsystem only once. The barrier command causes work that is after the barrier command to stop at a barrier point until a release signal is received. The back-end unit transmits a release signal to both processing entities after the first batch of primitives has been processed by both the first processing entity and the second processing entity. | 05-01-2014 |
20140118363 | MANAGING DEFERRED CONTEXTS IN A CACHE TILING ARCHITECTURE - A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command. | 05-01-2014 |
20140118364 | DISTRIBUTED TILED CACHING - One embodiment of the present invention sets forth a graphics subsystem configured to implement distributed cache tiling. The graphics subsystem includes one or more world-space pipelines, one or more screen-space pipelines, one or more tiling units, and a crossbar unit. Each world-space pipeline is implemented in a different processing entity and is coupled to a different tiling unit. Each screen-space pipeline is implemented in a different processing entity and is coupled to the crossbar unit. The tiling units are configured to receive primitives from the world-space pipelines, generate cache tile batches based on the primitives, and transmit the primitives to the screen-space pipelines. One advantage of the disclosed approach is that primitives are processed in application-programming-interface order in a highly parallel tiling architecture. Another advantage is that primitives are processed in cache tile order, which reduces memory bandwidth consumption and improves cache memory utilization. | 05-01-2014 |
20140118365 | DATA STRUCTURES FOR EFFICIENT TILED RENDERING - One embodiment of the present invention includes a method for tracking which cache tiles included in a plurality of cache tiles are intersected by a plurality of bounding boxes. The method includes receiving the plurality of bounding boxes, wherein each bounding box is associated with one or more graphics primitives being rendered to a render surface, and wherein the render surface is divided into the plurality of cache tiles. The method further includes, for each bounding box included in the plurality of bounding boxes, determining one or more cache tiles included in the plurality of cache tiles that are intersected by the bounding box, and storing a result in an array for each cache tile that is intersected by the bounding box. Finally, the method includes determining not to process a cache tile included in the plurality of cache tiles based on the results stored in the array. | 05-01-2014 |
20140118366 | SCHEDULING CACHE TRAFFIC IN A TILE-BASED ARCHITECTURE - A tile-based system for processing graphics data. The tile based system includes a first screen-space pipeline, a cache unit, and a first tiling unit. The first tiling unit is configured to transmit a first set of primitives that overlap a first cache tile and a first prefetch command to the first screen-space pipeline for processing, and transmit a second set of primitives that overlap a second cache tile to the first screen-space pipeline for processing. The first prefetch command is configured to cause the cache unit to fetch data associated with the second cache tile from an external memory unit. The first tiling unit may also be configured to transmit a first flush command to the screen-space pipeline for processing with the first set of primitives. The first flush command is configured to cause the cache unit to flush data associated with the first cache tile. | 05-01-2014 |
20140118369 | MANAGING EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a graphics processing system configured to track event counts in a tile-based architecture. The graphics processing system includes a screen-space pipeline and a tiling unit. The screen-space pipeline includes a first unit, a count memory associated with the first unit, and an accumulating memory associated with the first unit. The first unit is configured to detect an event type and increment the count memory. The tiling unit is configured to cause the screen-space pipeline to update an external memory address to reflect a first value stored in the count memory when the first unit completes processing of a first set of primitives. The tiling unit is also configured to cause the screen-space pipeline to update the accumulating memory to reflect a second value stored in the count memory when the first unit completes processing of a second set of primitives. | 05-01-2014 |
20140118370 | MANAGING PER-TILE EVENT COUNT REPORTS IN A TILE-BASED ARCHITECTURE - A graphics processing system configured to track per-tile event counts in a tile-based architecture. A tiling unit in the graphics processing system is configured to cause a screen-space pipeline to load a count value associated with a first cache tile into a count memory and to cause the screen-space pipeline to process a first set of primitives that intersect the first cache tile. The tiling unit is further configured to cause the screen-space pipeline to store a second count value in a report memory location. The tiling unit is also configured to cause the screen-space pipeline to process a second set of primitives that intersect the first cache tile and to cause the screen-space pipeline to store a third count value in the first accumulating memory. Conditional rendering operations may be performed on a per-cache tile basis, based on the per-tile event count. | 05-01-2014 |
20140118373 | TECHNIQUES FOR MANAGINGGRAPHICS PROCESSING RESOURCES IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay. | 05-01-2014 |
20140118374 | TECHNIQUES FOR MANAGINGGRAPHICS PROCESSING RESOURCES IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a technique for managing buffer entries in a tile-based architecture. The technique includes receiving a first plurality of graphics primitives and a first buffer address at which attributes associated with the first plurality of graphics primitives are stored. The technique further includes, for each tile included in a plurality of tiles, transmitting the first plurality of graphics primitives and the first buffer address to a screen space pipeline and receiving an acknowledgement from the screen space pipeline indicating that processing the first plurality of graphics primitives has completed. The technique further includes determining that processing the first plurality of graphics primitives has completed for a last tile included in the plurality of tiles and that the acknowledgement has been received for each tile included in the plurality of tiles, and, in response, releasing a buffer entry associated with the first buffer address. | 05-01-2014 |
20140118375 | TECHNIQUES FOR MANAGING GRAPHICS PROCESSING RESOURCES IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a technique for managing buffer table entries in a tile-based architecture. The technique includes binding a plurality of shader registers to a buffer table entry. The technique further includes processing at least one tile by reading a buffer table index stored in the shader register to access the buffer table entry, reading a buffer address stored in the buffer table entry, accessing data associated with the buffer address, and unbinding the shader register from the buffer table entry. The technique further includes determining that none of the shader registers is still bound to the buffer table entry and, in response, causing a release packet to be inserted into an instruction stream. The technique further includes determining that a last tile has been processed and, in response, transmitting the release packet to cause the buffer table entry to be released. | 05-01-2014 |
20140118376 | HEURISTICS FOR IMPROVING PERFORMANCE IN A TILE BASED ARCHITECTURE - One embodiment of the present invention includes a technique for processing graphics primitives in a tile-based architecture. The technique includes storing, in a buffer, a first plurality of graphics primitives and a first plurality of state bundles received from the world-space pipeline. The technique further includes determining, based on a first condition, that the first plurality of graphics primitives should be replayed from the buffer, and, in response, replaying the first plurality of graphics primitives against a first tile included in a first plurality of tiles. Replaying the first plurality of graphics primitives includes comparing each graphics primitive against the first tile to determine whether the graphics primitive intersects the first tile, determining that one or more graphics primitives intersects the first tile, and transmitting the one or more graphics primitives and one or more associated state bundles to a screen-space pipeline for processing. | 05-01-2014 |
20140118379 | CACHING OF ADAPTIVELY SIZED CACHE TILES IN A UNIFIED L2 CACHE WITH SURFACE COMPRESSION - One embodiment of the present invention includes techniques for adaptively sizing cache tiles in a graphics system. A device driver associated with a graphics system sets a cache tile size associated with a cache tile to a first size. The detects a change from a first render target configuration that includes a first set of render targets to a second render target configuration that includes a second set of render targets. The device driver sets the cache tile size to a second size based on the second render target configuration. One advantage of the disclosed approach is that the cache tile size is adaptively sized, resulting in fewer cache tiles for less complex render target configurations. Adaptively sizing cache tiles leads to more efficient processor utilization and reduced power requirements. In addition, a unified L | 05-01-2014 |
20140118380 | STATE HANDLING IN A TILED ARCHITECTURE - One embodiment of the present invention includes a graphics subsystem that includes a tiling unit, a crossbar unit, and a screen-space pipeline. The crossbar unit is configured to transmit primitives interleaved with state change commands to the tiling unit. The tiling unit is configured to record an initial state associated with the primitives and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a first cache tile. The tiling unit is further configured to transmit the initial state to the screen-space pipeline and to transmit to the screen-space pipeline one or more primitives in the primitives that overlap a second cache tile. The tiling unit includes a state filter block configured to determine that a first state change in the state change commands is followed by a second state change, without an intervening primitive, and to forego transmitting the first state change in response. | 05-01-2014 |
20140118381 | PRIMITIVE RE-ORDERING BETWEEN WORLD-SPACE AND SCREEN-SPACE PIPELINES WITH BUFFER LIMITED PROCESSING - One embodiment of the present invention includes approaches for processing graphics primitives associated with cache tiles when rendering an image. A set of graphics primitives associated with a first render target configuration is received from a first portion of a graphics processing pipeline, and the set of graphics primitives is stored in a memory. A condition is detected indicating that the set of graphics primitives is ready for processing, and a cache tile is selected that intersects at least one graphics primitive in the set of graphics primitives. At least one graphics primitive in the set of graphics primitives that intersects the cache tile is transmitted to a second portion of the graphics processing pipeline for processing. One advantage of the disclosed embodiments is that graphics primitives and associated data are more likely to remain stored on-chip during cache tile rendering, thereby reducing power consumption and improving rendering performance. | 05-01-2014 |
20140118391 | TECHNIQUES FOR ADAPTIVELY GENERATING BOUNDING BOXES - One embodiment of the present invention includes a method for generating accumulated bounding boxes for graphics primitives. The method includes generating a first bounding box associated with a first graphics primitive. The method further includes, for each graphics primitive included in a first set of one or more additional graphics primitives, determining that the graphics primitive is within a threshold distance of the first bounding box, and adding the graphics primitive to the first bounding box. The method further includes determining not to add a second graphics primitive to the first bounding box. The method further includes generating a second bounding box associated with the second graphics primitive. Finally, the method includes transmitting the first bounding box to a tiling unit via a crossbar. One advantage of the disclosed embodiments is that multiple bounding boxes are combined to generate an accumulated bounding box that is then transferred across the crossbar. | 05-01-2014 |
20140118393 | DATA STRUCTURES FOR EFFICIENT TILED RENDERING - One embodiment of the present invention includes a method for performing a multi-pass tiling test. The method includes combining a plurality of bounding boxes to generate a coarse bounding box. The method further includes identifying a first cache tile associated with a render surface and determining that the coarse bounding box intersects the first cache tile. The method further includes comparing each bounding box included in the plurality of bounding boxes against the first cache tile to determine that a first set of one or more bounding boxes included in the plurality of bounding boxes intersects the first cache tile. Finally, the method includes, for each bounding box included in the first set of one or more bounding boxes, processing one or more graphics primitives associated with the bounding box. One advantage of the disclosed technique is that the number of intersection calculations performed for each cache tile is reduced. | 05-01-2014 |
20140122812 | TILED CACHE INVALIDATION - One embodiment of the present invention sets forth a graphics subsystem. The graphics subsystem includes a first tiling unit associated with a first set of raster tiles and a crossbar unit. The crossbar unit is configured to transmit a first set of primitives to the first tiling unit and to transmit a first cache invalidate command to the first tiling unit. The first tiling unit is configured to determine that a second bounding box associated with primitives included in the first set of primitives overlaps a first cache tile and that the first bounding box overlaps the first cache tile. The first tiling unit is further configured to transmit the primitives and the first cache invalidate command to a first screen-space pipeline associated with the first tiling unit for processing. The screen-space pipeline processes the cache invalidate command to invalidate cache lines specified by the cache invalidate command. | 05-01-2014 |
20140125669 | SETTING DOWNSTREAM RENDER STATE IN AN UPSTREAM SHADER - Techniques are disclosed for processing graphics objects in a stage of a graphics processing pipeline. The techniques include receiving a graphics primitive associated with the graphics object, and determining a plurality of attributes corresponding to one or more vertices associated with the graphics primitive. The techniques further include determining values for one or more state parameters associated with a downstream stage of the graphics processing pipeline based on a visual effect associated with the graphics primitive. The techniques further include transmitting the state parameter values to the downstream stage of the graphics processing pipeline. One advantage of the disclosed techniques is that visual effects are flexibly and efficiently performed. | 05-08-2014 |
20140152652 | ORDER-PRESERVING DISTRIBUTED RASTERIZER - One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock. | 06-05-2014 |
20140168231 | TRIGGERING PERFORMANCE EVENT CAPTURE VIA PIPELINED STATE BUNDLES - One embodiment of the present invention sets forth a method for analyzing the performance of a graphics processing pipeline. A first workload and a second workload are combined together in a pipeline to generate a combined workload. The first workload is associated with a first instance and the second workload is associated with a second instance. A first and second initial event are generated for the combined workload, indicating that the first and second workloads have begun processing at a first position in the graphics processing pipeline. A first and second final event are generated, indicating that the first and second workloads have finished processing at a second position in the graphics processing pipeline. | 06-19-2014 |
20140176588 | TECHNIQUE FOR STORING SHARED VERTICES - A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written. | 06-26-2014 |
20140176589 | TECHNIQUE FOR STORING SHARED VERTICES - A graphics processing unit includes a set of geometry processing units each configured to process graphics primitives in parallel with one another. A given geometry processing unit generates one or more graphics primitives or geometry objects and buffers the associated vertex data locally. The geometry processing unit also buffers different sets of indices to those vertices, where each such set represents a different graphics primitive or geometry object. The geometry processing units may then stream the buffered vertices and indices to global buffers in parallel with one another. A stream output synchronization unit coordinates the parallel streaming of vertices and indices by providing each geometry processing unit with a different base address within a global vertex buffer where vertices may be written. The stream output synchronization unit also provides each geometry processing unit with a different base address within a global index buffer where indices may be written. | 06-26-2014 |
20140184617 | MID-PRIMITIVE GRAPHICS EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique for mid-primitive execution preemption. When preemption is initiated no new instructions are issued, in-flight instructions progress to an execution unit boundary, and the execution state is unloaded from the processing pipeline. The execution units within the processing pipeline, including the coarse rasterization unit complete execution of in-flight instructions and become idle. However, rasterization of a triangle may be preempted at a coarse raster region boundary. The amount of context state to be stored is reduced because the execution units are idle. Preempting at the mid-primitive level during rasterization reduces the time from when preemption is initiated to when another process can execute because the entire triangle is not rasterized. | 07-03-2014 |
20140232729 | POWER EFFICIENT ATTRIBUTE HANDLING FOR TESSELLATION AND GEOMETRY SHADERS - Attributes of graphics objects are processed in a plurality of graphics processing pipelines. A streaming multiprocessor (SM) retrieves a first set of parameters associated with a set of graphics objects from a first set of buffers. The SM performs a first set of operations on the first set of parameters according to a first phase of processing to produce a second set of parameters stored in a second set of buffers. The SM performs a second set of operations on the second set of parameters according to a second phase of processing to produce a third set of parameters stored in a third set of buffers. One advantage of the disclosed techniques is that work is redistributed from a first phase to a second phase of graphics processing without having to copy the attributes to and retrieve the attributes from the cache or system memory, resulting in reduced power consumption. | 08-21-2014 |
20140237187 | ADAPTIVE MULTILEVEL BINNING TO IMPROVE HIERARCHICAL CACHING - A device driver calculates a tile size for a plurality of cache memories in a cache hierarchy. The device driver calculates a storage capacity of a first cache memory. The device driver calculates a first tile size based on the storage capacity of the first cache memory and one or more additional characteristics. The device driver calculates a storage capacity of a second cache memory. The device driver calculates a second tile size based on the storage capacity of the second cache memory and one or more additional characteristics, where the second tile size is different than the first tile size. The device driver transmits the second tile size to a second coalescing binning unit. One advantage of the disclosed techniques is that data locality and cache memory hit rates are improved where tile size is optimized for each cache level in the cache hierarchy. | 08-21-2014 |
20140267319 | TECHNIQUE FOR IMPROVING THE PERFORMANCE OF A TESSELLATION PIPELINE - A tessellation pipeline includes an alpha phase and a beta phase. The alpha phase includes pre-tessellation processing stages, while the beta phase includes post-tessellation processing stages. A processing unit configured to implement a processing stage in the alpha phase stores input graphics data within a buffer and then copies over that buffer with output graphics data, thereby conserving memory resources. The processing unit may also copy output graphics data directly to a level 2 (L2) cache for beta phase processing by other tessellation pipelines, thereby avoiding the need for fixed function copy-out hardware. | 09-18-2014 |
20140267320 | TECHNIQUE FOR IMPROVING THE PERFORMANCE OF A TESSELLATION PIPELINE - A tessellation pipeline includes an alpha phase and a beta phase. The alpha phase includes pre-tessellation processing stages, while the beta phase includes post-tessellation processing stages. A processing unit configured to implement a processing stage in the alpha phase stores input graphics data within a buffer and then copies over that buffer with output graphics data, thereby conserving memory resources. The processing unit may also copy output graphics data directly to a level 2 (L2) cache for beta phase processing by other tessellation pipelines, thereby avoiding the need for fixed function copy-out hardware. | 09-18-2014 |