Patent application number | Description | Published |
20110242117 | BINDLESS TEXTURE AND IMAGE API - One embodiment of the present invention sets forth a method for accessing data objects stored in a memory that is accessible by a graphics processing unit (GPU). The method comprises the steps of creating a data object in the memory based on a command received from an application program, wherein the data object is organized non-linearly in the memory, transmitting a first handle associated with the data object to the application program such that data associated with different draw commands can be accessed by the GPU, wherein the first handle includes an address related to the location of the data object in the memory, receiving a first draw command as well as the first handle from the application program, and transmitting the first draw command and the first handle to the GPU for processing. | 10-06-2011 |
20110242119 | GPU Work Creation and Stateless Graphics in OPENGL - One embodiment of the present invention sets forth a method for generating work to be processed by a graphics pipeline residing within a graphics processor. The method includes the steps of receiving an indication that a first graphics workload is to be submitted to a command queue associated with the graphics processor, allocating a first portion of shader accessible memory for one or more units of state information that are necessary for processing the first graphics workload, populating the first portion of shader accessible memory with the one or more units of state information, and transmitting to the command queue of the graphics processor the one or more units of state information stored within the first portion of shader accessible memory, wherein the first graphics workload is processed within the graphics pipeline based on the one or more units of state information. | 10-06-2011 |
20110242125 | BINDLESS MEMORY ACCESS IN DIRECT 3D - One embodiment of the present invention sets forth a method for accessing data objects stored in a memory that is accessible by a graphics processing unit (GPU). The method comprises the steps of creating a data object in the memory based on a command received from an application program, transmitting a first handle associated with the data object to the application program such that data associated with different graphics commands can be accessed by the GPU, wherein the first handle includes a memory address that provides access to only a particular portion of the data object, receiving a first graphics command as well as the first handle from the application program, wherein the first graphics command includes a draw command or a compute grid launch, and transmitting the first graphics command and the first handle to the GPU for processing. | 10-06-2011 |
20110285735 | SYSTEM AND METHOD FOR COMPOSITING PATH COLOR IN PATH RENDERING - One embodiment of the present invention sets forth a technique for compositing a rendered path object into an image buffer. A shader program executing within a graphics processing unit (GPU) performs a stenciling operation for the path object and subsequently performs a texture barrier operation, which invalidates caches configured to store texture and frame buffer data within the GPU. The shader program then performs a covering operation for the path object in which the shader renders color samples for the path object and composites the color samples into an image buffer. The shader program binds to the image buffer for access as both a texture map and a writeable image. Stencil values are reset when corresponding pixels are written once per path object, and texture caches are invalidated via the texture barrier operation, which is performed after each covering operation per path object. | 11-24-2011 |
20130162661 | SYSTEM AND METHOD FOR LONG RUNNING COMPUTE USING BUFFERS AS TIMESLICES - A system and method for using command buffers as timeslices or periods of execution for a long running compute task on a graphics processor. Embodiments of the present invention allow execution of long running compute applications with operating systems that manage and schedule graphics processing unit (GPU) resources and that may have a predetermined execution time limit for each command buffer. The method includes receiving a request from an application and determining a plurality of command buffers required to execute the request. Each of the plurality of command buffers may correspond to some portion of execution time or timeslice. The method further includes sending the plurality of command buffers to an operating system operable for scheduling the plurality of command buffers for execution on a graphics processor. The command buffers from a different request are time multiplexed within the execution of the plurality of command buffers on the graphics processor. | 06-27-2013 |
20130187935 | LOW LATENCY CONCURRENT COMPUTATION - One embodiment of the present invention sets forth a technique for performing low latency computation on a parallel processing subsystem. A low latency functional node is exposed to an operating system. The low latency functional node and a generic functional node are configured to target the same underlying processor resource within the parallel processing subsystem. The operating system stores low latency tasks generated by a user application within a low latency command buffer associated with the low latency functional node. The parallel processing subsystem advantageously executes tasks from the low latency command buffer prior to completing execution of tasks in the generic command buffer, thereby reducing completion latency for the low latency tasks. | 07-25-2013 |
20140015843 | STENCIL DATA COMPRESSION SYSTEM AND METHOD AND GRAPHICS PROCESSING UNIT INCORPORATING THE SAME - A system and method for compressing stencil data attendant to rendering an image. In one embodiment, the method includes: (1) selecting a base stencil value for a particular group, (2) selecting a single-bit delta value for each sample in the particular group and (3) storing the stencil base value and the delta values in a frame buffer. | 01-16-2014 |
20140043333 | APPLICATION LOAD TIMES BY CACHING SHADER BINARIES IN A PERSISTENT STORAGE - A method for compiling a shader for execution by a graphics processor. The method comprises selecting a shader for execution. A key is computed for the selected shader. A memory is searched for a copy of the computed key. A shader binary stored in the memory is passed to the graphics processor for execution if the copy of the computed key is located in the memory. Otherwise, the shader is compiled to produce the shader binary for execution by the graphics processor, and the shader binary is stored in the memory. The shader binary is associated with the computed key and the copy of the computed key. | 02-13-2014 |
20140118363 | MANAGING DEFERRED CONTEXTS IN A CACHE TILING ARCHITECTURE - A method for managing bind-render-target commands in a tile-based architecture. The method includes receiving a requested set of bound render targets and a draw command. The method also includes, upon receiving the draw command, determining whether a current set of bound render targets includes each of the render targets identified in the requested set. The method further includes, if the current set does not include each render target identified in the requested set, then issuing a flush-tiling-unit-command to a parallel processing subsystem, modifying the current set to include each render target identified in the requested set, and issuing bind-render-target commands identifying the requested set to the tile-based architecture for processing. The method further includes, if the current set of render targets includes each render target identified in the requested set, then not issuing the flush-tiling-unit-command. | 05-01-2014 |
20140118373 | TECHNIQUES FOR MANAGING GRAPHICS PROCESSING RESOURCES IN A TILE-BASED ARCHITECTURE - One embodiment of the present invention sets forth a technique for managing graphics processing resources in a tile-based architecture. The technique includes storing a release packet associated with a graphics processing resource in a buffer and initiating a replay of graphics primitives stored in the buffer and associated with the graphics processing resource. The technique further includes, for each tile included in a plurality of tiles and processed during the replay, reading the release packet and determining whether the tile is a last tile processed during the replay. The technique further includes determining not to transmit the release packet to a screen-space pipeline and continuing to read graphics data stored in the buffer if the tile is not the last tile to be processed during the replay, or transmitting the release packet to the screen-space pipeline if the tile is the last tile to be processed during the replay. | 05-01-2014 |
20140122838 | WORK-QUEUE-BASED GRAPHICS PROCESSING UNIT WORK CREATION - One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks. | 05-01-2014 |
20140123144 | WORK-QUEUE-BASED GRAPHICS PROCESSING UNIT WORK CREATION - One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks. | 05-01-2014 |
20140168222 | OPTIMIZING TRIANGLE TOPOLOGY FOR PATH RENDERING - A technique for efficiently rendering path images tessellates path contours into triangle fans comprising a set of representative triangles. Topology of the set of representative triangles is then optimized for greater rasterization efficiency by applying a flip operator to selected triangle pairs within the set of representative triangles. The optimized triangle pairs are then rendered using a path rendering technique, such as stencil and cover. | 06-19-2014 |
20140184633 | CONSERVATIVE BOUNDING REGION RASTERIZATION - A method for rendering paths. The method includes accessing data comprising a path, stenciling the path, wherein a bounding region of a plurality of stencil samples updated during the stenciling is accumulated, and provoking GPU hardware to produce a rasterized region for covering the bounding region as one object without interior edges. | 07-03-2014 |
20140267366 | TARGET INDEPENDENT RASTERIZATION WITH MULTIPLE COLOR SAMPLES - A graphics processing pipeline within a parallel processing unit (PPU) is configured to perform path rendering by generating a collection of graphics primitives that represent each path to be rendered. The graphics processing pipeline determines the coverage of each primitive at a number of stencil sample locations within each different pixel. Then, the graphics processing pipeline reduces the number of stencil samples down to a smaller number of color samples, for each pixel. The graphics processing pipeline is configured to modulate a given color sample associated with a given pixel based on the color values of any graphics primitives that cover the stencil samples from which the color sample was reduced. The final color of the pixel is determined by downsampling the color samples associated with the pixel. | 09-18-2014 |
20140267373 | STENCIL THEN COVER PATH RENDERING WITH SHARED EDGES - One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges. | 09-18-2014 |
20140267374 | STENCIL THEN COVER PATH RENDERING WITH SHARED EDGES - One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges. | 09-18-2014 |
20140267375 | STENCIL THEN COVER PATH RENDERING WITH SHARED EDGES - One embodiment of the present invention includes techniques for rasterizing primitives that include edges shared between paths. For each edge, a rasterizer unit selects and applies a sample rule from multiple sample rules. If the edge is shared, then the selected sample rule causes each group of coverage samples associated with a single color sample to be considered as either fully inside or fully outside the edge. Consequently, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Advantageously, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with shared edges. | 09-18-2014 |
20140267386 | RENDERING COVER GEOMETRY WITHOUT INTERNAL EDGES - One embodiment of the present invention includes techniques for rasterizing geometries. First, a processing unit defines a bounding primitive that covers the geometry and does not include any internal edges. If the bounding primitive intersects any enabled clip plane, then the processing unit generates fragments to fill a current viewport. Alternatively, the processing unit generates fragments to fill the bounding primitive. Because the rasterized region includes no internal edges, conflation artifacts caused when the number of coverage samples per pixel exceeds the number of color samples per pixel may be reduced. In prior-art techniques, reducing such conflation artifacts typically involves increasing the number of color samples per pixel to equal the number of coverage samples per pixel. Consequently, the disclosed techniques enable rendering using algorithms that reduce the ratio of color to coverage samples, thereby decreasing memory consumption and memory bandwidth use, without causing conflation artifacts associated with cover geometries. | 09-18-2014 |
20150077420 | EFFICIENT SETUP AND EVALUATION OF FILLED CUBIC BEZIER PATHS - A graphics processing system includes a central processing unit that processes a cubic Bezier curve corresponding to a filled cubic Bezier path. Additionally, the graphics processing system includes a cubic preprocessor coupled to the central processing unit that formats the cubic Bezier curve to provide a formatted cubic Bezier curve having quadrilateral control points corresponding to a mathematically simple cubic curve. The graphics processing system further includes a graphics processing unit coupled to the cubic preprocessor that employs the formatted cubic Bezier curve in rendering the filled cubic Bezier path. A rendering unit and a display cubic Bezier path filling method are also provided. | 03-19-2015 |
20150084975 | LOAD/STORE OPERATIONS IN TEXTURE HARDWARE - Approaches are disclosed for performing memory access operations in a texture processing pipeline having a first portion configured to process texture memory access operations and a second portion configured to process non-texture memory access operations. A texture unit receives a memory access request. The texture unit determines whether the memory access request includes a texture memory access operation. If the memory access request includes a texture memory access operation, then the texture unit processes the memory access request via at least the first portion of the texture processing pipeline, otherwise, the texture unit processes the memory access request via at least the second portion of the texture processing pipeline. One advantage of the disclosed approach is that the same processing and cache memory may be used for both texture operations and load/store operations to various other address spaces, leading to reduced surface area and power consumption. | 03-26-2015 |
20150097847 | MANAGING MEMORY REGIONS TO SUPPORT SPARSE MAPPINGS - One embodiment of the present invention includes a memory management unit (MMU) that is configured to manage sparse mappings. The MMU processes requests to translate virtual addresses to physical addresses based on page table entries (PTEs) that indicate a sparse status. If the MMU determines that the PTE does not include a mapping from a virtual address to a physical address, then the MMU responds to the request based on the sparse status. If the sparse status is active, then the MMU determines the physical address based on whether the type of the request is a write operation and, subsequently, generates an acknowledgement of the request. By contrast, if the sparse status is not active, then the MMU generates a page fault. Advantageously, the disclosed embodiments enable the computer system to manage sparse mappings without incurring the performance degradation associated with both page faults and conventional software-based sparse mapping management. | 04-09-2015 |
20150154733 | STENCIL BUFFER DATA COMPRESSION - A raster operations (ROP) unit is configured to compress stencil values included in a stencil buffer. The ROP unit divides the stencil values into groups, subdivides each group into two halves, and selects an anchor value for each half. If the difference between each of the stencil values and the corresponding anchor lies within an offset range, and the difference between the two anchors lies within a delta range, then the group is compressible. For a compressible group, the ROP unit encodes the anchor value, offsets from anchors, and an anchor delta. This encoding enables the ROP unit to operate on the compressed group instead of the uncompressed stencil values, reducing the number of memory and computational operations associated with the stencil values. Consequently, the ROP unit reduces memory bandwidth use, reduces power consumption, and increases rendering rate compared to conventional ROP units that implement less flexible compression techniques. | 06-04-2015 |
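The anchor-and-offset scheme summarized in the last entry above (20150154733) can be illustrated with a minimal Python sketch. It is only an illustration of the idea in the abstract, not the patented hardware encoding: the bit widths behind `OFFSET_RANGE` and `DELTA_RANGE`, the choice of the minimum value as each half's anchor, and the function names `compress_group` and `decompress_group` are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Assumed ranges (not from the abstract): 2-bit unsigned offsets, 4-bit signed delta.
OFFSET_RANGE = range(0, 4)
DELTA_RANGE = range(-8, 8)

# (anchor of first half, delta to second anchor, per-sample offsets)
Encoded = Tuple[int, int, List[int]]


def compress_group(values: List[int]) -> Optional[Encoded]:
    """Try to encode one even-sized group of stencil values; return None if incompressible."""
    half = len(values) // 2
    first, second = values[:half], values[half:]
    # Assumed anchor choice: the minimum of each half, so offsets are non-negative.
    anchor_first, anchor_second = min(first), min(second)
    offsets = [v - anchor_first for v in first] + [v - anchor_second for v in second]
    delta = anchor_second - anchor_first
    # The group is compressible only if every offset and the anchor delta fit their ranges.
    if all(o in OFFSET_RANGE for o in offsets) and delta in DELTA_RANGE:
        return anchor_first, delta, offsets
    return None


def decompress_group(encoded: Encoded) -> List[int]:
    """Rebuild the original stencil values from an encoded group."""
    anchor_first, delta, offsets = encoded
    half = len(offsets) // 2
    anchor_second = anchor_first + delta
    return ([anchor_first + o for o in offsets[:half]]
            + [anchor_second + o for o in offsets[half:]])
```

For example, under these assumed ranges the group `[5, 6, 5, 7, 9, 9, 10, 12]` encodes as anchor 5, delta 4, and offsets `[0, 1, 0, 2, 0, 0, 1, 3]`, while a group containing an outlier such as 200 falls back to uncompressed storage.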
Patent application number | Description | Published |
20140267315 | MULTI-SAMPLE SURFACE PROCESSING USING ONE SAMPLE - A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and an encoding state associated with the multi-sample pixel data is determined. Data for one sample of a multi-sample pixel and the encoding state are provided to a processing unit. The one sample of the multi-sample pixel is processed by the processing unit to generate processed data for the one sample that represents processed multi-sample pixel data for all samples of the multi-sample pixel or two or more samples of the multi-sample pixel. | 09-18-2014 |
20140267356 | MULTI-SAMPLE SURFACE PROCESSING USING SAMPLE SUBSETS - A system, method, and computer program product are provided for multi-sample processing. The multi-sample pixel data is received and is analyzed to identify subsets of samples of a multi-sample pixel that have equal data, such that data for one sample in a subset represents multi-sample pixel data for all samples in the subset. An encoding state is generated that indicates which samples of the multi-sample pixel are included in each one of the subsets. | 09-18-2014 |
20140267376 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR ACCESSING MULTI-SAMPLE SURFACES - A system, method, and computer program product are provided for accessing multi-sample surfaces. A multi-sample store instruction that specifies data for a single sample of a multi-sample pixel and a sample mask is received and the data for the single sample is stored to each sample of the multi-sample pixel that is enabled according to the sample mask. A multi-sample load instruction that specifies a multi-sample pixel is received, and, in response to executing the multi-sample load instruction, data for one sample of the multi-sample pixel is received. A determination is made that the data for the one sample of the multi-sample pixel represents multi-sample pixel data for at least one additional sample of the multi-sample pixel. | 09-18-2014 |
20150054836 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR REDISTRIBUTING A MULTI-SAMPLE PROCESSING WORKLOAD BETWEEN THREADS - A system, method, and computer program product are provided for redistributing multi-sample processing workloads between threads. A workload for a plurality of multi-sample pixels is received and each thread in a parallel thread group is associated with a corresponding multi-sample pixel of the plurality of pixels. The workload is redistributed between the threads in the parallel thread group based on a characteristic of the workload and the workload is processed by the parallel thread group. In one embodiment, the characteristic is rasterized coverage information for the plurality of multi-sample pixels. | 02-26-2015 |
20150070380 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR USING COMPRESSION WITH PROGRAMMABLE SAMPLE LOCATIONS - A system, method, and computer program product are provided for using compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a sample pattern table and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of a display surface. An instruction to store a second value specifying the programmed sample location within the pixel in the sample pattern table is received. The attribute is reconstructed based on the geometric surface parameters and the first value. | 03-12-2015 |
20150070381 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR USING COMPRESSION WITH PROGRAMMABLE SAMPLE LOCATIONS - A system, method, and computer program product are provided for using compression with programmable sample locations, where the compression is a function of the programmable sample locations. The method includes the steps of storing a first value specifying a programmed sample location within a pixel in a first sample pattern table that is associated with a first display surface and storing, in a memory, geometric surface parameters corresponding to a first attribute at the programmed sample location within a first pixel of the first display surface. A second value specifying the programmed sample location within the pixel in a second sample pattern table that is associated with a second display surface is also stored and the first attribute is reconstructed based on the geometric surface parameters and the first value. | 03-12-2015 |
20150138228 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IMPLEMENTING ANTI-ALIASING OPERATIONS USING A PROGRAMMABLE SAMPLE PATTERN TABLE - A system, method, and computer program product are provided for implementing anti-aliasing operations using a programmable sample pattern table. The method includes the steps of receiving an instruction that causes one or more values to be stored in one or more corresponding entries of the programmable sample pattern table and performing an anti-aliasing operation based on at least one value stored in the programmable sample pattern table. At least one value is selected from the programmable sample pattern table based on, at least in part, a location of one or more corresponding pixels. | 05-21-2015 |
20160035129 | CONTROL OF A SAMPLE MASK FROM A FRAGMENT SHADER PROGRAM - A method, system, and computer program product for controlling a sample mask from a fragment shader are disclosed. The method includes the steps of generating a fragment for each pixel that is covered, at least in part, by a primitive and determining coverage information for each fragment corresponding to the primitive. Then, for each fragment, the method includes the steps of generating a sample mask by a fragment shader, replacing the coverage information for the fragment with the sample mask, and writing, based on the sample mask, a result generated by the fragment shader to a memory. The method may be implemented on a parallel processing unit configured to implement, at least in part, a graphics processing pipeline. | 02-04-2016 |
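The sample-subset idea in entry 20140267356 above (samples of a multi-sample pixel with equal data are grouped, and an encoding state records which samples belong to which subset) can be sketched in a few lines of Python. This is a hedged illustration only: the representation of the encoding state as a list of subset indices, and the function names `encode_pixel` and `decode_pixel`, are assumptions for this example and are not taken from the application.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def encode_pixel(samples: Sequence[T]) -> Tuple[List[T], List[int]]:
    """Group equal samples into subsets.

    Returns (representatives, state), where representatives holds one value per
    subset and state[i] is the subset index of sample i (an assumed encoding).
    """
    representatives: List[T] = []
    state: List[int] = []
    for sample in samples:
        if sample not in representatives:
            representatives.append(sample)
        state.append(representatives.index(sample))
    return representatives, state


def decode_pixel(representatives: Sequence[T], state: Sequence[int]) -> List[T]:
    """Expand one value per subset back to a value for every sample."""
    return [representatives[index] for index in state]
```

With this representation, a processing step only needs to touch one value per subset; a pixel whose samples are all equal collapses to a single representative, which corresponds to the one-sample case described in entry 20140267315.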
Patent application number | Description | Published |
20080302336 | Fuel Injection Valve - The fuel injection device provides a particularly effective sound-decoupling construction. The fuel injection device has at least one fuel injection valve, a receptacle bore for the fuel injection valve in a cylinder head, and a fuel distributor line having a fitting in which the fuel injection valve is placed in partially overlapping fashion. A connecting element is situated in the receptacle bore such that the fuel injection valve is held in the connecting element such that the fuel injection valve and the connecting element are held so that they do not contact any surfaces or walls of the receptacle bore of the cylinder head that do not run axially parallel to the fuel injection valve. For this purpose, the connecting element is attached immediately on the fitting of the fuel distributor line. | 12-11-2008 |
20090056674 | HOLD-DOWN DEVICE FOR A FUEL INJECTION DEVICE, AND FUEL INJECTION DEVICE - The hold-down device for a fuel injection device is distinguished by a particularly simple design that nonetheless permits a very effective holding down of a fuel injection valve. The fuel injection device includes at least one fuel injection valve, a receptacle bore for the fuel injection valve, and a connecting fitting of a fuel distributor line, the hold-down device being clamped between a shoulder of the fuel injection valve and an end surface of the connecting fitting. The hold-down device has a partially annular base element from which there extends, in a bent-away fashion, an axially flexible hold-down clip that has at least two webs, two oblique segments, and two support segments. The fuel injection valve is particularly suitable for use in fuel injection systems of mixture-compressing externally ignited internal combustion engines. | 03-05-2009 |
20110186016 | HOLD-DOWN DEVICE FOR A FUEL INJECTION DEVICE - The hold-down device for a fuel injection device has a design which is simple in particular, which nonetheless enables a fuel injector ( | 08-04-2011 |
20130319375 | TUBULAR PRESSURE ACCUMULATOR, IN PARTICULAR FOR MIXTURE-COMPRESSING, SPARK-IGNITED INTERNAL COMBUSTION ENGINES - A tubular pressure accumulator which is used, in particular, as a fuel distribution rail for a mixture-compressing, spark-ignited internal combustion engine includes a tubularly bent metal wall. In this way, longitudinal sides, which are assigned to one another, of the tubularly bent metal wall are connected to one another through a weld. Furthermore, the tubularly bent metal wall has at least one design feature implemented by the machining of the flat metal wall and the bending of the metal wall, which take place prior to the welding. | 12-05-2013 |
20150330347 | SYSTEM HAVING A FUEL DISTRIBUTOR AND MULTIPLE FUEL INJECTORS - A system, which is used especially as a fuel injection system for the high-pressure injection in internal combustion engines, includes a fuel distributor and a plurality of fuel injectors. Each fuel injector is situated on a cup of the fuel distributor. At least one of the fuel injectors is fastened to the associated cup by a holding element. The holding element has an at least essentially straight first leg and an at least essentially straight second leg. The cup includes at least one recess, which extends through a wall of the cup. The first leg and the second leg are guided through the at least one recess. Furthermore, the connection sleeve of the fuel injector has a collar, which is braced on the first leg of the holding element and on the second leg of the holding element in order to secure the fuel injector on the cup. This makes it possible to fasten the fuel injector on the cup in a reliable manner. | 11-19-2015 |