Patent application number | Description | Published |
20080238920 | Color Buffer Contrast Threshold for Adaptive Anti-Aliasing - According to one embodiment of the invention, by increasing the number of rays issued through adjacent pixels with colors of high contrast while maintaining the number of rays issued through adjacent pixels which do not have colors of high contrast, a ray tracing image processing system may render an anti-aliased image while minimizing the increase in workload experienced by the image processing system. Additionally, according to another embodiment of the invention, by maintaining the number of rays issued through adjacent pixels which have colors of low contrast while increasing the number of rays issued through adjacent pixels which do not have colors of low contrast, the image processing system may reduce workload experienced while performing ray tracing while maintaining the quality of the rendered image. | 10-02-2008 |
20080263339 | Method and Apparatus for Context Switching and Synchronization - A method, computer-readable medium, and apparatus for context switching between a first thread and a second thread. The method includes detecting an exception, wherein the exception is generated in response to receiving a packet of information directed to one of the first thread and the second thread, and in response to detecting the exception, invoking an exception handler. The exception handler is configured to execute one or more instructions removing access to at least a portion of a processor cache. The portion of the processor cache contains cached information for the first thread using a first address translation. Removing access to the portion of the processor cache prevents the second thread using a second address translation from accessing the cached information in the processor cache. The exception handler is also configured to branch to at least one of the first thread and the second thread. | 10-23-2008 |
20080307147 | COMPUTER SYSTEM BUS BRIDGE - A bus bridge between a high speed computer processor bus and a high speed output bus. The preferred embodiment is a bus bridge between a GPUL bus for a GPUL PowerPC microprocessor from International Business Machines Corporation (IBM) and an output high speed interface (MPI). Another preferred embodiment is a bus bridge in a bus transceiver on a multi-chip module. | 12-11-2008 |
20090015589 | Store Misaligned Vector with Permute - Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized. | 01-15-2009 |
20090019228 | Data Cache Invalidate with Data Dependent Expiration Using a Step Value - According to embodiments of the invention, a step value and a step-interval cache coherency protocol may be used to update and invalidate data stored within cache memory. A step value may be an integer value and may be stored within a cache directory entry associated with data in the memory cache. Upon reception of a cache read request, along with the normal address comparison to determine if the data is located within the cache a current step value may be compared with the stored step value to determine if the data is current. If the step values match, the data may be current and a cache hit may occur. However, if the step values do not match, the requested data may be provided from another source. Furthermore, an application may update the current step value to invalidate old data stored within the cache and associated with a different step value. | 01-15-2009 |
20090033653 | Adaptive Sub-Sampling for Reduction in Issued Rays - According to one embodiment of the invention, by increasing the number of rays issued through adjacent pixels with colors of high contrast while maintaining the number of rays issued through adjacent pixels which do not have colors of high contrast, a ray tracing image processing system may render an anti-aliased image while minimizing the increase in workload experienced by the image processing system. Additionally, according to another embodiment of the invention, by maintaining the number of rays issued through adjacent pixels which have colors of low contrast while increasing the number of rays issued through adjacent pixels which do not have colors of low contrast, the image processing system may reduce workload experienced while performing ray tracing while maintaining the quality of the rendered image. | 02-05-2009 |
20090037694 | Load Misaligned Vector with Permute and Mask Insert - Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized. | 02-05-2009 |
20090063608 | Full Vector Width Cross Product Using Recirculation for Area Optimization - Embodiments of the invention are generally related to the field of image processing, and more specifically to vector units for supporting image processing. A vector unit may comprise a plurality of operand multiplexers associated with each vector processing lane of the vector unit. The operand multiplexers may select vector operands from one or more register files for performing a cross product operation. A first multiply operation may be performed in a first pipeline stage by multiplying a first set of operands in a multiplier. In a second pipeline stage, a second multiply operation may be performed by multiplying a second set of operands. The results of the first multiply operation and the second multiply operation may be transferred to an adder to complete the cross product instruction. | 03-05-2009 |
20090070398 | Method and Apparatus for an Area Efficient Transcendental Estimate Algorithm - A method, computer-readable medium, and an apparatus for generating a transcendental value. The method includes receiving an input containing an input value and an opcode and determining whether the opcode corresponds to a trigonometric operation or a power-of-two operation. The method also includes calculating a fractional value and an integer value from the input value, generating the transcendental value based on the fractional value by adding at least a portion of the fractional value with at least one of a shifted fractional value produced by shifting the portion of the fractional value and a constant value, and providing the transcendental value in response to the request. In this fashion, the same circuit area may be used to carry out both trigonometric and power-of-two calculations, leading to greater circuit area savings and performance advantages while not sacrificing significant accuracy. | 03-12-2009 |
20090073167 | Cooperative Utilization of Spacial Indices Between Application and Rendering Hardware - According to embodiments of the invention, a data structure may be created which may be used by both a ray tracing unit and by a rendering engine. The data structure may have an initial or upper portion representing bounding volumes which partition a three-dimensional scene and a second or lower portion representing objects within the three-dimensional scene. The integrated acceleration data structure may be used by a rendering engine to render a two-dimensional image from a three-dimensional scene, and by a ray tracing unit to perform intersection tests. | 03-19-2009 |
20090106525 | DESIGN STRUCTURE FOR SCALAR PRECISION FLOAT IMPLEMENTATION ON THE "W" LANE OF VECTOR UNIT - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for image processing, and more specifically to vector units for supporting image processing is provided. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved. | 04-23-2009 |
20090106527 | Scalar Precision Float Implementation on the "W" Lane of Vector Unit - Embodiments of the invention are generally related to image processing, and more specifically to vector units for supporting image processing. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved. | 04-23-2009 |
20090113181 | Method and Apparatus for Executing Instructions - A method and apparatus for executing instructions in a processor are provided. In one embodiment of the invention, the method includes receiving a plurality of instructions. The plurality of instructions includes first instructions in a first thread and second instructions in a second thread. The method further includes forming a common issue group including an instruction of a first instruction type and an instruction of a second instruction type. The method also includes issuing the common issue group to a first execution unit and a second execution unit. The instruction of the first instruction type is issued to the first execution unit and the instruction of the second instruction type is issued to the second execution unit. | 04-30-2009 |
20090150647 | Processing Unit Incorporating Vectorizable Execution Unit - A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units. | 06-11-2009 |
20090150648 | Vector Permute and Vector Register File Write Mask Instruction Variant State Extension for RISC Length Vector Instructions - Embodiments of the invention generally relate to the field of image processing, and more specifically to instructions and hardware for supporting image processing. An integrated processing unit configured to process vector instructions and vector permute instructions is provided. A vector permute instruction may be issued to the integrated processing unit to set controls of one or more multiplexers so that the multiplexers rearrange the results of a subsequent vector instruction. | 06-11-2009 |
20090179902 | Dynamic Data Type Aligned Cache Optimized for Misaligned Packed Structures - A method and apparatus for processing vector data is provided. A processing core may have a data cache and a relatively smaller vector data cache. The vector data cache may be optimally sized to store vector data structures that are smaller than full data cache lines. | 07-16-2009 |
20090182987 | Processing Unit Incorporating Multirate Execution Unit - A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit being capable of clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate to increase overall instruction throughput. | 07-16-2009 |
20090187734 | Efficient Texture Processing of Pixel Groups with SIMD Execution Unit - A circuit arrangement and method perform concurrent texture processing of groups of pixels with a single instruction multiple data (SIMD) execution unit to improve the utilization of the SIMD execution unit when performing scalar operations associated with a texture processing algorithm. In addition, when utilized in connection with a multi-threaded SIMD execution unit, groups of pixels may be concurrently processed in different threads executed by the SIMD execution unit to further maximize the utilization of the SIMD execution unit by reducing the adverse effects of dependencies in scalar and/or vector operations incorporated into a texture processing algorithm. | 07-23-2009 |
20090231348 | Image Processing with Highly Threaded Texture Fragment Generation - A circuit arrangement and method support a multithreaded rendering architecture capable of dynamically routing pixel fragments from a pixel fragment generator to any pixel shader from among a pool of pixel shaders. The pixel fragment generator is therefore not tied to a specific pixel shader, but is instead able to utilize multiple pixel shaders in a pool of pixel shaders to minimize bottlenecks and improve overall hardware utilization and performance during image processing. | 09-17-2009 |
20090231349 | Rolling Context Data Structure for Maintaining State Data in a Multithreaded Image Processing Pipeline - A multithreaded rendering software pipeline architecture utilizes a rolling context data structure to store multiple contexts that are associated with different image elements that are being processed in the software pipeline. Each context stores state data for a particular image element, and the association of each image element with a context is maintained as the image element is passed from stage to stage of the software pipeline, thus ensuring that the state used by the different stages of the software pipeline when processing the image element remains coherent irrespective of state changes made for other image elements being processed by the software pipeline. Multiple image elements may therefore be processed concurrently by the software pipeline, and often without regard for synchronization or serialization of state changes that affect only certain image elements. | 09-17-2009 |
20090256836 | HYBRID RENDERING OF IMAGE DATA UTILIZING STREAMING GEOMETRY FRONTEND INTERCONNECTED TO PHYSICAL RENDERING BACKEND THROUGH DYNAMIC ACCELERATED DATA STRUCTURE GENERATOR - A circuit arrangement and method provide a hybrid rendering architecture capable of interfacing a streaming geometry frontend with a physical rendering backend using a dynamic accelerated data structure (ADS) generator. The dynamic ADS generator effectively parallelizes the generation of the ADS, such that an ADS may be built using a plurality of parallel threads of execution. By doing so, both the frontend and backend rendering processes are amendable to parallelization, and enabling if so desired real time rendering using physical rendering techniques such as ray tracing and photon mapping. Furthermore, conventional streaming geometry frontends such as OpenGL and DirectX compatible frontends can readily be adapted for use with physical rendering backends, thereby enabling developers to continue to develop with known API's, yet still obtain the benefits of physical rendering techniques. | 10-15-2009 |
20090315908 | Anisotropic Texture Filtering with Texture Data Prefetching - A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm. | 12-24-2009 |
20100100712 | Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic - A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation. | 04-22-2010 |
20100115250 | CONTEXT SWITCHING AND SYNCHRONIZATION - A method, computer-readable medium, and apparatus for context switching between a first thread and a second thread. The method includes detecting an exception, wherein the exception is generated in response to receiving a packet of information directed to one of the first thread and the second thread, and in response to detecting the exception, invoking an exception handler. The exception handler is configured to execute one or more instructions removing access to at least a portion of a processor cache. The portion of the processor cache contains cached information for the first thread using a first address translation. Removing access to the portion of the processor cache prevents the second thread using a second address translation from accessing the cached information in the processor cache. The exception handler is also configured to branch to at least one of the first thread and the second thread. | 05-06-2010 |
20100188396 | Updating Ray Traced Acceleration Data Structures Between Frames Based on Changing Perspective - A method, program product and system for conducting a ray tracing operation where the rendering compute requirement is reduced or otherwise adjusted in response to a changing vantage point. Aspects may update or reuse an acceleration data structure between frames in response to the changing vantage point. Tree and image construction quality may be adjusted in response to rapid changes in the camera perspective. Alternatively or additionally, tree building cycles may be skipped. All or some of the tree structure may be built in intervals, e.g., after a preset number of frames. More geometric image data may be added per leaf node in the tree in response to an increase in the rate of change. The quality of the rendering algorithm may additionally be reduced. A ray tracing algorithm may decrease the depth of recursion, and generate fewer cast and secondary rays. The ray tracer may further reduce the quality of soft shadows, resolution and global illumination samples, among other quality parameters. Alternatively, tree rebuilding may be skipped entirely in response to a high camera rate of change. Associated processes may create blur between frames to simulate motion blur. | 07-29-2010 |
20100188403 | Tree Insertion Depth Adjustment Based on View Frustrum and Distance Culling - A method, program product and system for conducting a ray tracing operation where the rendering compute requirement is reduced by varying the size of bounding volumes into which image data is divided and/or by varying a number of primitives included within nodes of an acceleration data structure that correspond to the bounding volumes. | 07-29-2010 |
20100228781 | Resetting of Dynamically Grown Accelerated Data Structure - A circuit arrangement, program product and method are provided for resetting a dynamically grown Accelerated Data Structure (ADS) used in image processing in which an ADS is initialized by reusing the root node of a prior ADS and resetting at least one node in the prior ADS to break a link between the reset node and a linked-to node in the prior ADS. By doing so, the memory allocated to the prior ADS may be reused for the new ADS, without having to clear or wipe out all of the allocated memory. In addition, in some instances, given the similarity of many image frames, often some or all of the node structure of a prior ADS may be reused for a new ADS, requiring only the contents of nodes to be cleared, instead of having to clear out all of the nodes in the prior ADS. As a result, the processing overhead associated with initializing a new ADS can be significantly reduced. | 09-09-2010 |
20100333099 | MESSAGE SELECTION FOR INTER-THREAD COMMUNICATION IN A MULTITHREADED PROCESSOR - A method and circuit arrangement process a workload in a multithreaded processor that includes a plurality of hardware threads. Each thread receives at least one message carrying data to process the workload through a respective inbox from among a plurality of inboxes. A plurality of messages are received at a first inbox among the plurality of inboxes, wherein the first inbox is associated with a first thread among the plurality of hardware threads, and wherein each message is associated with a priority. From the plurality of received messages, a first message is selected to process in the first thread based on that first message being associated with the highest priority among the received messages. A second message is selected to process in the first thread based on that second message being associated with the earliest time stamp among the received messages and in response to processing the first message. | 12-30-2010 |
20110063285 | RENDERING OF STEREOSCOPIC IMAGES WITH MULTITHREADED RENDERING SOFTWARE PIPELINE - A circuit arrangement, program product and circuit arrangement render stereoscopic images in a multithreaded rendering software pipeline using first and second rendering channels respectively configured to render left and right views for the stereoscopic image. Separate transformations are applied to received vertex data to generate transformed vertex data for use by each of the first and second rendering channels in rendering the left and right views for the stereoscopic image. | 03-17-2011 |
20110283086 | STREAMING PHYSICS COLLISION DETECTION IN MULTITHREADED RENDERING SOFTWARE PIPELINE - A circuit arrangement, program product and method stream level of detail components between hardware threads in a multithreaded circuit arrangement to perform physics collision detection. Typically, a master hardware thread, e.g., a component loader hardware thread, is used to retrieve level of detail data for an object from a memory and stream the data to one or more slave hardware threads, e.g., collision detection hardware threads, to perform the actual collision detection. Because the slave hardware threads receive the level of detail data from the master thread, typically the slave hardware threads are not required to load the data from the memory, thereby reducing memory bandwidth requirements and accelerating performance. | 11-17-2011 |
20120169755 | ANISOTROPIC TEXTURE FILTERING WITH TEXTURE DATA PREFETCHING - A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm. | 07-05-2012 |
20120236001 | Tree Insertion Depth Adjustment Based on View Frustrum and Distance Culling - A computer-implemented method includes initializing a driver associated with an input/output adapter in response to receiving an initialize driver request from a client application. The computer-implemented method includes initializing the input/output adapter to enable adapter capabilities of the input/output adapter to be determined. The computer-implemented method also includes determining the adapter capabilities of the input/output adapter. The computer-implemented method further includes determining slot capabilities of a slot associated with the input/output adapter. The computer-implemented method also includes setting configurable capabilities of the input/output adapter based on the adapter capabilities and the slot capabilities. | 09-20-2012 |