| Patent application number | Description | Published |
| --- | --- | --- |
| 20090049113 | Method and Apparatus for Implementing a Multiple Operand Vector Floating Point Summation to Scalar Function - Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises computing an arithmetic result of a pair of operands in each processing lane of a vector unit. The arithmetic results generated in each processing lane of the vector unit may be transferred to a dot product unit. The dot product unit may compute an arithmetic result using the arithmetic result computed by each processing lane of the vector unit to generate an arithmetic result of more than two operands. | 02-19-2009 |
| 20090063608 | Full Vector Width Cross Product Using Recirculation for Area Optimization - Embodiments of the invention are generally related to the field of image processing, and more specifically to vector units for supporting image processing. A vector unit may comprise a plurality of operand multiplexers associated with each vector processing lane of the vector unit. The operand multiplexers may select vector operands from one or more register files for performing a cross product operation. A first multiply operation may be performed in a first pipeline stage by multiplying a first set of operands in a multiplier. In a second pipeline stage, a second multiply operation may be performed by multiplying a second set of operands. The results of the first multiply operation and the second multiply operation may be transferred to an adder to complete the cross product instruction. | 03-05-2009 |
| 20090070398 | Method and Apparatus for an Area Efficient Transcendental Estimate Algorithm - A method, computer-readable medium, and an apparatus for generating a transcendental value. The method includes receiving an input containing an input value and an opcode and determining whether the opcode corresponds to a trigonometric operation or a power-of-two operation. The method also includes calculating a fractional value and an integer value from the input value, generating the transcendental value based on the fractional value by adding at least a portion of the fractional value with at least one of a shifted fractional value produced by shifting the portion of the fractional value and a constant value, and providing the transcendental value in response to the request. In this fashion, the same circuit area may be used to carry out both trigonometric and power-of-two calculations, leading to greater circuit area savings and performance advantages while not sacrificing significant accuracy. | 03-12-2009 |
| 20090083357 | Method and Apparatus Implementing a Floating Point Weighted Average Function - A method, computer-readable medium, and an apparatus for implementing a floating point weighted average function. The method includes receiving an input containing 2 | 03-26-2009 |
| 20090150647 | Processing Unit Incorporating Vectorizable Execution Unit - A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution unit or as a plurality of scalar execution units. | 06-11-2009 |
| 20090182986 | Processing Unit Incorporating Issue Rate-Based Predictive Thermal Management - A circuit arrangement and method utilize an issue rate-based predictive thermal management technique in a microprocessor or other integrated circuit that tracks the rate at which instructions are issued to one or more execution units in the processing unit, and selectively delays the issuance of subsequent instructions to the execution unit(s) based upon the tracked issue rate to predictively control the thermal output of the integrated circuit. | 07-16-2009 |
| 20090182987 | Processing Unit Incorporating Multirate Execution Unit - A multirate execution unit is capable of being operated in a plurality of modes, with the execution unit capable of being clocked at multiple different rates relative to a multithreaded issue unit such that, in applications where maximum performance is desired, the execution unit can be clocked at a rate that is faster than the clock rate for the multithreaded issue unit, and in applications where a lower power profile is desired, the execution unit can be throttled back to a slower rate to reduce the power consumption of the execution unit. When the execution unit is clocked at a faster rate than the multithreaded issue unit, the issue unit is permitted to issue more instructions per cycle than when the execution unit is throttled to the slower rate to increase overall instruction throughput. | 07-16-2009 |
| 20090240920 | Execution Unit with Data Dependent Conditional Write Instructions - An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch prediction mispredict penalties, enabling improved performance with z-buffer test and similar types of algorithms. | 09-24-2009 |
| 20090292907 | Dynamic Merging of Pipeline Stages in an Execution Pipeline to Reduce Power Consumption - A pipelined execution unit incorporates one or more low power modes that reduce power consumption by dynamically merging pipeline stages in an execution pipeline together with one another. In particular, the execution logic in successive pipeline stages in an execution pipeline may be dynamically merged together by setting one or more latches that are intermediate to such pipeline stages to a transparent state such that the output of the pipeline stage preceding such latches is passed to the subsequent pipeline stage during the same clock cycle so that both such pipeline stages effectively perform steps for the same instruction during each clock cycle. Then, with the selected pipeline stages merged, the power consumption of the execution pipeline can be reduced (e.g., by reducing the clock frequency and/or operating voltage of the execution pipeline), often with minimal adverse impact on performance. | 11-26-2009 |
| 20090293061 | Structural Power Reduction in Multithreaded Processor - A circuit arrangement and method utilize a plurality of execution units having different power and performance characteristics and capabilities within a multithreaded processor core, and selectively route instructions having different performance requirements to different execution units based upon those performance requirements. As such, instructions that have high performance requirements, such as instructions associated with primary tasks or time sensitive tasks, can be routed to a higher performance execution unit to maximize performance when executing those instructions, while instructions that have low performance requirements, such as instructions associated with background tasks or non-time sensitive tasks, can be routed to a reduced power execution unit to reduce the power consumption (and associated heat generation) associated with executing those instructions. | 11-26-2009 |
| 20090300335 | Execution Unit With Inline Pseudorandom Number Generator - A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG. In many instances, an instruction executed by an execution unit may be able to perform an arithmetic operation using both an operand specified by the instruction and a pseudorandom number generated by the PRNG during the execution of the instruction, so that the generation of the pseudorandom number and the performance of the arithmetic operation occur during a single pass of an execution unit. | 12-03-2009 |
| 20090315908 | Anisotropic Texture Filtering with Texture Data Prefetching - A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm. | 12-24-2009 |
| 20100031009 | Floating Point Execution Unit for Calculating a One Minus Dot Product Value in a Single Pass - A floating point execution unit calculates a one minus dot product value in a single pass. As such, the dependency that otherwise would be required to perform the calculations is eliminated, resulting in a substantially faster performance of such calculations. The floating point execution unit may be used, for example, to accelerate pixel shading algorithms such as Fresnel and electron microscope effects. | 02-04-2010 |
| 20100100712 | Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic - A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation. | 04-22-2010 |
| 20100125719 | Instruction Target History Based Register Address Indexing - A circuit arrangement and method support instruction target history based register address indexing, whereby register addresses to be used by an instruction are decoded using a target history table of previous target register addresses, and an index into the target history table supplied by an index value in the instruction. An instruction may include at least one index value that identifies a previously used register address. During execution of the instruction, the index is retrieved from the instruction, and then a register address is retrieved from the target history table using the index. | 05-20-2010 |
| 20100188396 | Updating Ray Traced Acceleration Data Structures Between Frames Based on Changing Perspective - A method, program product and system for conducting a ray tracing operation where the rendering compute requirement is reduced or otherwise adjusted in response to a changing vantage point. Aspects may update or reuse an acceleration data structure between frames in response to the changing vantage point. Tree and image construction quality may be adjusted in response to rapid changes in the camera perspective. Alternatively or additionally, tree building cycles may be skipped. All or some of the tree structure may be built in intervals, e.g., after a preset number of frames. More geometric image data may be added per leaf node in the tree in response to an increase in the rate of change. The quality of the rendering algorithm may additionally be reduced. A ray tracing algorithm may decrease the depth of recursion, and generate fewer cast and secondary rays. The ray tracer may further reduce the quality of soft shadows, resolution and global illumination samples, among other quality parameters. Alternatively, tree rebuilding may be skipped entirely in response to a high camera rate of change. Associated processes may create blur between frames to simulate motion blur. | 07-29-2010 |
| 20100188403 | Tree Insertion Depth Adjustment Based on View Frustrum and Distance Culling - A method, program product and system for conducting a ray tracing operation where the rendering compute requirement is reduced by varying the size of bounding volumes into which image data is divided and/or by varying a number of primitives included within nodes of an acceleration data structure that correspond to the bounding volumes. | 07-29-2010 |
| 20100191937 | Implied Storage Operation Decode Using Redundant Target Address Detection - A logic arrangement and method to support implied storage operation decode uses redundant target address detection, whereby target addresses of previous instructions are compared with the target address of the current instruction. If the addresses are equal and the target addresses of the previous instructions have not been used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions to be used for other instructions. | 07-29-2010 |
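
The decode rule described in the last entry (20100191937) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented logic: the instruction representation, the function name `decode_stream`, the labels `"store"` and `"normal"`, and the unbounded history of earlier targets are all hypothetical choices made for illustration. The model also assumes that an instruction decoded as a store leaves the earlier register write pending.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical instruction model: (target register, source registers).
Instruction = Tuple[str, Tuple[str, ...]]


def decode_stream(instructions: Iterable[Instruction]) -> List[str]:
    """Label each instruction "store" or "normal".

    An instruction is labeled "store" when its target matches the target
    of an earlier instruction and that target has not been read as a
    source since it was written (assumption: unbounded history).
    """
    unread_targets: Set[str] = set()  # written but not yet read
    decoded: List[str] = []
    for target, sources in instructions:
        # Reading a register means its earlier write was not redundant.
        unread_targets.difference_update(sources)
        if target in unread_targets:
            # Redundant target detected: reinterpret as a store.
            decoded.append("store")
        else:
            decoded.append("normal")
            unread_targets.add(target)
    return decoded


example = [
    ("r1", ("r2", "r3")),  # first write to r1
    ("r1", ("r4",)),       # r1 rewritten without being read
    ("r5", ("r1",)),       # r1 is read here
    ("r1", ("r6",)),       # r1 rewritten after being read
]
print(decode_stream(example))
```

In this model only the second instruction is reinterpreted, because it is the only one whose target was written earlier and never read in between.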