| Patent application number | Description | Published (MM-DD-YYYY) |
|---|---|---|
| 20090158013 | Method and Apparatus Implementing a Minimal Area Consumption Multiple Addend Floating Point Summation Function in a Vector Microprocessor - Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may then be transferred from the vector unit to a dot product unit, where an arithmetic operation using all of the operands may be performed. | 06-18-2009 |
| 20090182990 | Method and Apparatus for a Pipelined Multiple Operand Minimum and Maximum Function - Embodiments of the invention provide methods and apparatus for executing multiple operand minimum or maximum instructions. Executing a multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine the greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, where, in a second compare operation, the transferred operand is compared to at least a third operand to determine the greatest or smallest of the more than two operands. | 07-16-2009 |
| 20100023568 | Dynamic Range Adjusting Floating Point Execution Unit - A floating point execution unit is capable of selectively repurposing a subset of the significand bits in a floating point value for use as additional exponent bits to dynamically provide an extended range for floating point calculations. A significand field of a floating point operand may be considered to include first and second portions, with the first portion capable of being concatenated with the second portion to represent the significand for a floating point value, or, to provide an extended range, being concatenated with the exponent field of the floating point operand to represent the exponent for a floating point value. | 01-28-2010 |
| 20100042812 | Data Dependent Instruction Decode - A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode. | 02-18-2010 |
| 20100042813 | Redundant Execution of Instructions in Multistage Execution Pipeline During Unused Execution Cycles - A pipelined execution unit uses the bubbles that occur during execution to selectively repeat operations performed in one or more stages of a multistage execution pipeline to verify the results of such operations during otherwise unused execution cycles for the execution pipeline. Whenever a bubble follows a particular instruction within an execution pipeline, the result of an operation that is performed for that instruction by a particular stage of the execution pipeline may be stored, and the operation may be repeated by the stage in a subsequent execution cycle in which no productive operation would otherwise be performed due to the presence of the bubble. The results of the operations may then be compared and used to either verify the original result or identify a potential error in the execution of the instruction. | 02-18-2010 |
| 20100091787 | DIRECT INTER-THREAD COMMUNICATION BUFFER THAT SUPPORTS SOFTWARE CONTROLLED ARBITRARY VECTOR OPERAND SELECTION IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for retrieving arbitrarily aligned vector operands within a highly threaded Network On a Chip (NOC) processor are presented. Multiple nodes in a NOC are able to access a single Compressed Direct Interthread Communication Buffer (CDICB), which contains a misaligned but compacted set of operands. Using information from a Special Purpose Register (SPR) within the NOC, each node is able to selectively extract one or more operands from the CDICB for use in an execution unit within that node. Output from the execution unit is then sent to the CDICB to update the compacted set of operands. | 04-15-2010 |
| 20100100707 | DATA STRUCTURE FOR CONTROLLING AN ALGORITHM PERFORMED ON A UNIT OF WORK IN A HIGHLY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates a payload and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in the second node, which uses the algorithm to execute the second operation. | 04-22-2010 |
| 20100100770 | SOFTWARE DEBUGGER FOR PACKETS IN A NETWORK ON A CHIP - A breakpoint packet is dispatched to a Network On A Chip (NOC). The breakpoint packet instructs one or more specified nodes on the NOC to place themselves, or a core or hardware thread within a specified node, into “single step” execution mode, enabling debugging of a work packet that is dispatched to the specified node. | 04-22-2010 |
| 20100100934 | SECURITY METHODOLOGY TO PREVENT USER FROM COMPROMISING THROUGHPUT IN A HIGHLY THREADED NETWORK ON A CHIP PROCESSOR - A computer-implemented method, system and computer program product for preventing an untrusted work unit message from compromising throughput in a highly threaded Network On a Chip (NOC) processor are presented. A security message, which is associated with the untrusted work unit message, directs other resources within the NOC to operate in a secure mode while a specified node within the NOC executes instructions from the work unit message in a less privileged, non-secure mode. Throughput within the NOC is thus not compromised, because resources other than the specified node are protected from the untrusted work unit message. | 04-22-2010 |
| 20100106940 | Processing Unit With Operand Vector Multiplexer Sequence Control - Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence. | 04-29-2010 |
| 20100125722 | Multithreaded Processing Unit With Thread Pair Context Caching - A circuit arrangement and method utilize thread pair context caching, where a pair of hardware threads in a multithreaded processor, which are each capable of executing a process, are effectively paired together, at least temporarily, to perform context switching operations such as context save and/or load operations in advance of context switches performed in one or more of such paired hardware threads. By doing so, the overall latency of a context switch, where both the context for a process being switched from must be saved, and the context for the process being switched to must be loaded, may be reduced. | 05-20-2010 |
| 20100188402 | User-Defined Non-Visible Geometry Featuring Ray Filtering - A method, system and computer program product for managing secondary rays during ray-tracing are presented. A non-visible unidirectional ray tracing object logically surrounds a user-selected virtual object in a computer generated illustration. This unidirectional ray tracing object prevents secondary tracing rays from emanating from the user-selected virtual object during ray tracing. | 07-29-2010 |
| 20100189111 | STREAMING DIRECT INTER-THREAD COMMUNICATION BUFFER PACKETS THAT SUPPORT HARDWARE CONTROLLED ARBITRARY VECTOR OPERAND ALIGNMENT IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for arbitrarily aligning vector operands, which are transmitted in inter-thread communication buffer packets within a highly threaded Network On a Chip (NOC) processor, are presented. A set of multiplexers in a node in the NOC realigns and extracts data word aggregations from an incoming compressed inter-thread communication buffer packet. The extracted data word aggregations are used as operands by an execution unit within the node. | 07-29-2010 |
| 20100191939 | TRIGONOMETRIC SUMMATION VECTOR EXECUTION UNIT - A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product. | 07-29-2010 |
| 20100191940 | SINGLE STEP MODE IN A SOFTWARE PIPELINE WITHIN A HIGHLY THREADED NETWORK ON A CHIP MICROPROCESSOR - A hardware thread is selectively forced to single step the execution of software instructions from a work packet granule. A “single step” packet is associated with a work packet granule. The work packet granule, with the associated “single step” packet, is dispatched as an appended work packet granule to a preselected hardware thread in a processor core, which, in one embodiment, is located at a node in a Network On a Chip (NOC). The work packet granule then executes in a single step mode until completion. | 07-29-2010 |
| 20100192014 | PSEUDO RANDOM PROCESS STATE REGISTER FOR FAST RANDOM PROCESS TEST GENERATION - A method, system and computer program product are presented for providing pseudo-random input test data to a test program. A seed value is generated and stored in a seed register. Using the seed value as an input, a Linear Feedback Shift Register (LFSR) generates a pseudo-random input test value, which is stored in a general-purpose register (GPR) within a processor core. A test program is then executed within the processor core using the pseudo-random input test value from the GPR. | 07-29-2010 |
| 20100199067 | Split Vector Loads and Stores with Stride Separated Words - A method, system and computer program product are presented for causing a parallel load/store of stride-separated words from a data vector using different memory chips in a computer. | 08-05-2010 |
| 20100269123 | Performance Event Triggering Through Direct Interthread Communication On a Network On Chip - Performance event triggering is carried out through direct interthread communication (‘DITC’) on a network on chip (‘NOC’). The NOC includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers. Each IP block is adapted to a router through a memory communications controller and a network interface controller; each memory communications controller controls communications between an IP block and memory, and each network interface controller controls inter-IP block communications through routers. Performance event triggering includes enabling performance event monitoring in a selected set of IP blocks distributed throughout the NOC, each IP block in the selected set having one or more event counters; collecting performance results from the event counters; and returning those performance results to a destination repository, with the return initiated by a triggering event occurring within the NOC. | 10-21-2010 |
| 20110047355 | Offset Based Register Address Indexing - A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address. | 02-24-2011 |
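
The summation flow in 20090158013 places each addend in its own processing lane and then performs one combined operation in the dot product unit. The sketch below models that flow in Python under assumptions the abstract does not state: the lanes are a plain list, and the dot product unit is modeled as a pairwise adder tree. The function name `multi_addend_sum` is hypothetical, and the sketch is not the patented circuit.

```python
from typing import List, Sequence


def multi_addend_sum(operands: Sequence[float]) -> float:
    """Sum more than two addends as one 'instruction' (hypothetical model).

    Each operand first occupies its own vector processing lane; the lanes are
    then handed to a dot product unit, modeled here as a pairwise adder tree.
    """
    if len(operands) <= 2:
        raise ValueError("a multiple addend instruction takes more than two operands")
    lanes: List[float] = list(operands)  # one operand per processing lane
    # Adder tree: add adjacent lanes until a single value remains.
    while len(lanes) > 1:
        paired = [lanes[i] + lanes[i + 1] for i in range(0, len(lanes) - 1, 2)]
        if len(lanes) % 2:  # an odd lane out passes through to the next level
            paired.append(lanes[-1])
        lanes = paired
    return lanes[0]


print(multi_addend_sum([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```

A tree keeps the number of sequential adder levels logarithmic in the operand count. This is one plausible reason to hand the lanes to a shared unit rather than chaining two-operand adds, but the abstract itself does not say so.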
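
The two compare stages in 20090182990 can be illustrated with a small Python model. In stage one, vector lanes compare operand pairs. In stage two, the "dot product unit" compares the surviving winners. The pairing scheme and the function name `multi_operand_extreme` are assumptions made for illustration; the abstract only requires that a lane-level compare is followed by a compare against at least a third operand.

```python
from typing import Sequence


def multi_operand_extreme(operands: Sequence[float], want_max: bool = True) -> float:
    """Return the maximum (or minimum) of more than two operands in two stages."""
    if len(operands) <= 2:
        raise ValueError("defined here for more than two operands")
    pick = max if want_max else min
    # Stage 1: each vector lane compares one pair of operands.
    winners = [pick(operands[i], operands[i + 1])
               for i in range(0, len(operands) - 1, 2)]
    if len(operands) % 2:  # an unpaired operand goes straight to stage 2
        winners.append(operands[-1])
    # Stage 2: the dot product unit compares the lane winners with each other.
    result = winners[0]
    for w in winners[1:]:
        result = pick(result, w)
    return result


print(multi_operand_extreme([3.0, -1.0, 7.5, 2.0]))                  # 7.5
print(multi_operand_extreme([3.0, -1.0, 7.5, 2.0], want_max=False))  # -1.0
```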
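
The significand-to-exponent repurposing in 20100023568 can be illustrated by decoding the same bit pattern with and without borrowed bits. The following is a minimal sketch. It assumes a 16-bit layout with IEEE half-precision field widths (1 sign, 5 exponent, 10 significand bits) and handles normal numbers only. It also makes two choices the abstract leaves open:

- The borrowed bits are the top `k` significand bits, appended below the existing exponent bits.
- The bias is recomputed IEEE-style for the wider exponent.

The name `decode_extended` is hypothetical.

```python
def decode_extended(bits: int, extra_exp_bits: int = 0,
                    exp_bits: int = 5, frac_bits: int = 10) -> float:
    """Decode a normal value from a hypothetical 1 + exp_bits + frac_bits format.

    With extra_exp_bits = k, the top k bits of the significand (fraction)
    field are repurposed as extra low-order exponent bits, trading precision
    for range. Zeros, subnormals, infinities and NaNs are not handled.
    """
    k = extra_exp_bits
    if not 0 <= k < frac_bits:
        raise ValueError("k must leave at least one significand bit")
    sign = -1.0 if (bits >> (exp_bits + frac_bits)) & 1 else 1.0
    exp_field = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    frac_field = bits & ((1 << frac_bits) - 1)
    m = frac_bits - k                           # significand bits that remain
    borrowed = frac_field >> m                  # first portion: top k fraction bits
    significand = frac_field & ((1 << m) - 1)   # second portion
    exponent = (exp_field << k) | borrowed      # concatenated onto the exponent field
    bias = (1 << (exp_bits + k - 1)) - 1        # IEEE-style bias for the wider field
    return sign * (1 + significand / (1 << m)) * 2.0 ** (exponent - bias)


print(decode_extended(0x7BFF))     # 65504.0, the largest normal half-precision value
print(decode_extended(0x7BFF, 2))  # about 2.3e18, with 8 instead of 10 significand bits
```

With two borrowed bits, the same 16-bit pattern reaches roughly 2^61 instead of 2^16. The cost is two bits of precision in every value decoded that way.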
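
The two-step decode described in 20100042812 works as follows: the opcode is decoded first, then data held in the named operand register selects the operation. The sketch below shows this in Python. The opcode value `SHARED_OPCODE`, the use of the low two register bits as decode data, and the add/sub/mul menu are all hypothetical choices for illustration.

```python
import operator
from typing import Callable, Dict, List

# Hypothetical encoding: one primary opcode is shared by several operations,
# and the low two bits of a register named by the instruction select which one.
SHARED_OPCODE = 0x1F

SUB_OPERATIONS: Dict[int, Callable[[int, int], int]] = {
    0: operator.add,
    1: operator.sub,
    2: operator.mul,
}


def execute(opcode: int, decode_reg: int, src_a: int, src_b: int, regs: List[int]) -> int:
    # First decode step: the opcode alone identifies the instruction family.
    if opcode != SHARED_OPCODE:
        raise ValueError(f"unknown opcode {opcode:#x}")
    # Second decode step: fetch decode data from the operand register.
    selector = regs[decode_reg] & 0b11
    if selector not in SUB_OPERATIONS:
        raise ValueError(f"reserved decode value {selector}")
    return SUB_OPERATIONS[selector](regs[src_a], regs[src_b])


regs = [0, 2, 10, 4]  # r1 holds decode data 2, which selects multiply
print(execute(SHARED_OPCODE, 1, 2, 3, regs))  # 40
```

Because the selector lives in a register, one opcode can stand for several operations without widening the instruction encoding.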
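
The bubble-driven verification in 20100042813 can be modeled at the level of a single pipeline stage. When a bubble (`None`) immediately follows an instruction, the stage keeps that instruction's result, repeats the operation in the bubble cycle, and compares the two results. This sketch makes several simplifying assumptions: one stage, two integer inputs per instruction, and results collected in Python lists. The names `run_stage` and `Instr` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Instr = Tuple[int, int]  # hypothetical: the two inputs consumed by this stage


def run_stage(stream: List[Optional[Instr]],
              stage_op: Callable[[int, int], int]) -> Tuple[List[int], List[bool]]:
    """Model one stage of an execution pipeline; None marks a bubble.

    When a bubble immediately follows an instruction, the stage keeps that
    instruction's result and re-executes the operation during the bubble cycle.
    """
    results: List[int] = []
    checks: List[bool] = []  # True: the repeated operation matched the original
    pending: Optional[Tuple[Instr, int]] = None
    for cycle, slot in enumerate(stream):
        if slot is None:  # bubble: no productive work this cycle
            if pending is not None:
                instr, original = pending
                checks.append(stage_op(*instr) == original)
                pending = None
            continue
        result = stage_op(*slot)
        results.append(result)
        bubble_next = cycle + 1 < len(stream) and stream[cycle + 1] is None
        pending = (slot, result) if bubble_next else None
    return results, checks


print(run_stage([(1, 2), None, (3, 4), (5, 6), None, None], lambda a, b: a + b))
# ([3, 7, 11], [True, True])
```

A `False` entry in `checks` corresponds to the abstract's "potential error in the execution of the instruction". Instructions that are not followed by a bubble, such as `(3, 4)` above, go unverified.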
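
The swizzle sequence idea in 20100106940 can be modeled in Python. A single instruction supplies a list of shuffle orders, and each subsequent vector instruction consumes the next order for its operand. The sketch assumes that only the first operand is shuffled and that operands are used unshuffled once the sequence runs out; both choices, and the names `apply_swizzle` and `execute_sequence`, are illustrative rather than taken from the application.

```python
import operator
from typing import Callable, List, Sequence, Tuple

VectorOp = Tuple[Callable[[int, int], int], Sequence[int], Sequence[int]]


def apply_swizzle(vec: Sequence[int], order: Sequence[int]) -> List[int]:
    """Shuffle data elements: output element i is input element order[i]."""
    return [vec[i] for i in order]


def execute_sequence(swizzle_orders: Sequence[Sequence[int]],
                     instructions: Sequence[VectorOp]) -> List[List[int]]:
    """Run vector instructions after one hypothetical swizzle sequence
    instruction has supplied a shuffle order for each upcoming instruction.

    The i-th instruction's first operand is shuffled with swizzle_orders[i];
    once the sequence is exhausted, operands are used as-is.
    """
    results: List[List[int]] = []
    for i, (op, a, b) in enumerate(instructions):
        if i < len(swizzle_orders):
            a = apply_swizzle(a, swizzle_orders[i])
        results.append([op(x, y) for x, y in zip(a, b)])
    return results


v = [1, 2, 3, 4]
print(execute_sequence(
    [[3, 2, 1, 0], [1, 0, 3, 2]],  # reverse, then swap adjacent pairs
    [(operator.add, v, [0, 0, 0, 0]),
     (operator.add, v, [10, 10, 10, 10]),
     (operator.add, v, [0, 0, 0, 0])]))
# [[4, 3, 2, 1], [12, 11, 14, 13], [1, 2, 3, 4]]
```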
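
The abstract of 20100192014 does not give the LFSR's width or feedback taps. The sketch below therefore assumes a common maximal-length choice: a 16-bit Galois LFSR with tap mask 0xB400 (taps 16, 14, 13, 11), which cycles through all 65535 non-zero states. The seed register and the GPR are modeled as plain Python values, and the names `lfsr16` and `seed_register` are hypothetical.

```python
from itertools import islice
from typing import Iterator


def lfsr16(seed: int) -> Iterator[int]:
    """Yield successive states of a 16-bit Galois LFSR (tap mask 0xB400)."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # feedback taps 16, 14, 13, 11
        yield state


seed_register = 0xACE1  # hypothetical seed value
gpr_values = list(islice(lfsr16(seed_register), 4))  # values a test program would read
print([hex(v) for v in gpr_values])  # the first value is 0xe270
```

An all-zero seed is rejected because a Galois LFSR started at zero never leaves that state.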
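
For 20100199067, one way to picture a parallel load/store of stride-separated words is to spread those words across memory chips. Because words that would otherwise be accessed one after another sit on different chips, they can be accessed in the same cycle. The round-robin placement and the one-word-per-chip-per-cycle timing model below are assumptions for illustration only. The names `split_by_stride` and `parallel_access_cycles` are hypothetical.

```python
from typing import Dict, List, Sequence


def split_by_stride(vector: Sequence[int], stride: int, num_chips: int) -> Dict[int, List[int]]:
    """Place the stride-separated words vector[0], vector[stride], ... on
    hypothetical memory chips in round-robin order, so consecutive
    stride-separated words land on different chips."""
    chips: Dict[int, List[int]] = {c: [] for c in range(num_chips)}
    for i, word in enumerate(vector[::stride]):
        chips[i % num_chips].append(word)
    return chips


def parallel_access_cycles(chips: Dict[int, List[int]]) -> int:
    """Each chip delivers one word per cycle and all chips work in parallel."""
    return max((len(words) for words in chips.values()), default=0)


layout = split_by_stride(list(range(16)), stride=4, num_chips=2)
print(layout)                          # {0: [0, 8], 1: [4, 12]}
print(parallel_access_cycles(layout))  # 2
```

Under this model, accessing the four stride-separated words takes four cycles on one chip, two cycles on two chips, and one cycle on four chips.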