Adam J. Muff, Rochester US

Adam J. Muff, Rochester, MN US

Patent application number	Description	Published (MM-DD-YYYY)
20090158013	Method and Apparatus Implementing a Minimal Area Consumption Multiple Addend Floating Point Summation Function in a Vector Microprocessor - Embodiments of the invention provide methods and apparatus for executing a multiple operand instruction. Executing the multiple operand instruction comprises transferring more than two operands to a vector unit, each operand being transferred to a respective one of a plurality of processing lanes of the vector unit. The operands may be transferred from the vector unit to a dot product unit wherein an arithmetic operation using the more than two operands may be performed.	06-18-2009
20090182990	Method and Apparatus for a Pipelined Multiple Operand Minimum and Maximum Function - Embodiments of the invention provide methods and apparatus for executing a multiple operand minimum or maximum instruction. Executing the multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine the greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, wherein, in a second compare operation, the transferred operand is compared to at least a third operand to determine the greatest or smallest of the more than two operands.	07-16-2009
20100023568	Dynamic Range Adjusting Floating Point Execution Unit - A floating point execution unit is capable of selectively repurposing a subset of the significand bits in a floating point value for use as additional exponent bits to dynamically provide an extended range for floating point calculations. A significand field of a floating point operand may be considered to include first and second portions, with the first portion capable of being concatenated with the second portion to represent the significand for a floating point value, or, to provide an extended range, being concatenated with the exponent field of the floating point operand to represent the exponent for a floating point value.	01-28-2010
20100042812	Data Dependent Instruction Decode - A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.	02-18-2010
20100042813	Redundant Execution of Instructions in Multistage Execution Pipeline During Unused Execution Cycles - A pipelined execution unit uses the bubbles that occur during execution to selectively repeat operations performed in one or more stages of a multistage execution pipeline to verify the results of such operations during otherwise unused execution cycles for the execution pipeline. Whenever a bubble follows a particular instruction within an execution pipeline, the result of an operation that is performed for that instruction by a particular stage of the execution pipeline may be stored, and the operation may be repeated by the stage in a subsequent execution cycle in which no productive operation would otherwise be performed due to the presence of the bubble. The results of the operations may then be compared and used to either verify the original result or identify a potential error in the execution of the instruction.	02-18-2010
20100091787	DIRECT INTER-THREAD COMMUNICATION BUFFER THAT SUPPORTS SOFTWARE CONTROLLED ARBITRARY VECTOR OPERAND SELECTION IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for retrieving arbitrarily aligned vector operands within a highly threaded Network On a Chip (NOC) processor are presented. Multiple nodes in a NOC are able to access a single Compressed Direct Interthread Communication Buffer (CDICB), which contains a misaligned but compacted set of operands. Using information from a Special Purpose Register (SPR) within the NOC, each node is able to selectively extract one or more operands from the CDICB for use in an execution unit within that node. Output from the execution unit is then sent to the CDICB to update the compacted set of operands.	04-15-2010
20100106940	Processing Unit With Operand Vector Multiplexer Sequence Control - Operand vector multiplexer sequence control is used in a vector-based execution unit to control the shuffling of data elements in operand vectors used by a sequence of vector instructions processed by the vector-based execution unit. A swizzle sequence instruction is defined in an instruction set for the vector-based execution unit and is used to selectively apply a sequence of vector data element shuffle orders to one or more operand vectors to be used by the associated sequence of vector instructions. As a result, when a common sequence of data element shuffle orders is used frequently for a sequence of vector instructions, a single swizzle sequence instruction may be used to select the desired sequence of custom data element ordering for each of the vector instructions in the sequence.	04-29-2010
20100189111	STREAMING DIRECT INTER-THREAD COMMUNICATION BUFFER PACKETS THAT SUPPORT HARDWARE CONTROLLED ARBITRARY VECTOR OPERAND ALIGNMENT IN A DENSELY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for arbitrarily aligning vector operands, which are transmitted in inter-thread communication buffer packets within a highly threaded Network On a Chip (NOC) processor, are presented. A set of multiplexers in a node in the NOC realigns and extracts data word aggregations from an incoming compressed inter-thread communication buffer packet. The extracted data word aggregations are used as operands by an execution unit within the node.	07-29-2010
20100191939	TRIGONOMETRIC SUMMATION VECTOR EXECUTION UNIT - A unique instruction and exponent adjustment adder selectively shift outputs from multiple execution units, including a plurality of multipliers, in a processor core in order to scale mantissas for related trigonometric functions used in a vector dot product.	07-29-2010
20110047355	Offset Based Register Address Indexing - A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address.	02-24-2011
20110167296	REGISTER FILE SOFT ERROR RECOVERY - Register file soft error recovery including a system that includes a first register file and a second register file that mirrors the first register file. The system also includes an arithmetic pipeline for receiving data read from the first register file, and error detection circuitry to detect whether the data read from the first register file includes corrupted data. The system further includes error recovery circuitry to insert an error recovery instruction into the arithmetic pipeline in response to detecting the corrupted data. The inserted error recovery instruction replaces the corrupted data in the first register file with a copy of the data from the second register file.	07-07-2011
20110219208	MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER - A multi-petascale, highly efficient parallel supercomputer capable of 100 petaOPS-scale computing at decreased cost, power, and footprint, and allowing a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.	09-08-2011
20110283090	Instruction Addressing Using Register Address Sequence Detection - A circuit arrangement and method support efficient indexing into large register files by utilizing register address sequence detection, wherein register addresses to be used by an instruction are produced by concatenating a portion of the address that is contained in the instruction with another portion that is speculatively produced by sequence detection logic. The portion of the correct full address that is not contained in the instruction is stored in a software-accessible special purpose register. If the end of a particular sequence of addresses is detected by the sequence detection logic, the invention speculatively assumes that the next address in the sequence will be used. Because only a portion of each full address is stored in the instruction, the address fields occupy less instruction space than the full address widths. An instruction may include at least one address portion that identifies a register address.	11-17-2011
20110298788	PERFORMING VECTOR MULTIPLICATION - A method includes receiving packed data corresponding to pixel components to be processed at a graphics pipeline. The method includes unpacking the packed data to generate floating point numbers that correspond to the pixel components. The method also includes routing each of the floating point numbers to a separate lane of the graphics pipeline. Each of the floating point numbers is to be processed by multiplier units of the graphics pipeline.	12-08-2011
20110302450	FAULT TOLERANT STABILITY CRITICAL EXECUTION CHECKING USING REDUNDANT EXECUTION PIPELINES - A circuit arrangement and method utilize existing redundant execution pipelines in a processing unit to execute multiple instances of stability critical instructions in parallel so that the results of the multiple instances of the instructions can be compared for the purpose of detecting errors. For other types of instructions for which fault tolerant or stability critical execution is not required or desired, the redundant execution pipelines are utilized in a more conventional manner, enabling multiple non-stability critical instructions to be concurrently issued to and executed by the redundant execution pipelines. As such, for non-stability critical program code, the performance benefits of having multiple redundant execution units are preserved, yet in the instances where fault tolerant or stability critical execution is desired for certain program code, the redundant execution units may be repurposed to provide greater assurances as to the fault-free execution of such instructions.	12-08-2011
20110321049	Programmable Integrated Processor Blocks - An integrated processor block of the network on a chip is programmable to perform a first function. The integrated processor block includes an inbox to receive incoming packets from other integrated processor blocks of a network on a chip, an outbox to send outgoing packets to the other integrated processor blocks, an on-chip memory, and a memory management unit to enable access to the on-chip memory.	12-29-2011
20120084535	Opcode Space Minimizing Architecture Utilizing Instruction Address to Indicate Upper Address Bits - Due to the ever expanding number of registers and new instructions in modern microprocessor cores, the address widths present in the instruction encoding continue to widen, and fewer instruction opcodes are available, making it more difficult to add new instructions to existing architectures without resorting to inelegant tricks that have drawbacks such as source destructive operations. The disclosed invention utilizes specialized decode and address calculation hardware that concatenates a fixed number of least significant bits of the instruction address onto the upper address bits of each register address portion contained in the instruction, yielding the full register address, instead of providing the full register address widths for every register used in the instruction. This frees up valuable opcode space for other instructions and avoids compiler complexity. This aligns nicely with how most loops are unrolled in assembly language, where independent operations are near each other in memory.	04-05-2012
20130036296	FLOATING POINT EXECUTION UNIT WITH FIXED POINT FUNCTIONALITY - A floating point execution unit is capable of selectively repurposing one or more adders in an exponent path of the floating point execution unit to perform fixed point addition operations, thereby providing fixed point functionality in the floating point execution unit.	02-07-2013
20130111186	INSTRUCTION ADDRESS ADJUSTMENT IN RESPONSE TO LOGICALLY NON-SIGNIFICANT OPERATIONS	05-02-2013
20130111190	OPERATIONAL CODE EXPANSION IN RESPONSE TO SUCCESSIVE TARGET ADDRESS DETECTION	05-02-2013
20130138918	DIRECT INTERTHREAD COMMUNICATION DATAPORT PACK/UNPACK AND LOAD/SAVE - A circuit arrangement, method, and program product for compressing and decompressing data in a node of a system including a plurality of nodes interconnected via an on-chip network. Compressed data may be received and stored at an input buffer of a node, and in parallel with moving the compressed data to an execution register of the node, decompression logic of the node may decompress the data to generate uncompressed data, such that uncompressed data is stored in the execution register for utilization by an execution unit of the node. Uncompressed data may be output by the execution unit into the execution register, and in parallel with moving the uncompressed data to an output buffer of the node connected to the on-chip network, compression logic may compress the uncompressed data to generate compressed data, such that compressed data is stored at the output buffer.	05-30-2013
20130138925	PROCESSING CORE WITH SPECULATIVE REGISTER PREPROCESSING - A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file.	05-30-2013
20130159668	PREDECODE LOGIC FOR AUTOVECTORIZING SCALAR INSTRUCTIONS IN AN INSTRUCTION BUFFER - A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.	06-20-2013
20130159674	INSTRUCTION PREDICATION USING INSTRUCTION FILTERING - A method and circuit arrangement for selectively predicating instructions in an instruction stream based upon predication filter criteria defined by a predication filter, which describes types or patterns of instructions that should be predicated. Predication logic compares a respective instruction of an instruction stream to the predication filter criteria to determine whether the respective instruction matches the predication filter criteria, and the respective instruction is selectively predicated based on whether it matches the predication filter criteria.	06-20-2013
20130159675	INSTRUCTION PREDICATION USING UNUSED DATAPATH FACILITIES - A method and circuit arrangement for selectively predicating an instruction in an instruction stream based upon a value corresponding to a predication register address indicated by a portion of an operand associated with the instruction. A first compare instruction in an instruction stream stores a compare result at a register address of a predication register. The register address of the predication register is stored in a portion of an operand associated with a second instruction, and during decoding of the second instruction, the predication register is accessed to determine a value stored at the register address of the predication register, and the second instruction is selectively predicated based on the value stored at the register address of the predication register.	06-20-2013
20130159676	INSTRUCTION SET ARCHITECTURE WITH EXTENDED REGISTER ADDRESSING - A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode.	06-20-2013
20130159683	INSTRUCTION PREDICATION USING INSTRUCTION ADDRESS PATTERN MATCHING - A particular method includes receiving, at a processor, an instruction and an address of the instruction. The method also includes preventing execution of the instruction based at least in part on determining that the address is within a range of addresses.	06-20-2013
20130191649	MEMORY ADDRESS TRANSLATION-BASED DATA ENCRYPTION/COMPRESSION - A method and circuit arrangement selectively stream data to an encryption or compression engine based upon encryption and/or compression-related page attributes stored in a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB). A memory address translation data structure may be accessed, for example, in connection with a memory access request for data in a memory page, such that attributes associated with the memory page in the data structure may be used to control whether data is encrypted/decrypted and/or compressed/decompressed in association with handling the memory access request.	07-25-2013
20130191651	MEMORY ADDRESS TRANSLATION-BASED DATA ENCRYPTION WITH INTEGRATED ENCRYPTION ENGINE - A method and circuit arrangement utilize an integrated encryption engine within a processing core of a multi-core processor to perform encryption operations, i.e., encryption and decryption of secure data, in connection with memory access requests that access such data. The integrated encryption engine is utilized in combination with a memory address translation data structure such as an Effective To Real Translation (ERAT) or Translation Lookaside Buffer (TLB) that is augmented with encryption-related page attributes to indicate whether pages of memory identified in the data structure are encrypted such that secure data associated with a memory access request in the processing core may be selectively streamed to the integrated encryption engine based upon the encryption-related page attribute for the memory page associated with the memory access request.	07-25-2013
20130191824	VIRTUALIZATION SUPPORT FOR BRANCH PREDICTION LOGIC ENABLE/DISABLE - A hypervisor and one or more guest operating systems resident in a data processing system and hosted by the hypervisor are configured to selectively enable or disable branch prediction logic through separate hypervisor-mode and guest-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and user applications hosted thereby to provide finer grained optimization of the branch prediction logic for different operating scenarios.	07-25-2013
20130191825	VIRTUALIZATION SUPPORT FOR SAVING AND RESTORING BRANCH PREDICTION LOGIC STATES - A hypervisor and one or more programs, e.g., guest operating systems and/or user processes or applications hosted by the hypervisor, are configured to selectively save and restore the state of branch prediction logic through separate hypervisor-mode and guest-mode and/or user-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and user applications hosted thereby to provide finer grained optimization of the branch prediction logic.	07-25-2013
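The two-stage reduction described in 20090182990 (pairwise compares in vector-unit lanes, then a further compare in the dot product unit) can be sketched as a small software model. This is a hypothetical illustration, not the patented circuit. The names `lane_compare` and `multi_operand_minmax` are invented here, lanes are assumed to compare adjacent operand pairs, and the dot product unit's compare is modeled as a sequential loop.

```python
from typing import Sequence


def lane_compare(a: float, b: float, want_max: bool) -> float:
    """One compare operation: return the greater (or smaller) of two operands."""
    return max(a, b) if want_max else min(a, b)


def multi_operand_minmax(operands: Sequence[float], want_max: bool = True) -> float:
    """Toy two-stage min/max over more than two operands."""
    if len(operands) < 3:
        raise ValueError("expected more than two operands")
    # Stage 1 (vector unit, assumed pairing): each lane compares one adjacent pair.
    partials = [
        lane_compare(operands[i], operands[i + 1], want_max)
        for i in range(0, len(operands) - 1, 2)
    ]
    if len(operands) % 2:
        partials.append(operands[-1])  # an odd leftover operand passes through
    # Stage 2 (dot product unit, modeled as a loop): compare the lane results.
    result = partials[0]
    for value in partials[1:]:
        result = lane_compare(result, value, want_max)
    return result


print(multi_operand_minmax([3.0, 9.5, -2.0, 4.0]))              # 9.5
print(multi_operand_minmax([3.0, 9.5, -2.0], want_max=False))   # -2.0
```

In hardware the second stage would be a fixed-width compare tree, not a loop. The loop only shows that the lane results still need one more round of comparison to reach the overall extreme value.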
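The range-extension idea in 20100023568 can be illustrated with a toy floating point decoder in which the same significand bits are read either as precision or as extra exponent. This is a hypothetical sketch. The field widths, the IEEE-style bias, and the choice to place the first portion above the existing exponent bits are all assumptions. Signs, denormals, and infinities are ignored.

```python
def decode_toy_float(exp_field: int, sig_field: int, exp_width: int,
                     sig_width: int, k: int, extended: bool) -> float:
    """Decode an unsigned toy float with an implicit leading 1.

    In extended mode the top k significand bits (the "first portion") are
    assumed to become the high-order bits of a wider exponent, trading
    precision for range.
    """
    first = sig_field >> (sig_width - k)               # top k significand bits
    second = sig_field & ((1 << (sig_width - k)) - 1)  # remaining significand bits
    if extended:
        exponent, e_bits = (first << exp_width) | exp_field, exp_width + k
        frac, f_bits = second, sig_width - k
    else:
        exponent, e_bits = exp_field, exp_width
        frac, f_bits = sig_field, sig_width
    bias = (1 << (e_bits - 1)) - 1                     # IEEE-style bias (assumed)
    return (1 + frac / (1 << f_bits)) * 2.0 ** (exponent - bias)


# One bit pattern, two interpretations (4-bit exponent, 6-bit significand, k = 2).
print(decode_toy_float(7, 0b100000, 4, 6, 2, extended=False))  # 1.5
print(decode_toy_float(7, 0b100000, 4, 6, 2, extended=True))   # 256.0
```

With these assumed widths, the normal mode reaches at most 2**8 in exponent, while the extended mode reaches up to 2**32. The cost is that only 4 of the 6 significand bits remain for precision.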
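The bubble-driven checking in 20100042813 can be modeled at the level of a single pipeline stage. In this sketch, a `None` slot stands for a bubble. When an instruction is immediately followed by a bubble, its operation is repeated in that idle cycle and compared against the stored result. `run_stage` and the slot encoding are hypothetical, and the sketch ignores multi-stage timing.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Operands = Tuple[int, int]


def run_stage(slots: Sequence[Optional[Operands]],
              stage_op: Callable[[int, int], int]):
    """Model one pipeline stage; None in `slots` marks a bubble cycle."""
    results: List[int] = []
    verified: List[int] = []
    mismatches: List[int] = []
    for cycle, operands in enumerate(slots):
        if operands is None:
            continue                                  # bubble: no productive work
        result = stage_op(*operands)                  # the original operation
        results.append(result)
        if cycle + 1 < len(slots) and slots[cycle + 1] is None:
            # The next cycle is a bubble: repeat the operation there and
            # compare against the stored result to verify it.
            if stage_op(*operands) == result:
                verified.append(cycle)
            else:
                mismatches.append(cycle)              # potential execution error
    return results, verified, mismatches


slots = [(1, 2), None, (3, 4), (5, 6), None]
print(run_stage(slots, lambda a, b: a + b))  # ([3, 7, 11], [0, 3], [])
```

Only instructions that happen to be followed by a bubble get checked. That matches the abstract's point that verification costs no extra cycles and is opportunistic, not exhaustive.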
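The address formation in 20120084535 can be shown with plain bit arithmetic. The title says the instruction address supplies the upper register address bits, so this sketch places a fixed number of instruction-address bits above the register field encoded in the instruction. The field widths are assumptions, as is the choice to drop the two byte-offset bits of a 4-byte instruction. `full_register_address` is a hypothetical name.

```python
def full_register_address(instr_addr: int, reg_field: int,
                          field_bits: int = 5, upper_bits: int = 2,
                          instr_bytes: int = 4) -> int:
    """Concatenate low instruction-address bits above an encoded register field."""
    if not 0 <= reg_field < (1 << field_bits):
        raise ValueError("register field out of range")
    word_index = instr_addr // instr_bytes          # assumed: skip byte-offset bits
    upper = word_index & ((1 << upper_bits) - 1)    # fixed number of LSBs
    return (upper << field_bits) | reg_field


# An unrolled loop body: the same encoded field (3) selects a different
# register bank for each consecutive instruction.
for addr in (0x1000, 0x1004, 0x1008, 0x100C):
    print(hex(addr), full_register_address(addr, 3))
```

Under these assumptions, the four instructions resolve to registers 3, 35, 67 and 99. This is why adjacent, independent operations in an unrolled loop naturally land in separate register banks without any extra encoding bits.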
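The predecode substitution in 20130159668 can be approximated in software. The sketch scans an instruction buffer for a run of identical scalar operations whose register indices each step by one, and replaces the run with a single vector instruction. The tuple encoding, the `add`/`mul` to `vadd`/`vmul` mapping, and the assumption that a vector register covers consecutive scalar registers are all hypothetical. The read-after-write check exists because a parallel vector operation would read the old value of a register that an earlier scalar instruction in the run had written.

```python
from typing import List, Sequence, Tuple

Instr = Tuple[str, int, int, int]                 # (opcode, rd, ra, rb)
VECTORIZABLE = {"add": "vadd", "mul": "vmul"}     # hypothetical opcode mapping


def _is_vector_group(group: Sequence[Instr]) -> bool:
    op0, rd0, ra0, rb0 = group[0]
    if op0 not in VECTORIZABLE:
        return False
    for k, instr in enumerate(group):
        # Same opcode, with every register index stepping by one.
        if instr != (op0, rd0 + k, ra0 + k, rb0 + k):
            return False
        # Reject read-after-write dependences inside the group.
        earlier_dests = {rd0 + j for j in range(k)}
        if {ra0 + k, rb0 + k} & earlier_dests:
            return False
    return True


def autovectorize(buffer: Sequence[Instr], width: int = 4) -> List[Instr]:
    """Replace runs of `width` compatible scalar ops with one vector op."""
    out: List[Instr] = []
    i = 0
    while i < len(buffer):
        group = buffer[i:i + width]
        if len(group) == width and _is_vector_group(group):
            op, rd, ra, rb = group[0]
            out.append((VECTORIZABLE[op], rd, ra, rb))
            i += width
        else:
            out.append(buffer[i])
            i += 1
    return out


buffer = [("add", 8, 0, 4), ("add", 9, 1, 5), ("add", 10, 2, 6),
          ("add", 11, 3, 7), ("sub", 1, 2, 3)]
print(autovectorize(buffer))  # [('vadd', 8, 0, 4), ('sub', 1, 2, 3)]
```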
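The check in 20130159683 (prevent execution when an instruction's address falls within a range) reduces to a simple membership test. In this sketch, predicated-off instructions are simply dropped from the stream, whereas hardware would more likely suppress them in place. The half-open range convention and the names `is_predicated_off` and `filter_stream` are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence, Tuple

AddrRange = Tuple[int, int]  # half-open [start, end), an assumed convention


def is_predicated_off(addr: int, ranges: Sequence[AddrRange]) -> bool:
    """True if the instruction address lies inside any predicated range."""
    return any(start <= addr < end for start, end in ranges)


def filter_stream(stream: Iterable[Tuple[int, str]],
                  ranges: Sequence[AddrRange]) -> List[str]:
    """Return the instructions that still execute, in program order."""
    return [instr for addr, instr in stream if not is_predicated_off(addr, ranges)]


stream = [(0x100, "ld r1"), (0x104, "add r2"), (0x108, "st r2"), (0x10C, "b loop")]
print(filter_stream(stream, [(0x104, 0x10C)]))  # ['ld r1', 'b loop']
```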
