Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Boersma, Holzgerlingen

Maarten Boersma, Holzgerlingen NL

Patent application number	Description	Published
20100228955	METHOD AND APPARATUS FOR IMPROVED POWER MANAGEMENT OF MICROPROCESSORS BY INSTRUCTION GROUPING - A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decode unit; an execution unit coupled to receive and send signals from and to the instruction scheduling unit; and a state machine located within the execution unit, the method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining, subsequent to obtaining the number of instructions per cycle, if the number of instruction per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine when the instructions are independent of each other without incurring significant loss of performance until an issue queue in the instruction scheduling unit is filled with instruction data.	09-09-2010
20120303991	METHOD AND APPARATUS FOR IMPROVED POWER MANAGEMENT OF MICROPROCESSORS BY INSTRUCTION GROUPING - A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decoder; an execution unit receiving and sending signals from and to the instruction scheduling unit; and a state machine. The method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining, if the number of instruction per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine.	11-29-2012

Maarten Boersma, Holzgerlingen DE

Patent application number	Description	Published
20100058266	3-Stack Floorplan for Floating Point Unit - A 3-stack floorplan for a floating point unit includes: an aligner located in the center of the floating point unit; a frontend located directly above the aligner; a multiplier located directly below the frontend and next to the aligner; an adder located directly next to the multiplier and directly below the aligner; a normalizer located directly above the adder; and a rounder located directly above the normalizer.	03-04-2010
20100095099	SYSTEM AND METHOD FOR STORING NUMBERS IN FIRST AND SECOND FORMATS IN A REGISTER FILE - A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers.	04-15-2010
20100146023	SHIFTER WITH ALL-ONE AND ALL-ZERO DETECTION - A shifter that includes a plurality of shift stages positioned within the shifter, and receiving and shifting input data to generate a shifted result, and a detection circuit coupled at an input of a final shift stage of the plurality of shifters, in a final stage within the shifter. The detection circuit receives a partially shifted vector at the input of the final shift stage along with a predetermined shift amount, and performing an all-one or all-zero detection operation using a portion of the partially shifted vector and the predetermined shift amount, in parallel, to a shifting operation performed by the final shift stage to generate the shifted result.	06-10-2010
20100146471	FAST ROUTING OF CUSTOM MACROS - A system for creating layout and wiring diagrams for an integrated circuit (IC) includes a placement engine configured to receive a hierarchical schematic and to create a placed layout. The system also includes a flat layout engine configured to receive the hierarchical schematic and to create a flat layout and a back annotation engine coupled to the placement engine and the flat layout engine, the back annotation engine configured to receive the hierarchical placed layout and the flat unplaced layout and to create a flat placed layout there from.	06-10-2010
20100174764	REUSE OF ROUNDER FOR FIXED CONVERSION OF LOG INSTRUCTIONS - A method for converting a signed fixed point number into a floating point number that includes reading an input number corresponding to a signed fixed point number to be converted, determining whether the input number is less than zero, setting a sign bit based upon whether the input number is less than zero or greater than or equal to zero, computing a first intermediate result by exclusive-ORing the input number with the sign bit, computing leading zeros of the first intermediate result, padding the first intermediate result based upon the sign bit, computing a second intermediate result by shifting the padded first intermediate result to the left by the leading zeros, computing an exponent portion and a fraction portion, conditionally incrementing the fraction portion based on the sign bit, correcting the exponent portion and the fraction portion if the incremented fraction portion overflows, and returning the floating point number.	07-08-2010
20130204916	RESIDUE-BASED ERROR DETECTION FOR A PROCESSOR EXECUTION UNIT THAT SUPPORTS VECTOR OPERATIONS - A residue generating circuit for an execution unit that supports vector operations includes an operand register and a residue generator coupled to the operand register. The residue generator includes a first residue generation tree coupled to a first section of the operand register and a second residue generation tree coupled to a second section of the operand register. The first residue generation tree is configured to generate a first residue for first data included in the first section of the operand register. The second residue generation tree is configured to generate a second residue for second data included in a second section of the operand register. The first section of the operand register includes a different number of register bits than the second section of the operand register. Configuring a residue-generation tree to be split into multiple residue generation trees facilitates residue checking multiple independent vector instruction operands within a full width dataflow in parallel or alternatively residue checking a full width operand of a standard instruction.	08-08-2013
20140164462	RESIDUE-BASED ERROR DETECTION FOR A PROCESSOR EXECUTION UNIT THAT SUPPORTS VECTOR OPERATIONS - A residue generating circuit for an execution unit that supports vector operations includes an operand register and a residue generator coupled to the operand register. The residue generator includes a first residue generation tree coupled to a first section of the operand register and a second residue generation tree coupled to a second section of the operand register. The first residue generation tree is configured to generate a first residue for first data included in the first section of the operand register. The second residue generation tree is configured to generate a second residue for second data included in a second section of the operand register. The first section of the operand register includes a different number of register bits than the second section of the operand register.	06-12-2014

Patent applications by Maarten Boersma, Holzgerlingen DE

Maarten J. Boersma, Holzgerlingen DE

Patent application number	Description	Published
20100017635	ZERO INDICATION FORWARDING FOR FLOATING POINT UNIT POWER REDUCTION - A method, system and computer program product for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations.	01-21-2010
20100023573	EFFICIENT FORCING OF CORNER CASES IN A FLOATING POINT ROUNDER - The forcing of the result or output of a rounder portion of a floating point processor occurs only in a fraction non-increment data path within the rounder and not in the fraction increment data path within the rounder. The fraction forcing is active on a corner case such as a disabled overflow exception. A disabled overflow exception may be detected by inspecting the normalized exponent. If a disabled overflow exception is detected, the round mode is selected to execute only in the non-increment data path thereby preventing the fraction increment data path from being selected.	01-28-2010
20100063985	NORMALIZER SHIFT PREDICTION FOR LOG ESTIMATE INSTRUCTIONS - A floating point processor unit includes a shift amount calculation circuit within a normalizer portion of the floating point unit, wherein the shift amount calculation circuit is utilized to compute the normalizer shift amount for a log estimate instruction that runs as a pipelinable instruction.	03-11-2010
20100063987	SUPPORTING MULTIPLE FORMATS IN A FLOATING POINT PROCESSOR - In a binary floating point processor, the exponents of each of the various types of operands are recoded into an internal format, by biasing the exponents with the minimum exponent value of the result precision (“Emin”), i.e., the recoded value of the exponent is the represented value of the exponent minus Emin. Emin depends only on the result precision of the instruction that is currently being executed in the binary floating point processor. The exponent computations are then performed in this new format. The underflow check for all result precisions is a check against zero and overflow checks are performed against a positive number that depends on the result precision. The exponent values are in a 2's complement representation, so the underflow check simply becomes a check of the sign bit.	03-11-2010
20100100713	FAST FLOATING POINT COMPARE WITH SLOWER BACKUP FOR CORNER CASES - A floating point processor unit executes a floating point compare instruction with two operands of the same or different precision by comparing the two operands in integer format, which speeds up the execution of the floating point compare instruction significantly. The floating point processor now executes the floating point compare instruction at least twice as fast or faster (e.g., two clock cycles instead of five clock cycles in the prior art) for nearly most operand cases (e.g., 99% of all cases). Only the rare corner cases require additional operations on one of the operands and thus require additional cycles of execution time because the integer compare operation will not work for these corner cases. This is due to the fact that one operand is a single precision subnormal number in an unnormalized representation (i.e., has two representations) and the other operand is in the SP subnormal range such that the integer compare operation will fail.	04-22-2010
20120110271	MECHANISM TO SPEED-UP MULTITHREADED EXECUTION BY REGISTER FILE WRITE PORT REALLOCATION - Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file and write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system, apparatus, and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file and connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file.	05-03-2012
20120128149	APPARATUS AND METHOD FOR CALCULATING AN SHA-2 HASH FUNCTION IN A GENERAL PURPOSE PROCESSOR - Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and calculating the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the calculation of the SHA-2 sigma function(s) can be performed by the integrated circuit within a single instruction, allowing for a faster calculation of the SHA-2 hash function.	05-24-2012
20120150933	METHOD AND DATA PROCESSING UNIT FOR CALCULATING AT LEAST ONE MULTIPLY-SUM OF TWO CARRY-LESS MULTIPLICATIONS OF TWO INPUT OPERANDS, DATA PROCESSING PROGRAM AND COMPUTER PROGRAM PRODUCT - Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculating the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result dependent on mode of operation using the full width of said output data bus.	06-14-2012
20130159666	REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.	06-20-2013
20130332501	Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.	12-12-2013
20140075153	REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction.	03-13-2014
20140095568	Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result.	04-03-2014
20140136815	VERIFICATION OF A VECTOR EXECUTION UNIT DESIGN - A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.	05-15-2014
20140156969	VERIFICATION OF A VECTOR EXECUTION UNIT DESIGN - A method for verification of a vector execution unit design. The method includes issuing an instruction into a first instance and a second instance of a vector execution unit. The method includes issuing a random operand into a first lane of the first instance of the vector execution unit and into a second lane of the second instance of the vector execution unit. The method further includes receiving results from execution of the instruction and the random operand in both the first and the second instance of the vector execution unit and comparing the received results.	06-05-2014
20150067298	SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.	03-05-2015
20150067299	SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position.	03-05-2015

Patent applications by Maarten J. Boersma, Holzgerlingen DE

Maarten J. Boersma, Holzgerlingen NL

Patent application number	Description	Published
20140019780	ACTIVE POWER DISSIPATION DETECTION BASED ON ERRONOUS CLOCK GATING EQUATIONS - A method detects active power dissipation in an integrated circuit. The method includes receiving a hardware design for the integrated circuit having one or more clock domains, wherein the hardware design comprises a local clock buffer for a clock domain, wherein the local clock buffer is configured to receive a clock signal and an actuation signal. The method includes adding instrumentation logic to the design for the clock domain, wherein the instrumentation logic is configured to compare a first value of the actuation signal determined at a beginning point of a test period to a second value of the actuation signal determined at a time when the clock domain is in an idle condition. The method includes detecting the clock domain includes unintended active power dissipation, in response to the first value of the actuation signal not being equal to the second value of the actuation signal.	01-16-2014