Patent application number | Description | Published |
20080276076 | METHOD AND APPARATUS FOR REGISTER RENAMING - A method and apparatus for register renaming are provides in the illustrative embodiments. A mapper receives a request for a data in a logical register. The mapper searches an in-flight map table and a set of architected map tables for the data in the logical register. The mapper identifies an entry in one of the in-flight map table and an architected map table in the set of architected map tables that corresponds with the logical register in the request. The mapper returns a location of a physical register, which holds the requested data. | 11-06-2008 |
20090135961 | System and Method for Scanning Sequential Logic Elements - System and Method for Scanning Sequential Logic Elements A digital system and method for scanning sequential logic elements are disclosed. The digital system may comprise a plurality of sequential logic elements subdivided into power domains, wherein at least one of the power domains is power gated; a scan chain configured for processing a scan data sequence; a scan enable switch configured for controlling a scan mode; and at least one shadow engine, wherein the at least one shadow engine comprises a control circuit. At least some of the power domains may be interconnected to the scan chain with the scan enable switch, and the scan enable switch may control the scan mode by asserting a scan enable signal. The at least one power gated power domain with one or more sequential logic elements to be power gated may be bypassed via the at least one shadow engine. | 05-28-2009 |
20090292892 | Method to Reduce Power Consumption of a Register File with Multi SMT Support - A method for reducing the power consumption of a register file of a microprocessor supporting simultaneous multithreading (SMT) is disclosed. Mapping logic and associated table entries monitor a total number of processing threads currently executing in the processor and signal control logic to disable specific register file entries not required for currently executing or pending instruction threads or register file entries not meeting a minimum access threshold using a least recently used algorithm (LRU). The register file utilization is controlled such that a register file address range selected for deactivation is not assigned for pending or future instruction threads. One or more power saving techniques are then applied to disabled register files to reduce overall power dissipation in the system. | 11-26-2009 |
20120110271 | MECHANISM TO SPEED-UP MULTITHREADED EXECUTION BY REGISTER FILE WRITE PORT REALLOCATION - Various systems and processes may be used to speed up multi-threaded execution. In certain implementations, a system and process may include the ability to write results of a first group of execution units associated with a first register file into the first register file using a first write port of the first register file and write results of a second group of execution units associated with a second register file into the second register file using a first write port of the second register file. The system, apparatus, and process may also include the ability to connect, in a shared register file mode, results of the second group of execution units to a second write port of the first register file and connect, in a split register file mode, results of a part of the first group of execution units to the second write port of the first register file. | 05-03-2012 |
20120128149 | APPARATUS AND METHOD FOR CALCULATING AN SHA-2 HASH FUNCTION IN A GENERAL PURPOSE PROCESSOR - Various systems, apparatuses, processes, and/or products may be used to calculate an SHA-2 hash function in a general-purpose processor. In some implementations, a system, apparatus, process, and/or product may include the ability to calculate at least one SHA-2 sigma function by using an execution unit adapted for performing a processor instruction, the execution unit including an integrated circuit primarily designed for calculating the SHA-2 sigma function(s), and calculating the SHA-2 hash function with general-purpose hardware processing components of the processor based on the sigma function(s). In certain implementations, the calculation of the SHA-2 sigma function(s) can be performed by the integrated circuit within a single instruction, allowing for a faster calculation of the SHA-2 hash function. | 05-24-2012 |
20120150933 | METHOD AND DATA PROCESSING UNIT FOR CALCULATING AT LEAST ONE MULTIPLY-SUM OF TWO CARRY-LESS MULTIPLICATIONS OF TWO INPUT OPERANDS, DATA PROCESSING PROGRAM AND COMPUTER PROGRAM PRODUCT - Various systems, apparatuses, processes, and programs may be used to calculate a multiply-sum of two carry-less multiplications of two input operands. In particular implementations, a system, apparatus, process, and program may include the ability to use input data busses for the input operands and an output data bus for an overall calculation result, each bus including a width of 2n bits, where n is an integer greater than one. The system, apparatus, process, and program may also calculate the carry-less multiplications of the two input operands for a lower level of a hierarchical structure and calculating the at least one multiply-sum and at least one intermediate multiply-sum for a higher level of the structure based on the carry-less multiplications of the lower level. A certain number of multiply-sums may be output as an overall calculation result dependent on mode of operation using the full width of said output data bus. | 06-14-2012 |
20130159666 | REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction. | 06-20-2013 |
20130262370 | Fast Predicate Table Scans Using Single Instruction, Multiple Data Architecture - An approach is provided in which a processor receives a scan request to scan data included in a data table. The processor selects a column in the data table corresponding to the scan request and retrieves column data entries from the selected column. In addition, the processor identifies the width of the selected column and selects a scan algorithm based upon the identified column width. In turn, the processor loads the column data entries into column data vectors and computes scan results from the column data vectors using the selected scan algorithm. | 10-03-2013 |
20130262519 | Fast Predicate Table Scans Using Single Instruction, Multiple Data Architecture - An approach is provided in which a processor receives a scan request to scan data included in a data table. The processor selects a column in the data table corresponding to the scan request and retrieves column data entries from the selected column. In addition, the processor identifies the width of the selected column and selects a scan algorithm based upon the identified column width. In turn, the processor loads the column data entries into column data vectors and computes scan results from the column data vectors using the selected scan algorithm. | 10-03-2013 |
20140075153 | REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction. | 03-13-2014 |
20140101358 | BYTE SELECTION AND STEERING LOGIC FOR COMBINED BYTE SHIFT AND BYTE PERMUTE VECTOR UNIT - Exemplary embodiments of the present invention disclose a method and system for executing data permute and data shift instructions. In a step, an exemplary embodiment encodes a control index value using the recoding logic into a 1-hot-of-n control for at least one of a plurality of datum positions in the one or more target registers. In another step, an exemplary embodiment conditions the 1-hot-of-n control by a gate-free logic configured for at least one of the plurality of datum positions in the one or more target registers for each of the data permute instructions and the at least one data shift instruction. In another step, an exemplary embodiment selects the 1-hot-of-n control or the conditioned 1-hot-of-n control based on a current instruction mode. In another step, an exemplary embodiment transforms the selected 1-hot-of-n control into a format applicable for the crossbar switch. | 04-10-2014 |
20140129809 | BYTE SELECTION AND STEERING LOGIC FOR COMBINED BYTE SHIFT AND BYTE PERMUTE VECTOR UNIT - Exemplary embodiments of the present invention disclose a method and system for executing data permute and data shift instructions. In a step, an exemplary embodiment encodes a control index value using the recoding logic into a 1-hot-of-n control for at least one of a plurality of datum positions in the one or more target registers. In another step, an exemplary embodiment conditions the 1-hot-of-n control by a gate-free logic configured for at least one of the plurality of datum positions in the one or more target registers for each of the data permute instructions and the at least one data shift instruction. In another step, an exemplary embodiment selects the 1-hot-of-n control or the conditioned 1-hot-of-n control based on a current instruction mode. In another step, an exemplary embodiment transforms the selected 1-hot-of-n control into a format applicable for the crossbar switch. | 05-08-2014 |
Patent application number | Description | Published |
20110289297 | INSTRUCTION SCHEDULING APPROACH TO IMPROVE PROCESSOR PERFORMANCE - A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream. | 11-24-2011 |
20120216016 | INSTRUCTION SCHEDULING APPROACH TO IMPROVE PROCESSOR PERFORMANCE - A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream. | 08-23-2012 |
20130227375 | Techniques for Reusing Components of a Logical Operations Functional Block as an Error Correction Code Correction Unit - A logical operations functional block for an execution unit of a processor includes a first input data link for a first operand and a second input data link for a second operand. The execution unit includes a register connected to an error correction code detection unit. The logical operations functional block includes a look-up table configured to receive an error correction code syndrome from the error correction code detection unit. The logical operations functional block also includes a multiplexer configured to receive an output signal from the look-up table at a first input and the first operand at a second input, wherein an output of the multiplexer is coupled to the first input data link of a logical functional unit. | 08-29-2013 |
20150127926 | INSTRUCTION SCHEDULING APPROACH TO IMPROVE PROCESSOR PERFORMANCE - A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream. | 05-07-2015 |
Patent application number | Description | Published |
20150324204 | PARALLEL SLICE PROCESSOR WITH DYNAMIC INSTRUCTION STREAM MAPPING - A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The dispatch routing network is controlled to dynamically vary the relationship between the slices and instruction streams according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis, permitting the mixture of those instruction types. Instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices configured to act in concert for the particular instructions. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324205 | PROCESSING OF MULTIPLE INSTRUCTION STREAMS IN A PARALLEL SLICE PROCESSOR - Techniques for managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provide flexibility in execution of program instructions by a processor core. An event is detected indicating that either resource requirement or resource availability will not be met by the execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324206 | PARALLEL SLICE PROCESSOR WITH DYNAMIC INSTRUCTION STREAM MAPPING - A method of operation of a processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues coupled by a dispatch routing network provides flexible and efficient use of internal resources. The dispatch routing network is controlled to dynamically vary the relationship between the slices and instruction streams according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. The instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis. Instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices configured to act in concert for the particular instructions. When an instruction execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |
20150324207 | PROCESSING OF MULTIPLE INSTRUCTION STREAMS IN A PARALLEL SLICE PROCESSOR - A method of managing instruction execution for multiple instruction streams using a processor core having multiple parallel instruction execution slices provides instruction processing flexibility. An event is detected indicating that either resource requirement or resource availability for a subsequent instruction of an instruction stream will not be met by the instruction execution slice currently executing the instruction stream. In response to detecting the event, dispatch of at least a portion of the subsequent instruction is made to another instruction execution slice. The event may be a compiler-inserted directive, may be an event detected by logic in the processor core, or may be determined by a thread sequencer. The execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution, ordinary instruction execution, wide instruction execution. When an execution slice is busy processing a current instruction for one of the streams, another slice can be selected to proceed with execution. | 11-12-2015 |