Patent application number | Description | Published |
20090063899 | Register Error Correction of Speculative Data in an Out-of-Order Processor - In one embodiment, a processor comprises a first register file configured to store speculative register state, a second register file configured to store committed register state, a check circuit and a control unit. The first register file is protected by a first error protection scheme and the second register file is protected by a second error protection scheme. A check circuit is coupled to receive a value and corresponding one or more check bits read from the first register file to be committed to the second register file in response to the processor selecting a first instruction to be committed. The check circuit is configured to detect an error in the value responsive to the value and the check bits. Coupled to the check circuit, the control unit is configured to cause reexecution of the first instruction responsive to the error detected by the check circuit. | 03-05-2009 |
20090248779 | Processor which Implements Fused and Unfused Multiply-Add Instructions in a Pipelined Manner - Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit. | 10-01-2009 |
20100246814 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE DATA ENCRYPTION STANDARD (DES) ALGORITHM - A processor including instruction support for implementing the Data Encryption Standard (DES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more DES instructions defined within the ISA. In addition, the DES instructions may be executable by the cryptographic unit to implement portions of an DES cipher that is compliant with Federal Information Processing Standards Publication 46-3 (FIPS 46-3). In response to receiving a DES key expansion instruction defined within the ISA, the cryptographic unit may generate one or more expanded cipher keys of the DES cipher key schedule from an input key. | 09-30-2010 |
20100246815 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE KASUMI CIPHER ALGORITHM - A processor including instruction support for implementing the Kasumi block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Kasumi instructions defined within the ISA. In addition, the Kasumi instructions may be executable by the cryptographic unit to implement portions of a Kasumi cipher that is compliant with 3 | 09-30-2010 |
20100250639 | APPARATUS AND METHOD FOR IMPLEMENTING HARDWARE SUPPORT FOR DENORMALIZED OPERANDS FOR FLOATING-POINT DIVIDE OPERATIONS - A floating-point circuit may include a floating-point operand normalization circuit configured to receive input floating-point operands of a given floating-point divide operation, the operands comprising a dividend and a divisor, as well as a divide engine coupled to the normalization circuit. In response to determining that one or more of the input floating-point operands is a denormal number, the operand normalization circuit may be further configured to normalize the one or more of the input floating-point operands and output a normalized dividend and normalized divisor to the divide engine, and dependent upon respective numbers of leading zeros of the dividend and divisor prior to normalization, generate a value indicative of a maximum possible number of digits of a quotient (NDQ). The divide engine may be configured to iteratively generate NDQ digits of a floating-point quotient from the normalized dividend and the normalized divisor provided by the floating-point operand normalization circuit. | 09-30-2010 |
20100250964 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE CAMELLIA CIPHER ALGORITHM - A processor including instruction support for implementing the Camellia block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Camellia instructions defined within the ISA. In addition, the Camellia instructions may be executable by the cryptographic unit to implement portions of a Camellia cipher that is compliant with Internet Engineering Task Force (IETF) Request For Comments (RFC) 3713. In response to receiving a Camellia F( )-operation instruction defined within the ISA, the cryptographic unit may perform an F( ) operation, as defined by the Camellia cipher, upon a data input operand and a subkey operand, in which the data input operand and subkey operand may be specified by the Camellia F( )-operation instruction. | 09-30-2010 |
20100250965 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM - A processor including instruction support for implementing the Advanced Encryption Standard (AES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more AES instructions defined within the ISA. In addition, the AES instructions may be executable by the cryptographic unit to implement portions of an AES cipher that is compliant with Federal Information Processing Standards Publication 197 (FIPS 197). In response to receiving a first AES encryption round instruction defined within the ISA, the cryptographic unit may perform an encryption round of the AES cipher on a first group of columns of cipher state having a plurality of rows and columns. A maximum number of columns included in the first group may be fewer than all of the columns of the cipher state. | 09-30-2010 |
20100250966 | PROCESSOR AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR HASH ALGORITHMS - A processor including instruction support for implementing hash algorithms may issue, for execution, programmer-selectable hash instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include hash instructions defined within the ISA. In addition, the hash instructions may be executable by the cryptographic unit to implement a hash that is compliant with one or more respective hash algorithm specifications. In response to receiving a particular hash instruction defined within the ISA, the cryptographic unit may retrieve a set of input data blocks from a predetermined set of architectural registers of the processor, and generate a hash value of the set of input data blocks according to a hash algorithm that corresponds to the particular hash instruction. | 09-30-2010 |
20100268920 | MECHANISM FOR HANDLING UNFUSED MULTIPLY-ACCUMULATE ACCRUED EXCEPTION BITS IN A PROCESSOR - A mechanism for handling unfused multiply-add accrued exception bits includes a processor including a floating point unit, a storage, and exception logic. The floating-point unit may be configured to execute an unfused multiply-accumulate instruction defined with the instruction set architecture (ISA). The unfused multiply-accumulate instruction may include a multiply sub-operation and an accumulate sub-operation. The storage may be configured to maintain floating-point exception state information. The exception logic may be configured to capture the floating-point exception state after completion of the multiply sub-operation and prior to completion of the accumulate sub-operation, for example, and to update the storage to reflect the floating-point exception state. | 10-21-2010 |
20100325188 | PROCESSOR AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR MULTIPLICATION OF LARGE OPERANDS - A processor including instruction support for implementing large-operand multiplication may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include an instruction execution unit comprising a hardware multiplier datapath circuit, where the hardware multiplier datapath circuit is configured to multiply operands having a maximum number of bits M. In response to receiving a single instance of a large-operand multiplication instruction defined within the ISA, wherein at least one of the operands of the large-operand multiplication instruction includes more than the maximum number of bits M, the instruction execution unit is configured to multiply operands of the large-operand multiplication instruction within the hardware multiplier datapath circuit to determine a result of the large-operand multiplication instruction without execution of programmer-selected instructions within the ISA other than the large-operand multiplication instruction. | 12-23-2010 |
20100329450 | INSTRUCTIONS FOR PERFORMING DATA ENCRYPTION STANDARD (DES) COMPUTATIONS USING GENERAL-PURPOSE REGISTERS - Some embodiments of the present invention provide a processor, which includes a set of general-purpose registers and at least one execution unit. Each general-purpose register in the set of general-purpose registers is at least 64 bits wide, and the execution unit supports one or more Data Encryption Standard (DES) instructions. Specifically, the execution unit may support a permutation-rotation instruction for performing DES permutation operations and DES rotation operations. The execution unit may also support a round instruction to perform a DES round operation. Since the DES instructions use general-purpose registers instead of special-purpose registers to perform DES-specific operations, the processor's circuit complexity and area are reduced. Furthermore, in some embodiments, since the DES instructions require at most two operands, the number of bits required to specify the location of the operands are reduced, thereby enabling a larger number of instructions to be supported by the processor. | 12-30-2010 |
20110078414 | MULTIPORTED REGISTER FILE FOR MULTITHREADED PROCESSORS AND PROCESSORS EMPLOYING REGISTER WINDOWS - A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifiers and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output. | 03-31-2011 |
20110087895 | APPARATUS AND METHOD FOR LOCAL OPERAND BYPASSING FOR CRYPTOGRAPHIC INSTRUCTIONS - A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor. | 04-14-2011 |
20110091035 | HARDWARE KASUMI CYPHER WITH HYBRID SOFTWARE INTERFACE - A system including a memory; a software interface, operatively connected to the memory, and configured to generate a modified version of a confidentially key (CKey), and a modified version of an integrity key (IKey); and a Kasumi engine having a hardware implementation of a Kasumi cipher and configured to load the modified version of the CKey from the memory to perform a confidentiality function, and to load the modified version of the IKey from memory to perform an integrity function. | 04-21-2011 |
20110231636 | APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR PERFORMING A CYCLIC REDUNDANCY CHECK (CRC) - Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41. In some embodiments, the first instance of the CRC instruction specifies an initial value to be used in performing the first CRC operation, the set of data, and a storage location in which the cryptographic unit is configured to store the checksum value produced by the first CRC operation. | 09-22-2011 |
20110276783 | THREAD FAIRNESS ON A MULTI-THREADED PROCESSOR WITH MULTI-CYCLE CRYPTOGRAPHIC OPERATIONS - Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced. | 11-10-2011 |
20110276790 | INSTRUCTION SUPPORT FOR PERFORMING MONTGOMERY MULTIPLICATION - Techniques are disclosed relating to a processor including instruction support for performing a Montgomery multiplication. The processor may issue, for execution, programmer-selectable instruction from a defined instruction set architecture (ISA). The processor may include an instruction execution unit configured to receive instructions including a first instance of a Montgomery-multiply instruction defined within the ISA. The Montgomery-multiply instruction is executable by the processor to operate on at least operands A, B, and N residing in respective portions of a general-purpose register file of the processor, where at least one of operands A, B, N spans at least two registers of general-purpose register file. The instruction execution unit is configured to calculate P mod N in response to receiving the first instance of the Montgomery-multiply instruction, where P is the product of at least operand A, operand B, and R̂−1. | 11-10-2011 |
20110296142 | PROCESSOR AND METHOD PROVIDING INSTRUCTION SUPPORT FOR INSTRUCTIONS THAT UTILIZE MULTIPLE REGISTER WINDOWS - A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored. | 12-01-2011 |
20120060057 | Register Error Correction of Speculative Data in an Out-of-Order Processor - In one embodiment, a processor comprises a first register file configured to store speculative register state, a second register file configured to store committed register state, a check circuit and a control unit. The first register file is protected by a first error protection scheme and the second register file is protected by a second error protection scheme. A check circuit is coupled to receive a value and corresponding one or more check bits read from the first register file to be committed to the second register file in response to the processor selecting a first instruction to be committed. The check circuit is configured to detect an error in the value responsive to the value and the check bits. Coupled to the check circuit, the control unit is configured to cause reexecution of the first instruction responsive to the error detected by the check circuit. | 03-08-2012 |
20120087492 | EXECUTION UNIT FOR PERFORMING THE DATA ENCRYPTION STANDARD - Described is an execution unit for performing at least part of the Data Encryption Standard that includes a Left Half input; a Key input; and a Table input, as well as a first group of transistors configured to receive the Table input, perform a table look-up, and output data. The execution unit further includes a first exclusive-or operator having two inputs and an output that is configured to receive the Left Half input and the Key input. The execution unit also includes a second exclusive-or operator having two inputs and an output that is configured to receive the data output by the first group of transistors and to receive the output of the first exclusive-or operator. The execution unit also includes a third exclusive-or operator having two inputs and an output that is configured to receive the Left Half input and the data output by the first group of transistors. | 04-12-2012 |
20120216020 | INSTRUCTION SUPPORT FOR PERFORMING STREAM CIPHER - Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2′, and to perform a modular addition using a value R2 to produce a value R1'. | 08-23-2012 |
20120221614 | Processor Pipeline which Implements Fused and Unfused Multiply-Add Instructions - Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit. | 08-30-2012 |
20120233234 | SYSTEM AND METHOD OF BYPASSING UNROUNDED RESULTS IN A MULTIPLY-ADD PIPELINE UNIT - A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products. | 09-13-2012 |
20120259907 | PIPELINED DIVIDE CIRCUIT FOR SMALL OPERAND SIZES - A pipelined circuit for performing a divide operation on small operand sizes. The circuit includes a plurality of stages connected together in a series to perform a subtractive divide algorithm based on iterative subtractions and shifts. Each stage computes two quotient bits and outputs a partial remainder value to the next stage in the series. The first and last stages utilize a radix-4 serial architecture with edge modifications to increase efficiency. The intermediate stages utilize a radix-4 parallel architecture. The divide architecture is pipelined such that input operands can be applied to the divider on each clock cycle. | 10-11-2012 |
20120290817 | BRANCH TARGET STORAGE AND RETRIEVAL IN AN OUT-OF-ORDER PROCESSOR - A processor configured to facilitate transfer and storage of predicted targets for control transfer instructions (CTIs). In certain embodiments, the processor may be multithreaded and support storage of predicted targets for multiple threads. In some embodiments, a CTI branch target may be stored by one element of a processor and a tag may indicate the location of the stored target. The tag may be associated with the CTI rather than associating the complete target address with the CTI. When the CTI reaches an execution stage of the processor, the tag may be used to retrieve the predicted target address. In some embodiments using a tag to retrieve a predicted target, CTI instructions from different processor threads may be interleaved without affecting retrieval of predicted targets. | 11-15-2012 |
20120290820 | SUPPRESSION OF CONTROL TRANSFER INSTRUCTIONS ON INCORRECT SPECULATIVE EXECUTION PATHS - Techniques are disclosed relating to a processor that is configured to execute control transfer instructions (CTIs). In some embodiments, the processor includes a mechanism that suppresses results of mispredicted younger CTIs on a speculative execution path. This mechanism permits the branch predictor to maintain its fidelity, and eliminates spurious flushes of the pipeline. In one embodiment, a misprediction bit is be used to indicate that a misprediction has occurred, and younger CTIs than the CTI that was mispredicted are suppressed. In some embodiments, the processor may be configured to execute instruction streams from multiple threads. Each thread may include a misprediction indication. CTIs in each thread may execute in program order with respect to other CTIs of the thread, while instructions other than CTIs may execute out of program order. | 11-15-2012 |
20130138888 | STORING A TARGET ADDRESS OF A CONTROL TRANSFER INSTRUCTION IN AN INSTRUCTION FIELD - A control transfer instruction (CTI), such as a branch, jump, etc., may have an offset value for a control transfer that is to be performed. The offset value may be usable to compute a target address for the CTI (e.g., the address of a next instruction to be executed for a thread or instruction stream). The offset may be specified relative to a program counter. In response to detecting a specified offset value, the CTI may be modified to include at least a portion of a computed target address. Information indicating this modification has been performed may be stored, for example, in a pre-decode bit. In some cases, CTI modification may be performed only when a target address is a “near” target, rather than a “far” target. Modifying CTIs as described herein may eliminate redundant address calculations and produce a savings of power and/or time in some embodiments. | 05-30-2013 |
20130179664 | DIVISION UNIT WITH MULTIPLE DIVIDE ENGINES - Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter whether to schedule subsequent instructions for issuance to the division unit. | 07-11-2013 |