# Wajdi K. Feghali, Boston US

## Wajdi K. Feghali, Boston, MA US

Patent application number | Description | Published |
---|---|---|

20080240421 | Method and apparatus for advanced encryption standard (AES) block cipher - The speed at which encrypt and decrypt operations may be performed in a general purpose processor is increased by providing a separate encrypt data path and decrypt data path. With separate data paths, each of the data paths may be individually optimized in order to reduce delays in a critical path. In addition, delays may be hidden in a non-critical last round. | 10-02-2008 |

20080240422 | Efficient advanced encryption standard (AES) Datapath using hybrid Rijndael S-Box - The speed at which an AES decrypt operation may be performed in a general purpose processor is increased by providing a separate decrypt data path. The critical path delay of the AES decrypt path is reduced by combining multiply and inverse operations in the Inverse SubBytes transformation. A further decrease in critical path delay in the AES decrypt data path is provided by merging appropriate constants of the inverse mix-column transform into a map function. | 10-02-2008 |

20080240426 | Flexible architecture and instruction for advanced encryption standard (AES) - A flexible AES instruction set for a general purpose processor is provided. The instruction set includes instructions to perform a “one round” pass for AES encryption or decryption and also includes instructions to perform key generation. An immediate may be used to indicate round number and key size for key generation for 128/192/256 bit keys. The flexible AES instruction set enables full use of pipelining capabilities because it does not require tracking of implicit registers. | 10-02-2008 |

20080304659 | METHOD AND APPARATUS FOR EXPANSION KEY GENERATION FOR BLOCK CIPHERS - A key scheduler performs a key-expansion to generate round keys for AES encryption and decryption just-in-time for each AES round. The key scheduler pre-computes slow operations in a current clock cycle to reduce the critical delay path for computing the round key for a next AES round. | 12-11-2008 |

20090003593 | UNIFIED SYSTEM ARCHITECTURE FOR ELLIPTIC-CURVE CRYPTOGRAPHY - A system for performing public key encryption is provided. The system supports mathematical operations for a plurality of public key encryption algorithms such as Rivest, Shamir, Adleman (RSA), Diffie-Hellman key exchange (DH), and Elliptic Curve Cryptosystem (ECC). The system supports both prime fields and different composite binary fields. | 01-01-2009 |

20090003594 | MODULUS SCALING FOR ELLIPTIC-CURVE CRYPTOGRAPHY - Modulus scaling applied to a reduction technique decreases the time to perform modular arithmetic operations by avoiding shifting and multiplication operations. Modulus scaling may be applied to both integer and binary fields, and the scaling multiplier factor is chosen based on a selected reduction technique for the modular arithmetic operation. | 01-01-2009 |

20090003595 | SCALE-INVARIANT BARRETT REDUCTION FOR ELLIPTIC-CURVE CRYPTOGRAPHY - The computation time to perform scalar point multiplication in an Elliptic Curve Group is reduced by modifying the Barrett Reduction technique. Computations are performed using an N-bit scaled modulus based on a modulus m having k bits to provide a scaled result, with N being greater than k. The N-bit scaled result is reduced to a k-bit result using a pre-computed N-bit scaled reduction parameter in an optimal manner, avoiding shifting/aligning operations for any arbitrary values of k and N. | 01-01-2009 |

20090003596 | EFFICIENT ELLIPTIC-CURVE CRYPTOGRAPHY BASED ON PRIMALITY OF THE ORDER OF THE ECC-GROUP - Time to perform scalar point multiplication used for ECC is reduced by minimizing the number of shifting operations. These operations are minimized by applying modulus scaling by performing selective comparisons of points at intermediate computations based on primality of the order of an ECC group. | 01-01-2009 |

20090006511 | POLYNOMIAL-BASIS TO NORMAL-BASIS TRANSFORMATION FOR BINARY GALOIS-FIELDS GF(2^m) - Basis conversion from polynomial-basis form to normal-basis form is provided for both generic polynomials and special irreducible polynomials in the form of “all ones”, referred to as “all-ones-polynomials” (AOP). Generation and storing of large matrices is minimized by creating matrices on the fly, or by providing an alternate means of computing a result with minimal hardware extensions. | 01-01-2009 |

20090006512 | NORMAL-BASIS TO CANONICAL-BASIS TRANSFORMATION FOR BINARY GALOIS-FIELDS GF(2^m) - Basis conversion from normal form to canonical form is provided for both generic polynomials and special irreducible polynomials in the form of “all ones”, referred to as “all-ones-polynomials” (AOP). Generation and storing of large matrices is minimized by creating matrices on the fly, or by providing an alternate means of computing a result with minimal hardware extensions. | 01-01-2009 |

20090006517 | UNIFIED INTEGER/GALOIS FIELD (2^m) MULTIPLIER ARCHITECTURE FOR ELLIPTIC-CURVE CRYPTOGRAPHY - A unified integer/Galois-Field 2 | 01-01-2009 |

20090019342 | Determining a Message Residue - A technique of determining a message residue includes accessing a message and simultaneously determining a set of modular remainders with respect to a polynomial for different respective segments of the message. The technique also includes determining a modular remainder with respect to the polynomial for the message based on the set of modular remainders and a set of constants determined prior to accessing the message. The modular remainder with respect to the polynomial for the message is stored in a memory. | 01-15-2009 |

20090089617 | METHOD AND APPARATUS FOR TESTING MATHEMATICAL ALGORITHMS - A method and apparatus for testing mathematical programs where code coverage is exceedingly difficult to hit with random data test vectors (probability <2 | 04-02-2009 |

20090164543 | APPARATUS AND METHOD TO COMPUTE RECIPROCAL APPROXIMATIONS - A method and apparatus for reducing memory required to store reciprocal approximations as specified in Institute of Electrical and Electronic Engineers (IEEE) standards such as IEEE 754 is presented. Monotonic properties of the reciprocal function are used to bound groups of values. Efficient bit-vectors are used to represent information in groups resulting in a very compact table representation about four times smaller than storing all of the reciprocal approximations in a table. | 06-25-2009 |

20090164546 | METHOD AND APPARATUS FOR EFFICIENT PROGRAMMABLE CYCLIC REDUNDANCY CHECK (CRC) - A method and apparatus to optimize each of the plurality of reduction stages in a Cyclic Redundancy Check (CRC) circuit to produce a residue for a block of data decreases area used to perform the reduction while maintaining the same delay through the plurality of stages of the reduction logic. A hybrid mix of Karatsuba algorithm, classical multiplications and serial division in various stages in the CRC reduction circuit results in about a twenty percent reduction in area on the average with no decrease in critical path delay. | 06-25-2009 |

20100332578 | Method and apparatus for performing efficient side-channel attack resistant reduction - A time-invariant method and apparatus for performing modular reduction that is protected against cache-based and branch-based attacks is provided. The modular reduction technique adds no performance penalty and is side-channel resistant. The side-channel resistance is provided through the use of lazy evaluation of carry bits, elimination of data-dependent branches and use of even cache accesses for all memory references. | 12-30-2010 |

20110145683 | Instruction-set architecture for programmable cyclic redundancy check (CRC) computations - A method and apparatus to perform Cyclic Redundancy Check (CRC) operations on a data block using a plurality of different n-bit polynomials is provided. A flexible CRC instruction performs a CRC operation using a programmable n-bit polynomial. The n-bit polynomial is provided to the CRC instruction by storing the n-bit polynomial in one of two operands. | 06-16-2011 |

20110153700 | Method and apparatus for performing a shift and exclusive or operation in a single instruction - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value. | 06-23-2011 |

20110153993 | Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 06-23-2011 |

20110153994 | Multiplication Instruction for Which Execution Completes Without Writing a Carry Flag - A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 06-23-2011 |

20110161635 | Rotate instructions that complete execution without reading carry flag - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag. | 06-30-2011 |

20120002804 | ARCHITECTURE AND INSTRUCTION SET FOR IMPLEMENTING ADVANCED ENCRYPTION STANDARD (AES) - A flexible AES instruction for a general purpose processor is provided that performs AES encryption or decryption using n rounds, where n includes the standard AES set of rounds {10, 12, 14}. A parameter is provided to allow the type of AES round to be selected, that is, whether it is a “last round”. In addition to standard AES, the flexible AES instruction allows an AES-like cipher with 20 rounds to be specified or a “one round” pass. | 01-05-2012 |

20120151183 | ENHANCING PERFORMANCE BY INSTRUCTION INTERLEAVING AND/OR CONCURRENT PROCESSING OF MULTIPLE BUFFERS - An embodiment may include circuitry to execute, at least in part, a first list of instructions and/or to concurrently process, at least in part, first and second buffers. The execution of the first list of instructions may result, at least in part, from invocation of a first function call. The first list of instructions may include at least one portion of a second list of instructions interleaved, at least in part, with at least one other portion of a third list of instructions. The portions may be concurrently carried out, at least in part, by one or more sets of execution units of the circuitry. The second and third lists of instructions may implement, at least in part, respective algorithms that are amenable to being invoked by separate respective function calls. The concurrent processing may involve, at least in part, complementary algorithms. | 06-14-2012 |

20130007573 | EFFICIENT AND SCALABLE CYCLIC REDUNDANCY CHECK CIRCUIT USING GALOIS-FIELD ARITHMETIC - Embodiments of the present disclosure describe methods, apparatus, and system configurations for cyclic redundancy check circuits using Galois-field arithmetic. | 01-03-2013 |

20130227252 | Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 08-29-2013 |

20130275722 | METHOD AND APPARATUS TO PROCESS KECCAK SECURE HASHING ALGORITHM - A processor includes a plurality of registers, an instruction decoder to receive an instruction to process a KECCAK state cube of data representing a KECCAK state of a KECCAK hash algorithm, to partition the KECCAK state cube into a plurality of subcubes, and to store the subcubes in the plurality of registers, respectively, and an execution unit coupled to the instruction decoder to perform the KECCAK hash algorithm on the plurality of subcubes respectively stored in the plurality of registers in a vector manner. | 10-17-2013 |

20130283064 | METHOD AND APPARATUS TO PROCESS SHA-1 SECURE HASHING ALGORITHM - A processor includes an instruction decoder to receive a first instruction to process a SHA-1 hash algorithm, the first instruction having a first operand to store a SHA-1 state, a second operand to store a plurality of messages, and a third operand to specify a hash function, and an execution unit coupled to the instruction decoder to perform a plurality of rounds of the SHA-1 hash algorithm on the SHA-1 state specified in the first operand and the plurality of messages specified in the second operand, using the hash function specified in the third operand. | 10-24-2013 |

20130290285 | DIGEST GENERATION - In one embodiment, circuitry may generate digests to be combined to produce a hash value. The digests may include at least one digest and at least one other digest generated based at least in part upon at least one CRC value and at least one other CRC value. The circuitry may include cyclical redundancy check (CRC) generator circuitry to generate the at least one CRC value based at least in part upon at least one input string. The CRC generator circuitry also may generate the at least one other CRC value based at least in part upon at least one other input string. The at least one other input string may result at least in part from at least one pseudorandom operation involving, at least in part, the at least one input string. Many modifications, variations, and alternatives are possible without departing from this embodiment. | 10-31-2013 |

20130326201 | PROCESSOR-BASED APPARATUS AND METHOD FOR PROCESSING BIT STREAMS - An apparatus and method are described for processing bit streams using bit-oriented instructions. For example, a method according to one embodiment includes the operations of: executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved; retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; and performing a sequence of specified bit operations on the retrieved bits to generate results. | 12-05-2013 |

20140003602 | Flexible Architecture and Instruction for Advanced Encryption Standard (AES) | 01-02-2014 |

20140006536 | TECHNIQUES TO ACCELERATE LOSSLESS COMPRESSION | 01-02-2014 |

20140006753 | MATRIX MULTIPLY ACCUMULATE INSTRUCTION | 01-02-2014 |

20140013086 | ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS - A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register. | 01-09-2014 |

20140016773 | INSTRUCTIONS PROCESSORS, METHODS, AND SYSTEMS TO PROCESS BLAKE SECURE HASHING ALGORITHM - A method of an aspect includes receiving an instruction indicating a first source having at least one set of four state matrix data elements, which represent a complete set of four inputs to a G function of a cryptographic hashing algorithm. The algorithm uses a sixteen data element state matrix, and alternates between updating data elements in columns and diagonals. The instruction also indicates a second source having data elements that represent message and constant data. In response to the instruction, a result is stored in a destination indicated by the instruction. The result includes updated state matrix data elements including at least one set of four updated state matrix data elements. Each of the four updated state matrix data elements represents a corresponding one of the four state matrix data elements of the first source, which has been updated by the G function. | 01-16-2014 |

20140016774 | INSTRUCTIONS TO PERFORM GROESTL HASHING - A method is described. The method includes executing an instruction to perform one or more Galois Field (GF) multiply by 2 operations on a state matrix and executing an instruction to combine results of the one or more GF multiply by 2 operations with exclusive or (XOR) functions to generate a result matrix. | 01-16-2014 |

20140019693 | PARALLEL PROCESSING OF A SINGLE DATA BUFFER - Technologies for executing a serial data processing algorithm on a single variable-length data buffer include streaming segments of the buffer into a data register, executing the algorithm on each of the segments in parallel, and combining the results of executing the algorithm on each of the segments to form the output of the serial data processing algorithm. | 01-16-2014 |

20140019694 | PARALLEL PROCESSING OF A SINGLE DATA BUFFER - Technologies for executing a serial data processing algorithm on a single variable-length data buffer include padding data segments of the buffer, streaming the data segments into a data register, and executing the serial data processing algorithm on each of the segments in parallel. | 01-16-2014 |

20140019764 | METHOD FOR SIGNING AND VERIFYING DATA USING MULTIPLE HASH ALGORITHMS AND DIGESTS IN PKCS - Methods, systems, and apparatuses are disclosed for signing and verifying data using multiple hash algorithms and digests in PKCS including, for example, retrieving, at the originating computing device, a message for signing at the originating computing device to yield a signature for the message; identifying multiple hashing algorithms to be supported by the signature; for each of the multiple hashing algorithms identified to be supported by the signature, hashing the message to yield multiple hashes of the message corresponding to the multiple hashing algorithms identified; constructing a single digest having therein each of the multiple hashes of the messages corresponding to the multiple hashing algorithms identified and further specifying the multiple hashing algorithms to be supported by the signature; applying a signing algorithm to the single digest using a private key of the originating computing device to yield the signature for the message; and distributing the message and the signature to receiving computing devices. Other related embodiments are disclosed. | 01-16-2014 |

20140053000 | INSTRUCTIONS TO PERFORM JH CRYPTOGRAPHIC HASHING - A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_Permute instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed. | 02-20-2014 |

20140082451 | EFFICIENT AND SCALABLE CYCLIC REDUNDANCY CHECK CIRCUIT USING GALOIS-FIELD ARITHMETIC - Embodiments of the present disclosure describe methods, apparatus, and system configurations for cyclic redundancy check circuits using Galois-field arithmetic. | 03-20-2014 |

20140101460 | ARCHITECTURE AND INSTRUCTION SET FOR IMPLEMENTING ADVANCED ENCRYPTION STANDARD (AES) - A flexible AES instruction for a general purpose processor is provided that performs AES encryption or decryption using n rounds, where n includes the standard AES set of rounds {10, 12, 14}. A parameter is provided to allow the type of AES round to be selected, that is, whether it is a “last round”. In addition to standard AES, the flexible AES instruction allows an AES-like cipher with 20 rounds to be specified or a “one round” pass. | 04-10-2014 |

20140122839 | APPARATUS AND METHOD OF EXECUTION UNIT FOR CALCULATING MULTIPLE ROUNDS OF A SKEIN HASHING ALGORITHM - An apparatus is described that includes an execution unit within an instruction pipeline. The execution unit has multiple stages of a circuit that includes a) and b) as follows. a) a first logic circuitry section having multiple mix logic sections each having: i) a first input to receive a first quad word and a second input to receive a second quad word; ii) an adder having a pair of inputs that are respectively coupled to the first and second inputs; iii) a rotator having a respective input coupled to the second input; iv) an XOR gate having a first input coupled to an output of the adder and a second input coupled to an output of the rotator. b) permute logic circuitry having inputs coupled to the respective adder and XOR gate outputs of the multiple mix logic sections. | 05-01-2014 |

20140164467 | APPARATUS AND METHOD FOR VECTOR INSTRUCTIONS FOR LARGE INTEGER ARITHMETIC - An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register. | 06-12-2014 |

20140189289 | INSTRUCTION FOR ACCELERATING SNOW 3G WIRELESS SECURITY ALGORITHM - Vector instructions for performing SNOW 3G wireless security operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first operand of the first instruction specifying a first vector register that stores a current state of a finite state machine (FSM). The execution circuitry also receives a second operand of the first instruction specifying a second vector register that stores data elements of a linear feedback shift register (LFSR) that are needed for updating the FSM. The execution circuitry executes the first instruction to produce an updated state of the FSM and an output of the FSM in a destination operand of the first instruction. | 07-03-2014 |

20140189290 | INSTRUCTION FOR FAST ZUC ALGORITHM PROCESSING - Vector instructions for performing ZUC stream cipher operations are received and executed by the execution circuitry of a processor. The execution circuitry receives a first vector instruction to perform an update to a linear feedback shift register (LFSR), and receives a second vector instruction to perform an update to a state of a finite state machine (FSM), where the FSM receives inputs from re-ordered bits of the LFSR. The execution circuitry executes the first vector instruction and the second vector instruction in a single-instruction multiple data (SIMD) pipeline. | 07-03-2014 |

20140195782 | METHOD AND APPARATUS TO PROCESS SHA-2 SECURE HASHING ALGORITHM - A processor includes an instruction decoder to receive a first instruction to process a secure hash algorithm 2 (SHA-2) hash algorithm, the first instruction having a first operand associated with a first storage location to store a SHA-2 state and a second operand associated with a second storage location to store a plurality of messages and round constants. The processor further includes an execution unit coupled to the instruction decoder to perform one or more iterations of the SHA-2 hash algorithm on the SHA-2 state specified by the first operand and the plurality of messages and round constants specified by the second operand, in response to the first instruction. | 07-10-2014 |

20140195817 | THREE INPUT OPERAND VECTOR ADD INSTRUCTION THAT DOES NOT RAISE ARITHMETIC FLAGS FOR CRYPTOGRAPHIC APPLICATIONS - A method is described that includes performing the following within an instruction execution pipeline implemented on a semiconductor chip: summing three input vector operands through execution of a single instruction; and, not raising any arithmetic flags even though a result of the summing creates more bits than circuitry designed to transport the summation is able to transport. | 07-10-2014 |

20140205084 | INSTRUCTIONS TO PERFORM JH CRYPTOGRAPHIC HASHING IN A 256 BIT DATA PATH - A method is described. The method includes executing one or more JH_SBOX_L instructions to perform S-Box mappings and a linear (L) transformation on a JH state and executing one or more JH_P instructions to perform a permutation function on the JH state once the S-Box mappings and the L transformation have been performed. | 07-24-2014 |
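
Several entries above (for example 20090019342, "Determining a Message Residue") describe computing a polynomial remainder for a whole message from per-segment remainders and constants that do not depend on the message contents. The following is a minimal Python sketch of that general idea over GF(2), not the method claimed in any of the applications. The function names `poly_mod`, `clmul`, and `segmented_residue`, the byte-aligned segmentation, and the big-endian bit order are all assumptions made for illustration.

```python
def poly_mod(a: int, p: int) -> int:
    """Remainder of polynomial a modulo p over GF(2).

    Integers are used as bit-vectors of coefficients (bit i = coefficient of x^i).
    """
    if p == 0:
        raise ValueError("modulus polynomial must be non-zero")
    deg_p = p.bit_length() - 1
    while a.bit_length() - 1 >= deg_p:
        # Cancel the leading term of a with a shifted copy of p.
        a ^= p << (a.bit_length() - 1 - deg_p)
    return a


def clmul(a: int, b: int) -> int:
    """Carry-less (GF(2) polynomial) multiplication of a and b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def segmented_residue(message: bytes, seg_bytes: int, p: int) -> int:
    """Residue of the whole message modulo p, built from per-segment residues."""
    segments = [message[i:i + seg_bytes] for i in range(0, len(message), seg_bytes)]
    acc = 0
    bits_after = 0  # number of message bits that follow the current segment
    for seg in reversed(segments):
        # Each segment residue is independent of the others, so in hardware
        # or SIMD code these could be computed at the same time.
        seg_residue = poly_mod(int.from_bytes(seg, "big"), p)
        # x^bits_after mod p depends only on the segment layout, not on the
        # message contents, so it could be tabulated ahead of time.
        const = poly_mod(1 << bits_after, p)
        acc ^= clmul(seg_residue, const)
        bits_after += 8 * len(seg)
    return poly_mod(acc, p)


if __name__ == "__main__":
    crc32_poly = 0x104C11DB7  # the common CRC-32 generator polynomial
    msg = b"The quick brown fox jumps over the lazy dog"
    direct = poly_mod(int.from_bytes(msg, "big"), crc32_poly)
    print(segmented_residue(msg, 8, crc32_poly) == direct)
```

The sketch relies on the identity that a message M split into segments S_i at bit offsets k_i satisfies M mod P = (sum of (S_i mod P) * (x^k_i mod P)) mod P, which is why the segmented result matches the direct remainder.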