Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Gregory F. Grohoski, Bee Cave US

Gregory F. Grohoski, Bee Cave, TX US

Patent application numberDescriptionPublished
20090327646Minimizing TLB Comparison Size - In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.12-31-2009
20100246814APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE DATA ENCRYPTION STANDARD (DES) ALGORITHM - A processor including instruction support for implementing the Data Encryption Standard (DES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more DES instructions defined within the ISA. In addition, the DES instructions may be executable by the cryptographic unit to implement portions of an DES cipher that is compliant with Federal Information Processing Standards Publication 46-3 (FIPS 46-3). In response to receiving a DES key expansion instruction defined within the ISA, the cryptographic unit may generate one or more expanded cipher keys of the DES cipher key schedule from an input key.09-30-2010
20100246815APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE KASUMI CIPHER ALGORITHM - A processor including instruction support for implementing the Kasumi block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Kasumi instructions defined within the ISA. In addition, the Kasumi instructions may be executable by the cryptographic unit to implement portions of a Kasumi cipher that is compliant with 309-30-2010
20100250964APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE CAMELLIA CIPHER ALGORITHM - A processor including instruction support for implementing the Camellia block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Camellia instructions defined within the ISA. In addition, the Camellia instructions may be executable by the cryptographic unit to implement portions of a Camellia cipher that is compliant with Internet Engineering Task Force (IETF) Request For Comments (RFC) 3713. In response to receiving a Camellia F( )-operation instruction defined within the ISA, the cryptographic unit may perform an F( ) operation, as defined by the Camellia cipher, upon a data input operand and a subkey operand, in which the data input operand and subkey operand may be specified by the Camellia F( )-operation instruction.09-30-2010
20100250965APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM - A processor including instruction support for implementing the Advanced Encryption Standard (AES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more AES instructions defined within the ISA. In addition, the AES instructions may be executable by the cryptographic unit to implement portions of an AES cipher that is compliant with Federal Information Processing Standards Publication 197 (FIPS 197). In response to receiving a first AES encryption round instruction defined within the ISA, the cryptographic unit may perform an encryption round of the AES cipher on a first group of columns of cipher state having a plurality of rows and columns. A maximum number of columns included in the first group may be fewer than all of the columns of the cipher state.09-30-2010
20100299499DYNAMIC ALLOCATION OF RESOURCES IN A THREADED, HETEROGENEOUS PROCESSOR - Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.11-25-2010
20100325394System and Method for Balancing Instruction Loads Between Multiple Execution Units Using Assignment History - A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.12-23-2010
20100329450INSTRUCTIONS FOR PERFORMING DATA ENCRYPTION STANDARD (DES) COMPUTATIONS USING GENERAL-PURPOSE REGISTERS - Some embodiments of the present invention provide a processor, which includes a set of general-purpose registers and at least one execution unit. Each general-purpose register in the set of general-purpose registers is at least 64 bits wide, and the execution unit supports one or more Data Encryption Standard (DES) instructions. Specifically, the execution unit may support a permutation-rotation instruction for performing DES permutation operations and DES rotation operations. The execution unit may also support a round instruction to perform a DES round operation. Since the DES instructions use general-purpose registers instead of special-purpose registers to perform DES-specific operations, the processor's circuit complexity and area are reduced. Furthermore, in some embodiments, since the DES instructions require at most two operands, the number of bits required to specify the location of the operands are reduced, thereby enabling a larger number of instructions to be supported by the processor.12-30-2010
20100332786System and Method to Invalidate Obsolete Address Translations - A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.12-30-2010
20100332787System and Method to Manage Address Translation Requests - A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.12-30-2010
20110078425BRANCH PREDICTION MECHANISM FOR PREDICTING INDIRECT BRANCH TARGETS - A multithreaded microprocessor includes an instruction fetch unit that may fetch and maintain a plurality of instructions belonging to one or more threads and one or more execution units that may concurrently execute the one or more threads. The instruction fetch unit includes a target branch prediction unit that may provide a predicted branch target address in response to receiving an instruction fetch address of a current indirect branch instruction. The branch prediction unit includes a primary storage and a control unit. The storage includes a plurality of entries, and each entry may store a predicted branch target address corresponding to a previous indirect branch instruction. The control unit may generate an index value for accessing the storage using a portion of the instruction fetch address of the current indirect branch instruction, and branch direction history information associated with a currently executing thread of the one or more threads.03-31-2011
20110087866PERCEPTRON-BASED BRANCH PREDICTION MECHANISM FOR PREDICTING CONDITIONAL BRANCH INSTRUCTIONS ON A MULTITHREADED PROCESSOR - A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.04-14-2011
20110087895APPARATUS AND METHOD FOR LOCAL OPERAND BYPASSING FOR CRYPTOGRAPHIC INSTRUCTIONS - A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.04-14-2011

Patent applications by Gregory F. Grohoski, Bee Cave, TX US