Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Rodney E. Hooker, Austin US

Rodney E. Hooker, Austin, TX US

Patent application numberDescriptionPublished
20080256336MICROPROCESSOR WITH PRIVATE MICROCODE RAM - A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture. Examples of PRAM uses include: computational temporary storage area; storage of x86 VMX VMCS in response to VMREAD and VMWRITE macroinstructions; instantiation of non-user-accessible storage, such as the x86 SMBASE register; and instantiation of x86 MSRs that tolerate the additional access latency of the PRAM, such as the IA32_SYSENTER_CS MSR.10-16-2008
20090204800MICROPROCESSOR WITH MICROARCHITECTURE FOR EFFICIENTLY EXECUTING READ/MODIFY/WRITE MEMORY OPERAND INSTRUCTIONS - The microprocessor includes an instruction translator that translates a macroinstruction of a macroinstruction set in its macroarchitecture into exactly three microinstructions to perform a read/modify/write operation on a memory operand. The first microinstruction instructs the microprocessor to load the memory operand into the microprocessor from a memory location and to calculate a destination address of the memory location. The second microinstruction instructs the microprocessor to perform an arithmetic or logical operation on the loaded memory operand to generate a result. The third microinstruction instructs the microprocessor to write the result to the memory location whose destination address is calculated by the first microinstruction. A first execution unit receives the first microinstruction and responsively loads the memory operand into the microprocessor from the memory location, and a second distinct execution unit also receives the first microinstruction and responsively calculates the destination address of the memory location.08-13-2009
20100011188MICROPROCESSOR THAT PERFORMS SPECULATIVE TABLEWALKS - A microprocessor performs a speculative page tablewalk. The microprocessor includes a tablewalk engine that determines whether at least one of a predetermined set of conditions exists with respect to characteristics of the page of memory whose physical address specified by a memory access instruction is missing in the TLB, performs operations of the tablewalk in an out-of-order manner with respect to the execution of unretired program instructions older than the memory access instruction while none of the predetermined set of conditions exists, and waits to perform the operations of the tablewalk until the microprocessor has retired all program instructions older than the memory access instruction when at least one of the predetermined set of conditions exists. The predetermined set of conditions may include the tablewalk needing to load information from a strongly-ordered page, update page mapping information, or access a global page.01-14-2010
20100011198MICROPROCESSOR WITH MULTIPLE OPERATING MODES DYNAMICALLY CONFIGURABLE BY A DEVICE DRIVER BASED ON CURRENTLY RUNNING APPLICATIONS - A computing system includes a microprocessor that receives values for configuring operating modes thereof. A device driver monitors which software applications currently running on the microprocessor are in a predetermined list and responsively dynamically writes the values to the microprocessor to configure its operating modes. Examples of the operating modes the device driver may configure relate to the following: data prefetching; branch prediction; instruction cache eviction; instruction execution suspension; sizes of cache memories, reorder buffer, store/load/fill queues; hashing algorithms related to data forwarding and branch target address cache indexing; number of instruction translation, formatting, and issuing per clock cycle; load delay mechanism; speculative page tablewalks; instruction merging; out-of-order execution extent; caching of non-temporal hinted data; and serial or parallel access of an L2 cache and processor bus in response to an instruction cache miss.01-14-2010
20100049952MICROPROCESSOR THAT PERFORMS STORE FORWARDING BASED ON COMPARISON OF HASHED ADDRESS BITS - An apparatus for decreasing the likelihood of incorrectly forwarding store data includes a hash generator, which hashes J address bits to K hashed bits. The J address bits are a memory address specified by a load/store instruction, where K is an integer greater than zero and J is an integer greater than K. The apparatus also includes a comparator, which outputs a first value if L address bits specified by the load instruction match L address bits specified by the store instruction and K hashed bits of the load instruction match corresponding K hashed bits of the store instruction, and otherwise to output a second value, where L is greater than zero. The apparatus also includes forwarding logic, which forwards data from the store instruction to the load instruction if the comparator outputs the first value and foregoes forwarding the data when the comparator outputs the second value.02-25-2010
20100064107MICROPROCESSOR CACHE LINE EVICT ARRAY - An apparatus for ensuring data coherency within a cache memory hierarchy of a microprocessor during an eviction of a cache line from a lower-level memory to a higher-level memory in the hierarchy includes an eviction engine and an array of storage elements. The eviction engine is configured to move the cache line from the lower-level memory to the higher-level memory. The array of storage elements are coupled to the eviction engine. Each storage element is configured to store an indication for a corresponding cache line stored in the lower-level memory. The indication indicates whether or not the eviction engine is currently moving the cache line from the lower-level memory to the higher-level memory.03-11-2010
20100070741MICROPROCESSOR WITH FUSED STORE ADDRESS/STORE DATA MICROINSTRUCTION - A microprocessor includes an instruction translator that translates a store macroinstruction into exactly one fused store microinstruction. The store macroinstruction in the microprocessor's macroarchitecture macroinstruction set instructs the microprocessor to store data from a general purpose register of the microprocessor to a memory location. The fused store microinstruction is an instruction in the microprocessor's microarchitecture microinstruction set. A reorder buffer (ROB) receives the fused store microinstruction from the instruction translator into exactly one of its plurality of entries. An instruction dispatcher dispatches for execution a store address microinstruction and a store data microinstruction to different respective execution units of the microprocessor in response to receiving the fused store microinstruction. Neither the store address microinstruction nor the store data microinstruction occupy any of the ROB entries. The ROB retires the fused store microinstruction after being notified that both the store address microinstruction and the store data microinstruction have been executed.03-18-2010
20100205406OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SPECULATIVELY EXECUTES DEPENDENT MEMORY ACCESS INSTRUCTIONS BY PREDICTING NO VALUE CHANGE BY OLDER INSTRUCTIONS THAT LOAD A SEGMENT REGISTER - An out-of-order execution microprocessor executes an architectural segment register-loading instruction that instructs the microprocessor to load a new value into an architectural segment register of the microprocessor. A comparator compares the new value specified by the architectural segment register-loading instruction with a current contents of the architectural segment register. A control unit causes to be re-executed using the new value all instructions in the microprocessor that used the current architectural segment register contents as a source operand and that are newer in program order than the architectural segment register-loading instruction whenever the comparator indicates the new value does not equal the current contents. An instruction scheduler retrieves the current contents and issues for execution instructions that use the retrieved current contents, even though the instructions are newer in program order than the register-loading instruction and the register-loading instruction has not yet written the new value to the architectural segment register.08-12-2010
20100250859PREFETCHING OF NEXT PHYSICALLY SEQUENTIAL CACHE LINE AFTER CACHE LINE THAT INCLUDES LOADED PAGE TABLE ENTRY - A microprocessor includes a cache memory, a load unit, and a prefetch unit, coupled to the load unit. The load unit is configured to receive a load request that includes an indicator that the load request is loading a page table entry. The prefetch unit is configured to receive from the load unit a physical address of a first cache line that includes the page table entry specified by the load request. The prefetch unit is further configured to responsively generate a request to prefetch into the cache memory a second cache line. The second cache line is the next physically sequential cache line to the first cache line. In an alternate embodiment, the second cache line is the previous physically sequential cache line to the first cache line rather than the next physically sequential cache line to the first cache line.09-30-2010
20100299484LOW POWER HIGH SPEED LOAD-STORE COLLISION DETECTOR - An apparatus detects a load-store collision within a microprocessor between a load operation and an older store operation each of which accesses data in the same cache line. Load and store byte masks specify which bytes contain the data specified by the load and store operation within a word of the cache line in which the load and data begins, respectively. Load and store word masks specify which words contain the data specified by the load and store operations within the cache line, respectively. Combinatorial logic uses the load and store byte masks to detect the load-store collision if the data specified by the load and store operations begin in the same cache line word, and uses the load and store word masks to detect the load-store collision if the data specified by the load and store operations do not begin in the same cache line word.11-25-2010
20100306475DATA CACHE WITH MODIFIED BIT ARRAY - A cache memory system includes a first array of storage elements each configured to store a cache line, a second array of storage elements corresponding to the first array of storage elements each configured to store a first partial status of the cache line in the corresponding storage element of the first array, and a third array of storage elements corresponding to the first array of storage elements each configured to store a second partial status of the cache line in the corresponding storage element of the first array. The second partial status indicates whether or not the cache line has been modified. When the cache memory system modifies the cache line within a storage element of the first array, it writes only the second partial status in the corresponding storage element of the third array to indicate that the cache line has been modified but refrains from writing the first partial status in the corresponding storage element of the second array. The cache memory system reads both the first partial status and the second partial status to determine the full status.12-02-2010
20100306478DATA CACHE WITH MODIFIED BIT ARRAY - A microprocessor includes first and second functional units and a data cache having a data array having a write port, a modified bit array having a read port and a write port, and a tag array having a read port, each array having the corresponding predetermined organization. The first functional unit writes data to a cache line of the data array. The first functional unit sets a modified bit in the modified bit array to indicate that the corresponding cache line in the data array has been modified. The second functional unit reads the modified bit from the modified bit array to determine whether or not the cache line has been modified. The second functional unit reads a partial status of the corresponding cache line from the tag array. The partial status does not indicate whether the cache line has been modified. The tag array does not include a port by which the first functional unit may update the partial status of the corresponding cache line.12-02-2010
20100306503GUARANTEED PREFETCH INSTRUCTION - A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions.12-02-2010
20100306506MICROPROCESSOR WITH SELECTIVE OUT-OF-ORDER BRANCH EXECUTION - A pipelined out-of-order execution in-order retire microprocessor includes a branch predictor that predicts a target address of a branch instruction, a fetch unit that fetches instructions at the predicted target address, and an execution unit that: resolves a target address of the branch instruction and detects that the predicted and resolved target addresses are different; determines whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction, in response to detecting that the predicted and resolved target addresses are different; execute the branch instruction by flushing instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address, if there is not an unretired instruction that must be corrected and that is older in program order than the branch instruction; and otherwise, refrain from executing the branch instruction.12-02-2010
20100306507OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold information that identifies sources of a store instruction used to compute its store address and to hold a dependency that identifies an instruction upon which the store instruction depends for its data. A register alias table (RAT), coupled to the queue of entries, is configured to encounter instructions in program order and to generate dependencies used to determine when the instructions may execute out of program order. In response to encountering a load instruction the RAT determines whether sources of the load instruction used to compute its load address match the sources of the store instruction in an entry of the queue, and if so, causes the load instruction to share the dependency of the matching store instruction.12-02-2010
20100306508OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing load instruction replay likelihood due to store collisions. A register alias table (RAT) is coupled to first and second queues of entries and generates dependencies used to determine when instructions may execute out of order. The RAT allocates an entry of the first queue and populates the allocated entry with an instruction pointer of a load instruction, when it determines that the load instruction must be replayed. The RAT allocates an entry of the second queue when it encounters a store instruction and populates the allocated entry with a dependency that identifies an instruction upon which the store instruction depends for its data. The RAT causes a subsequent instance of the load instruction to share the dependency when it encounters the subsequent instance of the load instruction and determines that its instruction pointer matches the instruction pointer of an entry of the first queue.12-02-2010
20100306509OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold an instruction pointer of a load instruction and to hold information useable to identify a store instruction that caused the load instruction to be replayed on a first instance of the load instruction. A register alias table (RAT) encounters instructions in program order and generates dependencies used to determine when the instructions may execute out of program order. The RAT encounters the load instruction on a second instance, determines that the load instruction second instance instruction pointer matches the instruction pointer of an entry of the queue, and causes the load instruction on the second instance to have a dependency on the store instruction identified by the information in the matching entry.12-02-2010
20110004644DYNAMIC FLOATING POINT REGISTER PRECISION CONTROL - Apparatus and methods are provided to perform floating point operations that are adaptive to the precision formats of input operands. The apparatus includes adaptive conversion logic and a tagged register file. The adaptive conversion logic receives the input operands, where each of the input operands is of a corresponding precision. The adaptive conversion logic also records the corresponding precision for use in subsequent floating point operations. The tagged register file is coupled to the adaptive conversion logic. The tagged register file stores the each of the input operands, and stores the corresponding precision and furthermore associates the corresponding precision with the each of the input operands. The subsequent floating point operations are performed at a precision level according to the corresponding precision.01-06-2011
20110010501EFFICIENT DATA PREFETCHING IN THE PRESENCE OF LOAD HITS - A BIU prioritizes L1 requests above L2 requests. The L2 generates a first request to the BIU and detects the generation of a snoop request and L1 request to the same cache line. The L2 determines whether a bus transaction to fulfill the first request may be retried and, if so, generates a miss, and otherwise generates a hit. Alternatively, the L2 detects the L1 generated a request to the L2 for the same line and responsively requests the BIU to refrain from performing a transaction on the bus to fulfill the first request if the BIU has not yet been granted the bus. Alternatively, a prefetch cache and the L2 allow the same line to be simultaneously present. If an L1 request hits in both the L2 and in the prefetch cache, the prefetch cache invalidates its copy of the line and the L2 provides the line to the L1.01-13-2011
20110010506DATA PREFETCHER WITH MULTI-LEVEL TABLE FOR PREDICTING STRIDE PATTERNS - A data prefetcher includes a table of entries to maintain a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenation of first and second strides. The next stride comprises the first stride. The first stride comprises a first cache line address subtracted from a second cache line address. The second stride comprises the second cache line address subtracted from a third cache line address. The first, second and third cache line addresses each comprise a memory address of a cache line implicated by respective first, second and third temporally preceding load operations. Control logic calculates a current stride by subtracting a previous cache line address from a new load cache line address, looks up in the table a concatenation of a previous stride and the current stride, and prefetches a cache line using the hitting table entry next stride.01-13-2011
20110035551MICROPROCESSOR WITH REPEAT PREFETCH INDIRECT INSTRUCTION - A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table.02-10-2011
20110035569MICROPROCESSOR WITH ALU INTEGRATED INTO LOAD UNIT - A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.02-10-2011
20110035570MICROPROCESSOR WITH ALU INTEGRATED INTO STORE UNIT - A superscalar pipelined microprocessor includes a register set defined by an instruction set architecture of the microprocessor, execution units, and a store unit, coupled to the cache memory and distinct from the other execution units of the microprocessor. The store unit comprises an ALU. The store unit receives an instruction that specifies a source register of the register set and an operation to be performed on a source operand to generate a result. The store unit reads the source operand from the source register. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The store unit operatively writes the result to the cache memory.02-10-2011
20110040955STORE-TO-LOAD FORWARDING BASED ON LOAD/STORE ADDRESS COMPUTATION SOURCE INFORMATION COMPARISONS - A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.02-17-2011
20110055485EFFICIENT PSEUDO-LRU FOR COLLIDING ACCESSES - An apparatus for allocating entries in a set associative cache memory includes an array that provides a first pseudo-least-recently-used (PLRU) vector in response to a first allocation request from a first functional unit. The first PLRU vector specifies a first entry from a set of the cache memory specified by the first allocation request. The first vector is a tree of bits comprising a plurality of levels. Toggling logic receives the first vector and toggles predetermined bits thereof to generate a second PLRU vector in response to a second allocation request from a second functional unit generated concurrently with the first allocation request and specifying the same set of the cache memory specified by the first allocation request. The second vector specifies a second entry different from the first entry from the same set. The predetermined bits comprise bits of a predetermined one of the levels of the tree.03-03-2011
20110055530FAST REP STOS USING GRABLINE OPERATIONS - A microprocessor includes a cache memory and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. The microprocessor foregoes initiating the transaction on the bus when executing the grabline instruction if the microprocessor determines that a store to the cache line would cause an exception.03-03-2011
20110060943APPARATUS AND METHOD FOR DETECTION AND CORRECTION OF DENORMAL SPECULATIVE FLOATING POINT OPERAND - A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed.03-10-2011
20110113196AVOIDING MEMORY ACCESS LATENCY BY RETURNING HIT-MODIFIED WHEN HOLDING NON-MODIFIED DATA - A microprocessor is configured to communicate with other agents on a system bus and includes a cache memory and a bus interface unit coupled to the cache memory and to the system bus. The bus interface unit receives from another agent coupled to the system bus a transaction to read data from a memory address, determines whether the cache memory is holding the data at the memory address in an exclusive state (or a shared state in certain configurations), and asserts a hit-modified signal on the system bus and provides the data on the system bus to the other agent when the cache memory is holding the data at the memory address in an exclusive state. Thus, the delay of an access to the system memory by the other agent is avoided.05-12-2011

Patent applications by Rodney E. Hooker, Austin, TX US