| Patent application number | Description | Published |
| 20080244234 | System and Method for Executing Instructions Prior to an Execution Stage in a Processor - A method of processing a plurality of instructions in multiple pipeline stages within a pipeline processor is disclosed. The method partially or wholly executes a stalled instruction in a pipeline stage that has a function other than instruction execution prior to the execution stage within the processor. Partially or wholly executing the instruction prior to the execution stage in the pipeline speeds up the execution of the instruction and allows the processor to more effectively utilize its resources, thus increasing the processor's efficiency. | 10-02-2008 |
| 20080281996 | Latency Insensitive FIFO Signaling Protocol - Data from a source domain operating at a first data rate is transferred to a FIFO in another domain operating at a different data rate. The FIFO buffers data before transfer to a sink for further processing or storage. A source side counter tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter decrements in response to a data ready signal from the source domain, without delay. The counter increments in response to signaling from the sink domain of a read of data off the FIFO. Hence, incrementing is subject to the signaling latency between domains. The source may send one more beat of data when the counter indicates the FIFO is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available; effectively providing one more FIFO position. | 11-13-2008 |
| 20080288753 | Methods and Apparatus for Emulating the Branch Prediction Behavior of an Explicit Subroutine Call - An apparatus for emulating the branch prediction behavior of an explicit subroutine call is disclosed. The apparatus includes a first input which is configured to receive an instruction address and a second input. The second input is configured to receive predecode information which describes the instruction address as being related to an implicit subroutine call to a subroutine. In response to the predecode information, the apparatus also includes an adder configured to add a constant to the instruction address defining a return address, causing the return address to be stored to an explicit subroutine resource, thus, facilitating subsequent branch prediction of a return call instruction. | 11-20-2008 |
| 20080313410 | Promoting a Line from Shared to Exclusive in a Cache - Embodiments include a cache controller adapted to determine whether a memory line for which the processor is to issue an address-only kill request resides in a fill buffer for the cache line in a shared state. If so, the cache controller may mark the fill buffer as not having completed bus transactions and issue the address-only kill request for that fill buffer. The address-only kill request may transmit to other processors on the bus and the other processors may respond by invalidating the cache entries for the memory line. Upon confirmation from the other processors, a bus arbiter may confirm the kill request, promoting the memory line already in that fill buffer to exclusive state. Once promoted, the fill buffer may be marked as having completed the bus transactions and may be written into the cache. | 12-18-2008 |
| 20090019229 | Data Prefetch Throttle - A system and method taught herein control data prefetching for a data cache by tracking prefetch hits and overall hits for the data cache. Data prefetching for the data cache is disabled based on the tracking of prefetch hits and data prefetching is enabled for the data cache based on the tracking of overall hits. For example, in one or more embodiments, a cache controller is configured to track a prefetch hit rate reflecting the percentage of hits on the data cache that involve prefetched data lines and disable data prefetching if the prefetch hit rate falls below a defined threshold. The cache controller also tracks an overall hit rate reflecting the overall percentage of data cache hits (versus misses) and enables data prefetching if the overall hit rate falls below a defined threshold. | 01-15-2009 |
| 20090094444 | Link Stack Repair of Erroneous Speculative Update - Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, and is restored to the link stack after a link stack push operation is speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction, and pushed a link address onto the link stack. The prior link address is restored to the link stack from the link stack restore buffer. | 04-09-2009 |
| 20090119485 | Predecode Repair Cache For Instructions That Cross An Instruction Cache Line - A predecode repair cache is described in a processor capable of fetching and executing variable length instructions having instructions of at least two lengths which may be mixed in a program. An instruction cache is operable to store in an instruction cache line instructions having at least a first length and a second length, the second length longer than the first length. A predecoder is operable to predecode instructions fetched from the instruction cache that have invalid predecode information to form repaired predecode information. A predecode repair cache is operable to store the repaired predecode information associated with instructions of the second length that span across two cache lines in the instruction cache. Methods for filling the predecode repair cache and for executing an instruction that spans across two cache lines are also described. | 05-07-2009 |
| 20090119486 | Method and a System for Accelerating Procedure Return Sequences - A method for retrieving a return address from a link stack when returning from a procedure in a pipeline processor is disclosed. The method identifies a retrieve instruction operable to retrieve a return address from a software stack. The method further identifies a branch instruction operable to branch to the return address. The method retrieves the return address from the link stack, in response to both the instruction and the branch instruction being identified and fetches instructions using the return address. | 05-07-2009 |
| 20090210663 | Power Efficient Instruction Prefetch Mechanism - A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss. | 08-20-2009 |
| 20100023696 | Methods and System for Resolving Simultaneous Predicted Branch Instructions - A method of resolving simultaneous branch predictions prior to validation of the predicted branch instruction is disclosed. The method includes processing two or more predicted branch instructions, with each predicted branch instruction having a predicted state and a corrected state. The method further includes selecting one of the corrected states. Should one of the predicted branch instructions be mispredicted, the selected corrected state is used to direct future instruction fetches. | 01-28-2010 |
| 20100211744 | METHODS AND APARATUS FOR LOW INTRUSION SNOOP INVALIDATION - Efficient techniques are described for tracking a potential invalidation of a data cache entry in a data cache for which coherency is required. Coherency information is received that indicates a potential invalidation of a data cache entry. The coherency information in association with the data cache entry is retained to track the potential invalidation to the data cache entry. The retained coherency information is kept separate from state bits that are utilized in cache access operations. An invalidate bit, associated with a data cache entry, may be utilized to represents a potential invalidation of the data cache entry. The invalidate bit is set in response to the coherency information, to track the potential invalidation of the data cache entry. A valid bit associated with the data cache entry is set in response to the active invalidate bit and a memory synchronization command. The set invalidate bit is cleared after the valid bit has been cleared. | 08-19-2010 |
| 20100306470 | Methods and Apparatus for Issuing Memory Barrier Commands in a Weakly Ordered Storage System - Efficient techniques are described for enforcing order of memory accesses. A memory access request is received from a device which is not configured to generate memory barrier commands. A surrogate barrier is generated in response to the memory access request. A memory access request may be a read request. In the case of a memory write request, the surrogate barrier is generated before the write request is processed. The surrogate barrier may also be generated in response to a memory read request conditional on a preceding write request to the same address as the read request. Coherency is enforced within a hierarchical memory system as if a memory barrier command was received from the device which does not produce memory barrier commands. | 12-02-2010 |