| Patent application number | Description | Published |
| 20080313438 | Unified Cascaded Delayed Execution Pipeline for Fixed and Floating Point Instructions - Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times. | 12-18-2008 |
| 20090006036 | Shared, Low Cost and Featureable Performance Monitor Unit - The present invention related to computer architecture, and more specifically to evaluating performance of processors. A performance monitor may be placed in an L2 cache nest of a processor. The performance monitor may monitor L2 cache accesses and receive performance data from one or more processor cores over a bus coupling the processor cores with the L2 cache nest. In one embodiment the bus may include additional lines for transferring performance data from the processor cores to the performance monitor. | 01-01-2009 |
| 20090006753 | DESIGN STRUCTURE FOR ACCESSING A CACHE WITH AN EFFECTIVE ADDRESS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for accessing a processor cache is provided. The design structure comprises a processor having a processor core, a level one cache, and circuitry. The circuitry is configured to execute an access instruction in the processor's core, wherein the access instruction provides an untranslated effective address of data to be accessed by the access instruction, determine whether the processor core's level one cache includes the data corresponding to the effective address of the access instruction, wherein the effective address of the access instruction is used without address translation to determine whether the processor core's level one cache includes the data corresponding to the effective address, and provide the data for the access instruction from the level one cache if the level one cache includes the data corresponding to the effective address. | 01-01-2009 |
| 20090006768 | Method and Apparatus for Accessing a Split Cache Directory - A method and apparatus for accessing a cache. The method includes receiving a request to access the cache. The request includes an address of requested data to be accessed. The method also includes using a first portion of the address to perform an access to a first directory for the cache and using a second portion of the address to perform an access to a second directory for the cache. Results from the access to the first directory for the cache and results from the access to the second directory for the cache are used to determine whether the cache includes the requested data to be accessed. | 01-01-2009 |
| 20090006803 | L2 Cache/Nest Address Translation - A method and apparatus for accessing cache memory in a processor. The method includes accessing requested data in one or more level one caches of the processor using requested effective addresses of the requested data. If the one or more level one caches of the processor do not contain requested data corresponding to the requested effective addresses, the requested effective addresses are translated to real addresses. A lookaside buffer includes a corresponding entry for each cache line in each of the one or more level one caches of the processor. The corresponding entry indicates a translation from the effective addresses to the real addresses for the cache line. The translated real addresses are used to access a level two cache. | 01-01-2009 |
| 20090006812 | Method and Apparatus for Accessing a Cache With an Effective Address - A method and apparatus for accessing a processor cache. The method includes executing an access instruction in a processor core of the processor. The access instruction provides an untranslated effective address of data to be accessed by the access instruction. The method also includes determining whether a level one cache for the processor core includes the data corresponding to the effective address of the access instruction. The effective address of the access instruction is used without address translation to determine whether the level one cache for the processor core includes the data corresponding to the effective address. If the level one cache includes the data corresponding to the effective address, the data for the access instruction is provided from the level one cache. | 01-01-2009 |
| 20090006818 | Method and Apparatus for Multiple Load Instruction Execution - A method and apparatus for executing instructions. The method includes receiving a first load instruction and a second load instruction. The method also includes issuing the first load instruction and the second load instruction to a cascaded delayed execution pipeline unit having at least a first execution pipeline and a second execution pipeline, wherein the second execution pipeline executes an instruction in a common issue group in a delayed manner relative to another instruction in the common issue group executed in the first execution pipeline. The method also includes accessing a cache by executing the first load instruction and the second load instruction. A delay between execution of the first load instruction and the second load instruction allows the cache to complete the access with the first load instruction before beginning the access with the second load instruction. | 01-01-2009 |
| 20090006819 | Single Hot Forward Interconnect Scheme for Delayed Execution Pipelines - A method and apparatus for forwarding data in a processor. The method includes providing at least one cascaded delayed execution pipeline unit having a first pipeline and a second pipeline, wherein the second pipeline executes instructions in a common issue group in a delayed manner relative to the first pipeline. The method further includes determining if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline. If the first instruction being executed in the first pipeline modifies data in the data register which is accessed by the second instruction being executed in the second pipeline, the modified data is forwarded from the first pipeline to the second pipeline. | 01-01-2009 |
| 20090006823 | DESIGN STRUCTURE FOR SINGLE HOT FORWARD INTERCONNECT SCHEME FOR DELAYED EXECUTION PIPELINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor. The processor includes at least one cascaded delayed execution pipeline unit having a first and second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline, and if the first instruction being executed in the first pipeline modifies data in the data register which is accessed by the second instruction being executed in the second pipeline, forward the modified data from the first pipeline to the second pipeline. | 01-01-2009 |
| 20090006905 | In Situ Register State Error Recovery and Restart Mechanism - Embodiments of the invention relate to methods and systems for error detection and recovery from errors during pipelined execution of data. A cascaded, delayed execution pipeline may be implemented to maintain a precise machine state. In some embodiments, a delay of one or more clock cycles may be inserted prior to a write back stage of each pipeline to facilitate error detection and recovery. Because a precise machine state is maintained error detection and recovery mechanisms may be built directly into register files of the system. If an error is detected execution of the instruction associated with the error and all subsequent instructions may be restarted. | 01-01-2009 |
| 20090015589 | Store Misaligned Vector with Permute - Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized. | 01-15-2009 |
| 20090037694 | Load Misaligned Vector with Permute and Mask Insert - Embodiments of the invention provide logic within the store data path between a processor and a memory array. The logic may be configured to misalign vector data as it is stored to memory. By misaligning vector data as it is stored to memory, memory bandwidth may be maximized while processing bandwidth required to store vector data misaligned is minimized. Furthermore, embodiments of the invention provide logic within the load data path which allows vector data which is stored misaligned to be aligned as it is loaded into a vector register. By aligning misaligned vector data as it is loaded into a vector register, memory bandwidth may be maximized while processing bandwidth required to align misaligned vector data may be minimized. | 02-05-2009 |
| 20090044049 | Multiple Parallel Pipeline Processor Having Self-Repairing Capability - A multiple parallel pipeline digital processing apparatus has the capability to substitute a second pipeline for a first in the event that a failure is detected in the first pipeline. Preferably, a redundant pipeline is shared by multiple primary pipelines. Preferably, the pipelines are located physically adjacent one another in an array. Preferably, a pipeline failure causes data to be shifted one position within the array of pipelines, to by-pass the failing pipeline, so that each pipeline has only two sources of data, a primary and an alternate. Preferably, selection logic controlling the selection between a primary and alternate source of pipeline data is integrated with other pipeline operand selection logic. | 02-12-2009 |
| 20090106525 | DESIGN STRUCTURE FOR SCALAR PRECISION FLOAT IMPLEMENTATION ON THE "W" LANE OF VECTOR UNIT - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for image processing, and more specifically to vector units for supporting image processing is provided. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved. | 04-23-2009 |
| 20090106526 | Scalar Float Register Overlay on Vector Register File for Efficient Register Allocation and Scalar Float and Vector Register Sharing - Embodiments of the invention are generally related to image processing, and more specifically to register files for supporting image processing. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated. | 04-23-2009 |
| 20090106527 | Scalar Precision Float Implementation on the "W" Lane of Vector Unit - Embodiments of the invention are generally related to image processing, and more specifically to vector units for supporting image processing. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved. | 04-23-2009 |