Patent application number | Description | Published |
20090049286 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING IMPROVED BRANCH TARGET ADDRESS CACHE - A processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a level one branch target address cache (BTAC) and a level two BTAC each having a respective plurality of entries each associating at least a tag with a predicted branch target address. The branch logic accesses the level one and level two BTACs in parallel with a tag portion of a first instruction fetch address to obtain a first predicted branch target address from the level one BTAC for use as a second instruction fetch address in a first processor clock cycle and a second predicted branch target address from the level two BTAC for use as a third instruction fetch address in a later second processor clock cycle. | 02-19-2009 |
20090077354 | Techniques for Predicated Execution in an Out-of-Order Processor - A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved. | 03-19-2009 |
20090113164 | Method, System and Program Product for Address Translation Through an Intermediate Address Space - In a data processing system capable of concurrently executing multiple hardware threads of execution, an intermediate address translation unit in a processing unit translates an effective address for a memory access into an intermediate address. A cache memory is accessed utilizing the intermediate address. In response to a miss in cache memory, the intermediate address is translated into a real address by a real address translation unit that performs address translation for multiple hardware threads of execution. The system memory is accessed with the real address. | 04-30-2009 |
20090132767 | COMPILER ASSISTED VICTIM CACHE BYPASSING - A method for compiler assisted victim cache bypassing including: identifying a cache line as a candidate for victim cache bypassing; conveying victim-cache-bypass information to hardware; and checking a state of the cache line to determine a modified state of the cache line, wherein the cache line is identified for cache bypassing if the cache line has no reuse within a loop or loop nest and there is no immediate loop reuse, or there is a substantial across-loop reuse distance, so that it will be replaced from both the main and victim caches before being reused. | 05-21-2009 |
20090198903 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT VARY AN AMOUNT OF DATA RETRIEVED FROM MEMORY BASED UPON A HINT - In at least one embodiment, a processor detects during execution of program code whether a load instruction within the program code is associated with a hint. In response to detecting that the load instruction is not associated with a hint, the processor retrieves a full cache line of data from the memory hierarchy into the processor in response to the load instruction. In response to detecting that the load instruction is associated with a hint, a processor retrieves a partial cache line of data into the processor from the memory hierarchy in response to the load instruction. | 08-06-2009 |
20090198904 | Techniques for Data Prefetching Using Indirect Addressing with Offset - A technique for performing data prefetching using indirect addressing includes determining a first memory address of a pointer associated with a data prefetch instruction. Content that is included in a first data block (e.g., a first cache line) of a memory at the first memory address is then fetched. An offset is then added to the content of the memory at the first memory address to provide a first offset memory address. A second memory address is then determined based on the first offset memory address. A second data block (e.g., a second cache line) that includes data at the second memory address is then fetched (e.g., from the memory or another memory). A data prefetch instruction may be indicated by a unique operational code (opcode), a unique extended opcode, or a field (including one or more bits) in an instruction. (A C sketch follows the table.) | 08-06-2009 |
20090198905 | Techniques for Prediction-Based Indirect Data Prefetching - A technique for data prefetching using indirect addressing includes monitoring data pointer values, associated with an array, in an access stream to a memory. The technique determines whether a pattern exists in the data pointer values. A prefetch table is then populated with respective entries that correspond to respective array address/data pointer pairs based on a predicted pattern in the data pointer values. Respective data blocks (e.g., respective cache lines) are then prefetched (e.g., from the memory or another memory) based on the respective entries in the prefetch table. (A C sketch follows the table.) | 08-06-2009 |
20090198906 | Techniques for Multi-Level Indirect Data Prefetching - A technique for performing data prefetching using multi-level indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. Content that is included in a first data block (e.g., a first cache line of a memory) at the first memory address is then fetched. A second memory address is then determined based on the content at the first memory address. Content that is included in a second data block (e.g., a second cache line) at the second memory address is then fetched (e.g., from the memory or another memory). A third memory address is then determined based on the content at the second memory address. Finally, a third data block (e.g., a third cache line) that includes another pointer or data at the third memory address is fetched (e.g., from the memory or the another memory). | 08-06-2009 |
20090198907 | Dynamic Adjustment of Prefetch Stream Priority - A method, processor, and data processing system for dynamically adjusting a prefetch stream priority based on the consumption rate of the data by the processor. The method includes a prefetch engine issuing a prefetch request of a first prefetch stream to fetch data from the memory subsystem. The first prefetch stream has a first assigned priority that determines a relative order for scheduling prefetch requests of the first prefetch stream relative to other prefetch requests of other prefetch streams. Based on receipt of a processor demand for the data before the data returns to the cache, or on return of the data a long time before the processor demand is received, logic of the prefetch engine dynamically changes the first assigned priority to a second, higher or lower priority, which is subsequently utilized to schedule and issue a next prefetch request of the first prefetch stream. (A C sketch follows the table.) | 08-06-2009 |
20090198909 | Jump Starting Prefetch Streams Across Page Boundaries - A method, processor, and data processing system for enabling utilization of a single prefetch stream to access data across a memory page boundary. A prefetch engine (PE) includes an active streams table in which information for one or more scheduled prefetch streams is stored. The prefetch engine also includes a victim table for storing a previously active stream whose next prefetch crosses a memory page boundary. The scheduling logic issues a prefetch request with a real address to fetch data from the lower level memory. Then, responsive to detecting that the real address of the stream's next sequential prefetch crosses the memory page boundary, the prefetch engine determines when the prefetch stream can continue across the page boundary of the memory page (via an effective address comparison). The PE automatically reinserts the prefetch stream into the active streams table to jump start prefetching across the page boundary. | 08-06-2009 |
20090198910 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT A TOUCH OF A PARTIAL CACHE LINE OF DATA - According to method of data processing in a multiprocessor data processing system, in response to a processor touch request targeting a target granule of a cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial touch request that requests a copy of only the target granule for subsequent query access. In response to a combined response to the partial touch request indicating success, the combined response representing a system-wide response to the partial touch request, the processing unit receives the target granule of the target cache line and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the cache line. | 08-06-2009 |
20090198921 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING REDUCED ALIASING IN BRANCH LOGIC - In at least one embodiment, an indexed table circuit includes a plurality of banks for storing data to be accessed and a split index array. The indexed table circuit is organized in a plurality of entries each corresponding to a respective one of a plurality of different entry indices, where each entry includes a storage location in the plurality of banks and the split index array. The indexed table circuit further includes selection logic that, responsive to read access of an entry among the plurality of entries utilizing an entry index of a bit string, utilizes a split index read from the split index array to select a set of one or more bits of a tag of the bit string, utilizes the selected set of one or more bits to select data read from one of the plurality of banks, and outputs the selected data. | 08-06-2009 |
20090198948 | Techniques for Data Prefetching Using Indirect Addressing - A technique for performing indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. Content of a memory at the first memory address is then fetched. A second memory address is determined from the content of the memory at the first memory address. Finally, a data block (e.g., a cache line) including data at the second memory address is fetched (e.g., from the memory or another memory). | 08-06-2009 |
20090198950 | Techniques for Indirect Data Prefetching - A processor includes a first address translation engine, a second address translation engine, and a prefetch engine. The first address translation engine is configured to determine a first memory address of a pointer associated with a data prefetch instruction. The prefetch engine is coupled to the first translation engine and is configured to fetch content, included in a first data block (e.g., a first cache line) of a memory, at the first memory address. The second address translation engine is coupled to the prefetch engine and is configured to determine a second memory address based on the content of the memory at the first memory address. The prefetch engine is also configured to fetch (e.g., from the memory or another memory) a second data block (e.g., a second cache line) that includes data at the second memory address. | 08-06-2009 |
20090198962 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE INCLUDING ADDRESS TYPE TAG BIT - In at least one embodiment, a processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes branch target address prediction circuitry concurrently holding a first entry providing storage for a first branch target address prediction associating a first instruction fetch address with a first branch target address to be used as an instruction fetch address and a second entry providing storage for a second branch target address prediction associating the first instruction fetch address with a different second branch target address. The first entry indicates a first instruction address type for the first instruction fetch address, and the second entry indicates a second instruction address type for the first instruction fetch address. | 08-06-2009 |
20090198965 | METHOD AND SYSTEM FOR SOURCING DIFFERING AMOUNTS OF PREFETCH DATA IN RESPONSE TO DATA PREFETCH REQUESTS - According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines by reference to a stream of demand requests how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core. | 08-06-2009 |
20090198981 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE STORING DIRECT PREDICTIONS - In at least one embodiment, a processor includes at least one execution unit and instruction sequencing logic that fetches instructions for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a branch target address cache (BTAC) having at least one direct entry providing storage for a direct branch target address prediction associating a first instruction fetch address with a branch target address to be used as a second instruction fetch address immediately after the first instruction fetch address and at least one indirect entry providing storage for an indirect branch target address prediction associating a third instruction fetch address with a branch target address to be used as a fourth instruction fetch address subsequent to both the third instruction fetch address and an intervening fifth instruction fetch address. | 08-06-2009 |
20090198982 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE SELECTIVELY APPLYING A DELAYED HIT - In at least one embodiment, a processor includes at least one execution unit that executes instructions and instruction sequencing logic, coupled to the at least one execution unit, that fetches instructions from a memory system for execution by the at least one execution unit. The instruction sequencing logic includes branch target address prediction circuitry that stores a branch target address prediction associating a first instruction fetch address with a branch target address to be used as a second instruction fetch address. The branch target address prediction circuitry includes delay logic that, in response to at least a tag portion of a third instruction fetch address matching the first instruction fetch address, delays access to the memory system utilizing the second instruction fetch address if no branch target address prediction was made in an immediately previous cycle of operation. | 08-06-2009 |
20090198985 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE WITH HASHED INDICES - In at least one embodiment, a processor includes at least one execution unit that executes instructions and instruction sequencing logic, coupled to the at least one execution unit, that fetches instructions from a memory system for execution by the at least one execution unit. The instruction sequencing logic includes a branch target address cache (BTAC) including a plurality of entries for storing branch target address predictions. The BTAC includes index logic that selects an entry to access utilizing a BTAC index based upon at least a set of higher order bits of an instruction address and a set of lower order bits of the instruction address. | 08-06-2009 |
20090199190 | System and Method for Priority-Based Prefetch Requests Scheduling and Throttling - A method, processor, and data processing system for implementing a framework for priority-based scheduling and throttling of prefetching operations. A prefetch engine (PE) assigns a priority to a first prefetch stream, indicating a relative priority for scheduling prefetch operations of the first prefetch stream. The PE monitors activity within the data processing system and dynamically updates the priority of the first prefetch stream based on the activity (or lack thereof). Low priority streams may be discarded. The PE also schedules prefetching in a priority-based scheduling sequence that corresponds to the priority currently assigned to the scheduled active streams. When there are no prefetches within a prefetch queue, the PE triggers the active streams to provide prefetches for issuing. The PE determines when to throttle prefetching, based on the current usage level of resources relevant to completing the prefetch. | 08-06-2009 |
20090259696 | Node Synchronization for Multi-Processor Computer Systems - A method and apparatus for controlling access by a set of accessing nodes to memory of a home node (in a multi-node computer system) determines that each node in the set of nodes has accessed the memory, and forwards a completion message to each node in the set of nodes after it is determined that each node has accessed the memory. The completion message has data indicating that each node in the set of nodes has accessed the memory of the home node. | 10-15-2009 |
20090287908 | PREDICATION SUPPORT IN AN OUT-OF-ORDER PROCESSOR BY SELECTIVELY EXECUTING AMBIGUOUSLY RENAMED WRITE OPERATIONS - A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same rename resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition. | 11-19-2009 |
20090288063 | PREDICATION SUPPORTING CODE GENERATION BY INDICATING PATH ASSOCIATIONS OF SYMMETRICALLY PLACED WRITE INSTRUCTIONS - A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same rename resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition. | 11-19-2009 |
20100020810 | Link Services in a Communication Network - In a communication network, links in a transmission path between source and destination terminals are sequentially switched to an operational state in response to a command or a group of commands for transmitting data prior to completion of assembling the data. Each node in the transmission path independently monitors transmission of data. After transmitting the data, the links are selectively switched to pre-determined power saving states. | 01-28-2010 |
20100030973 | CACHE DIRECTED SEQUENTIAL PREFETCH - A technique for performing stream detection and prefetching within a cache memory simplifies stream detection and prefetching. A bit in a cache directory or cache entry indicates that a cache line has not been accessed since being prefetched and another bit indicates the direction of a stream associated with the cache line. A next cache line is prefetched when a previously prefetched cache line is accessed, so that the cache always attempts to prefetch one cache line ahead of accesses, in the direction of a detected stream. Stream detection is performed in response to load misses tracked in the load miss queue (LMQ). The LMQ stores an offset indicating a first miss at the offset within a cache line. A next miss to the line sets a direction bit based on the difference between the first and second offsets and causes prefetch of the next line for the stream. (A C sketch follows the table.) | 02-04-2010 |
20100031006 | THREAD COMPLETION RATE CONTROLLED SCHEDULING - A method, processor and processing system provide management of per-thread pipeline resource allocation in a simultaneous multi-threaded (SMT) processor by counting indications of instruction completion for each of the threads. The indication may be the commit phase of the pipeline, which indicates results of the pipeline instruction execution are ready for write-back. The completion counts are used in a relative or absolute form to control the pipeline resource allocation. The decode or fetch rates of instructions for the threads can be controlled from the relative or absolute completion counts, providing control of scheduling instructions among the threads for execution by execution pipeline(s). Alternatively, or in combination, the thread priority registers in any thread priority management scheme can be controlled by comparison and/or scaling of the completion counts. | 02-04-2010 |
20100115204 | NON-UNIFORM CACHE ARCHITECTURE (NUCA) - In one embodiment, a cache memory includes a cache array including a plurality of entries for caching cache lines of data, where the plurality of entries are distributed between a first region implemented in a first memory technology and a second region implemented in a second memory technology. The cache memory further includes a cache directory of the contents of the cache array and a cache controller that controls operation of the cache memory. | 05-06-2010 |
20100293339 | DATA PROCESSING SYSTEM, PROCESSOR AND METHOD FOR VARYING A DATA PREFETCH SIZE BASED UPON DATA USAGE - A method of data processing in a processor includes maintaining a usage history indicating demand usage of prefetched data retrieved into cache memory. An amount of data to prefetch by a data prefetch request is selected based upon the usage history. The data prefetch request is transmitted to a memory hierarchy to prefetch the selected amount of data into cache memory. | 11-18-2010 |
20110004875 | Method and System for Performance Isolation in Virtualized Environments - A method, a system, an apparatus, and a computer program product for allocating resources of one or more shared devices to one or more partitions of a virtualization environment within a data processing system. At least one user defined resource assignment is received for one or more devices associated with the data processing system. One or more registers, associated with the one or more partitions, are dynamically set to execute the at least one resource assignment, whereby the at least one resource assignment enables a user defined quantitative measure (number and/or percentage) of devices to operate when one or more transactions are executed via the partition. The system enables the one or more devices to execute one or more transactions at a bandwidth/capacity that is less than or equal to the user defined resource assignment and minimizes performance interference among partitions. | 01-06-2011 |
20110022773 | Fine Grained Cache Allocation - A mechanism is provided in a virtual machine monitor for fine grained cache allocation in a shared cache. The mechanism partitions a cache tag into a most significant bit (MSB) portion and a least significant bit (LSB) portion. The MSB portion of the tags is shared among the cache lines in a set. The LSB portion of the tags is private, one per cache line. The mechanism allows software to set the MSB portion of tags in a cache to allocate sets of cache lines. The cache controller determines whether a cache line is locked based on the MSB portion of the tag. (A C sketch follows the table.) | 01-27-2011 |
20110055827 | Cache Partitioning in Virtualized Environments - A mechanism is provided in a virtual machine monitor for providing cache partitioning in virtualized environments. The mechanism assigns a virtual identification (ID) to each virtual machine in the virtualized environment. The processing core stores the virtual ID of the virtual machine in a special register. The mechanism also creates an entry for the virtual machine in a partition table. The mechanism may partition a shared cache using a vertical (way) partition and/or a horizontal partition. The entry in the partition table includes a vertical partition control and a horizontal partition control. For each cache access, the virtual machine passes the virtual ID along with the address to the shared cache. If the cache access results in a miss, the shared cache uses the partition table to select a victim cache line for replacement. | 03-03-2011 |
20110072214 | Read and Write Aware Cache - A mechanism is provided in a cache for providing a read and write aware cache. The mechanism partitions a large cache into a read-often region and a write-often region. The mechanism considers read/write frequency in a non-uniform cache architecture replacement policy. A frequently written cache line is placed in one of the farther banks. A frequently read cache line is placed in one of the closer banks. The size ratio between read-often and write-often regions may be static or dynamic. The boundary between the read-often region and the write-often region may be distinct or fuzzy. | 03-24-2011 |
20110119322 | On-Chip Networks for Flexible Three-Dimensional Chip Integration - Mechanisms for providing an interconnect layer of a three-dimensional integrated circuit device having multiple independent and cooperative on-chip networks are provided. With regard to an apparatus implementing the interconnect layer, such an apparatus comprises a first integrated circuit layer comprising one or more first functional units and an interconnect layer coupled to the first integrated circuit layer. The first integrated circuit layer and interconnect layer are integrated with one another into a single three-dimensional integrated circuit. The interconnect layer comprises a plurality of independent on-chip communication networks that are independently operable and independently able to be powered on and off, each on-chip communication network comprising a plurality of point-to-point communication links coupled together by a plurality of connection points. The one or more first functional units are coupled to a first independent on-chip communication network of the interconnect layer. | 05-19-2011 |
20110145509 | CACHE DIRECTED SEQUENTIAL PREFETCH - A technique for performing stream detection and prefetching within a cache memory simplifies stream detection and prefetching. A bit in a cache directory or cache entry indicates that a cache line has not been accessed since being prefetched and another bit indicates the direction of a stream associated with the cache line. A next cache line is prefetched when a previously prefetched cache line is accessed, so that the cache always attempts to prefetch one cache line ahead of accesses, in the direction of a detected stream. Stream detection is performed in response to load misses tracked in the load miss queue (LMQ). The LMQ stores an offset indicating a first miss at the offset within a cache line. A next miss to the line sets a direction bit based on the difference between the first and second offsets and causes prefetch of the next line for the stream. | 06-16-2011 |
20110238946 | Data Reorganization through Hardware-Supported Intermediate Addresses - A virtual address scheme for improving performance and efficiency of memory accesses of sparsely-stored data items in a cached memory system is disclosed. In a preferred embodiment of the present invention, a special address translation unit is used to translate sets of non-contiguous addresses in real memory into contiguous blocks of addresses in an “intermediate address space.” This intermediate address space is a fictitious or “virtual” address space, but is distinguishable from the virtual address space visible to application programs, and in user-level memory operations, effective addresses seen/manipulated by application programs are translated into intermediate addresses by an additional address translation unit for memory caching purposes. This scheme allows non-contiguous data items in memory to be assembled into contiguous cache lines for more efficient caching/access (due to the perceived spatial proximity of the data from the perspective of the processor). | 09-29-2011 |
20110283067 | Target Memory Hierarchy Specification in a Multi-Core Computer Processing System - Target memory hierarchy specification in a multi-core computer processing system is provided including a system for implementing prefetch instructions. The system includes a first core processor, a dedicated local cache corresponding to the first core processor, and a second core processor. The second core processor includes instructions for executing a prefetch instruction that specifies a memory location and the dedicated local cache corresponding to the first core processor. Executing the prefetch instruction includes retrieving data from the memory location and storing the retrieved data on the dedicated local cache corresponding to the first core processor. | 11-17-2011 |
20110292594 | Scalable Space-Optimized and Energy-Efficient Computing System - A scalable space-optimized and energy-efficient computing system is provided. The computing system comprises a plurality of modular compartments in at least one level of a frame configured in a hexahedron configuration. The computing system also comprises an air inlet, an air mixing plenum, and at least one fan. In the computing system the plurality of modular compartments are affixed above the air inlet, the air mixing plenum is affixed above the plurality of modular compartments, and the at least one fan is affixed above the air mixing plenum. When at least one module is inserted into one of the plurality of modular compartments, the module couples to a backplane within the frame. | 12-01-2011 |
20110292597 | Stackable Module for Energy-Efficient Computing Systems - A modular processing module is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors coupled to the circuit board, and a plurality of processing nodes coupled to the circuit board. Each processing module side in the set of processing module sides couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system. | 12-01-2011 |
20110296097 | Mechanisms for Reducing DRAM Power Consumption - Mechanisms are provided for inhibiting precharging of memory cells of a dynamic random access memory (DRAM) structure. The mechanisms receive a command for accessing memory cells of the DRAM structure. The mechanisms further determine, based on the command, if precharging the memory cells following accessing the memory cells is to be inhibited. Moreover, the mechanisms send, in response to the determination indicating that precharging the memory cells is to be inhibited, a command to blocking logic of the DRAM structure to block precharging of the memory cells following accessing the memory cells. | 12-01-2011 |
20110296107 | Latency-Tolerant 3D On-Chip Memory Organization - A mechanism is provided within a 3D stacked memory organization to spread or stripe cache lines across multiple layers. In an example organization, a 128B cache line takes eight cycles on a 16B-wide bus. Each layer may provide 32B. The first layer uses the first two of the eight transfer cycles to send the first 32B. The next layer sends the next 32B using the next two cycles of the eight transfer cycles, and so forth. The mechanism provides a uniform memory access. (A C sketch follows the table.) | 12-01-2011 |
20110296112 | Reducing Energy Consumption of Set Associative Caches by Reducing Checked Ways of the Set Association - Mechanisms for accessing a set associative cache of a data processing system are provided. A set of cache lines, in the set associative cache, associated with an address of a request is identified. Based on a determined mode of operation for the set, the following may be performed: determining if a cache hit occurs in a preferred cache line without accessing other cache lines in the set of cache lines; retrieving data from the preferred cache line without accessing the other cache lines in the set of cache lines, if it is determined that there is a cache hit in the preferred cache line; and accessing each of the other cache lines in the set of cache lines to determine if there is a cache hit in any of these other cache lines only in response to there being a cache miss in the preferred cache line(s). (A C sketch follows the table.) | 12-01-2011 |
20110296115 | Assigning Memory to On-Chip Coherence Domains - A mechanism is provided for assigning memory to on-chip cache coherence domains. The mechanism assigns caches within a processing unit to coherence domains. The mechanism then assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller. If a cache line is not found within the coherence domain, the memory controller accesses the memory to retrieve the cache line. | 12-01-2011 |
20110296138 | FAST REMOTE COMMUNICATION AND COMPUTATION BETWEEN PROCESSORS - A method, system, and computer usable program product for fast remote communication and computation between processors are provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute. | 12-01-2011 |
20110296149 | Instruction Set Architecture Extensions for Performing Power Versus Performance Tradeoffs - Mechanisms are provided for processing an instruction in a processor of a data processing system. The mechanisms operate to receive, in a processor of the data processing system, an instruction, the instruction including power/performance tradeoff information associated with the instruction. The mechanisms further operate to determine power/performance tradeoff priorities or criteria, specifying whether power conservation or performance is prioritized with regard to execution of the instruction, based on the power/performance tradeoff information. Moreover, the mechanisms process the instruction in accordance with the power/performance tradeoff priorities or criteria identified based on the power/performance tradeoff information of the instruction. | 12-01-2011 |
20110296428 | REGISTER ALLOCATION TO THREADS - A method, system, and computer usable program product for improved register allocation in a simultaneous multithreaded processor. A determination is made that a thread of an application in the data processing environment needs more physical registers than are available to allocate to the thread. The thread is configured to utilize a logical register that is mapped to a memory register. The thread is executed utilizing the physical registers and the memory registers. | 12-01-2011 |
20110296434 | Techniques for Dynamically Sharing a Fabric to Facilitate Off-Chip Communication for Multiple On-Chip Units - A technique for sharing a fabric to facilitate off-chip communication for on-chip units includes dynamically assigning a first unit that implements a first communication protocol to a first portion of the fabric when private fabrics are indicated for the on-chip units. The technique also includes dynamically assigning a second unit that implements a second communication protocol to a second portion of the fabric when the private fabrics are indicated for the on-chip units. In this case, the first and second units are integrated in a same chip and the first and second protocols are different. The technique further includes dynamically assigning, based on off-chip traffic requirements of the first and second units, the first unit or the second unit to the first and second portions of the fabric when the private fabrics are not indicated for the on-chip units. | 12-01-2011 |
20120191946 | FAST REMOTE COMMUNICATION AND COMPUTATION BETWEEN PROCESSORS - A method for fast remote communication and computation between processors is provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute. | 07-26-2012 |
20120198172 | Cache Partitioning in Virtualized Environments - A mechanism is provided in a virtual machine monitor for providing cache partitioning in virtualized environments. The mechanism assigns a virtual identification (ID) to each virtual machine in the virtualized environment. The processing core stores the virtual ID of the virtual machine in a special register. The mechanism also creates an entry for the virtual machine in a partition table. The mechanism may partition a shared cache using a vertical (way) partition and/or a horizontal partition. The entry in the partition table includes a vertical partition control and a horizontal partition control. For each cache access, the virtual machine passes the virtual ID along with the address to the shared cache. If the cache access results in a miss, the shared cache uses the partition table to select a victim cache line for replacement. | 08-02-2012 |
20120265944 | Assigning Memory to On-Chip Coherence Domains - A mechanism for assigning memory to on-chip cache coherence domains assigns caches within a processing unit to coherence domains. The mechanism assigns chunks of memory to the coherence domains. The mechanism monitors applications running on cores within the processing unit to identify needs of the applications. The mechanism may then reassign memory chunks to the cache coherence domains based on the needs of the applications running in the coherence domains. When a memory controller receives the cache miss, the memory controller may look up the address in a lookup table that maps memory chunks to cache coherence domains. Snoop requests are sent to caches within the coherence domain. If a cache line is found in a cache within the coherence domain, the cache line is returned to the originating cache by the cache containing the cache line either directly or through the memory controller. | 10-18-2012 |
20120311265 | Read and Write Aware Cache - A mechanism is provided in a cache for providing a read and write aware cache. The mechanism partitions a large cache into a read-often region and a write-often region. The mechanism considers read/write frequency in a non-uniform cache architecture replacement policy. A frequently written cache line is placed in one of the farther banks. A frequently read cache line is placed in one of the closer banks. The size ratio between read-often and write-often regions may be static or dynamic. The boundary between the read-often region and the write-often region may be distinct or fuzzy. | 12-06-2012 |
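A few of the techniques cataloged above are concrete enough to illustrate in code. For the indirect prefetching with offset of application 20090198904, the core idea is: fetch the pointer at a first address, add an offset, and prefetch the block holding the resulting second address. Below is a minimal software sketch assuming GCC/Clang's `__builtin_prefetch`; all function and parameter names are illustrative, and the patent itself describes a hardware mechanism triggered by a prefetch instruction, not library code.

```c
#include <stddef.h>
#include <stdint.h>

/* Software analogue of indirect prefetching with an offset: load the
 * pointer's value (first data block), add a fixed offset, and prefetch
 * the data block at the resulting second address. */
static inline void indirect_prefetch_offset(void **ptr_addr, ptrdiff_t offset)
{
    /* First access: fetch the pointer itself. */
    uintptr_t target = (uintptr_t)*ptr_addr;

    /* Add the offset to form the second memory address. */
    uintptr_t effective = target + (uintptr_t)offset;

    /* Second access: prefetch the block holding that address. */
    __builtin_prefetch((const void *)effective, /*rw=*/0, /*locality=*/3);
}

/* Typical use: walking an array of pointers to structs, prefetching a
 * field that lives `field_offset` bytes into each pointed-to object. */
void walk(void **ptrs, size_t n, ptrdiff_t field_offset)
{
    for (size_t i = 0; i + 1 < n; i++) {
        indirect_prefetch_offset(&ptrs[i + 1], field_offset);
        /* ... process the object at ptrs[i] ... */
    }
}
```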
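For the prediction-based indirect prefetching of application 20090198905, a prefetch table pairs array addresses with observed data pointer values. The sketch below models such a table with a single-stride predictor; the entry layout, hash, and confidence rule are assumptions, not the patent's design.

```c
#include <stdint.h>
#include <stdbool.h>

#define PT_ENTRIES 16

/* One prefetch-table entry: an array address paired with the data
 * pointer last observed there, plus the candidate stride between
 * successive pointer values. All field names are illustrative. */
struct pt_entry {
    uintptr_t array_addr;
    uintptr_t last_ptr;
    intptr_t  stride;
    bool      valid;
};

static struct pt_entry pt[PT_ENTRIES];

/* Observe one array-address/data-pointer pair from the access stream.
 * If the new pointer continues the previously seen stride, return the
 * predicted next pointer value so the caller can prefetch its block;
 * otherwise just record the observation (0 means no prediction). */
uintptr_t pt_observe(uintptr_t array_addr, uintptr_t data_ptr)
{
    struct pt_entry *e = &pt[(array_addr >> 3) % PT_ENTRIES];

    if (e->valid && e->array_addr == array_addr) {
        intptr_t delta = (intptr_t)(data_ptr - e->last_ptr);
        if (delta == e->stride && delta != 0) {
            e->last_ptr = data_ptr;
            return data_ptr + delta;   /* pattern confirmed: prefetch here */
        }
        e->stride = delta;             /* learn a new candidate stride     */
    }
    e->array_addr = array_addr;
    e->last_ptr   = data_ptr;
    e->valid      = true;
    return 0;
}
```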
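Application 20090198907's dynamic priority adjustment reduces to two rules: raise a stream's priority when a demand arrives before its prefetched data returns, and lower it when the data sits unused for a long time. A sketch of that policy, with an assumed three-level priority scale and idle threshold:

```c
#include <stdbool.h>

enum prio { PRIO_LOW, PRIO_NORMAL, PRIO_HIGH };

/* Adjust a stream's priority from how its last prefetch was consumed:
 * a demand arriving before the data returns means we prefetched too
 * late (raise priority); data idle far longer than the threshold means
 * we prefetched too early (lower it). */
enum prio adjust_priority(enum prio cur,
                          bool demand_before_return,
                          long idle_cycles, long idle_threshold)
{
    if (demand_before_return && cur < PRIO_HIGH)
        return cur + 1;                 /* consumption outpaces prefetch */
    if (idle_cycles > idle_threshold && cur > PRIO_LOW)
        return cur - 1;                 /* data returned far too early   */
    return cur;
}
```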
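For the cache directed sequential prefetch of applications 20100030973 and 20110145509, the load miss queue records the offset of the first miss within a line, and a second miss to the same line yields a direction bit and a prefetch of the adjacent line. A minimal model assuming a 128B line; structure and field names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_BYTES 128
#define LMQ_SLOTS  8

/* Load-miss-queue slot: the missing line's address and the byte offset
 * of the first miss within that line. */
struct lmq_slot {
    uintptr_t line_addr;
    unsigned  first_off;
    bool      valid;
};

static struct lmq_slot lmq[LMQ_SLOTS];

/* On a load miss, either allocate an LMQ slot (first miss to the line)
 * or, on a second miss to the same line, derive the stream direction
 * from the two offsets and return the next line to prefetch (0 means
 * no prefetch decision yet). */
uintptr_t lmq_miss(uintptr_t addr)
{
    uintptr_t line = addr & ~(uintptr_t)(LINE_BYTES - 1);
    unsigned  off  = (unsigned)(addr & (LINE_BYTES - 1));

    for (int i = 0; i < LMQ_SLOTS; i++) {
        if (lmq[i].valid && lmq[i].line_addr == line) {
            bool ascending = off > lmq[i].first_off;   /* direction bit */
            lmq[i].valid = false;
            return ascending ? line + LINE_BYTES       /* prefetch next */
                             : line - LINE_BYTES;      /* ...or previous */
        }
    }
    for (int i = 0; i < LMQ_SLOTS; i++) {              /* allocate slot */
        if (!lmq[i].valid) {
            lmq[i] = (struct lmq_slot){ line, off, true };
            break;
        }
    }
    return 0;
}
```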
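The fine grained cache allocation of application 20110022773 splits each tag into an MSB portion shared per set and per-line LSB portions, with locking decided from the MSB portion. The sketch assumes a 32-bit tag, a 20/12 split, and an 8-way set; none of those numbers appear in the abstract.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed split of a 32-bit tag: high 20 bits shared per set and
 * software-settable, low 12 bits private per line. */
#define LSB_BITS 12

struct cache_set {
    uint32_t shared_msb;     /* one MSB tag portion for the whole set  */
    bool     msb_locked;     /* set by software => lines here locked   */
    uint32_t private_lsb[8]; /* per-line LSB tag portions (8-way set)  */
};

/* A line hits only if both tag portions match. */
bool line_hits(const struct cache_set *s, int way, uint32_t tag)
{
    uint32_t msb = tag >> LSB_BITS;
    uint32_t lsb = tag & ((1u << LSB_BITS) - 1);
    return s->shared_msb == msb && s->private_lsb[way] == lsb;
}

/* The controller decides lock status from the MSB portion alone. */
bool line_locked(const struct cache_set *s)
{
    return s->msb_locked;
}
```

Sharing the MSB bits across a set is what keeps the scheme cheap: one software-visible register per set controls allocation for all of its lines.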
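The layer striping of application 20110296107 comes with worked numbers: a 128B line on a 16B bus needs eight beats, and each 32B layer slice covers two of them. A tiny program reproducing that beat-to-layer arithmetic (the four-layer count is implied by 128B/32B, not stated in the abstract):

```c
#include <stdio.h>

enum { LINE = 128, BUS = 16, SLICE = 32 };

int main(void)
{
    int beats = LINE / BUS;              /* 8 transfer cycles        */
    int beats_per_layer = SLICE / BUS;   /* 2 beats from each layer  */

    /* Each layer supplies two consecutive beats, so latency is the
     * same no matter which layer holds which slice. */
    for (int b = 0; b < beats; b++)
        printf("beat %d <- layer %d\n", b, b / beats_per_layer);
    return 0;
}
```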
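Finally, the reduced-way lookup of application 20110296112 probes only a preferred cache line first and touches the remaining ways only after a miss there. A sketch assuming an 8-way set and an externally supplied way prediction; how the preferred way is chosen is not modeled here:

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS 8

struct set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
};

/* Energy-saving lookup: in preferred-way mode, probe only the
 * predicted way; fall back to the remaining ways only on a miss
 * there. Returns the hit way, or -1 on a full miss. */
int lookup(const struct set *s, uint32_t tag, int preferred, bool pref_mode)
{
    if (pref_mode) {
        if (s->valid[preferred] && s->tag[preferred] == tag)
            return preferred;                   /* hit, others untouched */
        for (int w = 0; w < WAYS; w++)          /* miss: now check rest  */
            if (w != preferred && s->valid[w] && s->tag[w] == tag)
                return w;
        return -1;
    }
    for (int w = 0; w < WAYS; w++)              /* normal parallel probe */
        if (s->valid[w] && s->tag[w] == tag)
            return w;
    return -1;
}
```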