Patent application number | Description | Published |
20080215819 | METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR A CACHE COHERENCY PROTOCOL STATE THAT PREDICTS LOCATIONS OF SHARED MEMORY BLOCKS - A method, apparatus, and computer program product are disclosed for reducing the number of unnecessarily broadcast local requests to reduce the latency to access data from remote nodes in an SMP computer system. A shared invalid cache coherency protocol state is defined that predicts whether a memory read request to read data in a shared cache line can be satisfied within a local node. When a cache line is in the shared invalid state, a valid copy of the data is predicted to be located in the local node. When a cache line is in the invalid state and not in the shared invalid state, a valid copy of the data is predicted to be located in one of the remote nodes. | 09-04-2008 |
20080215821 | DATA PROCESSING SYSTEM AND METHOD FOR EFFICIENT COMMUNICATION UTILIZING AN IN COHERENCY STATE - A cache coherent data processing system includes at least first and second coherency domains each including at least one processing unit. The first coherency domain includes a first cache memory, and the second coherency domain includes a coherent second cache memory. The first cache memory within the first coherency domain of the data processing system holds a memory block in a storage location associated with an address tag and a coherency state field. The coherency state field is set to a state that indicates that the address tag is valid, that the storage location does not contain valid data, and that the memory block is likely cached only within the first coherency domain. | 09-04-2008 |
20090319726 | Efficient Region Coherence Protocol for Clustered Shared-Memory Multiprocessor Systems - A system and method of a region coherence protocol for use in Region Coherence Arrays (RCAs) deployed in clustered shared-memory multiprocessor systems which optimize cache-to-cache transfers by allowing broadcast memory requests to be provided to only a portion of a clustered shared-memory multiprocessor system. Interconnect hierarchy levels can be devised for logical groups of processors, processors on the same chip, processors on chips aggregated into a multichip module, multichip modules on the same printed circuit board, and for processors on other printed circuit boards or in other cabinets. The present region coherence protocol includes, for example, one bit per level of interconnect hierarchy, such that the one bit has a value of “1” to indicate that there may be processors caching copies of lines from the region at that level of the interconnect hierarchy, and the one bit has a value of “0” to indicate that there are no cached copies of any lines from the region at that respective level of the interconnect hierarchy. | 12-24-2009 |
20100223622 | Non-Uniform Memory Access (NUMA) Enhancements for Shared Logical Partitions - In a NUMA-topology computer system that includes multiple nodes and multiple logical partitions, some of which may be dedicated and others of which are shared, NUMA optimizations are enabled in shared logical partitions. This is done by specifying a home node parameter in each virtual processor assigned to a logical partition. When a task is created by an operating system in a shared logical partition, a home node is assigned to the task, and the operating system attempts to assign the task to a virtual processor that has a home node that matches the home node for the task. The partition manager then attempts to assign virtual processors to their corresponding home nodes. If this can be done, NUMA optimizations may be performed without the risk of reducing the performance of the shared logical partition. | 09-02-2010 |
20100281220 | Predictive ownership control of shared memory computing system data - A method, circuit arrangement, and design structure utilize a lock prediction data structure to control ownership of a cache line in a shared memory computing system. In a first node among the plurality of nodes, lock prediction data in a hardware-based lock prediction data structure for a cache line associated with a first memory request is updated in response to that first memory request, wherein at least a portion of the lock prediction data is predictive of whether the cache line is associated with a release operation. The lock prediction data is then accessed in response to a second memory request associated with the cache line and issued by a second node and a determination is made as to whether to transfer ownership of the cache line from the first node to the second node based at least in part on the accessed lock prediction data. | 11-04-2010 |
20100281221 | Shared Data Prefetching with Memory Region Cache Line Monitoring - A method, circuit arrangement, and design structure for prefetching data for responding to a memory request, in a shared memory computing system of the type that includes a plurality of nodes, is provided. Prefetching data comprises, receiving, in response to a first memory request by a first node, presence data for a memory region associated with the first memory request from a second node that sources data requested by the first memory request, and selectively prefetching at least one cache line from the memory region based on the received presence data. Responding to a memory request comprises tracking presence data associated with memory regions associated with cached cache lines in the first node, and, in response to a memory request by a second node, forwarding the tracked presence data for a memory region associated with the memory request to the second node. | 11-04-2010 |
20100287561 | DEVICE FOR AND METHOD OF WEIGHTED-REGION CYCLE ACCOUNTING FOR MULTI-THREADED PROCESSOR CORES - An aspect of the present invention improves the accuracy of measuring processor utilization of multi-threaded cores by providing a calibration facility that derives utilization in the context of the overall dynamic operating state of the core by assigning weights to idle threads and assigning weights to run threads, depending on the status of the core. From previous chip designs it has been established in a Simultaneous Multi Thread (SMT) core that not all idle cycles in a hardware thread can be equally converted into useful work. Competition for core resources reduces the conversion efficiency of one thread's idle cycles when any other thread is running on the same core. | 11-11-2010 |
20110302374 | LOCAL AND GLOBAL MEMORY REQUEST PREDICTOR - A method, circuit arrangement, and design structure utilize broadcast prediction data to determine whether to globally broadcast a memory request in a computing system of the type that includes a plurality of nodes, each node including a plurality of processing units. The method includes updating broadcast prediction data for a cache line associated with a first memory request within a hardware-based broadcast prediction data structure in turn associated with a first processing unit in response to the first memory request, the broadcast prediction data for the cache line including data associated with a history of ownership of the cache line. The method further comprises accessing the broadcast prediction data structure and determining whether to perform an early broadcast of a second memory request to a second node based on broadcast prediction data within the broadcast prediction data structure in response to that second memory request associated with the cache line. | 12-08-2011 |
20120265942 | PREDICTIVE OWNERSHIP CONTROL OF SHARED MEMORY COMPUTING SYSTEM DATA - A method, circuit arrangement, and design structure utilize a lock prediction data structure to control ownership of a cache line in a shared memory computing system. In a first node among the plurality of nodes, lock prediction data in a hardware-based lock prediction data structure for a cache line associated with a first memory request is updated in response to that first memory request, wherein at least a portion of the lock prediction data is predictive of whether the cache line is associated with a release operation. The lock prediction data is then accessed in response to a second memory request associated with the cache line and issued by a second node and a determination is made as to whether to transfer ownership of the cache line from the first node to the second node based at least in part on the accessed lock prediction data. | 10-18-2012 |
20130080712 | Non-Uniform Memory Access (NUMA) Enhancements for Shared Logical Partitions - In a NUMA-topology computer system that includes multiple nodes and multiple logical partitions, some of which may be dedicated and others of which are shared, NUMA optimizations are enabled in shared logical partitions. This is done by specifying a home node parameter in each virtual processor assigned to a logical partition. When a task is created by an operating system in a shared logical partition, a home node is assigned to the task, and the operating system attempts to assign the task to a virtual processor that has a home node that matches the home node for the task. The partition manager then attempts to assign virtual processors to their corresponding home nodes. If this can be done, NUMA optimizations may be performed without the risk of reducing the performance of the shared logical partition. | 03-28-2013 |