Patent application number | Description | Published |
20080301531 | FAULT TOLERANT ENCODING OF DIRECTORY STATES FOR STUCK BITS - A method of handling a stuck bit in a directory of a cache memory, by defining multiple binary encodings to indicate a defective cache state, detecting an error in a tag stored in a member of the directory (wherein the tag includes at least an address field, a state field and an error-correction field), determining that the error is associated with a stuck bit of the directory member, and writing new state information to the directory member which is selected from one of the binary encodings based on the field location of the stuck bit within the directory member. The multiple binary encodings may include a first binary encoding when the stuck bit is in the address field, a second binary encoding when the stuck bit is in the state field, and a third binary encoding when the stuck bit is in the error-correction field. The new state information may further be selected based on the value of the stuck bit, e.g., the state bit corresponding to the stuck bit is assigned a bit value from the new state information which matches the value of the stuck bit (see the first sketch following this table). | 12-04-2008 |
20080307203 | Scaling Instruction Intervals to Identify Collection Points for Representative Instruction Traces - A method, system, and computer program product are provided for identifying instructions to obtain representative traces. A phase instruction budget is calculated for each phase in a set of phases. The phase instruction budget is based on a weight associated with each phase and a global instruction budget. A starting index and an ending index are identified for instructions within a set of intervals in each phase in order to meet the phase instruction budget for that phase, thereby forming a set of interval indices. A determination is made as to whether the instructions within the set of interval indices meet the global instruction budget. Responsive to the global instruction budget being met, the set of interval indices are output as collection points for the representative traces. | 12-11-2008 |
20090055153 | Augmenting of Automated Clustering-Based Trace Sampling Methods by User-Directed Phase Detection - A computer-implemented method, system, and computer-usable program code for simulating processor operation in a data processing system. An instruction trace is generated, wherein the instruction trace includes markers specified by a user for identifying interval boundaries for at least one interval of the instruction trace. The instruction trace is divided into a plurality of intervals in consideration of the markers, and the plurality of intervals are formed into a plurality of interval clusters, wherein each interval cluster represents one phase of execution of the instruction trace. At least one interval from each of the plurality of interval clusters is selected as a trace sample to provide a plurality of trace samples, wherein each selected interval is of at least a minimum size. A simulation is performed using the plurality of trace samples, and a result of the simulation is provided to the user. | 02-26-2009 |
20090138220 | Power-aware line intervention for a multiprocessor directory-based coherency protocol - A directory-based coherency method, system and program are provided for intervening a requested cache line from a plurality of candidate memory sources in a multiprocessor system on the basis of the sensed temperature or power dissipation value at each memory source. By providing temperature or power dissipation sensors in each of the candidate memory sources (e.g., at cores, cache memories, memory controller, etc.) that share a requested cache line, control logic may be used to determine which memory source should source the cache line by using the power sensor signals to signal only the memory source with acceptable power dissipation to provide the cache line to the requester. | 05-28-2009 |
20090138660 | Power-aware line intervention for a multiprocessor snoop coherency protocol - A snoop coherency method, system and program are provided for intervening a requested cache line from a plurality of candidate memory sources in a multiprocessor system on the basis of the sensed temperature or power dissipation value at each memory source. By providing temperature or power dissipation sensors in each of the candidate memory sources (e.g., at cores, cache memories, memory controller, etc.) that share a requested cache line, control logic may be used to determine which memory source should source the cache line by using the power sensor signals to signal only the memory source with acceptable power dissipation to provide the cache line to the requester. | 05-28-2009 |
20090138682 | Dynamic instruction execution based on transaction priority tagging - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigned priority values for each thread as a tag in the thread's instructions, tagged instructions from different threads that are dispatched through the system are allocated system resources based on the tagged priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc. | 05-28-2009 |
20090138683 | Dynamic instruction execution using distributed transaction priority registers - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigned priority values in thread priority registers distributed throughout the computer system, instructions from different threads that are dispatched through the system are allocated system resources based on the priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc. | 05-28-2009 |
20090150617 | CACHE MECHANISM AND METHOD FOR AVOIDING CAST OUT ON BAD VICTIM SELECT AND RECYCLING VICTIM SELECT OPERATION - A method, apparatus, and computer for identifying selection of a bad victim during victim selection at a cache and recovering from such bad victim selection without causing the system to crash or suspend forward progress of the victim selection process. Among the bad victim selections addressed are recovery from selection of a deleted member and recovery from use of LRU state bits that do not map to a member within the congruence class. When LRU victim selection logic generates an output vector identifying a victim, the output vector is checked to ensure that it is a valid vector (non-null) and that it is not pointing to a deleted member. When the output vector is not valid or points to a deleted member, the LRU victim selection logic is triggered to re-start the victim selection process. | 06-11-2009 |
20090164399 | Method for Autonomic Workload Distribution on a Multicore Processor - A multiprocessor system which includes automatic workload distribution. As threads execute in the multiprocessor system, an operating system or hypervisor continuously learns the execution characteristics of the threads and saves the information in thread-specific control blocks. The execution characteristics are used to generate thread performance data. As the thread executes, the operating system continuously uses the performance data to steer the thread to a core that will execute the workload most efficiently. | 06-25-2009 |
20090164755 | Optimizing Execution of Single-Threaded Programs on a Multiprocessor Managed by Compilation - A method for optimizing execution of a single threaded program on a multi-core processor. The method includes dividing the single threaded program into a plurality of discretely executable components while compiling the single threaded program; identifying at least some of the plurality of discretely executable components for execution by an idle core within the multi-core processor; and enabling execution of at least one of the plurality of discretely executable components on the idle core. | 06-25-2009 |
20090164759 | Execution of Single-Threaded Programs on a Multiprocessor Managed by an Operating System - A method and apparatus for speculatively executing a single threaded program within a multi-core processor which includes identifying an idle core within the multi-core processor, performing a look ahead operation on the single thread instructions to identify speculative instructions within the single thread instructions, and allocating the idle core to execute the speculative instructions. | 06-25-2009 |
20090164812 | Dynamic processor reconfiguration for low power without reducing performance based on workload execution characteristics - A method, system and program are provided for dynamically reconfiguring a pipelined processor to operate with reduced power consumption without reducing existing performance. By monitoring or detecting the performance of individual units or stages in the processor as they execute a given workload, each stage may use high-performance circuitry until such time as a drop in the throughput performance is detected, at which point the stages are reconfigured to use lower-performance circuitry so as to meet the reduced performance throughput requirements using less power. By configuring the processor to back off from high-performance designs to low-performance designs as the detected performance characteristics of the executing workload warrant, power dissipation may be optimized. | 06-25-2009 |
20090165016 | Method for Parallelizing Execution of Single Thread Programs - A method and apparatus for speculatively executing a single threaded program within a multi-core processor which includes identifying an idle core within the multi-core processor, performing a look ahead operation on the single thread instructions to identify speculative instructions within the single thread instructions, and allocating the idle core to execute the speculative instructions. | 06-25-2009 |
20090186494 | APPARATUS, SYSTEM, AND METHOD FOR A CONFIGURABLE BLADE CARD - An apparatus, system, and method are disclosed for a configurable blade card. A base card is in physical and electrical communication with a blade connector. The blade connector is in physical and electrical communication with a blade enclosure connector. A secondary card is in physical and electrical communication with the base card to form a blade card. A coupler physically couples the base card and the secondary card. The base card and the secondary card are co-planar and compatible with a blade card form factor. | 07-23-2009 |
20090210727 | APPARATUS AND METHOD TO MANAGE POWER IN A COMPUTING DEVICE - A method to manage power in a computing device comprising a controller assembly and a storage assembly comprising a plurality of data storage devices, by selecting a processor parameter, establishing a threshold processor parameter value, establishing a threshold over-parameter time interval, selecting a data storage device parameter, and establishing a nominal data storage device parameter value. The method determines an actual processor parameter value. If the actual processor parameter value is less than or equal to the threshold processor parameter value, the method operates each of the plurality of data storage devices using the nominal data storage device parameter value. If the actual processor parameter value is greater than the threshold processor parameter value, then the method determines an actual over-parameter time interval. If the actual processor parameter value is greater than the threshold processor parameter value, and if the actual over-parameter time interval is greater than the threshold over-parameter time interval, then the method operates each of the plurality of data storage devices using a data storage device parameter value less than the nominal data storage device parameter value. | 08-20-2009 |
20090276190 | Method and Apparatus For Evaluating Integrated Circuit Design Performance Using Basic Block Vectors, Cycles Per Instruction (CPI) Information and Microarchitecture Dependent Information - A test system or simulator includes an integrated circuit (IC) benchmark software program that executes workload program software on a semiconductor die IC design model. The benchmark software program includes trace, simulation point, basic block vector (BBV) generation, cycles per instruction (CPI) error, clustering and other programs. The test system also includes CPI stack program software that generates CPI stack data that includes microarchitecture dependent information for each instruction interval of workload program software. The CPI stack data may also include an overall analysis of CPI data for the entire workload program. IC designers may utilize the benchmark software and CPI stack program to develop a reduced representative workload program that includes CPI data as well as microarchitecture dependent information. | 11-05-2009 |
20090276191 | Method And Apparatus For Integrated Circuit Design Model Performance Evaluation Using Basic Block Vector Clustering And Fly-By Vector Clustering - A test system or simulator includes an enhanced IC test application sampling software program that executes test application software on a semiconductor die IC design model. The enhanced test application sampling software may include trace, simulation point, CPI error, clustering, instruction budgeting, and other programs. The enhanced test application sampling software generates basic block vectors (BBVs) and fly-by vectors (FBVs) from instruction trace analysis of test application software workloads. The enhanced test application sampling software utilizes the microarchitecture dependent information to generate the FBVs to select representative instruction intervals from the test application software. The enhanced test application sampling software generates a reduced representative test application software program from the BBV and FBV data utilizing a global instruction budgeting analysis method. Designers use the test system with enhanced test application sampling software to evaluate IC design models by using the representative test application software program. | 11-05-2009 |
20100023698 | Enhanced Coherency Tracking with Implementation of Region Victim Hash for Region Coherence Arrays - A method and system for precisely tracking lines evicted from a region coherence array (RCA) without requiring eviction of the lines from a processor's cache hierarchy. The RCA is a set-associative array which contains region entries consisting of a region address tag, a set of bits for the region coherence state, and a line-count for tracking the number of region lines cached by the processor. Tracking of the RCA is facilitated by a non-tagged hash table of counts represented by a Region Victim Hash (RVH). When a region is evicted from the RCA, and lines from the evicted region still reside in the processor's caches (i.e., the region's line-count is non-zero), the RCA line-count is added to the corresponding RVH count. The RVH count is decremented by the value of the region line count following a subsequent processor cache eviction/invalidation of the region previously evicted from the RCA (see the Region Victim Hash sketch following this table). | 01-28-2010 |
20100049963 | Multicore Processor and Method of Use That Adapts Core Functions Based on Workload Execution - A processor has multiple cores with each core having an associated function to support processor operations. The functions performed by the cores are selectively altered to improve processor operations by balancing the resources applied for each function. For example, each core comprises a field programmable array that is selectively and dynamically programmed to perform a function, such as a floating point function or a fixed point function, based on the number of operations that use each function. As another example, a processor is built with a greater number of cores than can be simultaneously powered, each core associated with a function, so that cores having functions with lower utilization are selectively powered down. | 02-25-2010 |
20100153700 | Multicore Processor And Method Of Use That Configures Core Functions Based On Executing Instructions - A processor having multiple cores coordinates functions performed on the cores to automatically, dynamically and repeatedly reconfigure the cores for optimal performance based on characteristics of currently executing software. A core running a thread detects a multi-core characteristic of the thread and assigns one or more other cores to the thread to dynamically combine the cores into what functionally amounts to a common core for more efficient execution of the thread. | 06-17-2010 |
20100153956 | Multicore Processor And Method Of Use That Configures Core Functions Based On Executing Instructions - A multiprocessor system having plural heterogeneous processing units schedules instruction sets for execution on a selected one of the processing units by matching workload processing characteristics of the processing units and the instruction sets. To establish an instruction set's processing characteristics, the homogeneous instruction set is executed on each of the plural processing units with one or more performance metrics tracked at each of the processing units to determine which processing unit most efficiently executes the instruction set. Instruction set workload processing characteristics are stored for reference in scheduling subsequent execution of the instruction set. | 06-17-2010 |
20100161282 | WORKLOAD PERFORMANCE PROJECTION FOR FUTURE INFORMATION HANDLING SYSTEMS USING MICROARCHITECTURE DEPENDENT DATA - A performance projection system includes a test IHS and a currently existing IHS. The performance projection system includes surrogate programs and user application software. The test IHS or simulator includes a processor with hardware (HW) counter(s) and an L1 cache. The test IHS employs a memory that includes a virtual future IHS, currently existing IHS, surrogate programs, and user application software for determination of runtime and HW counter performance data. The user application software and surrogate programs execute on the currently existing IHS to provide designers with runtime data and HW counter or microarchitecture dependent data. Designers execute surrogate programs on the future IHS to provide runtime and HW counter data. Designers normalize and weight the runtime and HW counter data to provide a representative surrogate program for comparison to user application software performance on the future IHS. Using a scaling factor, designers may generate a projection of runtime performance for the user application software executing on the future IHS. | 06-24-2010 |
20100162216 | WORKLOAD PERFORMANCE PROJECTION VIA SURROGATE PROGRAM ANALYSIS FOR FUTURE INFORMATION HANDLING SYSTEMS - A performance projection system includes a test IHS and multiple currently existing IHSs. The performance projection system includes user application software and surrogate programs that execute on currently existing IHSs. The performance projection system measures user application software and surrogate program performance during execution on currently existing IHSs. The performance projection system measures runtime program performance during execution of surrogate programs on a future semiconductor die IC design model or virtualized future system. Designers normalize and compare surrogate program runtime performance data with user application software performance data. Designers un-normalize the normalized runtime performance data to generate a projection of runtime performance on the future system. | 06-24-2010 |
20100186018 | OFF-LOADING OF PROCESSING FROM A PROCESSOR BLADE TO STORAGE BLADES - A processor blade determines whether a selected processing task is to be off-loaded to a storage blade for processing. The selected processing task is off-loaded to the storage blade via a planar bus communication path, in response to determining that the selected processing task is to be off-loaded to the storage blade. The off-loaded selected processing task is processed in the storage blade. The storage blade communicates the results of the processing of the off-loaded selected processing task to the processor blade. | 07-22-2010 |
20110153931 | HYBRID STORAGE SUBSYSTEM WITH MIXED PLACEMENT OF FILE CONTENTS - A storage subsystem combining solid state drive (SSD) and hard disk drive (HDD) technologies provides low access latency and low complexity. Separate free lists are maintained for the SSD and the HDD and blocks of file system data are stored uniquely on either the SSD or the HDD. When a read access is made to the subsystem, if the data is present on the SSD, the data is returned, but if the block is present on the HDD, it is migrated to the SSD and the block on the HDD is returned to the HDD free list. On a write access, if the block is present in either the SSD or the HDD, the block is overwritten, but if the block is not present in the subsystem, the block is written to the HDD (see the hybrid-storage sketch following this table). | 06-23-2011 |
20110154352 | MEMORY MANAGEMENT SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT - According to one aspect of the present disclosure a method and technique for managing memory access is disclosed. The method includes setting a memory databus utilization threshold for each of a plurality of processors of a data processing system to maintain memory databus utilization of the data processing system at or below a system threshold. The method also includes monitoring memory databus utilization for the plurality of processors and, in response to determining that memory databus utilization for at least one of the processors is below its threshold, reallocating at least a portion of unused databus utilization from the at least one processor to at least one of the other processors. | 06-23-2011 |
20110271283 | ENERGY-AWARE JOB SCHEDULING FOR CLUSTER ENVIRONMENTS - A job scheduler can select a processor core operating frequency for a node in a cluster to perform a job based on energy usage and performance data. After a job request is received, an energy-aware job scheduler accesses data that specifies energy usage and job performance metrics that correspond to the requested job and a plurality of processor core operating frequencies. A first of the plurality of processor core operating frequencies is selected that satisfies an energy usage criterion for performing the job based, at least in part, on the data that specifies energy usage and job performance metrics that correspond to the job. The job is assigned to be performed by a node in the cluster at the selected first of the plurality of processor core operating frequencies (see the frequency-selection sketch following this table). | 11-03-2011 |
20110320793 | OPERATING SYSTEM AWARE BRANCH PREDICTOR USING A DYNAMICALLY RECONFIGURABLE BRANCH HISTORY TABLE - A processor resource manager assigns a branch history resource to a first execution mode. The branch history resource is utilized for predicting a branch direction of a branch instruction. Next, the resource manager logs a number of branch mispredictions that occur while the processor executes a second execution mode. The resource manager, in turn, reassigns the branch history resource to the second execution mode based upon the number of branch mispredictions. | 12-29-2011 |
20120002812 | DATA AND CONTROL ENCRYPTION - Secure communication of data between devices includes encrypting unencrypted data at a first device by reordering unencrypted bits provided in parallel on a device bus, including data and control bits, from an unencrypted order to form encrypted data including a plurality of encrypted bits in parallel in an encrypted order defined by a key. The encrypted data may be transmitted to another device where the encrypted data is decrypted by using the key to order the encrypted bits to restore the unencrypted order thereby to reform the unencrypted data. | 01-05-2012 |
20120096240 | Application Performance with Support for Re-Initiating Unconfirmed Software-Initiated Threads in Hardware - A method, system and computer-usable medium are disclosed for managing prefetch streams in a virtual machine environment. Compiled application code in a first core, which comprises a Special Purpose Register (SPR) and a plurality of first prefetch engines, initiates a prefetch stream request. If the prefetch stream request cannot be initiated due to unavailability of a first prefetch engine, then an indicator bit indicating a Prefetch Stream Dispatch Fault is set in the SPR, causing a Hypervisor to interrupt the execution of the prefetch stream request. The Hypervisor then calls its associated operating system (OS), which determines prefetch engine availability for a second core comprising a plurality of second prefetch engines. If a second prefetch engine is available, then the OS migrates the prefetch stream request from the first core to the second core, where it is initiated on an available second prefetch engine. | 04-19-2012 |
20120096241 | Performance of Emerging Applications in a Virtualized Environment Using Transient Instruction Streams - A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache. | 04-19-2012 |
20120130928 | EFFICIENT STORAGE OF INDIVIDUALS FOR OPTIMIZATION SIMULATION - Candidate solutions to an optimization problem comprise a set of potential values that can be applied to variables in a problem description. Candidate solutions can be large because of the complexity of optimization problems and the large number of variables. The populations of candidate solutions may also be large to ensure diversity and effectiveness in computing a solution. When the populations and the candidate solutions are large for an optimization problem, computing a solution to the optimization problem consumes a large amount of memory. In some instances, several generations of candidate solutions are stored in memory. Compression of the candidate solutions can minimize the memory space consumed to compute a solution to an optimization problem. | 05-24-2012 |
20120151141 | DETERMINING SERVER WRITE ACTIVITY LEVELS TO USE TO ADJUST WRITE CACHE SIZE - Provided are a computer program product, system, and method for determining server write activity levels to use to adjust write cache size. Information on server write activity to the cache is gathered. The gathered information on write activity is processed to determine a server write activity level comprising one of multiple write activity levels indicating a level of write activity. The determined server write activity level is transmitted to a storage server having a write cache, wherein the storage server uses the determined server write activity level to determine whether to adjust a size of the storage server write cache. | 06-14-2012 |
20120151297 | Enhanced Coherency Tracking with Implementation of Region Victim Hash for Region Coherence Arrays - A method and system for precisely tracking lines evicted from a region coherence array (RCA) without requiring eviction of the lines from a processor's cache hierarchy. The RCA is a set-associative array which contains region entries consisting of a region address tag, a set of bits for the region coherence state, and a line-count for tracking the number of region lines cached by the processor. Tracking of the RCA is facilitated by a non-tagged hash table of counts represented by a Region Victim Hash (RVH). When a region is evicted from the RCA, and lines from the evicted region still reside in the processor's caches (i.e., the region's line-count is non-zero), the RCA line-count is added to the corresponding RVH count. The RVH count is decremented by the value of the region line count following a subsequent processor cache eviction/invalidation of the region previously evicted from the RCA. | 06-14-2012 |
20120179873 | Performance of Emerging Applications in a Virtualized Environment Using Transient Instruction Streams - A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache. | 07-12-2012 |
20120180052 | Application Performance with Support for Re-Initiating Unconfirmed Software-Initiated Threads in Hardware - A method, system and computer-usable medium are disclosed for managing prefetch streams in a virtual machine environment. Compiled application code in a first core, which comprises a Special Purpose Register (SPR) and a plurality of first prefetch engines, initiates a prefetch stream request. If the prefetch stream request cannot be initiated due to unavailability of a first prefetch engine, then an indicator bit indicating a Prefetch Stream Dispatch Fault is set in the SPR, causing a Hypervisor to interrupt the execution of the prefetch stream request. The Hypervisor then calls its associated operating system (OS), which determines prefetch engine availability for a second core comprising a plurality of second prefetch engines. If a second prefetch engine is available, then the OS migrates the prefetch stream request from the first core to the second core, where it is initiated on an available second prefetch engine. | 07-12-2012 |
20120191939 | MEMORY MANAGEMENT METHOD - According to one aspect of the present disclosure a method and technique for managing memory access is disclosed. The method includes setting a memory databus utilization threshold for each of a plurality of processors of a data processing system to maintain memory databus utilization of the data processing system at or below a system threshold. The method also includes monitoring memory databus utilization for the plurality of processors and, in response to determining that memory databus utilization for at least one of the processors is below its threshold, reallocating at least a portion of unused databus utilization from the at least one processor to at least one of the other processors. | 07-26-2012 |
20120198121 | METHOD AND APPARATUS FOR MINIMIZING CACHE CONFLICT MISSES - A method for minimizing cache conflict misses is disclosed. A translation table capable of facilitating the translation of a virtual address to a real address during a cache access is provided. The translation table includes multiple entries, and each entry of the translation table includes a page number field and a hash value field. A hash value is generated from a first group of bits within a virtual address, and the hash value is stored in the hash value field of an entry within the translation table. In response to a match on the entry within the translation table during a cache access, the hash value of the matched entry is retrieved from the translation table, and the hash value is concatenated with a second group of bits within the virtual address to form a set of indexing bits to index into a cache set (see the set-index hashing sketch following this table). | 08-02-2012 |
20120198187 | Technique for preserving memory affinity in a non-uniform memory access data processing system - Techniques for preserving memory affinity in a computer system are disclosed. In response to a request for memory access to a page within a memory affinity domain, a determination is made if the request is initiated by a processor associated with the memory affinity domain. If the request is not initiated by a processor associated with the memory affinity domain, a determination is made if there is a page ID match with an entry within a page migration tracking module associated with the memory affinity domain. If there is no page ID match, an entry is selected within the page migration tracking module to be updated with a new page ID and a new memory affinity ID. If there is a page ID match, then another determination is made as to whether there is a memory affinity ID match with the matching entry. If there is no memory affinity ID match, the entry is updated with a new memory affinity ID; and if there is a memory affinity ID match, an access counter of the entry is incremented. | 08-02-2012 |
20120215982 | Partial Line Cache Write Injector for Direct Memory Access Write - A cache within a computer system receives a partial write request and identifies a cache hit of a cache line. The cache line corresponds to the partial write request and includes existing data. In turn, the cache receives partial write data and merges the partial write data with the existing data into the cache line. In one embodiment, the existing data is “modified” or “dirty.” In another embodiment, the existing data is “shared.” In this embodiment, the cache changes the state of the cache line to indicate the storing of the partial write data into the cache line. | 08-23-2012 |
20120216205 | ENERGY-AWARE JOB SCHEDULING FOR CLUSTER ENVIRONMENTS - A job scheduler can select a processor core operating frequency for a node in a cluster to perform a job based on energy usage and performance data. After a job request is received, an energy aware job scheduler accesses data that specifies energy usage and job performance metrics that correspond to the requested job and a plurality of processor core operating frequencies. A first of the plurality of processor core operating frequencies is selected that satisfies an energy usage criterion for performing the job based, at least in part, on the data that specifies energy usage and job performance metrics that correspond to the job. The job is assigned to be performed by a node in the cluster at the selected first of the plurality of processor core operating frequencies. | 08-23-2012 |
20120221812 | METHOD FOR PRESERVING MEMORY AFFINITY IN A NON-UNIFORM MEMORY ACCESS DATA PROCESSING SYSTEM - A method for preserving memory affinity in a computer system is disclosed. The method reduces and sometimes eliminates memory affinity loss due to process migration by restoring the proper memory affinity through dynamic page migration. The memory affinity access patterns of individual pages are tracked continuously. If a particular page is found almost always to be accessed from a particular remote access affinity domain for a certain number of times, and without any intervening requests from any other access affinity domain, the page is migrated to that particular remote affinity domain so that subsequent memory accesses become local memory accesses. As a result, the proper pages are migrated to increase memory affinity. | 08-30-2012 |
20120233283 | DETERMINING SERVER WRITE ACTIVITY LEVELS TO USE TO ADJUST WRITE CACHE SIZE - Provided are a computer program product, system, and method for determining server write activity levels to use to adjust write cache size. Information on server write activity to the cache is gathered. The gathered information on write activity is processed to determine a server write activity level comprising one of multiple write activity levels indicating a level of write activity. The determined server write activity level is transmitted to a storage server having a write cache, wherein the storage server uses the determined server write activity level to determine whether to adjust a size of the storage server write cache. | 09-13-2012 |
20130013903 | Multicore Processor and Method of Use That Adapts Core Functions Based on Workload Execution - A processor has multiple cores with each core having an associated function to support processor operations. The functions performed by the cores are selectively altered to improve processor operations by balancing the resources applied for each function. For example, each core comprises a field programmable array that is selectively and dynamically programmed to perform a function, such as a floating point function or a fixed point function, based on the number of operations that use each function. As another example, a processor is built with a greater number of cores than can be simultaneously powered, each core associated with a function, so that cores having functions with lower utilization are selectively powered down. | 01-10-2013 |
20130091509 | OFF-LOADING OF PROCESSING FROM A PROCESSOR BLADE TO STORAGE BLADES - A processor blade determines whether a selected processing task is to be off-loaded to a storage blade for processing. The selected processing task is off-loaded to the storage blade via a planar bus communication path, in response to determining that the selected processing task is to be off-loaded to the storage blade. The off-loaded selected processing task is processed in the storage blade. The storage blade communicates the results of the processing of the off-loaded selected processing task to the processor blade. | 04-11-2013 |
20130111135 | VARIABLE CACHE LINE SIZE MANAGEMENT | 05-02-2013 |
20130111136 | VARIABLE CACHE LINE SIZE MANAGEMENT | 05-02-2013 |
20130145135 | PERFORMANCE OF PROCESSORS IS IMPROVED BY LIMITING NUMBER OF BRANCH PREDICTION LEVELS - A method utilizes information provided by performance monitoring hardware to dynamically adjust the number of levels of speculative branch predictions allowed (typically 3 or 4 per thread) for a processor core. The information includes cycles-per-instruction (CPI) for the processor core and the number of memory accesses per unit time. If the CPI is below a CPI threshold and the number of memory accesses (NMA) per unit time is above a prescribed threshold, the number of levels of speculative branch predictions is reduced per thread for the processor core. Likewise, the number of levels of speculative branch predictions could be increased, from a low level to the maximum allowed, if the CPI threshold is exceeded or the number of memory accesses per unit time is below the prescribed threshold (see the speculation-depth sketch following this table). | 06-06-2013 |
20130151784 | DYNAMIC PRIORITIZATION OF CACHE ACCESS - Some embodiments of the inventive subject matter are directed to determining that a memory access request results in a cache miss and determining an amount of cache resources used to service cache misses within a past period in response to determining that the memory access request results in the cache miss. Some embodiments are further directed to determining that servicing the memory access request would increase the amount of cache resources used to service cache misses within the past period to exceed a threshold. In some embodiments, the threshold corresponds to reservation of a given amount of cache resources for potential cache hits. Some embodiments are further directed to rejecting the memory access request in response to determining that servicing the memory access request would increase the amount of cache resources used to service cache misses within the past period to exceed the threshold (see the miss-throttling sketch following this table). | 06-13-2013 |
20130151788 | DYNAMIC PRIORITIZATION OF CACHE ACCESS - Some embodiments of the inventive subject matter are directed to a cache comprising a tracking unit and cache state machines. In some embodiments, the tracking unit is configured to track an amount of cache resources used to service cache misses within a past period. In some embodiments, each of the cache state machines is configured to, determine whether a memory access request results in a cache miss or cache hit, and in response to a cache miss for a memory access request, query the tracking unit for the amount of cache resources used to service cache misses within the past period. In some embodiments, each of the cache state machines is configured to service the memory access request based, at least in part, on the amount of cache resources used to service the cache misses within the past period according to the tracking unit. | 06-13-2013 |
20130218892 | HYBRID STORAGE SUBSYSTEM WITH MIXED PLACEMENT OF FILE CONTENTS - A storage subsystem combining solid state drive (SSD) and hard disk drive (HDD) technologies provides low access latency and low complexity. Separate free lists are maintained for the SSD and the HDD and blocks of file system data are stored uniquely on either the SSD or the HDD. When a read access is made to the subsystem, if the data is present on the SSD, the data is returned, but if the block is present on the HDD, it is migrated to the SSD and the block on the HDD is returned to the HDD free list. On a write access, if the block is present in either the SSD or the HDD, the block is overwritten, but if the block is not present in the subsystem, the block is written to the HDD. | 08-22-2013 |
20140095791 | PERFORMANCE-DRIVEN CACHE LINE MEMORY ACCESS - According to one aspect of the present disclosure a system and technique for performance-driven cache line memory access is disclosed. The system includes: a processor, a cache hierarchy coupled to the processor, and a memory coupled to the cache hierarchy. The system also includes logic executable to, responsive to receiving a request for a cache line: divide the request into a plurality of cache subline requests, wherein at least one of the cache subline requests comprises a high priority data request and at least one of the cache subline requests comprises a low priority data request; service the high priority data request; and delay servicing of the low priority data request until a low priority condition has been satisfied (see the subline-request sketch following this table). | 04-03-2014 |
20140095796 | PERFORMANCE-DRIVEN CACHE LINE MEMORY ACCESS - According to one aspect of the present disclosure, a method and technique for performance-driven cache line memory access is disclosed. The method includes: receiving, by a memory controller of a data processing system, a request for a cache line; dividing the request into a plurality of cache subline requests, wherein at least one of the cache subline requests comprises a high priority data request and at least one of the cache subline requests comprises a low priority data request; servicing the high priority data request; and delaying servicing of the low priority data request until a low priority condition has been satisfied. | 04-03-2014 |
20150032977 | MEMORY MANAGEMENT SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT - According to one aspect of the present disclosure a method and technique for managing memory access is disclosed. The method includes setting a memory databus utilization threshold for each of a plurality of processors of a data processing system to maintain memory databus utilization of the data processing system at or below a system threshold. The method also includes monitoring memory databus utilization for the plurality of processors and, in response to determining that memory databus utilization for at least one of the processors is below its threshold, reallocating at least a portion of unused databus utilization from the at least one processor to at least one of the other processors. | 01-29-2015 |
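The sketches below illustrate, in deliberately simplified form, a few of the mechanisms summarized in the table above; all field widths, table sizes, thresholds, and identifiers in them are assumptions chosen for illustration, not details taken from the filings. The first is a minimal sketch of the stuck-bit-aware state encoding of application 20080301531, assuming a hypothetical 52-bit directory entry layout and made-up defective-state encodings:

```python
# Hypothetical 52-bit directory entry layout: address tag, state, ECC.
ADDRESS_FIELD = range(0, 40)
STATE_FIELD = range(40, 44)
ECC_FIELD = range(44, 52)

# Made-up binary encodings that all mean "defective entry".
DEFECTIVE_ADDR = 0b1010
DEFECTIVE_STATE = 0b1100
DEFECTIVE_ECC = 0b0110

def defective_encoding(stuck_bit_pos, stuck_bit_value):
    """Select a defective-state encoding from the stuck bit's field."""
    if stuck_bit_pos in ADDRESS_FIELD:
        return DEFECTIVE_ADDR
    if stuck_bit_pos in STATE_FIELD:
        # The state bit overlaying the stuck bit must match the stuck
        # value, so flip to an alternate encoding when it does not.
        bit = stuck_bit_pos - STATE_FIELD.start
        enc = DEFECTIVE_STATE
        if (enc >> bit) & 1 != stuck_bit_value:
            enc ^= 1 << bit
        return enc
    return DEFECTIVE_ECC

print(bin(defective_encoding(41, 1)))  # stuck-at-1 bit in the state field
```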
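A sketch of the Region Victim Hash of applications 20100023698 and 20120151297, assuming an arbitrary 256-entry non-tagged table and a placeholder hash function; real hardware would use a fixed bit-slicing hash:

```python
RVH_SIZE = 256  # arbitrary table size

class RegionVictimHash:
    """Non-tagged hash table of line counts for evicted RCA regions."""

    def __init__(self):
        self.counts = [0] * RVH_SIZE

    def _index(self, region_addr):
        return hash(region_addr) % RVH_SIZE  # placeholder hash function

    def on_rca_eviction(self, region_addr, line_count):
        # Region leaves the RCA while line_count of its lines are still
        # cached: fold that count into the corresponding bucket.
        if line_count > 0:
            self.counts[self._index(region_addr)] += line_count

    def on_cache_eviction(self, region_addr):
        # A cached line of a previously evicted region is evicted or
        # invalidated: decrement the bucket, saturating at zero.
        i = self._index(region_addr)
        if self.counts[i] > 0:
            self.counts[i] -= 1
```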
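A sketch of the mixed SSD/HDD block placement of applications 20110153931 and 20130218892; the free lists and block map here are simplified stand-ins for real file-system structures:

```python
class HybridStore:
    """Blocks live uniquely on the SSD or the HDD, never both."""

    def __init__(self, ssd_blocks, hdd_blocks):
        self.ssd_free = list(range(ssd_blocks))  # separate free lists
        self.hdd_free = list(range(hdd_blocks))
        self.location = {}  # logical block -> ('ssd'|'hdd', device slot)

    def read(self, block):
        dev, slot = self.location[block]
        if dev == 'hdd' and self.ssd_free:
            # Migrate the block to the SSD; return its HDD slot to the
            # HDD free list.
            self.location[block] = ('ssd', self.ssd_free.pop())
            self.hdd_free.append(slot)
        return self.location[block]

    def write(self, block):
        if block not in self.location:
            self.location[block] = ('hdd', self.hdd_free.pop())
        return self.location[block]  # existing blocks overwrite in place

hs = HybridStore(ssd_blocks=4, hdd_blocks=16)
hs.write(7)        # a new block is written to the HDD
print(hs.read(7))  # the first read migrates it to the SSD
```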
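A sketch of energy-aware frequency selection in the spirit of applications 20110271283 and 20120216205; the per-frequency metrics table and the particular criterion (least energy within a runtime bound) are assumptions:

```python
def select_frequency(job_metrics, max_runtime):
    """job_metrics maps freq (MHz) -> (energy in J, runtime in s)."""
    feasible = {f: (e, t) for f, (e, t) in job_metrics.items()
                if t <= max_runtime}
    if not feasible:
        return None
    # Energy usage criterion: least energy among feasible frequencies.
    return min(feasible, key=lambda f: feasible[f][0])

metrics = {2000: (50.0, 120.0), 2400: (60.0, 100.0), 2800: (75.0, 85.0)}
print(select_frequency(metrics, max_runtime=110.0))  # -> 2400
```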
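A sketch of the hash-based set indexing of application 20120198121, with made-up bit widths: a hash computed over a first group of (upper) virtual-address bits is stored alongside the translation entry, then concatenated with a second group of address bits to pick the cache set:

```python
HASH_BITS = 3  # assumed width of the stored hash value
LOW_BITS = 4   # second group of bits taken directly from the address

def make_hash(vaddr):
    # Placeholder hash over the first group of (upper) address bits.
    upper, h = vaddr >> 16, 0
    while upper:
        h ^= upper & ((1 << HASH_BITS) - 1)
        upper >>= HASH_BITS
    return h

def set_index(vaddr, stored_hash):
    low = (vaddr >> 6) & ((1 << LOW_BITS) - 1)  # skip a 64 B line offset
    return (stored_hash << LOW_BITS) | low      # concatenate the groups

vaddr = 0x7F3A12345678
print(set_index(vaddr, make_hash(vaddr)))
```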
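A sketch of the speculation-depth control of application 20130145135; the 1-to-4 level range and both thresholds are assumptions:

```python
MAX_LEVELS, MIN_LEVELS = 4, 1  # assumed range of speculation levels

def adjust_levels(levels, cpi, mem_accesses, cpi_thresh, mem_thresh):
    if cpi < cpi_thresh and mem_accesses > mem_thresh:
        # Efficient but memory-bound: rein in speculative predictions.
        return max(MIN_LEVELS, levels - 1)
    if cpi >= cpi_thresh or mem_accesses < mem_thresh:
        return min(MAX_LEVELS, levels + 1)
    return levels
```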
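A sketch of the miss-driven access throttling of applications 20130151784 and 20130151788, assuming a fixed-length sliding window of recent accesses as the "past period" and a unit resource cost per miss:

```python
from collections import deque

class MissThrottle:
    """Reject misses once recent misses exceed a resource budget."""

    def __init__(self, window, threshold):
        self.costs = deque(maxlen=window)  # per-access miss cost history
        self.threshold = threshold         # budget spendable on misses

    def admit(self, is_miss, cost=1):
        """Return False to reject a miss that would exceed the budget."""
        over = is_miss and sum(self.costs) + cost > self.threshold
        self.costs.append(0 if (over or not is_miss) else cost)
        return not over  # rejected misses leave room for potential hits
```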
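Finally, a sketch of the prioritized subline split of applications 20140095791 and 20140095796, assuming a 128-byte line divided into 32-byte sublines and bus idleness as the low-priority condition:

```python
SUBLINE = 32  # assumed subline size in bytes

def split_request(line_addr, critical_offset, line_size=128):
    """Split one line request into (high, low) subline request lists."""
    sublines = [line_addr + off for off in range(0, line_size, SUBLINE)]
    critical = line_addr + (critical_offset // SUBLINE) * SUBLINE
    high = [critical]  # the subline holding the demanded data
    low = [s for s in sublines if s != critical]
    return high, low

def service(high, low, bus_idle):
    for req in high:
        print(f"fetch {req:#x} immediately")
    if bus_idle:  # assumed low-priority condition
        for req in low:
            print(f"fetch {req:#x} now that the bus is idle")

service(*split_request(0x1000, critical_offset=40), bus_idle=True)
```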