Patent application number | Description | Published |
20080215804 | STRUCTURE FOR REGISTER RENAMING IN A MICROPROCESSOR - A design structure embodied in a machine readable storage medium for at least one of designing, manufacturing, and testing a design for register renaming allows processor hardware to use a larger set of registers than the architected registers visible to the compiler. This larger set of registers is called the physical register file. Thus, dynamically renaming every compiler-suggested architected register to a microarchitecture-specific physical register, allows the processor to overcome name dependencies and the hazards (pipeline slowdowns) induced by name dependencies. | 09-04-2008 |
20080235500 | STRUCTURE FOR INSTRUCTION CACHE TRACE FORMATION - A design structure embodied in a machine readable storage medium for at least one of designing, manufacturing, and testing a design for a single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines is provided. Instruction branches are predicted taken or not taken using a highly accurate branch history table (BHT). Branches that are predicted not taken are appended to a trace buffer and the next basic block is constructed from the remaining instructions in the fetch buffer. Branches that are predicted taken flush the remaining fetch buffer and the next address is determined using a Branch Target Address Register (BTAC). | 09-25-2008 |
20080250205 | STRUCTURE FOR SUPPORTING SIMULTANEOUS STORAGE OF TRACE AND STANDARD CACHE LINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines is provided. A mechanism is described for indexing into the cache, and selecting the desired line. Control is exercised over which lines are contained within the cache. Provision is made for selection between a trace line and a conventional line when both match during a tag compare step. | 10-09-2008 |
20080250206 | STRUCTURE FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines is provided. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored. | 10-09-2008 |
20080250207 | DESIGN STRUCTURE FOR CACHE MAINTENANCE - A single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines. Control is exercised over which lines are contained within the cache. This invention avoids inefficiencies in the cache by removing trace lines experiencing early exits from the cache, or trace lines that are short, by maintaining a few bits of information about the accuracy of the control flow in a trace cache line and using that information in addition to the LRU (Least Recently Used) bits that maintain the recency information of a cache line, in order to make a replacement decision. | 10-09-2008 |
20080288752 | DESIGN STRUCTURE FOR FORWARDING STORE DATA TO LOADS IN A PIPELINED PROCESSOR - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding store data to loads in a pipelined processor is provided. In one implementation, a processor is provided that includes a decoder operable to decode an instruction, and a plurality of execution units operable to respectively execute a decoded instruction from the decoder. The plurality of execution units include a load/store execution unit operable to execute decoded load instructions and decoded store instructions and generate corresponding load memory operations and store memory operations. The store queue is operable to buffer one or more store memory operations prior to the one or more memory operations being completed, and the store queue is operable to forward store data of the one or more store memory operations buffered in the store queue to a load memory operation on a byte-by-byte basis. | 11-20-2008 |
20090157943 | TRACKING LOAD STORE ORDERING HAZARDS - A method and system for processing data. In one embodiment, the method includes receiving a plurality of stores into a store queue, where each store is a result from a processor, and where the plurality of stores are destined for at least one memory address. The method also includes marking a most recent store of the plurality of stores for each unique memory address, comparing a load request against the store queue, and identifying only the most recent store for each unique memory address for the purpose of handling load-hit-store ordering hazards. | 06-18-2009 |
20090157944 | TRACKING STORE ORDERING HAZARDS IN AN OUT-OF-ORDER STORE QUEUR - A method and system for processing data. In one embodiment, the method includes receiving a first store and receiving a second store subsequent to the first store. The method also includes generating a pointer that points to the last store that needs to retire before the second store retires, where the pointer is associated with the second store, and the last store that needs to retire is the first store. | 06-18-2009 |
20090164729 | SYNC-ID FOR MULTIPLE CONCURRENT SYNC DEPENDENCIES IN AN OUT-OF-ORDER STORE QUEUE - A method, system and process for retiring data entries held within a store queue (STQ). The STQ of a processor cache is modified to receive and process multiple synchronized groups (sync-groups). Sync groups comprise thread of execution synchronized (thread-sync) entries, all thread of execution synchronized (all-thread-sync) entries, and regular store entries (non-thread-sync and non-all-thread-sync). The task of storing data entries, from the STQ out to memory or an input/output device, is modified to increase the effectiveness of the cache. Sync-groups are created for each thread and tracked within the STQ via a synchronized identification (SID). An entry is eligible for retirement when the entry is within a currently retiring sync-group as identified by the SID. | 06-25-2009 |
20090164734 | MULTIPLE CONCURRENT SYNC DEPENDENCIES IN AN OUT-OF-ORDER STORE QUEUE - A method, system and processing device for retiring data entries held within a store queue (STQ). The STQ of a processor cache is modified to receive and process several types of data entries including: non-synchronized (non-sync), thread of execution synchronized (thread-sync), and all thread of execution synchronized (all-thread-sync). The task of storing data entries, from the STQ out to memory or an input/output device, is modified to increase the effectiveness of the cache. The modified STQ allows non-sync, thread-sync, and all-thread-sync instructions to coexist in the STQ regardless of the thread of execution. Stored data entries, or stores are deterministically selected for retirement, according to the data entry type. | 06-25-2009 |
20110131394 | APPARATUS AND METHOD FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored. | 06-02-2011 |
20120089979 | Performance Monitor Design for Counting Events Generated by Thread Groups - A number of hypervisor register fields are set to specify which processor cores are allowed to generate a number of performance events for a particular thread group. A plurality of threads for an application running in the computing environment to a plurality of thread groups are configured by a plurality of thread group fields in a plurality of control registers. A number of counter sets are allowed to count a number of thread group events originating from one of a shared resource and a shared cache are specified by a number of additional hypervisor register fields. | 04-12-2012 |
20120089984 | Performance Monitor Design for Instruction Profiling Using Shared Counters - Counter registers are shared among multiple threads executing on multiple processor cores. An event within the processor core is selected. A multiplexer in front of each of a number of counters is configured to route the event to a counter. A number of counters are assigned for the event to each of a plurality of threads running for a plurality of applications on a plurality of processor cores, wherein each of the counters includes a thread identifier in the interrupt thread identification field and a processor identifier in the processor identification field. The number of counters is configured to have a number of interrupt thread identification fields and a number of processor identification fields to identify a thread that will receive a number of interrupts. | 04-12-2012 |
20120089985 | Sharing Sampled Instruction Address Registers for Efficient Instruction Sampling in Massively Multithreaded Processors - Sampled instruction address registers are shared among multiple threads executing on a plurality of processor cores. Each of a plurality of sampled instruction address registers are assigned to a particular thread running for an application on the plurality of processor cores. Each of the sampled instruction address registers are configured by storing in each of the sampled instruction address registers a thread identification of the particular thread in a thread identification field and a processor identification of a particular processor on which the particular thread is running in a processor identification field. | 04-12-2012 |
20120151142 | TRANSFER OF BUS-BASED OPERATIONS TO DEMAND-SIDE MACHINES - An L2 cache, method and computer program product for transferring an inbound bus operation to a processor side handling machine. The method includes a bus operation handling machine accepting the inbound bus operation received over a system interconnect, the bus operation handling machine identifying a demand operation of the processor side handling machine that will complete the bus operation, the bus operation handling machine sending the identified demand operation to the processor side handling machine, and the processor side handling machine performing the identified demand operation. | 06-14-2012 |
20120159082 | Direct Access To Cache Memory - Methods and apparatuses are disclosed for direct access to cache memory. Embodiments include receiving, by a direct access manager that is coupled to a cache controller for a cache memory, a region scope zero command describing a region scope zero operation to be performed on the cache memory; in response to receiving the region scope zero command, generating a direct memory access region scope zero command, the direct memory access region scope zero command having an operation code and an identification of the physical addresses of the cache memory on which the operation is to be performed; sending the direct memory access region scope zero command to the cache controller for the cache memory; and performing, by the cache controller, the direct memory access region scope zero operation in dependence upon the operation code and the identification of the physical addresses of the cache memory. | 06-21-2012 |
20120159086 | Cache Management - Methods, apparatuses, and computer program products are disclosed for cache management. Embodiments include receiving, by a cache controller, a request to insert a new cache line into a cache; determining, by the cache controller, whether the new cache line is associated with a forced injection; in response to determining that the new cache line is associated with a forced injection, accepting, by the cache controller, the insertion of the new cache line into the cache; and in response to determining that the new cache line is not associated with a forced injection, determining, by the cache controller, whether to accept the insertion of the new cache line based on a comparison of an address of the new cache line to a predefined range of addresses. | 06-21-2012 |
20120159087 | Ensuring Forward Progress of Token-Required Cache Operations In A Shared Cache - Ensuring forward progress of token-required cache operations in a shared cache, including: snooping an instruction to execute a token-required cache operation; determining if a snoop machine is available and if the snoop machine is set to a reservation state; if the snoop machine is available and the snoop machine is in the reservation state, determining whether the instruction to execute the token-required cache operation owns a token or is a joint instruction; if the instruction is a joint instruction, instructing the operation to retry; if the instruction to execute the token-required cache operation owns a token, dispatching a cache controller; determining whether all required cache controllers of relevant compute nodes are available to execute the instruction; executing the instruction if the required cache controllers are available otherwise not executing the instruction. | 06-21-2012 |
20120159640 | Acquiring Access To A Token Controlled System Resource - Acquiring access to a token controlled system resource, including: receiving, by a token broker, a command that requires access to the token controlled system resource, where the token broker is automated computing machinery for acquiring tokens and distributing the command to the token controlled system resource for execution; identifying, by the token broker, a first need state, the first need state indicating that the token broker requires access to the token controlled system resource to which the token broker does not possess a token; requesting, by the token broker, a configurable number of tokens to gain access to the token controlled system resource, without dispatching an operation handler for executing the command until at least one token is acquired; assigning, by the token broker, an acquired token to the operation handler; and dispatching, by the token broker, the operation handler and its assigned token for executing the command. | 06-21-2012 |
20120198178 | ADDRESS-BASED HAZARD RESOLUTION FOR MANAGING READ/WRITE OPERATIONS IN A MEMORY CACHE - One embodiment provides a cached memory system including a memory cache and a plurality of read-claim (RC) machines configured for performing read and write operations dispatched from a processor. According to control logic provided with the cached memory system, a hazard is detected between first and second read or write operations being handled by first and second RC machines. The second RC machine is suspended and a subset of the address bits of the second operation at specific bit positions are recorded. The subset of address bits of the first operation at the specific bit positions are broadcast in response to the first operation being completed. The second operation is then re-requested. | 08-02-2012 |
20120203946 | LIVELOCK PREVENTION MECHANISM IN A RING SHAPED INTERCONNECT UTILIZING ROUND ROBIN SAMPLING - A novel and useful cost effective mechanism for detecting the livelock/starvation of transactions in a ring shaped interconnect that utilizes minimal logic resources. Rather than monitor all transactions concurrently in the ring, the mechanism monitors only a single transaction in the ring. A sampling point is located at a point in the ring which contains a set of N latches. If the monitored transaction is not being starved, it is released and the detection logic moves on the next candidate transaction in round robin fashion. If the monitored transaction passes the sampling point a threshold number of times, it is deemed to be starved and a starvation prevention handling procedure is activated. By traversing the entire ring a single transaction at a time, all starving transactions will eventually be detected with an upper limit on the detection time of O(N | 08-09-2012 |
20120331184 | POLLING OF A TARGET REGISTER WITHIN A PERIPHERAL DEVICE - In a disclosed example of a method, a requested value of a target register may be specified as a precondition to performing a requested read or write operation. The requested read or write operation may be generated by a requesting device, such as a processor, and sent over a bus to a peripheral device containing the target register. The target register may be polled internally to the peripheral device without generating additional bus traffic between the requesting device and the peripheral device. A ring topology may be used to internally poll the target register and to perform the requested read or write operation when the polled value of the target register equals the requested value. | 12-27-2012 |
20140237186 | FILTERING SNOOP TRAFFIC IN A MULTIPROCESSOR COMPUTING SYSTEM - Filtering snoop traffic in a multiprocessor computing system, each processor in the multiprocessor computing system coupled to a high level cache and a low level cache, the including: receiving a snoop message that identifies an address in shared memory targeted by a write operation; identifying a set in the high level cache that maps to the address in shared memory; determining whether the high level cache includes an entry associated with the address in shared memory; responsive to determining that the high level cache does not include an entry corresponding to the address in shared memory: determining whether the set in the high level cache has been bypassed by an entry in the low level cache; and responsive to determining that the set in the high level cache has not been bypassed by an entry in the low level cache, discarding the snoop message. | 08-21-2014 |
20140310468 | METHODS AND APPARATUS FOR IMPROVING PERFORMANCE OF SEMAPHORE MANAGEMENT SEQUENCES ACROSS A COHERENT BUS - Techniques are described for a multi-processor having two or more processors that increases the opportunity for a load-exclusive command to take a cache line in an Exclusive state, which results in increased performance when a store-exclusive is executed. A new bus operation read prefer exclusive is used as a hint to other caches that a requesting master is likely to store to the cache line, and, if possible, the other cache should give the line up. In most cases, this will result in the other master giving the line up and the requesting master taking the line Exclusive. In most cases, two or more processors are not performing a semaphore management sequence to the same address at the same time. Thus, a requesting master's load-exclusive is able to take a cache line in the Exclusive state an increased number of times. | 10-16-2014 |
20150074357 | DIRECT SNOOP INTERVENTION - A low latency cache intervention mechanism implements a snoop filter to dynamically select an intervener cache for a cache “hit” in a multiprocessor architecture of a computer system. The selection of the intervener is based on variables such as latency, topology, frequency, utilization, load, wear balance, and/or power state of the computer system. | 03-12-2015 |