Instruction data cache

Subclass of:

711 - Electrical computers and digital processing systems: memory
  711/100 - STORAGE ACCESSING AND CONTROL
    711/117 - Hierarchical memories
      711/118 - Caching

Patent class list (only non-empty classes are listed)

Deeper subclasses: none listed

Entries
Document - Title - Date
20090198898 - PARALLEL DATA PROCESSING APPARATUS - A controller for controlling a data processor having a plurality of processor arrays, each of which includes a plurality of processing elements, comprises a retrieval unit operable to retrieve a plurality of incoming instruction streams in parallel with one another, and a distribution unit operable to supply such incoming instruction streams to respective ones of the said plurality of processor arrays. (08-06-2009)
20120246409 - ARITHMETIC PROCESSING UNIT AND ARITHMETIC PROCESSING METHOD - An arithmetic processing unit includes a cache memory, a register configured to hold data used for arithmetic processing, a correcting controller configured to detect an error in data retrieved from the register, a cache controller configured to access a cache area of a memory space via the cache memory or a noncache area of the memory space without using the cache memory in response to an instruction executing request for executing a requested instruction, and notify a report indicating that the requested instruction is a memory access instruction for accessing the noncache area, and an instruction executing controller configured to delay execution of other instructions subjected to error detection by the correcting controller while the cache controller executes the memory access instruction for accessing the noncache area when the instruction executing controller receives the notified report. (09-27-2012)
20120246408 - ARITHMETIC PROCESSING DEVICE AND CONTROLLING METHOD THEREOF - A physical process ID (PPID) is stored for each cache block of each set, and a MAX WAY number for each PPID value is stored for each of index values #… (09-27-2012)
20100077145 - METHOD AND SYSTEM FOR PARALLEL EXECUTION OF MEMORY INSTRUCTIONS IN AN IN-ORDER PROCESSOR - A method of parallel execution of a first and a second instruction in an in-order processor. Embodiments of the invention enable parallel execution of memory instructions that are stalled by cache memory misses. The in-order processor processes cache memory misses of instructions in parallel by overlapping the first cache memory miss with cache memory misses that occur after the first cache memory miss. Memory-level parallelism in the in-order processor can be increased when more parallel and outstanding cache memory misses are generated. (03-25-2010)
20130086327 - AUTOMATIC CACHING OF PARTIAL RESULTS WHILE EDITING SOFTWARE - An automatic caching system is described herein that automatically determines user-relevant points at which to incrementally cache expensive to obtain data, resulting in faster computation of dependent results. The system can intelligently choose between caching data locally and pushing computation to a remote location collocated with the data, resulting in faster computation of results. The automatic caching system uses stable keys to uniquely refer to programmatic identifiers. The system annotates programs before execution with additional code that utilizes the keys to associate and cache intermediate programmatic results. The system can maintain the cache in a separate process or even on a separate machine to allow cached results to outlive program execution and allow subsequent execution to utilize previously computed results. Cost estimations are performed in order to choose whether utilizing cached values or remote execution would result in a faster computation of a result. (04-04-2013)
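
The 20130086327 entry above is, at its core, keyed memoization of expensive intermediate results. As a loose illustration only (the class name, the string keys, and the Supplier-based API are all invented here, not the patent's interface), a stable-key result cache in Java could look like:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;

    // Hypothetical sketch: intermediate results are cached under stable keys that
    // identify the program point producing them, so a later run can reuse values
    // that are expensive to recompute.
    public class PartialResultCache {
        private final Map<String, Object> cache = new ConcurrentHashMap<>();

        // 'key' is a stable identifier for a programmatic expression;
        // 'compute' is only invoked when no cached value exists for the key.
        @SuppressWarnings("unchecked")
        public <T> T getOrCompute(String key, Supplier<T> compute) {
            return (T) cache.computeIfAbsent(key, k -> compute.get());
        }

        public static void main(String[] args) {
            PartialResultCache cache = new PartialResultCache();
            // First call computes; the second reuses the cached intermediate result.
            int a = cache.getOrCompute("expr:sum(1..1000)", () -> 1000 * 1001 / 2);
            int b = cache.getOrCompute("expr:sum(1..1000)",
                    () -> { throw new AssertionError("not recomputed"); });
            System.out.println(a + " " + b);   // 500500 500500
        }
    }
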
20100138609 - Technique for controlling computing resources - A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance. (06-03-2010)
20130080706 - PREVENTION OF CLASSLOADER MEMORY LEAKS IN MULTITIER ENTERPRISE APPLICATIONS - A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache. (03-28-2013)
20130080707 - PREVENTION OF CLASSLOADER MEMORY LEAKS IN MULTITIER ENTERPRISE APPLICATIONS - A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache. (03-28-2013)
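
The mechanism in the two classloader entries above corresponds closely to Java's reference types: a strong field inside the loader plus a weak public handle. A minimal sketch, assuming a plain HashMap as the cache body (the abstracts do not specify one):

    import java.lang.ref.WeakReference;
    import java.util.HashMap;
    import java.util.Map;

    // Hedged sketch of the described pattern: the cache is kept alive by a strong
    // reference held inside the classloader itself, while outside code only ever
    // sees a weak reference. Once the classloader is discarded, nothing else pins
    // the cache, so loader, classes, and cache can all be collected together.
    public class CachingLoader extends ClassLoader {
        // Strong internal reference: lifetime of the cache == lifetime of the loader.
        private final Map<String, Object> cache = new HashMap<>();

        // Public interface: callers only ever get a weak reference to the cache.
        public WeakReference<Map<String, Object>> getCacheRef() {
            return new WeakReference<>(cache);
        }

        public static void main(String[] args) {
            CachingLoader loader = new CachingLoader();
            WeakReference<Map<String, Object>> ref = loader.getCacheRef();
            ref.get().put("key", "value");  // usable while the loader is alive
            loader = null;                  // drop the loader...
            System.gc();                    // ...and the cache becomes collectable
            System.out.println(ref.get());  // likely null after collection
        }
    }
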
20130086328 - General Purpose Digital Data Processor, Systems and Methods - The invention provides improved data processing apparatus, systems and methods that include one or more nodes, e.g., processor modules or otherwise, that include or are otherwise coupled to cache, physical or other memory (e.g., attached flash drives or other mounted storage devices), collectively “system memory.” At least one of the nodes includes a cache memory system that stores data (and/or instructions) recently accessed (and/or expected to be accessed) by the respective node, along with tags specifying addresses and statuses (e.g., modified, reference count, etc.) for the respective data (and/or instructions). The tags facilitate translating system addresses to physical addresses, e.g., for purposes of moving data (and/or instructions) between system memory (and, specifically, for example, physical memory—such as attached drives or other mounted storage) and the cache memory system. (04-04-2013)
20090172284 - METHOD AND APPARATUS FOR MONITOR AND MWAIT IN A DISTRIBUTED CACHE ARCHITECTURE - A method and apparatus for monitor and mwait in a distributed cache architecture is disclosed. One embodiment includes an execution thread sending a MONITOR request for an address to a portion of a distributed cache that stores the data corresponding to that address. At the distributed cache portion the MONITOR request and an associated speculative state is recorded locally for the execution thread. The execution thread then issues an MWAIT instruction for the address. At the distributed cache portion the MWAIT and an associated wait-to-trigger state are recorded for the execution thread. When a write request matching the address is received at the distributed cache portion, a monitor-wake event is then sent to the execution thread and the associated monitor state at the distributed cache portion for that execution thread can be reset to idle. (07-02-2009)
20100095066 - METHOD FOR CACHING RESOURCE REPRESENTATIONS IN A CONTEXTUAL ADDRESS SPACE - A method to generate and save a resource representation recited by a request encoded in a computer algorithm, wherein the method receives from a requesting algorithm an Unresolved resource request. The method resolves the resource request to an endpoint and evaluates the resolved resource request by the endpoint to generate a resource representation. The method further generates and saves in a cache at least one Unresolved request scope key, a resolved request scope key, and a cache entry comprising the resource representation. The method associates the cache entry with the resolved request scope key and with the at least one Unresolved request scope key using a mapping function encoded in the cache. (04-15-2010)
20090119456 - PROCESSOR AND MEMORY CONTROL METHOD - A processor and a memory management method are provided. The processor includes a processor core, a cache which transceives data to/from the processor core via a single port, and stores the data accessed by the processor core, and a Scratch Pad Memory (SPM) which transceives the data to/from the processor core via at least one of a plurality of multi ports. (05-07-2009)
20100268887 - INFORMATION HANDLING SYSTEM WITH IMMEDIATE SCHEDULING OF LOAD OPERATIONS IN A DUAL-BANK CACHE WITH DUAL DISPATCH INTO WRITE/READ DATA FLOW - An information handling system (IHS) includes a processor with a cache memory system. The processor includes a processor core with an L1 cache memory that couples to an L2 cache memory. The processor includes an arbitration mechanism that controls load and store requests to the L2 cache memory. The arbitration mechanism includes control logic that enables a load request to interrupt a store request that the L2 cache memory is currently servicing. The L2 cache memory includes dual data banks so that one bank may perform a load operation while the other bank performs a store operation. The cache system provides dual dispatch points into the data flow to the dual cache banks of the L2 cache memory. (10-21-2010)
20090037659 - Cache coloring method and apparatus based on function strength information - A method of performing cache coloring includes the steps of generating a dynamic function flow representing a temporal sequence in which a plurality of functions are called at a time of executing a program comprised of the plurality of functions by executing the program by a computer, generating function strength information in response to the dynamic function flow by use of the computer, the function strength information including information about runtime relationships between any given one of the plurality of functions and all the other ones of the plurality of functions in terms of a way the plurality of functions are called and further including information about degree of certainty of cache miss occurrence, and allocating the plurality of functions to memory space by use of the computer in response to the function strength information such as to reduce instruction cache conflict. (02-05-2009)
20090070531 - System and Method of Using An N-Way Cache - A system and method of using an n-way cache are disclosed. In an embodiment, a method includes determining a first way of a first instruction stored in a cache and storing the first way in a list of ways. The method also includes determining a second way of a second instruction stored in the cache and storing the second way in the list of ways. In an embodiment, the first way may be used to access a first cache line containing the first instruction and the second way may be used to access a second cache line containing the second instruction. (03-12-2009)
20120117327 - Bimodal Branch Predictor Encoded in a Branch Instruction - Each branch instruction having branch prediction support has branch prediction bits in architecture specified bit positions in the branch instruction. An instruction cache supports modifying the branch instructions with updated branch prediction bits that are dynamically determined when the branch instruction executes. (05-10-2012)
20120198170 - Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs. (08-02-2012)
20090013132 - Cache memory - A cache memory comprises a first set of storage locations for holding syllables and addressable by a first group of addresses; a second set of storage locations for holding syllables and addressable by a second group of addresses; addressing circuitry operable to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations; and selection circuitry operable to select from said plurality of syllables to output to a processor lane based on whether a required syllable is addressable by an address in the first or second group. (01-08-2009)
20090013131 - LOW POWER SEMI-TRACE INSTRUCTION CACHE - A semi-trace cache combines elements and features of an instruction cache and a trace cache. An ICache portion of the semi-trace cache is filled with instructions fetched from the next level of the memory hierarchy while a TCache portion is filled with traces gleaned either from the actual stream of retired instructions or predicted before execution. (01-08-2009)
20120089783 - OPCODE LENGTH CACHING - A computer system caches variable-length instructions in a data structure. The computer system locates a first copy of an instruction in the cached data structure using a current value of the instruction pointer as a key. The computer system determines a predictive length of the instruction, and reads a portion of the instruction from an instruction memory as a second copy. The second copy has the predictive length. Based on the comparison of the first copy with the second copy, the computer system determines whether or not to read the rest of the instruction from the instruction memory, and then interprets the instruction for use by the computer system. (04-12-2012)
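
The opcode-length caching described in 20120089783 can be mimicked in software. The sketch below is a behavioral model only; the toy decoder (length encoded in the low opcode bits) and all names are assumptions, not the patent's design:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: cache the bytes of previously decoded variable-length
    // instructions keyed by instruction pointer. On a hit, read only as many
    // bytes from instruction memory as the cached copy predicts; if they still
    // match, the rest of the instruction need not be re-read.
    public class OpcodeLengthCache {
        private final Map<Long, byte[]> cache = new HashMap<>();

        public byte[] fetch(long ip, byte[] instructionMemory) {
            byte[] cached = cache.get(ip);
            if (cached != null) {
                // Predictive read: only 'cached.length' bytes are fetched.
                byte[] probe = Arrays.copyOfRange(instructionMemory,
                        (int) ip, (int) ip + cached.length);
                if (Arrays.equals(probe, cached)) {
                    return cached;          // memory unchanged: skip the full read
                }
            }
            byte[] full = decodeFullInstruction(instructionMemory, (int) ip); // slow path
            cache.put(ip, full);
            return full;
        }

        // Placeholder decoder: here the low 2 bits of the opcode encode the length.
        private byte[] decodeFullInstruction(byte[] mem, int ip) {
            int len = 1 + (mem[ip] & 0x3);
            return Arrays.copyOfRange(mem, ip, ip + len);
        }

        public static void main(String[] args) {
            byte[] mem = {0x02, 0x10, 0x20, 0x00};
            OpcodeLengthCache c = new OpcodeLengthCache();
            System.out.println(c.fetch(0, mem).length); // 3: first, full decode
            System.out.println(c.fetch(0, mem).length); // 3: served via predictive read
        }
    }
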
20080276045 - Apparatus and Method for Dynamic Cache Management - The apparatus of the present invention improves performance of computing systems by enabling a multi-core or multi-processor system to deterministically identify cache memory… (11-06-2008)
20100268888 - PROCESSING A DATA STREAM BY ACCESSING ONE OR MORE HARDWARE REGISTERS - Disclosed are a method, a system, and a program product for processing a data stream by accessing one or more hardware registers of a processor. In one or more embodiments, a first program instruction or subroutine can associate a hardware register of the processor with a data stream. With this association, the hardware register can be used as a stream head which can be used by multiple program instructions to access the data stream. In one or more embodiments, data from the data stream can be fetched automatically as needed and with one or more patterns which may include one or more start positions, one or more lengths, one or more strides, etc. to allow the cache to be populated with sufficient amounts of data to reduce memory latency and/or external memory bandwidth when executing an application which accesses the data stream through the one or more registers. (10-21-2010)
20090276576 - Methods and Apparatus storing expanded width instructions in a VLIW memory for deferred execution - Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units. (11-05-2009)
20100169577 - Cache control device and control method - In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. If the threads enter the execution of an atomic instruction, a defined value is written to the flags stored in the storage unit. Furthermore, if the atomic instruction is completed, a defined value different from the above defined value is written, thereby displaying whether or not the threads are executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction according to a simple configuration. (07-01-2010)
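
The per-thread atomic-instruction flags of 20100169577 describe hardware, but the gating policy itself is easy to model. A behavioral Java sketch (thread ids and the busy-wait are illustrative, standing in for the "kept standby" state):

    import java.util.concurrent.atomic.AtomicIntegerArray;

    // Behavioral sketch of the described exclusive control: each hardware thread
    // has a flag that is set while it executes an atomic instruction; an access
    // request from another thread is held pending while any other flag is set.
    public class AtomicFlagGate {
        private final AtomicIntegerArray inAtomic;

        public AtomicFlagGate(int threads) { inAtomic = new AtomicIntegerArray(threads); }

        public void beginAtomic(int tid) { inAtomic.set(tid, 1); }
        public void endAtomic(int tid)   { inAtomic.set(tid, 0); }

        // An access request from 'tid' waits while any *other* thread is atomic.
        public void awaitAccess(int tid) {
            while (anyOtherAtomic(tid)) {
                Thread.onSpinWait();      // the request stays pending ("standby")
            }
        }

        private boolean anyOtherAtomic(int tid) {
            for (int t = 0; t < inAtomic.length(); t++) {
                if (t != tid && inAtomic.get(t) == 1) return true;
            }
            return false;
        }
    }
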
20100077146 - INSTRUCTION CACHE SYSTEM, INSTRUCTION-CACHE-SYSTEM CONTROL METHOD, AND INFORMATION PROCESSING APPARATUS - An instruction cache system includes an instruction-cache data storage unit that stores cache data per index, and an instruction cache controller that compresses and writes the cache data in the instruction-cache data storage unit, and controls a compression ratio of the written cache data. The instruction cache controller calculates a memory capacity of a redundant area generated due to compression in a memory area belonging to an index, in which n pieces of cache data are written based on the controlled compression ratio, to compress and write new cache data in the redundant area based on the calculated memory capacity. (03-25-2010)
20090144502 - Meta-Architecture Defined Programmable Instruction Fetch Functions Supporting Assembled Variable Length Instruction Processors - In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop. (06-04-2009)
20100325360 - MULTI-CORE PROCESSOR AND MULTI-CORE PROCESSOR SYSTEM - In one embodiment, a multi-core processor includes a plurality of processor cores that each includes a cache and that uses a management target area allocated as a main memory in a memory area. The multi-core processor includes a state managing unit and a management-target-area increasing and decreasing unit. The state managing unit manages a first state in which the small area is not allocated to the processor core and a second state in which the small area is allocated to the processor core for each small area included in the management target area. The management-target-area increasing and decreasing unit increases and decreases the management target area by increasing and decreasing the small area in the first state in the management target area. (12-23-2010)
20090240890 - PROGRAM THREAD SYNCHRONIZATION - A barrier for synchronizing program threads for a plurality of processors includes a filter configured to be coupled to a plurality of processors executing a plurality of threads to be synchronized. The filter is configured to monitor and selectively block fill requests for instruction cache lines. A method for synchronizing program threads for a plurality of processors includes configuring a filter to monitor and selectively block fill requests for instruction cache lines for a plurality of processors executing a plurality of threads to be synchronized. (09-24-2009)
20080209127 - System and method for efficient implementation of software-managed cache - A system and method for an efficient implementation of a software-managed cache is presented. When an application thread executes on a simple processor, the application thread uses a conditional data select instruction for eliminating a conditional branch instruction when accessing a software-managed cache. An application thread issues a conditional data select instruction (DMA transfer) after a cache directory lookup, wherein the size of the requested data is dependent upon the outcome of the cache directory lookup. When the cache directory lookup results in a cache hit, the application thread requests a transfer of zero bits of data, which results in a DMA controller (DMAC) performing a no-op instruction. When the cache directory lookup results in a cache miss, the application thread requests a data block transfer the size of a corresponding cache line. (08-28-2008)
20110173394 - ORDERING OF GUARDED AND UNGUARDED STORES FOR NO-SYNC I/O - A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue. (07-14-2011)
20100299484 - LOW POWER HIGH SPEED LOAD-STORE COLLISION DETECTOR - An apparatus detects a load-store collision within a microprocessor between a load operation and an older store operation each of which accesses data in the same cache line. Load and store byte masks specify which bytes contain the data specified by the load and store operation within a word of the cache line in which the load and store data begins, respectively. Load and store word masks specify which words contain the data specified by the load and store operations within the cache line, respectively. Combinatorial logic uses the load and store byte masks to detect the load-store collision if the data specified by the load and store operations begin in the same cache line word, and uses the load and store word masks to detect the load-store collision if the data specified by the load and store operations do not begin in the same cache line word. (11-25-2010)
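
The byte-mask/word-mask collision test of 20100299484 is pure bit manipulation and can be modeled directly. In the sketch below the 4-byte word and 16-word line are assumed widths, not taken from the patent:

    // Bit-level sketch of the described collision test: a 64-byte line is viewed
    // as 16 four-byte words. If load and store begin in the same word, overlap is
    // decided by byte masks within that word; otherwise the coarser word masks decide.
    public class LoadStoreCollision {
        static final int WORD_BYTES = 4, LINE_WORDS = 16;

        static int byteMask(int offsetInWord, int size) {
            int mask = 0;
            for (int i = 0; i < size && offsetInWord + i < WORD_BYTES; i++)
                mask |= 1 << (offsetInWord + i);
            return mask;
        }

        static int wordMask(int byteOffset, int size) {
            int first = byteOffset / WORD_BYTES, last = (byteOffset + size - 1) / WORD_BYTES;
            int mask = 0;
            for (int w = first; w <= last && w < LINE_WORDS; w++) mask |= 1 << w;
            return mask;
        }

        static boolean collides(int loadOff, int loadSize, int storeOff, int storeSize) {
            boolean sameStartWord = loadOff / WORD_BYTES == storeOff / WORD_BYTES;
            if (sameStartWord)
                return (byteMask(loadOff % WORD_BYTES, loadSize)
                      & byteMask(storeOff % WORD_BYTES, storeSize)) != 0;
            return (wordMask(loadOff, loadSize) & wordMask(storeOff, storeSize)) != 0;
        }

        public static void main(String[] args) {
            System.out.println(collides(0, 4, 2, 2)); // true: same word, bytes overlap
            System.out.println(collides(0, 2, 2, 2)); // false: same word, disjoint bytes
            System.out.println(collides(0, 8, 4, 4)); // true: word masks overlap
        }
    }
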
20080250205 - STRUCTURE FOR SUPPORTING SIMULTANEOUS STORAGE OF TRACE AND STANDARD CACHE LINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines is provided. A mechanism is described for indexing into the cache, and selecting the desired line. Control is exercised over which lines are contained within the cache. Provision is made for selection between a trace line and a conventional line when both match during a tag compare step. (10-09-2008)
20080250206 - STRUCTURE FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines is provided. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored. (10-09-2008)
20080270702 - METHOD AND APPARATUS FOR AN EFFICIENT MULTI-PATH TRACE CACHE DESIGN - A novel trace cache design and organization to efficiently store and retrieve multi-path traces. A goal is to design a trace cache which is capable of storing multi-path traces without significant duplication in the traces. Furthermore, the effective access latency of these traces is reduced. (10-30-2008)
20080276044 - PROCESSING APPARATUS - A processing apparatus which executes a program and performs processes of the program, includes an execution circuit including a plurality of central processing units, each having a respective cache memory, and each of the respective cache memories has an N-way set-associative structure with N-ways in which one line is made up of plural words. Each of the respective cache memories includes a data memory array which is simultaneously read-out in multiple-word-widths, and can be read-out using one of a type one read-out and a type two read-out. In the type one read-out, plural words in the same word positions within respective lines are simultaneously read-out from corresponding lines belonging to different ways, and in the type two read-out, plural words making up one line of one way are simultaneously read-out. The cache memory has a first read-out mode and a second read-out mode. In the first read-out mode, a word belonging to a way which is hit by a memory access is selected from among the plural words read-out using the type-one read-out, and the selected word is outputted, and in the second read-out mode, plural words are read-out from a way which is hit, using the type-two read-out, and read-out plural words are outputted. (11-06-2008)
20080250207 - DESIGN STRUCTURE FOR CACHE MAINTENANCE - A single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines. Control is exercised over which lines are contained within the cache. This invention avoids inefficiencies in the cache by removing trace lines experiencing early exits from the cache, or trace lines that are short, by maintaining a few bits of information about the accuracy of the control flow in a trace cache line and using that information in addition to the LRU (Least Recently Used) bits that maintain the recency information of a cache line, in order to make a replacement decision. (10-09-2008)
20090119457 - MULTITHREADED CLUSTERED MICROARCHITECTURE WITH DYNAMIC BACK-END ASSIGNMENT - A multithreaded clustered microarchitecture with dynamic back-end assignment is presented. A processing system may include a plurality of instruction caches and front-end units each to process an individual thread from a corresponding one of the instruction caches, a plurality of back-end units, and an interconnect network to couple the front-end and back-end units. A method may include measuring a performance metric of a back-end unit, comparing the measurement to a first value, and reassigning, or not, the back-end unit according to the comparison. Computer systems according to embodiments of the invention may include: a random access memory; a system bus; and a processor having a plurality of instruction caches, a plurality of front-end units each to process an individual thread from a corresponding one of the instruction caches; a plurality of back-end units; and an interconnect network coupled to the plurality of front-end units and the plurality of back-end units. (05-07-2009)
20080229022 - EFFICIENT SYSTEM BOOTSTRAP LOADING - An efficient system for bootstrap loading scans cache lines into a cache store queue during a scan phase, and then transmits the cache lines from the cache store queue to a cache memory array during a functional phase. Scan circuitry stores a given cache line in a set of latches associated with one of a plurality of cache entries in the cache store queue, and passes the cache line from the latch set to the associated cache entry. The cache lines may be scanned from test software that is external to the computer system. Read/claim dispatch logic dispatches store instructions for the cache entries to read/claim machines which write the cache lines to the cache memory array without obtaining write permission, after the read/claim machines evaluate a mode bit which indicates that cache entries in the cache store queue are scanned cache lines. In the illustrative embodiment the cache memory is an L2 cache. (09-18-2008)
20090157967 - Pre-Fetch Data and Pre-Fetch Data Relative - A prefetch data machine instruction having an M field performs a function on a cache line of data specifying an address of an operand. The operation comprises either prefetching a cache line of data from memory to a cache or reducing the access ownership of store and fetch or fetch only of the cache line in the cache or a combination thereof. The address of the operand is either based on a register value or the program counter value pointing to the prefetch data machine instruction. (06-18-2009)
20120198168 - VARIABLE CACHING STRUCTURE FOR MANAGING PHYSICAL STORAGE - A method for managing a variable caching structure for managing storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with a corresponding size of one of the plurality of different size groups. Upon a hit on the tag, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack a corresponding group of storage blocks of converted native instructions reside. The converted native instructions are then fetched from the storage stack for execution. (08-02-2012)
20090177842 - DATA PROCESSING SYSTEM AND METHOD FOR PREFETCHING DATA AND/OR INSTRUCTIONS - A data processing system for processing at least one application is provided. The data processing system comprises a processor… (07-09-2009)
20110145502 - META-DATA BASED DATA PREFETCHING - A technique for prefetching data into a cache memory system includes prefetching data based on meta information indicative of data access patterns. A method includes tagging data of a program with meta information indicative of data access patterns. The method includes prefetching the data from main memory at least partially based on the meta information, by a processor executing the program. In at least one embodiment, the method includes generating an executable at least partially based on the meta information. The executable includes at least one instruction to prefetch the data. In at least one embodiment, the method includes inserting one or more instructions for prefetching the data into an intermediate form of program code while translating program source code into the intermediate form of program code. (06-16-2011)
20120144120 - PROGRAMMABLE ATOMIC MEMORY USING HARDWARE VALIDATION AGENT - A processing core in a multi-processing core system is configured to execute a sequence of instructions as an atomic memory transaction. Executing each instruction in the sequence comprises validating that the instruction meets a set of one or more atomicity criteria, including that executing the instruction does not require accessing shared memory. Executing the atomic memory transaction may comprise storing memory data from a source cache line into a target register, reading or modifying the memory data stored in the target register as part of executing the sequence, and storing a value from the target register to the source cache line. (06-07-2012)
20090055592 - DIGITAL SIGNAL PROCESSOR CONTROL ARCHITECTURE - A system includes a control store memory populated with data path instructions indexable by control store addresses and jump addresses. The system further includes a control state machine to provide at least one control store address and at least one jump address to the control store memory, wherein the control store memory is configured to identify one or more data path instructions for both the control store address and the jump address. (02-26-2009)
20090100228 - METHODS AND SYSTEMS FOR IMPLEMENTING A CACHE MODEL IN A PREFETCHING SYSTEM - The present invention relates to systems and methods of enhancing prefetch operations. The method includes fetching an object from a page on a web server. The method further includes storing, at a proxy server, caching instructions for the fetched object. The proxy server is connected with the client and the object is cached at the client. Furthermore, the method includes identifying a prefetchable reference to the fetched object in a subsequent web page and using the caching instructions stored on the proxy server to determine if a fresh copy of the object will be requested by the client. Further, the method includes, based on the determination that the object will be requested, sending a prefetch request for the object using an If-Modified-Since directive, and transmitting a response to the If-Modified-Since directive prefetch request to a proxy client. (04-16-2009)
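
The If-Modified-Since prefetch of 20090100228 uses a standard HTTP conditional request. A minimal client-side sketch (the URL and stored timestamp are placeholders; the patent's proxy-server machinery is omitted):

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Minimal sketch of the conditional prefetch: replay stored caching metadata
    // as an If-Modified-Since request, so the origin answers 304 Not Modified
    // unless a fresh copy of the object is actually needed.
    public class ConditionalPrefetch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/logo.png");    // placeholder object
            long cachedLastModified = System.currentTimeMillis() - 3600_000;

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setIfModifiedSince(cachedLastModified);          // conditional GET
            int status = conn.getResponseCode();

            if (status == HttpURLConnection.HTTP_NOT_MODIFIED) {
                System.out.println("304: cached copy still fresh, nothing prefetched");
            } else {
                System.out.println(status + ": prefetching updated object");
            }
        }
    }
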
20090204764 - Cache Pooling for Computing Systems - In a computing system a method and apparatus for cache pooling is introduced. Threads are assigned priorities based on the criticality of their tasks. The most critical threads are assigned to main memory locations such that they are subject to limited or no cache contention. Less critical threads are assigned to main memory locations such that their cache contention with critical threads is minimized or eliminated. Thus, overall system performance is improved, as critical threads execute in a substantially predictable manner. (08-13-2009)
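
Cache pooling as in 20090204764 is commonly realized through page coloring. The sketch below shows only the color computation and a hypothetical reservation split; the cache geometry is an assumed example, not the patent's:

    // Sketch of the page-coloring idea behind cache pooling: a physical page's
    // "color" selects the group of cache sets it can occupy, so placing critical
    // threads on reserved colors keeps other threads from contending with them.
    public class CacheColoring {
        static final int PAGE_SIZE = 4096, LINE_SIZE = 64, SETS = 1024;
        static final int COLORS = SETS * LINE_SIZE / PAGE_SIZE;   // 16 colors here

        static int colorOf(long physicalPageAddress) {
            return (int) ((physicalPageAddress / PAGE_SIZE) % COLORS);
        }

        public static void main(String[] args) {
            // Assumed policy: colors 0-3 reserved for critical threads.
            for (long page = 0; page < 8; page++) {
                long addr = page * PAGE_SIZE;
                String pool = colorOf(addr) < 4 ? "critical pool" : "shared pool";
                System.out.printf("page @%#x -> color %d (%s)%n", addr, colorOf(addr), pool);
            }
        }
    }
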
20090210627 - METHOD AND SYSTEM FOR HANDLING CACHE COHERENCY FOR SELF-MODIFYING CODE - A method for handling cache coherency includes allocating a tag when a cache line is not exclusive in a data cache for a store operation, and sending the tag and an exclusive fetch for the line to coherency logic. An invalidation request is sent within a minimum amount of time to an I-cache, preferably only if it has fetched to the line and has not been invalidated since, which request includes an address to be invalidated, the tag, and an indicator specifying the line is for a PSC operation. The method further includes comparing the request address against stored addresses of prefetched instructions, and in response to a match, sending a match indicator and the tag to an LSU, within a maximum amount of time. The match indicator is timed, relative to exclusive data return, such that the LSU can discard prefetched instructions following execution of the store operation that stores to a line subject to an exclusive data return, and for which the match is indicated. (08-20-2009)
20090222624 - Method and Apparatus For Economical Cache Population - A technique for efficiently populating a cache in a data processing system with resources is disclosed. In particular, a node in accordance with the illustrative embodiment of the present invention defers populating its cache with a resource until at least two requests for the resource have been received. This is advantageous because it prevents the cache from being populated with infrequently requested resources. Furthermore, the illustrative embodiment of the present invention populates a cache with a resource only when: at least i requests for the resource have been received at a given node within an elapsed time interval, Δt, wherein i is an integer greater than one; and at least one request for the resource has been received from at least n of the m filial nodes of the given node within an elapsed time interval, Δt, wherein m is an integer greater than one, n is an integer greater than one, and m≧n. Embodiments of the present invention are particularly advantageous in computer networks that comprise a logical hierarchical topology, but are useful in any computer network, and in individual data processing systems and routers that comprise a cache memory. (09-03-2009)
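
The two-requests-within-Δt admission rule of 20090222624 translates almost directly into code. This sketch implements only that local threshold; the patent's additional n-of-m filial-node condition is omitted:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the deferral policy: a resource is only admitted to the cache
    // once at least two requests for it arrive within the window dtMillis.
    public class DeferredCache<K, V> {
        private final Map<K, V> cache = new HashMap<>();
        private final Map<K, Long> firstRequestAt = new HashMap<>();
        private final long dtMillis;

        public DeferredCache(long dtMillis) { this.dtMillis = dtMillis; }

        public V get(K key, java.util.function.Function<K, V> fetch) {
            V hit = cache.get(key);
            if (hit != null) return hit;

            long now = System.currentTimeMillis();
            Long first = firstRequestAt.get(key);
            V value = fetch.apply(key);                // fetch from origin
            if (first != null && now - first <= dtMillis) {
                cache.put(key, value);                 // second request within dt: admit
            } else {
                firstRequestAt.put(key, now);          // first (or stale) request: defer
            }
            return value;
        }

        public static void main(String[] args) {
            DeferredCache<String, String> c = new DeferredCache<>(60_000);
            c.get("/img/logo.png", k -> "bytes-of-" + k); // first request: not cached yet
            c.get("/img/logo.png", k -> "bytes-of-" + k); // second within dt: now cached
        }
    }
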
20090248985 - Data Transfer Optimized Software Cache for Regular Memory References - Mechanisms are provided for optimizing regular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which regular memory references access a storage of a software cache of a data processing system through a high locality cache mechanism of the software cache. (10-01-2009)
20080201529 - CONTEXT SWITCH DATA PREFETCHING IN MULTITHREADED COMPUTER - An apparatus, program product and method initiate, in connection with a context switch operation, a prefetch of data likely to be used by a thread prior to resuming execution of that thread. As a result, once it is known that a context switch will be performed to a particular thread, data may be prefetched on behalf of that thread so that when execution of the thread is resumed, more of the working state for the thread is likely to be cached, or at least in the process of being retrieved into cache memory, thus reducing cache-related performance penalties associated with context switching. (08-21-2008)
20120198169 - Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache. (08-02-2012)
20100250854 - METHOD AND SYSTEM FOR DATA PREFETCHING FOR LOOPS BASED ON LINEAR INDUCTION EXPRESSIONS - An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched. (09-30-2010)
20100235579 - Cache Management Within A Data Processing Apparatus - A data processing apparatus, and method of managing at least one cache within such an apparatus, are provided. The data processing apparatus has at least one processing unit for executing a sequence of instructions, with each such processing unit having a cache associated therewith, each cache having a plurality of cache lines for storing data values for access by the associated processing unit when executing the sequence of instructions. Identification logic is provided which, for each cache, monitors data traffic within the data processing apparatus and based thereon generates a preferred for eviction identification identifying one or more of the data values as preferred for eviction. Cache maintenance logic is then arranged, for each cache, to implement a cache maintenance operation during which selection of one or more data values for eviction from that cache is performed having regard to any preferred for eviction identification generated by the identification logic for data values stored in that cache. It has been found that such an approach provides a very flexible technique for seeking to improve cache storage utilisation. (09-16-2010)
20100228920 - PARALLEL PROCESSING PROCESSOR SYSTEM - A parallel processing processor system includes multiple processor elements, a main memory, and a shared memory, whose latency with the processors is less than the latency between the main memory and the processors. Each of the multiple processor elements has a DSP (Digital Signal Processor) and an instruction cache. Firmware executed by the DSPs is transferred from the main memory to the shared memory and is shared by the DSPs. Updating of the instruction caches in the case where a cache miss has occurred is performed by, for example, copying, into the instruction caches, the content of the shared memory corresponding to an address accessed by a DSP. (09-09-2010)
20100138610 - METHOD AND APPARATUS FOR INCLUSION OF TLB ENTRIES IN A MICRO-OP CACHE OF A PROCESSOR - Methods and apparatus for inclusion of TLB (translation look-aside buffer) entries in processor micro-op caches are disclosed. Some embodiments for inclusion of TLB entries have micro-op cache inclusion fields, which are set responsive to accessing the TLB entry. Inclusion logic may then flush the micro-op cache or portions of the micro-op cache and clear corresponding inclusion fields responsive to a replacement or invalidation of a TLB entry whenever its associated inclusion field had been set. Front-end processor state may also be cleared and instructions refetched when replacement resulted from a TLB miss. (06-03-2010)
20100138611 - METHOD AND APPARATUS FOR PIPELINE INCLUSION AND INSTRUCTION RESTARTS IN A MICRO-OP CACHE OF A PROCESSOR - Methods and apparatus for instruction restarts and inclusion in processor micro-op caches are disclosed. Embodiments of micro-op caches have way storage fields to record the instruction-cache ways storing corresponding macroinstructions. Instruction-cache in-use indications associated with the instruction-cache lines storing the instructions are updated upon micro-op cache hits. In-use indications can be located using the recorded instruction-cache ways in micro-op cache lines. Victim-cache deallocation micro-ops are enqueued in a micro-op queue after micro-op cache miss synchronizations, responsive to evictions from the instruction-cache into a victim-cache. Inclusion logic also locates and evicts micro-op cache lines corresponding to the recorded instruction-cache ways, responsive to evictions from the instruction-cache. (06-03-2010)
20120144119 - PROGRAMMABLE ATOMIC MEMORY USING STORED ATOMIC PROCEDURES - A processing core in a multi-processing core system is configured to execute a sequence of instructions as a single atomic memory transaction. The processing core validates that the sequence meets a set of one or more atomicity criteria, including that no instruction in the sequence instructs the processing core to access shared memory. After validating the sequence, the processing core executes the sequence as a single atomic memory transaction, such as by locking a source cache line that stores shared memory data, executing the validated sequence of instructions, storing a result of the sequence into the source cache line, and unlocking the source cache line. (06-07-2012)
20100306476 - CONTROLLING SIMULATION OF A MICROPROCESSOR INSTRUCTION FETCH UNIT THROUGH MANIPULATION OF INSTRUCTION ADDRESSES - Instruction fetch unit (IFU) verification is improved by dynamically monitoring the current state of the IFU model and detecting any predetermined states of interest. The instruction address sequence is automatically modified to force a selected address to be fetched next by the IFU model. The instruction address sequence may be modified by inserting one or more new instruction addresses, or by jumping to a non-sequential address in the instruction address sequence. In exemplary implementations, the selected address is a corresponding address for an existing instruction already loaded in the IFU cache, or differs only in a specific field from such an address. The instruction address control is preferably accomplished without violating any rules of the processor architecture by sending a flush signal to the IFU model and overwriting an address register corresponding to a next address to be fetched. (12-02-2010)
20090070532 - System and Method for Efficiently Testing Cache Congruence Classes During Processor Design Verification and Validation - A system and method for using a single test case to test each sector within multiple congruence classes is presented. A test case generator builds a test case for accessing each sector within a congruence class. Since a congruence class spans multiple congruence pages, the test case generator builds the test case over multiple congruence pages in order for the test case to test the entire congruence class. During design verification and validation, a test case executor modifies a congruence class identifier (e.g., patches a base register), which forces the test case to test a specific congruence class. By incrementing the congruence class identifier after each execution of the test case, the test case executor is able to test each congruence class in the cache using a single test case. (03-12-2009)
20100281218 - INTELLIGENT CACHE REPLACEMENT MECHANISM WITH VARYING AND ADAPTIVE TEMPORAL RESIDENCY REQUIREMENTS - A method for replacing cached data is disclosed. The method in one aspect associates an importance value with each block of data in the cache. When a new entry needs to be stored in the cache, a cache block for replacing is selected based on the importance values associated with cache blocks. In another aspect, the importance values are set according to the hardware and/or software's knowledge of the memory access patterns. The method in one aspect may also include varying the importance value over time over different processing requirements. (11-04-2010)
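
The importance-value replacement of 20100281218 can be sketched as a victim search over per-block scores (how the importance values are assigned, and their scale, is left open here as in the abstract):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of importance-driven replacement: each cached block carries an
    // importance value, and on insertion into a full cache the least important
    // block is chosen as the victim.
    public class ImportanceCache<K, V> {
        record Entry<V>(V value, int importance) {}

        private final Map<K, Entry<V>> blocks = new HashMap<>();
        private final int capacity;

        public ImportanceCache(int capacity) { this.capacity = capacity; }

        public void put(K key, V value, int importance) {
            if (!blocks.containsKey(key) && blocks.size() >= capacity) {
                K victim = null;
                int least = Integer.MAX_VALUE;
                for (Map.Entry<K, Entry<V>> e : blocks.entrySet()) {
                    if (e.getValue().importance() < least) {
                        least = e.getValue().importance();
                        victim = e.getKey();
                    }
                }
                blocks.remove(victim);                 // evict least important block
            }
            blocks.put(key, new Entry<>(value, importance));
        }

        public V get(K key) {
            Entry<V> e = blocks.get(key);
            return e == null ? null : e.value();
        }

        public static void main(String[] args) {
            ImportanceCache<String, String> c = new ImportanceCache<>(2);
            c.put("a", "A", 5);
            c.put("b", "B", 1);
            c.put("c", "C", 3);                // evicts "b", the least important
            System.out.println(c.get("b"));    // null
            System.out.println(c.get("a"));    // A
        }
    }
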
20130138888 - STORING A TARGET ADDRESS OF A CONTROL TRANSFER INSTRUCTION IN AN INSTRUCTION FIELD - A control transfer instruction (CTI), such as a branch, jump, etc., may have an offset value for a control transfer that is to be performed. The offset value may be usable to compute a target address for the CTI (e.g., the address of a next instruction to be executed for a thread or instruction stream). The offset may be specified relative to a program counter. In response to detecting a specified offset value, the CTI may be modified to include at least a portion of a computed target address. Information indicating this modification has been performed may be stored, for example, in a pre-decode bit. In some cases, CTI modification may be performed only when a target address is a “near” target, rather than a “far” target. Modifying CTIs as described herein may eliminate redundant address calculations and produce a savings of power and/or time in some embodiments. (05-30-2013)
20100325359 - TRACING OF DATA FLOW - Embodiments for tracing dataflow for a computer program are described. The computer program includes machine instructions that are executable on a microprocessor. A decoding module can be configured to decode machine instructions obtained from a computer memory. In addition, a dataflow primitive engine can receive a decoded machine instruction from the decoding module and generate at least one dataflow primitive for the decoded machine instruction based on a dataflow primitive classification into which the decoded machine instruction is categorized by the dataflow primitive engine. A dataflow state table can be configured to track addressed data locations that are affected by dataflow. The dataflow primitives can be applied to the dataflow state table to update a dataflow status for the addressed data locations affected by the decoded machine instruction. (12-23-2010)
20100332759 - METHOD OF PROGRAM OBFUSCATION AND PROCESSING DEVICE FOR EXECUTING OBFUSCATED PROGRAMS - A program is obfuscated by reordering its instructions. Original instruction addresses are mapped to target addresses. A cache efficient obfuscated program is realized by restricting target addresses of a sequence of instructions to a limited set of the disjoint ranges… (12-30-2010)
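
The range-restricted address remapping of 20100332759 can be imitated with a permutation that only shuffles within disjoint windows, which is what keeps the obfuscated program cache-friendly. A toy sketch (window size and addressing are invented, not taken from the patent):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Toy sketch: original instruction addresses are permuted, but each address
    // is only remapped within its own small disjoint range, so consecutive
    // fetches still land in a handful of cache lines.
    public class RangeRestrictedShuffle {
        public static int[] remap(int count, int rangeSize, long seed) {
            int[] target = new int[count];
            Random rnd = new Random(seed);
            for (int base = 0; base < count; base += rangeSize) {
                int end = Math.min(base + rangeSize, count);
                List<Integer> slots = new ArrayList<>();
                for (int a = base; a < end; a++) slots.add(a);
                Collections.shuffle(slots, rnd);       // permute only within the range
                for (int a = base; a < end; a++) target[a] = slots.get(a - base);
            }
            return target;
        }

        public static void main(String[] args) {
            System.out.println(Arrays.toString(remap(12, 4, 42L)));
        }
    }
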
20100332760 - MECHANISM TO HANDLE EVENTS IN A MACHINE WITH ISOLATED EXECUTION - A platform and method for secure handling of events in an isolated environment. A processor executing in isolated execution “IsoX” mode may leak data when an event occurs as a result of the event being handled in a traditional manner based on the exception vector. By defining a class of events to be handled in IsoX mode, and switching between a normal memory map and an IsoX memory map dynamically in response to receipt of an event of the class, data security may be maintained in the face of such events. (12-30-2010)
20100332758 - Cache memory device, processor, and processing method - A cache memory device includes: a data memory storing data written by an arithmetic processing unit; a connecting unit connecting an input path from the arithmetic processing unit to the data memory and an output path from the data memory to a main storage unit; a selecting unit provided on the output path to select data from the data memory or data from the arithmetic processing unit via the connecting unit, and to transfer the selected data to the output path; and a control unit controlling the selecting unit such that the data from the data memory is transferred to the output path when the data is written from the data memory to the main storage unit, and such that the data is transferred to the output path via the connecting unit when the data is written from the arithmetic processing unit to the main storage unit. (12-30-2010)
20110035551 - MICROPROCESSOR WITH REPEAT PREFETCH INDIRECT INSTRUCTION - A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table. (02-10-2011)
20110040939 - MICROPROCESSOR WITH INTEGRATED HIGH SPEED MEMORY - The present invention relates to the field of (micro)computer design and architecture, and in particular to microarchitecture associated with moving data values between a (micro)processor and memory components. Particularly, the present invention relates to a computer system with a processor architecture in which register addresses are generated with more than one execution channel controlled by one central processing unit with at least one load/store unit for loading and storing data objects, and at least one cache memory associated with the processor holding data objects accessed by the processor, wherein said processor's load/store unit contains a high speed memory directly interfacing said load/store unit to the cache. The present invention improves the performance of architectures with dual-ported microprocessor implementations comprising two execution pipelines capable of two load/store data transactions per cycle. By including a cache memory inside the load/store unit, the processor is directly interfaced from its load/store units to the caches. Thus, the present invention accelerates data accesses and transactions from and to the load/store units of the processor and the data cache memory. (02-17-2011)
20110145503 - ON-LINE OPTIMIZATION OF SOFTWARE INSTRUCTION CACHE - A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program. (06-16-2011)
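
The profile-then-reorder loop of 20110145503 reduces, in its simplest form, to chaining cache lines along their hottest observed jumps. The greedy heuristic below is an illustration, not the patent's optimization algorithm:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Greedy sketch of profile-driven line layout: lines joined by hot jumps are
    // placed adjacently so those jumps tend to stay within resident neighbours.
    public class JumpProfileLayout {
        // profile.get(a).get(b) = number of jumps observed from line a to line b
        public static List<Integer> layout(Map<Integer, Map<Integer, Integer>> profile,
                                           int lines) {
            List<Integer> order = new ArrayList<>();
            Set<Integer> placed = new HashSet<>();
            int current = 0;
            while (order.size() < lines) {
                order.add(current);
                placed.add(current);
                int next = -1, best = -1;
                for (Map.Entry<Integer, Integer> e
                        : profile.getOrDefault(current, Map.of()).entrySet()) {
                    if (!placed.contains(e.getKey()) && e.getValue() > best) {
                        best = e.getValue();
                        next = e.getKey();
                    }
                }
                if (next == -1) {                  // no hot successor: any unplaced line
                    for (int l = 0; l < lines; l++)
                        if (!placed.contains(l)) { next = l; break; }
                }
                if (next == -1) break;
                current = next;
            }
            return order;
        }

        public static void main(String[] args) {
            Map<Integer, Map<Integer, Integer>> p = Map.of(
                0, Map.of(2, 90, 1, 5),
                2, Map.of(3, 70));
            System.out.println(layout(p, 4));      // [0, 2, 3, 1]
        }
    }
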
20100153648 - Block Driven Computation Using A Caching Policy Specified In An Operand Data Structure - A processor has an associated memory hierarchy including a cache memory. The processor includes an instruction sequencing unit that fetches instructions for processing, an operand data structure including a plurality of entries corresponding to operands of operations to be performed by the processor, and a computation engine. A first entry among the plurality of entries in the operand data structure specifies a first caching policy for a first operand, and a second entry specifies a second caching policy for a second operand. The computation engine computes and stores operands in the memory hierarchy in accordance with the cache policies indicated within the operand data structure. (06-17-2010)
20100030967 - METHOD AND SYSTEM FOR SECURING INSTRUCTION CACHES USING SUBSTANTIALLY RANDOM INSTRUCTION MAPPING SCHEME - A method and system is provided for securing micro-architectural instruction caches (I-caches). Securing an I-cache involves maintaining a different substantially random instruction mapping policy into an I-cache for each of multiple processes, and for each process, performing a substantially random mapping scheme for mapping a process instruction into the I-cache based on the substantially random instruction mapping policy for said process. Securing the I-cache may further involve dynamically partitioning the I-cache into multiple logical partitions, and sharing access to the I-cache by an I-cache mapping policy that provides access to each I-cache partition by only one logical processor. (02-04-2010)
20090164729 - SYNC-ID FOR MULTIPLE CONCURRENT SYNC DEPENDENCIES IN AN OUT-OF-ORDER STORE QUEUE - A method, system and process for retiring data entries held within a store queue (STQ). The STQ of a processor cache is modified to receive and process multiple synchronized groups (sync-groups). Sync groups comprise thread of execution synchronized (thread-sync) entries, all thread of execution synchronized (all-thread-sync) entries, and regular store entries (non-thread-sync and non-all-thread-sync). The task of storing data entries, from the STQ out to memory or an input/output device, is modified to increase the effectiveness of the cache. Sync-groups are created for each thread and tracked within the STQ via a synchronized identification (SID). An entry is eligible for retirement when the entry is within a currently retiring sync-group as identified by the SID. (06-25-2009)
20090216952 - METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR MANAGING CACHE MEMORY - A method for managing cache memory including receiving an instruction fetch for an instruction stream in a cache memory, wherein the instruction fetch includes an instruction fetch reference tag for the instruction stream and the instruction stream is at least partially included within a cache line, comparing the instruction fetch reference tag to a previous instruction fetch reference tag, maintaining a cache replacement status of the cache line if the instruction fetch reference tag is the same as the previous instruction fetch reference tag, and upgrading the cache replacement status of the cache line if the instruction fetch reference tag is different from the previous instruction fetch reference tag, whereby the cache replacement status of the cache line is upgraded if the instruction stream is independently fetched more than once. A corresponding system and computer program product. (08-27-2009)
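
The reference-tag rule of 20090216952 (upgrade replacement status only when a line is fetched under a new tag) can be modeled in a few lines. The integer 'status' here is an assumed stand-in for the cache's real replacement state:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the reference-tag rule: repeated fetches carrying the same tag
    // belong to one sequential stream and leave the line's replacement status
    // alone; a different tag means the line was independently fetched again,
    // so its status is upgraded (made harder to evict).
    public class FetchTagLru {
        static final class Line { int tag = -1; int status = 0; }

        private final Map<Long, Line> lines = new HashMap<>();

        public void fetch(long lineAddress, int fetchTag) {
            Line line = lines.computeIfAbsent(lineAddress, a -> new Line());
            if (line.tag != fetchTag) {
                line.status++;             // independent re-fetch: upgrade status
                line.tag = fetchTag;
            }                              // same tag: sequential re-reference, no upgrade
        }

        public static void main(String[] args) {
            FetchTagLru c = new FetchTagLru();
            c.fetch(0x100, 7); c.fetch(0x100, 7);    // one stream: status unchanged
            c.fetch(0x100, 9);                       // new stream: status upgraded
            System.out.println(c.lines.get(0x100L).status);  // 2
        }
    }
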
20100070713DEVICE AND METHOD FOR FETCHING INSTRUCTIONS - A device and a method for fetching instructions. The device includes a processor adapted to execute instructions; a high level memory unit adapted to store instructions; a direct memory access (DMA) controller that is controlled by the processor; an instruction cache that includes a first input port and a second input port; wherein the instruction cache is adapted to provide instructions to the processor in response to read requests that are generated by the processor and received via the first input port; wherein the instruction cache is further adapted to fetch instructions from a high level memory unit in response to read requests, generated by the DMA controller and received via the second input port.03-18-2010
20110153947INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD - An information processing device for processing VLIW includes memory banks that store an instruction word group constituting a very long instruction. A program counter outputs an instruction address indicating the head memory bank containing the head part of the very long instruction of the next cycle. A memory bank control device uses information regarding the instruction address of the very long instruction and the number of memory banks associated with the very long instruction to identify the memory banks to be used in the next cycle and those not to be used in the next cycle, and controls the operation of the unused memory banks. An instruction decoder decodes the very long instruction fetched from the memory banks in use, and an arithmetic device executes the decoded very long instruction.06-23-2011
20090089507OVERLAY INSTRUCTION ACCESSING UNIT AND OVERLAY INSTRUCTION ACCESSING METHOD - The present invention provides an overlay instruction accessing unit and method, and a method and apparatus for compressing and storing a program. The overlay instruction accessing unit is used to execute a program stored in a memory in the form of a plurality of compressed program segments, and comprises: a buffer; a processing unit for issuing an instruction reading request, reading an instruction from the buffer, and executing the instruction; and a decompressing unit for reading a requested compressed instruction segment from the memory in response to the instruction reading request of the processing unit, decompressing the compressed instruction segment, and storing the decompressed instruction segment in the buffer, wherein while the processing unit is executing the instruction segment, the decompressing unit reads, according to a storage address of a compressed program segment to be invoked in a header corresponding to the instruction segment, a corresponding compressed instruction segment from the memory, decompresses the compressed instruction segment, and stores the decompressed instruction segment in the buffer for later use by the processing unit.04-02-2009
20120303900PROCESSING PIPELINE CONTROL - A graphics processing unit 11-29-2012
20110029735METHOD FOR MANAGING AN EMBEDDED SYSTEM TO ENHANCE PERFORMANCE THEREOF, AND ASSOCIATED EMBEDDED SYSTEM - A method for managing an embedded system is provided. The method includes selecting one of a first memory and a second memory according to at least one criterion, where the selected memory is a source from which the embedded system reads commands of a program, and an access speed of the first memory is different from that of the second memory; and controlling the embedded system to execute the program by utilizing the selected memory as the source.02-03-2011
20130159628METHODS AND APPARATUS FOR SOURCE OPERAND COLLECTOR CACHING - Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction.06-20-2013
20100064106DATA PROCESSOR AND DATA PROCESSING SYSTEM - The present invention provides a data processor capable of automatically discriminating a loop program and performing a reduction in power by size-variable lock control on an instruction buffer. The instruction buffer of the data processor includes a buffer controller for controlling a memory unit that stores each fetched instruction therein. When an execution history of a fetched condition branch instruction suggests condition establishment, and in the case that the branch direction of the fetched condition branch instruction is a direction opposite to the order of an instruction execution and the difference of instruction addresses from the branch source to the branch target based on the condition branch instruction is a range held in the storage capacity of the instruction buffer, the buffer controller retains an instruction sequence from a branch source to a branch target based on the condition branch instruction in the instruction buffer. While the instruction execution of the instruction sequence retained therein is repeated, the buffer controller supplies the corresponding instruction of the instruction sequence from the instruction buffer to the instruction decoder and releases retention of the instruction sequence when the instruction execution is exited from the instruction sequence.03-11-2010
20080229021Systems and Methods of Revalidating Cached Objects in Parallel with Request for Object - The present solution provides a variety of techniques for accelerating and optimizing network traffic, such as HTTP based network traffic. The solution described herein provides techniques in the areas of proxy caching, protocol acceleration, and domain name resolution acceleration, as well as compression improvements. In some cases, the present solution provides various prefetching and/or prefreshening techniques to improve intermediary or proxy caching, such as HTTP proxy caching. In other cases, the present solution provides techniques for accelerating a protocol by improving the efficiency of obtaining and servicing data from an originating server to clients. In still other cases, the present solution accelerates domain name resolution. As every HTTP access starts with a URL that includes a hostname that must be resolved via domain name resolution into an IP address, the present solution helps accelerate HTTP access. In some cases, the present solution improves compression techniques by prefetching non-cacheable and cacheable content to use for compressing network traffic, such as HTTP. The acceleration and optimization techniques described herein may be deployed on the client as a client agent or as part of a browser, as well as on any type and form of intermediary device, such as an appliance, proxying device or any type of interception caching and/or proxying device.09-18-2008
20080282034Memory Subsystem having a Multipurpose Cache for a Stream Graphics Multiprocessor - A method and a computing system are provided. The computing system may include a system memory configured to store data in a first data format. The computing system may also include a computational core comprising a plurality of execution units (EU). The computational core may be configured to request data from the system memory and to process data in a second data format. Each of the plurality of EU may include an execution control and datapath and a specialized L1 cache pool. The computing system may include a multipurpose L2 cache in communication with the each of the plurality of EU and the system memory. The multipurpose L2 cache may be configured to store data in the first data format and the second data format. The computing system may also include an orthogonal data converter in communication with at least one of the plurality of EU and the system memory.11-13-2008
20110047333ALLOCATING PROCESSOR CORES WITH CACHE MEMORY ASSOCIATIVITY - Techniques are generally described related to a multi-core processor with a plurality of processor cores and a cache memory shared by at least some of the processor cores. The multi-core processor can be configured for separately allocating a respective level of cache memory associativity to each of the processing cores.02-24-2011
20080256300System and method for dynamically reconfiguring a vertex cache - A system to process a plurality of vertices to model an object. An embodiment of the system includes a processor, a front end unit coupled to the processor, and cache configuration logic coupled to the front end unit and the processor. The processor is configured to process the plurality of vertices. The front end unit is configured to communicate vertex data to the processor. The cache configuration logic is configured to establish a cache line size of a vertex cache based on a vertex size of a drawing command.10-16-2008
20080256301PROTECTION OF THE EXECUTION OF A PROGRAM - A method for controlling the execution of at least one program in an electronic circuit and a processor for executing a program, in which at least one volatile memory area of the circuit is, prior to the execution of the program to be controlled, filled with first instructions resulting in exception processing; the program contains instructions for replacing all or part of the first instructions with second valid instructions; and the area is called for execution of all or part of the instructions that it contains at the end of the execution of the program.10-16-2008
20110055484Detecting Task Complete Dependencies Using Underlying Speculative Multi-Threading Hardware - Mechanisms are provided for tracking dependencies of threads in a multi-threaded computer program execution. The mechanisms detect a dependency of a first thread's execution on results of a second thread's execution in an execution flow of the multi-threaded computer program. The mechanisms further store, in a hardware thread dependency vector storage associated with the first thread's execution, an identifier of the dependency by setting at least one bit in the hardware thread dependency vector storage corresponding to the second thread. Moreover, the mechanisms schedule tasks performed by the multi-threaded computer program based on the hardware thread dependency vector storage to minimize squashing of threads.03-03-2011
20110055483TRANSACTIONAL MEMORY SYSTEM WITH EFFICIENT CACHE SUPPORT - A computer implemented method for use by a transaction program for managing memory access to a shared memory location for transaction data of a first thread, the shared memory location being accessible by the first thread and a second thread. A string of instructions to complete a transaction of the first thread is executed, beginning with one instruction of the string of instructions. It is determined whether the one instruction is part of an active atomic instruction group (AIG) of instructions associated with the transaction of the first thread. A cache structure and a transaction table which together provide for entries in an active mode for the AIG are located if the one instruction is part of an active AIG. The next instruction is executed under a normal execution mode in response to determining that the one instruction is not part of an active AIG.03-03-2011
20080201528MEMORY ACCESS SYSTEMS FOR CONFIGURING WAYS AS CACHE OR DIRECTLY ADDRESSABLE MEMORY - A memory system is provided. A processor provides a data access address. A memory device includes a predetermined number of ways. The processor selectively configures a selected number less than or equal to the predetermined number of the ways as cache memory belonging to a cacheable region, and configures the remaining ways as directly addressable memory belonging to a directly addressable region by memory configuration information. A memory controller determines whether the data access address corresponds to the cacheable region or the directly addressable region, selects only the way in the directly addressable region corresponding to the data access address when the data access address corresponds to the directly addressable region, and selects only the way(s) belonging to the cacheable region when the data access address corresponds to the cacheable region. A configuration controller monitors the status of the ways and adjusts the memory configuration information according to the status of the ways.08-21-2008
20110264863DEVICE, SYSTEM, AND METHOD FOR USING A MASK REGISTER TO TRACK PROGRESS OF GATHERING ELEMENTS FROM MEMORY - A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.10-27-2011
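The mask-register protocol above is easy to model in software. The sketch below assumes 0 means "not yet written" and 1 means "written" (the abstract's first and second values); because the mask records progress, a gather interrupted partway can simply be re-issued and will finish only the remaining elements.

    #include <stdint.h>
    #include <stddef.h>

    /* Software model of a restartable masked gather: mask bit i stays 0
     * while element i is pending and flips to 1 once it has been written,
     * so a faulting gather can be re-issued without redoing earlier work. */
    void gather_step(const double *mem, const size_t *idx,
                     double *dst, uint8_t *mask, size_t n) {
        for (size_t i = 0; i < n; i++) {
            if (mask[i])           /* already gathered on a previous pass */
                continue;
            dst[i] = mem[idx[i]];  /* may fault; progress so far is kept */
            mask[i] = 1;           /* mark element i complete */
        }
    }

    int main(void) {
        double mem[4] = {10, 20, 30, 40}, dst[3];
        size_t idx[3] = {2, 0, 3};
        uint8_t mask[3] = {0, 0, 0};
        gather_step(mem, idx, dst, mask, 3);   /* dst = {30, 10, 40} */
        return 0;
    }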
20100332757Integrated Pipeline Write Hazard Handling Using Memory Attributes - A system and method for write hazard handling are described, including a method comprising pre-computing a memory management unit policy for a write request using an address that arrives at least one clock cycle before the data; registering the pre-computed memory management unit policy; using the pre-computed memory management unit policy to control a pipeline stall to ensure that non-bufferable writes are pipeline-protected, ensuring that no non-bufferable locations are bypassed from within the pipeline and all subsequent non-bufferable reads will get data from a final destination; bypassing a read request only after a corresponding write request is updated in a write pending buffer; decoding the write request with the write request aligned to the data; registering the write request in the write pending buffer; allowing arbitration logic to force the pipeline stall for a region that will have a write conflict; and stalling read requests to protect against write hazards.12-30-2010
20120151145Data Driven Micro-Scheduling of the Individual Processing Elements of a Wide Vector SIMD Processing Unit - A method for optimizing processing in a SIMD core. The method comprises processing units of data within a working domain, wherein the processing includes one or more working items executing in parallel within a persistent thread. The method further comprises retrieving a unit of data from within a working domain, processing the unit of data, retrieving other units of data when processing of the unit of data has finished, processing the other units of data, and terminating the execution of the working items when processing of the working domain has finished.06-14-2012
20110138126Atomic Commit Predicated on Consistency of Watches - Mechanisms for performing predicated atomic commits based on consistency of watches are provided. These mechanisms include executing, by a thread executing on a processor of the data processing system, an atomic release instruction. A determination is made as to whether a speculative store has been lost, due to an eviction of a memory block to which the speculative store is performed, since a previous atomic release instruction was processed. In response to the speculative store having been lost, the processor invalidates speculative stores that have been performed since the previous atomic release instruction was processed. In response to the speculative store not having been lost, the processor commits speculative stores that have been performed since the previous atomic release instruction was processed.06-09-2011
20110072215Cache system and control method of way prediction for cache memory - A cache device according to an exemplary aspect of the present invention includes a way information buffer that stores way information that is the result of selecting a way for an instruction that accesses a cache memory; and a control unit that controls a storage processing and a read processing while a series of instruction groups are repeatedly executed, the storage processing being for storing the way information for the instruction groups to the way information buffer, the read processing being for reading the way information from the way information buffer.03-24-2011
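A plausible software rendering of the way-information buffer above, with an assumed 4-way cache and a 64-entry buffer; the indexing and tag check are illustrative, not taken from the application. On a repeated loop iteration the remembered way lets the cache enable a single data array instead of all four.

    #include <stdint.h>

    #define WAYS 4
    #define LOOP_BUF 64          /* assumed capacity of the way-info buffer */

    typedef struct { uint32_t pc_tag; uint8_t way; uint8_t valid; } way_info_t;
    static way_info_t way_buf[LOOP_BUF];

    /* Record the way that hit for this instruction (storage processing). */
    void record_way(uint32_t pc, uint8_t way_hit) {
        way_info_t *e = &way_buf[(pc >> 2) % LOOP_BUF];
        e->pc_tag = pc; e->way = way_hit; e->valid = 1;
    }

    /* Read the remembered way on a repeat execution (read processing).
     * Returns WAYS when no prediction is available and all ways must
     * be probed as usual. */
    uint8_t predict_way(uint32_t pc) {
        way_info_t *e = &way_buf[(pc >> 2) % LOOP_BUF];
        return (e->valid && e->pc_tag == pc) ? e->way : WAYS;
    }

    int main(void) {
        record_way(0x1000, 2);
        return predict_way(0x1000);   /* 2: enable only way 2 next time */
    }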
20100122034STORAGE ARRAY TILE SUPPORTING SYSTOLIC MOVEMENT OPERATIONS - A tile for use in a tiled storage array provides re-organization of values within the tile array without requiring sophisticated global control. The tiles operate to move a requested value to a front-most storage element of the tile array according to a global systolic clock. The previous occupant of the front-most location is moved or swapped backward according to the systolic clock, and the new occupant is moved forward according to the systolic clock, according to the operation of the tiles, while providing for multiple in-flight access requests within the tile array. The placement heuristic that moves the values is determined according to the position of the tiles within the array and the behavior of the tiles. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the tile array.05-13-2010
20110093658CLASSIFYING AND SEGREGATING BRANCH TARGETS - A system and method for branch prediction in a microprocessor. A branch prediction unit stores an indication of a location of a branch target instruction relative to its corresponding branch instruction. For example, a target instruction may be located within the same first region of memory as the branch instruction. Alternatively, the target instruction may be located outside the first region, but within a larger second region. The prediction unit comprises a branch target array corresponding to each region. Each array stores a bit range of a branch target address, wherein the stored bit range is based upon the location of the target instruction relative to the branch instruction. The prediction unit constructs a predicted branch target address by concatenating the bits stored in the branch target arrays.04-21-2011
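The bit-range storage above amounts to masking and concatenation. The region sizes below (4 KB "near", 16 MB "far") are assumptions chosen for illustration; only the target bits that can differ from the branch's own address need to be kept in each array.

    #include <stdint.h>

    /* Assumed regions: "near" = same 4 KB page as the branch,
     * "far" = same 16 MB region. Each array entry keeps only the
     * bit range that distinguishes the target within its region. */
    typedef struct { uint16_t lo12; } near_entry_t;   /* target bits [11:0]  */
    typedef struct { uint32_t lo24; } far_entry_t;    /* target bits [23:0]  */

    /* Reconstruct a predicted target by concatenating the branch PC's
     * upper bits with the bit range stored in the matching array. */
    uint64_t predict_target_near(uint64_t branch_pc, near_entry_t e) {
        return (branch_pc & ~0xFFFULL) | e.lo12;
    }
    uint64_t predict_target_far(uint64_t branch_pc, far_entry_t e) {
        return (branch_pc & ~0xFFFFFFULL) | (e.lo24 & 0xFFFFFF);
    }

    int main(void) {
        near_entry_t n = { 0xABC };
        uint64_t t = predict_target_near(0x400123, n);  /* 0x400ABC */
        return t == 0x400ABC ? 0 : 1;
    }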
20110179226DATA PROCESSOR - The present invention provides a data processor capable of reducing power consumption at the time of execution of a spin wait loop for a spinlock. A CPU executes a weighted load instruction at the time of performing a spinlock process and outputs a spin wait request to a corresponding cache memory. When the spin wait request is received from the CPU, the cache memory temporarily stops outputting an acknowledge response to a read request from the CPU until a predetermined condition (snoop write hit, interrupt request, or lapse of predetermined time) is satisfied. Therefore, pipeline execution of the CPU is stalled and the operation of the CPU and the cache memory can be temporarily stopped, and power consumption at the time of executing a spin wait loop can be reduced.07-21-2011
20090187712Operation Frame Filtering, Building, And Execution - The present subject matter relates to operation frame filtering, building, and execution. Some embodiments include identifying a frame signature, counting a number of execution occurrences of the frame signature, and building a frame of operations to execute instead of operations identified by the frame signature.07-23-2009
20100030968Methods of Cache Bounded Reference Counting - A computer implemented method of cache bounded reference counting for computer languages having automated memory management in which, for example, a reference to an object “Z” initially stored in an object “O” is fetched and the cache hardware is queried whether the reference to the object “Z” is a valid reference, is in the cache, and has a continuity flag set to “on”. If so, the object “O” is locked for an update, a reference counter is decremented for the object “Z” if the object “Z” resides in the cache, and a return code is set to zero to indicate that the object “Z” is de-referenced and that its storage memory can be released and re-used if the reference counter for the object “Z” reaches zero. Thereafter, the cache hardware is similarly queried regarding an object “N” that will become a new reference of object “O”.02-04-2010
20100023696Methods and System for Resolving Simultaneous Predicted Branch Instructions - A method of resolving simultaneous branch predictions prior to validation of the predicted branch instruction is disclosed. The method includes processing two or more predicted branch instructions, with each predicted branch instruction having a predicted state and a corrected state. The method further includes selecting one of the corrected states. Should one of the predicted branch instructions be mispredicted, the selected corrected state is used to direct future instruction fetches.01-28-2010
20110307662MANAGING CACHE COHERENCY FOR SELF-MODIFYING CODE IN AN OUT-OF-ORDER EXECUTION SYSTEM - A method, system, and computer program product for managing cache coherency for self-modifying code in an out-of-order execution system are disclosed. A program-store-compare (PSC) tracking manager identifies a set of addresses of pending instructions in an address table that match an address requested to be invalidated by a cache invalidation request. The PSC tracking manager receives a fetch address register identifier associated with a fetch address register for the cache invalidation request. The fetch address register is associated with the set of addresses and is a PSC tracking resource reserved by a load store unit (LSU) to monitor an exclusive fetch for a cache line in a high level cache. The PSC tracking manager determines that the set of entries in an instruction line address table associated with the set of addresses is invalid and instructs the LSU to free the fetch address register.12-15-2011
20110307663Storage Unsharing - A method is described to partition the memory of application-specific hardware compiled from a software program. Applying the invention generates multiple small memories that need not be kept coherent and are defined over a specific region of the program. The invention creates application specific hardware which preserves the memory image and addressing model of the original software program. The memories are dynamically initialized and flushed at the entries and exits of the program region they are defined in.12-15-2011
20120042129ARRANGEMENT METHOD OF PROGRAMS TO MEMORY SPACE, APPARATUS, AND RECORDING MEDIUM - For a program that is made up of functions in units, each function is divided into instruction code blocks having a size CS where CS is the instruction cache line size of a target processor and an instruction code block that is X02-16-2012
20090172285TRACKING TEMPORAL USE ASSOCIATED WITH CACHE EVICTIONS - A method and apparatus for tracking temporal use associated with cache evictions to reduce allocations in a victim cache is disclosed. Access data for a number of sets of instructions in an instruction cache is tracked at least until the data for one or more of the sets reach a predetermined threshold condition. Determinations whether to allocate entry storage in the victim cache may be made responsive in part to the access data for sets reaching the predetermined threshold condition. A micro-operation can be inserted into the execution pipeline in part to synchronize the access data for all the sets. Upon retirement of the micro-operation from the execution pipeline, access data for the sets can be synchronized and/or any previously allocated entry storage in the victim cache can be invalidated.07-02-2009
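One plausible reading of the threshold gating above, sketched in C: per-set counters track temporal use, a victim-cache entry is allocated only for evictions from sets whose counters crossed the threshold, and the synchronizing micro-op resets the counters at retirement. The threshold value and counter widths are assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define SETS 128
    #define HOT_THRESHOLD 8      /* assumed threshold, not from the patent */

    static uint8_t set_access[SETS];   /* temporal-use counters per I-cache set */

    void on_icache_access(unsigned set) {
        if (set_access[set] < 0xFF) set_access[set]++;
    }

    /* On eviction, allocate a victim-cache entry only for lines coming
     * from sets whose recent use reached the threshold; cold evictions
     * skip the victim cache entirely, reducing allocations. */
    bool should_allocate_victim(unsigned set) {
        return set_access[set] >= HOT_THRESHOLD;
    }

    /* The micro-op from the abstract would, at retirement, synchronize
     * the access data; here that is modeled as clearing every counter. */
    void sync_access_counters(void) {
        for (unsigned s = 0; s < SETS; s++) set_access[s] = 0;
    }

    int main(void) {
        for (int i = 0; i < 9; i++) on_icache_access(5);
        return should_allocate_victim(5) ? 0 : 1;   /* hot set: allocate */
    }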
20120047329Reducing Cache Power Consumption For Sequential Accesses - In some embodiments, a cache may include a tag array and a data array, as well as circuitry that detects whether accesses to the cache are sequential (e.g., occupying the same cache line). For example, a cache may include a tag array and a data array that stores data, such as multiple bundles of instructions per cache line. During operation, it may be determined that successive cache requests are sequential and do not cross a cache line boundary. Responsively, various cache operations may be inhibited to conserve power. For example, access to the tag array and/or data array, or portions thereof, may be inhibited.02-23-2012
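The sequential-access check above reduces to comparing the line addresses of consecutive fetches; a minimal C sketch, assuming a 64-byte line. When the check returns true, the tag array (and the non-selected data ways) can stay disabled because the hit way cannot have changed.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINE_BYTES 64    /* assumed cache line size */

    static uint64_t last_addr;
    static bool last_valid;

    /* Returns true when this fetch lies in the same line as the previous,
     * already-resolved fetch, so tag lookup can be inhibited to save power. */
    bool sequential_same_line(uint64_t fetch_addr) {
        bool same = last_valid &&
                    (fetch_addr / LINE_BYTES) == (last_addr / LINE_BYTES);
        last_addr = fetch_addr;
        last_valid = true;
        return same;
    }

    int main(void) {
        sequential_same_line(0x1000);          /* first access: full lookup */
        return sequential_same_line(0x1010);   /* same 64 B line: skip tags */
    }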
20120159076LATE-SELECT, ADDRESS-DEPENDENT SENSE AMPLIFIER - Sense amplifiers in a memory may be activated and deactivated. In one embodiment, a processor may include a memory. The memory may include a number of sense amplifiers. Based on a late arriving address bit of an address used to access data from the memory, a sense amplifier may be activated while another sense amplifier may be deactivated.06-21-2012
20090132766SYSTEMS AND METHODS FOR LOOKAHEAD INSTRUCTION FETCHING FOR PROCESSORS - Systems and methods may be provided for lookahead instruction fetching for processors. The systems and methods may include an L1 instruction cache, where the L1 instruction cache may include a plurality of lines of data, where each line of data may include one or more instructions. The systems and methods may also include a tagless hit instruction cache, where the tagless hit instruction cache may store a subset of the lines of data in the L1 instruction cache, where instructions in the lines of data stored in the tagless hit instruction cache may be stored with metadata indicative of whether a next instruction is guaranteed to reside in the tagless hit instruction cache, where an instruction fetcher may be arranged to have direct access to the L1 instruction cache and the tagless hit instruction cache, and where the tagless hit instruction cache may be arranged to have direct access to the L1 instruction cache.05-21-2009
20120124292Computer System Having Cache Subsystem Performing Demote Requests - Computer system having cache subsystem wherein demote requests are performed by the cache subsystem. Software indicates to hardware of a processing system that its storage modification to a particular cache line is done and that it will not be doing any modification for the time being. With this indication, the processor actively releases its exclusive ownership by updating its line ownership from exclusive to read-only (or shared) in its own cache directory and in the storage controller (SC). By actively giving up the exclusive rights, another processor can immediately be given exclusive ownership of that cache line without waiting on any processor's explicit cross-invalidate acknowledgement. This invention also describes the hardware design needed to provide this support.05-17-2012
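A toy model of the demote operation above: the only state change is downgrading the line from exclusive to shared in both the core's directory and the storage controller's, after which the next writer needs no cross-invalidate handshake. The state names and two-directory structure are illustrative.

    typedef enum { INVALID, SHARED, EXCLUSIVE } line_state_t;

    typedef struct {
        line_state_t core_dir;   /* state in the owning core's directory */
        line_state_t sc_dir;     /* state in the storage controller */
    } line_owner_t;

    /* Software signals it is done storing to the line; the core actively
     * drops exclusive ownership in both directories so a later writer can
     * be granted exclusivity immediately. */
    void demote_line(line_owner_t *l) {
        if (l->core_dir == EXCLUSIVE) {
            l->core_dir = SHARED;
            l->sc_dir  = SHARED;
        }
    }

    int main(void) {
        line_owner_t l = { EXCLUSIVE, EXCLUSIVE };
        demote_line(&l);
        return l.sc_dir == SHARED ? 0 : 1;
    }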
20100205376METHOD FOR THE IMPROVEMENT OF MICROPROCESSOR SECURITY - A method for the improvement of the security of microprocessors (08-12-2010
20120137077MISS BUFFER FOR A MULTI-THREADED PROCESSOR - A multi-threaded processor configured to allocate entries in a buffer for instruction cache misses is disclosed. Entries in the buffer may store thread state information for a corresponding instruction cache miss for one of a plurality of threads executable by the processor. The buffer may include dedicated entries and dynamically allocable entries, where the dedicated entries are reserved for a subset of the plurality of threads and the dynamically allocable entries are allocable to a group of two or more of the plurality of threads. In one embodiment, the dedicated entries are dedicated for use by a single thread and the dynamically allocable entries are allocable to any of the plurality of threads. The buffer may store two or more entries for a given thread at a given time. In some embodiments, the buffer may help ensure none of the plurality of threads experiences starvation with respect to instruction fetches.05-31-2012
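The dedicated-plus-dynamic allocation above can be sketched directly; the pool sizes and the policy of one dedicated entry per thread are assumptions matching one embodiment the abstract describes. The dedicated entry is what prevents fetch starvation: a thread can always land at least one miss.

    #include <stdbool.h>

    #define THREADS 8
    #define SHARED_ENTRIES 8

    static bool dedicated_busy[THREADS];      /* one reserved entry per thread */
    static bool shared_busy[SHARED_ENTRIES];  /* dynamically allocable pool */

    /* Allocate a miss-buffer slot for thread `tid`: the dedicated entry
     * guarantees forward progress, and the shared pool lets a busy thread
     * keep several instruction-cache misses in flight at once. */
    int alloc_miss_entry(int tid) {
        if (!dedicated_busy[tid]) { dedicated_busy[tid] = true; return tid; }
        for (int i = 0; i < SHARED_ENTRIES; i++)
            if (!shared_busy[i]) { shared_busy[i] = true; return THREADS + i; }
        return -1;   /* buffer full: the fetch must retry later */
    }

    int main(void) {
        int a = alloc_miss_entry(3);   /* dedicated entry 3 */
        int b = alloc_miss_entry(3);   /* spills into the shared pool */
        return (a == 3 && b == THREADS) ? 0 : 1;
    }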
20120137076Control of entry of program instructions to a fetch stage within a processing pipepline - A processing pipeline 05-31-2012
20120173821Predicting the Instruction Cache Way for High Power Applications - A mechanism for accessing a cache memory is provided. With the mechanism of the illustrative embodiments, a processor of the data processing system performs a first execution of a portion of code. During the first execution of the portion of code, information identifying which cache lines in the cache memory are accessed during the execution of the portion of code is stored in a storage device of the data processing system. Subsequently, during a second execution of the portion of code, power to the cache memory is controlled such that only the cache lines that were accessed during the first execution of the portion of code are powered-up.07-05-2012
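A software approximation of the two-pass scheme above: the first execution records which lines are touched in a bitmap, and the second execution powers up only those lines. The sizes and the explicit profiling flag are illustrative assumptions.

    #include <stdint.h>
    #include <stdbool.h>

    #define LINES 512
    static uint8_t accessed[LINES / 8];     /* bitmap filled during run 1 */
    static bool profiling = true;

    void on_line_access(unsigned line) {
        if (profiling) accessed[line / 8] |= (uint8_t)(1u << (line % 8));
    }

    /* During the second execution of the same code region, only lines
     * recorded in the bitmap are powered up; the rest stay gated off. */
    bool line_powered(unsigned line) {
        if (profiling) return true;              /* run 1: everything on */
        return accessed[line / 8] & (1u << (line % 8));
    }

    void end_profiling(void) { profiling = false; }

    int main(void) {
        on_line_access(42);
        end_profiling();
        return (line_powered(42) && !line_powered(7)) ? 0 : 1;
    }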
20100274972SYSTEMS, METHODS, AND APPARATUSES FOR PARALLEL COMPUTING - Systems, methods, and apparatuses for parallel computing are described. In some embodiments, a processor is described that includes a front end and a back end. The front end includes an instruction cache to store instructions of a strand. The back end includes a scheduler, register file, and execution resources to execute the strand's instructions.10-28-2010
20120179873Performance of Emerging Applications in a Virtualized Environment Using Transient Instruction Streams - A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache.07-12-2012
20120284461Methods and Apparatus for Storage and Translation of Entropy Encoded Software Embedded within a Memory Hierarchy - A system for translating compressed instructions to instructions in an executable format is described. A translation unit is configured to decompress compressed instructions into a native instruction format using X and Y indices accessed from a memory, a translation memory, and a program specified mix mask. A level 1 cache is configured to store the native instruction format for each compressed instruction. The memory may be configured as a paged instruction cache to store pages of compressed instructions intermixed with pages of uncompressed instructions. Methods of determining a mix mask for efficiently translating compressed instructions are also described. A genetic method uses pairs of mix masks as genes from a seed population of mix masks that are bred and may be mutated to produce pairs of offspring mix masks to update the seed population. A mix mask for efficiently translating compressed instructions is determined from the updated seed population.11-08-2012
20090063773Technique to enable store forwarding during long latency instruction execution - A technique to allow independent loads to be satisfied during high-latency instruction processing. Embodiments of the invention relate to a technique in which a storage structure is used to hold store operations in program order while independent load instructions are satisfied during a time in which a high-latency instruction is being processed. After the high-latency instruction is processed, the store operations can be restored in program order without searching the storage structure.03-05-2009
20120260042LOAD MULTIPLE AND STORE MULTIPLE INSTRUCTIONS IN A MICROPROCESSOR THAT EMULATES BANKED REGISTERS - A microprocessor supports an instruction set architecture that specifies: processor modes, architectural registers associated with each mode, and a load multiple instruction that instructs the microprocessor to load data from memory into specified ones of the registers. Direct storage holds data associated with a first portion of the registers and is coupled to an execution unit to provide the data thereto. Indirect storage holds data associated with a second portion of the registers and cannot directly provide the data to the execution unit. Which architectural registers are in the first and second portions varies dynamically based upon the current processor mode. If a specified register is currently in the first portion, the microprocessor loads data from memory into the direct storage, whereas if in the second portion, the microprocessor loads data from memory into the direct storage and then stores the data from the direct storage to the indirect storage.10-11-2012
20120226868SYSTEMS AND METHODS FOR PROVIDING DETERMINISTIC EXECUTION - Devices and methods for providing deterministic execution of multithreaded applications are provided. In some embodiments, each thread is provided access to an isolated memory region, such as a private cache. In some embodiments, more than one private cache are synchronized via a modified MOESI coherence protocol. The modified coherence protocol may be configured to refrain from synchronizing the isolated memory regions until the end of an execution quantum. The execution quantum may end when all threads experience a quantum end event such as reaching a threshold instruction count, overflowing the isolated memory region, and/or attempting to access a lock released by a different thread in the same quantum.09-06-2012
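The quantum-end test above is a simple disjunction of the three events the abstract lists; the quantum length below is an assumed parameter. Only when every thread has ended its quantum would the private caches be synchronized, which is what makes the schedule deterministic.

    #include <stdbool.h>
    #include <stdint.h>

    #define QUANTUM_INSNS 100000   /* assumed quantum length */

    typedef struct {
        uint64_t insn_count;            /* instructions retired this quantum */
        bool     private_cache_overflow;
        bool     blocked_on_foreign_lock;   /* lock released by another
                                               thread in the same quantum */
    } thread_quantum_t;

    /* A thread's quantum ends on any of the three events; the coherence
     * protocol defers synchronizing the isolated memory regions until
     * all threads have reached this point. */
    bool quantum_ended(const thread_quantum_t *t) {
        return t->insn_count >= QUANTUM_INSNS
            || t->private_cache_overflow
            || t->blocked_on_foreign_lock;
    }

    int main(void) {
        thread_quantum_t t = { QUANTUM_INSNS, false, false };
        return quantum_ended(&t) ? 0 : 1;
    }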
20090019225INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING SYSTEM - With respect to memory access instructions contained in an internal representation program, an information processing apparatus generates a load cache instruction, a cache hit judgment instruction, and a cache miss instruction that is executed in correspondence with a result of a judgment process performed according to the cache hit judgment instruction. In a case where the internal representation program contains a plurality of memory access instructions that have a possibility of using the same cache line in a cache memory when mutually different cache lines in a main memory are accessed, the information processing apparatus generates a combine instruction instructing that the judgment results of the judgment processes performed according to the cache hit judgment instructions should be combined into one judgment result. The information processing apparatus outputs an output program that contains these generated instructions.01-15-2009
20080301369PROCESSING OF SELF-MODIFYING CODE IN MULTI-ADDRESS-SPACE AND MULTI-PROCESSOR SYSTEMS - A method and system of storing to an instruction stream with a multiprocessor or multiple-address-space system is disclosed. A central processing unit may cache instructions in a cache from a page of primary code stored in a memory storage unit. The central processing unit may execute cached instructions from the cache until a serialization operation is executed. The central processing unit may check in a message queue for a notification message indicating potential storing to the page. If the notification message is present in the message queue, cached instructions from the page are invalidated.12-04-2008
20120272006DYNAMIC LOCKSTEP CACHE MEMORY REPLACEMENT LOGIC - To facilitate dynamic lockstep support, replacement states and/or logic used to select particular cache lines for replacement with new allocations in accord with replacement algorithms or strategies may be enhanced to provide generally independent replacement contexts for use in respective lockstep and performance modes. In some cases, replacement logic that may be otherwise conventional in its selection of cache lines for new allocations in accord with a first-in, first-out (FIFO), round-robin, random, least recently used (LRU), pseudo LRU, or other replacement algorithm/strategy is at least partially replicated to provide lockstep and performance instances that respectively cover lockstep and performance partitions of a cache. In some cases, a unified instance of replacement logic may be reinitialized with appropriate states at (or coincident with) transitions between performance and lockstep modes of operation.10-25-2012
20110213932DECODING APPARATUS AND DECODING METHOD - A decoding apparatus is used which includes: a CABAC processing unit (09-01-2011
20110238917HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core, and to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions. Further, the cache control unit is also configured to examine instructions being filled from the first memory to the second memory to extract instruction information containing at least branch information, to create a plurality of instruction tracks based on the extracted instruction information, and to fill the at least one or more instructions based on one or more tracks from the plurality of instruction tracks.09-29-2011
20110276764CRACKING DESTRUCTIVELY OVERLAPPING OPERANDS IN VARIABLE LENGTH INSTRUCTIONS - A method, information processing system, and computer program product manage computer executable instructions. At least one machine instruction for execution is received. The at least one machine instruction is analyzed. The machine instruction is identified as a predefined instruction for storing a variable length first operand in a memory location. Responsive to this identification and based on fields of the machine instruction, a relative location of a variable length second operand of the instruction with respect to the location of the first operand is determined. Responsive to the relative location having a predefined relationship, a first cracking operation is performed. The first cracking operation cracks the instruction into a first set of micro-ops (Uops) to be executed in parallel. This set of Uops is for storing a first plurality of first blocks in the first operand, each of the first blocks to be stored being identical. The first set of Uops is then executed.11-10-2011
20110320724DMA-BASED ACCELERATION OF COMMAND PUSH BUFFER BETWEEN HOST AND TARGET DEVICES - Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer.12-29-2011
20110320723METHOD AND SYSTEM TO REDUCE THE POWER CONSUMPTION OF A MEMORY DEVICE - A method and system to reduce the power consumption of a memory device. In one embodiment of the invention, the memory device is an N-way set-associative level one (L1) cache memory and there is logic coupled with the data cache memory to facilitate access to only part of the N ways of the N-way set-associative L1 cache memory in response to a load instruction or a store instruction. By reducing the number of ways accessed in the N-way set-associative L1 cache memory for each load or store request, the power requirements of the N-way set-associative L1 cache memory are reduced in one embodiment of the invention. In one embodiment of the invention, when a prediction is made that an access to the cache memory only requires the data arrays of the N-way set-associative L1 cache memory, access to the fill buffers is deactivated or disabled.12-29-2011
20110320722MANAGEMENT OF MULTIPURPOSE COMMAND QUEUES IN A MULTILEVEL CACHE HIERARCHY - An apparatus for controlling access to a pipeline includes a plurality of command queues: a first subset of the plurality of command queues is assigned to process commands of a first command type, a second subset is assigned to process commands of a second command type, and a third subset is not assigned to either the first subset or the second subset. The apparatus also includes an input controller configured to receive requests having the first command type and the second command type, assign requests having the first command type to command queues in the first subset until all command queues in the first subset are filled, and then assign requests having the first command type to command queues in the third subset.12-29-2011
20130019063STORAGE CONTROLLER CACHE PAGE MANAGEMENT - A cache page management method can include paging out a memory page to an input/output controller, paging the memory page from the input/output controller into a real memory, modifying the memory page in the real memory to an updated memory page and purging the memory page paged to the input/output controller.01-17-2013
20110161594INFORMATION PROCESSING DEVICE AND CACHE MEMORY CONTROL DEVICE - An information processor includes processing units, each of which processes out-of-order memory accesses and includes a cache memory, an instruction port that holds instructions for accessing data in the cache memory, a first determinator that validates a first flag when a request for invalidating cache data is received after the target data of a load instruction is transferred from the cache memory and a load instruction having a cache index identical to that of a target address of the received invalidating instruction exists, a second determinator that validates a second flag when the target data of the load instruction in the instruction port is transferred after a cache miss of the target data occurred, and a re-execution determinator that instructs re-execution of an instruction that follows the load instruction if the first and the second flags are valid when a load instruction in the instruction port has been completed.06-30-2011
20110161593Cache unit, arithmetic processing unit, and information processing unit - A cache unit comprising a register file that selects an entry indicated by a cache index of n bits (n is a natural number) that is used to search for an instruction cache tag, using multiplexer groups having n stages respectively corresponding to the n bits of the cache index. Among the multiplexer groups having n stages, a multiplexer group in an m06-30-2011
20110161592Dynamic system reconfiguration - In some embodiments system reconfiguration code and data to be used to perform a dynamic hardware reconfiguration of a system including a plurality of processor cores is cached and any direct or indirect memory accesses during the dynamic hardware reconfiguration are prevented. One of the processor cores executes the cached system reconfiguration code and data in order to dynamically reconfigure the hardware. Other embodiments are described and claimed.06-30-2011
20110264862REDUCING PIPELINE RESTART PENALTY - Techniques are disclosed relating to reducing the latency of restarting a pipeline in a processor that implements scouting. In one embodiment, the processor may reduce pipeline restart latency using two instruction fetch units that are configured to fetch and re-fetch instructions in parallel with one another. In some embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to determining that a commit operation is to be attempted with respect to one or more deferred instructions. In other embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to receiving an indication that a request for a set of data has been received by a cache, where the indication is sent by the cache before determining whether the data is present in the cache or not.10-27-2011
20080222360MULTI-PORT INTEGRATED CACHE - A multi-port instruction/data integrated cache which is provided between a parallel processor and a main memory and stores therein a part of instructions and data stored in the main memory has a plurality of banks, and a plurality of ports including an instruction port unit consisting of at least one instruction port used to access an instruction from the parallel processor and a data port unit consisting of at least one data port used to access data from the parallel processor. Further, a data width which can be specified to the bank from the instruction port is set larger than a data width which can be specified to the bank from the data port.09-11-2008
20120254542GATHER CACHE ARCHITECTURE - Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction.10-04-2012
20100299483EARLY RELEASE OF CACHE DATA WITH START/END MARKS WHEN INSTRUCTIONS ARE ONLY PARTIALLY PRESENT - An apparatus extracts instructions from a stream of undifferentiated instruction bytes in a microprocessor having an instruction set architecture in which the instructions are variable length. Decoders generate an associated start/end mark for each instruction byte of a line from a first queue of entries, each entry storing a line of instruction bytes. A second queue has entries each storing a line received from the first queue along with the associated start/end marks. Control logic detects a condition where the length of an instruction whose initial portion is within a first line in the first queue cannot yet be determined because the instruction's remainder resides in a second line yet to be loaded into the first queue from the instruction cache; loads the first line and corresponding start/end marks into the second queue and refrains from shifting the first line out of the first queue, in response to detecting the condition; and extracts instructions from the first line in the second queue based on the corresponding start/end marks. The extracted instructions exclude the instruction whose length cannot yet be determined.11-25-2010
20120042128CACHE UNIT AND PROCESSING SYSTEM - According to one embodiment, a cache unit transfers data from a memory connected to the cache unit via a bus that does not support critical word first (CWF) to an L1-cache having a first line size and connected to the cache unit via a bus that supports CWF. The unit includes cache and un-cache controllers. The cache controller includes an L2-cache and a request converter. The L2-cache has a second line size greater than or equal to the first line size. The request converter converts a first refill request into a second refill request when the head address of the burst transfer of the first refill request is in the L2-cache. The un-cache controller transfers the second refill request to the memory, receives data to be processed corresponding to the second refill request from the memory, and transfers the received data to the L1-cache.02-16-2012

Patent applications in class Instruction data cache