Class / Patent application number | Description | Number of patent applications / Date published |
711121000 | Private caches | 37 |
20080244181 | Dynamic run-time cache size management - Methods and apparatus relating to dynamic management of cache sizes during run-time are described. In one embodiment, the size of an active portion of a cache may be adjusted (e.g., increased or decreased) based on a cache busyness metric. Other embodiments are also disclosed. | 10-02-2008 |
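The resizing described above amounts to a feedback loop over a busyness metric. Below is a minimal C++ sketch of one plausible reading; the metric definition, the thresholds, and the way-granular interface are illustrative assumptions, not the patented design.

```cpp
struct CacheResizer {
    int active_ways;   // ways currently enabled in the active portion
    int min_ways;
    int max_ways;

    // busyness in [0.0, 1.0], e.g. the fraction of cycles in a sampling
    // window during which the cache had outstanding activity.
    void adjust(double busyness) {
        if (busyness > 0.75 && active_ways < max_ways)
            ++active_ways;   // grow the active portion under load
        else if (busyness < 0.25 && active_ways > min_ways)
            --active_ways;   // shrink it (and save power) when idle
    }
};
```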
20080263280 | LOW COMPLEXITY SPECULATIVE MULTITHREADING SYSTEM BASED ON UNMODIFIED MICROPROCESSOR CORE - A system, method and computer program product for supporting thread level speculative execution in a computing environment having multiple processing units adapted for concurrent execution of threads in speculative and non-speculative modes. Each processing unit includes a cache memory hierarchy of caches operatively connected therewith. The apparatus includes an additional cache level local to each processing unit for use only in a thread level speculation mode, each additional cache storing speculative results and status associated with its processor when handling speculative threads. The additional local cache levels at each processing unit are interconnected so that speculative values and control data may be forwarded between parallel executing threads. A control implementation is provided that enables speculative coherence between speculative threads executing in the computing environment. | 10-23-2008 |
20090006750 | Leveraging transactional memory hardware to accelerate virtualization and emulation - Various technologies and techniques are disclosed for using transactional memory hardware to accelerate virtualization or emulation. State isolation can be facilitated by providing isolated private state on transactional memory hardware and storing the stack of a host that is performing an emulation in the isolated private state. Memory accesses performed by a central processing unit can be monitored by software to detect that a guest being emulated has made a self modification to its own code sequence. Transactional memory hardware can be used to facilitate dispatch table updates in multithreaded environments by taking advantage of the atomic commit feature. An emulator is provided that uses a dispatch table stored in main memory to convert a guest program counter into a host program counter. The dispatch table is accessed to see if the dispatch table contains a particular host program counter for a particular guest program counter. | 01-01-2009 |
20090006751 | Leveraging transactional memory hardware to accelerate virtualization and emulation - Various technologies and techniques are disclosed for using transactional memory hardware to accelerate virtualization or emulation. A central processing unit is provided with the transactional memory hardware. Code backpatching can be facilitated by providing transactional memory hardware that supports a facility to maintain private memory state and an atomic commit feature. Changes made to certain code are stored in the private state facility. Backpatching changes are enacted by attempting to commit all the changes to memory at once using the atomic commit feature. An efficient call return stack can be provided by using transactional memory hardware. A call return cache stored in the private state facility captures a host address to return to after execution of a guest function completes. A direct-lookup hardware-based hash table is used for the call return cache. | 01-01-2009 |
20090113131 | METHOD AND APPARATUS FOR TRACKING LOAD-MARKS AND STORE-MARKS ON CACHE LINES - Embodiments of the present invention provide a system that handles load-marked and store-marked cache lines. Upon asserting a load-mark or a store-mark for a cache line during a given phase of operation, the system adds an entry to a private buffer and in doing so uses an address of the cache line as a key for the entry in the private buffer. The system also updates the entry in the private buffer with information about the load-mark or store-mark and uses pointers for the entry and for the last entry added to the private buffer to add the entry to a sequence of private buffer entries placed during the phase of operation. The system then uses the entries in the private buffer to remove the load-marks and store-marks from cache lines when the phase of operation is completed. | 04-30-2009 |
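The private buffer in this abstract is essentially an address-keyed map whose entries are also chained in insertion order so the marks can be cleared when the phase completes. A hedged C++ model, with the entry layout and names invented for illustration:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

struct MarkEntry {
    uint64_t line_addr = 0;
    bool load_mark  = false;
    bool store_mark = false;
    int  next       = -1;    // index of the entry added after this one
};

struct PrivateMarkBuffer {
    std::unordered_map<uint64_t, int> index;  // cache-line address is the key
    std::vector<MarkEntry> entries;
    int head = -1, last = -1;                 // phase-ordered chain

    void add_mark(uint64_t line_addr, bool is_store) {
        auto [it, inserted] = index.try_emplace(line_addr, (int)entries.size());
        if (inserted) {
            entries.push_back({line_addr});
            if (last >= 0) entries[last].next = it->second;
            else head = it->second;
            last = it->second;
        }
        MarkEntry& e = entries[it->second];
        (is_store ? e.store_mark : e.load_mark) = true;
    }

    // When the phase of operation completes, walk the chain in insertion
    // order, clear each line's marks via the callback, and reset.
    template <typename ClearFn>
    void complete_phase(ClearFn clear_line) {
        for (int i = head; i != -1; i = entries[i].next)
            clear_line(entries[i]);
        index.clear(); entries.clear(); head = last = -1;
    }
};
```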
20090157965 | Method and Apparatus for Active Software Disown of Cache Line's Exclusive Rights - Software indicates to the hardware of a processing system that its storage modification to a particular cache line is done and that it will not be doing any modification for the time being. With this indication, the processor actively releases its exclusive ownership by updating the line's ownership from exclusive to read-only (or shared) in its own cache directory and in the storage controller (SC). By actively giving up the exclusive rights, another processor can immediately be given exclusive ownership of that cache line without waiting on any processor's explicit cross-invalidate acknowledgement. This invention also describes the hardware design needed to provide this support. | 06-18-2009 |
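Operationally, the disown is a voluntary downgrade of a directory entry from exclusive to shared, sparing a later writer the cross-invalidate round trip. A toy C++ sketch; the state names and the single-state-per-line directory are simplifying assumptions:

```cpp
#include <cstdint>
#include <unordered_map>

enum class Ownership { Invalid, Shared, Exclusive };

struct StorageController {
    std::unordered_map<uint64_t, Ownership> directory;  // line -> state

    // Software hint: this core has finished modifying the line for now.
    void disown(uint64_t line_addr) {
        auto it = directory.find(line_addr);
        if (it != directory.end() && it->second == Ownership::Exclusive) {
            // Downgrade to read-only/shared, so a later exclusive
            // requester needs no explicit cross-invalidate acknowledgement.
            it->second = Ownership::Shared;
        }
    }
};
```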
20090240889 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR CROSS-INVALIDATION HANDLING IN A MULTI-LEVEL PRIVATE CACHE - A method, system, and computer program product for cross-invalidation handling in a multi-level private cache are provided. The system includes a processor. The processor includes fetch address register logic in communication with a level 1 data cache, a level 1 instruction cache, a level 2 cache, and a higher level cache. The processor also includes a set of cross-invalidate snapshot counters implemented in the fetch address register. Each cross-invalidate snapshot counter tracks the number of pending higher-level cross-invalidations received before new data for the corresponding cache miss is returned from the higher-level cache. The processor also includes logic executing on the fetch address register for handling level 1 data cache misses and interfacing with the level 2 cache. In response to the new data, and upon determining that older cross-invalidations are pending, the new data is prevented from being used by the processor. | 09-24-2009 |
20090271572 | Dynamically Re-Classifying Data In A Shared Cache - In one embodiment, the present invention includes a method for determining if a state of data is indicative of a first class of data, re-classifying the data from a second class to the first class based on the determination, and moving the data to a first portion of a shared cache associated with a first requester unit based on the re-classification. Other embodiments are described and claimed. | 10-29-2009 |
20100042786 | SNOOP-BASED PREFETCHING - A processing system is disclosed. The processing system includes a memory and a first core configured to process applications. The first core includes a first cache. The processing system includes a mechanism configured to capture a sequence of addresses of the application that miss the first cache in the first core and to place the sequence of addresses in a storage array; and a second core configured to process at least one software algorithm. The at least one software algorithm utilizes the sequence of addresses from the storage array to generate a sequence of prefetch addresses. The second core issues prefetch requests for the sequence of the prefetch addresses to the memory to obtain prefetched data and the prefetched data is provided to the first core if requested. | 02-18-2010 |
20100138607 | Increasing concurrency and controlling replication in a multi-core cache hierarchy - In one embodiment, the present invention includes a directory of a private cache hierarchy to maintain coherency between data stored in the cache hierarchy, where the directory is to enable concurrent cache-to-cache transfer of data to two private caches. Other embodiments are described and claimed. | 06-03-2010 |
20100146210 | METHOD TO VERIFY AN IMPLEMENTED COHERENCY ALGORITHM OF A MULTI PROCESSOR ENVIRONMENT - A method to verify an implemented coherency algorithm of a multi processor environment on a single processor model is described, comprising the steps of: | 06-10-2010 |
20100268881 | CACHE REGION CONCEPT - A method to associate a storage policy with a cache region is disclosed. In this method, a cache region associated with an application is created. The application runs on virtual machines, where a first virtual machine has a local memory cache that is private to the first virtual machine. The first virtual machine additionally has a shared memory cache that is shared by the first virtual machine and a second virtual machine. Additionally, the cache region is associated with a storage policy. Here, the storage policy specifies that a first copy of an object to be stored in the cache region is to be stored in the local memory cache and that a second copy of the object is to be stored in the shared memory cache. | 10-21-2010 |
20110145501 | CACHE SPILL MANAGEMENT TECHNIQUES - An apparatus and method are described herein for intelligently spilling cache lines. The usefulness of cache lines previously spilled from a source cache is learned, such that later evictions of useful cache lines from the source cache are intelligently selected for spill. Furthermore, another learning mechanism, cache spill prediction, may be implemented separately or in conjunction with usefulness prediction. Cache spill prediction is capable of learning the effectiveness of remote caches at holding spilled cache lines for the source cache. As a result, cache lines can be intelligently selected for spill and intelligently distributed among remote caches based on the effectiveness of each remote cache in holding spilled cache lines for the source cache. | 06-16-2011 |
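Both learning mechanisms here map naturally onto saturating counters: one tracking whether spilled lines are later re-referenced at all, and one per remote cache scoring how well that cache retains spills. The C++ sketch below assumes 2-bit and 4-bit counters and four remote caches purely for illustration:

```cpp
#include <array>

struct SpillPredictor {
    // 2-bit saturating usefulness counter for spills overall.
    int usefulness = 2;                          // start weakly confident
    void spill_was_reused() { if (usefulness < 3) ++usefulness; }
    void spill_was_wasted() { if (usefulness > 0) --usefulness; }
    bool should_spill() const { return usefulness >= 2; }

    // 4-bit saturating score per remote cache: how effective that cache
    // has been at holding spilled lines until they were re-referenced.
    std::array<int, 4> remote_score{};           // four remote caches assumed
    void remote_hit(int c)   { if (remote_score[c] < 15) ++remote_score[c]; }
    void remote_evict(int c) { if (remote_score[c] > 0)  --remote_score[c]; }

    // Distribute the spill to the remote cache with the best track record.
    int pick_remote() const {
        int best = 0;
        for (int c = 1; c < (int)remote_score.size(); ++c)
            if (remote_score[c] > remote_score[best]) best = c;
        return best;
    }
};
```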
20110252200 | COHERENT MEMORY SCHEME FOR HETEROGENEOUS PROCESSORS - Systems, methods, and devices for maintaining cache coherence between two or more heterogeneous processors are provided. In accordance with one embodiment, such an electronic device may include memory, a first processing unit having a first characteristic memory usage rate, and a second processing unit having a second characteristic memory usage rate lower than the first. The first and second processing units may share at least a portion of the memory and one or both of the first and second processing units may maintain internal cache coherence at a first granularity, while maintaining cache coherence between the first processing unit and the second processing unit at a second granularity. The first granularity may be finer than the second granularity. | 10-13-2011 |
20120079201 | SYSTEM AND METHOD FOR EXPLICITLY MANAGING CACHE COHERENCE - One embodiment of the present invention sets forth an extension to a cache coherence protocol with two explicit control states, P (private) and R (read-only), that provide explicit program control of cache lines for which the program logic can guarantee correct behavior. In the private state, only the owner of a cache line can access the cache line for read or write operations. In the read-only state, only read operations can be performed on the cache line, thereby disallowing write operations. | 03-29-2012 |
20120226865 | NETWORK-ON-CHIP SYSTEM INCLUDING ACTIVE MEMORY PROCESSOR - Disclosed is a network-on-chip system including an active memory processor that addresses the communication latency increased by multiple processors and memories. The network-on-chip system includes a plurality of processing elements that request an active memory operation for a predetermined operation on a shared memory in order to reduce access latency of the shared memory, and an active memory processor connected to the processing elements through a network, which stores codes for processing custom transactions in response to the active memory operation requests, performs an operation on addresses or data stored in a shared cache memory or the shared memory based on the codes, and transmits the operation result to the processing elements. | 09-06-2012 |
20130097382 | MULTI-CORE PROCESSOR SYSTEM, COMPUTER PRODUCT, AND CONTROL METHOD - A multi-core processor system includes a first processor that among cores of the multi-core processor, identifies other cores having a cache miss-hit rate lower than that of a given core storing a specific program in a cache, based on a task information volume of each core; a control circuit that migrates the specific program from the cache of the given core to a cache of the identified core; and a second processor that, after the specific program is migrated to the cache of the identified core, sets as a write-inhibit area, an area that is of the cache of the identified core and to which the specific program is stored. | 04-18-2013 |
20130254484 | System and Method for Conditionally Sending a Request for Data to a Home Node - A system, method, and computer program product are provided for conditionally sending a request for data to a home node. In operation, a first request for data is sent to a first cache of a node. Additionally, if the data does not exist in the first cache, a second request for the data is sent to a second cache of the node. Furthermore, a third request for the data is conditionally sent to a home node. | 09-26-2013 |
20130346693 | Data Cache Method, Device, and System in a Multi-Node System - A data cache method, device, and system in a multi-node system are provided. The method includes: dividing a cache area of a cache medium into multiple sub-areas, where each sub-area corresponds to a node in the system; dividing each of the sub-areas into a thread cache area and a global cache area; when a process reads a file, detecting a read frequency of the file; when the read frequency of the file is greater than a first threshold and the size of the file does not exceed a second threshold, caching the file in the thread cache area; or, when the read frequency of the file is greater than the first threshold and the size of the file exceeds the second threshold, caching the file in the global cache area. Thus, the overhead of remote access in the system is reduced, and the I/O performance of the system is improved. | 12-26-2013 |
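The placement rule is a two-threshold decision on read frequency and file size. A minimal C++ sketch, with the threshold values left as parameters since the abstract does not fix them:

```cpp
#include <cstddef>

enum class Placement { NotCached, ThreadCacheArea, GlobalCacheArea };

// read_freq is the detected read frequency of the file.
Placement place_file(unsigned read_freq, std::size_t file_size,
                     unsigned freq_threshold, std::size_t size_threshold) {
    if (read_freq <= freq_threshold)
        return Placement::NotCached;         // not hot enough to cache
    return (file_size <= size_threshold)
               ? Placement::ThreadCacheArea  // small hot file: per-thread
               : Placement::GlobalCacheArea; // large hot file: shared area
}
```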
20140006713 | Cache Collaboration in Tiled Processor Systems | 01-02-2014 |
20140032843 | MANAGEMENT OF CHIP MULTIPROCESSOR COOPERATIVE CACHING BASED ON EVICTION RATE - Techniques described herein generally include methods and systems related to cooperatively caching data in a chip multiprocessor. Cooperatively caching of data in the chip multiprocessor is managed based on an eviction rate of data blocks from private caches associated with each individual processor core in the chip multiprocessor. The eviction rate of data blocks from each private cache in the cooperative caching system is monitored and used to determine an aggregate eviction rate for all private caches. When the aggregate eviction rate exceeds a predetermined value, for example the threshold beyond which network flooding can occur, the cooperative caching system for the chip multiprocessor is disabled, thereby avoiding network flooding of the chip multiprocessor. | 01-30-2014 |
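The disable decision reduces to summing per-cache eviction rates and comparing against a flood threshold. A hedged C++ sketch; the rate units and the single aggregate threshold are assumptions:

```cpp
#include <numeric>
#include <vector>

struct CooperativeCacheManager {
    std::vector<double> eviction_rate;   // per private cache, evictions/sec
    double flood_threshold;              // rate beyond which flooding looms
    bool cooperative_caching_enabled = true;

    // Called periodically with refreshed per-cache monitoring data.
    void update() {
        double aggregate = std::accumulate(eviction_rate.begin(),
                                           eviction_rate.end(), 0.0);
        // Disable spilling between private caches before eviction traffic
        // can flood the on-chip network.
        cooperative_caching_enabled = (aggregate <= flood_threshold);
    }
};
```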
20140173204 | ANALYZING UPDATE CONDITIONS FOR SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for analyzing update conditions for shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a compare-and-swap operation header. The compare-and-swap operation header includes an SVD key, a first SVD address, and an updated first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD in response to receiving the compare-and-swap operation header. Embodiments also include the runtime optimizer determining whether the second SVD address matches the first SVD address and transmitting a result indicating whether the second SVD address matches the first SVD address. | 06-19-2014 |
20140173205 | ANALYZING UPDATE CONDITIONS FOR SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for analyzing update conditions for shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a compare-and-swap operation header. The compare-and-swap operation header includes an SVD key, a first SVD address, and an updated first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD in response to receiving the compare-and-swap operation header. Embodiments also include the runtime optimizer determining whether the second SVD address matches the first SVD address and transmitting a result indicating whether the second SVD address matches the first SVD address. | 06-19-2014 |
20140173206 | Power Gating A Portion Of A Cache Memory - In an embodiment, a processor includes multiple tiles, each including a core and a tile cache hierarchy. This tile cache hierarchy includes a first level cache, a mid-level cache (MLC) and a last level cache (LLC), and each of these caches is private to the tile. A controller coupled to the tiles includes a cache power control logic to receive utilization information regarding the core and the tile cache hierarchy of a tile and to cause the LLC of the tile to be independently power gated, based at least in part on this information. Other embodiments are described and claimed. | 06-19-2014 |
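A plausible reading of the control loop: when both the core and its LLC sit idle, flush and power-gate that tile's LLC alone, and bring it back when demand returns. The utilization inputs and thresholds below are invented for illustration:

```cpp
struct TilePowerController {
    bool llc_gated = false;

    // core_util and llc_util in [0.0, 1.0], derived from the per-tile
    // utilization information mentioned in the abstract.
    void evaluate(double core_util, double llc_util) {
        if (!llc_gated && core_util < 0.10 && llc_util < 0.05) {
            llc_gated = true;    // flush, then power-gate this tile's LLC only
        } else if (llc_gated && (core_util > 0.25 || llc_util > 0.10)) {
            llc_gated = false;   // restore power as demand returns
        }
    }
};
```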
20140201446 | HIGH BANDWIDTH FULL-BLOCK WRITE COMMANDS - A micro-architecture may provide hardware and software support for a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further comprise determining whether copies of the requested data are present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to indicate the presence of copies of the requested data, and sending a write response message after the latest value of the requested data and all invalidation acknowledgements have been received. | 07-17-2014 |
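The protocol gates the write response on two events: arrival of the latest data value and receipt of the last invalidation acknowledgement. A toy C++ model of the record-keeping structure, with message names and network stubs as assumptions:

```cpp
#include <vector>

// Record-keeping entry for one full-block write. The message functions
// are stubs standing in for the on-chip network.
struct FullBlockWrite {
    int  pending_acks = 0;
    bool have_latest_data = false;

    void start(const std::vector<int>& sharers) {
        pending_acks = (int)sharers.size();
        for (int cache_id : sharers)
            send_invalidate(cache_id);   // ask each holder to drop its copy
    }
    void on_invalidate_ack() { --pending_acks; maybe_respond(); }
    void on_latest_data()    { have_latest_data = true; maybe_respond(); }

    // The write response is released only after the latest data value and
    // all invalidation acknowledgements have been received.
    void maybe_respond() {
        if (pending_acks == 0 && have_latest_data)
            send_write_response();
    }

    void send_invalidate(int /*cache_id*/) {}
    void send_write_response() {}
};
```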
20140351516 | METHOD AND SYSTEM FOR VIRTUALIZATION OF SOFTWARE APPLICATIONS - A method of virtualizing an application to execute on a plurality of operating systems without installation. The method includes creating an input configuration file, or template, for each operating system. The templates each include a collection of configurations that were made by the application during installation on a computing device executing that operating system. The templates are combined into a single application template having a layer including the collection of configurations for each operating system. The collection of configurations includes files and registry entries; it also identifies and configures environment variables, systems, and the like. Files in the collection of configurations, and references to those files, may be replaced with references to files stored on installation media. The application template is used to build an executable of the virtualized application. The application template may be incorporated into a manifest listing other application templates and made available to users from a website. | 11-27-2014 |
20140359220 | Scatter/Gather Capable System Coherent Cache - In accordance with some embodiments, a scatter/gather memory approach may be enabled that is exposed or backed by system memory and uses conventional tags and addresses. Thus, such a technique may be more amenable to conventional software developers and their conventional techniques. | 12-04-2014 |
20150106567 | Computer Processor Employing Cache Memory With Per-Byte Valid Bits - A computer processing system with a hierarchical memory system that associates a number of valid bits with each cache line of the hierarchical memory system. The valid bits are provided for each cache line stored in a respective cache and make explicit which bytes of the given cache line are semantically defined and which are not. Memory requests to the cache(s) of the hierarchical memory system can include an address specifying a requested cache line as well as a mask that includes a number of bits, each corresponding to a different byte of the requested cache line. The values of the bits of the byte mask indicate which bytes of the requested cache line are to be returned from the hierarchical memory system. The memory request is processed by the top level cache of the hierarchical memory system, looking for one or more valid bytes of the requested cache line corresponding to the target address of the memory request. The valid bytes of the cache line corresponding to the byte mask can be identified by reading out the valid bits and data bytes stored by the cache for putative matching cache lines for those data bytes that are specified by the byte mask of the memory request, while ignoring the valid bits and data bytes for those data bytes that are not specified by the byte mask. Extensions to shared multiprocessor systems are also described and claimed. | 04-16-2015 |
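The byte-mask check itself is a one-line bit test once the valid bits are modeled as a bitmap. A sketch assuming a 64-byte line, so one 64-bit word holds the per-byte valid bits:

```cpp
#include <cstdint>

// A 64-byte cache line with one valid bit per byte: bit i of `valid`
// set means data[i] is semantically defined.
struct LineWithValidBits {
    uint64_t valid;
    uint8_t  data[64];
};

// A masked request hits if every byte it names is valid; valid bits for
// bytes outside the mask are ignored, as the abstract describes.
bool mask_satisfied(const LineWithValidBits& line, uint64_t byte_mask) {
    return (line.valid & byte_mask) == byte_mask;
}
```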
20150143044 | MECHANISM FOR SHARING PRIVATE CACHES IN A SOC - Systems, processors, and methods for sharing an agent's private cache with other agents within a SoC. Many agents in the SoC have a private cache in addition to the shared caches and memory of the SoC. If an agent's processor is shut down or operating at less than full capacity, the agent's private cache can be shared with other agents. When a requesting agent generates a memory request and the memory request misses in the memory cache, the memory cache can allocate the memory request in a separate agent's cache rather than allocating the memory request in the memory cache. | 05-21-2015 |
20150363319 | FAST WARM-UP OF HOST FLASH CACHE AFTER NODE FAILOVER - Examples described herein include a system for storing data. The data storage system retrieves a first set of metadata associated with data stored on a first cache memory, and stores the first set of metadata on a primary storage device. The primary storage device is a backing store for the data stored on the first cache memory. The storage system selectively copies data from the primary storage device to a second cache memory based, at least in part, on the first set of metadata stored on the primary storage device. For some aspects, the storage system may copy the data from the primary storage device to the second cache memory upon determining that the first cache memory is in a failover state. | 12-17-2015 |
20150370707 | DISUNITED SHARED-INFORMATION AND PRIVATE-INFORMATION CACHES - Systems and methods pertain to a multiprocessor system comprising disunited cache structures. A first private-information cache is coupled to a first processor of the multiprocessor system. The first private-information cache is configured to store information that is private to the first processor. A first shared-information cache which is disunited from the first private-information cache is also coupled to the first processor. The first shared-information cache is configured to store information that is shared/shareable between the first processor and one or more other processors of the multiprocessor system. | 12-24-2015 |
20160026570 | MIRRORING MULTIPLE WRITEABLE STORAGE ARRAYS - Systems, methods, and computer program products for mirroring dual writeable storage arrays are provided. Various embodiments provide configurations including two or more mirrored storage arrays that are each capable of being written to by different hosts. When commands to write data to corresponding mirrored data blocks within the respective storage arrays are received from different hosts at substantially the same time, write priority for writing data to the mirrored data blocks is given to one of the storage arrays based on a predetermined criterion or multiple predetermined criteria. | 01-28-2016 |
20160041909 | MOVING DATA BETWEEN CACHES IN A HETEROGENEOUS PROCESSOR SYSTEM - Apparatus, computer readable medium, integrated circuit, and method of moving a plurality of data items to a first cache or a second cache are presented. The method includes receiving an indication that the first cache requested the plurality of data items. The method includes storing information indicating that the first cache requested the plurality of data items. The information may include an address for each of the plurality of data items. The method includes determining based at least on the stored information to move the plurality of data items to the second cache. The method includes moving the plurality of data items to the second cache. The method may include determining a time interval between receiving the indication that the first cache requested the plurality of data items and moving the plurality of data items to the second cache. A scratch pad memory is disclosed. | 02-11-2016 |
20160055086 | DYNAMIC CACHE PARTITIONING APPARATUS AND METHOD - A dynamic cache partitioning apparatus and method are described, which may be used in a multi-core system having a plurality of cores. The apparatus includes at least one multi-core processor and at least one shared cache shared by a plurality of processing cores, the shared cache including a plurality of cache lines and being partitioned into a plurality of sub-caches comprising at least one private sub-cache for each single processing core and at least one public sub-cache shared by all of the processing cores; means for detecting shared cache hit information for each respective processing core of the plurality of processing cores; and means for determining whether a cache line should be allocated to a public sub-cache or to a private sub-cache associated with one of the plurality of processing cores. | 02-25-2016 |
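One way to realize the "means for determining" is to track per-core hit counts for a line and route multi-core lines to the public sub-cache. The tracking scheme below is an assumption, not the claimed mechanism:

```cpp
#include <vector>

enum class SubCache { Private, Public };

// Decide where to allocate a cache line from per-core hit information:
// lines recently hit by more than one core go to the public sub-cache,
// single-core lines stay in that core's private sub-cache.
SubCache choose_sub_cache(const std::vector<unsigned>& hits_per_core) {
    int sharers = 0;
    for (unsigned h : hits_per_core)
        if (h > 0) ++sharers;
    return (sharers > 1) ? SubCache::Public : SubCache::Private;
}
```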
20160077970 | Virtual Shared Cache Mechanism in a Processing Device - In accordance with embodiments disclosed herein, systems and methods are provided for a virtual shared cache mechanism. A processing device includes a plurality of clusters allocated into a virtual private shared cache. Each of the clusters includes a plurality of cores and a plurality of cache slices co-located within the plurality of cores. The processing device also includes a virtual shared cache including the plurality of clusters, such that the cache data in the plurality of cache slices is shared among the plurality of clusters. | 03-17-2016 |
20160196213 | DEMOTE INSTRUCTION FOR RELINQUISHING CACHE LINE OWNERSHIP | 07-07-2016 |
20160378542 | MULTITHREADED TRANSACTIONS - Embodiments relate to multithreaded transactions. An aspect includes assigning a same transaction identifier (ID) corresponding to the multithreaded transaction to a plurality of threads of the multithreaded transaction, wherein the plurality of threads execute the multithreaded transaction in parallel. Another aspect includes determining one or more memory areas that are owned by the multithreaded transaction. Another aspect includes receiving a memory access request from a requester that is directed to a memory area that is owned by the transaction. Yet another aspect includes based on determining that the requester has a transaction ID that matches the transaction ID of the multithreaded transaction, performing the memory access request without aborting the multithreaded transaction. | 12-29-2016 |
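The key rule in this last abstract is that an access whose transaction ID matches the owning transaction's ID proceeds without aborting it. A minimal C++ sketch of that check, with the ownership map and names invented for illustration:

```cpp
#include <cstdint>
#include <unordered_map>

using TxId = uint64_t;

struct TxOwnershipTable {
    std::unordered_map<uint64_t, TxId> owner;  // memory area -> owning tx

    // True if the access may proceed without aborting the owning
    // transaction: every thread of a multithreaded transaction carries
    // the same ID, so sibling threads pass this check.
    bool may_access(uint64_t area, TxId requester_tx) const {
        auto it = owner.find(area);
        if (it == owner.end()) return true;    // area not owned by any tx
        return it->second == requester_tx;
    }
};
```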