Patent application number | Description | Published |
20130179643 | OBSCURING MEMORY ACCESS PATTERNS IN CONJUNCTION WITH DEADLOCK DETECTION OR AVOIDANCE - Methods, apparatus and systems for memory access obscuration are provided. A first embodiment provides memory access obscuration in conjunction with deadlock avoidance. Such embodiment utilizes processor features including an instruction to enable monitoring of specified cache lines and an instruction that sets a status bit responsive to any foreign access (e.g., write or eviction due to a read) to the specified lines. A second embodiment provides memory access obscuration in conjunction with deadlock detection. Such embodiment utilizes the monitoring feature, as well as handler registration. A user-level handler may be asynchronously invoked responsive to a foreign write to any of the specified lines. Invocation of the handler more frequently than expected indicates that a deadlock may have been encountered. In such case, a deadlock policy may be enforced. Other embodiments are also described and claimed. | 07-11-2013 |
20130326145 | METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus. | 12-05-2013 |
20140006746 | VIRTUAL MEMORY ADDRESS RANGE REGISTER | 01-02-2014 |
20140156935 | Unified Exclusive Memory - In one embodiment, a processor includes at least one execution unit, a near memory, and memory management logic. The memory management logic may be to manage the near memory and a far memory as a unified exclusive memory, where the far memory is external to the processor. Other embodiments are described and claimed. | 06-05-2014 |
20140189192 | APPARATUS AND METHOD FOR A MULTIPLE PAGE SIZE TRANSLATION LOOKASIDE BUFFER (TLB) - An apparatus and method for implementing a multiple page size translation lookaside buffer (TLB). For example, a method according to one embodiment comprises: reading a first group of bits and a second group of bits from a linear address; determining whether the linear address is associated with a large page size or a small page size; identifying a first cache set using the first group of bits if the linear address is associated with a first page size and identifying a second cache set using the second group of bits if the linear address is associated with a second page size; and identifying a first cache way if the linear address is associated with a first page size and identifying a second cache way if the linear address is associated with a second page size. | 07-03-2014 |
20140189254 | Snoop Filter Having Centralized Translation Circuitry and Shadow Tag Array - A processor is described that includes a plurality of processing cores. The processor includes an interconnection network coupled to each of said processing cores. The processor includes snoop filter logic circuitry coupled to the interconnection network and associated with coherence plane logic circuitry of the processor. The snoop filter logic circuitry contains circuitry to hold information that identifies not only which of the processing cores are caching specific cache lines that are cached by the processing cores, but also, where in respective caches of the processing cores the cache lines are cached. | 07-03-2014 |
20140189261 | ACCESS TYPE PROTECTION OF MEMORY RESERVED FOR USE BY PROCESSOR LOGIC - A processor of an aspect includes operation mode check logic to determine whether to allow an attempted access to an operation mode and access type protected memory based on an operation mode that is to indicate whether the attempted access is by an on-die processor logic. Access type check logic is to determine whether to allow the attempted access to the operation mode and access type protected memory based on an access type of the attempted access to the operation mode and access type protected memory. Protection logic is coupled with the operation mode check logic and is coupled with the access type check logic. The protection logic is to deny the attempted access to the operation mode and access type protected memory if at least one of the operation mode check logic and the access type check logic determines not to allow the attempted access. | 07-03-2014 |
20140192069 | APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTION - An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU. | 07-10-2014 |
20140208031 | APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTIONS - An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method comprises: A method for transferring a chunk of data from a producer core of a central processing unit (CPU) to consumer core of the CPU, comprising: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the fill buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core. | 07-24-2014 |
20140215161 | BALANCED P-LRU TREE FOR A "MULTIPLE OF 3" NUMBER OF WAYS CACHE - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing a balanced P-LRU tree for a “multiple of 3” number of ways cache. For example, in one embodiment, such means may include an integrated circuit having a cache and a plurality of ways. In such an embodiment the plurality of ways include a quantity that is a multiple of three and not a power of two, and further in which the plurality of ways are organized into a plurality of pairs. In such an embodiment, means further include a single bit for each of the plurality of pairs, in which each single bit is to operate as an intermediate level decision node representing the associated pair of ways and a root level decision node having exactly two individual bits to point to one of the single bits to operate as the intermediate level decision nodes representing an associated pair of ways. In this exemplary embodiment, the total number of bits is N−1, wherein N is the total number of ways in the plurality of ways. Alternative structures are also presented for full LRU implementation, a “multiple of 5” number of cache ways, and variations of the “multiple of 3” number of cache ways. | 07-31-2014 |
20140223105 | METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires. | 08-07-2014 |
20140245273 | PROTECTING THE INTEGRITY OF BINARY TRANSLATED CODE - The technologies provided herein relate to protecting the integrity of original code that has been optimized. For example, a processor may perform a fetch operation to obtain specified code from a memory. During execution, the code may be optimized and stored in a portion of the memory. The processor may obtain the optimized code from the portion of the memory. An entry of a first table may be modified to indicate a relationship between the particular code and the optimized code. One or more entries of a second table may be modified to specify the one or more physical memory locations. Each of the one or more entries of the second table may correspond to the entry of the first table. The processor may execute the optimized code when each of the one or more entries of the second table are valid. | 08-28-2014 |
20150074373 | SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE - Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores. | 03-12-2015 |
Patent application number | Description | Published |
20150261434 | STORAGE SYSTEM AND SERVER - A data storage system includes a storage server, including non-volatile memory (NVM) and a server network interface controller (NIC), which couples the storage server to a network. A host computer includes a host central processing unit (CPU), a host memory and a host NIC, which couples the host computer to the network. The host computer runs a driver program that is configured to receive, from processes running on the host computer, commands in accordance with a protocol defined for accessing local storage devices connected to a peripheral component interface bus of the host computer, and upon receiving a storage access command in accordance with the protocol, to initiate a remote direct memory access (RDMA) operation to be performed by the host and server NICs so as to execute on the storage server, via the network, a storage transaction specified by the command. | 09-17-2015 |
20150261720 | ACCESSING REMOTE STORAGE DEVICES USING A LOCAL BUS PROTOCOL - A method for data storage includes configuring a driver program on a host computer to receive commands in accordance with a protocol defined for accessing local storage devices connected to a peripheral component interface bus of the host computer. When the driver program receives, from an application program running on the host computer a storage access command in accordance with the protocol, specifying a storage transaction, a remote direct memory access (RDMA) operation is performed by a network interface controller (NIC) connected to the host computer so as to execute the storage transaction via a network on a remote storage device. | 09-17-2015 |