Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Shlomo Raikin

Shlomo Raikin, Ofer IL

Patent application number	Description	Published
20130179643	OBSCURING MEMORY ACCESS PATTERNS IN CONJUNCTION WITH DEADLOCK DETECTION OR AVOIDANCE - Methods, apparatus and systems for memory access obscuration are provided. A first embodiment provides memory access obscuration in conjunction with deadlock avoidance. Such embodiment utilizes processor features including an instruction to enable monitoring of specified cache lines and an instruction that sets a status bit responsive to any foreign access (e.g., write or eviction due to a read) to the specified lines. A second embodiment provides memory access obscuration in conjunction with deadlock detection. Such embodiment utilizes the monitoring feature, as well as handler registration. A user-level handler may be asynchronously invoked responsive to a foreign write to any of the specified lines. Invocation of the handler more frequently than expected indicates that a deadlock may have been encountered. In such case, a deadlock policy may be enforced. Other embodiments are also described and claimed.	07-11-2013
20130326145	METHODS AND APPARATUS FOR EFFICIENT COMMUNICATION BETWEEN CACHES IN HIERARCHICAL CACHING DESIGN - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers and one or more dataless buffers. The data buffers in such an embodiment being communicably interfaced with the data bus, and each of the one or more data buffers having a buffer memory to buffer a full cache line, one or more control bits to indicate state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment being incapable of storing a full cache line and having one or more control bits to indicate state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.	12-05-2013
20140006746	VIRTUAL MEMORY ADDRESS RANGE REGISTER	01-02-2014
20140156935	Unified Exclusive Memory - In one embodiment, a processor includes at least one execution unit, a near memory, and memory management logic. The memory management logic may be to manage the near memory and a far memory as a unified exclusive memory, where the far memory is external to the processor. Other embodiments are described and claimed.	06-05-2014
20140189192	APPARATUS AND METHOD FOR A MULTIPLE PAGE SIZE TRANSLATION LOOKASIDE BUFFER (TLB) - An apparatus and method for implementing a multiple page size translation lookaside buffer (TLB). For example, a method according to one embodiment comprises: reading a first group of bits and a second group of bits from a linear address; determining whether the linear address is associated with a large page size or a small page size; identifying a first cache set using the first group of bits if the linear address is associated with a first page size and identifying a second cache set using the second group of bits if the linear address is associated with a second page size; and identifying a first cache way if the linear address is associated with a first page size and identifying a second cache way if the linear address is associated with a second page size.	07-03-2014
20140189254	Snoop Filter Having Centralized Translation Circuitry and Shadow Tag Array - A processor is described that includes a plurality of processing cores. The processor includes an interconnection network coupled to each of said processing cores. The processor includes snoop filter logic circuitry coupled to the interconnection network and associated with coherence plane logic circuitry of the processor. The snoop filter logic circuitry contains circuitry to hold information that identifies not only which of the processing cores are caching specific cache lines that are cached by the processing cores, but also, where in respective caches of the processing cores the cache lines are cached.	07-03-2014
20140189261	ACCESS TYPE PROTECTION OF MEMORY RESERVED FOR USE BY PROCESSOR LOGIC - A processor of an aspect includes operation mode check logic to determine whether to allow an attempted access to an operation mode and access type protected memory based on an operation mode that is to indicate whether the attempted access is by an on-die processor logic. Access type check logic is to determine whether to allow the attempted access to the operation mode and access type protected memory based on an access type of the attempted access to the operation mode and access type protected memory. Protection logic is coupled with the operation mode check logic and is coupled with the access type check logic. The protection logic is to deny the attempted access to the operation mode and access type protected memory if at least one of the operation mode check logic and the access type check logic determines not to allow the attempted access.	07-03-2014
20140192069	APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTION - An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.	07-10-2014
20140208031	APPARATUS AND METHOD FOR MEMORY-HIERARCHY AWARE PRODUCER-CONSUMER INSTRUCTIONS - An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method comprises: A method for transferring a chunk of data from a producer core of a central processing unit (CPU) to consumer core of the CPU, comprising: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the fill buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core.	07-24-2014
20140215161	BALANCED P-LRU TREE FOR A "MULTIPLE OF 3" NUMBER OF WAYS CACHE - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing a balanced P-LRU tree for a “multiple of 3” number of ways cache. For example, in one embodiment, such means may include an integrated circuit having a cache and a plurality of ways. In such an embodiment the plurality of ways include a quantity that is a multiple of three and not a power of two, and further in which the plurality of ways are organized into a plurality of pairs. In such an embodiment, means further include a single bit for each of the plurality of pairs, in which each single bit is to operate as an intermediate level decision node representing the associated pair of ways and a root level decision node having exactly two individual bits to point to one of the single bits to operate as the intermediate level decision nodes representing an associated pair of ways. In this exemplary embodiment, the total number of bits is N−1, wherein N is the total number of ways in the plurality of ways. Alternative structures are also presented for full LRU implementation, a “multiple of 5” number of cache ways, and variations of the “multiple of 3” number of cache ways.	07-31-2014
20140223105	METHOD AND APPARATUS FOR CUTTING SENIOR STORE LATENCY USING STORE PREFETCHING - In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.	08-07-2014
20140245273	PROTECTING THE INTEGRITY OF BINARY TRANSLATED CODE - The technologies provided herein relate to protecting the integrity of original code that has been optimized. For example, a processor may perform a fetch operation to obtain specified code from a memory. During execution, the code may be optimized and stored in a portion of the memory. The processor may obtain the optimized code from the portion of the memory. An entry of a first table may be modified to indicate a relationship between the particular code and the optimized code. One or more entries of a second table may be modified to specify the one or more physical memory locations. Each of the one or more entries of the second table may correspond to the entry of the first table. The processor may execute the optimized code when each of the one or more entries of the second table are valid.	08-28-2014
20150074373	SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE - Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.	03-12-2015

Patent applications by Shlomo Raikin, Ofer IL

Shlomo Raikin, Ofer, Z IL

Patent application number	Description	Published
20140189274	APPARATUS AND METHOD FOR PAGE WALK EXTENSION FOR ENHANCED SECURITY CHECKS - An apparatus and method for managing a protection table by a processor. For example, a processor according to one embodiment of the invention comprises: protection table management logic to manage a protection table, the protection table having an entry for each protected page or each group of protected pages in memory; wherein the protection table management logic prevents direct access to the protection table by user application program code and operating system program code but permits direct access by the processor.	07-03-2014

Shlomo Raikin, Moshay Sde IL

Patent application number	Description	Published
20140223226	APPARATUS AND METHOD FOR DETECTING AND RECOVERING FROM DATA FETCH ERRORS - An apparatus and method are described for detecting and correcting data fetch errors within a processor core. For example, one embodiment of an instruction processing apparatus for detecting and recovering from data fetch errors comprises: at least one processor core having a plurality of instruction processing stages including a data fetch stage and a retirement stage; and error processing logic in communication with the processing stages to perform the operations of: detecting an error associated with data in response to a data fetch operation performed by the data fetch stage; and responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core within the retirement stage.	08-07-2014

Shlomo Raikin, Kibutz Yassur IL

Patent application number	Description	Published
20150212817	Direct IO access from a CPU's instruction stream - A method for network access of remote memory directly from a local instruction stream using conventional loads and stores. In cases where network IO access (a network phase) cannot overlap a compute phase, a direct network access from the instruction stream greatly decreases latency in CPU processing. The network is treated as yet another memory that can be directly read from, or written to, by the CPU. Network access can be done directly from the instruction stream using regular loads and stores. Example scenarios where synchronous network access can be beneficial are SHMEM (symmetric hierarchical memory access) usages (where the program directly reads/writes remote memory), and scenarios where part of system memory (for example DDR) can reside over a network and made accessible by demand to different CPUs.	07-30-2015

Shlomo Raikin, Kibbutz Yassur IL

Patent application number	Description	Published
20150222547	EFFICIENT MANAGEMENT OF NETWORK TRAFFIC IN A MULTI-CPU SERVER - A Network Interface Controller (NIC) includes a network interface, a peer interface and steering logic. The network interface is configured to receive incoming packets from a communication network. The peer interface is configured to communicate with a peer NIC not via the communication network. The steering logic is configured to classify the packets received over the network interface into first incoming packets that are destined to a local Central Processing Unit (CPU) served by the NIC, and second incoming packets that are destined to a remote CPU served by the peer NIC, to forward the first incoming packets to the local CPU, and to forward the second incoming packets to the peer NIC over the peer interface not via the communication network.	08-06-2015

Shlomo Raikin, Moshav Ofer IL

Patent application number	Description	Published
20150261434	STORAGE SYSTEM AND SERVER - A data storage system includes a storage server, including non-volatile memory (NVM) and a server network interface controller (NIC), which couples the storage server to a network. A host computer includes a host central processing unit (CPU), a host memory and a host NIC, which couples the host computer to the network. The host computer runs a driver program that is configured to receive, from processes running on the host computer, commands in accordance with a protocol defined for accessing local storage devices connected to a peripheral component interface bus of the host computer, and upon receiving a storage access command in accordance with the protocol, to initiate a remote direct memory access (RDMA) operation to be performed by the host and server NICs so as to execute on the storage server, via the network, a storage transaction specified by the command.	09-17-2015
20150261720	ACCESSING REMOTE STORAGE DEVICES USING A LOCAL BUS PROTOCOL - A method for data storage includes configuring a driver program on a host computer to receive commands in accordance with a protocol defined for accessing local storage devices connected to a peripheral component interface bus of the host computer. When the driver program receives, from an application program running on the host computer a storage access command in accordance with the protocol, specifying a storage transaction, a remote direct memory access (RDMA) operation is performed by a network interface controller (NIC) connected to the host computer so as to execute the storage transaction via a network on a remote storage device.	09-17-2015