| Patent application number | Description | Published |
| 20080201532 | System and Method for Intelligent Software-Controlled Cache Injection - A system and method to provide injection of important data directly into a processor's cache location when that processor has previously indicated interest in the data. The memory subsystem at a target processor will determine if the memory address of data to be written to a memory location associated with the target processor is found in a processor cache of the target processor. If it is determined that the memory address is found in a target processor's cache, the data will be directly written to that cache at the same time that the data is being provided to a location in main memory. | 08-21-2008 |
| 20090198762 | Mechanism to Provide Reliability Through Packet Drop Detection - A method and a data processing system for completing checkpoint processing of a distributed job with local tasks communicating with other remote tasks via a host fabric interface (HFI) and assigned HFI window. Each HFI window has a send count and a receive count, which tracks GSM messages that are sent from and received at the HFI window. When a checkpoint is initiated by a master task, each local task forwards the send count and the receive count to the master task. The master task sums the respective counts and then compares the totals to each other. When the send count total is equal to the receive count total, the tasks are permitted to continue processing. However, when the send count total is not equal to the receive count total, the master task notifies each task of the job to rollback to a previous checkpoint or kill the job execution. | 08-06-2009 |
| 20090198891 | Issuing Global Shared Memory Operations Via Direct Cache Injection to a Host Fabric Interface - A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping. | 08-06-2009 |
| 20090198897 | CACHE MANAGEMENT DURING ASYNCHRONOUS MEMORY MOVE OPERATIONS - A data processing system includes a mechanism for completing an asynchronous memory move (AMM) operation in which the processor receives an AMM ST instruction and processes a processor-level move of data in virtual address space and an asynchronous memory mover then completes a physical move of the data within the real address space (memory). A status/control field of the AMM ST instruction includes an indication of a requested treatment of the lower level cache(s) on completion of the AMM operation. When the status/control field indicates an update to at least one cache should be performed, the asynchronous memory mover automatically forwards a copy of the data from the data move to the lower level cache, and triggers an update of a coherency state for a cache line in which the copy of the data is placed. | 08-06-2009 |
| 20090198908 | METHOD FOR ENABLING DIRECT PREFETCHING OF DATA DURING ASYCHRONOUS MEMORY MOVE OPERATION - While an AMM operation is ongoing, a prefetch request for data from the source effective address or the destination effective address triggers a cache injection by the AMM mover (or memory controller) of relevant data from the stream of data being moved in the physical memory. The memory controller forwards the first prefetched line to the prefetch engine and L1 cache. The memory controller also forwards the next cache lines in the sequence of data to the L2 cache and a subsequent set of cache lines to the L3 cache. The memory controller then forwards the remaining data to the destination memory location. Quick access to prefetch data is enabled by buffering the stream of data in the upper caches rather than placing all the moved data within the memory. Also, the memory controller does not overrun the upper caches, by placing moved data into only a subset of the available cache lines of the upper level cache. | 08-06-2009 |
| 20090198917 | SPECIALIZED MEMORY MOVE BARRIER OPERATIONS - An instruction set architecture (ISA) includes an asynchronous memory move (AMM) synchronization (SYNC) instruction. When processor of a data processing system executes the AMM SYNC instruction, the processor prevents an AMM operation generated by a subsequently received/executed AMM ST instruction from proceeding with the data move portion of the AMM operation within the memory subsystem until completion of all ongoing memory access operations within the memory subsystem and fabric. The AMM operation does not wait for a normal barrier operation. The processor forwards the information relevant to initiate the AMM operation to an asynchronous memory mover logic, and signals the logic to not proceed with the AMM operation until signaled of the completion of the AMM SYNC. | 08-06-2009 |
| 20090198918 | Host Fabric Interface (HFI) to Perform Global Shared Memory (GSM) Operations - A data processing system enables global shared memory (GSM) operations across multiple nodes with a distributed EA-to-RA mapping of physical memory. Each node has a host fabric interface (HFI), which includes HFI windows that are assigned to at most one locally-executing task of a parallel job. The tasks perform parallel job execution, but map only a portion of the effective addresses (EAs) of the global address space to the local, real memory of the task's respective node. The HFI window tags all outgoing GSM operations (of the local task) with the job ID, and embeds the target node and HFI window IDs of the node at which the EA is memory mapped. The HFI window also enables processing of received GSM operations with valid EAs that are homed to the local real memory of the receiving node, while preventing processing of other received operations without a valid EA-to-RA local mapping. | 08-06-2009 |
| 20090198935 | METHOD AND SYSTEM FOR PERFORMING AN ASYNCHRONOUS MEMORY MOVE (AMM) VIA EXECUTION OF AMM STORE INSTRUCTION WITHIN INSTRUCTION SET ARCHITECTURE - A data processing system with a processor and memory includes an instruction set architecture (ISA) that provides an asynchronous memory move (AMM) store (ST) instruction. When the processor executes the AMM ST instruction, the processor performs a series of functions, which initiates an asynchronous memory move (AMM) operation. The AMM ST instruction moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) performing a move of the data in virtual address space utilizing a source effective address that is memory mapped to the first memory location and a destination effective address that is memory mapped to the second memory location. When the move is completed in the virtual address space, the AMM operation performs the physical move of the data to the second memory location outside the processor core, without processor involvement. | 08-06-2009 |
| 20090198936 | REPORTING OF PARTIALLY PERFORMED MEMORY MOVE - A method performed in a data processing system initiates an asynchronous memory move (AMM) operation, whereby a processor performs a move of data in virtual address space from a first effective address to a second effective address and forwards parameters of the AMM operation to asynchronous memory mover logic for completion of the physical movement of data from a first memory location to a second memory location. The processor executes a second operation, which checks a status of the completion of the data move and returns a notification indicating the status. The notification indicates a status, which includes one of: data move in progress; data move totally done; data move partially done; data move cannot be performed; and occurrence of a translation look-aside buffer invalidate entry (TLBIE) operation. The processor initiates one or more actions in response to the notification received. | 08-06-2009 |
| 20090198937 | MECHANISMS FOR COMMUNICATING WITH AN ASYNCHRONOUS MEMORY MOVER TO PERFORM AMM OPERATIONS - A data processing system includes a set of architected registers within which the processor places state and other information to communicate with the asynchronous memory mover in order to initiate and control an AMM operation. The asynchronous memory mover performs an asynchronous memory move (AMM) operation in response to receiving a set of parameters within the architected registers, which parameters are associated with an AMM store instruction executed by the processor to initiates a move of data in virtual space before placing the information in the architected registers. The architected registers are processor architected registers, defined on a per thread basis by a compiler, or memory mapped architected registers allocated for communicating with the asynchronous memory mover during a bind and subsequent execution of an application. The architected registers are also utilized to store state information to enable a restore to a point before execution of the AMM operation. | 08-06-2009 |
| 20090198938 | HANDLING OF ADDRESS CONFLICTS DURING ASYNCHRONOUS MEMORY MOVE OPERATIONS - A method within a data processing system in which a processor handles conflicts, which occur during performance by an asynchronous memory mover of an asynchronous memory move (AMM) operation. The asynchronous memory mover performs an asynchronous memory move (AMM) operation by which the actual data is moved from a source to a destination memory location, independent of the processor. The memory mover sets a flag bit to indicate that the asynchronous memory mover is currently performing an AMM operation at the memory. When the processor receives a memory access operation, the processor checks the value of the flag bit before issuing the new memory access operation, and checks the associated address of the AMM operation to determine possible address conflicts. The processor then evaluates and responds to address conflicts to prevent corruption of data during an AMM operation. | 08-06-2009 |
| 20090198939 | LAUNCHING MULTIPLE CONCURRENT MEMORY MOVES VIA A FULLY ASYNCHRONOOUS MEMORY MOVER - A data processing system has an asynchronous memory mover, which includes multiple sets of registers for storing addressing and control parameters utilized to generate one or more asynchronous memory move (AMM) operations. The memory mover detects a receipt of a first set of parameters in a first set of registers from the processor. The processor forwards the parameters after the processor initiates a data move in virtual address space, utilizing a source effective address and a destination effective address. The memory mover responds to receiving the first set of parameters by generating and launching a first asynchronous memory move (AMM) operation. When the memory mover receives a second set of parameters in a second set of registers before the first AMM operation completes, the memory mover generates and launches a second AMM operation concurrently with the first AMM operation if no address conflicts exist. | 08-06-2009 |
| 20090198955 | ASYNCHRONOUS MEMORY MOVE ACROSS PHYSICAL NODES (DUAL-SIDED COMMUNICATION FOR MEMORY MOVE) - A distributed data processing system includes: (1) a first node with a processor, a first memory, and asynchronous memory mover logic; and connection mechanism that connects (2) a second node having a second memory. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes independent of the processor. | 08-06-2009 |
| 20090198963 | COMPLETION OF ASYNCHRONOUS MEMORY MOVE IN THE PRESENCE OF A BARRIER OPERATION - A method within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions that are received after the barrier operation, but before completion of the barrier operation. | 08-06-2009 |
| 20090198975 | TERMINATION OF IN-FLIGHT ASYNCHRONOUS MEMORY MOVE - A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction initiates an asynchronous memory move operation that moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address a destination effective address; and (b) when the move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before completion of the AMM operation, and (3) a LD CMP instruction for checking a status of an AMM operation. | 08-06-2009 |
| 20090199046 | Mechanism to Perform Debugging of Global Shared Memory (GSM) Operations - A host fabric interface (HFI) enables debugging of global shared memory (GSM) operations received at a local node from a network fabric. The local node has a memory management unit (MMU), which provides an effective address to real address (EA-to-RA) translation table that is utilized by the HFI to evaluate when EAs of GSM operations/data from a received GSM packet is memory-mapped to RAs of the local memory. The HFI retrieves the EA associated with a GSM operation/data within a received GSM packet. The HFI forwards the EA to the MMU, which determines when the EA is mapped to RAs within the local memory for the local task. The HFI processing logic enables processing of the GSM packet only when the EA of the GSM operation/data within the GSM packet is an EA that has a local RA translation. Non-matching EAs result in an error condition that requires debugging. | 08-06-2009 |
| 20090199194 | Mechanism to Prevent Illegal Access to Task Address Space by Unauthorized Tasks - A method and data processing system for tracking global shared memory (GSM) operations to and from a local node configured with a host fabric interface (HFI) coupled to a network fabric. During task/job initialization, the system OS assigns HFI window(s) to handle the GSM packet generation and GSM packet receipt and processing for each local task. HFI processing logic automatically tags each GSM packet generated by the HFI window with a global job identifier (ID) of the job to which the local task is affiliated. The job ID is embedded within each GSM packet placed on the network fabric. On receipt of a GSM packet from the network fabric, the HFI logic retrieves the embedded job ID and compares the embedded job ID with the ID within the HFI window(s). GSM packets are forwarded to an HFI window only when the embedded job ID matches the HFI window's job ID. | 08-06-2009 |
| 20090199195 | Generating and Issuing Global Shared Memory Operations Via a Send FIFO - A method for issuing global shared memory (GSM) operations from an originating task on a first node coupled to a network fabric of a distributed network via a host fabric interface (HFI). The originating task generates a GSM command within an effective address (EA) space. The task then places the GSM command within a send FIFO. The send FIFO is a portion of real memory having real addresses (RA) that are memory mapped to EAs of a globally executing job. The originating task maintains a local EA-to-RA mapping of only a portion of the real address space of the globally executing job. The task enables the HFI to retrieve the GSM command from the send FIFO into an HFI window allocated to the originating task. The HFI window generates a corresponding GSM packet containing GSM operations and/or data, and the HFI window issues the GSM packet to the network fabric. | 08-06-2009 |
| 20090199200 | Mechanisms to Order Global Shared Memory Operations - A method and data processing system for performing fence operations within a global shared memory (GSM) environment having a local task executing on a processor and providing GSM commands for processing by a host fabric interface (HFI) window that is allocated to the task. The HFI window has one or more registers for use during local fence operations. A first register tracks a first count of task-issued GSM commands, and a second register tracks a second count of GSM operations being processed by the HFI. The processing logic detects a locally-issued fence operation, and responds by performing a series of operations, including: automatically stopping the task from issuing additional GSM commands; monitoring for completion of all the task-issued GSM commands at the HFI; and triggering a resumption of issuance of GSM commands by the task when the completion of all previous task-issued GSM commands is registered by the HFI. | 08-06-2009 |
| 20090199201 | Mechanism to Provide Software Guaranteed Reliability for GSM Operations - In a global shared memory (GSM) environment, an initiating task at a first node with a host fabric interface (HFI) uses epochs to provide reliability of transmission of packets via a network fabric to a target task. The HFI generates a packet for the initiating task addressed to the target task, and automatically inserts a current epoch of the initiating task into the packet. A copy of the current epoch is maintained by the target task, which accepts for processing only packets having the correct epoch, unless the packet is tagged for guaranteed-once delivery. When a packet delivery is accepted, the target task sends a notification to the initiating task. If the initiating task does not receive the notification of delivery for the issued packet, the initiating task updates the epoch at both the target node and the initiating node and re-transmits the packet. | 08-06-2009 |
| 20090199209 | Mechanism for Guaranteeing Delivery of Multi-Packet GSM Message - A target task ensures complete delivery of a global shared memory (GSM) message from an originating task to the target task. The target task's HFI receives a first of multiple GSM packets generated from a single GSM message sent from the originating task. The HFI logic assigns a sequence number and corresponding tuple to track receipt of the complete GSM message. The sequence number is unique relative to other sequence numbers assigned to GSM messages that have not been completely received from the initiating task. The HFI updates a count value within the tuple, which comprises the sequence number and the count value for the first GSM packet and for each subsequent GSM packet received for the GSM message. The HFI determines when receipt of the GSM message is complete by comparing the count value with a count total retrieved from the packet header. | 08-06-2009 |
| 20110061053 | MANAGING PREEMPTION IN A PARALLEL COMPUTING SYSTEM - This present invention provides a portable user space application release/reacquire of adapter resources for a given job on a node using information in a network resource table. The information in the network resource table is obtained when a user space application is loaded by some resource manager. The present invention provides a portable solution that will work for any interconnect where adapter resources need to be freed and reacquired without having to write a specific function in the device driver. In the present invention, the preemption request is done on a job basis using a key or “job key” that was previously loaded when the user space application or job originally requested the adapter resources. This is done for each OS instance where the job is run. | 03-10-2011 |
| 20110078410 | EFFICIENT PIPELINING OF RDMA FOR COMMUNICATIONS - Disclosed are a method of and system for multiple party communications in a processing system including multiple processing subsystems. Each of the processing subsystems includes a central processing unit and one or more network adapters for connecting said each processing subsystem to the other processing subsystems. A multitude of nodes are established or created, and each of these nodes is associated with one of the processing subsystems. A first aspect of the invention involves pipelined communication using RDMA among three nodes, where the first node breaks up a large communication into multiple parts and sends these parts one after the other to the second node using RDMA, and the second node in turn absorbs and forwards each of these parts to a third node before all parts of the communication arrive from the first node. | 03-31-2011 |