Patent application number | Description | Published |
20080222303 | LATENCY HIDING MESSAGE PASSING PROTOCOL - A method, system, and article of manufacture that provide latency hiding, high bandwidth message passing protocols used for data communication between nodes of a parallel computer system are disclosed. A source node transmits a request to send message to a receiving node. Prior to receiving a clear to send message, the sending node continues to send deterministically routed (or fully described) data packets to the receiving node, thereby hiding the latency inherent in the request to send—clear to send message exchange. Once the sending node receives the clear to send message, any remaining portion of the message may be sent using partially described packets which may be routed dynamically, thereby maximizing bandwidth. | 09-11-2008 |
20080259916 | OPPORTUNISTIC QUEUEING INJECTION STRATEGY FOR NETWORK LOAD BALANCING - Embodiments of the invention include a method, system, and article of manufacture that provide an opportunistic queuing injection strategy used for data communication between nodes of a parallel computer system. A message may be encapsulated into a set of data packets. When the packets are sent, an opportunistic injection queue may be configured to transmit them to multiple hardware injection ports. This approach allows for complete network link saturation. In a parallel system with network links in multiple dimensions, sending message packets using more than one dimension may substantially increase network throughput. | 10-23-2008 |
20080267066 | Remote Direct Memory Access - Methods, parallel computers, and computer program products are disclosed for remote direct memory access. Embodiments include transmitting, from an origin DMA engine on an origin compute node to a plurality of target DMA engines on target compute nodes, a request to send message, the request to send message specifying data to be transferred from the origin DMA engine to data storage on each target compute node; receiving, by each target DMA engine on each target compute node, the request to send message; preparing, by each target DMA engine, to store data according to the data storage reference and the data length, including assigning a base storage address for the data storage reference; sending, by one or more of the target DMA engines, an acknowledgement message acknowledging that all the target DMA engines are prepared to receive a data transmission from the origin DMA engine; receiving, by the origin DMA engine, the acknowledgement message from the one or more of the target DMA engines; and transferring, by the origin DMA engine, data to data storage on each of the target compute nodes according to the data storage reference using a single direct put operation. | 10-30-2008 |
20080270563 | Message Communications of Particular Message Types Between Compute Nodes Using DMA Shadow Buffers - Message communications of particular message types between compute nodes using DMA shadow buffers includes: receiving a buffer identifier specifying an application buffer having a message of a particular type for transmission to a target compute node through a network; selecting one of a plurality of shadow buffers for a DMA engine on the compute node for storing the message, each shadow buffer corresponding to a slot of an injection FIFO buffer maintained by the DMA engine; storing the message in the selected shadow buffer; creating a data descriptor for the message stored in the selected shadow buffer; injecting the data descriptor into the slot of the injection FIFO buffer corresponding to the selected shadow buffer; selecting the data descriptor from the injection FIFO buffer; and transmitting the message specified by the selected data descriptor through the data communications network to the target compute node. | 10-30-2008 |
20080273534 | Signaling Completion of a Message Transfer from an Origin Compute Node to a Target Compute Node - Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local memory FIFO data transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor. | 11-06-2008 |
20080273543 | Signaling Completion of a Message Transfer from an Origin Compute Node to a Target Compute Node - Signaling completion of a message transfer from an origin node to a target node includes: sending, by an origin DMA engine, an RTS message, the RTS message specifying an application message for transfer to the target node from the origin node; receiving, by the origin DMA engine, a remote get message containing a data descriptor for the message and a completion notification descriptor, the completion notification descriptor specifying a local direct put transfer operation for transferring data locally on the origin node; inserting, by the origin DMA engine in an injection FIFO buffer, the data descriptor followed by the completion notification descriptor; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that transfer of the message is complete in dependence upon the completion notification descriptor. | 11-06-2008 |
20080281997 | Low Latency, High Bandwidth Data Communications Between Compute Nodes in a Parallel Computer - Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (‘DMA’) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (‘RTS’) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using a memory FIFO operation; determining, by the origin DMA engine, whether an acknowledgement of the RTS message has been received from the target DMA engine; if an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation. | 11-13-2008 |
20080281998 | Direct Memory Access Transfer Completion Notification - DMA transfer completion notification includes: inserting, by an origin DMA engine on an origin node in an injection first-in-first-out (‘FIFO’) buffer, a data descriptor for an application message to be transferred to a target node on behalf of an application on the origin node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying a packet header for a completion notification packet; transferring, by the origin DMA engine to the target node, the message in dependence upon the data descriptor; sending, by the origin DMA engine, the completion notification packet to a local reception FIFO buffer using a local memory FIFO transfer operation; and notifying, by the origin DMA engine, the application that transfer of the message is complete in response to receiving the completion notification packet in the local reception FIFO buffer. | 11-13-2008 |
20080301327 | Direct Memory Access Transfer Completion Notification - Methods, apparatus, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA engine on an origin compute node in an injection FIFO buffer, a data descriptor for an application message to be transferred to a target compute node on behalf of an application on the origin compute node; inserting, by the origin DMA engine, a completion notification descriptor in the injection FIFO buffer after the data descriptor for the message, the completion notification descriptor specifying an address of a completion notification field in application storage for the application; transferring, by the origin DMA engine to the target compute node, the message in dependence upon the data descriptor; and notifying, by the origin DMA engine, the application that the transfer of the message is complete, including performing a local direct put operation to store predesignated notification data at the address of the completion notification field. | 12-04-2008 |
20080301704 | Controlling Data Transfers from an Origin Compute Node to a Target Compute Node - Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication. | 12-04-2008 |
20080307194 | Parallel, Low-Latency Method for High-Performance Deterministic Element Extraction From Distributed Arrays - The present invention provides a system and method for extracting elements from distributed arrays on a parallel processing system. The system includes a module that populates a local array with elements from input, a module that submits a largest element value in the local array and a processor ID for a local processor, and a module that determines a globally largest element value from the largest element values submitted by each one of the plurality of processors. The system further includes a module that broadcasts a winning globally largest element value and winning processor ID to the plurality of processors, and a module that increments an element pointer to the next value in the local array if the winning processor ID equals the processor ID for the local processor. | 12-11-2008 |
20080307195 | Parallel, Low-Latency Method for High-Performance Speculative Element Extraction From Distributed Arrays - The present invention provides a system and method for extracting elements from distributed arrays on a parallel processing system. The system includes a module that populates a result array with globally largest elements from the input, a module that generates a partition element, a module that counts the number of local elements greater than the partition and a module that determines the globally largest elements. The method for extracting elements from distributed arrays on a parallel processing system includes populating a result array with globally largest elements from the input, generating a partition element, counting the number of local elements greater than the partition and determining the globally largest elements. | 12-11-2008 |
20080313341 | Data Communications - Data communications, including issuing, by an application program to a high level data communications library, a request for initialization of a data communications service; issuing to a low level data communications library a request for registration of data communications functions; registering the data communications functions, including instantiating a factory object for each of the one or more data communications functions; issuing by the application program an instruction to execute a designated data communications function; issuing, to the low level data communications library, an instruction to execute the designated data communications function, including passing to the low level data communications library a call parameter that identifies a factory object; creating with the identified factory object the data communications object that implements the data communications function according to the protocol; and executing by the low level data communications library the designated data communications function. | 12-18-2008 |
20080313376 | Heuristic Status Polling - Methods, compute nodes, and computer program products are provided for heuristic status polling of a component in a computing system. Embodiments include receiving, by a polling module from a requesting application, a status request requesting status of a component; determining, by the polling module, whether an activity history for the component satisfies heuristic polling criteria; polling, by the polling module, the component for status if the activity history for the component satisfies the heuristic polling criteria; and not polling, by the polling module, the component for status if the activity history for the component does not satisfy the heuristic criteria. | 12-18-2008 |
20080313661 | Administering an Epoch Initiated for Remote Memory Access - Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed. | 12-18-2008 |
20090006663 | Direct Memory Access ('DMA') Engine Assisted Local Reduction - Methods, compute nodes, and computer program products are provided for DMA engine assisted local reduction. Embodiments include receiving, by a DMA engine, one or more data descriptors, each descriptor identifying a buffer containing an array for reduction; selecting, in dependence upon the arrays in the buffers and local hardware functional units available to the DMA engine, at least one local hardware functional unit; and reducing one or more arrays in the buffers identified by the data descriptors with the selected local hardware functional unit. | 01-01-2009 |
20090019190 | Low Latency, High Bandwidth Data Communications Between Compute Nodes in a Parallel Computer - Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, an RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core. | 01-15-2009 |
20090022156 | Pacing a Data Transfer Operation Between Compute Nodes on a Parallel Computer - Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (‘DMA’) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine. | 01-22-2009 |
20090031001 | Repeating Direct Memory Access Data Transfer Operations for Compute Nodes in a Parallel Computer - Methods, apparatus, and products are disclosed for repeating DMA data transfer operations for nodes in a parallel computer that include: receiving, by a DMA engine on an origin node, an RGET data descriptor that specifies a DMA transfer operation data descriptor and a second RGET data descriptor, the second RGET data descriptor also specifying the DMA transfer operation data descriptor; creating, in dependence upon the RGET data descriptor, an RGET packet that contains the DMA transfer operation data descriptor and the second RGET data descriptor; processing the DMA transfer operation data descriptor included in the RGET packet, including performing a DMA data transfer operation between the origin node and a target node in dependence upon the DMA transfer operation data descriptor; and processing the second RGET data descriptor included in the RGET packet, thereby performing again the DMA transfer operation in dependence upon the DMA transfer operation data descriptor. | 01-29-2009 |
20090031002 | Self-Pacing Direct Memory Access Data Transfer Operations for Compute Nodes in a Parallel Computer - Methods, apparatus, and products are disclosed for self-pacing DMA data transfer operations for nodes in a parallel computer that include: transferring, by an origin DMA on an origin node, an RTS message to a target node, the RTS message specifying a message on the origin node for transfer to the target node; receiving, in an origin injection FIFO for the origin DMA from a target DMA on the target node in response to transferring the RTS message, a target RGET descriptor followed by a DMA transfer operation descriptor, the DMA descriptor for transmitting a message portion to the target node, the target RGET descriptor specifying an origin RGET descriptor on the origin node that specifies an additional DMA descriptor for transmitting an additional message portion to the target node; processing, by the origin DMA, the target RGET descriptor; and processing, by the origin DMA, the DMA transfer operation descriptor. | 01-29-2009 |
20090031055 | Chaining Direct Memory Access Data Transfer Operations for Compute Nodes in a Parallel Computer - Methods, systems, and products are disclosed for chaining DMA data transfer operations for compute nodes in a parallel computer that include: receiving, by an origin DMA engine on an origin node in an origin injection FIFO buffer for the origin DMA engine, an RGET data descriptor specifying a DMA transfer operation data descriptor on the origin node and a second RGET data descriptor on the origin node, the second RGET data descriptor specifying a target RGET data descriptor on the target node, the target RGET data descriptor specifying an additional DMA transfer operation data descriptor on the origin node; creating, by the origin DMA engine, an RGET packet in dependence upon the RGET data descriptor, the RGET packet containing the DMA transfer operation data descriptor and the second RGET data descriptor; and transferring, by the origin DMA engine to a target DMA engine on the target node, the RGET packet. | 01-29-2009 |
20090031325 | Direct Memory Access Transfer Completion Notification - Methods, systems, and products are disclosed for DMA transfer completion notification that include: inserting, by an origin DMA on an origin node in an origin injection FIFO, a data descriptor for an application message; inserting, by the origin DMA, a reflection descriptor in the origin injection FIFO, the reflection descriptor specifying a remote get operation for injecting a completion notification descriptor in a reflection injection FIFO on a reflection node; transferring, by the origin DMA to a target node, the message in dependence upon the data descriptor; in response to completing the message transfer, transferring, by the origin DMA to the reflection node, the completion notification descriptor in dependence upon the reflection descriptor; receiving, by the origin DMA from the reflection node, a completion packet; and notifying, by the origin DMA in response to receiving the completion packet, the origin node's processing core that the message transfer is complete. | 01-29-2009 |
20090037707 | Determining When a Set of Compute Nodes Participating in a Barrier Operation on a Parallel Computer are Ready to Exit the Barrier Operation - Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that include, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; and exiting the barrier operation if the value for the barrier counter matches an exit value. | 02-05-2009 |
20090037773 | Link Failure Detection in a Parallel Computer - Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that include: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received. | 02-05-2009 |
20090052462 | Line-Plane Broadcasting in a Data Communications Network of a Parallel Computer - Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network. | 02-26-2009 |
20090055474 | Line-Plane Broadcasting in a Data Communications Network of a Parallel Computer - Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network. | 02-26-2009 |
20090300384 | Reducing Power Consumption While Performing Collective Operations On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation. | 12-03-2009 |
20090300385 | Reducing Power Consumption While Synchronizing A Plurality Of Compute Nodes During Execution Of A Parallel Application - Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation. | 12-03-2009 |
20090300386 | Reducing power consumption during execution of an application on a plurality of compute nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory. | 12-03-2009 |
20090300394 | Reducing Power Consumption During Execution Of An Application On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: executing, by each compute node, an application, the application including power consumption directives corresponding to one or more portions of the application; identifying, by each compute node, the power consumption directives included within the application during execution of the portions of the application corresponding to those identified power consumption directives; and reducing power, by each compute node, to one or more components of that compute node according to the identified power consumption directives during execution of the portions of the application corresponding to those identified power consumption directives. | 12-03-2009 |
20090300399 | Profiling power consumption of a plurality of compute nodes while processing an application - Methods, apparatus, and products are disclosed for profiling power consumption of a plurality of compute nodes while processing an application that include: executing the application on the plurality of compute nodes; monitoring performance characteristics for components of the plurality of compute nodes during execution of the application; and recording, in a power profile for the application, power consumption during execution of the application in dependence upon the performance characteristics for components of the plurality of compute nodes. | 12-03-2009 |
20090307036 | Budget-Based Power Consumption For Application Execution On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications. | 12-10-2009 |
20090307703 | Scheduling Applications For Execution On A Plurality Of Compute Nodes Of A Parallel Computer To Manage temperature of the nodes during execution - Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions. | 12-10-2009 |
20090307708 | Thread Selection During Context Switching On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that include: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution. | 12-10-2009 |
20090327444 | Dynamic Network Link Selection For Transmitting A Message Between Compute Nodes Of A Parallel Comput - Methods, apparatus, and products are disclosed for dynamic network link selection for transmitting a message between nodes of a parallel computer. The nodes are connected using a data communications network. Each node connects to adjacent nodes in the data communications network through a plurality of network links. Each link provides a different data communication path through the network between the nodes of the parallel computer. Such dynamic link selection includes: identifying, by an origin node, a current message for transmission to a target node; determining, by the origin node, whether transmissions of previous messages to the target node have completed; selecting, by the origin node from the plurality of links for the origin node, a link in dependence upon the determination and link characteristics for the plurality of links for the origin node; and transmitting, by the origin node, the current message to the target node using the selected link. | 12-31-2009 |
20090327464 | Load Balanced Data Processing Performed On An Application Message Transmitted Between Compute Nodes - Methods, apparatus, and products are disclosed for load balanced data processing performed on an application message transmitted between compute nodes of a parallel computer that include: identifying, by an origin compute node, an application message for transmission to a target compute node, the message to be processed by a data processing operation; determining, by the origin compute node, origin sub-operations used to carry out a portion of the data processing operation on the origin compute node; determining, by the origin compute node, target sub-operations used to carry out a remaining portion of the data processing operation on the target compute node; processing, by the origin compute node, the message using the origin sub-operations; and transmitting, by the origin compute node, the processed message to the target compute node for processing using the target sub-operations. | 12-31-2009 |
20100005189 | Pacing Network Traffic Among A Plurality Of Compute Nodes Connected Using A Data Communications Network - Methods, apparatus, and products are disclosed for pacing network traffic among a plurality of compute nodes connected using a data communications network. The network has a plurality of network regions, and the plurality of compute nodes are distributed among these network regions. Pacing network traffic among a plurality of compute nodes connected using a data communications network includes: identifying, by a compute node for each region of the network, a roundtrip time delay for communicating with at least one of the compute nodes in that region; determining, by the compute node for each region, a pacing algorithm for that region in dependence upon the roundtrip time delay for that region; and transmitting, by the compute node, network packets to at least one of the compute nodes in at least one of the network regions in dependence upon the pacing algorithm for that region. | 01-07-2010 |
20100005326 | Profiling An Application For Power Consumption During Execution On A Compute Node - Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application. | 01-07-2010 |
20100037035 | Generating An Executable Version Of An Application Using A Distributed Compiler Operating On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for generating an executable version of an application using a distributed compiler operating on a plurality of compute nodes that include: receiving, by each compute node, a portion of source code for an application; compiling, in parallel by each compute node, the portion of the source code received by that compute node into a portion of object code for the application; performing, in parallel by each compute node, inter-procedural analysis on the portion of the object code of the application for that compute node, including sharing results of the inter-procedural analysis among the compute nodes; optimizing, in parallel by each compute node, the portion of the object code of the application for that compute node using the shared results of the inter-procedural analysis; and generating the executable version of the application in dependence upon the optimized portions of the object code of the application. | 02-11-2010 |
20100082848 | INCREASING AVAILABLE FIFO SPACE TO PREVENT MESSAGING QUEUE DEADLOCKS IN A DMA ENVIRONMENT - Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed. | 04-01-2010 |
20100268852 | Replenishing Data Descriptors in a DMA Injection FIFO Buffer - Methods, apparatus, and products are disclosed for replenishing data descriptors in a Direct Memory Access (‘DMA’) injection first-in-first-out (‘FIFO’) buffer that include: determining, by a messaging module on an origin compute node, whether a number of data descriptors in a DMA injection FIFO buffer exceeds a predetermined threshold, each data descriptor specifying an application message for transmission to a target compute node; queuing, by the messaging module, a plurality of new data descriptors in a pending descriptor queue if the number of the data descriptors in the DMA injection FIFO buffer exceeds the predetermined threshold; establishing, by the messaging module, interrupt criteria that specify when to replenish the injection FIFO buffer with the plurality of new data descriptors in the pending descriptor queue; and injecting, by the messaging module, the plurality of new data descriptors into the injection FIFO buffer in dependence upon the interrupt criteria. | 10-21-2010 |
20110173287 | PREVENTING MESSAGING QUEUE DEADLOCKS IN A DMA ENVIRONMENT - Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed. | 07-14-2011 |
20110219208 | MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER - A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency. | 09-08-2011 |
20110238949 | Distributed Administration Of A Lock For An Operational Group Of Compute Nodes In A Hierarchical Tree Structured Network - Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the lock request is assigned a message acknowledging that the particular node has the lock. | 09-29-2011 |
20110238950 | Performing A Scatterv Operation On A Hierarchical Tree Network Optimized For Collective Operations - Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child. | 09-29-2011 |
20110239003 | Direct Injection of Data To Be Transferred In A Hybrid Computing Environment - Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer. | 09-29-2011 |
20110258245 | Performing A Local Reduction Operation On A Parallel Computer - A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer. | 10-20-2011 |
20110258281 | QUERY PERFORMANCE DATA ON PARALLEL COMPUTER SYSTEM HAVING COMPUTE NODES - Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node. | 10-20-2011 |
20110271263 | Compiling Software For A Hierarchical Distributed Processing System - Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more other nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendants; and sending to the selected node only the compiled software to be executed by the selected node or the selected node's descendants. | 11-03-2011 |
20110289177 | Effecting Hardware Acceleration Of Broadcast Operations In A Parallel Computer - Compute nodes of a parallel computer organized for collective operations via a network, each compute node having a receive buffer and establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node. | 11-24-2011 |
20110296137 | Performing A Deterministic Reduction Operation In A Parallel Computer - A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, a deterministic reduction operation includes: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology. | 12-01-2011 |
20110296139 | Performing A Deterministic Reduction Operation In A Parallel Computer - Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data. | 12-01-2011 |
20120036384 | Reducing Power Consumption While Synchronizing A Plurality Of Compute Nodes During Execution Of A Parallel Application - Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation. | 02-09-2012 |
20120066284 | Send-Side Matching Of Data Communications Messages - Send-side matching of data communications messages in a distributed computing system comprising a plurality of compute nodes organized for collective operations, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message. | 03-15-2012 |
20120066310 | COMBINING MULTIPLE HARDWARE NETWORKS TO ACHIEVE LOW-LATENCY HIGH-BANDWIDTH POINT-TO-POINT COMMUNICATION OF COMPLEX TYPES - Systems, methods and articles of manufacture are disclosed for performing a vector collective operation on a parallel computing system that includes multiple compute nodes and a network connecting the compute nodes that includes an ALU. A collective operation may be performed to determine displacements for the vector collective operation. Descriptors for the vector collective operation may be generated based on the displacements. The vector collective operation may then be performed using the descriptors. | 03-15-2012 |
20120079035 | Administering Truncated Receive Functions In A Parallel Messaging Interface - Administering truncated receive functions in a parallel messaging interface (‘PMI’) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded. | 03-29-2012 |
20120079133 | Routing Data Communications Packets In A Parallel Computer - Routing data communications packets in a parallel computer that includes compute nodes organized for collective operations, each compute node including an operating system kernel and a system-level messaging module that is a module of automated computing machinery that exposes a messaging interface to applications, each compute node including a routing table that specifies, for each of a multiplicity of route identifiers, a data communications path through the compute node, including: receiving in a compute node a data communications packet that includes a route identifier value; retrieving from the routing table a specification of a data communications path through the compute node; and routing, by the compute node, the data communications packet according to the data communications path identified by the compute node's routing table entry for the data communications packet's route identifier value. | 03-29-2012 |
20120079165 | Paging Memory From Random Access Memory To Backing Storage In A Parallel Computer - Paging memory from random access memory (‘RAM’) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node. | 03-29-2012 |
20120117137 | Fencing Data Transfers In A Parallel Active Messaging Interface Of A Parallel Computer - Fencing data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints. | 05-10-2012 |
20120117138 | Fencing Network Direct Memory Access Data Transfers In A Parallel Active Messaging Interface Of A Parallel Computer - Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints. | 05-10-2012 |
20120117211 | Fencing Data Transfers In A Parallel Active Messaging Interface Of A Parallel Computer - Fencing data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints. | 05-10-2012 |
20120117281 | Fencing Direct Memory Access Data Transfers In A Parallel Active Messaging Interface Of A Parallel Computer - Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints. | 05-10-2012 |
20120117361 | Processing Data Communications Events In A Parallel Active Messaging Interface Of A Parallel Computer - Processing data communications events in a parallel active messaging interface (‘PAMI’) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context. | 05-10-2012 |
20120124249 | Method Of Data Communications With Reduced Latency - Data communications with reduced latency, including: writing, by a producer, a descriptor and message data into at least two descriptor slots of a descriptor buffer, the descriptor buffer comprising allocated computer memory segmented into descriptor slots, each descriptor slot having a fixed size, the descriptor buffer having a header pointer that identifies a next descriptor slot to be processed by a DMA controller, the descriptor buffer having a tail pointer that identifies a descriptor slot for entry of a next descriptor in the descriptor buffer; recording, by the producer, in the descriptor a value signifying that message data has been written into descriptor slots; and setting, by the producer, in dependence upon the recorded value, a tail pointer to point to a next open descriptor slot. | 05-17-2012 |
20120137294 | Data Communications In A Parallel Active Messaging Interface Of A Parallel Computer - Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (‘RTS’) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data. | 05-31-2012 |
20120151485 | Data Communications In A Parallel Active Messaging Interface Of A Parallel Computer - Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint. | 06-14-2012 |
20120179736 | Completion Processing For Data Communications Instructions - Completion processing of data communications instructions in a distributed computing environment, including receiving, in an active messaging interface (‘AMI’), data communications instructions, at least one instruction specifying a callback function; injecting, into an injection FIFO buffer of a data communication adapter, an injection descriptor, each slot in the injection FIFO buffer having a corresponding slot in a pending callback list; listing in the pending callback list any callback function specified by an instruction, incrementing a pending callback counter for each listed callback function; transferring payload data as per each injection descriptor, incrementing a transfer counter upon completion of each transfer; determining from counter values whether the pending callback list presently includes callback functions whose data transfers have been completed; and calling by the AMI any such callback functions from the pending callback list, decrementing the pending callback counter for each callback function called. | 07-12-2012 |
20120179760 | Completion Processing For Data Communications Instructions - Completion processing of data communications instructions in a distributed computing environment with computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’), injecting for data communications instructions into slots in an injection FIFO buffer a transfer descriptor, at least some of the instructions specifying callback functions; injecting a completion descriptor for each instruction that specifies a callback function into an injection FIFO buffer slot having a corresponding slot in a pending callback list; listing in the pending callback list callback functions specified by data communications instructions; processing each descriptor in the injection FIFO buffer, setting a bit in a completion bit mask corresponding to the slot in the FIFO where the completion descriptor was injected; and calling by the AMI any callback functions in the pending callback list as indicated by set bits in the completion bit mask. | 07-12-2012 |
20120185230 | Distributed Hardware Device Simulation - Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component. | 07-19-2012 |
20120185679 | Endpoint-Based Parallel Data Processing With Non-Blocking Collective Instructions In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing by the parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation. | 07-19-2012 |
20120185873 | Data Communications In A Parallel Active Messaging Interface Of A Parallel Computer - Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application. | 07-19-2012 |
20120204041 | Profiling An Application For Power Consumption During Execution On A Compute Node - Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application. | 08-09-2012 |
20120210094 | Data Communications In A Parallel Active Messaging Interface Of A Parallel Computer - Eager send data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address. | 08-16-2012 |
20120246256 | Administering An Epoch Initiated For Remote Memory Access - Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed. | 09-27-2012 |
20120254344 | Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks. | 10-04-2012 |
20120265835 | QUERY PERFORMANCE DATA ON PARALLEL COMPUTER SYSTEM HAVING COMPUTE NODES - Embodiments of the invention provide a method for querying performance counter data on a massively parallel computing system, while minimizing the costs associated with interrupting computer processors and limited memory resources. DMA descriptors may be inserted into an injection FIFO of a remote compute node in the massively parallel computing system. Upon executing the DMA operations described by the DMA descriptors, performance counter data may be transferred from the remote compute node to a destination node. | 10-18-2012 |
20120290863 | Budget-Based Power Consumption For Application Execution On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications. | 11-15-2012 |
20120304193 | Scheduling Applications For Execution On A Plurality Of Compute Nodes Of A Parallel Computer To Manage Temperature Of The Nodes During Execution - Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions. | 11-29-2012 |
20120317399 | Performing A Local Reduction Operation On A Parallel Computer - Performing a local reduction operation on a parallel computer including compute nodes, each compute node including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer, including: copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer. | 12-13-2012 |
20130042088 | Collective Operation Protocol Selection In A Parallel Computer - Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by calling a collective operation with operating parameters; selecting a protocol for executing the operation; and executing the operation with the selected protocol. Selecting a protocol includes: iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets predefined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; determining that the prospective protocol meets predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold. | 02-14-2013 |
20130042245 | Performing A Global Barrier Operation In A Parallel Computer - Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier. | 02-14-2013 |
20130042254 | Performing A Local Barrier Operation - Performing a local barrier operation with parallel tasks executing on a compute node including, for each task: retrieving a present value of a counter; calculating, in dependence upon the present value of the counter and a total number of tasks performing the local barrier operation, a base value of the counter, the base value representing the counter's value prior to any task joining the local barrier; calculating, in dependence upon the base value and the total number of tasks performing the local barrier operation, a target value of the counter, the target value representing the counter's value when all tasks have joined the local barrier; joining the local barrier, including atomically incrementing the value of the counter; and repetitively, until the present value of the counter is no less than the target value of the counter: retrieving the present value of the counter and determining whether the present value equals the target value. | 02-14-2013 |
20130060557 | DISTRIBUTED HARDWARE DEVICE SIMULATION - Distributed hardware device simulation, including: identifying a plurality of hardware components of the hardware device; providing software components simulating the functionality of each hardware component, wherein the software components are installed on compute nodes of a distributed processing system; receiving, in at least one of the software components, one or more messages representing an input to the hardware component; simulating the operation of the hardware component with the software component, thereby generating an output of the software component representing the output of the hardware component; and sending, from the software component to at least one other software component, one or more messages representing the output of the hardware component. | 03-07-2013 |
20130060844 | DIRECT INJECTION OF DATA TO BE TRANSFERRED IN A HYBRID COMPUTING ENVIRONMENT - Direct injection of data to be transferred in a hybrid computing environment that includes a host computer and a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module. Each accelerator includes a Power Processing Element (‘PPE’) and a plurality of Synergistic Processing Elements (‘SPEs’). Direct injection includes reserving, by each SPE, a slot in a shared memory region accessible by the host computer; loading, by each SPE into local memory of the SPE, a portion of data to be transferred to the host computer; executing, by each SPE in parallel, a data processing operation on the portion of the data loaded in local memory of each SPE; and writing, by each SPE, the processed data to the SPE's reserved slot in the shared memory region accessible by the host computer. | 03-07-2013 |
20130067111 | ROUTING DATA COMMUNICATIONS PACKETS IN A PARALLEL COMPUTER - Routing data communications packets in a parallel computer that includes compute nodes organized for collective operations, each compute node including an operating system kernel and a system-level messaging module that is a module of automated computing machinery that exposes a messaging interface to applications, each compute node including a routing table that specifies, for each of a multiplicity of route identifiers, a data communications path through the compute node, including: receiving in a compute node a data communications packet that includes a route identifier value; retrieving from the routing table a specification of a data communications path through the compute node; and routing, by the compute node, the data communications packet according to the data communications path identified by the compute node's routing table entry for the data communications packet's route identifier value. | 03-14-2013 |
20130067206 | Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks. | 03-14-2013 |
20130067479 | Establishing A Group Of Endpoints In A Parallel Computer - A parallel computer executes a number of tasks, each task includes a number of endpoints, and the endpoints are configured to support collective operations. In such a parallel computer, establishing a group of endpoints includes: receiving a user specification of a set of endpoints included in a global collection of endpoints, where the user specification defines the set in accordance with a predefined virtual representation of the endpoints, the predefined virtual representation is a data structure setting forth an organization of tasks and endpoints included in the global collection of endpoints, and the user specification defines the set of endpoints without a user specification of a particular endpoint; and defining a group of endpoints in dependence upon the predefined virtual representation of the endpoints and the user specification. | 03-14-2013 |
20130073603 | SEND-SIDE MATCHING OF DATA COMMUNICATIONS MESSAGES - Send-side matching of data communications messages in a distributed computing system comprising a plurality of compute nodes, including: issuing by a receiving node to source nodes a receive message that specifies receipt of a single message to be sent from any source node, the receive message including message matching information, a specification of a hardware-level mutual exclusion device, and an identification of a receive buffer; matching by two or more of the source nodes the receive message with pending send messages in the two or more source nodes; operating by one of the source nodes having a matching send message the mutual exclusion device, excluding messages from other source nodes with matching send messages and identifying to the receiving node the source node operating the mutual exclusion device; and sending to the receiving node from the source node operating the mutual exclusion device a matched pending message. | 03-21-2013 |
20130073751 | FENCING NETWORK DIRECT MEMORY ACCESS DATA TRANSFERS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints. | 03-21-2013 |
20130073752 | LOW LATENCY, HIGH BANDWIDTH DATA COMMUNICATIONS BETWEEN COMPUTE NODES IN A PARALLEL COMPUTER - Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core. | 03-21-2013 |
20130073832 | PERFORMING A DETERMINISTIC REDUCTION OPERATION IN A PARALLEL COMPUTER - A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, a deterministic reduction operation includes: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology. | 03-21-2013 |
20130074086 | PIPELINING PROTOCOLS IN MISALIGNED BUFFER CASES - Systems, methods and articles of manufacture are disclosed for effecting a desired collective operation on a parallel computing system that includes multiple compute nodes. The compute nodes may pipeline multiple collective operations to effect the desired collective operation. To select protocols suitable for the multiple collective operations, the compute nodes may also perform additional collective operations. The compute nodes may pipeline the multiple collective operations and/or the additional collective operations to effect the desired collective operation more efficiently. | 03-21-2013 |
20130074097 | ENDPOINT-BASED PARALLEL DATA PROCESSING WITH NON-BLOCKING COLLECTIVE INSTRUCTIONS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing by the parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation. | 03-21-2013 |
20130074098 | PROCESSING DATA COMMUNICATIONS EVENTS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Processing data communications events in a parallel active messaging interface (‘PAMI’) of a parallel computer that includes compute nodes that execute a parallel application, the PAMI including data communications endpoints, the endpoints coupled for data communications through the PAMI and through other data communications resources, including: determining by an advance function that there are no actionable data communications events pending for its context; placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context. | 03-21-2013 |
20130080563 | EFFECTING HARDWARE ACCELERATION OF BROADCAST OPERATIONS IN A PARALLEL COMPUTER - Effecting hardware acceleration of broadcast operations in a parallel computer whose compute nodes are organized for collective operations via a network, each compute node having a receive buffer, including: establishing a topology for the network; selecting a schedule for a broadcast operation; depositing, by a root node of the topology, broadcast data in a target node's receive buffer, including performing a DMA operation with a well-known memory location for the target node's receive buffer; depositing, by the root node in a memory region designated for storing broadcast data length, a length of the broadcast data, including performing a DMA operation with a well-known memory location of the broadcast data length memory region; and triggering, by the root node, the target node to perform a next DMA operation, including depositing, in a memory region designated for receiving injection instructions for the target node, an instruction to inject the broadcast data into the receive buffer of a subsequent target node. | 03-28-2013 |
20130081059 | DATA COMMUNICATIONS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint. | 03-28-2013 |
20130086358 | COLLECTIVE OPERATION PROTOCOL SELECTION IN A PARALLEL COMPUTER - Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by calling a collective operation with operating parameters; selecting a protocol for executing the operation; and executing the operation with the selected protocol. Selecting a protocol includes: iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets predefined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; determining that the prospective protocol meets predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold. | 04-04-2013 |
20130086551 | Providing A User With A Graphics Based IDE For Developing Software For Distributed Computing Systems - Graphics based IDE for distributed computing systems software development including providing a graphical representation of a topology of a distributed computing system for which the user is to develop a software application; receiving an identification of a system component upon which a portion of the application is to execute; providing a text editor for receiving from the user computer program instructions forming the portion of the application; inserting, without user intervention as part of the portion of the application, predetermined computer program instructions configured to support the identified system component; receiving, through the text editor, the portion of the application including the predetermined computer program instructions configured to support the identified system component; and storing the computer program instructions forming the portion of the application at a user specified location within the application. | 04-04-2013 |
20130091510 | DATA COMMUNICATIONS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a SEND instruction, the SEND instruction specifying a transmission of transfer data from the origin endpoint to a first target endpoint; transmitting from the origin endpoint to the first target endpoint a Request-To-Send (‘RTS’) message advising the first target endpoint of the location and size of the transfer data; assigning by the first target endpoint to each of a plurality of target endpoints separate portions of the transfer data; and receiving by the plurality of target endpoints the transfer data. | 04-11-2013 |
20130097263 | COMPLETION PROCESSING FOR DATA COMMUNICATIONS INSTRUCTIONS - Completion processing of data communications instructions in a distributed computing environment with computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’), injecting for data communications instructions into slots in an injection FIFO buffer a transfer descriptor, at least some of the instructions specifying callback functions; injecting a completion descriptor for each instruction that specifies a callback function into an injection FIFO buffer slot having a corresponding slot in a pending callback list; listing in the pending callback list callback functions specified by data communications instructions; processing each descriptor in the injection FIFO buffer, setting a bit in a completion bit mask corresponding to the slot in the FIFO where the completion descriptor was injected; and calling by the AMI any callback functions in the pending callback list as indicated by set bits in the completion bit mask. | 04-18-2013 |
20130097404 | DATA COMMUNICATIONS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Eager send data communications in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints that specify a client, a context, and a task, including receiving an eager send data communications instruction with transfer data disposed in a send buffer characterized by a read/write send buffer memory address in a read/write virtual address space of the origin endpoint; determining for the send buffer a read-only send buffer memory address in a read-only virtual address space, the read-only virtual address space shared by both the origin endpoint and the target endpoint, with all frames of physical memory mapped to pages of virtual memory in the read-only virtual address space; and communicating by the origin endpoint to the target endpoint an eager send message header that includes the read-only send buffer memory address. | 04-18-2013 |
20130097614 | FENCING DATA TRANSFERS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Fencing data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints. | 04-18-2013 |
20130110901 | COMPLETION PROCESSING FOR DATA COMMUNICATIONS INSTRUCTIONS | 05-02-2013 |
20130111482 | ESTABLISHING A GROUP OF ENDPOINTS IN A PARALLEL COMPUTER | 05-02-2013 |
20130111496 | PERFORMING A LOCAL BARRIER OPERATION | 05-02-2013 |
20130117403 | Managing Internode Data Communications For An Uninitialized Process In A Parallel Computer - A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory. | 05-09-2013 |
20130117761 | Intranode Data Communications In A Parallel Computer - Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process. | 05-09-2013 |
20130117764 | Internode Data Communications In A Parallel Computer - Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory. | 05-09-2013 |
20130124665 | ADMINISTERING AN EPOCH INITIATED FOR REMOTE MEMORY ACCESS - Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed. | 05-16-2013 |
20130124666 | MANAGING INTERNODE DATA COMMUNICATIONS FOR AN UNINITIALIZED PROCESS IN A PARALLEL COMPUTER - A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory. | 05-16-2013 |
20130125135 | INTRANODE DATA COMMUNICATIONS IN A PARALLEL COMPUTER - Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process. | 05-16-2013 |
20130125140 | INTERNODE DATA COMMUNICATIONS IN A PARALLEL COMPUTER - Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory. | 05-16-2013 |
20130159590 | LOW LATENCY, HIGH BANDWIDTH DATA COMMUNICATIONS BETWEEN COMPUTE NODES IN A PARALLEL COMPUTER - Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core. | 06-20-2013 |
20130173675 | PERFORMING A GLOBAL BARRIER OPERATION IN A PARALLEL COMPUTER - Performing a global barrier operation in a parallel computer that includes compute nodes coupled for data communications, where each compute node executes tasks, with one task on each compute node designated as a master task, including: for each task on each compute node until all master tasks have joined a global barrier: determining whether the task is a master task; if the task is not a master task, joining a single local barrier; if the task is a master task, joining the global barrier and the single local barrier only after all other tasks on the compute node have joined the single local barrier. | 07-04-2013 |
20130174180 | FENCING DATA TRANSFERS IN A PARALLEL ACTIVE MESSAGING INTERFACE OF A PARALLEL COMPUTER - Fencing data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint comprising a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI and through data communications resources including a deterministic data communications network, including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints. | 07-04-2013 |
20130179897 | Thread Selection During Context Switching On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution. | 07-11-2013 |
20130185465 | Fencing Direct Memory Access Data Transfers In A Parallel Active Messaging Interface Of A Parallel Computer - Fencing direct memory access (‘DMA’) data transfers in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints. | 07-18-2013 |
20130290673 | PERFORMING A DETERMINISTIC REDUCTION OPERATION IN A PARALLEL COMPUTER - Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data. | 10-31-2013 |
20130304948 | Managing A Direct Memory Access ('DMA') Injection First-In-First-Out ('FIFO') Messaging Queue In A Parallel Computer - Managing a direct memory access (‘DMA’) injection first-in-first-out (‘FIFO’) messaging queue in a parallel computer, including: inserting, by a messaging unit management module, a DMA message descriptor into the injection FIFO messaging queue; determining, by the messaging unit management module, the number of extra slots in an immediate messaging queue required to store DMA message data associated with the DMA message descriptor; and responsive to determining that the number of extra slots in the immediate messaging queue required to store the DMA message data is greater than one, inserting, by the messaging unit management module, a number of DMA dummy message descriptors into the injection FIFO messaging queue, wherein the number of DMA dummy message descriptors is at least as many as the number of extra slots in the immediate messaging queue that are required to store the DMA message data. | 11-14-2013 |
20130304995 | Scheduling Synchronization In Association With Collective Operations In A Parallel Computer - Methods, apparatuses, and computer program products for scheduling synchronization in association with collective operations in a parallel computer that includes a shared memory and a plurality of compute nodes that execute a parallel application utilizing the shared memory are provided. Embodiments include acquiring an available channel of the shared memory; posting to the acquired channel of the shared memory one or more collective operations and a synchronization point; determining that processing within the acquired channel has reached the synchronization point; and posting to the acquired channel, in response to determining that processing within the acquired channel has reached the synchronization point, a background synchronization operation corresponding to the one or more collective operations. | 11-14-2013 |
20140047450 | Utilizing A Kernel Administration Hardware Thread Of A Multi-Threaded, Multi-Core Compute Node Of A Parallel Computer - Methods, apparatuses, and computer program products for utilizing a kernel administration hardware thread of a multi-threaded, multi-core compute node of a parallel computer are provided. Embodiments include a kernel assigning a memory space of a hardware thread of an application processing core to a kernel administration hardware thread of a kernel processing core. A kernel administration hardware thread is configured to advance the hardware thread to a next memory space associated with the hardware thread in response to the assignment of the kernel administration hardware thread to the memory space of the hardware thread. Embodiments also include the kernel administration hardware thread executing an instruction within the assigned memory space. | 02-13-2014 |
20140047451 | Optimizing Collective Communications Within A Parallel Computer - Methods, apparatuses, and computer program products for optimizing collective communications within a parallel computer comprising a plurality of hardware threads for executing software threads of a parallel application are provided. Embodiments include a processor of a parallel computer determining for each software thread, an affinity of the software thread to a particular hardware thread. Each affinity indicates an assignment of a software thread to a particular hardware thread. The processor also generates one or more affinity domains based on the affinities of the software threads. Embodiments also include a processor generating, for each affinity domain, a topology of the affinity domain based on the affinities of the software threads to the hardware threads. According to embodiments of the present application, a processor also performs, based on the generated topologies of the affinity domains, a collective operation on one or more software threads. | 02-13-2014 |
20150058657 | ADAPTIVE CLOCK THROTTLING FOR EVENT PROCESSING - Methods, apparatuses, and computer program products for adaptive clock throttling for event processing are provided. Embodiments include an event processing system receiving a plurality of events from one or more components of the distributed processing system. Embodiments also include the event processing system determining that an arrival attribute of the plurality of events exceeds an arrival threshold. Embodiments also include the event processing system, adjusting, in response to determining that the arrival attribute of the plurality of events exceeds the arrival threshold, a clock speed of at least one of the event processing system and a component of the distributed processing system. | 02-26-2015 |
20150063100 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). Such data communications may be carried out by: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing data location at the sender and data size; transmitting, by the sender to the receiver, the SEND data as eager data packets; discarding, by the receiver in dependence upon data flow conditions, eager data packets as they are received from the sender; and transferring, in dependence upon the data flow conditions, by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”), the SEND data. | 03-05-2015 |
20150067067 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing a location and size of a send buffer in which the SEND data is stored; transmitting, by the sender to the receiver, the SEND data as eager data packets; issuing, by the receiver to the sender in dependence upon data flow conditions, a STOP instruction, the STOP instruction including an order to stop transmitting the eager data packets; and transferring the SEND data by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”). | 03-05-2015 |
20150067068 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: receiving in the AMI from an application an eager SEND instruction that describes the location and size of send data in an application SEND buffer; copying by the AMI the send data from the application SEND buffer to a temporary AMI buffer; advising the application of completion of the SEND instruction before sending the SEND data to the receiver; and after advising the application of completion of the SEND instruction, sending the SEND data by the sender to the receiver. | 03-05-2015 |
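The buffered eager SEND summarized in entry 20150067068 (copy the send data into a temporary buffer, report completion to the application, transmit afterwards) can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the patented implementation or any real AMI API: the class `BufferedEagerSender`, its `send` and `advance` methods, and the `transmit` callback are all hypothetical names introduced here for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class BufferedEagerSender:
    """Hypothetical toy model of a buffered eager SEND (not a real AMI API)."""

    # Hypothetical wire-level send function supplied by the caller.
    transmit: Callable[[bytes], None]
    # Temporary buffers holding copies of data that has not been sent yet.
    _pending: Deque[bytes] = field(default_factory=deque)

    def send(self, app_buffer: bytearray, on_complete: Callable[[], None]) -> None:
        # Step 1: copy the send data out of the application's buffer.
        self._pending.append(bytes(app_buffer))
        # Step 2: advise the application of completion before anything is
        # transmitted; the application may now reuse its buffer.
        on_complete()

    def advance(self) -> int:
        # Step 3: transmit the copied data; returns the number of messages sent.
        sent = 0
        while self._pending:
            self.transmit(self._pending.popleft())
            sent += 1
        return sent


if __name__ == "__main__":
    wire = []
    sender = BufferedEagerSender(transmit=wire.append)
    buf = bytearray(b"hello")
    sender.send(buf, on_complete=lambda: print("SEND reported complete"))
    buf[:] = b"XXXXX"  # the application overwrites its buffer right away
    sender.advance()
    print(wire)
```

Because the data is copied before completion is reported, the later overwrite of the application buffer does not affect what is transmitted. The cost of this design choice is one extra memory copy per message.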