| Patent application number | Description | Published |
| 20080222303 | LATENCY HIDING MESSAGE PASSING PROTOCOL - A method, system, and article of manufacture that provide latency hiding, high bandwidth message passing protocols used for data communication between nodes of a parallel computer system are disclosed. A source node transmits a request to send message to a receiving node. Prior to receiving a clear to send message, the sending node continues to send deterministically routed (or fully described) data packets to the receiving node, thereby hiding the latency inherent in the request to send—clear to send message exchange. Once the sending node receives the clear to send message, any remaining portion of the message may be sent using partially described packets which may be routed dynamically, thereby maximizing bandwidth. | 09-11-2008 |
| 20080259916 | OPPORTUNISTIC QUEUEING INJECTION STRATEGY FOR NETWORK LOAD BALANCING - Embodiments of the invention include a method, system, and article of manufacture that provide opportunistic queuing injection strategy used for data communication between nodes of a parallel computer system. A message may be encapsulated into a set of data packets. When the packets are sent, an opportunistic injection queue may be configured to transmit them to multiple hardware injection ports. This approach allows for complete network link saturation. In a parallel system with network links in multiple dimensions, sending message packets using more than one dimension may substantially increase network throughput. | 10-23-2008 |
| 20080263329 | Parallel-Prefix Broadcast for a Parallel-Prefix Operation on a Parallel Computer - A parallel-prefix broadcast for a parallel-prefix operation on a parallel computer includes: configuring, on each node, a parallel-prefix contribution buffer for storing the node's parallel-prefix contribution; configuring, on each node, a parallel-prefix results buffer for storing results of an operation, the results buffer having a position for each node that corresponds to the node's rank; and repeatedly for each position in the results buffer: processing in parallel by each node, including: determining, by the node, whether the current position in the results buffer is to include the node's contribution, if the current position is not to include the contribution, contributing the identity element, and if the current position is to include the contribution, contributing the contribution, performing, by each node, the operation using the contributed identity elements and the contributed contributions, yielding a result from the operation, and storing, by each node, the result in the position in the results buffer. | 10-23-2008 |
| 20080281997 | Low Latency, High Bandwidth Data Communications Between Compute Nodes in a Parallel Computer - Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (‘DMA’) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (‘RTS’) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using a memory FIFO operation; determining, by the origin DMA engine, whether an acknowledgement of the RTS message has been received from the target DMA engine; if the acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation. | 11-13-2008 |
| 20080301683 | Performing an Allreduce Operation Using Shared Memory - Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit. | 12-04-2008 |
| 20080301704 | Controlling Data Transfers from an Origin Compute Node to a Target Compute Node - Methods, apparatus, and products are disclosed for controlling data transfers from an origin compute node to a target compute node that include: receiving, by an application messaging module on the target compute node, an indication of a data transfer from an origin compute node to the target compute node; and administering, by the application messaging module on the target compute node, the data transfer using one or more messaging primitives of a system messaging module in dependence upon the indication. | 12-04-2008 |
| 20080313341 | Data Communications - Data communications, including issuing, by an application program to a high level data communications library, a request for initialization of a data communications service; issuing to a low level data communications library a request for registration of data communications functions; registering the data communications functions, including instantiating a factory object for each of the one or more data communications functions; issuing by the application program an instruction to execute a designated data communications function; issuing, to the low level data communications library, an instruction to execute the designated data communications function, including passing to the low level data communications library a call parameter that identifies a factory object; creating with the identified factory object the data communications object that implements the data communications function according to the protocol; and executing by the low level data communications library the designated data communications function. | 12-18-2008 |
| 20090037511 | Effecting a Broadcast with an Allreduce Operation on a Parallel Computer - Methods, parallel computers, and computer program products are disclosed for effecting a broadcast with an allreduce operation on a parallel computer, the parallel computer comprising a plurality of compute nodes, the compute nodes organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer, each compute node in the operational group assigned a unique rank, the compute nodes of the operational group coupled for data communications through a global combining network; and one compute node assigned to be a logical root. Embodiments include configuring, by the logical root node, a send buffer having a contribution to be broadcast to each ranked node in the operational group; configuring, by all ranked nodes other than the logical root, a receive buffer for receiving the contribution from the logical root; and repeatedly for each element of the contribution of the logical root in the send buffer: contributing, by the logical root, the element of the contribution in the send buffer; injecting, by all ranked nodes other than the logical root, one or more zeros corresponding to a size of the element; performing, by all the compute nodes of the operational group, an allreduce operation with a bitwise OR using the element and the injected zeros, yielding a result for the allreduce operation; and storing in each receive buffer, by all ranked nodes other than the logical root, the result of the allreduce. | 02-05-2009 |
| 20090037773 | Link Failure Detection in a Parallel Computer - Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received. | 02-05-2009 |
| 20090043988 | Configuring Compute Nodes of a Parallel Computer in an Operational Group into a Plurality of Independent Non-Overlapping Collective Networks - Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root. | 02-12-2009 |
| 20090052462 | Line-Plane Broadcasting in a Data Communications Network of a Parallel Computer - Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network. | 02-26-2009 |
| 20090055474 | Line-Plane Broadcasting in a Data Communications Network of a Parallel Computer - Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network. | 02-26-2009 |
| 20090113308 | Administering Communications Schedules for Data Communications Among Compute Nodes in a Data Communications Network of a Parallel Computer - Methods, apparatus, and products are disclosed for creating and administering communications schedules for data communications among compute nodes in a data communications network of a parallel computer that include: receiving a communications schedule specifying data communications steps in a message passing operation performed by the compute nodes in the data communications network of the parallel computer; parsing the communications schedule to identify the data communications steps; and generating a graphical representation of the communications schedule, including graphing the data communications steps for the message passing operation. | 04-30-2009 |
| 20090154486 | Tracking Network Contention - Methods, apparatus, and product for tracking network contention on links among compute nodes of an operational group in a point-to-point data communications network of a parallel computer are disclosed. In embodiments of the present invention, each compute node is connected to an adjacent compute node in the point-to-point data communications network through a link. Tracking network contention according to embodiments of the present invention includes maintaining, by a network contention module on each compute node in the operational group, a local contention counter for each compute node, each local contention counter representing network contention on links among the compute nodes originating from the compute node; and maintaining a global contention counter, the global contention counter representing network contention currently on all links among the compute nodes in the operational group. | 06-18-2009 |
| 20090248894 | Determining A Path For Network Traffic Between Nodes In A Parallel Computer - Determining a path for network traffic between a source compute node and a destination compute node in a parallel computer including identifying a group of compute nodes, the group of compute nodes having topological network locations included in a predefined topological shape; selecting, from the predefined topological shape, in dependence upon a global contention counter stored on the source compute node, a path on which to send a data communications message from the source compute node to the destination compute node; and sending, by the messaging module of the source compute node, the data communications message along the selected path for network traffic between the source and destination compute nodes. | 10-01-2009 |
| 20090248895 | Determining A Path For Network Traffic Between Nodes In A Parallel Computer - Determining a path for network traffic between a source compute node and a destination compute node in a parallel computer including: beginning with an identified group of compute nodes that includes the source compute node and iteratively until an identified group of compute nodes includes the destination compute node: identifying a group of compute nodes, the group of compute nodes having topological network locations included in a predefined topological shape; selecting a path for network traffic between compute nodes having topological network locations included in the predefined topological shape, and when an identified group of compute nodes includes the destination compute node: selecting a final path for network traffic; and sending a data communications message along the path for network traffic between the source compute node and the destination compute node, the path including, in order of selection, the selected paths and the selected final path. | 10-01-2009 |
| 20090300384 | Reducing Power Consumption While Performing Collective Operations On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption while performing collective operations on a plurality of compute nodes that include: receiving, by each compute node, instructions to perform a type of collective operation; selecting, by each compute node from a plurality of collective operations for the collective operation type, a particular collective operation in dependence upon power consumption characteristics for each of the plurality of collective operations; and executing, by each compute node, the selected collective operation. | 12-03-2009 |
| 20090300385 | Reducing Power Consumption While Synchronizing A Plurality Of Compute Nodes During Execution Of A Parallel Application - Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation. | 12-03-2009 |
| 20090300386 | Reducing power consumption during execution of an application on a plurality of compute nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory. | 12-03-2009 |
| 20090300394 | Reducing Power Consumption During Execution Of An Application On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: executing, by each compute node, an application, the application including power consumption directives corresponding to one or more portions of the application; identifying, by each compute node, the power consumption directives included within the application during execution of the portions of the application corresponding to those identified power consumption directives; and reducing power, by each compute node, to one or more components of that compute node according to the identified power consumption directives during execution of the portions of the application corresponding to those identified power consumption directives. | 12-03-2009 |
| 20090300399 | Profiling power consumption of a plurality of compute nodes while processing an application - Methods, apparatus, and products are disclosed for profiling power consumption of a plurality of compute nodes while processing an application that include: executing the application on the plurality of compute nodes; monitoring performance characteristics for components of the plurality of compute nodes during execution of the application; and recording, in a power profile for the application, power consumption during execution of the application in dependence upon the performance characteristics for components of the plurality of compute nodes. | 12-03-2009 |
| 20090307036 | Budget-Based Power Consumption For Application Execution On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for budget-based power consumption for application execution on a plurality of compute nodes that include: assigning an execution priority to each of one or more applications; executing, on the plurality of compute nodes, the applications according to the execution priorities assigned to the applications at an initial power level provided to the compute nodes until a predetermined power consumption threshold is reached; and applying, upon reaching the predetermined power consumption threshold, one or more power conservation actions to reduce power consumption of the plurality of compute nodes during execution of the applications. | 12-10-2009 |
| 20090307703 | Scheduling Applications For Execution On A Plurality Of Compute Nodes Of A Parallel Computer To Manage temperature of the nodes during execution - Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions. | 12-10-2009 |
| 20090307708 | Thread Selection During Context Switching On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for thread selection during context switching on a plurality of compute nodes that includes: executing, by a compute node, an application using a plurality of threads of execution, including executing one or more of the threads of execution; selecting, by the compute node from a plurality of available threads of execution for the application, a next thread of execution in dependence upon power characteristics for each of the available threads; determining, by the compute node, whether criteria for a thread context switch are satisfied; and performing, by the compute node, the thread context switch if the criteria for a thread context switch are satisfied, including executing the next thread of execution. | 12-10-2009 |
| 20090319725 | Methods, Systems and Computer Program Products for Detection of Frequent Improper Removals of and Changing Writing Policies to Prevent Data Loss in Memory Sticks - Methods, systems and computer program products for detection of frequent improper removals of and changing writing policies to prevent data loss in memory sticks. Exemplary embodiments include a method including detecting insertions of the memory stick, detecting removals of the memory stick, tracking a number of times the memory stick has been docked when removed, tracking a number of times the memory stick has been undocked when removed, determining a removal ratio of the number of times the memory stick has been removed when docked to the number of times the memory stick has been removed when undocked, comparing the removal ratio to a predetermined threshold, caching writes and directory updates, committing the writes and directory updates to the memory stick when the removal ratio is below the predetermined threshold, and flushing all writes and updates to the memory stick when the removal ratio is equal to or above the predetermined threshold. | 12-24-2009 |
| 20090327444 | Dynamic Network Link Selection For Transmitting A Message Between Compute Nodes Of A Parallel Comput - Methods, apparatus, and products are disclosed for dynamic network link selection for transmitting a message between nodes of a parallel computer. The nodes are connected using a data communications network. Each node connects to adjacent nodes in the data communications network through a plurality of network links. Each link provides a different data communication path through the network between the nodes of the parallel computer. Such dynamic link selection includes: identifying, by an origin node, a current message for transmission to a target node; determining, by the origin node, whether transmissions of previous messages to the target node have completed; selecting, by the origin node from the plurality of links for the origin node, a link in dependence upon the determination and link characteristics for the plurality of links for the origin node; and transmitting, by the origin node, the current message to the target node using the selected link. | 12-31-2009 |
| 20100005189 | Pacing Network Traffic Among A Plurality Of Compute Nodes Connected Using A Data Communications Network - Methods, apparatus, and products are disclosed for pacing network traffic among a plurality of compute nodes connected using a data communications network. The network has a plurality of network regions, and the plurality of compute nodes are distributed among these network regions. Pacing network traffic among a plurality of compute nodes connected using a data communications network includes: identifying, by a compute node for each region of the network, a roundtrip time delay for communicating with at least one of the compute nodes in that region; determining, by the compute node for each region, a pacing algorithm for that region in dependence upon the roundtrip time delay for that region; and transmitting, by the compute node, network packets to at least one of the compute nodes in at least one of the network regions in dependence upon the pacing algorithm for that region. | 01-07-2010 |
| 20100005326 | Profiling An Application For Power Consumption During Execution On A Compute Node - Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application. | 01-07-2010 |
| 20100014523 | Providing Point To Point Communications Among Compute Nodes In A Global Combining Network Of A Parallel Computer - Methods, apparatus, and products are disclosed for providing point to point data communications among compute nodes in a global combining network of a parallel computer that include: determining a class route identifier available for all of the nodes along a communications path from an origin node to a target node; configuring network hardware of each node along the communications path with routing instructions in dependence upon the available class route identifier and the network's topology; transmitting, by the origin node along the communications path, a network packet to the target node, including encoding the available class route identifier in the network packet; and routing, by the network hardware of each node along the communications path, the network packet to the target node in dependence upon the routing instructions for each node and the available class route identifier. | 01-21-2010 |
| 20100017420 | Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations - Methods, apparatus, and products are disclosed for performing an all-to-all exchange on n number of data buffers using XOR swap operations. Each data buffer has n number of data elements. Performing an all-to-all exchange on n number of data buffers using XOR swap operations includes for each rank value of i and j where i is greater than j and where i is less than or equal to n: selecting data element i in data buffer j; selecting data element j in data buffer i; and exchanging contents of data element i in data buffer j with contents of data element j in data buffer i using an XOR swap operation. | 01-21-2010 |
| 20100023631 | Processing Data Access Requests Among A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for processing data access requests among a plurality of compute nodes. One compute node operates as a processing node, and one compute node operates as a requesting node. The processing node receives, from the requesting node, a data access request to access data currently being processed by the processing node. The processing node also receives, from the requesting node, a processing directive. The processing directive specifies data processing operations to be performed on the data specified by the data access request. The processing node performs, on behalf of the requesting node, the data processing operations specified by the processing directive on the data specified by the data access request. The processing node transmits, to the requesting node, results of the data processing operations performed on the data by the processing node on behalf of the requesting node. | 01-28-2010 |
| 20100023723 | Paging Memory Contents Between A Plurality Of Compute Nodes In A Parallel Computer - Methods, apparatus, and products are disclosed for paging memory contents between a plurality of compute nodes in a parallel computer that includes: identifying, by a master node, a memory allocation request for an application executing on the master node, the memory allocation request requesting additional computer memory for use by the application during execution; requesting, by the master node from a slave node, an available memory notification specifying to the master node the computer memory available for allocation on the slave node; allocating, by the master node, at least a portion of the computer memory available for allocation on the slave node in dependence upon the memory allocation request and the available memory notification; and transferring, by the master node, contents of a portion of the computer memory on the master node to the allocated portion of the computer memory on the slave node. | 01-28-2010 |
| 20100037035 | Generating An Executable Version Of An Application Using A Distributed Compiler Operating On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for generating an executable version of an application using a distributed compiler operating on a plurality of compute nodes that include: receiving, by each compute node, a portion of source code for an application; compiling, in parallel by each compute node, the portion of the source code received by that compute node into a portion of object code for the application; performing, in parallel by each compute node, inter-procedural analysis on the portion of the object code of the application for that compute node, including sharing results of the inter-procedural analysis among the compute nodes; optimizing, in parallel by each compute node, the portion of the object code of the application for that compute node using the shared results of the inter-procedural analysis; and generating the executable version of the application in dependence upon the optimized portions of the object code of the application. | 02-11-2010 |
| 20100095303 | Balancing A Data Processing Load Among A Plurality Of Compute Nodes In A Parallel Computer - Methods, apparatus, and products are disclosed for balancing a data processing load among a plurality of compute nodes in a parallel computer that include: partitioning application data for processing on the plurality of compute nodes into data chunks; receiving, by each compute node, at least one of the data chunks for processing; estimating, by each compute node, processing time involved in processing the data chunks received by that compute node for processing; and redistributing, by at least one of the compute nodes to at least one of the other compute nodes, a portion of the data chunks received by that compute node in dependence upon the processing time estimated by that compute node. | 04-15-2010 |
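
Entry 20100017420 in the table above describes an all-to-all exchange in which, for every pair of ranks i > j, data element i of buffer j is exchanged with data element j of buffer i using an XOR swap. The following is a minimal sketch of that idea, not the application's implementation. It assumes the n buffers are plain in-memory lists of integers in a single process rather than buffers on separate compute nodes, and it uses 0-based indices. The function name `all_to_all_xor_swap` is hypothetical.

```python
def all_to_all_xor_swap(buffers):
    """Exchange element i of buffer j with element j of buffer i, in place.

    Assumption: `buffers` is a list of n lists, each holding n integers.
    Indices are 0-based here; the abstract's rank values are not tied to
    any particular indexing in this sketch.
    """
    n = len(buffers)
    if any(len(buf) != n for buf in buffers):
        raise ValueError("expected n buffers of n elements each")

    for i in range(n):
        for j in range(i):  # only pairs with i > j, so each pair is swapped once
            # The two locations are distinct because i != j, which is what
            # makes the three-step XOR swap safe (no temporary needed).
            buffers[j][i] ^= buffers[i][j]
            buffers[i][j] ^= buffers[j][i]
            buffers[j][i] ^= buffers[i][j]
    return buffers


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(all_to_all_xor_swap(data))
```

After the call, buffer k holds element k from every original buffer, which is the usual result of an all-to-all exchange (a transpose of the n-by-n layout). Visiting only pairs with i > j matters: swapping each pair twice would restore the original contents.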