Charles Jens Archer, Rochester, MN US

Patent application number | Description | Published
20080201657 | SCALABLE PROPERTY VIEWER FOR A MASSIVELY PARALLEL COMPUTER SYSTEM - A method and apparatus for a scalable property viewer for a massively parallel computer system. The property viewer includes a graphical user interface (GUI) that lets the user view different properties of the computer system through several types of views. The views give the user both logical and graphical representations of the properties being monitored and let the user link between logical and physical views of the system. The GUI gives the user a convenient way to view the elements of a large system and identify elements that differ. Different properties can be placed together in the same view in different colors so the user can see how multiple properties interact. | 08-21-2008
20080215916 | TEMPLATE BASED PARALLEL CHECKPOINTING IN A MASSIVELY PARALLEL COMPUTER SYSTEM - A method and apparatus for template-based parallel checkpoint saves on a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a previously produced template checkpoint file that resides in storage. This greatly reduces the amount of data that must be transmitted and stored, making checkpointing faster and the computer system more efficient. Embodiments are directed to a parallel computer system whose nodes are arranged in a cluster with a high-speed interconnect capable of broadcast communication. The checkpoint contains a set of small blocks of actual data, with their corresponding checksums, from all nodes in the system. The data blocks may be compressed with conventional lossless data compression algorithms to further reduce the overall checkpoint size. | 09-04-2008
20080270852 | MULTI-DIRECTIONAL FAULT DETECTION SYSTEM - An apparatus, program product and method check for nodal faults in a group of nodes comprising a center node and all of its adjacent nodes. The center node concurrently communicates with its immediately adjacent nodes in three dimensions. The communications are analyzed to determine whether a faulty node or connection is present. | 10-30-2008
20080288820 | MULTI-DIRECTIONAL FAULT DETECTION SYSTEM - An apparatus, program product and method check for nodal faults in a group of nodes comprising a center node and all of its adjacent nodes. The center node concurrently communicates with its immediately adjacent nodes in three dimensions. The communications are analyzed to determine whether a faulty node or connection is present. | 11-20-2008
20080313506 | BISECTIONAL FAULT DETECTION SYSTEM - An apparatus and program product logically divide a group of nodes and cause pairs of nodes, one from each section, to communicate. The results of these communications may be analyzed to determine performance characteristics such as bandwidth and proper connectivity. | 12-18-2008
20080320329 | ROW FAULT DETECTION SYSTEM - An apparatus and program product check for nodal faults in a row of nodes by causing each node in the row to concurrently communicate with its adjacent neighbor nodes in the row. The communications are analyzed to determine whether a faulty node or connection is present. | 12-25-2008
20080320330 | ROW FAULT DETECTION SYSTEM - An apparatus, program product and method check for nodal faults in a row of nodes by causing each node in the row to concurrently communicate with its adjacent neighbor nodes in the row. The communications are analyzed to determine whether a faulty node or connection is present. | 12-25-2008
20090037376 | DATABASE RETRIEVAL WITH A UNIQUE KEY SEARCH ON A PARALLEL COMPUTER SYSTEM - An apparatus and method retrieve a database record from an in-memory database of a parallel computer system using a unique key. The parallel computer system searches every node simultaneously using the unique key, then uses a global combining network to combine the results of the per-node searches, so the entire database is searched quickly and efficiently. | 02-05-2009
20090037377 | DATABASE RETRIEVAL WITH A NON-UNIQUE KEY ON A PARALLEL COMPUTER SYSTEM - An apparatus and method retrieve a database record from an in-memory database of a parallel computer system using a non-unique key. The parallel computer system searches every node simultaneously using the non-unique key, then uses a global combining network to combine the local results of the per-node searches, so the entire database is searched quickly and efficiently. | 02-05-2009
20090044052 | CELL BOUNDARY FAULT DETECTION SYSTEM - An apparatus and program product determine a nodal fault along the boundary, or face, of a computing cell. Nodes on adjacent cell boundaries communicate with each other, and the communications are analyzed to determine whether a node or connection is faulty. | 02-12-2009
20090067334 | MECHANISM FOR PROCESS MIGRATION ON A MASSIVELY PARALLEL COMPUTER - Embodiments of the invention provide a mechanism for process migration on a massively parallel computer system. In particular, embodiments of the invention may be used to update process state data for a migrated compute node, such as MPI (or other communication library) state data, across the full collection of compute nodes executing a parallel task on a given parallel system. Migrating a process from one compute node to another can address a variety of sub-optimal operating conditions. For example, one or more processes may be migrated to relieve network congestion caused by a poorly mapped task, or when a compute node is predicted to experience a hardware failure. | 03-12-2009
20090178053 | DISTRIBUTED SCHEMES FOR DEPLOYING AN APPLICATION IN A LARGE PARALLEL SYSTEM - Embodiments of the invention provide a method for deploying and running an application on a massively parallel computer system while minimizing the costs associated with latency, bandwidth, and limited memory resources. The executable code of a program may be divided into multiple code fragments and distributed to different compute nodes of the parallel computing system. During program execution, a compute node may fetch code fragments from other compute nodes as needed. | 07-09-2009
20090187984 | DATASPACE PROTECTION UTILIZING VIRTUAL PRIVATE NETWORKS ON A MULTI-NODE COMPUTER SYSTEM - A method and apparatus provide data security on a parallel computer system using virtual private networks. An access setup mechanism sets up access control data in the nodes that describes which virtual networks are protected and which applications have access to the protected private networks. When an application accesses data on a protected virtual network, a network access mechanism determines that the data is protected and intercepts the access. The network access mechanism in the kernel may also execute a rule depending on the kind of access attempted on the virtual network. Authorized access to the private networks can be made via a system call to the access control mechanism in the kernel. The access control mechanism enforces policy decisions about which data can be distributed through the system, via an access control list or other security policies. | 07-23-2009
20100185718 | PERFORMING PROCESS MIGRATION WITH ALLREDUCE OPERATIONS - Compute nodes perform allreduce operations that swap processes between nodes. A first allreduce operation generates a first result using a first process from a first compute node, a second process from a second compute node, and zeros from the other compute nodes. The first compute node replaces the first process with the first result. A second allreduce operation generates a second result using the first result from the first compute node, the second process from the second compute node, and zeros from the others. The second compute node replaces the second process with the second result, which is the first process. A third allreduce operation generates a third result using the first result from the first compute node, the second result from the second compute node, and zeros from the others. The first compute node replaces the first result with the third result, which is the second process. | 07-22-2010
20100318835 | BISECTIONAL FAULT DETECTION SYSTEM - An apparatus, program product and method logically divide a group of nodes and cause pairs of nodes, one from each section, to communicate. The results of these communications may be analyzed to determine performance characteristics such as bandwidth and proper connectivity. | 12-16-2010
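
The template-based checkpointing abstract (20080215916) compares each node's checkpoint against a stored template and keeps only differing blocks plus checksums, optionally compressed losslessly. The following is a minimal single-process sketch under stated assumptions: fixed-size aligned blocks, SHA-256 checksums, and zlib compression. It omits rsync's rolling checksum (which finds unaligned matches), the broadcast of the template, and parallel transfer. BLOCK_SIZE and all function names are hypothetical.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a checkpoint image into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def delta_against_template(node_data: bytes, template: bytes,
                           size: int = BLOCK_SIZE) -> list[tuple[int, bytes]]:
    """Keep only blocks whose checksum differs from the template block at the
    same offset (or that lie past the template's end), compressed with zlib."""
    template_sums = [checksum(b) for b in split_blocks(template, size)]
    delta = []
    for i, block in enumerate(split_blocks(node_data, size)):
        if i >= len(template_sums) or checksum(block) != template_sums[i]:
            delta.append((i, zlib.compress(block)))
    return delta


def restore(template: bytes, delta: list[tuple[int, bytes]], length: int,
            size: int = BLOCK_SIZE) -> bytes:
    """Rebuild one node's checkpoint from the template plus its delta."""
    blocks = split_blocks(template, size)
    for i, compressed in delta:  # delta entries arrive in ascending block order
        block = zlib.decompress(compressed)
        if i < len(blocks):
            blocks[i] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:length]  # trim template tail if the node image is shorter
```

A node whose image matches the template produces an empty delta, which is where the savings described in the abstract come from.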
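
The bisectional fault detection abstracts (20080313506, 20100318835) split a group of nodes into sections and have cross-section pairs communicate. The sketch below assumes two equal halves paired by position. The abstracts do not specify the pairing. It also assumes a caller-supplied `measure_bandwidth` function and a `min_bandwidth` threshold, both hypothetical; a result of `None` models a pair that failed to communicate at all. On a real machine the pairs would communicate concurrently, while here they are checked one after another.

```python
from typing import Callable, Optional


def bisect_pairs(nodes: list[int]) -> list[tuple[int, int]]:
    """Pair node i of the first half with node i of the second half (assumed scheme)."""
    half = len(nodes) // 2
    return list(zip(nodes[:half], nodes[half:2 * half]))


def find_suspect_pairs(nodes: list[int],
                       measure_bandwidth: Callable[[int, int], Optional[float]],
                       min_bandwidth: float) -> list[tuple[int, int, Optional[float]]]:
    """Flag pairs that could not connect (None) or whose bandwidth is below the threshold."""
    suspects = []
    for a, b in bisect_pairs(nodes):
        bw = measure_bandwidth(a, b)
        if bw is None or bw < min_bandwidth:
            suspects.append((a, b, bw))
    return suspects
```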
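
The row fault detection abstracts (20080320329, 20080320330) have each node in a row talk to its neighbors and analyze the results to find a faulty node or connection. The abstracts do not give the analysis rule. This sketch therefore assumes a simple heuristic: an interior node whose links to both neighbors fail is reported as a suspect node, and any other failed link is reported as a suspect connection. `link_ok` is a hypothetical callback reporting whether a test message between two adjacent nodes succeeded.

```python
from typing import Callable


def diagnose_row(row: list[int], link_ok: Callable[[int, int], bool]):
    """Return (suspect_nodes, suspect_links) for one row of nodes."""
    links = list(zip(row, row[1:]))
    failed = {link for link in links if not link_ok(*link)}
    # Assumed heuristic: both links of an interior node failed -> blame the node.
    suspect_nodes = [
        row[i] for i in range(1, len(row) - 1)
        if (row[i - 1], row[i]) in failed and (row[i], row[i + 1]) in failed
    ]
    # Failures not explained by a suspect node -> blame the connection.
    # (A failure at either end of the row is ambiguous and lands here.)
    suspect_links = [
        link for link in links
        if link in failed and link[0] not in suspect_nodes and link[1] not in suspect_nodes
    ]
    return suspect_nodes, suspect_links
```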
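
The two database retrieval abstracts (20090037376, 20090037377) search every node's in-memory partition at once and then merge the per-node results over a global combining network. In the sketch below, a single-process `functools.reduce` stands in for that network. The combining operations are assumptions, since neither abstract names them. For a unique key it keeps the one non-empty result. For a non-unique key it concatenates the local result lists. Partitions are modeled as lists of dicts, and the field names in the usage are hypothetical.

```python
from functools import reduce
from typing import Optional


def local_unique_search(partition: list[dict], field: str, key) -> Optional[dict]:
    """One node scans its own partition; a unique key matches at most one row."""
    return next((row for row in partition if row[field] == key), None)


def first_found(a: Optional[dict], b: Optional[dict]) -> Optional[dict]:
    """Combining step for a unique key: at most one node reports a hit."""
    return a if a is not None else b


def global_unique_search(partitions: list[list[dict]], field: str, key) -> Optional[dict]:
    local = [local_unique_search(p, field, key) for p in partitions]  # simultaneous on real nodes
    return reduce(first_found, local, None)


def global_non_unique_search(partitions: list[list[dict]], field: str, key) -> list[dict]:
    local = [[row for row in p if row[field] == key] for p in partitions]
    return reduce(lambda acc, rows: acc + rows, local, [])  # combine by concatenation
```

For example, with three partitions holding rows keyed by `"id"`, `global_unique_search(partitions, "id", 3)` returns the single matching row, or `None` if no node holds it.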
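
The allreduce-based process migration abstract (20100185718) swaps two processes with three allreduce operations in which every uninvolved node contributes zeros. The abstract does not name the reduction operator. Bitwise XOR makes the three steps work, because 0 is the XOR identity and x ^ y ^ y == x. This sketch therefore assumes XOR and integer-encoded process images, and it simulates the allreduce in one process rather than calling MPI. `allreduce_xor` and `swap_via_allreduce` are hypothetical names.

```python
from functools import reduce
from operator import xor


def allreduce_xor(contributions: list[int]) -> list[int]:
    """Simulated allreduce: every node receives the XOR of all nodes' contributions."""
    total = reduce(xor, contributions, 0)
    return [total] * len(contributions)


def swap_via_allreduce(buffers: list[int], a: int, b: int) -> list[int]:
    """Swap the images held by nodes a and b using three XOR allreduces (a != b)."""
    def contributions() -> list[int]:
        # Nodes a and b contribute their current buffers; every other node contributes 0.
        return [buffers[i] if i in (a, b) else 0 for i in range(len(buffers))]

    # 1st allreduce: P_a ^ P_b; node a replaces its process with the first result.
    buffers[a] = allreduce_xor(contributions())[a]
    # 2nd: (P_a ^ P_b) ^ P_b == P_a; node b replaces its process with it.
    buffers[b] = allreduce_xor(contributions())[b]
    # 3rd: (P_a ^ P_b) ^ P_a == P_b; node a replaces the first result with it.
    buffers[a] = allreduce_xor(contributions())[a]
    return buffers
```

For instance, `swap_via_allreduce([10, 6, 15], 0, 1)` leaves node 2 untouched and returns `[6, 10, 15]`. The XOR trick avoids any temporary buffer on either node, which matches the abstract's sequence of in-place replacements.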
