Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Burkhard Steinmacher-Burow, Boeblingen DE

Burkhard Steinmacher-Burow, Boeblingen DE

Patent application numberDescriptionPublished
20110072241FAST CONCURRENT ARRAY-BASED STACKS, QUEUES AND DEQUES USING FETCH-AND-INCREMENT-BOUNDED AND A TICKET LOCK PER ELEMENT - Implementation primitives for concurrent array-based stacks, queues, double-ended queues (deques) and wrapped deques are provided. In one aspect, each element of the stack, queue, deque or wrapped deque data structure has its own ticket lock, allowing multiple threads to concurrently use multiple elements of the data structure and thus achieving high performance. In another aspect, new synchronization primitives FetchAndIncrementBounded (Counter, Bound) and FetchAndDecrementBounded (Counter, Bound) are implemented. These primitives can be implemented in hardware and thus promise a very fast throughput for queues, stacks and double-ended queues.03-24-2011
20110119399DEADLOCK-FREE CLASS ROUTES FOR COLLECTIVE COMMUNICATIONS EMBEDDED IN A MULTI-DIMENSIONAL TORUS NETWORK - A computer implemented method and a system for routing data packets in a multi-dimensional computer network. The method comprises routing a data packet among nodes along one dimension towards a root node, each node having input and output communication links, said root node not having any outgoing uplinks, and determining at each node if the data packet has reached a predefined coordinate for the dimension or an edge of the subrectangle for the dimension, and if the data packet has reached the predefined coordinate for the dimension or the edge of the subrectangle for the dimension, determining if the data packet has reached the root node, and if the data packet has not reached the root node, routing the data packet among nodes along another dimension towards the root node.05-19-2011
20110119526LOCAL ROLLBACK FOR FAULT-TOLERANCE IN PARALLEL COMPUTING SYSTEMS - A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.05-19-2011
20110173399DISTRIBUTED PARALLEL MESSAGING FOR MULTIPROCESSOR SYSTEMS - A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.07-14-2011
20110173411TLB EXCLUSION RANGE - A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.07-14-2011
20110173413EMBEDDING GLOBAL BARRIER AND COLLECTIVE IN A TORUS NETWORK - Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.07-14-2011
20110173420PROCESSOR RESUME UNIT - A system for enhancing performance of a computer includes a computer system having a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processor. An external unit is external to the processor for monitoring specified computer resources. The external unit is configured to detect a specified condition using the processor. The processor including one or more threads. The thread resumes an active state from a pause state using the external unit when the specified condition is detected by the external unit.07-14-2011
20110173421MULTI-INPUT AND BINARY REPRODUCIBLE, HIGH BANDWIDTH FLOATING POINT ADDER IN A COLLECTIVE NETWORK - To add floating point numbers in a parallel computing system, a collective logic device receives the floating point numbers from computing nodes. The collective logic devices converts the floating point numbers to integer numbers. The collective logic device adds the integer numbers and generating a summation of the integer numbers. The collective logic device converts the summation to a floating point number. The collective logic device performs the receiving, the converting the floating point numbers, the adding, the generating and the converting the summation in one pass. One pass indicates that the computing nodes send inputs only once to the collective logic device and receive outputs only once from the collective logic device.07-14-2011
20110173422PAUSE PROCESSOR HARDWARE THREAD UNTIL PIN - A system and method for enhancing performance of a computer which includes a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processer. The processor processes instructions from the program. A wait state in the processor waits for receiving specified data. A thread in the processor has a pause state wherein the processor waits for specified data. A pin in the processor initiates a return to an active state from the pause state for the thread. A logic circuit is external to the processor, and the logic circuit is configured to detect a specified condition. The pin initiates a return to the active state of the thread when the specified condition is detected using the logic circuit.07-14-2011