Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Gara

Alan Gara, Yorktown Heights, NY US

Patent application numberDescriptionPublished
20110119426LIST BASED PREFETCH - A list prefetch engine improves a performance of a parallel computing system. The list prefetch engine receives a current cache miss address. The list prefetch engine evaluates whether the current cache miss address is valid. If the current cache miss address is valid, the list prefetch engine compares the current cache miss address and a list address. A list address represents an address in a list. A list describes an arbitrary sequence of prior cache miss addresses. The prefetch engine prefetches data according to the list, if there is a match between the current cache miss address and the list address.05-19-2011
20110119475GLOBAL SYNCHRONIZATION OF PARALLEL PROCESSORS USING CLOCK PULSE WIDTH MODULATION - A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.05-19-2011
20110119526LOCAL ROLLBACK FOR FAULT-TOLERANCE IN PARALLEL COMPUTING SYSTEMS - A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.05-19-2011
20110172968DISTRIBUTED PERFORMANCE COUNTERS - A plurality of first performance counter modules is coupled to a plurality of processing cores. The plurality of first performance counter modules is operable to collect performance data associated with the plurality of processing cores respectively. A plurality of second performance counter modules are coupled to a plurality of L2 cache units, and the plurality of second performance counter modules are operable to collect performance data associated with the plurality of L2 cache units respectively. A central performance counter module may be operable to coordinate counter data from the plurality of first performance counter modules and the plurality of second performance modules, the a central performance counter module, the plurality of first performance counter modules, and the plurality of second performance counter modules connected by a daisy chain connection.07-14-2011
20110172969OPCODE COUNTING FOR PERFORMANCE MEASUREMENT - Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.07-14-2011
20110173397PROGRAMMABLE STREAM PREFETCH WITH RESOURCE OPTIMIZATION - A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. The engine checks whether there exists valid data corresponding to the first memory address in an array if the first memory address is present and valid in the table. The engine increments a prefetching depth of a first stream that the first memory address belongs to and fetching a cache line associated with the first memory address from the at least one cache memory device if there is not yet valid data corresponding to the first memory address in the array. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth. The engine prefetches the additional data if the prefetching is needed.07-14-2011
20110173398TWO DIFFERENT PREFETCHING COMPLEMENTARY ENGINES OPERATING SIMULTANEOUSLY - A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.07-14-2011
20110173402HARDWARE SUPPORT FOR COLLECTING PERFORMANCE COUNTERS DIRECTLY TO MEMORY - Hardware support for collecting performance counters directly to memory, in one aspect, may include a plurality of performance counters operable to collect one or more counts of one or more selected activities. A first storage element may be operable to store an address of a memory location. A second storage element may be operable to store a value indicating whether the hardware should begin copying. A state machine may be operable to detect the value in the second storage element and trigger hardware copying of data in selected one or more of the plurality of performance counters to the memory location whose address is stored in the first storage element.07-14-2011
20110173403USING DMA FOR COPYING PERFORMANCE COUNTER DATA TO MEMORY - A device for copying performance counter data includes hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.07-14-2011
20110173411TLB EXCLUSION RANGE - A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.07-14-2011
20110173413EMBEDDING GLOBAL BARRIER AND COLLECTIVE IN A TORUS NETWORK - Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.07-14-2011
20110173488NON-VOLATILE MEMORY FOR CHECKPOINT STORAGE - A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.07-14-2011
20110219208MULTI-PETASCALE HIGHLY EFFICIENT PARALLEL SUPERCOMPUTER - A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.09-08-2011

Alana Gara, Yorktown Heights, NY US

Patent application numberDescriptionPublished
20110072241FAST CONCURRENT ARRAY-BASED STACKS, QUEUES AND DEQUES USING FETCH-AND-INCREMENT-BOUNDED AND A TICKET LOCK PER ELEMENT - Implementation primitives for concurrent array-based stacks, queues, double-ended queues (deques) and wrapped deques are provided. In one aspect, each element of the stack, queue, deque or wrapped deque data structure has its own ticket lock, allowing multiple threads to concurrently use multiple elements of the data structure and thus achieving high performance. In another aspect, new synchronization primitives FetchAndIncrementBounded (Counter, Bound) and FetchAndDecrementBounded (Counter, Bound) are implemented. These primitives can be implemented in hardware and thus promise a very fast throughput for queues, stacks and double-ended queues.03-24-2011

Alan G. Gara, Mount Kidsco, NY US

Patent application numberDescriptionPublished
20090006873POWER THROTTLING OF COLLECTIONS OF COMPUTING ELEMENTS - An apparatus and method for controlling power usage in a computer includes a plurality of computers communicating with a local control device, and a power source supplying power to the local control device and the computer. A plurality of sensors communicate with the computer for ascertaining power usage of the computer, and a system control device communicates with the computer for controlling power usage of the computer.01-01-2009

Alan Gene Gara, Mount Kisco, NY US

Patent application numberDescriptionPublished
20090028073DATA CAPTURE TECHNIQUE FOR HGH SPEED SIGNALING - A data capture technique for high speed signaling to allow for optimal sampling of an asynchronous data stream. This technique allows for extremely high data rates and does not require that a clock be sent with the data as is done in source synchronous systems. The present invention also provides a hardware mechanism for automatically adjusting transmission delays for optimal two-bit simultaneous bi-directional (SiBiDi) signaling.01-29-2009

Peter Gara, Torokbalint HU

Patent application numberDescriptionPublished
20090265844COLLAPSIBLE TOILET - The invention is a collapsible toilet (10-29-2009

Patent applications by Peter Gara, Torokbalint HU

Stephen G. Gara, Souderton, PA US

Patent application numberDescriptionPublished
20120022568Instrument for Use in Bone and Method of Use - Instrument for creating voids and channels in bone; and for obtaining samples of bone tissue. The instruments are suitable for reducing fractures in bone and for compacting the bone to create a barrier for leakage upon material injection. The instruments may also be used for bone biopsy and for obtaining samples of bone tissue. Methods for using these instruments are also disclosed.01-26-2012

Steve Gara, Cooper City, FL US

Patent application numberDescriptionPublished
201001266953-In-1 Portable Beverage Chiller and Warmer - A portable device/appliance for cooling and heating beverages on a center circular metallic object/pad. The appliance body is of square shape made of hard plastic with a circular metallic pad that includes an assembly that heats and cools beverages with the help of a switch. It has an electrical means by plug/adapter for connecting to a source of direct current and for enabling the polarity of the direct current provided to be reversed so that the device can either heat or cool the beverage container that is placed on the circular metallic object/pad. It also has the ability to heat and cool beverages by battery and by USB cable connected to a PC/Computer with a USB outlet. These 3 ways of power give the device it's portability. The fact that it operates via battery power also makes it a true portable device.05-27-2010