Alan Gara, Mount Kisco US

Alan Gara, Mount Kisco, NY US

Patent application number	Description	Published
20090003228	BAD DATA PACKET CAPTURE DEVICE - An apparatus and method for capturing data packets for analysis on a network computing system includes a sending node and a receiving node connected by a bi-directional communication link. The sending node sends a data transmission to the receiving node on the bi-directional communication link, and the receiving node receives the data transmission and verifies the data transmission to determine valid data and invalid data and verify retransmissions of invalid data as corresponding valid data. A memory device communicates with the receiving node for storing the invalid data and the corresponding valid data. A computing node communicates with the memory device and receives and performs an analysis of the invalid data and the corresponding valid data received from the memory device.	01-01-2009
20090006605	EXTENDED WRITE COMBINING USING A WRITE CONTINUATION HINT FLAG - A computing apparatus for reducing the amount of processing in a network computing system which includes a network system device of a receiving node for receiving electronic messages comprising data. The electronic messages are transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicating with the network system device. A memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and the buffer combines the portion with later received data and moves the data to the memory device for accessible storage.	01-01-2009
20090049317	Managing Power in a Parallel Computer - Managing power in a parallel computer, the parallel computer including a power supply and a plurality of compute nodes, the plurality of compute nodes powered by the power supply through a plurality of DC-DC converters, each DC-DC converter supplying current to an assigned group of compute nodes, each DC-DC converter having a current sensor. Embodiments include monitoring, by the current sensor, an amount of current supplied by that DC-DC converter to its assigned group of compute nodes; determining, by at least one DC-DC converter, that the amount of current supplied is greater than a predefined threshold value; sending, by the at least one DC-DC converter to the plurality of compute nodes, a global interrupt, including notifying the plurality of compute nodes to reduce power consumption; and reducing, by the plurality of compute nodes in accordance with power consumption ratios, power consumption of the compute nodes.	02-19-2009
20100138684	MEMORY SYSTEM WITH DYNAMIC SUPPLY VOLTAGE SCALING - A memory controller, memory device, and method for dynamic supply voltage scaling in a memory system are provided. The method includes receiving a request for a supply voltage change at the memory controller in the memory system, the supply voltage powering the memory device. The method further includes waiting for any current access of the memory device to complete, and disabling a clock between the memory controller and the memory device. The method also includes changing the supply voltage responsive to the request, and enabling the clock.	06-03-2010
20110047334	Checkpointing in Speculative Versioning Caches - Mechanisms for generating checkpoints in a speculative versioning cache of a data processing system are provided. The mechanisms execute code within the data processing system, wherein the code accesses cache lines in the speculative versioning cache. The mechanisms further determine whether a first condition occurs indicating a need to generate a checkpoint in the speculative versioning cache. The checkpoint is a speculative cache line which is made non-speculative in response to a second condition occurring that requires a roll-back of changes to a cache line corresponding to the speculative cache line. The mechanisms also generate the checkpoint in the speculative versioning cache in response to a determination that the first condition has occurred.	02-24-2011
20110047358	In-Data Path Tracking of Floating Point Exceptions and Store-Based Exception Indication - Mechanisms are provided for tracking exceptions in the execution of vectorized code. A speculative instruction is executed on a vector element of a vector. An exception condition is detected in association with the vector element based on a result of executing the speculative instruction on the vector element. A special exception value is stored in the vector element in a vector register corresponding to the vector, indicative of the exception condition, without invoking an exception handler for the exception condition. The special exception value is propagated with the vector element of the vector through a processor architecture of the processor, without invoking the exception handler for the exception condition. An exception corresponding to the exception condition indicated by the special exception value is generated only in response to a non-speculative instruction being executed that performs a non-speculative operation on the vector element.	02-24-2011
20110047359	Insertion of Operation-and-Indicate Instructions for Optimized SIMD Code - Mechanisms are provided for inserting indicated instructions for tracking and indicating exceptions in the execution of vectorized code. A portion of first code is received for compilation. The portion of first code is analyzed to identify non-speculative instructions performing designated non-speculative operations in the first code that are candidates for replacement by replacement operation-and-indicate instructions that perform the designated non-speculative operations and further perform an indication operation for indicating any exception conditions corresponding to special exception values present in vector register inputs to the replacement operation-and-indicate instructions. The replacement is performed and second code is generated based on the replacement of the at least one non-speculative instruction. The data processing system executing the compiled code is configured to store special exception values in vector output registers, in response to a speculative instruction generating an exception condition, without initiating exception handling.	02-24-2011
20110047362	Version Pressure Feedback Mechanisms for Speculative Versioning Caches - Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.	02-24-2011
20110119521	REPRODUCIBILITY IN A MULTIPROCESSOR SYSTEM - Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.	05-19-2011
20110172984	EFFICIENCY OF STATIC CORE TURN-OFF IN A SYSTEM-ON-A-CHIP WITH VARIATION - A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.	07-14-2011
20110173392	EVICT ON WRITE, A MANAGEMENT STRATEGY FOR A PREFETCH UNIT AND/OR FIRST LEVEL CACHE IN A MULTIPROCESSOR SYSTEM WITH SPECULATIVE EXECUTION - In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.	07-14-2011
20110173394	ORDERING OF GUARDED AND UNGUARDED STORES FOR NO-SYNC I/O - A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.	07-14-2011
20110173432	RELIABILITY AND PERFORMANCE OF A SYSTEM-ON-A-CHIP BY PREDICTIVE WEAR-OUT BASED ACTIVATION OF FUNCTIONAL COMPONENTS - A processor-implemented method for determining aging of a processing unit in a processor the method comprising: calculating an effective aging profile for the processing unit wherein the effective aging profile quantifies the effects of aging on the processing unit; combining the effective aging profile with process variation data, actual workload data and operating conditions data for the processing unit; and determining aging through an aging sensor of the processing unit using the effective aging profile, the process variation data, the actual workload data, architectural characteristics and redundancy data, and the operating conditions data for the processing unit.	07-14-2011
20110219187	CACHE DIRECTORY LOOKUP READER SET ENCODING FOR PARTIAL CACHE LINE SPECULATION SUPPORT - In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.	09-08-2011
20110219188	CACHE AS POINT OF COHERENCE IN MULTIPROCESSOR SYSTEM - In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.	09-08-2011
20110219191	READER SET ENCODING FOR DIRECTORY OF SHARED CACHE MEMORY IN MULTIPROCESSOR SYSTEM - In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.	09-08-2011
20110219215	ATOMICITY: A MULTI-PRONGED APPROACH - In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.	09-08-2011
20110219280	COLLECTIVE NETWORK FOR COMPUTER STRUCTURES - A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.	09-08-2011
20110219381	MULTIPROCESSOR SYSTEM WITH MULTIPLE CONCURRENT MODES OF EXECUTION - A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.	09-08-2011
20120185672	LOCAL-ONLY SYNCHRONIZING OPERATIONS - Performing a series of successive synchronizing operations by a core on data shared by a plurality of cores may include a first core indicating an upcoming synchronizing operation on shared data. A second memory layer stores the shared data and tracks the first core's ownership of the shared data. The second memory layer is shared via coherency operations among the first core and one or more second cores. The first core may perform one or more synchronization operations on the shared data without requiring interaction from the second memory layer.	07-19-2012
20120198118	USING DMA FOR COPYING PERFORMANCE COUNTER DATA TO MEMORY - A device for copying performance counter data includes hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.	08-02-2012
20120210073	WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS - An apparatus, method and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.	08-16-2012
20120210162	STATE RECOVERY AND LOCKSTEP EXECUTION RESTART IN A SYSTEM WITH MULTIPROCESSOR PAIRING - System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory “nest” (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus. Each selectively paired processor core is includes a transactional execution facility, wherein the system is configured to enable processor rollback to a previous state and reinitialize lockstep execution in order to recover from an incorrect execution when an incorrect execution has been detected by the selective pairing facility.	08-16-2012
20120210164	SCHEDULER FOR MULTIPROCESSOR SYSTEM SWITCH WITH SELECTIVE PAIRING - System, method and computer program product for scheduling threads in a multiprocessing system with selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The method configures the selective pairing facility to use checking provide one highly reliable thread for high-reliability and allocate threads to corresponding processor cores indicating need for hardware checking. The method configures the selective pairing facility to provide multiple independent cores and allocate threads to corresponding processor cores indicating inherent resilience.	08-16-2012
20120210172	MULTIPROCESSOR SWITCH WITH SELECTIVE PAIRING - System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory “nest” (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus.	08-16-2012
20120266008	SYSTEM-WIDE POWER MANAGEMENT CONTROL VIA CLOCK DISTRIBUTION NETWORK - An apparatus, method and computer program product for automatically controlling power dissipation of a parallel computing system that includes a plurality of processors. A computing device issues a command to the parallel computing system. A clock pulse-width modulator encodes the command in a system clock signal to be distributed to the plurality of processors. The plurality of processors in the parallel computing system receive the system clock signal including the encoded command, and adjusts power dissipation according to the encoded command.	10-18-2012
20120331232	WRITE-THROUGH CACHE OPTIMIZED FOR DEPENDENCE-FREE PARALLEL REGIONS - An apparatus and computer program product for improving performance of a parallel computing system. A first hardware local cache controller associated with a first local cache memory device of a first processor detects an occurrence of a false sharing of a first cache line by a second processor running the program code and allows the false sharing of the first cache line by the second processor. The false sharing of the first cache line occurs upon updating a first portion of the first cache line in the first local cache memory device by the first hardware local cache controller and subsequent updating a second portion of the first cache line in a second local cache memory device by a second hardware local cache controller.	12-27-2012
20130283123	COMBINED GROUP ECC PROTECTION AND SUBGROUP PARITY PROTECTION - A method and system are disclosed for providing combined error code protection and subgroup parity protection for a given group of n bits. The method comprises the steps of identifying a number, m, of redundant bits for said error protection; and constructing a matrix P, wherein multiplying said given group of n bits with P produces m redundant error correction code (ECC) protection bits, and two columns of P provide parity protection for subgroups of said given group of n bits. In the preferred embodiment of the invention, the matrix P is constructed by generating permutations of m bit wide vectors with three or more, but an odd number of, elements with value one and the other elements with value zero; and assigning said vectors to rows of the matrix P.	10-24-2013
20140052970	OPCODE COUNTING FOR PERFORMANCE MEASUREMENT - Methods, systems and computer program products are disclosed for measuring a performance of a program running on a processing unit of a processing system. In one embodiment, the method comprises informing a logic unit of each instruction in the program that is executed by the processing unit, assigning a weight to each instruction, assigning the instructions to a plurality of groups, and analyzing the plurality of groups to measure one or more metrics. In one embodiment, each instruction includes an operating code portion, and the assigning includes assigning the instructions to the groups based on the operating code portions of the instructions. In an embodiment, each type of instruction is assigned to a respective one of the plurality of groups. These groups may be combined into a plurality of sets of the groups.	02-20-2014
20140122980	COLLECTIVE NETWORK FOR COMPUTER STRUCTURES - A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to needs of a processing algorithm.	05-01-2014
20140207987	MULTIPROCESSOR SYSTEM WITH MULTIPLE CONCURRENT MODES OF EXECUTION - A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.	07-24-2014
20140237045	EMBEDDING GLOBAL BARRIER AND COLLECTIVE IN A TORUS NETWORK - Embodiments of the invention provide a method, system and computer program product for embedding a global barrier and global interrupt network in a parallel computer system organized as a torus network. The computer system includes a multitude of nodes. In one embodiment, the method comprises taking inputs from a set of receivers of the nodes, dividing the inputs from the receivers into a plurality of classes, combining the inputs of each of the classes to obtain a result, and sending said result to a set of senders of the nodes. Embodiments of the invention provide a method, system and computer program product for embedding a collective network in a parallel computer system organized as a torus network. In one embodiment, the method comprises adding to a torus network a central collective logic to route messages among at least a group of nodes in a tree structure.	08-21-2014

Patent applications by Alan Gara, Mount Kisco, NY US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Alan Gara, Mount Kisco US

Alan Gara, Mount Kisco, NY US