Elnozahy, TX

Elmootazbellah N. Elnozahy, Austin, TX US

Patent application number	Description	Published
20100088494	Total cost based checkpoint selection - A method, system, and computer usable program product for total cost based checkpoint selection are provided in the illustrative embodiments. A cost associated with taking a checkpoint is determined. The cost includes an energy cost. An interval between checkpoints is computed so as to minimize the cost. An instruction is sent to schedule the checkpoints at the computed interval. The energy cost may further include a cost of energy consumed in collecting and saving data at a checkpoint, a cost of energy consumed in re-computing a computation lost due to a failure after taking the checkpoint, or a combination thereof. The cost may further include, converted to a cost equivalent, administration time consumed in recovering from a checkpoint, computing resources expended in taking a checkpoint, computing resources expended after a failure in restoring information from a checkpoint, performance degradation of an application while taking a checkpoint, or a combination thereof.	04-08-2010
20110004875	Method and System for Performance Isolation in Virtualized Environments - A method, a system, an apparatus, and a computer program product for allocating resources of one or more shared devices to one or more partitions of a virtualization environment within a data processing system. At least one user defined resource assignment is received for one or more devices associated with the data processing system. One or more registers associated with the one or more partitions are dynamically set to execute the at least one resource assignment, whereby the at least one resource assignment enables a user defined quantitative measure (number and/or percentage) of devices to operate when the one or more transactions are executed via the partition. The system enables the one or more devices to execute one or more transactions at a bandwidth/capacity that is less than or equal to the user defined resource assignment and minimizes performance interference among partitions.	01-06-2011
20110154348	METHOD OF EXPLOITING SPARE PROCESSORS TO REDUCE ENERGY CONSUMPTION - A method, system, and computer program product for reducing power and energy consumption in a server system with multiple processor cores is disclosed. The system may include an operating system for scheduling user workloads among a processor pool. The processor pool may include active licensed processor cores and inactive unlicensed processor cores. The method and computer program product may reduce power and energy consumption by including steps and sets of instructions activating spare cores and adjusting the operating frequency of processor cores, including the newly activated spare cores to provide equivalent computing resources as the original licensed cores operating at a specified clock frequency.	06-23-2011
20110173592	Architectural Support for Automated Assertion Checking - A mechanism is provided for automatic detection of assertion violations. An application may write assertion tuples to the assertion checking mechanism. An assertion tuple forms a Boolean expression (predicate or invariant) that the developer of the application wishes to check. If the assertion defined by the tuple remains true, then the application does not violate the assertion. For any instruction that stores a value to a memory location or register at a target address, the assertion checking mechanism compares the target address to the addresses specified in the assertion tuples. If the target address matches one of the tuple addresses, then the assertion checking mechanism reads a value from the other address in the tuple. The assertion checking mechanism then recomputes the assertion using the retrieved value along with the value to be stored. If the assertion checking mechanism detects an assertion violation, the assertion checking mechanism raises an exception.	07-14-2011
20110231030	Minimizing Aggregate Cooling and Leakage Power - A mechanism is provided for minimizing system power in a data processing system. A management control unit determines whether a convergence has been reached in the data processing system. If convergence fails to be reached, the management control unit determines whether a maximum fan flag is set to indicate that a fan is operating at a maximum speed. Responsive to the maximum fan flag failing to be set, a thermal threshold of the data processing system is either increased or decreased and thereby a fan speed of the data processing system is either increased or decreased based on whether the system power of the data processing system has either increased or decreased and based on whether a temperature of the data processing system has either increased or decreased. Thus, a new thermal threshold and a new fan speed are formed. The process is then repeated until convergence has been met.	09-22-2011
20110258421	Architecture Support for Debugging Multithreaded Code - Mechanisms are provided for debugging application code using a content addressable memory. The mechanisms receive an instruction in a hardware unit of a processor of the data processing system, the instruction having a target memory address that the instruction is attempting to access. A content addressable memory (CAM) associated with the hardware unit is searched for an entry in the CAM corresponding to the target memory address. In response to an entry in the CAM corresponding to the target memory address being found, a determination is made as to whether information in the entry identifies the instruction as an instruction of interest. In response to the entry identifying the instruction as an instruction of interest, an exception is generated and sent to one of an exception handler or a debugger application. In this way, debugging of multithreaded applications may be performed in an efficient manner.	10-20-2011
20110265068	Single Thread Performance in an In-Order Multi-Threaded Processor - A mechanism is provided for improving single-thread performance for a multi-threaded, in-order processor core. In a first phase, a compiler analyzes application code to identify instructions that can be executed in parallel with focus on instruction-level parallelism and removing any register interference between the threads. The compiler inserts as appropriate synchronization instructions supported by the apparatus to ensure that the resulting execution of the threads is equivalent to the execution of the application code in a single thread. In a second phase, an operating system schedules the threads produced in the first phase on the hardware threads of a single processor core such that they execute simultaneously. In a third phase, the microprocessor core executes the threads specified by the second phase such that there is one hardware thread executing an application thread.	10-27-2011
20110292594	Scalable Space-Optimized and Energy-Efficient Computing System - A scalable space-optimized and energy-efficient computing system is provided. The computing system comprises a plurality of modular compartments in at least one level of a frame configured in a hexadron configuration. The computing system also comprises an air inlet, an air mixing plenum, and at least one fan. In the computing system the plurality of modular compartments are affixed above the air inlet, the air mixing plenum is affixed above the plurality of modular compartments, and the at least one fan is affixed above the air mixing plenum. When at least one module is inserted into one of the plurality of modular compartments, the module couples to a backplane within the frame.	12-01-2011
20110292596	Heatsink Allowing In-Situ Maintenance in a Stackable Module - A modular processing module allowing in-situ maintenance is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors, and a plurality of processing nodes. Each processing module side couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system and at least one heatsink that couples to at least a portion of the plurality of processing nodes on one of the processing module sides and is designed such that, when a set of heatsinks in the modular processing module are installed, an empty space is left in a center of the modular processing module.	12-01-2011
20110292597	Stackable Module for Energy-Efficient Computing Systems - A modular processing module is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors coupled to the circuit board, and a plurality of processing nodes coupled to the circuit board. Each processing module side in the set of processing module sides couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system.	12-01-2011
20110296097	Mechanisms for Reducing DRAM Power Consumption - Mechanisms are provided for inhibiting precharging of memory cells of a dynamic random access memory (DRAM) structure. The mechanisms receive a command for accessing memory cells of the DRAM structure. The mechanisms further determine, based on the command, if precharging the memory cells following accessing the memory cells is to be inhibited. Moreover, the mechanisms send, in response to the determination indicating that precharging the memory cells is to be inhibited, a command to blocking logic of the DRAM structure to block precharging of the memory cells following accessing the memory cells.	12-01-2011
20110296212	Optimizing Energy Consumption and Application Performance in a Multi-Core Multi-Threaded Processor System - A mechanism is provided for scheduling application tasks. A scheduler receives a task that identifies a desired frequency and a desired maximum number of competing hardware threads. The scheduler determines whether a user preference designates either maximization of performance or minimization of energy consumption. Responsive to the user preference designating the performance, the scheduler determines whether there is an idle processor core in a plurality of processor cores available. Responsive to no idle processor being available, the scheduler identifies a subset of processor cores having a smallest load coefficient. From the subset of processor cores, the scheduler determines whether there is at least one processor core that matches desired parameters of the task. Responsive to at least one processor core matching the desired parameters of the task, the scheduler assigns the task to one of the at least one processor core that matches the desired parameters.	12-01-2011
20120173906	Optimizing Energy Consumption and Application Performance in a Multi-Core Multi-Threaded Processor System - A mechanism is provided for scheduling application tasks. A scheduler receives a task that identifies a desired frequency and a desired maximum number of competing hardware threads. The scheduler determines whether a user preference designates either maximization of performance or minimization of energy consumption. Responsive to the user preference designating the performance, the scheduler determines whether there is an idle processor core in a plurality of processor cores available. Responsive to no idle processor being available, the scheduler identifies a subset of processor cores having a smallest load coefficient. From the subset of processor cores, the scheduler determines whether there is at least one processor core that matches desired parameters of the task. Responsive to at least one processor core matching the desired parameters of the task, the scheduler assigns the task to one of the at least one processor core that matches the desired parameters.	07-05-2012
20120203979	Architecture Support for Debugging Multithreaded Code - Mechanisms are provided for debugging application code using a content addressable memory. The mechanisms receive an instruction in a hardware unit of a processor of the data processing system, the instruction having a target memory address that the instruction is attempting to access. A content addressable memory (CAM) associated with the hardware unit is searched for an entry in the CAM corresponding to the target memory address. In response to an entry in the CAM corresponding to the target memory address being found, a determination is made as to whether information in the entry identifies the instruction as an instruction of interest. In response to the entry identifying the instruction as an instruction of interest, an exception is generated and sent to one of an exception handler or a debugger application. In this way, debugging of multithreaded applications may be performed in an efficient manner.	08-09-2012
20120320524	Heatsink Allowing In-Situ Maintenance in a Stackable Module - A modular processing module allowing in-situ maintenance is provided. The modular processing module comprises a set of processing module sides. Each processing module side comprises a circuit board, a plurality of connectors, and a plurality of processing nodes. Each processing module side couples to another processing module side using at least one connector in the plurality of connectors such that, when all of the set of processing module sides are coupled together, the modular processing module is formed. The modular processing module comprises an exterior connection to a power source and a communication system and at least one heatsink that couples to at least a portion of the plurality of processing nodes on one of the processing module sides and is designed such that, when a set of heatsinks in the modular processing module are installed, an empty space is left in a center of the modular processing module.	12-20-2012
20140149382	Technology for Web Site Crawling - A web site page has a reference for providing an address for a next page. The web site is crawled by a crawler program, which parses the reference from one of the web pages and sends the reference to an applet running in a browser. The address for the next page is determined by the browser responsive to the reference and is sent to the crawler. The crawler selects non-hypertext-link parameters from the web page of the web site server by performing a programmed action sequence, including selecting items from lists of the web page in a particular sequence. The crawler sends the applet running in the browser, for the query to the web server for the next page referenced by the one web page, the selected parameters and a context arising from the particular sequence.	05-29-2014
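
The abstract of application 20100088494 above describes computing a checkpoint interval that minimizes a total cost. A minimal sketch of that idea follows, assuming a simple first-order cost model that is not taken from the application itself: each checkpoint has a fixed cost (energy plus any other terms converted to a cost equivalent), failures arrive at a constant rate, and a failure loses on average half an interval of work. All function and parameter names are hypothetical.

```python
import math


def checkpoint_cost_rate(interval, checkpoint_cost, failure_rate, rework_cost_rate):
    """Expected cost per unit time under the assumed first-order model.

    checkpoint_cost / interval              -> cost of taking checkpoints
    failure_rate * rework_cost_rate * T / 2 -> expected cost of redoing lost work
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return checkpoint_cost / interval + failure_rate * rework_cost_rate * interval / 2.0


def optimal_interval(checkpoint_cost, failure_rate, rework_cost_rate):
    """Interval that minimizes checkpoint_cost_rate (set its derivative to zero)."""
    if min(checkpoint_cost, failure_rate, rework_cost_rate) <= 0:
        raise ValueError("all parameters must be positive")
    return math.sqrt(2.0 * checkpoint_cost / (failure_rate * rework_cost_rate))


if __name__ == "__main__":
    # Illustrative numbers only: cost units and time units are arbitrary.
    t = optimal_interval(checkpoint_cost=50.0, failure_rate=0.001, rework_cost_rate=1.0)
    print(f"interval: {t:.1f}, cost rate: {checkpoint_cost_rate(t, 50.0, 0.001, 1.0):.4f}")
```

Under this model a higher per-checkpoint cost lengthens the interval, while a higher failure rate or rework cost shortens it. A fuller model would add the other cost terms the abstract lists, such as administration time and performance degradation.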


Elmootazbellah Nabil Elnozahy, Austin, TX US

Patent application number	Description	Published
20080263284	Methods and Arrangements to Manage On-Chip Memory to Reduce Memory Latency - Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.	10-23-2008
20080270726	Apparatus and Method for Providing Remote Access Redirect Capability in a Channel Adapter of a System Area Network - A method and apparatus for providing remote access redirect in a host channel adapter of a system area network are provided. The apparatus and method provide a mechanism by which a host channel adapter, in response to receiving a marker message, places selected channel(s) of the host channel adapter in a remote access redirect (RAR) mode of operation. During the RAR mode of operation, memory access messages received by the host channel adapter that are destined for portions of an application memory space marked as being protected are converted to RAR receive messages and redirected to a queue pair associated with an operating system rather than the queue pair for the application. The operating system is responsible for serializing access to application memory pages outside of the host channel adapter. The mechanisms of the present invention may be used to perform a checkpoint data integrity operation.	10-30-2008
20090144383	Apparatus and Method for Providing Remote Access Redirect Capability in a Channel Adapter of a System Area Network - A method and apparatus for providing remote access redirect in a host channel adapter of a system area network are provided. The apparatus and method provide a mechanism by which a host channel adapter, in response to receiving a marker message, places selected channel(s) of the host channel adapter in a remote access redirect (RAR) mode of operation. During the RAR mode of operation, memory access messages received by the host channel adapter that are destined for portions of an application memory space marked as being protected are converted to RAR receive messages and redirected to a queue pair associated with an operating system rather than the queue pair for the application. The operating system is responsible for serializing access to application memory pages outside of the host channel adapter. The mechanisms of the present invention may be used to perform a checkpoint data integrity operation.	06-04-2009
20110296113	RECOVERY IN SHARED MEMORY ENVIRONMENT - A method, system, and computer usable program product for recovery in a shared memory environment are provided in the illustrative embodiments. A core in a multi-core processor is designated as a user level core (ULC), which executes an instruction to modify a memory while executing an application. A second core is designated as an operating system core (OSC), which manages checkpointing of several segments of the shared memory. A set of flags is accessible to a memory controller to manage a shared memory. A flag in the set of flags corresponds to one segment in the segments of the shared memory. A message or instruction for modification of a segment is received. A cache line tracking determination is made whether a cache line used for the modification has already been used for a similar modification. If not, a part of the segment is checkpointed. The modification proceeds after checkpointing.	12-01-2011
20110296138	FAST REMOTE COMMUNICATION AND COMPUTATION BETWEEN PROCESSORS - A method, system, and computer usable program product for fast remote communication and computation between processors are provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.	12-01-2011
20110296228	TOLERATING SOFT ERRORS BY SELECTIVE DUPLICATION - A method, system, and computer usable program product for tolerating soft errors by selective duplication are provided in the illustrative embodiments. An application executing in a data processing system selects an instruction that has to be protected from soft errors. The instruction is marked for duplication such that the instruction is duplicated during execution of the instruction. The marked instruction is sent for execution to a hardware front end.	12-01-2011
20110296241	ACCELERATING RECOVERY IN MPI ENVIRONMENTS - A method, system, and computer usable program product for accelerating recovery in an MPI environment are provided in the illustrative embodiments. A first portion of a distributed application executes using a first processor and a second portion using a second processor in a distributed computing environment. After a failure of operation of the first portion, the first portion is restored to a checkpoint. A first part of the first portion is distributed to a third processor and a second part to a fourth processor. A computation of the first portion is performed using the first and the second parts in parallel. A first message is computed in the first portion and sent to the second portion, the message having been initially computed after a time of the checkpoint. A second message is replayed from the second portion without computing the second message in the second portion.	12-01-2011
20110296423	FRAMEWORK FOR SCHEDULING MULTICORE PROCESSORS - A method, system, and computer usable program product for a framework for scheduling tasks in a multi-core processor or multiprocessor system are provided in the illustrative embodiments. A thread is selected according to an order in a scheduling discipline, the thread being a thread of an application executing in the data processing system, the thread forming the leader thread in a bundle of threads. A value of a core attribute in a set of core attributes is determined according to a corresponding thread attribute in a set of thread attributes associated with the leader thread. A determination is made whether a second thread can be added to the bundle such that the bundle including the second thread will satisfy a policy. If the determining is affirmative, the second thread is added to the bundle. The bundle is scheduled for execution using a core of the multi-core processor.	12-01-2011
20120191946	FAST REMOTE COMMUNICATION AND COMPUTATION BETWEEN PROCESSORS - A method for fast remote communication and computation between processors is provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.	07-26-2012
20120223764	ON-CHIP CONTROL OF THERMAL CYCLING - A method, system, and computer program product for on-chip control of thermal cycling in an integrated circuit (IC) are provided in the illustrative embodiments. A first circuit is configured on the IC for adjusting a first voltage being applied to a first part of the IC. A first temperature of the first part is measured at a first time. A determination is made that the first temperature is outside a temperature range defined by an upper temperature threshold and a lower temperature threshold. The first voltage is adjusted by reducing the first voltage when the first temperature exceeds the upper temperature threshold and by increasing the first voltage when the first temperature is below the lower temperature threshold, thereby causing the first temperature of the first part to attain a value within the temperature range.	09-06-2012
20120226870	RECOVERY IN SHARED MEMORY ENVIRONMENT - A method for recovery in a shared memory environment is provided in the illustrative embodiments. A core in a multi-core processor is designated as a user level core (ULC), which executes an instruction to modify a memory while executing an application. A second core is designated as an operating system core (OSC), which manages checkpointing of several segments of the shared memory. A set of flags is accessible to a memory controller to manage a shared memory. A flag in the set of flags corresponds to one segment in the segments of the shared memory. A message or instruction for modification of a segment is received. A cache line tracking determination is made whether a cache line used for the modification has already been used for a similar modification. If not, a part of the segment is checkpointed. The modification proceeds after checkpointing.	09-06-2012
20120226939	ACCELERATING RECOVERY IN MPI ENVIRONMENTS - A computer usable program product for accelerating recovery in an MPI environment is provided in the illustrative embodiments. A first portion of a distributed application executes using a first processor and a second portion using a second processor in a distributed computing environment. After a failure of operation of the first portion, the first portion is restored to a checkpoint. A first part of the first portion is distributed to a third processor and a second part to a fourth processor. A computation of the first portion is performed using the first and the second parts in parallel. A first message is computed in the first portion and sent to the second portion, the message having been initially computed after a time of the checkpoint. A second message is replayed from the second portion without computing the second message in the second portion.	09-06-2012
20120227048	FRAMEWORK FOR SCHEDULING MULTICORE PROCESSORS - A method for a framework for scheduling tasks in a multi-core processor or multiprocessor system is provided in the illustrative embodiments. A thread is selected according to an order in a scheduling discipline, the thread being a thread of an application executing in the data processing system, the thread forming the leader thread in a bundle of threads. A value of a core attribute in a set of core attributes is determined according to a corresponding thread attribute in a set of thread attributes associated with the leader thread. A determination is made whether a second thread can be added to the bundle such that the bundle including the second thread will satisfy a policy. If the determining is affirmative, the second thread is added to the bundle. The bundle is scheduled for execution using a core of the multi-core processor.	09-06-2012
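
The abstract of application 20120223764 above describes lowering a voltage when a measured temperature exceeds an upper threshold and raising it when the temperature falls below a lower threshold. The rule can be sketched in software as below. This is only an illustration of the decision logic: the application describes an on-chip circuit, and the fixed step size, the units, and every name here are assumptions.

```python
def adjust_voltage(voltage, temperature, lower_threshold, upper_threshold, step=0.01):
    """Return the next voltage for one part of the IC.

    Reduces the voltage when the part is hotter than the upper threshold,
    increases it when the part is cooler than the lower threshold, and
    leaves it unchanged while the temperature is inside the range.
    """
    if lower_threshold > upper_threshold:
        raise ValueError("lower_threshold must not exceed upper_threshold")
    if temperature > upper_threshold:
        return voltage - step  # too hot: back off the voltage
    if temperature < lower_threshold:
        return voltage + step  # too cold: raise the voltage
    return voltage             # within range: no adjustment


if __name__ == "__main__":
    # Hypothetical readings (degrees C) against a 40-80 degree range.
    v = 1.00
    for temp in (85.0, 82.0, 70.0, 35.0):
        v = adjust_voltage(v, temp, lower_threshold=40.0, upper_threshold=80.0)
        print(f"temp={temp:5.1f}  voltage={v:.2f}")
```

Keeping the temperature between the two thresholds limits the size of each thermal swing, which is the stated aim of the adjustment.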
