| Patent application number | Description | Published |
| 20080307201 | Method and Apparatus for Cooperative Software Multitasking In A Processor System with a Partitioned Register File - A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet ID's to modify register file partition sizes and locations during applet program instruction text execution. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code. | 12-11-2008 |
| 20090070654 | Design Structure For A Processor System With Background Error Handling Feature - A design structure for a processor system may be embodied in a machine readable medium for designing, manufacturing or testing a processor integrated circuit. The design structure may embody a processor system that integrates error correcting code (ECC) detection and correction hardware within an memory management circuit. The design structure may specify ECC hardware circuitry that provides detection, correction and generation of ECC data bits in conjunction with memory data read and writes. The design structure for the processor system may permit the detection and correction of soft single bit errors read from local memory in-line while using read modify write DMA circuit logic to correct local memory data. The design structure may provide for local memory data error detection and correction in a background memory scrub process without the need for additional in-line data logic. | 03-12-2009 |
| 20090204781 | System for Limiting the Size of a Local Storage of a Processor - A system for limiting the size of a local storage of a processor is provided. A facility is provided in association with a processor for setting a local storage size limit. This facility is a privileged facility and can only be accessed by the operating system running on a control processor in the multiprocessor system or the associated processor itself. The operating system sets the value stored in the local storage limit register when the operating system initializes a context switch in the processor. When the processor accesses the local storage using a request address, the local storage address corresponding to the request address is compared against the local storage limit size value in order to determine if the local storage address, or a modulo of the local storage address, is used to access the local storage. | 08-13-2009 |
| 20110145503 | ON-LINE OPTIMIZATION OF SOFTWARE INSTRUCTION CACHE - A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program. | 06-16-2011 |
| Patent application number | Description | Published |
| 20110161548 | Efficient Multi-Level Software Cache Using SIMD Vector Permute Functionality - A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location. | 06-30-2011 |
| 20110161641 | SPE Software Instruction Cache - An application thread executes a direct branch instruction that is stored in an instruction cache line. Upon execution, the direct branch instruction branches to a branch descriptor that is also stored in the instruction cache line. The branch descriptor includes a trampoline branch instruction and a target instruction space address. Next, the trampoline branch instruction sends a branch descriptor pointer, which points to the branch descriptor, to an instruction cache manager. The instruction cache manager extracts the target instruction space address from the branch descriptor, and executes a target instruction corresponding to the target instruction space address. In one embodiment, the instruction cache manager generates a target local store address by masking off a portion of bits included in the target instruction space address. In turn, the application thread executes the target instruction located at the target local store address accordingly. | 06-30-2011 |
| Patent application number | Description | Published |
| 20090112550 | System and Method for Generating a Worst Case Current Waveform for Testing of Integrated Circuit Devices - A system and method for generating a worst case current waveform for testing of integrated circuit devices are provided. Architectural analysis of an integrated circuit device is first performed to determine an initial worst case power workload to be applied to the integrated circuit device. Thereafter, the derived worst case power workload is applied to a model and is simulated to generate a worst case current waveform that is input to an electrical model of the integrated circuit device to generate a worst case noise budget value. The worst case noise budget value is then compared to measured noise from application of the worst case power workload to a hardware implemented integrated circuit device. The worst case current waveform may be selected for future testing of integrated circuit devices or modifications to the simulation models may be performed and the process repeated based on the results of the comparison. | 04-30-2009 |
| 20100161846 | Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution. | 06-24-2010 |
| 20100161848 | Programmable Direct Memory Access Engine - A mechanism for programming a direct memory access engine operating as a single thread processor is provided. A program is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the program located in the local memory is to be executed. The direct memory access engine executes the program without intervention by a host processor. Responsive to the program completing execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution. | 06-24-2010 |
| 20110066769 | Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution. | 03-17-2011 |
| 20110161623 | Data Parallel Function Call for Determining if Called Routine is Data Parallel - Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code. | 06-30-2011 |
| 20110161624 | Floating Point Collect and Operate - Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register. | 06-30-2011 |
| 20110161642 | Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory. | 06-30-2011 |
| 20110161643 | Runtime Extraction of Data Parallelism - Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory. | 06-30-2011 |