Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Flachs, TX

Brian Flachs, Georgetown, TX US

Patent application numberDescriptionPublished
20080307201Method and Apparatus for Cooperative Software Multitasking In A Processor System with a Partitioned Register File - A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet ID's to modify register file partition sizes and locations during applet program instruction text execution. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code.12-11-2008
20090070654Design Structure For A Processor System With Background Error Handling Feature - A design structure for a processor system may be embodied in a machine readable medium for designing, manufacturing or testing a processor integrated circuit. The design structure may embody a processor system that integrates error correcting code (ECC) detection and correction hardware within an memory management circuit. The design structure may specify ECC hardware circuitry that provides detection, correction and generation of ECC data bits in conjunction with memory data read and writes. The design structure for the processor system may permit the detection and correction of soft single bit errors read from local memory in-line while using read modify write DMA circuit logic to correct local memory data. The design structure may provide for local memory data error detection and correction in a background memory scrub process without the need for additional in-line data logic.03-12-2009
20090204781System for Limiting the Size of a Local Storage of a Processor - A system for limiting the size of a local storage of a processor is provided. A facility is provided in association with a processor for setting a local storage size limit. This facility is a privileged facility and can only be accessed by the operating system running on a control processor in the multiprocessor system or the associated processor itself. The operating system sets the value stored in the local storage limit register when the operating system initializes a context switch in the processor. When the processor accesses the local storage using a request address, the local storage address corresponding to the request address is compared against the local storage limit size value in order to determine if the local storage address, or a modulo of the local storage address, is used to access the local storage.08-13-2009
20110145503ON-LINE OPTIMIZATION OF SOFTWARE INSTRUCTION CACHE - A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.06-16-2011

Patent applications by Brian Flachs, Georgetown, TX US

Brian Flachs, Austin, TX US

Patent application numberDescriptionPublished
20110161548Efficient Multi-Level Software Cache Using SIMD Vector Permute Functionality - A cache manager receives a request for data, which includes a requested effective address. The cache manager determines whether the requested effective address matches a most recently used effective address stored in a mapped tag vector. When the most recently used effective address matches the requested effective address, the cache manager identifies a corresponding cache location and retrieves the data from the identified cache location. However, when the most recently used effective address fails to match the requested effective address, the cache manager determines whether the requested effective address matches a subsequent effective address stored in the mapped tag vector. When the cache manager determines a match to a subsequent effective address, the cache manager identifies a different cache location corresponding to the subsequent effective address and retrieves the data from the different cache location.06-30-2011
20110161641SPE Software Instruction Cache - An application thread executes a direct branch instruction that is stored in an instruction cache line. Upon execution, the direct branch instruction branches to a branch descriptor that is also stored in the instruction cache line. The branch descriptor includes a trampoline branch instruction and a target instruction space address. Next, the trampoline branch instruction sends a branch descriptor pointer, which points to the branch descriptor, to an instruction cache manager. The instruction cache manager extracts the target instruction space address from the branch descriptor, and executes a target instruction corresponding to the target instruction space address. In one embodiment, the instruction cache manager generates a target local store address by masking off a portion of bits included in the target instruction space address. In turn, the application thread executes the target instruction located at the target local store address accordingly.06-30-2011

Brian K. Flachs, Georgetown, TX US

Patent application numberDescriptionPublished
20090112550System and Method for Generating a Worst Case Current Waveform for Testing of Integrated Circuit Devices - A system and method for generating a worst case current waveform for testing of integrated circuit devices are provided. Architectural analysis of an integrated circuit device is first performed to determine an initial worst case power workload to be applied to the integrated circuit device. Thereafter, the derived worst case power workload is applied to a model and is simulated to generate a worst case current waveform that is input to an electrical model of the integrated circuit device to generate a worst case noise budget value. The worst case noise budget value is then compared to measured noise from application of the worst case power workload to a hardware implemented integrated circuit device. The worst case current waveform may be selected for future testing of integrated circuit devices or modifications to the simulation models may be performed and the process repeated based on the results of the comparison.04-30-2009
20100161846Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.06-24-2010
20100161848Programmable Direct Memory Access Engine - A mechanism for programming a direct memory access engine operating as a single thread processor is provided. A program is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the program located in the local memory is to be executed. The direct memory access engine executes the program without intervention by a host processor. Responsive to the program completing execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.06-24-2010
20110066769Multithreaded Programmable Direct Memory Access Engine - A mechanism programming a direct memory access engine operating as a multithreaded processor is provided. A plurality of programs is received from a host processor in a local memory associated with the direct memory access engine. A request is received in the direct memory access engine from the host processor indicating that the plurality of programs located in the local memory is to be executed. The direct memory access engine executes two or more of the plurality of programs without intervention by a host processor. As each of the two or more of the plurality of programs completes execution, the direct memory access engine sends a completion notification to the host processor that indicates that the program has completed execution.03-17-2011
20110161623Data Parallel Function Call for Determining if Called Routine is Data Parallel - Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.06-30-2011
20110161624Floating Point Collect and Operate - Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.06-30-2011
20110161642Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.06-30-2011
20110161643Runtime Extraction of Data Parallelism - Mechanisms for extracting data dependencies during runtime are provided. The mechanisms execute a portion of code having a loop and generate, for the loop, a first parallel execution group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The mechanisms further execute the first parallel execution group and determining, for each iteration in the subset of iterations, whether the iteration has a data dependence. Moreover, the mechanisms commit store data to system memory only for stores performed by iterations in the subset of iterations for which no data dependence is determined. Store data of stores performed by iterations in the subset of iterations for which a data dependence is determined is not committed to the system memory.06-30-2011

Patent applications by Brian K. Flachs, Georgetown, TX US

Brian King Flachs, Georgetown, TX US

Patent application numberDescriptionPublished
20110072170Systems and Methods for Transferring Data to Maintain Preferred Slot Positions in a Bi-endian Processor - A bi-endian multiprocessor system having multiple processing elements, each of which includes a processor core, a local memory and a memory flow controller. The memory flow controller transfers data between the local memory and data sources external to the processing element. If the processing element and the data source implement data representations having the same endian-ness, each multi-word line of data is stored in the local memory in the same word order as in the data source. If the processing element and the data source implement data representations having different endian-ness, the words of each multi-word line of data are transposed when data is transferred between local memory and the data source. The processing element may incorporate circuitry to add doublewords, wherein the circuitry can alternately carry bits from a first word to a second word or vice versa, depending upon whether the words in lines of data are transposed.03-24-2011

Patent applications by Brian King Flachs, Georgetown, TX US