Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Emberling

Brian Emberling, Palo Alto, CA US

Patent application numberDescriptionPublished
20090037918Thread sequencing for multi-threaded processor with instruction cache - Execution of the first thread of a new program is prioritized ahead of older threads for a previously running program. The new program is invoked during the execution of a thread of the previous program. The first thread of the program is prioritized ahead of the remaining threads of the previous program. In an embodiment of the invention, additional threads of the new program are also prioritized ahead of the older threads. A thread's context may include a table of constant values that can be referenced by each program and are shared by multiple threads. Changing the values in a constant table for a new thread is time intensive. To avoid changes to the constant table (and thereby save time), a higher priority status is conferred to the first thread that follows a change to the constant table.02-05-2009
20090276777Multiple Programs for Efficient State Transitions on Multi-Threaded Processors - A system and method to optimize processor performance and minimizing average thread latency by selectively loading a cache when a program state, resources required for execution of a program or the program itself change, is described. An embodiment of the invention supports a “cache priming program” that is selectively executed for a first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in cache(s), and/or whenever resources required for program execution or the program itself changes. By pre-loading the cache with two resources required for two instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache. Since, such a mechanism is carried out only for one thread in a program cycle, pitfalls of a conventional general pre-fetch scheme that involves parsing of the program in advance to determine which resources and instructions will be needed at a later time, are avoided.11-05-2009
20110055511Interlocked Increment Memory Allocation and Access - A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts. A corresponding method of assigning a memory to a plurality of reader threads includes determining a first number corresponding to a number of writer threads having a block allocated in said memory, launching a first number of reader threads, entering a first wavefront of said reader threads from said group of wavefronts to an atomic operation, and assigning a first block in the memory to the first wavefront during the corresponding atomic operation, where the first block is contiguous to a previously allocated block dynamically allocated to another wavefront from said group of wavefronts. Corresponding system embodiments and computer program product embodiments are also presented.03-03-2011

Brian Emberling, San Mateo, CA US

Patent application numberDescriptionPublished
20090300621Local and Global Data Share - A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.12-03-2009

Brian D. Emberling, Palo Alto, CA US

Patent application numberDescriptionPublished
20090276563Incremental State Updates - A system and method are described that manage incremental state updates in such a way that multiple threads within a processor can each operate, in effect, on their own set of state data. The system and method are applicable to any processor in which multiple threads require access to sets of state information which differ from one another by a relatively small number of state changes.11-05-2009
20100107143Method and System for Thread Monitoring - An apparatus and methods for hardware-based performance monitoring of a computer system are presented. The apparatus includes: processing units; a memory; a connector device connecting the processing units and the memory; probes inserted the processing units, and the probes generating probe signals when selected processing events are detected; and a thread trace device connected to the connector device. The thread trace device includes an event interface to receive probe signals, and an event memory controller to send probe event messages to the memory, where probe event messages are based on probe signals. Also presented is a method that includes: inserting event probes in hardware-based processing units, where the event probes generate probe events when predetermined processing events are detected; configuring a hardware-based device to generate probe event messages based on said probe events; and transferring the probe event messages to a memory. The probe event messages transferred to memory can be subsequently analyzed using a software program to determine, for example, thread-to-thread interactions.04-29-2010

Brian D. Emberling, San Mateo, CA US

Patent application numberDescriptionPublished
20080313436Handling of extra contexts for shader constants - The present invention provides a system for handling extra contexts for shader constants, and applications thereof. In an embodiment there is provided a computer-based method for executing a series of compute packets in an execution pipeline. The execution pipeline includes a first plurality of registers configured to store state-updates of a first type and a second plurality of registers configured to store state-updates of a second type. A first number of state-updates of the first type and a second number of state-updates of the second type are respectively identified and stored in the first and second plurality of registers. A compute packet is sent to the execution pipeline responsive to the first number and the second number. Then, the compute packet is executed by the execution pipeline.12-18-2008