Patent application number | Description | Published |
--- | --- | --- |
20090167770 | BOOSTING GRAPHICS PERFORMANCE BASED ON EXECUTING WORKLOAD - A graphics system including workload-detection software is disclosed. The system increases the voltage and frequency of the graphics hardware in an integrated graphics chipset, depending on the operations performed by the hardware, for either a performance advantage or a power-savings advantage. | 07-02-2009 |
20100332852 | Creating Secure Communication Channels Between Processing Elements - Two processing elements in a single platform may communicate securely, allowing the platform to take advantage of certain cryptographic functionality in one processing element. A first processing element, such as a bridge, may use its cryptographic functionality to request a key exchange with a second processing element, such as a graphics engine. Each processing element may include a global key, common to the two processing elements, and a unique key, unique to each processing element. A key exchange may be established during the boot process the first time the system boots and, barring any hardware change, the same key may be used throughout the lifetime of the two processing elements. Once a secure channel is set up, any application wishing to authenticate a processing element that lacks public-private cryptographic functionality may perform the authentication with the other processing element that shares a secure channel with the first. | 12-30-2010 |
20110037770 | Memory Address Re-mapping of Graphics Data - A method and apparatus for creating, updating, and using guest physical address (GPA) to host physical address (HPA) shadow translation tables for translating GPAs of graphics data direct memory access (DMA) requests of a computing environment implementing a virtual machine monitor to support virtual machines. The requests may be sent through a render or display path of the computing environment from one or more virtual machines, transparently with respect to the virtual machine monitor. The creating, updating, and using may be performed by a memory controller detecting entries sent to existing global and page directory tables, forking off shadow table entries from the detected entries, and translating GPAs to HPAs for the shadow table entries. (See the shadow-table sketch following this table.) | 02-17-2011 |
20120139927 | MEMORY ADDRESS RE-MAPPING OF GRAPHICS DATA - A method and apparatus for creating, updating, and using guest physical address (GPA) to host physical address (HPA) shadow translation tables for translating GPAs of graphics data direct memory access (DMA) requests of a computing environment implementing a virtual machine monitor to support virtual machines. The requests may be sent through a render or display path of the computing environment from one or more virtual machines, transparently with respect to the virtual machine monitor. The creating, updating, and using may be performed by a memory controller detecting entries sent to existing global and page directory tables, forking off shadow table entries from the detected entries, and translating GPAs to HPAs for the shadow table entries. | 06-07-2012 |
20130031333 | METHOD AND APPARATUS FOR TLB SHOOT-DOWN IN A HETEROGENEOUS COMPUTING SYSTEM SUPPORTING SHARED VIRTUAL MEMORY - Methods and apparatus are disclosed for efficient TLB (translation look-aside buffer) shoot-downs for heterogeneous devices sharing virtual memory in a multi-core system. Embodiments of an apparatus for efficient TLB shoot-downs may include a TLB to store virtual address translation entries, and a memory management unit, coupled with the TLB, to maintain PASID (process address space identifier) state entries corresponding to the virtual address translation entries. The PASID state entries may include an active reference state and a lazy-invalidation state. The memory management unit may perform atomic modification of PASID state entries responsive to receiving PASID state update requests from devices in the multi-core system and read the lazy-invalidation state of the PASID state entries. The memory management unit may send PASID state update responses to the devices to synchronize TLB entries prior to activation responsive to the respective lazy-invalidation state. (See the PASID lazy-invalidation sketch following this table.) | 01-31-2013 |
20130159820 | DYNAMIC ERROR HANDLING USING PARITY AND REDUNDANT ROWS - Embodiments of an invention for dynamic error correction using parity and redundant rows are disclosed. In one embodiment, an apparatus includes a storage structure, parity logic, an error storage space, and an error event generator. The storage structure is to store a plurality of data values. The parity logic is to detect a parity error in a data value stored in the storage structure. The error storage space is to store an indication of a detection of the parity error. The error event generator is to generate an event in response to the indication of the parity error being stored in the error storage space. | 06-20-2013 |
20140025908 | FAST MECHANISM FOR ACCESSING 2^n±1 INTERLEAVED MEMORY SYSTEM - A mechanism implemented by a controller enables efficient access to an interleaved memory system that includes M modules, M being 2^n±1 for an integer n. (See the digit-folding sketch following this table.) | 01-23-2014 |
20140026137 | PERFORMING SCHEDULING OPERATIONS FOR GRAPHICS HARDWARE - A computing device for performing scheduling operations for graphics hardware is described herein. The computing device includes a central processing unit (CPU) that is configured to execute an application. The computing device also includes a graphics scheduler configured to operate independently of the CPU. The graphics scheduler is configured to receive work queues relating to workloads from the application that are to execute on the CPU and perform scheduling operations for any of a number of graphics engines based on the work queues. | 01-23-2014 |
20140068626 | Direct Ring 3 Submission of Processing Jobs to Adjunct Processors - Transitions to ring 0 each time an application wants to use an adjunct processor are avoided, saving central processor operating cycles and improving efficiency. Instead, each application is initially registered and set up to use adjunct processor resources in ring 3. | 03-06-2014 |
20140104287 | HARDWARE ASSIST FOR PRIVILEGE ACCESS VIOLATION CHECKS - Techniques are disclosed for processing rendering engine workload of a graphics system in a secure fashion, wherein at least some security processing of the workload is offloaded from software-based security parsing to hardware-based security parsing. In some embodiments, commands from a given application are received by a user mode driver (UMD), which is configured to generate a command buffer delineated into privileged and/or non-privileged command sections. The delineated command buffer can then be passed by the UMD to a kernel-mode driver (KMD), which is configured to parse and validate only privileged buffer sections, but to issue all other batch buffers with a privilege indicator set to non-privileged. A graphics processing unit (GPU) can receive the privilege-designated batch buffers from the KMD, and is configured to disallow execution of any privileged command from a non-privileged batch buffer, while any privileged commands from privileged batch buffers are unrestricted by the GPU. | 04-17-2014 |
20140160138 | MEMORY BASED SEMAPHORES - Memory-based semaphores are described that are useful for synchronizing operations between different processing engines. In one example, operations include executing a context at a producer engine, the executing including updating a memory register, and sending a signal from the producer engine to a consumer engine that the memory register has been updated, the signal including a Context ID to identify a context to be executed by the consumer engine to update the register. (See the semaphore sketch following this table.) | 06-12-2014 |
20140267323 | MEMORY MAPPING FOR A GRAPHICS PROCESSING UNIT - An electronic device is described herein. The electronic device may include a page walker module to receive a page request of a graphics processing unit (GPU). The page walker module may detect a page fault associated with the page request. The electronic device may include a controller, at least partially comprising hardware logic. The controller is to monitor execution of the page request having the page fault. The controller determines whether to suspend execution of a work item at the GPU associated with the page request having the page fault, or to continue execution of the work item based on factors associated with the page request. | 09-18-2014 |
20140306949 | SCALABLE GEOMETRY PROCESSING WITHIN A CHECKERBOARD MULTI-GPU CONFIGURATION - Systems, apparatus and methods are described including distributing batches of geometric objects to a multi-core system, at each processor core, performing vertex processing and geometry setup processing on the corresponding batch of geometric objects, storing the vertex processing results in shared memory accessible to all of the cores, and storing the geometry setup processing results in local storage. Each particular core may then perform rasterization using geometry setup results obtained from local storage within the particular core and from local storage of at least one of the other processor cores. | 10-16-2014 |
20140375661 | PAGE MANAGEMENT APPROACH TO FULLY UTILIZE HARDWARE CACHES FOR TILED RENDERING - Systems and methods may provide for identifying a tile associated with an image and ordering an entirety of the tile into a linear stream of pages associated with a frame buffer. Additionally, the linear stream of pages may be allocated to a cache. In one example, the linear stream of pages is allocated to the cache in accordance with a fixed set selection policy of the cache. | 12-25-2014 |
20150103084 | SUPPORTING ATOMIC OPERATIONS AS POST-SYNCHRONIZATION OPERATIONS IN GRAPHICS PROCESSING ARCHITECTURES - Methods and systems may provide for storing a set of post-synchronization operations to a graphics memory and sending a flush marker to a graphics pipeline. Additionally, the set of post-synchronization operations may be processed in response to the flush marker exiting the graphics pipeline. In one example, the set of post-synchronization operations includes one or more atomic operations. Moreover, the set of post-synchronization operations may be obtained from an inline portion of an atomics command. | 04-16-2015 |
20150123980 | METHOD AND APPARATUS FOR SUPPORTING PROGRAMMABLE SOFTWARE CONTEXT STATE EXECUTION DURING HARDWARE CONTEXT RESTORE FLOW - A method and apparatus for supporting programmable software context state execution during hardware context restore flow is described. In one example, a context ID is assigned to each graphics application, including a unique context memory buffer, a unique indirect context pointer with a corresponding size, an indirect context offset, and an indirect context buffer address range. When execution of the first context workload is indirected, the state of the first context workload is saved to the assigned context memory buffer. The indirect context pointer, the indirect context offset and the size of the indirect context buffer address range are saved to registers that are independent of the saved context state. The context is restored by accessing the saved indirect context pointer, the indirect context offset and the buffer size. | 05-07-2015 |
20150269083 | DYNAMIC CACHE AND MEMORY ALLOCATION FOR MEMORY SUBSYSTEMS - Technologies are presented that allow a portion of a cache to be used as a front memory when there is dynamic need based on system demand. A computing system may include at least one processor, a memory controlled by a controller and communicatively coupled with the at least one processor, a cache communicatively coupled with the at least one processor and the memory, and mapping logic communicatively coupled with the at least one processor, the memory, and the cache. The mapping logic may map a portion of the cache to a portion of the memory, wherein the portion of the cache is to be used by the at least one processor as a local memory, and wherein the mapping is dynamic based on system demand and managed by the controller in a physical address domain. | 09-24-2015 |
20150277981 | PRIORITY BASED CONTEXT PREEMPTION - Methods and apparatuses may prioritize the processing of high priority and low priority contexts submitted to a processing unit through separate high priority and low priority context submission ports. According to one embodiment, submission of a context to the low priority port causes contexts in progress to be preempted, whereas submission of a context to the high priority port causes contexts in progress to be paused. | 10-01-2015 |
20150278984 | SYSTEM COHERENCY IN A DISTRIBUTED GRAPHICS PROCESSOR HIERARCHY - Methods and systems may provide for executing, by a physically distributed set of compute slices, a plurality of work items. Additionally, the coherency of one or more memory lines associated with the plurality of work items may be maintained, by a cache fabric, across a graphics processor, a system memory and one or more host processors. In one example, a plurality of crossbar nodes track the one or more memory lines, wherein the coherency of the one or more memory lines is maintained across a plurality of level one (L1) caches and a physically distributed cache structure. Each L1 cache may be dedicated to an execution block of a compute slice and each crossbar node may be dedicated to a compute slice. | 10-01-2015 |
20150287159 | PROCESS SYNCHRONIZATION BETWEEN ENGINES USING DATA IN A MEMORY LOCATION - Memory-based semaphores are described that are useful for synchronizing processes between different processing engines. In one example, operations include executing a first process at a first processing engine, the executing including updating a memory register, sending a signal from the first processing engine to a second processing engine that the memory register has been updated, the signal including a memory register address to identify the updated memory register, inline data, and a dataword, fetching data from the memory register by the second processing engine, comparing the fetched data to the received dataword, and conditionally executing a next command of a second process at the second processing engine based on the comparison. (The semaphore sketch following this table also illustrates this fetch-and-compare flow.) | 10-08-2015 |
20150348222 | METHOD AND APPARATUS FOR PARALLEL PIXEL SHADING - An apparatus and method for identifying sub-groups of execution resources for parallel pixel processing. For example, one embodiment of a method comprises: determining X and Y coordinates for a pixel block to be processed; performing a first set of one or more modulus operations using even bits from the X and Y coordinates to generate a first intermediate result; performing a second set of one or more modulus operations using odd bits from the X and Y coordinates to generate a second intermediate result; comparing the first intermediate result and the second intermediate result to generate a final result; and using the final result to select a first set of processing resources from a set of N processing resources for processing the pixel block. (See the resource-selection sketch following this table.) | 12-03-2015 |
20150379661 | EFFICIENT HARDWARE MECHANISM TO ENSURE SHARED RESOURCE DATA COHERENCY ACROSS DRAW CALLS - Systems and methods may provide for receiving a plurality of signals from a software module associated with a shared resource such as, for example, an unordered access view (UAV). The plurality of signals may include a first signal that indicates whether a draw call accesses the shared resource, a second signal that indicates whether a boundary of the draw call has been reached, and a third signal that indicates whether the draw call has a coherency requirement. Additionally, a workload corresponding to the draw call may be selectively dispatched in a shader invocation based on the plurality of signals. | 12-31-2015 |
20160026494 | MID-THREAD PRE-EMPTION WITH SOFTWARE ASSISTED CONTEXT SWITCH - Methods and apparatus relating to mid-thread pre-emption with software assisted context switch are described. In an embodiment, one or more threads executing on a Graphics Processing Unit (GPU) are stopped at an instruction level granularity in response to a request to pre-empt the one or more threads. The context data of the one or more threads is copied to memory in response to completion of the one or more threads at the instruction level granularity and/or one or more instructions. Other embodiments are also disclosed and claimed. | 01-28-2016 |
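
Illustrative sketches for a few of the mechanisms above follow. First, applications 20110037770 and 20120139927 describe forking GPA-to-HPA shadow translation-table entries so that graphics DMA walks host-physical addresses. A minimal C sketch of that idea, assuming a flat single-level table and a toy GPA-to-HPA map (both are simplifications for illustration, not the patents' structures):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGES 16

/* Toy GPA->HPA map standing in for the VMM's translation state. */
static uint64_t gpa_to_hpa[PAGES];

/* Guest-written page table and the forked shadow copy that DMA walks. */
static uint64_t guest_table[PAGES];
static uint64_t shadow_table[PAGES];

/* When the memory controller detects a guest table-entry write, it
 * forks a shadow entry with the GPA translated to an HPA, so the
 * render/display DMA path can use host-physical addresses directly. */
static void on_guest_entry_write(unsigned idx, uint64_t gpa)
{
    guest_table[idx]  = gpa;
    shadow_table[idx] = gpa_to_hpa[gpa % PAGES];
}

int main(void)
{
    for (unsigned i = 0; i < PAGES; i++)
        gpa_to_hpa[i] = 0x100000ULL + (uint64_t)i * 0x1000; /* arbitrary host frames */

    on_guest_entry_write(3, 7);
    printf("guest entry 3: GPA %llu -> shadow HPA 0x%llx\n",
           (unsigned long long)guest_table[3],
           (unsigned long long)shadow_table[3]);
    return 0;
}
```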
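
Application 20130031333's lazy-invalidation idea can be pictured in software: each PASID state entry carries an active-reference state and a lazy-invalidation flag that is updated atomically, and a pending invalidation is honored before translations are activated. A loose C sketch under those assumptions (the fields and flow are inferred from the abstract, not the claims):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* One PASID state entry: an active reference count plus a
 * lazy-invalidation flag, each updated atomically. */
typedef struct {
    atomic_uint refs;            /* active references to the PASID   */
    atomic_bool lazy_invalidate; /* TLB entries stale, flush pending */
} pasid_state;

/* Device-side: mark a PASID's translations stale without an
 * immediate broadcast shoot-down (the "lazy" part). */
static void mark_stale(pasid_state *p)
{
    atomic_store(&p->lazy_invalidate, true);
}

/* MMU-side: before activating translations for a PASID, honor a
 * pending lazy invalidation by flushing, then clear the flag. */
static void activate(pasid_state *p)
{
    if (atomic_exchange(&p->lazy_invalidate, false))
        printf("flushing stale TLB entries before activation\n");
    atomic_fetch_add(&p->refs, 1);
}

int main(void)
{
    pasid_state p;
    atomic_init(&p.refs, 0);
    atomic_init(&p.lazy_invalidate, false);

    mark_stale(&p);
    activate(&p);   /* flushes once */
    activate(&p);   /* no flush needed */
    return 0;
}
```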
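
Application 20140025908's title points at memory systems with M = 2^n±1 modules, where the hard part is computing addr mod M without a divider. For M = 2^n−1 the standard trick is digit folding, since 2^n ≡ 1 (mod 2^n−1); the 2^n+1 case folds with alternating signs because 2^n ≡ −1 (mod 2^n+1). A hedged C sketch of the 2^n−1 module-select step (this is the textbook reduction, not necessarily the patent's circuit, and the names are mine):

```c
#include <stdint.h>
#include <stdio.h>

/* Reduce addr modulo (2^n - 1) without division: because
 * 2^n == 1 (mod 2^n - 1), summing the n-bit digits of addr
 * preserves the residue.  Illustrative module-select step for
 * an M = 2^n - 1 interleaved memory system. */
static uint32_t mod_2n_minus_1(uint64_t addr, unsigned n)
{
    uint64_t mask = (1ULL << n) - 1;   /* M = 2^n - 1 */
    while (addr > mask) {
        uint64_t sum = 0;
        while (addr) {                 /* fold n-bit digits */
            sum += addr & mask;
            addr >>= n;
        }
        addr = sum;
    }
    return (addr == mask) ? 0 : (uint32_t)addr;  /* canonical residue */
}

int main(void)
{
    unsigned n = 3;                    /* M = 7 modules */
    for (uint64_t a = 0; a < 16; a++)
        printf("addr %2llu -> module %u\n",
               (unsigned long long)a, mod_2n_minus_1(a, n));
    return 0;
}
```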
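
Applications 20140160138 and 20150287159 both use a memory-based semaphore: a producer engine updates a shared memory register and signals, and the consumer fetches the value, compares it with an expected dataword, and only then executes its next command. A rough pthread analogue in C (the spin-wait stands in for the hardware signal, and all names are illustrative):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* Software analogue of a memory-based semaphore: the producer
 * writes a value to a shared "register" and the consumer waits
 * until the fetched value compares equal to an expected dataword. */
static atomic_uint mem_register;             /* the shared memory location */
static const unsigned EXPECTED_DATAWORD = 42;

static void *producer(void *arg)
{
    (void)arg;
    usleep(1000);                            /* do some work first */
    atomic_store(&mem_register, EXPECTED_DATAWORD);  /* update + signal */
    printf("producer: register updated\n");
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    /* Fetch and compare; execute the next command only on a match. */
    while (atomic_load(&mem_register) != EXPECTED_DATAWORD)
        ;                                    /* hardware would wait on the signal */
    printf("consumer: dataword matched, executing next command\n");
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```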
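
Finally, application 20150348222 selects an execution sub-group by running modulus operations separately over the even and odd bits of a pixel block's X and Y coordinates and combining the two intermediate results. The exact bit split, moduli, and combination rule are claim details not given in the abstract, so this C sketch only illustrates the shape of the computation:

```c
#include <stdint.h>
#include <stdio.h>

/* Gather the bits of v at indices start, start+2, start+4, ...
 * into a packed value (start = 0 for even bits, 1 for odd bits). */
static uint32_t gather_bits(uint32_t v, int start)
{
    uint32_t out = 0;
    for (int i = start, o = 0; i < 32; i += 2, o++)
        out |= ((v >> i) & 1u) << o;
    return out;
}

/* Pick one of n processing resources for a pixel block at (x, y):
 * a modulus over the even bits, a modulus over the odd bits, then
 * a combination of the two intermediate results.  The abstract
 * "compares" the intermediates; a simple combine stands in here. */
static unsigned select_resource(uint32_t x, uint32_t y, unsigned n)
{
    uint32_t even = gather_bits(x, 0) + gather_bits(y, 0);
    uint32_t odd  = gather_bits(x, 1) + gather_bits(y, 1);
    uint32_t r1 = even % n;              /* first intermediate result  */
    uint32_t r2 = odd  % n;              /* second intermediate result */
    return (r1 + r2) % n;                /* combined final selection   */
}

int main(void)
{
    for (uint32_t y = 0; y < 4; y++)
        for (uint32_t x = 0; x < 4; x++)
            printf("block (%u,%u) -> resource %u\n",
                   x, y, select_resource(x, y, 4));
    return 0;
}
```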