Patent application number | Description | Published |
20110055531 | Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including by not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Standby and active barriers are employed to synchronize and track commands through the command queue. Markers are employed to track commands through the command queue. | 03-03-2011 |
20110055839 | Multi-Core/Thread Work-Group Computation Scheduler - Execution units process commands from one or more command queues. Once a command is available on the queue, each unit participating in the execution of the command atomically decrements the command's work groups remaining counter by the work group reservation size and processes a corresponding number of work groups within a work group range. Once all work groups within a range are processed, an execution unit increments a work group processed counter. The unit that increments the work group processed counter to the value stored in a work groups to be executed counter signals completion of the command. Each execution unit that access a command also marks a work group seen counter. Once the work groups processed counter equals the work groups to be executed counter and the work group seen counter equals the number of execution units, the command may be removed or overwritten on the command queue. | 03-03-2011 |
20110161608 | METHOD TO CUSTOMIZE FUNCTION BEHAVIOR BASED ON CACHE AND SCHEDULING PARAMETERS OF A MEMORY ARGUMENT - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. In one or more embodiments, each of multiple memory objects can be populated with work items and can be associated with attributes that can include information which can be used to describe data of each memory object and/or which can be used to process data of each memory object. The attributes can be used to indicate one or more of a cache policy, a cache size, and a cache line size, among others. In one or more embodiments, the attributes can be used as a history of how each memory object is used. The attributes can be used to indicate cache history statistics (e.g., a hit rate, a miss rate, etc.). | 06-30-2011 |
20110161734 | PROCESS INTEGRITY IN A MULTIPLE PROCESSOR SYSTEM - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. In one or more embodiments, an error can be determined while two or more processor cores are processing a first group of two or more work items, and the error can be signaled to an application. The application can determine a state of progress of processing the two or more work items and at least one dependency from the state of progress. In one or more embodiments, a second group of two or more work items that are scheduled for processing can be unscheduled, in response to determining the error. In one or more embodiments, the application can process at least one work item that caused the error, and the second group of two or more work items can be rescheduled for processing. | 06-30-2011 |
20110161970 | METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN COMPUTE NODES - Disclosed are a method, a system and a computer program product of operating a data processing system that can include or be coupled to multiple processor cores. The multiple processor cores can be coupled to a memory that can include multiple priority queues associated with multiple respective priorities and store multiple work items. Work items stored in the multiple priority queues can be associated with a bit mask which is associated with a respective priority queue and can be routed to respective groups of one or more processors based on the associated bit mask. In one or more embodiments, at least two groups of processor cores can include at least one processor core that is common to both of the at least two groups of processor cores. | 06-30-2011 |
20110161975 | REDUCING CROSS QUEUE SYNCHRONIZATION ON SYSTEMS WITH LOW MEMORY LATENCY ACROSS DISTRIBUTED PROCESSING NODES - A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ. | 06-30-2011 |
20110161976 | METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN PROCESSING NODES - A method efficiently dispatches/completes a work element within a multi-node, data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units. | 06-30-2011 |
20120221836 | Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including by not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Standby and active barriers are employed to synchronize and track commands through the command queue. Markers are employed to track commands through the command queue. | 08-30-2012 |
20130014124 | REDUCING CROSS QUEUE SYNCHRONIZATION ON SYSTEMS WITH LOW MEMORY LATENCY ACROSS DISTRIBUTED PROCESSING NODES - A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ. | 01-10-2013 |
20130254776 | METHOD TO REDUCE QUEUE SYNCHRONIZATION OF MULTIPLE WORK ITEMS IN A SYSTEM WITH HIGH MEMORY LATENCY BETWEEN PROCESSING NODES - A method efficiently dispatches/completes a work element within a multi-node, data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units. | 09-26-2013 |
Patent application number | Description | Published |
20090150575 | Dynamic logical data channel assignment using time-grouped allocations - A method, system and program are provided for dynamically allocating DMA channel identifiers to multiple DMA transfer requests that are grouped in time by virtualizing DMA transfer requests into an available DMA channel identifier using a channel bitmap listing of available DMA channels to select and set an allocated DMA channel identifier. Once the input values associated with the DMA transfer requests are mapped to the selected DMA channel identifier, the DMA transfers are performed using the selected DMA channel identifier, which is then deallocated in the channel bitmap upon completion of the DMA transfers. When there is a request to wait for completion of the data transfers, the same input values are used with the mapping to wait on the appropriate logical channel. With this method, all available logical channels can be utilized with reduced instances of false-sharing. | 06-11-2009 |
20090150576 | Dynamic logical data channel assignment using channel bitmap - A method, system and program are provided for dynamically allocating DMA channel identifiers by virtualizing DMA transfer requests into available DMA channel identifiers using a channel bitmap listing of available DMA channels to select and set an allocated DMA channel identifier. Once an input value associated with the DMA transfer request is mapped to the selected DMA channel identifier, the DMA transfer is performed using the selected DMA channel identifier, which is then deallocated in the channel bitmap upon completion of the DMA transfer. When there is a request to wait for completion of the data transfer, the same input value is used with the mapping to wait on the appropriate logical channel. With this method, all available logical channels can be utilized with reduced instances of false-sharing. | 06-11-2009 |
20100033493 | System and Method for Iterative Interactive Ray Tracing in a Multiprocessor Environment - A method comprises receiving scene model data including a scene geometry model and a plurality of pixel data describing objects arranged in a scene. The method generates a primary ray based on a selected first pixel data. In the event the primary ray intersects an object in the scene, the method determines primary hit color data and generates a plurality of secondary rays. The method groups the secondary packets and arranges the packets in a queue based on the octant of each direction vector in the secondary ray packet. The method generates secondary color data based on the secondary ray packets in the queue and generates a pixel color based on the primary hit color data, and the secondary color data. The method generates an image based on the pixel color for the pixel data. | 02-11-2010 |
20100141652 | System and Method for Photorealistic Imaging Using Ambient Occlusion - Scene model data, including a scene geometry model and a plurality of pixel data describing objects arranged in a scene, is received. A first pixel data of the plurality of pixel data is selected. A primary pixel color and a primary ray are generated based on the first pixel data. If the primary ray intersects an object in the scene, an intersection point, P is determined. A surface normal, N, is determined based on the object intersected and the intersection point, P. A primary hit color is determined based on the intersection point, P. The primary pixel color is modified based on the primary hit color. A plurality of ambient occlusion (AO) rays are generated based on the intersection point, P and the surface normal, N, with each AO ray having a direction, D. For each AO ray, the AO ray direction is reversed, D, the AO ray origin, O, is set to a point outside the scene. Each AO ray is marched from the AO ray origin into the scene to the intersection point, P. If an AO ray intersects an object before reaching point P, that AO ray is excluded from ambient occlusion calculations. If an AO ray does not intersect an object before reaching point P, that ray is included in ambient occlusion calculations. Ambient occlusion is estimated based on included AO rays. The primary pixel color is shaded based on the ambient occlusion and the primary hit color and an image is generated based on the primary pixel color for the pixel data. | 06-10-2010 |
20100141665 | System and Method for Photorealistic Imaging Workload Distribution - A graphics client receives a frame, the frame comprising scene model data. A server load balancing factor is set based on the scene model data. A prospective rendering factor is set based on the scene model data. The frame is partitioned into a plurality of server bands based on the server load balancing factor and the prospective rendering factor. The server bands are distributed to a plurality of compute servers. Processed server bands are received from the compute servers. A processed frame is assembled based on the received processed server bands. The processed frame is transmitted for display to a user as an image. | 06-10-2010 |
20100166322 | Security Screening Image Analysis Simplification Through Object Pattern Identification - A mechanism is provided for security screening image analysis simplification through object pattern identification. Popular consumer electronics and other items are scanned in a control system, which creates an electronic signature for each known object. The system may reduce the signature to a hash value and place each signature for each known object in a “known good” storage set. For example, popular mobile phones, laptop computers, digital cameras, and the like may be scanned for the known good signature database. At the time of scan, such as at an airport, objects in a bag may be rotated to a common axis alignment and transformed to the same signature or hash value to match against the known good signature database. If an item matches, the scanning system marks it as a known safe object. | 07-01-2010 |
20110161618 | ASSIGNING EFFICIENTLY REFERENCED GLOBALLY UNIQUE IDENTIFIERS IN A MULTI-CORE ENVIRONMENT - A mechanism is provided in a multi-core environment for assigning a globally unique core identifier. A Power PC® processor unit (PPU) determines an index alias corresponding to a natural index to a location in local storage (LS) memory. A synergistic processor unit (SPU) corresponding to the PPU translates the natural index to a first address in a core's memory, as well as translates the index alias to a second address in the core's memory. Responsive to the second address exceeding a physical memory size, the load store unit of the SPU truncates the second address to a usable range of address space in systems that do not map an address space. The second address and the first address point to the same physical location in the core's memory. In addition, the aliasing using index aliases also preserves the ability to combine persistent indices with relative indices without creating holes in a relative index map. | 06-30-2011 |
20110161935 | METHOD FOR MANAGING HARDWARE RESOURCES WITHIN A SIMULTANEOUS MULTI-THREADED PROCESSING SYSTEM - A method for managing hardware resources and threads within a data processing system is disclosed. Compilation attributes of a function are collected during and after the compilation of the function. The pre-processing attributes of the function are also collected before the execution of the function. The collected attributes of the function are then analyzed, and a runtime configuration is assigned to the function based of the result of the attribute analysis. The runtime configuration may include, for example, the designation of the function to be executed under either a single-threaded mode or a simultaneous multi-threaded mode. During the execution of the function, real-time attributes of the function are being continuously collected. If necessary, the runtime configuration under which the function is being executed can be changed based on the real-time attributes collected during the execution of the function. | 06-30-2011 |
20110161943 | METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item. | 06-30-2011 |
20120198469 | Method for Managing Hardware Resources Within a Simultaneous Multi-Threaded Processing System - A method for managing hardware resources and threads within a data processing system is disclosed. Compilation attributes of a function are collected during and after the compilation of the function. The pre-processing attributes of the function are also collected before the execution of the function. The collected attributes of the function are then analyzed, and a runtime configuration is assigned to the function based of the result of the attribute analysis. The runtime configuration may include, for example, the designation of the function to be executed under either a single-threaded mode or a simultaneous multi-threaded mode. During the execution of the function, real-time attributes of the function are being continuously collected. If necessary, the runtime configuration under which the function is being executed can be changed based on the real-time attributes collected during the execution of the function. | 08-02-2012 |
20120213430 | System and Method for Iterative Interactive Ray Tracing in a Multiprocessor Environment - A method comprises receiving scene model data including a scene geometry model and a plurality of pixel data describing objects arranged in a scene. The method generates a primary ray based on a selected first pixel data. In the event the primary ray intersects an object in the scene, the method determines primary hit color data and generates a plurality of secondary rays. The method groups the secondary packets and arranges the packets in a queue based on the octant of each direction vector in the secondary ray packet. The method generates secondary color data based on the secondary ray packets in the queue and generates a pixel color based on the primary hit color data, and the secondary color data. The method generates an image based on the pixel color for the pixel data. | 08-23-2012 |
20130003135 | Security Screening Image Analysis Simplification Through Object Pattern Identification - A mechanism is provided for security screening image analysis simplification through object pattern identification. Popular consumer electronics and other items are scanned in a control system, which creates an electronic signature for each known object. The system may reduce the signature to a hash value and place each signature for each known object in a “known good” storage set. For example, popular mobile phones, laptop computers, digital cameras, and the like may be scanned for the known good signature database. At the time of scan, such as at an airport, objects in a bag may be rotated to a common axis alignment and transformed to the same signature or hash value to match against the known good signature database. If an item matches, the scanning system marks it as a known safe object. | 01-03-2013 |
20130013897 | METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item. | 01-10-2013 |