
PROCESSING CONTROL

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number Description Number of patent applications / Date published
712233000 Branching (e.g., delayed branch, loop control, branch predict, interrupt) 335
712221000 Arithmetic operation instruction processing 187
712225000 Processing control for data transfer 136
712227000 Specialized instruction processing in support of testing, debugging, emulation 113
712226000 Instruction modification based on condition 90
712228000 Context preserving (e.g., context swapping, checkpointing, register windowing) 77
712229000 Mode switch or change 54
712223000 Logic operation instruction processing 39
712245000 Processing sequence control (i.e., microsequencing) 28
712231000 Detecting end or completion of microprogram 3
20090100255 Guaranteed core access in a multiple core processing system - Exclusive access to a core or part of a core, or to multiple cores, but in any case less than all of the cores, of a multiple core processing system. The access can be requested by an instruction, or by a routine. Once granted, the access provides exclusive access to the core so that a program can be run which requires substantially uninterrupted access to the core. 04-16-2009
20110066832 Configurable Processor Module Accelerator Using A Programmable Logic Device - A configurable processor module accelerator using a programmable logic device is described. According to one embodiment, the accelerator module includes a circuit board having coupled thereto a first programmable logic device, a controller, and a first memory. The first programmable logic device has access to a bitstream which is stored in the first memory. Access to the bitstream by the first programmable logic device is controlled by the controller. The bitstream is capable of being instantiated in the first programmable logic device using programmable logic thereof to provide at least a transport interface for communication between the first programmable logic device and one or more other devices associated with the motherboard using the microprocessor interface. 03-17-2011
20090037707 Determining When a Set of Compute Nodes Participating in a Barrier Operation on a Parallel Computer are Ready to Exit the Barrier Operation - Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value. 02-05-2009
712230000 Generating next microinstruction address 2
20100325402 DATA PROCESSING DEVICE AND METHOD FOR EXECUTING OBFUSCATED PROGRAMS - A program is obfuscated by reordering its instructions. Original instruction addresses are mapped to target addresses in an irregular way, with position dependent address steps between the addresses of logically successive instructions. Preferably pseudo-random address steps are used, for example with address steps that have mutually opposite sign with equal frequency. The data processing device has an instruction flow control unit that updates instruction addresses according to the position dependent address steps. The instruction flow control unit may comprise a circuit that contains secret information, which is not normally accessible from the outside, to control the updates. A lookup table may be used for example, with address steps, successor addresses or mapped address values. In an embodiment the mapping of original instruction addresses to target addresses may be visualized by means of a path ( 12-23-2010
20120124344 LOOP PREDICTOR AND METHOD FOR INSTRUCTION FETCHING USING A LOOP PREDICTOR - A loop predictor and a method for instruction fetching using a loop predictor. A processor may include a loop predictor in addition to a primary branch predictor. A relatively common scenario in program execution is that a set of branches repeat over and over forming a loop. The loop may be detected based on a repeated pattern of access to a data structure used for branch prediction. Once a loop is detected, it may be determined whether the code would stay in the loop for at least a duration sufficient to disable the branch prediction. On a determination that the detected loop is locked, a sequence of instruction addresses in one iteration of the detected loop may be captured in a buffer and the branch predictor may be turned off and a sequence of fetch instructions may be played from the buffer. 05-17-2012
Entries
Document Title Date
20120204011 ASYNCHRONOUS ASSIST THREAD INITIATION - A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program. 08-09-2012
20090193234 SEQUENCER CONTROLLED SYSTEM AND METHOD FOR CONTROLLING TIMING OF OPERATIONS OF FUNCTIONAL UNITS - The invention proposes a simple method for controlling distributed functional units (FU) in a system. It offloads the main system processor from intermediate status monitoring. The sequencer controlled system comprises a plurality of functional units, a processor operatively coupled to the plurality of functional units through a bus, a sequencer having a set of registers, and an interrupt source register configured for interrupt polling. The registers are configured to control the timing of at least one operation of the functional units with stored instructions for each of the functional units. The processor sets up at least some of the registers through the bus for the initial configuration and the sequencer is activated by the processor. 07-30-2009
20130086365 Exploiting an Architected List-Use Operand Indication in a Computer System Operand Resource Pool - A pool of available physical registers is provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values, and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or a prefix instruction, wherein reads to deallocated architecture registers return an architected default value. 04-04-2013
20130086364 Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand Last-User Information - A multi-level register hierarchy is disclosed comprising a first level pool of registers for caching registers of a second level pool of registers in a system wherein programs can dynamically release and re-enable architected registers such that released architected registers need not be maintained by the processor, the processor accessing operands from the first level pool of registers, wherein a last-use instruction is identified as having a last use of an architected register before being released, the last-use architected register being released causes the multi-level register hierarchy to discard any correspondence of an entry to said last use architected register. 04-04-2013
20100077185 MANAGING THREAD AFFINITY ON MULTI-CORE PROCESSORS - Embodiments of the invention intelligently associate processes with core processors in a multi-core processor. The core processors are asymmetrical in that the core processors support different features or provide different resources. The features or resources are published by the core processors or otherwise identified (e.g., via a query). Responsive to a request to execute an instruction associated with a thread, one of the core processors is selected based on the resource or feature supporting execution of the instruction. The thread is assigned to the selected core processor such that the selected core processor executes the instruction and subsequent instructions from the assigned thread. In some embodiments, the resource or feature is emulated until an activity limit is reached upon which the thread assignment occurs. 03-25-2010
20100077186 PROCESSING APPARATUS, PROCESSING SYSTEM, AND COMPUTER READABLE MEDIUM - A processing apparatus includes: an execution unit; an execution request accept unit; a process instruction unit; an information leakage preventing process execution unit; a recording unit; and a transmission unit. 03-25-2010
20100115241 KERNEL FUNCTION GENERATING METHOD AND DEVICE AND DATA CLASSIFICATION DEVICE - Kernel functions, the number of which is set in advance, are linearly coupled to generate the most suitable Kernel function for a data classification. An element Kernel generating unit 05-06-2010
20130086363 Computer Instructions for Activating and Deactivating Operands - An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction specified last-use by an instruction, wherein the architected operands are made active by performing a write operation to an inactive operand, wherein the activation/deactivation may be performed by the instruction having the last-use of the operand or another (prefix) instruction. 04-04-2013
20130080746 Providing A Dedicated Communication Path Separate From A Second Path To Enable Communication Between Complaint Sequencers Of A Processor Using An Assertion Signal - In one embodiment, the present invention includes a method for communicating an assertion signal from a first instruction sequencer to a plurality of accelerators coupled to the first instruction sequencer, detecting the assertion signal in the accelerators and communicating a request for a lock, and registering an accelerator that achieves the lock by communication of a registration message for the accelerator to the first instruction sequencer. Other embodiments are described and claimed. 03-28-2013
20130080745 FINE-GRAINED INSTRUCTION ENABLEMENT AT SUB-FUNCTION GRANULARITY - Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.). 03-28-2013
20130080744 ABSTRACTING COMPUTATIONAL INSTRUCTIONS TO IMPROVE PERFORMANCE - Methods and systems for executing a code stream of non-native binary code on a computing system are disclosed. One method includes parsing the code stream to detect a plurality of elements including one or more branch destinations, and traversing the code stream to detect a plurality of non-native operators. The method also includes executing a pattern matching algorithm against the plurality of non-native operators to find combinations of two or more non-native operators that do not span across a detected branch destination and that correspond to one or more target operators executable by the computing system. The method further includes generating a second code stream executable on the computing system including the one or more target operators. 03-28-2013
20130036295 GPU ASSISTED GARBAGE COLLECTION - A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. 02-07-2013
20120265969 ALLOCATION OF COUNTERS FROM A POOL OF COUNTERS TO TRACK MAPPINGS OF LOGICAL REGISTERS TO PHYSICAL REGISTERS FOR MAPPER BASED INSTRUCTION EXECUTIONS - A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool. 10-18-2012
20130042091 BIT Splitting Instruction - An instruction specifies a source value and an offset value. Upon execution of the instruction, a first result of the instruction and a second result of the instruction are generated. The first result is a first portion of the source value and the second result is a second portion of the source value. 02-14-2013
20090049281 MULTIMEDIA DECODING METHOD AND MULTIMEDIA DECODING APPARATUS BASED ON MULTI-CORE PROCESSOR - Provided are a multimedia decoding method and multimedia decoding apparatus based on a multi-core platform including a central processor and a plurality of operation processors. The multimedia decoding method includes performing a queue generation operation on input multimedia data to generate queues of one or more operations of the multimedia data which are to be performed by the central processor and the operation processors, wherein the queue generation operation is performed by the central processor; performing motion compensation operations on partitioned data regions of the multimedia data by one or more motion compensation processors among the operation processors; and performing a deblocking operation on the multimedia data by a deblocking processor among the operation processors. 02-19-2009
20090158011 DATA PROCESSING SYSTEM - A data processing system comprising a computer chip having a processing circuit and a chip-internal first memory and a chip-external second memory being coupled to the computer chip, wherein the processing circuit is configured to allow execution of computer programs stored in the first memory and to prevent execution of computer programs stored in the second memory when the data processing system is in a first state, and to allow execution of computer programs stored in the second memory when the data processing system is in a second state. 06-18-2009
20100106946 METHOD FOR PROCESSING STREAM DATA AND SYSTEM THEREOF - The present invention provides a method for processing stream data and a system thereof capable of implementing general data processing including recursive processing with low latency. In the system for processing stream data, a single operator graph is prepared from operator trees of a plurality of queries, an execution order of the operators is determined so that execution of a streaming operator is progressed in one way from an input to an output, the ignition time of an external ignition operator that inputs data from the outside of the system and an internal ignition operator that time-limitedly generates data is monitored, and an operator execution control unit repeats processing that completes the processing in the operator graph at the time according to the determined operator execution order, assuming the operator of the earliest ignition time as a start point. 04-29-2010
20090125705 DATA PROCESSING METHOD AND APPARATUS WITH RESPECT TO SCALABILITY OF PARALLEL COMPUTER SYSTEMS - A data processing method for scalability of a parallel computer system includes: obtaining a processing time τ(p) that is the longest processing time in a case where a parallel processing is carried out by p processors and a processing time γ 05-14-2009
20120166771 AGILE COMMUNICATION OPERATOR - A high level programming language provides an agile communication operator that generates a segmented computational space based on a resource map for distributing the computational space across compute nodes. The agile communication operator decomposes the computational space into segments, causes the segments to be assigned to compute nodes, and allows the user to centrally manage and automate movement of the segments between the compute nodes. The segment movement may be managed using either a full global-view representation or a local-global-view representation of the segments. 06-28-2012
20090043995 Handling Data Cache Misses Out-of-Order for Asynchronous Pipelines - An apparatus and method for handling data cache misses out-of-order for asynchronous pipelines are provided. The apparatus and method associates load tag (LTAG) identifiers with the load instructions and uses them to track the load instruction across multiple pipelines as an index into a load table data structure of a load target buffer. The load table is used to manage cache “hits” and “misses” and to aid in the recycling of data from the L2 cache. With cache misses, the LTAG indexed load table permits load data to recycle from the L2 cache in any order. When the load instruction issues and sees its corresponding entry in the load table marked as a “miss,” the effects of issuance of the load instruction are canceled and the load instruction is stored in the load table for future reissuing to the instruction pipeline when the required data is recycled. 02-12-2009
20090307466 Resource Sharing Techniques in a Parallel Processing Computing System - A method, apparatus, and program product share a resource in a computing system that includes a plurality of computing cores. A request from a second execution context (“EC”) to lock the resource currently locked by a first EC on a first core causes replication of the second EC as a third EC on a third core. The first and third ECs are executed substantially concurrently. When the first EC modifies the resource, the third EC is restarted after the resource has been modified. Alternately, a first EC is configured in a first core and shadowed as a second EC in a second core. In response to a blocked lock request, the first EC is halted and the second EC continues. After granting a lock, it is determined whether a conflict has occurred and the first and second EC are particularly synchronized to each other in response to that determination. 12-10-2009
20090307465 Computational expansion system - The invention relates to a system for expanding capacity for executing processes that are executed in a central processing unit ( 12-10-2009
20090106535 SHARED PROCESSOR ARCHITECTURE APPLIED TO FUNCTIONAL STAGES CONFIGURED IN A RECEIVER SYSTEM FOR PROCESSING SIGNALS FROM DIFFERENT TRANSMITTER SYSTEMS AND METHOD THEREOF - According to an embodiment of the present invention, a shared processor architecture in a receiver system is disclosed. The receiver system is configured to have a first functional stage and a second functional stage for processing information carried by signals from a first transmitter system and a second transmitter system respectively. The first functional stage and the second functional stage correspond to an identical signal processing function. The shared processor architecture includes a first processor, allocated to the first functional stage and the second functional stage, for processing an output generated from the first functional stage or an output from the second functional stage. 04-23-2009
20120191954 PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA INSTRUCTION PRE-COMPLETION - Methods and apparatuses are provided for achieving increased performance and energy saving via instruction pre-completion without having to schedule instruction execution in processor execution units. The apparatus comprises an operational unit for determining whether an instruction can be completed without scheduling use of an execution unit of the processor and units within the operational unit capable of employing alternate or equivalent processes or techniques to complete the instruction. In this way, the instruction is completed without scheduling use of the execution unit of the processor. The method comprises determining that an instruction can be completed without scheduling use of an execution unit of a processor and then pre-completing the instruction without use of one or more of the execution units. 07-26-2012
20130061026 CONFIGURABLE MASS DATA PORTIONING FOR PARALLEL PROCESSING - A configurable mass data portioning for parallel processing is described herein. One or more operation attributes are selected to participate in parallelization criteria. The values of the selected operation attributes for a number of operations are submitted to a specified algorithm to provide parallelization values corresponding to the operations. The parallelization values are applied to group the operations in comparable portions for parallel execution without conflicts. 03-07-2013
20120226893 Hardware controller to choose selected hardware entity and to execute instructions in relation to selected hardware entity - A hardware controller includes a first hardware interface, a second hardware interface, first hardware logic, and second hardware logic. The first hardware interface is to couple the hardware controller to hardware entities of a hardware device in which the hardware controller is to be included. The second hardware interface is to couple the hardware controller to a memory to receive instructions. The first hardware logic is to choose a selected hardware entity from the hardware entities. The second hardware logic is to execute the instructions in relation to the selected hardware entity. 09-06-2012
20090031117 SAME INSTRUCTION DIFFERENT OPERATION (SIDO) COMPUTER WITH SHORT INSTRUCTION AND PROVISION OF SENDING INSTRUCTION CODE THROUGH DATA - A same instruction different operation (SIDO) processor is disclosed in which the instruction control word is supplied using data bus as one operand and the data to be operated is supplied through another operand. Also disclosed is a method for the provision of operation-code along with data/operands using a short instruction word. With all the execution units working in parallel on multiple data operands, a variety of operations can be performed in parallel. This allows short instruction format and flexibility to dynamically program the processor on the fly by changing data/operand words, and supports basic integer operations using very simple and efficient hardware execution units. 01-29-2009
20130067202 CONDITIONAL NON-BRANCH INSTRUCTION PREDICTION - A microprocessor processes conditional non-branch instructions that specify a condition and instruct the microprocessor to perform an operation if the condition is satisfied and otherwise to not perform the operation. A predictor provides a prediction about a conditional non-branch instruction. An instruction translator translates the conditional non-branch instruction into a no-operation microinstruction when the prediction predicts the condition will not be satisfied, and into a set of one or more microinstructions to unconditionally perform the operation when the prediction predicts the condition will be satisfied. An execution pipeline executes the no-operation microinstruction or the set of microinstructions. The translator translates the instruction into a second set of one or more microinstructions to conditionally perform the operation when the predictor does not make a prediction. In the case of a misprediction, the translator re-translates the conditional non-branch instruction into the second set of microinstructions. 03-14-2013
20090265528 PROGRAMMABLE STREAMING PROCESSOR WITH MIXED PRECISION INSTRUCTION EXECUTION - The disclosure relates to a programmable streaming processor that is capable of executing mixed-precision (e.g., full-precision, half-precision) instructions using different execution units. The various execution units are each capable of using graphics data to execute instructions at a particular precision level. An exemplary programmable shader processor includes a controller and multiple execution units. The controller is configured to receive an instruction for execution and to receive an indication of a data precision for execution of the instruction. The controller is also configured to receive a separate conversion instruction that, when executed, converts graphics data associated with the instruction to the indicated data precision. When operable, the controller selects one of the execution units based on the indicated data precision. The controller then causes the selected execution unit to execute the instruction with the indicated data precision using the graphics data associated with the instruction. 10-22-2009
20090235056 RECORDING MEDIUM STORING PERFORMANCE MONITORING PROGRAM, PERFORMANCE MONITORING METHOD, AND PERFORMANCE MONITORING DEVICE - A performance monitoring device has an interrupt detection unit that detects generation of an interrupt process to be executed by a processor in accordance with TLB entry invalidation executed in an operating system. A counter value acquisition unit acquires a counter value of a predetermined event counted by the processor when the interrupt process is detected by the interrupt detection unit. A process information acquisition unit acquires identification information for identifying a process executed on the processor from the operating system immediately before the interrupt process is detected by the interrupt detection unit. An associating unit associates the counter value acquired by the counter value acquisition unit during the interrupt process with the identification information acquired by the process information acquisition unit immediately before the interrupt process. 09-17-2009
20090013155 System and Method for Retiring Approximately Simultaneously a Group of Instructions in a Superscalar Microprocessor - A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results. In addition, the retirement control block further controls the retiring of a group of instructions determined to be retirable, by simultaneously transferring their results from the temporary buffer to the register array, and retires instructions executed in order by storing their results directly in the register array. The method comprises the steps of monitoring the status of the instructions to determine which group of instructions have been executed, determining whether each executed instruction is retirable, storing results of instructions executed out of program order in a temporary buffer, storing retirable-instruction results in a register array and retiring a group of retirable instructions by simultaneously transferring their results from the temporary buffer to the register array, and retiring instructions executed in order by storing their results directly in the register array. 01-08-2009
20130166887 DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD - According to one embodiment, a data processing apparatus includes a processor and a memory. The processor includes core blocks. The memory stores a command queue and task management structure data. The command queue stores a series of kernel functions. The task management structure data defines an order of execution of kernel functions by associating a return value of a previous kernel function with an argument of a subsequent kernel function. Core blocks of the processor are capable of executing different kernel functions. 06-27-2013
20130166888 PREDICTIVE OPERATOR GRAPH ELEMENT PROCESSING - Techniques are described for predictively starting a processing element. Embodiments receive streaming data to be processed by a plurality of processing elements. An operator graph of the plurality of processing elements that defines at least one execution path is established. Embodiments determine a historical startup time for a first processing element in the operator graph, where, once started, the first processing element begins normal operations once the first processing element has received a requisite amount of data from one or more upstream processing elements. Additionally, embodiments determine an amount of time the first processing element takes to receive the requisite amount of data from the one or more upstream processing elements. The first processing element is then predictively started at a first startup time based on the determined historical startup time and the determined amount of time historically taken to receive the requisite amount of data. 06-27-2013
20080320284 VIRTUAL SERIAL-STREAM PROCESSOR - A virtual serial-stream processor or system consists of one or more data input ports, zero or more data output ports, zero or more virtual control ports, one or more virtual serial and stream processing cores, one or more virtual serial control processors, and memory. Virtual components are spread across multiple physical devices, multiple virtual processing cores implemented in one physical device, or some combination, as dictated by an application-specific design. 12-25-2008
20120239909 SYSTEMS AND METHODS FOR VOTING AMONG PARALLEL THREADS - One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance. 09-20-2012
20110283091PARALLELIZING SEQUENTIAL FRAMEWORKS USING TRANSACTIONS - Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.11-17-2011
20120110308METHOD FOR CONTROLLING BMC HAVING CUSTOMIZED SDR - A Baseboard Management Controller (BMC) controlling method includes the steps of dividing a memory of a BMC into an original region and customized region, in which the original region includes at least one original sensor data record (SDR) and original platform event filter (PEF) corresponding to each other; providing an instruction set to at least one external system, in which the external system manages at least one customized SDR and customized PEF corresponding to each other in the customized region through the instruction set; polling the original SDR in the original region and the customized SDR in the customized region; determining whether values of the SDRs obtained through polling conform to a plurality of critical values individually corresponding to the SDRs; and obtaining a processing policy according to the corresponding PEF when at least one value of the SDR does not conform to the corresponding critical value.05-03-2012
20090300332NON-DESTRUCTIVE SIDEBAND READING OF PROCESSOR STATE INFORMATION - A processor receives a command via a sideband interface on the processor to read processor state information, e.g., CPUID information. The sideband interface provides the command information to a microcode engine in the processor that executes the command to retrieve the designated processor state information at an appropriate instruction boundary and retrieves the processor state information. That processor information is stored in local buffers in the sideband interface to avoid modifying processor state. After the microcode engine completes retrieval of the information and the sideband interface command is complete, execution returns to the normal flow in the processor. Thus, the processor state information may be obtained non-destructively during processor runtime.12-03-2009
20100005277Communicating Between Multiple Threads In A Processor - In one embodiment, the present invention includes a method for accessing registers associated with a first thread while executing a second thread. In one such embodiment a method may include preventing an instruction of a first thread that is to access a source operand from a register file of a second thread from executing if a synchronization indicator associated with the source operand indicates incompletion of a producer operation of the second thread, and executing the instruction if the synchronization indicator indicates completion of the producer operation of the second thread. Other embodiments are described and claimed.01-07-2010
20090158010Command Protocol for Integrated Circuits - A method of operating an integrated circuit involves supplying an instruction portion of a command to the integrated circuit to specify an operation to be performed by the integrated circuit. At least some types of commands also include an attributes portion that provides additional information about the operation to be performed. The attributes portion of the command is supplied to the integrated circuit with a delay relative to the instruction portion of the command. The integrated circuit selectively enables circuitry for processing the attributes portion if the integrated circuit determines from the received instruction portion that the command also includes an attributes portion. The delay between the two portions of the command provides sufficient time for the integrated circuit to enable the attributes processing circuitry, which, in a default state, can be disabled during an active mode of the integrated circuit to save power.06-18-2009
20100011193Selective Hardware Lock Disabling - Controlling a reorder buffer (ROB) to selectively perform functional hardware lock disabling (HLD) is described. One apparatus embodiment includes a unit to enable an ROB to selectively disable a lock upon identifying a lock acquire operation (LAO) associated with a critical section (CS) entry point, a unit to selectively retire the LAO, a unit to cause the ROB to selectively disable the lock, and a unit to snoop a buffer. The apparatus may, based on the snooping, selectively abort a transaction associated with the CS.01-14-2010
20100115244MULTITHREADING MICROPROCESSOR WITH OPTIMIZED THREAD SCHEDULER FOR INCREASING PIPELINE UTILIZATION EFFICIENCY - A multithreading processor for concurrently executing multiple threads is provided. The processor includes an execution pipeline and a thread scheduler that dispatches instructions of the threads to the execution pipeline. The execution pipeline is configured for generating a thread context (TC) flush indicator associated with a thread context when one or more instructions of the thread context would stall in the execution pipeline. One or more instructions in the pipeline of the thread context associated with the thread context flush signal can be flushed or nullified.05-06-2010
20100169618IDENTIFYING CONCURRENCY CONTROL FROM A SEQUENTIAL PROOF - The claimed subject matter provides a system and/or a method that facilitates ensuring non-interference between multiple threads that access a shared resource. An interface can receive a portion of sequential code, wherein the portion of sequential code includes a property that is maintained and relied upon when invoked and executed by a sequential client. A synthesizer component can leverage a sequential proof related to the portion of sequential code in order to derive a concurrency control mechanism for a portion of concurrency code that maintains the property when invoked by a concurrent client, wherein the sequential proof identifies a concurrent interference at an execution point that is tolerable for the concurrent client.07-01-2010
20100115242ENGINE/PROCESSOR COOPERATION SYSTEM AND COOPERATION METHOD - To provide an engine software cooperation mechanism which avoids stopping the operation of a high-speed engine during timer monitoring processing. This system checks occurrence of a timeout event by directly accessing the content of a session data memory without regard to the locking state of a session. If detecting the state of timeout, the system requests execution of timeout processing via a timer transmission circuit. By timeout processing, the time of timeout and the present time are checked again to confirm whether a timer is not cancelled.05-06-2010
20080215858PROCESSOR AND PROGRAM EXECUTION METHOD CAPABLE OF EFFICIENT PROGRAM EXECUTION - A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs. The processor includes a plurality of register groups; a select/switch unit operable to select one of the plurality of register groups as an execution target register group on which a program execution is based, and to switch the selection target every time a first predetermined period elapses; a restoring unit operable to restore, every time the switching is performed, one of the register value groups into one of the register groups that is not selected as the execution target register group; a saving unit operable to save, prior to the restoring, register values in the register group targeted for restoring, by overwriting a register value group in the memory that corresponds to the register values; and a program execution unit operable to execute, every time the switching is performed, a program corresponding to a register value group in the execution target register group.09-04-2008
20090106537PROCESSOR SUPPORTING VECTOR MODE EXECUTION - An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.04-23-2009
20090089553MULTI-THREADED PROCESSING - A system includes a multi-threaded processor that executes an instruction of a process of an executing program. The multi-threaded processor includes at least a first and a second thread. First and second sets of source registers are respectively allocated to the first and second threads, and first and second sets of destination registers are respectively allocated to the first and second threads. A resource prefix configuration register includes mappings between each of the source and destination registers and the threads. The multi-threaded processor, during execution of the instruction by one of the first or the second threads of execution, accesses the source and destination registers based on the mapping, wherein at least one of the accessed registers is allocated to the other of the first or the second thread of execution.04-02-2009
20090063825SYSTEMS AND METHODS FOR COMPRESSING STATE MACHINE INSTRUCTIONS - Systems and methods for compressing state machine instructions are disclosed herein. In one embodiment, the method comprises associating input characters associated with states to respective indices, where each index comprises information indicative of a particular transition instruction.03-05-2009
20100125721System and Method for Determining and/or Reducing Costs Associated with Utilizing Objects - According to one embodiment of the present invention, a method includes receiving, in near real time, data associated with a utilization of one or more objects. The method further includes comparing the data associated with the utilization of the one or more objects with one or more rules associated with the utilization of the one or more objects. The method further includes determining, in near real time, a cost associated with the utilization of the one or more objects based at least on the comparison. The method further includes providing, in near real time, an indication of the cost associated with the utilization of the one or more objects.05-20-2010
20110173420PROCESSOR RESUME UNIT - A system for enhancing performance of a computer includes a computer system having a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processor. An external unit is external to the processor for monitoring specified computer resources. The external unit is configured to detect a specified condition using the processor. The processor includes one or more threads. The thread resumes an active state from a pause state using the external unit when the specified condition is detected by the external unit.07-14-2011
20090210676System and Method for the Scheduling of Load Instructions Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the at least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20110173419Look-Ahead Wake-and-Go Engine With Speculative Execution - A wake-and-go mechanism is provided for a microprocessor. The wake-and-go mechanism looks ahead in the instruction stream of a thread for programming idioms that indicate that the thread is waiting for an event. If a look-ahead polling operation succeeds, the look-ahead wake-and-go engine may record an instruction address for the corresponding idiom so that the wake-and-go mechanism may have the thread perform speculative execution at a time when the thread is waiting for an event. During execution, when the wake-and-go mechanism recognizes a programming idiom, the wake-and-go mechanism may store the thread state in the thread state storage. Instead of putting the thread to sleep, the wake-and-go mechanism may perform speculative execution.07-14-2011
20090287910SYSTEM AND METHOD FOR PROVIDING PREPROGRAMMED IMAGING IN A DATA COLLECTION ENVIRONMENT - According to a particular embodiment, an imaging system is provided that includes an imaging device operable to image a hard-drive of a target device. The imaging device includes a first connection to the target device and a second connection to an output capture device, whereby both connections facilitate an information flow of data. The imaging device includes a preprogramming element that includes a stored version of preprogramming instructions, which direct the imaging device on how to image the hard-drive autonomously.11-19-2009
20120297171METHODS FOR GENERATING CODE FOR AN ARCHITECTURE ENCODING AN EXTENDED REGISTER SPECIFICATION - There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.11-22-2012
20080209181Method and System for Automatic Generation of Processor Datapaths - Systems and methods for automatically generating a set of shared processor datapaths from the description of the behavior of one or more ISA operations are presented. The operations may include, for example, the standard operations of a processor necessary to support an application language such as C or C++ on the ISA. Such operations, for example, may represent a configurable processor ISA. The operations may also include one or more extension operations defined by one or more designers. Thus, a description of the behaviors of the various standard and/or extension operations that compose the ISA of an instance of a standard or configurable processor is used to automatically generate a set of shared processor datapaths that implement the behavior of those operations. In addition, certain aspects may take one or more operations as well as one or more input semantics and either re-implement the input semantics automatically, or combine the input semantics with each other or with one or more other operations to automatically generate a new set of shared processor datapaths.08-28-2008
20080209180Emulation prevention byte removers for video decoder - An emulation prevention byte remover may include one or more of a first buffer, a second buffer, a checker, and a shifter. The first buffer may store first stream data. The second buffer may store second stream data. The checker may determine whether one or more emulation prevention bytes are included in the first, second, or first and second stream data. If the checker determines that the one or more emulation prevention bytes are included in the first, second, or first and second stream data, the checker may output a check signal. In response to the check signal, the shifter may remove at least one of the one or more emulation prevention bytes from the first, second, or first and second stream data. The shifter may generate output stream data based on the first, second, or first and second stream data.08-28-2008
20080209179Low-Impact Performance Sampling Within a Massively Parallel Computer - An apparatus, program product and method sample at different times nodes that are performing similar work. Performance data associated with first and second node subsets performing the similar work are sampled at different times, e.g., in a round-robin fashion, and in accordance with a given sampling rate. The performance data is analyzed. Nodes whose performance suffers as a result of a sampling operation may be identified and removed from a subsequent operation.08-28-2008
20110271084Information processing system and information processing method - A disclosed information processing system includes a receiving node and a storing node, the receiving node includes an order information adding unit that adds first order information to operation instructions included in an operation instruction sequence, the first order information indicating an order among the operation instruction sequences and an operation instruction transmission unit that transmits the one or more operation instructions to the storing node, the storing node includes an operation instruction execution unit that executes the operation instructions. Further, upon a receipt of a second operation instruction having the first order information indicating that the second operation instruction is earlier than the one or more first operation instructions, which was already executed, in the first order relationship, the storing node re-executes the first operation instruction after the second operation instruction is executed.11-03-2011
20090282222Dynamic Virtual Software Pipelining On A Network On Chip - A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.11-12-2009
20090287909Dynamically Estimating Lifetime of a Semiconductor Device - In one embodiment, the present invention includes a method for obtaining dynamic operating parameter information of a semiconductor device such as a processor, determining dynamic usage of the device, either as a whole or for one or more portions thereof, based on the dynamic operating parameter information, and dynamically estimating a remaining lifetime of the device based on the dynamic usage. Depending on the estimated remaining lifetime, the device may be controlled in a desired manner. Other embodiments are described and claimed.11-19-2009
20090300333HARDWARE SUPPORT FOR WORK QUEUE MANAGEMENT - The claimed matter provides systems and/or methods that effectuate utilization of fine-grained concurrency in parallel processing and efficient management of established memory structures. The system can include devices that establish memory structures associated with individual processors that can comprise a parallel processing phalanx. The system can thereafter utilize various enqueuing and/or dequeuing directives to add or remove work descriptors to or from the memory structures individually associated with each of the individual processors thereby providing improved work flow synchronization amongst the processors that comprise the parallel processing complex.12-03-2009
20080215857Method For Latest Producer Tracking In An Out-Of-Order Processor, And Applications Thereof - Methods for latest producer tracking in a processor. In one embodiment, the method includes the steps of (1) writing a physical register identification value in a first register rename map location specified by a first instruction, (2) writing a first in-register status value in a second register rename map location specified by the first instruction, (3) writing a producer tracking status value at a producer tracking map location specified by the physical register identification value, and (4) modifying, upon graduation of the first instruction, the first in-register status value only if the producer tracking map location stores the producer tracking status value written in step (3). Other methods are also presented.09-04-2008
20100005278DEVICE AND METHOD FOR CONTROLLING AN INTERNAL STATE OF INFORMATION PROCESSING EQUIPMENT - The state control device for controlling an internal state of information processing equipment includes a scenario table, an information recorder, an information player and a state change controller. The information recorder acquires sync information and one item or a plurality of items of state information from the information processing equipment and records the acquired sync information and state information in association with each other in the scenario table. The information player, receiving sync information from the information processing equipment, acquires state information associated with sync information corresponding to the received sync information, among the sync information stored in the scenario table, from the scenario table, and supplies the acquired state information to the state change controller. The state change controller controls the inside of the information processing equipment based on the state information received from the information player. The sync information is information for identifying an execution state of the information processing equipment at a given time point, and the state information is information representing an internal state of the information processing equipment at a given time point synchronous with the sync information.01-07-2010
20100138636Method of sending an executable code to a reception device and method of executing this code - One embodiment of the present invention discloses a process for sending an executable code to a security module locally connected to a receiving device. This security module comprises a microcontroller and a memory, the memory including at least one executable area provided to contain instructions suitable to be executed by the microcontroller, and at least one non-executable area, wherein the microcontroller cannot execute the instructions, further comprising the steps of dividing the executable code into blocks; adding at least one block management code to the blocks in order to create an extended block; introducing the content of an extended block into a message to be processed in the receiving device, in such a way that the whole executable code is contained in a plurality of messages; sending a message to the receiving device, this message containing one of the extended blocks different from the first extended block; processing the message in order to extract its extended block; storing the executable code and the at least one management code of the block received in the executable area of the memory; executing at least one management code of the extended block, this management code includes the effect of transferring the content of the block to a non-executable area of the memory; repeating the previous steps until all the extended blocks are stored in the memory, except for the first block; sending a message containing the first extended block to the receiving device; processing the message in order to extract the extended block and storing the executable code of the block received in the executable area of the memory. One embodiment of the invention also concerns a process for the execution of this code.06-03-2010
20080244236METHOD AND SYSTEM FOR COMPOSING STREAM PROCESSING APPLICATIONS ACCORDING TO A SEMANTIC DESCRIPTION OF A PROCESSING GOAL - A method for assembling a stream processing application, includes: inputting a plurality of data source descriptions, wherein each of the data source descriptions includes a graph pattern that semantically describes an output of a data source; inputting a plurality of component descriptions, wherein each of the component descriptions includes a graph pattern that semantically describes an input of a component and a graph pattern that semantically describes an output of the component; inputting a stream processing request, wherein the stream processing request includes a goal that is represented by a graph pattern that semantically describes a desired stream processing outcome; assembling a stream processing graph, wherein the stream processing graph includes at least one data source or at least one component that satisfies the desired processing outcome; and outputting the stream processing graph.10-02-2008
20080244237Compute unit with an internal bit FIFO circuit - A compute unit with an internal bit FIFO circuit includes at least one data register, a lookup table, a configuration register including FIFO base address, length and read/write mode fields for configuring a portion of the lookup table as a bit FIFO circuit and a read/write pointer register responsive to an instruction having a lookup table identification field, length of bits field and register extract/deposit field for selectively transferring in a single cycle between the FIFO circuit and the data register a bit field of specified length.10-02-2008
20080244239Method and System for Autonomic Monitoring of Semaphore Operations in an Application - A method, an apparatus, and a computer program product in a data processing system are presented for using hardware assistance for gathering performance information that significantly reduces the overhead in gathering such information. Performance indicators are associated with instructions or memory locations, and processing of the performance indicators enables counting of events associated with execution of those instructions or events associated with accesses to those memory locations. The performance information that has been dynamically gathered from the assisting hardware is available to the software application during runtime in order to autonomically affect the behavior of the software application, particularly to enhance its performance. For example, the counted events may be used to autonomically collect statistical information about the ability of a software application to successfully acquire a semaphore.10-02-2008
20080276077Method To Reduce The Number Of Load Instructions Searched By Stores And Snoops In An Out-Of-Order Processor - A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to a snoop_safe register contents; ANDing all the load received data bits if the LSQN is greater in magnitude to the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger to the snoop_safe was found.11-06-2008
20090063826Quad aware Locking Primitive - A method and computer system for efficiently handling high contention locking in a multiprocessor computer system. At least some of the processors in the system are organized into a hierarchy, and process an interruptible lock in response to the hierarchy. The method utilizes two alternative methods of acquiring the lock, including a conditional lock acquisition primitive and an unconditional lock acquisition primitive, and an unconditional lock release primitive for releasing the lock from a particular processor. To prevent races between processors requesting a lock acquisition and a processor releasing the lock, a release flag is utilized. Furthermore, in order to ensure that a processor utilizing the unconditional lock acquisition primitive is granted the lock, a handoff flag is utilized.03-05-2009
20090037699APPARATUS AND METHOD FOR PROCESSING SEMICONDUCTOR - A semiconductor processing apparatus includes a tester for inspecting a semiconductor device, a display unit 02-05-2009
20080282069METHOD AND SYSTEM FOR DESIGNING A FLEXIBLE HARDWARE STATE MACHINE - Method and system for performing hardware tasks using a hardware state machine and a processor is provided. The method includes, setting a breakpoint for a state machine state; running the processor in a parallel mode with the state machine; passing control to the processor after a breakpoint condition is encountered; performing a task, wherein the processor performs the task which was meant to be performed by the state machine; and transferring control back to the state machine after the processor performs the task. The system includes an Application Specific Integrated Circuit (ASIC) with the state machine, and the processor.11-13-2008
20080282068HOST COMMAND EXECUTION ACCELERATION METHOD AND SYSTEM - The present invention sets forth an interface method and system for host acceleration between an electronic device and a host PC. The system comprises an acceleration unit for rapidly classifying a type of a host command and then issuing a flag signal to a microprocessor. The microprocessor then executes corresponding actions according to the flag signal and the host command, without parsing the host command, to accelerate data communication between the device and the host PC.11-13-2008
20080270764STATE MACHINE COMPRESSION - Compressing state transition instructions may achieve a reduction in the binary instruction footprint of a state machine. In certain embodiments, the compressed state transition instructions are used by state machine engines that use one or more caches in order to increase the speed at which the state machine engine can execute a state machine. In addition to reducing the instruction footprint, the use of compressed state transition instructions as discussed herein may also increase the cache hit rate of a cache-based state machine engine, resulting in an increase in performance.10-30-2008
20120297170DECENTRALIZED ALLOCATION OF RESOURCES AND INTERCONNECT STRUCTURES TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES - A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality of resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.11-22-2012
20080270765DISPLAY INFORMATION VERIFICATION PROGRAM, METHOD AND APPARATUS - A display information verification method, when display data of financial data is generated from the financial data and scripts for the financial data, includes: searching the scripts for an arithmetic instruction to process a numeric value in the financial data or a conversion instruction to convert a character string included in the financial data; and judging whether or not the arithmetic instruction or the conversion instruction detected in the searching is an instruction considered to manipulate the financial data. Thus, it is possible to detect the instruction considered to manipulate data from the scripts, and to avoid display including the manipulation of the data. In addition, for example, by using information of the instruction, which is stored inside in advance and is allowed to be used, it is possible to detect only the arithmetic instruction or the conversion instruction, which is not allowed to be used.10-30-2008
20080270763Device and Method for Processing Instructions - A method and a device for processing instructions. The device includes a pipelined processor, an instruction memory unit and a register file, whereas the pipelined processor includes a write-back unit and an execution unit. The device is characterized by including a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction; and to determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, a second register group size information and a second register identification information. Whereas the second register group size information and the second register identification information define a second group of target registers associated with a second instruction. Whereas the second instruction is provided to the pipelined processor before the first instruction.10-30-2008
20080270766Method To Reduce The Number Of Load Instructions Searched By Stores and Snoops In An Out-Of-Order Processor - A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to a snoop_safe register contents; ANDing all the load received data bits if the LSQN is greater in magnitude to the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger to the snoop_safe was found.10-30-2008
20100146246Method and Apparatus for Decompression of Block Compressed Data - System and method for decompressing data. A compressed data stream including contiguous variable length blocks is received, each block including multiple contiguous variable length data fields and a tag portion that includes multiple contiguous tag fields corresponding respectively to the data fields. Each tag field stores a tag value specifying a size of a respective field in the block. A current variable length block is stored. A single machine instruction of a processor is executed that analyzes the tag portion of the current block, and creates a control pattern, storing the control pattern in a first register of the processor. The control pattern is configured to unpack the variable length data fields of the current variable length block into corresponding uniform data fields. The contiguous variable length data fields of the current variable length block are decompressed using the control pattern, thereby decompressing the compressed data stream.06-10-2010
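As a rough illustration of the entry above, the tag portion can be turned into a shuffle-style control pattern that expands variable-length fields into uniform ones. The abstract does not give the tag encoding or the field widths, so the sketch assumes each tag is simply a field size in bytes, that fields are little-endian, and that the uniform width is 4 bytes; `control_pattern` and `unpack` are hypothetical names.

```python
def control_pattern(tags, width=4):
    """Build a byte-shuffle pattern from the tag fields (assumed encoding).

    tags:  list of field sizes in bytes, each between 1 and width.
    Returns one entry per output byte: a source index into the packed data,
    or -1 where the output byte should be zero-filled.
    """
    pattern, src = [], 0
    for size in tags:
        pattern += list(range(src, src + size))  # copy the field's bytes
        pattern += [-1] * (width - size)          # pad up to the uniform width
        src += size
    return pattern


def unpack(data, pattern):
    """Apply the control pattern to the packed data bytes."""
    return bytes(data[i] if i >= 0 else 0 for i in pattern)


pattern = control_pattern([1, 2])
uniform = unpack(b"\x01\x02\x03", pattern)
```

In hardware the pattern would be produced by one instruction and consumed by a vector permute; here both steps are ordinary functions so the data flow is easy to follow.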
20100146245PARALLEL EXECUTION OF A LOOP - A method of executing a loop over an integer index range of indices in a parallel manner includes assigning a plurality of index subsets of the integer index range to a corresponding plurality of threads, and defining for each index subset a start point of the index subset, an end point of the index subset, and a boundary point of the index subset positioned between the start point and the end point of the index subset. A portion of the index subset between the start point and the boundary point represents a private range and the portion of the index subset between the boundary point and the end point represents a public range. Loop code is executed by each thread based on the index subset of the integer index range assigned to the thread.06-10-2010
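The start, boundary, and end points in the entry above suggest a layout like the one sketched here. The abstract does not say how the two ranges are used, so this example assumes that the private range is consumed only by the owning thread without locking, while the public range may be claimed by any thread under a lock. `IndexSubset`, `parallel_for`, and the midpoint boundary are hypothetical choices.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class IndexSubset:
    start: int      # next private index; touched only by the owning thread
    boundary: int   # fixed split between private and public ranges
    end: int        # one past the last unclaimed public index
    lock: threading.Lock = field(default_factory=threading.Lock)

    def take_private(self):
        """Owner claims the next private index; no lock is needed."""
        if self.start < self.boundary:
            self.start += 1
            return self.start - 1
        return None

    def take_public(self):
        """Any thread claims an index from the back of the public range."""
        with self.lock:
            if self.boundary < self.end:
                self.end -= 1
                return self.end
            return None


def parallel_for(n, num_threads, body):
    chunk = -(-n // num_threads)  # ceiling division
    subsets = []
    for t in range(num_threads):
        s = min(t * chunk, n)
        e = min(s + chunk, n)
        subsets.append(IndexSubset(s, (s + e) // 2, e))

    def worker(t):
        while (i := subsets[t].take_private()) is not None:
            body(i)
        # Then drain public ranges, starting with the thread's own.
        for k in range(num_threads):
            victim = subsets[(t + k) % num_threads]
            while (i := victim.take_public()) is not None:
                body(i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

Calling `parallel_for(10, 3, body)` runs `body` once for every index from 0 to 9, in an order that depends on thread timing.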
20090182989MULTITHREADED MICROPROCESSOR WITH REGISTER ALLOCATION BASED ON NUMBER OF ACTIVE THREADS - A mechanism in a multithreaded processor to allocate resources based on configuration information indicating how many threads are in use.07-16-2009
20090089555Methods and apparatus for executing or converting real-time instructions - In one embodiment, a computer processor is configured to execute a plurality of instructions defined by an instruction set including at least one real-time instruction. Each of the at least one real-time instruction specifies an execution timing of a respective one of the at least one real-time instruction. Each execution timing is tied to a common real-time measurement system. Other embodiments are also described.04-02-2009
20090138685Processor for processing instruction set of plurality of instructions packed into single code - A conversion table converts a packed instruction (pre-conversion code) contained in the instruction code fetched from an instruction memory into a plurality of instruction codes (converted codes). An instruction decoder decodes the plurality of the instruction codes converted by a conversion table. A plurality of ALUs perform the operation in accordance with the decoding result of the instruction decoder. Therefore, the number of instructions that can be executed in parallel per cycle may be increased while at the same time the capacity of the instruction memory is reduced.05-28-2009
20090164761HIERARCHICAL SYSTEM AND METHOD FOR ANALYZING DATA STREAMS - A method for analyzing data streams comprises receiving a data stream, conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert. If the first alert is generated, a second analysis for the possible target activity is conducted to determine whether the target activity is indicated in the data stream with a high degree of certainty. If a possible target activity is indicated by the second analysis, a second alert is generated and provided to an external system for action.06-25-2009
20090138686METHOD FOR PROCESSING A GRAPH CONTAINING A SET OF NODES - The invention relates to a computerized method for processing a graph containing a set of nodes, wherein a forest of trees is provided corresponding to a directed acyclic graph containing a set of nodes, each of said nodes having a type chosen from a set of types; a depth for each node in said forest of trees is determined; in a breadth-first traversal manner, the depth and type of each node in said forest of trees is compared to a predefined matrix, said matrix defining for each depth and type combination one of the following actions to be carried out: no action, creating a new sub-tree, triggering exception handling.05-28-2009
20090019267Method, System, and Apparatus for Dynamic Reconfiguration of Resources - A dynamic reconfiguration to include on-line addition, deletion, and replacement of individual modules to support dynamic partitioning of a system, interconnect (link) reconfiguration, memory RAS to allow migration and mirroring without OS intervention, dynamic memory reinterleaving, CPU and socket migration, and support for global shared memory across partitions is described. To facilitate the on-line addition or deletion, the firmware is able to quiesce and de-quiesce the domain of interest so that many system resources, such as routing tables and address decoders, can be updated in what essentially appears to be an atomic operation to the software layer above the firmware.01-15-2009
20090187745INFORMATION PROCESSING SYSTEM AND METHOD OF EXECUTING FIRMWARE - An information processing system includes a control central processing unit; a memory; and a stream interface configured to receive an input stream including data to be processed and to transfer the input stream to the memory. A download process in which the stream interface receives stream data including firmware and stores the stream data in the memory is executed in response to an instruction from the control central processing unit, and the control central processing unit analyzes the stream data stored in the memory to extract the firmware in the memory space and executes the firmware extracted in the memory space to process the data to be processed.07-23-2009
20090259829THREAD-LOCAL MEMORY REFERENCE PROMOTION FOR TRANSLATING CUDA CODE FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.10-15-2009
20090198972Microprocessor systems - A microprocessor pipeline arrangement 1 includes a plurality of functional units 08-06-2009
20090198971Heterogeneous Processing Elements - A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model puts special purpose processing elements on the same playing field as processors, from a programming perspective, operating system perspective, power perspective, as the processors. The operating system can get work to a security engine, for example, in the same way it does to a processor.08-06-2009
20090083522Systems, Devices, and/or Methods for Managing Programmable Logic Controller Processing - Certain exemplary embodiments can provide a programmable logic controller, which can comprise a Reduced Instruction Set Computer (RISC) processor. The RISC processor can be adapted to, responsive to a received request to process a Boolean operation, execute a single processor data access instruction addressed to a region of a memory-mapped register corresponding to the Boolean operation.03-26-2009
20090083521Program illegiblizing device and method - A program obfuscating device for generating obfuscated program from which unauthorized analyzer cannot obtain confidential information easily. The program obfuscating device stores original program that contains authorized program instructions and confidential process instruction group containing confidential information that needs to be kept confidential, generates process instructions which, when executed in predetermined order, provide same result, with execution of last process instruction thereof, as the confidential process instruction group, inserts the process instructions into the original program at position between start of the original program and the confidential process instruction group so as to be executed in the predetermined order, in place of the confidential process instruction group, generates dummy block as dummy of the process instructions, and inserts the dummy block and control instruction, which causes the dummy block to be bypassed, into the original program, and inserts branch instruction into the dummy block.03-26-2009
20090083523PROCESSOR POWER MANAGEMENT ASSOCIATED WITH WORKLOADS - Some embodiments provide determination of a processor performance characteristic associated with a first workload, and determination of a processor performance state for the first workload based on the performance characteristic. Further aspects may include determination of a second processor performance characteristic associated with a second workload, determination of a second processor performance state for the second workload based on the performance characteristic, determination of a similarity between the first performance characteristics and the second performance characteristics, determination of a cluster comprising the first workload and the second workload, and association of a third processor performance state with the cluster, wherein the third processor performance state is identical to the first processor performance state and to the second processor performance state.03-26-2009
20080263333DOCUMENT PROCESSING METHOD - The present invention discloses a method for processing document data to achieve document interoperation, and the method comprises: by an application, performing an operation on abstract unstructured information by issuing instruction(s) to a platform software; and by the said platform software, receiving the said instruction and performing the operation on storage data corresponding to the abstract unstructured information according to the said instruction; wherein said abstract unstructured information is independent of a way in which said storage data are stored.10-23-2008
20080263332Data Processing Apparatus and Method for Accelerating Execution Subgraphs - A data processing apparatus and method are provided for processing data under control of a program having program instructions including sequences of individual program instructions corresponding to computational subgraphs identified within the program. Each computational subgraph has a number of input operands and produces one or more output operands. The apparatus comprises an operand store for storing the input and output operands, and processing logic for executing individual program instructions from the program. Also provided is configurable accelerator logic which, in response to reaching an execution point within the program corresponding to a sequence of individual program instructions corresponding to a computational subgraph, evaluates one or more output functions associated with the computational subgraph. The evaluation of each output function generates an output operand for storing in the operand store, and each output operand corresponds to an output that would have been generated had the sequence of individual program instructions corresponding to the computational subgraph have been executed by the processing logic. Configuration storage stores a single look-up table (LUT) configuration for each output function, and for each output function to be evaluated, the accelerator logic is configured dependent on the associated single LUT configuration from the configuration storage, such that on receipt of the input operands of the computational subgraph, the accelerator logic will generate the output operand. This technique has been found to provide a particularly efficient accelerator logic for evaluating output functions associated with computational subgraphs.10-23-2008
20110231635Register, Processor, and Method of Controlling a Processor - A processor and a processor control method which efficiently perform an operation on data using a register, are provided. The register may include a data type field and a data field. The processor may generate the data type bits and store the generated data type bits in the data type field.09-22-2011
20110225399PROCESSOR AND METHOD FOR SUPPORTING MULTIPLE INPUT MULTIPLE OUTPUT OPERATION - A processor for supporting a MIMO operation and method of processing a MIMO instruction are provided. The MIMO operation supporting processor may include a scheduler and at least one functional unit. The scheduler may map multiple inputs of the MIMO instruction to a plurality of sequential input cycles, respectively, and may map multiple outputs of the MIMO instruction to a plurality of sequential output cycles, respectively. The output cycles may be followed by the input cycles and a predetermined number of cycles for a MIMO operation. A functional unit may read a register during sequential input cycles, may perform a MIMO operation during a predetermined number of execution cycles, and may write the result of the MIMO operation into a register during sequential output cycles.09-15-2011
20090235057CACHE MEMORY CONTROL CIRCUIT AND PROCESSOR - A cache memory control circuit includes a selecting section configured to be capable of selecting, in a predetermined order, each way or a predetermined two or more ways of a cache memory having multiple ways; a comparing section configured to detect a cache hit in each way; and a control section configured to, upon detection of a cache hit, stop a selection of the respective ways or the predetermined two or more ways at the selecting section.09-17-2009
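The way-by-way selection in the entry above can be pictured with the small model below. It is a sketch under assumptions: it probes one way at a time (the abstract also allows groups of two or more ways), `ways` holds the tags stored in the indexed set, and the function name `lookup` is hypothetical.

```python
def lookup(ways, tag, order):
    """Probe cache ways in a predetermined order and stop at the first hit.

    ways:  list of tags currently stored in each way of the indexed set.
    tag:   tag of the requested address.
    order: predetermined sequence of way indices to select.
    Returns (hit_way or None, list of ways actually probed).
    """
    probed = []
    for way in order:
        probed.append(way)       # the selecting section enables this way
        if ways[way] == tag:     # the comparing section detects a hit
            return way, probed   # the control section stops further selection
    return None, probed


hit_way, probed = lookup(ways=[0x1A, 0x2B, 0x3C, 0x4D], tag=0x2B, order=[0, 1, 2, 3])
```

In this example only ways 0 and 1 are probed; ways 2 and 3 are never selected once the hit is found.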
20090210677System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine a stall penalty of all the instructions in the issue group, (3) schedule the instructions in an order of the longest stall penalty to shortest stall penalty, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
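Steps (2) and (3) of the entry above amount to ordering an issue group by descending stall penalty. A minimal sketch follows, assuming the penalties are already known as integers; the instruction names and penalty values are made up for illustration.

```python
def schedule_issue_group(issue_group):
    """Order instructions from longest to shortest stall penalty.

    issue_group: list of (instruction, stall_penalty) pairs.
    Python's sort is stable, so instructions with equal penalties
    keep their original relative order.
    """
    return sorted(issue_group, key=lambda entry: entry[1], reverse=True)


ordered = schedule_issue_group([("add", 0), ("load", 3), ("mul", 1)])
```

Here the instruction with the longest penalty ends up first in the schedule, so in a cascaded design it would be placed in the least delayed pipeline.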
20090259828EXECUTION OF RETARGETTED GRAPHICS PROCESSOR ACCELERATED CODE BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.10-15-2009
20090276609CONFIGURABLE PIPELINE BASED ON ERROR DETECTION MODE IN A DATA PROCESSING SYSTEM - A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.11-05-2009
20090138683Dynamic instruction execution using distributed transaction priority registers - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigned priority values in thread priority registers distributed throughout the computer system, instructions from different threads that are dispatched through the system are allocated system resources based on the priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc.05-28-2009
20090249038STREAM DATA PROCESSING APPARATUS - In a normal operation state, a connection management section writes data transmitted from a first processing section to a data temporary storage section and reads data to be received by a second processing section from the data temporary storage section. Upon receiving control signals which instruct a change of the subject of processing, the first processing section and the second processing section output a transmitting-end clear request and a receiving-end clear request, respectively. The connection management section reads data from the empty data storage section after a transmitting-end clear request is received and until a receiving-end clear request is received, and writes data to the empty data storage section after a receiving-end clear request is received and until a transmitting-end clear request is received.10-01-2009
20090249037Pipeline processors - A method and apparatus are provided for executing instructions from a plurality of instruction threads on a multi-threaded processor. The instruction threads may each include instructions of different complexity. A plurality of pipelines for executing instructions are provided and an instruction scheduler determines on each clock cycle the pipelines upon which instructions will be executed. Some of the pipelines are configured to appear to the instruction threads as single pipelines but in fact comprise two pipeline paths, one for executed instructions of lower complexity and the other. The instruction scheduler determines on which of the two pipeline paths an instruction should execute.10-01-2009
20120246451PROCESSING LONG-LATENCY INSTRUCTIONS IN A PIPELINED PROCESSOR - There is provided a method and processor for processing a thread. The thread comprises a plurality of sequential instructions, the plurality of sequential instructions comprising some short-latency instructions and some long-latency instructions and at least one hazard instruction, the hazard instruction requiring one or more preceding instructions to be processed before the hazard instruction is processed. The method comprises the steps of: a) before processing each long-latency instruction, incrementing by one, a counter associated with the thread; b) after each long-latency instruction has been processed, decrementing by one, the counter associated with the thread; c) before processing each hazard instruction, checking the value of the counter associated with the thread, and i) if the counter value is zero, processing the hazard instruction, or ii) if the counter value is non-zero, pausing processing of the hazard instruction until a later time. The processor includes means for performing steps a), b) and c) of the method.09-27-2012
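Steps a), b) and c) of the entry above map onto a small per-thread counter. The sketch below is an illustration only: the class name `ThreadState`, the string instruction kinds, and the boolean return value standing in for "pause until a later time" are assumptions.

```python
class ThreadState:
    """Per-thread counter of long-latency instructions still in flight."""

    def __init__(self):
        self.pending = 0


def try_issue(thread, kind):
    """Return True if the instruction may be processed now, False if it must wait.

    kind is one of "short", "long", or "hazard" (hypothetical labels).
    """
    if kind == "long":
        thread.pending += 1          # step a): increment before processing
        return True
    if kind == "hazard":
        return thread.pending == 0   # step c): proceed only when counter is zero
    return True                      # short-latency instructions are not tracked


def complete_long(thread):
    """Step b): decrement once a long-latency instruction has been processed."""
    thread.pending -= 1


t = ThreadState()
try_issue(t, "long")                 # counter becomes 1
blocked = not try_issue(t, "hazard") # hazard must wait while counter is non-zero
complete_long(t)                     # counter returns to 0
ready = try_issue(t, "hazard")       # hazard may now proceed
```

A single counter is enough because the hazard instruction only needs to know whether any long-latency instruction is outstanding, not which one.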
20100161946CONTROLLING JITTERING EFFECTS - Methods and related systems for controlling jitter effects are disclosed.06-24-2010
20090077352PERFORMANCE OF AN IN-ORDER PROCESSOR BY NO LONGER REQUIRING A UNIFORM COMPLETION POINT ACROSS DIFFERENT EXECUTION PIPELINES - A method, system and processor for improving the performance of an in-order processor. A processor may include an execution unit with an execution pipeline that includes a backup pipeline and a regular pipeline. The backup pipeline may store a copy of the instructions issued to the regular pipeline. The execution pipeline may include logic for allowing instructions to flow from the backup pipeline to the regular pipeline following the flushing of the instructions younger than the exception detected in the regular pipeline. By maintaining a backup copy of the instructions issued to the regular pipeline, instructions may not need to be flushed from separate execution pipelines and re-fetched. As a result, one may complete the results of the execution units to the architected state out of order thereby allowing the completion point to vary among the different execution pipelines.03-19-2009
20090077351Information processing device and compiler - Devices, compilers and methods to reduce energy consumption associated with execution of a program by adjusting a computational capability of a CPU with higher accuracy than before. A device sets an appropriate computational capability to the CPU. It includes: changing a computational capability of the CPU every time each of a plurality of program areas included in the execution program is executed while the execution program is being executed, and measuring execution time of each of the program areas; deciding an optimal computational capability required to execute the program area using the CPU, based on the execution time for each of the computational capabilities measured for the respective program areas; and performing setting of the optimal computational capability for executing the program area, which is to be used when executing the program area again in the course of executing the execution program, for each of the program areas.03-19-2009
20100153691LOWER POWER ASSEMBLER - A method for processing data using a time-stationary multiple-instruction word processing apparatus, arranged to execute a plurality of instructions in parallel, said method comprising the following steps: generating a set of multiple-instruction words (INS(i), INS(i+1), INS(i+2)), wherein each multiple-instruction word comprises a plurality of instruction fields, wherein each instruction field encodes control information for a corresponding resource of the processing apparatus, and wherein bit changes between an instruction field related to a no-operation instruction, and a corresponding instruction field of an adjacent multiple-instruction word are minimised; storing input data in a register file (RF06-17-2010
20090240924INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PRODUCT - An information processing device disclosed includes a plurality of executing units for executing various processes. The information processing device and method thereof acquire setting information that indicates an operating condition with respect to each executing unit from information an operation of a main process executed by the plurality of executing units, and sets an operating state of each of the executing units based on the acquired setting information.09-24-2009
20100235611COMPILER, COMPILE METHOD, AND PROCESSOR CORE CONTROL METHOD AND PROCESSOR - A compiler that compiles source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.09-16-2010
20100223446CONTEXTUAL TRACING - A method of tracking execution of activities in a computing environment in which events in an activity are recorded along with an activity identifier uniquely identifying the activity and tying the events to the activity. To track interactions between activities, a correlation identifier may be generated and transferred between the interacting activities as part of the interaction. For each of the activities participating in the interaction, information on an event relating to the interaction is recorded along with the correlation identifier. The correlation identifier thus allows uniquely identifying each interaction which may be used to synchronize streams of events within the activities at points of their interaction. Activities may interact across any boundary, including a network.09-02-2010
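The activity and correlation identifiers in the entry above can be illustrated with a small in-memory tracer. This is a sketch under assumptions: the `Tracer` class, its record layout, and the use of `uuid.uuid4()` to generate identifiers are choices made for the example, not details from the abstract.

```python
import uuid


def new_id():
    """Generate a unique identifier (assumed here to be a random UUID)."""
    return uuid.uuid4().hex


class Tracer:
    def __init__(self):
        self.events = []

    def record(self, activity_id, message, correlation_id=None):
        """Record an event tied to its activity, optionally tagged with an interaction."""
        self.events.append(
            {"activity": activity_id, "message": message, "correlation": correlation_id}
        )

    def interaction(self, correlation_id):
        """Return every event, in any activity, that belongs to one interaction."""
        return [e for e in self.events if e["correlation"] == correlation_id]


tracer = Tracer()
sender, receiver = new_id(), new_id()
tracer.record(sender, "start work")
cid = new_id()                        # travels with the message between activities
tracer.record(sender, "send request", correlation_id=cid)
tracer.record(receiver, "receive request", correlation_id=cid)
linked = tracer.interaction(cid)
```

The two events in `linked` come from different activities but share the correlation identifier, which gives a point at which the two event streams can be synchronized.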
20100211763Data broadcast processing device, method and program - The present invention relates to a data broadcast processing device, method, and program which enable secure control of an operation of a data broadcast processing device. Since a flag standalone is not set in a script NCL Script 08-19-2010
20100115243Apparatus, Method and Instruction for Initiation of Concurrent Instruction Streams in a Multithreading Microprocessor - A fork instruction for execution on a multithreaded microprocessor and occupying a single instruction issue slot is disclosed. The fork instruction, executing in a parent thread, includes a first operand specifying the initial instruction address of a new thread and a second operand. The microprocessor executes the fork instruction by allocating context for the new thread, copying the first operand to a program counter of the new thread context, copying the second operand to a register of the new thread context, and scheduling the new thread for execution. If no new thread context is free for allocation, the microprocessor raises an exception to the fork instruction. The fork instruction is efficient because it does not copy the parent thread general purpose registers to the new thread. The second operand is typically used as a pointer to a data structure in memory containing initial general purpose register set values for the new thread.05-06-2010
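The fork semantics in the entry above can be modeled in a few lines. The sketch is hypothetical throughout: the `Core` class, the dictionary-based thread context, the register count, and the choice of register 4 as the destination of the second operand are assumptions, since the abstract only says "a register of the new thread context".

```python
class NoFreeContextError(Exception):
    """Models the exception raised when no thread context is free."""


class Core:
    def __init__(self, num_contexts, num_regs=32):
        # Each thread context has its own program counter and registers.
        self.contexts = [
            {"pc": 0, "regs": [0] * num_regs, "active": False}
            for _ in range(num_contexts)
        ]
        self.run_queue = []

    def fork(self, start_address, argument, arg_reg=4):
        """Allocate a context, set its PC and one register, and schedule it."""
        for index, ctx in enumerate(self.contexts):
            if not ctx["active"]:
                ctx["active"] = True
                ctx["pc"] = start_address       # first operand -> program counter
                ctx["regs"][arg_reg] = argument  # second operand -> one register
                self.run_queue.append(index)     # schedule the new thread
                return index
        raise NoFreeContextError("no free thread context")


core = Core(num_contexts=2)
child = core.fork(start_address=0x100, argument=0x2000)
```

Note that the parent's general purpose registers are never copied; the new thread would load its initial register values through the pointer passed as the second operand.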
20100138637DATA PROCESSING METHOD - A data processing method for sampling data from data each varying over time, each of the data being associated with each of grid points arranged in an area, the method includes: dividing the area into blocks; calculating a variation rate of each of the data associated with each of the grid points included in each of the blocks; dividing the blocks into sub-blocks in accordance with the variation rate of the blocks; calculating a variation rate of each set of the data associated with each of the grid points included in each of the sub-blocks; and determining a frequency of sampling data associated with each of the grid points for the sub-blocks of the blocks and for the rest of the blocks in accordance with the variation rate of the sub-blocks and the rest of the blocks.06-03-2010
20100250903APPARATUSES AND SYSTEMS INCLUDING A SOFTWARE APPLICATION ADAPTATION LAYER AND METHODS OF OPERATING A DATA PROCESSING APPARATUS WITH A SOFTWARE ADAPTATION LAYER - A software application adaptation layer is comprised of a program file comprising a plurality of adaptation filters and a configuration file. The configuration file may designate one or more adaptation filters of the plurality of adaptation filters to be applied by the program file for modifying one or more behaviors of an active software application. A data processing apparatus including such a software application adaptation layer includes processing circuitry configured to execute instructions for the active software application, a communications module coupled to the processing circuitry and at least one storage medium for storing the program file and the configuration file. Operational methods for such an apparatus include storing the program file and the configuration file in the at least one storage medium, identifying the active software application, generating an adaptation filter set and attaching the adaptation filter set to an input queue associated with the active software application.09-30-2010
20120144168ODD AND EVEN START BIT VECTORS - A method and apparatus are presented for identifying instructions in a stream of information by preprocessing the stream of information, creating a vector of instructions and breaking the vector of instructions into two or more vectors for picking the identified instructions at a high frequency.06-07-2012
20100095096AV DEVICE AND ITS CONTROL METHOD - In an AV device control, from unit instructions (04-15-2010
20100058036Distributed Acceleration Devices Management for Streams Processing - A method for managing distributed computer data stream acceleration devices is provided that utilizes distributed acceleration devices on nodes within the computing system to process inquiries by programs executing on the computing system. The available nodes and available acceleration devices in the computing system are identified. In addition, a plurality of virtual acceleration device definitions is created. Each virtual acceleration device definition includes attributes used to configure at least one of the plurality of identified acceleration devices. When an inquiry containing an identification of computing system resources to be used in processing the inquiry is received, at least one virtual acceleration device definition that is capable of configuring an acceleration device in accordance with the computing system resources identified by the inquiry is identified. That acceleration device is configured in accordance with the identified virtual acceleration device definition and is used to process the inquiry.03-04-2010
20090319758Processor, performance profiling apparatus, performance profiling method, and computer product - A processor capable of executing an arbitrary application program on an operating system includes an event context register that stores therein an ID of an event to be measured in the arbitrary application program and a context register that records therein an ID of an event executed by the arbitrary application program upon the application program being executed on the operating system. The processor further includes a comparator that compares the ID of the event recorded in the context register and the ID of the event to be measured that is stored in the event context register and an event counter that counts the number of times the ID of the event recorded in the context register and the ID of the event to be measured are determined to coincide by the comparator.12-24-2009
20090113184Method, Apparatus, and Program for Pinning Internal Slack Nodes to Improve Instruction Scheduling - A scheduling algorithm is provided for selecting the placement of instructions with internal slack into a schedule of instructions within a loop. The algorithm achieves this by pinning nodes with internal slack to corresponding nodes on the critical path of the code that have similar properties in terms of the data dependency graph, such as earliest time and latest time. The effect is that nodes with internal slack are more often optimally placed in the schedule, reducing the need for rotating registers or register copy instructions. The benefit of the present invention can primarily be seen when performing instruction scheduling or software pipelining on loop code, but can also apply to other forms of instruction scheduling when greater control of placement of nodes with internal slack is desired.04-30-2009
20090138682Dynamic instruction execution based on transaction priority tagging - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigned priority values for each thread as a tag in the thread's instructions, tagged instructions from different threads that are dispatched through the system are allocated system resources based on the tagged priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc.05-28-2009
20090113183METHOD OF CONTROLLING A DEVICE AND A DEVICE CONTROLLED THEREBY - A method of controlling at least one device is disclosed. The method includes providing the device with at least one constraint for carrying out an operation. The device determines if the constraint can be met. If it is determined that the constraint can be met, the device determines on its own accord a manner to get into a state wherein the constraint will be met. The device then goes into the state in the determined manner. A device that is controlled by the method and a system including such a device are also disclosed.04-30-2009
20120144169INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER READABLE MEDIUM - An information processing apparatus includes the following elements. A generator generates, on the basis of instruction information which describes processing to be executed for obtaining output data from raw data, processing definition information that defines details of the processing, upon inputting the instruction information. A determination unit determines whether output data associated with the currently generated processing definition information and data to be used as raw data is stored in a first memory, the first memory storing therein output data which has been obtained in accordance with previously generated processing definition information in association with data used as the raw data and the processing definition information. An output unit outputs, if the determination unit determines that the output data is stored in the first memory, the output data stored in the first memory without causing a processor to execute the processing.06-07-2012
20110238954DATA PROCESSING APPARATUS - Source code to be processed is analyzed and configuration data in implementing in accordance with each of plural implementation systems is created and is stored in a local memory of a DRP incorporating system. When execution of target processing is started, the implementation system determination processing calculates estimated processing time when the configuration of each of the implementation systems is adopted and determines the optimum one of the implementation systems based on a combination of the estimated processing time and the circuit scale of the configuration.09-29-2011
20090106536Processor for executing group extract instructions requiring wide operands - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.04-23-2009
20110107067SINGLE-CHIP MULTIPROCESSOR WITH CLOCK CYCLE-PRECISE PROGRAM SCHEDULING OF PARALLEL EXECUTION - A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.05-05-2011
20110107065INTERCONNECT CONTROLLER FOR A DATA PROCESSING DEVICE AND METHOD THEREFOR - A data processing device includes an interconnect controller operable to manage the communication of information between modules of the data processing device via an interconnect. In response to a transaction request the interconnect controller selects a tag value from a set of available tag values, assigns the tag to the transaction and reserves the tag value so that it is unavailable for assignment to other transactions. If an expected response to the transaction request is not received within a designated amount of time, the transaction enters a timed-out state and the interconnect controller locks the tag value, so that it remains unavailable for assignment to other transactions until an unlock event, such as a request from software.05-05-2011
20090037700Method and system for reactively assigning computational threads of control between processors - A method and system for reactively assigning computational threads of control between processors provide a coordination model implemented by a software framework. The coordination model comprises five (5) entities which implement the three elements of a coordination model: 1) Behavior, 2) Data, 3) Container, 4) Source and 5) Processor. The invention decomposes an application into a cooperative collection of distributed and networked Behaviors, which are subsequently executed by Containers. A designer using this invention implements a Behavior for each logical stage of execution, which represents the core service-processing logic for that stage.02-05-2009
20090070563CONCURRENT PHYSICAL PROCESSOR REASSIGNMENT - Reassignment of a physical processor backing a logical processor is performed concurrently to the operation of the processor. The operating state of one physical processor is loaded on another physical processor, such that the logical processor is backed by a different physical processor. This reassignment is performed concurrent to processor operation and transparent to the operating system.03-12-2009
20130013899Using Hardware Transaction Primitives for Implementing Non-Transactional Escape Actions Inside Transactions - Mechanisms are provided for performing escape actions within transactions. These mechanisms execute a transaction comprising a transactional section and an escape action. The transactional section is comprised of one or more instructions that are to be executed in an atomic manner as part of the transaction. The escape action is comprised of one or more instructions to be executed in a non-transactional manner. These mechanisms further populate at least one actions list data structure, associated with a thread of the data processing system that is executing the transaction, with one or more actions associated with the escape action. Moreover, these mechanisms execute one or more actions in the actions list data structure based upon whether the transaction commits successfully or is aborted.01-10-2013
20130013900MULTI-THREAD PROCESSOR AND ITS HARDWARE THREAD SCHEDULING METHOD - A multi-thread processor includes a plurality of hardware threads each of which generates an independent instruction flow, a first thread scheduler that outputs a first thread selection signal, the first thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads according to a priority rank, the priority rank being established in advance for each of the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the first thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector. Whenever the hardware thread is executed in the execution pipeline, the first scheduler updates the priority rank for the executed hardware thread and outputs the first thread selection signal in accordance with the updated priority rank.01-10-2013
20130138927DATA PROCESSING APPARATUS ADDRESS RANGE DEPENDENT PARALLELIZATION OF INSTRUCTIONS - A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit processes a plurality of instructions from the instruction word in parallel. A detection unit detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system, to control a way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on a detected range. In an embodiment the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word that determines a number of instructions from the instruction word that is processed in parallel, dependent on the detected range.05-30-2013
20090240921METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR SUPPORTING PARTIAL RECYCLE IN A PIPELINED MICROPROCESSOR - A computer processing system is provided. The computer processing system includes a first datastore that stores a subset of information associated with an instruction. A first stage of a processor pipeline writes the subset of information to the first datastore based on an execution of an operation associated with the instruction. A second stage of the pipeline initiates reprocessing of the operation associated with the instruction based on the subset of information stored in the first datastore.09-24-2009
20100241833INFORMATION PROCESSING APPARATUS - Disclosed is an information processing apparatus in which various kinds of information are processed in either the real time processing mode or the non-real time processing mode. The apparatus includes an operation display section to accept an inputted instruction, an image processing section to apply a processing to image information and a processor provided with a plurality of same cores. The real-time processing unnecessary process that is related to the operation display section, is fixed onto one of the plurality of same cores so that the one of the plurality of same cores is in charge of controlling the real-time processing unnecessary process, while, the real-time processing necessary process that is related to the image processing section, is fixed onto another one of the plurality of same cores so that the other one of the plurality of same cores is in charge of controlling the real-time processing necessary process.09-23-2010
20110010529INSTRUCTION EXECUTION CONTROL METHOD, INSTRUCTION FORMAT, AND PROCESSOR - With conventional ordered data reference instructions, an instruction which is to be the subject of an execution order guarantee cannot be separately specified, and a resource which is to be the subject of an execution order guarantee likewise cannot be specified and thus instruction movement is restricted more than necessary in the out-of-order execution of instructions and so on and performance deterioration becomes significant particularly in the case of performing data transfer to a resource having high access latency. Consequently, the field of an ordered data reference instruction judged to include a predetermined field is decoded so as to identify a subject instruction which is specified by the ordered data reference instruction and is the subject of execution order guarantee, and guarantee the execution order of the subject instruction with respect to the execution of the identified ordered data reference instruction.01-13-2011
20100199075MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.08-05-2010
20090249036EFFICIENT METHOD AND APPARATUS FOR EMPLOYING A MICRO-OP CACHE IN A PROCESSOR - Methods and apparatus for using micro-op caches in processors are disclosed. A tag match for an instruction pointer retrieves a set of micro-op cache line access tuples having matching tags. The set is stored in a match queue. Line access tuples from the match queue are used to access cache lines in a micro-op cache data array to supply a micro-op queue. On a micro-op cache miss, a macroinstruction translation engine (MITE) decodes macroinstructions to supply the micro-op queue. Instruction pointers are stored in a miss queue for fetching macroinstructions from the MITE. The MITE may be disabled to conserve power when the miss queue is empty; likewise for the micro-op cache data array when the match queue is empty. Synchronization flags in the last micro-op from the micro-op cache on a subsequent micro-op cache miss indicate where micro-ops from the MITE merge with micro-ops from the micro-op cache.10-01-2009
20110246751INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.10-06-2011
20100332807PERFORMING ESCAPE ACTIONS IN TRANSACTIONS - Performing non-transactional escape actions within a hardware based transactional memory system. A method includes at a hardware thread on a processor beginning a hardware based transaction for the thread. Without committing or aborting the transaction, the method further includes suspending the hardware based transaction and performing one or more operations for the thread, non-transactionally and not affected by: transaction monitoring and buffering for the transaction, an abort for the transaction, or a commit for the transaction. After performing one or more operations for the thread, non-transactionally, the method further includes resuming the transaction and performing additional operations transactionally. After performing the additional operations, the method further includes either committing or aborting the transaction.12-30-2010
20110087863DATA PROCESSING APPARATUS HAVING A PARALLEL PROCESSING CIRCUIT INCLUDING A PLURALITY OF PROCESSING MODULES, AND METHOD FOR CONTROLLING THE SAME - In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality of pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.04-14-2011
20090313457System and Method for Extracting Fields from Packets Having Fields Spread Over More Than One Register - Systems and methods that allow for extracting a field from data stored in a pair of registers using two instructions. A first instruction extracts any part of the field from a first register designated as a first source register, and executes a second instruction extracting any part of the field from a second general register designated as a second source register. The second instruction inserts any extracted field parts in a result register.12-17-2009
20100131743LAZY AND STATELESS EVENTS - Event-based processing is employed in conjunction with lazy and stateless events. Addition of any handlers is deferred until a user-specified handler is identified. Furthermore, event handlers can be composed at this time including the same properties as underlying events. More specifically, handlers specified on composite events can be composed and propagated up to one or more related source events. As a result, handlers are not accumulated on composite events thereby making them stateless while allowing equivalent functionality upon invocation of the composed top-level handler.05-27-2010
20100131742OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SELECTIVELY INITIATES INSTRUCTION RETIREMENT EARLY - A microprocessor for improving out-of-order superscalar execution unit utilization with a relatively small in-order instruction retirement buffer. A plurality of execution units each calculate an instruction result. The instruction is either an excepting type instruction or a non-excepting type instruction. The excepting type instruction is capable of causing the microprocessor to take an exception after being issued to the execution unit, wherein the non-excepting type instruction is incapable of causing the microprocessor to take an exception after being issued. A retire unit makes a determination that an instruction is the oldest instruction in the microprocessor and that the instruction is ready to update the architectural state of the microprocessor with its result. The retire unit makes the determination before the execution unit outputs the result of the non-excepting type instruction, wherein the retire unit makes the determination after the execution unit outputs the result of the excepting type instruction.05-27-2010
20110087864PROVIDING PIPELINE STATE THROUGH CONSTANT BUFFERS - One embodiment of the present invention sets forth a technique for providing state information to one or more shader engines within a processing pipeline. State information received from an application accessing the processing pipeline is stored in constant buffer memory accessible to each of the shader engines. The shader engines can then retrieve the state information during execution.04-14-2011
20090031116THREE OPERAND INSTRUCTION EXTENSION FOR X86 ARCHITECTURE - A method and apparatus are contemplated for increasing the number of available instructions in an instruction set architecture. The new instructions extend the number of general-purpose registers and include three or more operands. A combination of an escape code field, an opcode field, an operation configuration field and an operation size field determines a unique new instruction operation. A source operand extension field includes bits to be combined with other fields in order to extend the number of source operand values for general-purpose registers.01-29-2009
20090198970METHOD AND STRUCTURE FOR ASYNCHRONOUS SKIP-AHEAD IN SYNCHRONOUS PIPELINES - An electronic apparatus includes a plurality of stages serially interconnected as a pipeline to perform sequential processings on input operands. A shortening circuit associated with at least one stage of the pipeline recognizes when one or more input operands for the stage have been predetermined as appropriate for shortening and executes the shortening when appropriate.08-06-2009
20100037038Dynamic Core Pool Management - Embodiments that dynamically manage core pools are disclosed. Various embodiments involve measuring the amount of a computational load on a computing device. One way of measuring the load may consist of executing a number of instructions, in a unit of time, with numerous cores of the computing device. These embodiments may compare the number of instructions executed with specific thresholds. Depending on whether the number of instructions is higher or lower than the thresholds, the computing devices may respond by activating and deactivating cores of the computing devices. By limiting execution of instructions of the computing device to a smaller number of cores and switching one or more cores to a lower power state, the devices may conserve power.02-11-2010
20090217007DISCOVERY OF A VIRTUAL TOPOLOGY IN A MULTI-TASKING MULTI-PROCESSOR ENVIRONMENT - A computer program product, apparatus and method for identifying processors in a multi-tasking multiprocessor network, the computer program product including a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method including storing a service record for a port to which an LID has been assigned, retrieving service records for nodes to which channel paths may connect, retrieving path records that provide address destinations for the nodes identified in the service records, initiating channel initialization for the channel paths defined for the port and removing the service record for the port.08-27-2009
20110252222EVENT COUNTER IN A SYSTEM ADAPTED TO THE JAVACARD LANGUAGE - The implementation of a counter in a microcontroller adapted to the JavaCard language while respecting the atomicity of a modification of the value of this counter, wherein the counter is reset by the sending to the microcontroller of an instruction to verify a user code by submitting a correct code, and the value of the counter is decremented by the sending to the microcontroller of the instruction to verify the user code with an erroneous code value.10-13-2011
20110072246NODE CONTROL DEVICE INTERPOSED BETWEEN PROCESSOR NODE AND IO NODE IN INFORMATION PROCESSING SYSTEM - A node control device is interposed between processor nodes and IO nodes in an information processing system, wherein each IO node subordinates at least one IO device. The node control device includes a register storing a base address of a mapping destination of an IO space, a table describing a plurality of entries retaining a plurality of IO space numbers and address ranges, and an IO space access detection circuit. The table stores an identification flag as to whether or not IO spaces are each mapped onto a memory space. The IO space access detection circuit decodes a command code and an address of an FRTT signal output from a processor node, thus detecting a target IO space and detecting whether the processor node is accessing an IO space mapped onto the memory space or another IO space.03-24-2011
20090019266INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING SYSTEM - With respect to memory access instructions contained in an internal representation program, an information processing apparatus generates a load cache instruction, a cache hit judgment instruction, and a cache miss instruction that is executed in correspondence with a result of a judgment process performed according to the cache hit judgment instruction. In a case where the internal representation program contains a plurality of memory access instructions having a possibility of causing accesses to mutually the same cache line in a cache memory, the information processing apparatus generates a combine instruction instructing that judgment results of the judgment processes that are performed according to the cache hit judgment instruction should be combined into one judgment result. The information processing apparatus outputs an output program that contains these instructions that have been generated.01-15-2009
20110078419SET PROGRAM PARAMETER INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.03-31-2011
20110078418Support for Non-Local Returns in Parallel Thread SIMD Engine - One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.03-31-2011
20110072245HARDWARE FOR PARALLEL COMMAND LIST GENERATION - A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.03-24-2011
20110060891PARALLEL PIPELINED VECTOR REDUCTION IN A DATA PROCESSING SYSTEM - A parallel processing data processing system builds at least one data structure indicating a communication schedule for a plurality of processes each having a respective one of a plurality of equal length vectors formed of multiple equal size chunks. The data processing system, based upon the at least one data structure, communicates chunks of the plurality of vectors among the plurality of processes and performs partial reduction operations on chunks in accordance with the communication schedule. The data processing system then stores a result vector representing reduction of the plurality of vectors.03-10-2011
20110016293DEVICE AND METHOD FOR THE DISTRIBUTED EXECUTION OF DIGITAL DATA PROCESSING OPERATIONS - This device (01-20-2011
20120151189DATA PROCESSING WITH VARIABLE OPERAND SIZE - A method of processing data comprising performing a sequence of operation instructions with variable operand size, wherein respective size codes for different source and destination operands are obtained and registered separately from performing the sequence of operation instructions, and the sequence of operation instructions is performed using operand sizes defined by the registered size codes, the operation instructions of the sequence not themselves containing size codes.06-14-2012
20100088492SYSTEMS AND METHODS FOR IMPLEMENTING BEST-EFFORT PARALLEL COMPUTING FRAMEWORKS - Implementations of the present principles include best-effort computing systems and methods. In accordance with various exemplary aspects of the present principles, application computation requests directed to a processing platform may be intercepted and classified as either guaranteed computations or best-effort computations. Best-effort computations may be dropped to improve processing performance while minimally affecting the end result of application computations. In addition, interdependencies between best-effort computations may be relaxed to improve parallelism and processing speed while maintaining accuracy of computation results.04-08-2010
20090327663Power Aware Retirement - In one embodiment, the present invention includes a retirement unit to receive and retire executed instructions. The retirement unit may include a first array to receive information at allocation and a second array to receive information after execution. The retirement unit may further include logic to calculate an event associated with an executed instruction if information associated with the executed instruction is stored in an on-demand portion of at least one of the arrays. Other embodiments are described and claimed.12-31-2009
20090240922METHOD, SYSTEM, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR IMPLEMENTING RESULT FORWARDING BETWEEN DIFFERENTLY SIZED OPERANDS IN A SUPERSCALAR PROCESSOR - Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.09-24-2009
20090240923Computing Device with Entry Authentication into Trusted Execution Environment and Method Therefor - A computing device (09-24-2009
20110153992METHODS AND APPARATUS TO MANAGE OBJECT LOCKS - Example methods and apparatus to manage object locks are disclosed. A disclosed example method includes receiving an object lock request from a processor, the lock request associated with object lock code to lock an object, and generating object lock-bypass code based on a type of the processor, the object lock-bypass code to execute in a managed runtime in response to receiving the object lock request. The example method also includes identifying a type of instruction set architecture (ISA) associated with the processor, invoking a checkpoint instruction for the processor based on the identified ISA, suspending the object lock code from executing and executing target code when the object is uncontended, and allowing the object lock code to execute when the object is contended.06-23-2011
20080320283Programmable Data Processor for a Variable Length Encoder/Decoder - A data processing circuit has a programmable processor (1212-25-2008
20080229079APPARATUS, SYSTEM, AND METHOD FOR MANAGING COMMANDS OF SOLID-STATE STORAGE USING BANK INTERLEAVE - An apparatus, system, and method are disclosed for efficiently managing commands in a solid-state storage device that includes a solid-state storage arranged in two or more banks. Each bank is separately accessible and includes two or more solid-state storage elements accessed in parallel by a storage input/output bus. The solid-state storage includes solid-state, non-volatile memory. The solid-state storage device includes a bank interleave that directs one or more commands to two or more queues, where the one or more commands are separated by command type into the queues. Each bank includes a set of queues in the bank interleave controller. Each set of queues includes a queue for each command type. The bank interleave controller coordinates among the banks execution of the commands stored in the queues, where a command of a first type executes on one bank while a command of a second type executes on a second bank.09-18-2008
20110258416STATICALLY SPECULATIVE COMPILATION AND EXECUTION - A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.10-20-2011
20110258417POWER AND THROUGHPUT OPTIMIZATION OF AN UNBALANCED INSTRUCTION PIPELINE - A method includes determining a rate of resource occupancy of a constituent stage of an unbalanced instruction pipeline implemented in a processor through profiling an instruction code. The method also includes performing data processing at a maximum throughput at an optimum clock frequency based on the rate of resource occupancy.10-20-2011
20110161637APPARATUS AND METHOD FOR PARALLEL PROCESSING - An apparatus and method for parallel processing in consideration of degree of parallelism are provided. One of a task parallelism and a data parallelism is dynamically selected while a job is processed. In response to a task parallelism being selected, a sequential version code is allocated to a core or a processor for processing a job. In response to a data parallelism being selected, a parallel version code is allocated to a core or a processor for processing a job.06-30-2011
20110161636METHOD OF MANAGING POWER OF MULTI-CORE PROCESSOR, RECORDING MEDIUM STORING PROGRAM FOR PERFORMING THE SAME, AND MULTI-CORE PROCESSOR SYSTEM - Provided are a method of managing power of a multi-core processor, a recording medium storing a program for performing the method, and a multi-core processor system. The method of managing power of a multi-core processor having at least one core includes determining a parallel-processing section on the basis of information included in a parallel-processing program, collecting information for determining a clock frequency of the core in the determined parallel-processing section according to each core, and then determining the clock frequency of the core on the basis of the collected information. Accordingly, it is possible to minimize power consumption while ensuring quality of service (QoS).06-30-2011
20110161635Rotate instructions that complete execution without reading carry flag - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.06-30-2011
20110047357Methods and Apparatus to Predict Non-Execution of Conditional Non-branching Instructions - Efficient techniques are described for not executing an issued conditional non-branch instruction. A conditional non-branch instruction is identified as being eligible for a prediction, the prediction indicating that the eligible conditional non-branch (ECNB) instruction would not execute. The ECNB instruction executes as a no operation (NOP) instruction in response to the prediction that the ECNB instruction would not execute. A source operand required for the ECNB instruction to execute is not fetched in response to the prediction to not execute.02-24-2011
20080256341Data Processing Pipeline Selection - Strategies for automatically selecting the most appropriate processing pipeline (or runtime) for a particular data item are described. In one embodiment, a media playing application automatically selects the most appropriate media processing pipeline for a media data item from multiple available processing pipelines, or candidates. In this regard, the application makes this selection by utilizing heuristic techniques to identify which available pipeline provides the most enhanced playback experience to a user with respect to certain attributes such as supported playback features and security. These heuristic techniques can take one or more criteria into account and can be implemented in any suitable way. By way of example and not limitation, in one embodiment, a selection process is used wherein potential pipeline candidates are ordered and sequentially evaluated.10-16-2008
20080256342SCALABLE AND CONFIGURABLE EXECUTION PIPELINE - Optimizing pipeline handler execution. A method may be practiced in a computing environment including an execution pipeline. The method includes acts to optimize execution of handlers in the pipeline. The method includes receiving a payload object. Policy information about the payload object is referenced. The policy information includes at least one property value. Based on the policy information about the payload object, handlers are selected from among the pipeline to execute on the payload object. The policy information may be referenced by strategies. Handlers may be registered with the strategies to facilitate the strategies being used to select handlers.10-16-2008
20080250231PROGRAM CODE CONVERSION APPARATUS, PROGRAM CODE CONVERSION METHOD AND RECORDING MEDIUM - A program conversion apparatus includes: a code analyzing section configured to analyze an A binary code executable in an A processor in order to convert the A binary code into a program code for a B processor; an instruction function extracting section configured to extract a predetermined instruction function for the B processor which corresponds to a predetermined instruction for the A processor obtained by the analysis performed by the code analyzing section; and a translator section configured to generate a source code for the B processor from the A binary code, by rewriting the predetermined instruction for the A processor to the predetermined instruction function extracted by the instruction function extracting section.10-09-2008
20080215856METHODS FOR GENERATING CODE FOR AN ARCHITECTURE ENCODING AN EXTENDED REGISTER SPECIFICATION - There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.09-04-2008
20080201558PROCESSOR SYSTEM - A processor system according to an aspect of the present invention has a pipeline. The pipeline includes a cache memory, an instruction fetch buffer which stores commands, an execution module which requests data access to the cache memory, a tag memory which outputs information related to the data access of the execution module, and an arbitration circuit which arbitrates access to the cache memory based on entry information of the instruction fetch buffer and the information related to the data access from the tag memory.08-21-2008
20080201557Security Message Authentication Instruction - A method, system and computer program product for computing a message authentication code for data in storage of a computing environment. An instruction specifies a unit of storage for which an authentication code is to be computed. A computing operation computes an authentication code for the unit of storage. A register is used for providing a cryptographic key for use in computing the authentication code. Further, the register may be used in a chaining operation.08-21-2008
20110179257IMAGE FORMING DEVICE, IMAGE FORMING METHOD AND COMPUTER READABLE MEDIUM - A data processing device including a reception unit, an instruction unit and a storage unit. The reception unit receives instructions for processing at a processing execution device. The instruction unit instructs the processing execution device to cancel a power saving state of the processing execution device and execute the processing corresponding to an instruction received by the reception unit. The storage unit stores data relating to received instructions. If the processing corresponding to the received instruction is a pre-specified process, data relating to the instruction is stored by the storage unit. If the processing corresponding to the received instruction is not a pre-specified process, the instruction unit instructs the processing execution device to execute both the processing corresponding to this instruction and processing based on data relating to instructions stored in the storage unit.07-21-2011
20110119470GENERATION-BASED MEMORY SYNCHRONIZATION IN A MULTIPROCESSOR SYSTEM WITH WEAKLY CONSISTENT MEMORY ACCESSES - In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.05-19-2011
20100262810CONCURRENT INSTRUCTION OPERATION METHOD AND DEVICE - A concurrent instruction operation method and device are provided. The method includes: establishing a concurrent queue, and setting a queue base address and a queue maximum length of the concurrent queue; generating concurrent operation instructions according to a length of data that needs to be written or read as well as the queue base address and queue maximum length of the concurrent queue; and executing the concurrent operation instructions in the concurrent queue, and completing a data operation to the concurrent queue.10-14-2010
20110093685DATA PROCESSING CIRCUIT - A data processing circuit is disclosed in the present invention. The data processing circuit includes a decoder and a number of N-stage circuits. The circuits receive input data from at least a memory and separate the input data into N stages. The circuits process and store the N input data simultaneously to decrease the time of data processing in the data processing circuit.04-21-2011
20120151188TYPE AND LENGTH ABSTRACTION FOR DATA TYPES - Embodiments are directed to implementing a generic SIMD data type in software code. In an embodiment, a computer system accesses a portion of software code that includes an algorithm with a generic SIMD data type that includes a variable number of elements. The algorithm with the generic SIMD data type is to be processed by a specific processor that includes various specific hardware features. The computer system determines at runtime a portion of customized processor-specific code that is to be used with the specified processor based on the generic SIMD data type, wherein the runtime determination resolves the number of elements that are to be used with the specified processor. The computer system also processes the software code including the algorithm with the generic SIMD data type using the determined, customized processor-specific code.06-14-2012
20120151190DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM - A data processing apparatus includes an output unit. The output unit determines, when parallel control is performed in a data processor created in the data processing apparatus so that plural processing modules forming the data processor perform data processing in parallel, on the basis of a value representing a parallel-processing time for which at least two processing modules are operated in parallel and a value representing a control time, which is not necessary when serial control is performed so that the processing modules serially perform data processing but which is necessary when the parallel control is performed so that the processing modules perform data processing in parallel, whether a time necessary to complete data processing performed by the data processor under the parallel control would be shorter than a time necessary to complete data processing performed by the data processor under the serial control, and outputs a determination result.06-14-2012
20110138154Optimization of a Computing Environment in which Data Management Operations are Performed - Described are embodiments of an invention for optimizing a computing environment that performs data management operations such as encryption, deduplication and compression. The computing environment includes data components and a management system. The data components operate on data during the lifecycle of the data. The management system identifies all the data components in a data path, how the data components are interconnected, the data management operations performed at each data component, and how many data management operations of each type are performed at each data component. Further, the management system builds a data structure to represent the flow of data through the data path and analyzes the data structure in view of policy. After the analysis, the management system provides recommendations to optimize the computing environment through the reconfiguration of the data management operation configuration and reconfigures the data management operation configuration to optimize the computing environment.06-09-2011
20090300334Method and Apparatus for Loading Data and Instructions Into a Computer - A computer array (12-03-2009
20090292903MICROPROCESSOR PROVIDING ISOLATED TIMERS AND COUNTERS FOR EXECUTION OF SECURE CODE - An apparatus providing for a secure execution environment is presented. The apparatus includes a microprocessor and a secure non-volatile memory. The microprocessor is configured to execute non-secure application programs and a secure application program, where the non-secure application programs are accessed from a system memory via a system bus. The microprocessor has a plurality of timers which are visible and accessible only by the secure application program when executing in a secure execution mode. The secure non-volatile memory is coupled to the microprocessor via a private bus and is configured to store the secure application program. Transactions over the private bus between the microprocessor and the secure non-volatile memory are isolated from the system bus and corresponding system bus resources within the microprocessor.11-26-2009
20100031008PARALLEL SORTING APPARATUS, METHOD, AND PROGRAM - A parallel sorting apparatus is provided whose sorting processing is speeded up. A reference value calculation section calculates a plurality of reference values serving as boundaries of intervals used for allocating input data depending on the magnitude of a value. An input data aggregation section partitions the input data into a plurality of input data regions, and calculates, by parallel processing, mapping information used for allocating data in each of the partitioned input data regions to the plurality of intervals that have boundaries on the reference values calculated by the reference value calculation section. A data allocation section allocates, by parallel processing, data in each of the input data regions to the plurality of intervals in accordance with the mapping information calculated by the input data aggregation section. An interval sorting section individually sorts, by parallel processing, data in the plurality of intervals allocated by the data allocation section.02-04-2010
20100031007METHOD TO ACCELERATE NULL-TERMINATED STRING OPERATIONS - A method reads and compares first and second register values, each with a size of at least two bytes. A third register indicates a match if: (1) a byte in the first register value is equal to (or, alternatively, not equal to) a corresponding byte in the second register value, or (2) if a byte in the first register value is zero. Next, a fourth register value is set to one of the following: (1) a count of the matching byte, if the corresponding bytes in the first and second register values are equal (or, alternatively, are not equal), or (2) a number outside of a range between 0 and n−1, if the corresponding bytes in the first and second register values are not equal (or, alternatively, are equal). The value, n, is an integer equal to the number of bytes in the first and second register values.02-04-2010
20090172362PROCESSING PIPELINE HAVING STAGE-SPECIFIC THREAD SELECTION AND METHOD THEREOF - One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.07-02-2009
20090172361COMPLETION CONTINUE ON THREAD SWITCH MECHANISM FOR A MICROPROCESSOR - A thread switch mechanism and technique for a microprocessor is disclosed wherein a processing of a first thread is completed, and a continuation of a second thread is initiated during completion of the first thread. In one form, the technique includes processing a first thread at a pipeline of a processing device, and initiating processing of a second thread at a front end of the pipeline in response to an occurrence of a context switch event. The technique can also include initiating an instruction progress metric in response to the context switch event. The technique can further include enabling completion of processing of instructions of the first thread that are at a back end of the pipeline at the occurrence of the context switch event until an expiry of the instruction progress metric.07-02-2009
20120047353System and Method Providing Run-Time Parallelization of Computer Software Accommodating Data Dependencies - A system and method of parallelizing programs employs runtime instructions to identify data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the access data. Data dependence between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies and to postpone program portions that may be dependent while permitting parallel execution of other program portions.02-23-2012
20090138684H.264 CAVLC DECODING METHOD BASED ON APPLICATION-SPECIFIC INSTRUCTION-SET PROCESSOR - Provided is an H.264 Context Adaptive Variable Length Coding (CAVLC) decoding method based on an Application-Specific Instruction-set Processor (ASIP). The H.264 CAVLC decoding method includes determining a plurality of comparison bit strings on the basis of a table of a decoding coefficient, storing lengths of the comparison bit strings in a first register, storing code values of the comparison bit strings in a second register, comparing an input bit stream with the comparison bit strings based on the lengths and code values of the comparison bit strings, and determining value of the decoding coefficient according to a result of comparison between the input bit stream and the comparison bit strings. The method extracts a decoding coefficient using a register in an ASIP without accessing a memory and prevents a reduction in speed caused by memory access, thereby increasing the decoding speed of an H.264 decoder.05-28-2009
20110167245Task list generation, parallelism templates, and memory management for multi-core systems - There is provided a multi-core system that provides automated task list generation, parallelism templates, and memory management. By constructing, profiling, and analyzing a sequential list of functions to be executed in a parallel fashion, corresponding parallel execution templates may be stored for future lookup in a database. A processor may then select a subset of functions from the sequential list of functions based on input data, select a template from the template database based on particular matching criteria such as high-level task parameters, finalize the template by resolving pointers and adding or removing transaction control blocks, and forward the resulting optimized task list to a scheduler for distribution to multiple slave processing cores. The processor may also analyze data dependencies between tasks to consolidate tasks working on the same data to a single core, thereby implementing memory management and efficient memory locality.07-07-2011
20120011347PARALLEL PROGRAMMING INTERFACE TO DYNAMICALY ALLOCATE PROGRAM PORTIONS - A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.01-12-2012
20120060018Collective Operations in a File System Based Execution Model - A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.03-08-2012
20110066828MAPPING OF COMPUTER THREADS ONTO HETEROGENEOUS RESOURCES - Techniques are generally described for mapping a thread onto heterogeneous processor cores. Example techniques may include associating the thread with one or more predefined execution characteristic(s), assigning the thread to one or more heterogeneous processor core(s) based on the one or more predefined execution characteristic(s), and/or executing the thread by the respective assigned heterogeneous processor core(s).03-17-2011
20110107066CASCADED ACCELERATOR FUNCTIONS - Accelerator functions are cascaded, such that a result of one accelerator function is directly forwarded to another accelerator function, bypassing the processor requesting the functions to be performed. The cascading may be provided during compilation of a program specifying the functions to be performed, but can be dynamically reversed during runtime of the program.05-05-2011
20120124341Methods and Apparatus for Performing Multiple Operand Logical Operations in a Single Instruction - A method for performing multiple-operand logical operations in a single instruction includes the steps of: generating a table defining a correspondence between a plurality of input variables to a multiple-operand logical operation and a plurality of output results of the multiple-operand logical operation; encoding the table to generate a set of values for use by the single instruction, each value being indicative of an output result of the multiple-operand logical operation as a function of a corresponding unique combination of values of the input variables; and at least one processor performing the multiple-operand logical operation in a single instruction as a function of the set of values for a prescribed combination of values of the input variables.05-17-2012
20090132792Method of generating internode timing diagrams for a multiprocessor array - The apparatus used includes a multi core computer processor 05-21-2009
20120166772EXTENSIBLE DATA PARALLEL SEMANTICS - A high level programming language provides extensible data parallel semantics. User code specifies hardware and software resources for executing data parallel code using a compute device object and a resource view object. The user code uses the objects and semantic metadata to allow execution by new and/or updated types of compute nodes and new and/or updated types of runtime libraries. The extensible data parallel semantics allow the user code to be executed by the new and/or updated types of compute nodes and runtime libraries.06-28-2012
20120131315DATA PROCESSING APPARATUS - A data processing apparatus may include a processing unit that performs processing related to data, a first register that holds a value for defining an operation of the processing unit, a second register that holds a value output from the first register, the second register outputting the value to the processing unit, a first control unit that performs control for writing a value in the first register, a second control unit that performs control for rewriting the value held by the second register with the value output from the first register, after the value is written in the first register, and a third control unit that performs control for rewriting the value held by the second register with an invalid value, at which the processing of the processing unit is stopped, during a period for which the value is written in the first register.05-24-2012
20120131314Ganged Hardware Counters for Coordinated Rollover and Reset Operations - Mechanisms for controlling rollover or reset of hardware performance counters in the data processing system. A signal indicating that a rollover or reset of a first hardware performance counter has occurred is received and it is determined if the first hardware performance counter is analytically related to one or more second hardware performance counters based on defined ganged hardware performance counter sets. A signal is sent to each of the one or more second hardware performance counters in response to a determination that the first hardware performance counter is analytically related to the one or more second hardware performance counters. Each of the one or more second hardware performance counters is reset to an initial value in response to the one or more second hardware performance counters receiving the signal from the ganged hardware performance counter rollover/reset logic.05-24-2012
20100205410Data Processing - Apparatus for data processing includes a processor, memory and storage. A plurality of sets of instructions, each corresponding to one of a plurality of programs, is stored in the storage. The processor is configured to load the sets of instructions from the storage into the memory, identify a first program as nonessential, close the first program and remove its corresponding set of instructions from the memory, and reload the set of instructions corresponding to the first program into the memory from the storage.08-12-2010
20120137110HARDWARE DEVICE FOR PROCESSING THE TASKS OF AN ALGORITHM IN PARALLEL - A hardware device for concurrently processing a fixed set of predetermined tasks associated with an algorithm which includes a number of processes, some of the processes being dependent on binary decisions, includes a plurality of task units for processing data, making decisions and/or processing data and making decisions, including source task units and destination task units. A task interconnection logic means interconnect the task units for communicating actions from a source task unit to a destination task unit. Each of the task units includes a processor for executing only a particular single task of the fixed set of predetermined tasks associated with the algorithm in response to a received request action, and a status manager for handling the actions from the source task units and building the actions to be sent to the destination task units.05-31-2012
20100174890Known Good Code for On-Chip Device Management - In one embodiment, a processor comprises a programmable map and a circuit. The programmable map is configured to store data that identifies at least one instruction for which an architectural modification of an instruction set architecture implemented by the processor has been defined, wherein the processor does not implement the modification. The circuit is configured to detect the instruction or its memory operands and cause a transition to Known Good Code (KGC), wherein the KGC is protected from unauthorized modification and is provided from an authenticated entity. The KGC comprises code that, when executed, emulates the modification. In another embodiment, an integrated circuit comprises at least one processor core; at least one other circuit; and a KGC source configured to supply KGC to the processor core for execution. The KGC comprises interface code for the other circuit whereby an application executing on the processor core interfaces to the other circuit through the KGC.07-08-2010
20120173853Processing apparatus and method for performing computation - A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.07-05-2012
20120233445Multi-Thread Processors and Methods for Instruction Execution and Synchronization Therein and Computer Program Products Thereof - Methods for instruction execution and synchronization in a multi-thread processor are provided, wherein in the multi-thread processor, multiple threads are running and each of the threads can simultaneously execute a same instruction sequence. A source code or an object code is received and then compiled to generate the instruction sequence. Instructions for all of function calls within the instruction sequence are sorted according to a calling order. Each thread is provided a counter value pointing to one of the instructions in the instruction sequence. A main counter value is determined according to the counter values of the threads such that all of the threads simultaneously execute an instruction of the instruction sequence that the main counter value points to.09-13-2012
20100023732OPTIMIZING NON-PREEMPTIBLE READ-COPY UPDATE FOR LOW-POWER USAGE BY AVOIDING UNNECESSARY WAKEUPS - A technique for low-power detection of a grace period following a shared data element update operation that affects non-preemptible data readers. A grace period processing action is implemented that requires a processor that may be running a non-preemptible reader of the shared data element to pass through a quiescent state before further grace period processing can proceed. A power status of the processor is also determined. Further grace period processing may proceed without requiring the processor to pass through a quiescent state if the power status indicates that quiescent state processing by the processor is unnecessary.01-28-2010
20100011192SIMPLIFYING COMPLEX DATA STREAM PROBLEMS INVOLVING FEATURE EXTRACTION FROM NOISY DATA - Methods, systems and computer program products for simplifying complex data stream problems involving feature extraction from noisy data. Exemplary embodiments include a method for processing a data stream, including applying multiple operators to the data stream, wherein an operation by each of the multiple operators includes retrieving the next chunk for each of a set of input parameters, performing digital processing operations on a respective next chunk, producing sets of output parameters and adding data to one or more internal data stores, each internal data store acting as a data stream source.01-14-2010
20120185678HARDWARE THREAD DISABLE WITH STATUS INDICATING SAFE SHARED RESOURCE CONDITION - A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources and if not, then the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.07-19-2012
20120185677METHODS AND SYSTEMS FOR STORAGE OF BINARY INFORMATION THAT IS USABLE IN A MIXED COMPUTING ENVIRONMENT - A method of managing binary data across a mixed computing environment is provided. The method includes performing on one or more processors: receiving binary data; receiving binary coded data indicating a type of the binary data; formatting the binary data and the binary coded data according to a first format; and generating at least one of a message and a file based on the formatted data.07-19-2012
20080301413Method of and apparatus and architecture for real time signal processing by switch-controlled programmable processor configuring and flexible pipeline and parallel processing - A new signal processor technique and apparatus combining microprocessor technology with switch fabric telecommunication technology to achieve a programmable processor architecture wherein the processor and the connections among its functional blocks are configured by software for each specific application by communication through a switch fabric in a dynamic, parallel and flexible fashion to achieve a reconfigurable pipeline, wherein the length of the pipeline stages and the order of the stages varies from time to time and from application to application, admirably handling the explosion of varieties of diverse signal processing needs in single devices such as handsets, set-top boxes and the like with unprecedented performance, cost and power savings, and with full application flexibility.12-04-2008
20090089554METHOD FOR TUNING CHIPSET PARAMETERS TO ACHIEVE OPTIMAL PERFORMANCE UNDER VARYING WORKLOAD TYPES - A method, system, and computer program product for tuning a set of chipset parameters to achieve optimal chipset performance under varying workload characteristics. A set of workload characteristics of a current workload type is determined. An instruction stream is generated using weighted parameters derived from the set of workload characteristics of the current workload type. A set of chipset parameters is generated and integrated within the instruction stream. The instruction stream is loaded to one or more processors and executed to collect and analyze performance data relating to the chipset's performance. The analysis includes comparing the set of performance data of a plurality of different instruction streams having the same set of workload characteristics. Each executed instruction stream is executed with at least one different combination of chipset parameters. A determination is made regarding which combination of chipset parameters provides the best performance data for the current workload.04-02-2009
20120265970SYSTEM AND METHOD OF INDIRECT REGISTER ACCESS - Systems and methods are provided for managing access to registers. A system may include a set of direct registers and a set of indirect registers. The indirect registers may be accessed through the direct registers, and the direct registers may provide various features to provide faster access to the indirect registers. One of the direct registers may indicate access modes for accessing the indirect registers. The access modes may include auto-increment, auto-decrement, auto-reset, and no change modes. Based on the access mode, the currently accessed address may be automatically modified after accessing the indirect register at the address.10-18-2012
20110238955METHODS FOR SCALABLY EXPLOITING PARALLELISM IN A PARALLEL PROCESSING SYSTEM - Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.09-29-2011
20120331275SYSTEM AND METHOD FOR POWER OPTIMIZATION - A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.12-27-2012
20110320780HYBRID COMPARE AND SWAP/PERFORM LOCKED OPERATION QUEUE ALGORITHM - Systems, methods, and computer program products are disclosed for intermixing different types of machine instructions. One embodiment of the invention provides a protocol for intermixing the different types of machine instructions. By adhering to the protocol, different types of machine instructions may be intermixed to concurrently update data structures without leading to unpredictable results.12-29-2011
20110320779PERFORMANCE MONITORING IN A SHARED PIPELINE - A pipelined processing device includes: a device controller configured to receive a request to perform an operation; a plurality of subcontrollers configured to receive at least one instruction associated with the operation, each of the plurality of subcontrollers including a counter configured to generate an active time value indicating at least a portion of a time taken to process the at least one instruction; a pipeline processor configured to receive and process the at least one instruction, the pipeline processor configured to receive the active time value; and a shared pipeline storage area configured to store the active time value for each of the plurality of subcontrollers.12-29-2011
20110320778CENTRALIZED SERIALIZATION OF REQUESTS IN A MULTIPROCESSOR SYSTEM - Serializing instructions in a multiprocessor system includes receiving a plurality of processor requests at a central point in the multiprocessor system. Each of the plurality of processor requests includes a needs register having a requestor needs switch and a resource needs switch. The method also includes establishing a tail switch indicating the presence of the plurality of processor requests at the central point, establishing a sequential order of the plurality of processor requests, and processing the plurality of processor requests at the central point in the sequential order.12-29-2011
20100199074INSTRUCTION SET ARCHITECTURE WITH DECOMPOSING OPERANDS - Instead of having a processor with an instruction set architecture (ISA) that includes fixed architected operands, an improved processor supports additional characteristic bits for computing instructions (e.g., a multiply-add, load/store instructions). Such additional bits for the certain instructions influence the processing of these instructions by the processor. Also, a new instruction is introduced for further usage of the proposed method. Typically these additional characteristic bits as well as the instruction can be automatically generated by compilers to provide relatively well-suited instruction sequences for the processor.08-05-2010
20120151187INSTRUCTION OPTIMIZATION - Programs can be optimized at runtime prior to execution to enhance performance. Program instructions/operations designated for execution can be recorded and subsequently optimized at runtime prior to execution, for instance by performing transformations on the instructions. For example, such optimization can remove, reorder, and/or combine instructions, among other things.06-14-2012
20080244238Stream processing accelerator - The present invention is a stream processing accelerator which includes multiple coupled processing elements which are interconnected through a shared file register and a set of global predicates. The stream processing accelerator has two modes: full-processor mode and circuit mode. In full-processor mode, a branch unit, an arithmetic logic unit and a memory unit work together as a regular processor. In circuit mode, each component acts like functional units with configurable interconnections.10-02-2008
20080235496Methods and Apparatus for Dynamic Instruction Controlled Reconfigurable Register File - A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32, supported configurations of 128×32 bits, 64×64 bits and 32×128 bits can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.09-25-2008
20080235495Method and Apparatus for Counting Instruction and Memory Location Ranges - A method, apparatus, and computer instructions in a data processing system for processing instructions and monitoring accesses to memory location ranges. An instruction for execution is identified. A determination is made as to whether the instruction is within a contiguous range of instructions. Execution information relating to the instruction is identified if the instruction is within the contiguous range of instructions. With memory location accesses, an access to a memory location is identified. A determination of whether the memory location is within a contiguous range of memory locations is made. Access information is identified if the memory location is within the contiguous range of memory locations.09-25-2008
20130173888Processor for Executing Wide Operand Operations Using a Control Register and a Results Register - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.07-04-2013
20130173889PARALLEL PROCESSING SYSTEM FOR COMPUTING PARTICLE INTERACTIONS - A parallel processing system for computing particle interactions includes a plurality of computation nodes arranged according to a geometric partitioning of a simulation volume. Each computation node has storage for particle data. This particle data is associated with particles in a region of the geometrically partitioned simulation volume. The parallel processing system also includes a communication system having links interconnecting the computation nodes. Each of the computation nodes includes a processor subsystem. These processor subsystems cooperate to coordinate computation of the particle interactions in a distributed manner.07-04-2013
20110246750PROCESSING CAPACITY ON DEMAND - Embodiments of the present invention relate to a system and method for providing processing capacity on demand. According to the embodiments, a processor package has a plurality of processing elements. One or more of the processing elements may be made active in response to increased demand for processing capacity based on modifiable authorization information.10-06-2011
20080222397Hard Object: Hardware Protection for Software Objects - In accordance with one embodiment, additions to the standard computer microprocessor architecture hardware are disclosed comprising novel page table entry fields 09-11-2008
20080222396Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces - In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.09-11-2008
20080222395System and Method for Predictive Early Allocation of Stores in a Microprocessor - A system and method for predictive early allocation of stores in a microprocessor is presented. During instruction dispatch, an instruction dispatch unit retrieves an instruction from an instruction cache (Icache). When the retrieved instruction is an interruptible instruction, the instruction dispatch unit loads the interruptible instruction's instruction tag (IITAG) into an interruptible instruction tag register. A load store unit loads subsequent instruction information (instruction tag and store data) along with the interruptible instruction tag in a store data queue entry. Comparison logic receives a completing instruction tag from completion logic, and compares the completing instruction tag with the interruptible instruction tags included in the store data queue entries. In turn, deallocation logic deallocates those store data queue entries that include an interruptible instruction tag that matches the completing instruction tag.09-11-2008
20080215855Execution unit for performing shuffle and other operations - In one embodiment, the present invention includes a method for receiving first and second data operands in a common execution unit and manipulating the operands responsive to an instruction to generate an output according to local control signals of a local controller of the execution unit. Various instruction types such as shuffle and shift operations may be performed in the common execution unit in a single cycle. Other embodiments are described and claimed.09-04-2008
20130138926INDIRECT FUNCTION CALL INSTRUCTIONS IN A SYNCHRONOUS PARALLEL THREAD PROCESSOR - An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.05-30-2013
20130097410MACHINE PROCESSOR - Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor.04-18-2013
20130103931MACHINE PROCESSOR - Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor.04-25-2013
20110276789PARALLEL PROCESSING OF DATA - A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.11-10-2011
20130132709METHOD AND SYSTEM FOR PROCESSING INSTRUCTION INFORMATION - A method and system for processing instruction information. Each instruction information character string of a sequence of instruction information character strings is sequentially extracted and processed. Each instruction information character string pertains to an associated target object wrapped in a target object storage unit by an associated operation target model. It is independently ascertained for each instruction information character string whether to generate a code line for each instruction information character string, by: determining whether a requirement is satisfied and generating the code line and storing the code line in a code buffer if the requirement has been determined to be satisfied and not generating the code line if the requirement has been determined to not be satisfied. The requirement relates to whether the instruction information character string being processed comprises a naming instruction or a generation instruction. The generated code lines stored in the code buffer are displayed.05-23-2013
20130145132STATICALLY SPECULATIVE COMPILATION AND EXECUTION - A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.06-06-2013
20100281238Execution of instructions directly from input source - A computer array (11-04-2010
20100318770ELECTRONIC DEVICE, COMPUTER-IMPLEMENTED SYSTEM, AND APPLICATION DISPLAY CONTROL METHOD THEREFOR - An electronic device, a computer-implemented system, and an application display control method thereof are disclosed. The electronic device has a process unit that executes an operating system kernel, and then executes a first and a second software platform via the operating system kernel. When a first application is executed on the first software platform, a first window manager of the first software platform controls a screen area within which the first application being executed is displayed. The second software platform is notified by the first application to execute a second application, and a second window manager of the second software platform displays in the screen area a screen image that is generated when the second application is executed. Therefore, a user is given the flexibility of executing applications for different software platforms on the same one electronic device.12-16-2010
20130159678Code optimization by memory barrier removal and enclosure within transaction - A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section into code sub-sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.06-20-2013
20130159679Providing Hint Register Storage For A Processor - In one embodiment, the present invention includes a method for receiving a data access instruction and obtaining an index into a data access hint register (DAHR) register file of a processor from the data access instruction, reading hint information from a register of the DAHR register file accessed using the index, and performing the data access instruction using the hint information. Other embodiments are described and claimed.06-20-2013
20130185543GENERAL PURPOSE EMBEDDED PROCESSOR - The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.07-18-2013
20130191618Systems and Methods for Dynamic Scaling in a Data Decoding System - Various embodiments of the present invention provide systems and methods for data processing using variable scaling.07-25-2013
20120030450METHOD AND SYSTEM FOR PARALLEL COMPUTATION OF LINEAR SEQUENTIAL CIRCUITS - A method and system for parallel computation of a linear sequential circuit (LSC) based on a state transition matrix is disclosed herein. A multistep state transition matrix and a multistep output generation matrix can be pre-computed and stored in association with the linear sequential circuit. The multiple state transitions and the multiple output bits can be computed by multiplying the current input-state vector with a multistep next state transition matrix and a multistep output generation matrix, respectively. Multiple state transitions and multiple output bits can be generated in parallel in a single clock cycle based on the pre-computed state transition matrix and the output generation matrix utilizing a dot product in order to improve computational speed. Such a simple augmentation provides a flexible and inexpensive solution for high speedup linear sequential circuit computation with respect to a processor.02-02-2012
20130198493SYSTEMS AND METHODS THAT FACILITATE MANAGEMENT OF ADD-ON INSTRUCTION GENERATION, SELECTION, AND/OR MONITORING DURING EXECUTION - The subject invention relates to systems and methods that facilitate display, selection, and management of context associated with execution of add-on instructions. The systems and methods track add-on instruction calls and provide a user with call and data context, wherein the user can select a particular add-on instruction context from a plurality of contexts in order to observe values and/or edit parameters associated with the add-on instruction. The add-on instruction context can include information such as instances of data for particular lines of execution, the add-on instruction called, a caller of the instruction, a location of the instruction call, references to complex data types and objects, etc. The systems and methods further provide a technique for automatic routine selection based on the add-on instruction state information such that the add-on instruction executed corresponds to a current state.08-01-2013
20120036339ASYNCHRONOUS ASSIST THREAD INITIATION - A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.02-09-2012
20130205121PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS - A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing by the processor core, in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.08-08-2013
20130205122INSTRUCTION SET ARCHITECTURE-BASED INTER-SEQUENCER COMMUNICATIONS WITH A HETEROGENEOUS RESOURCE - In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.08-08-2013
20130205120PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS - A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.08-08-2013
20130212361INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.08-15-2013
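
As an illustration of the multistep idea described in the abstract of application 20120030450 above, the following is a minimal sketch and not the claimed method. It assumes an autonomous linear sequential circuit over GF(2) (no external input, which is a simplification of the abstract's input-state vector) and uses a hypothetical 4-bit shift register. The names `precompute`, `run_serial`, and `run_multistep` are illustrative only. The k-step state matrix A^k and the stacked output rows c*A^i are computed once; each block of k steps then reduces to independent dot products that hardware could evaluate in parallel.

```python
def mat_mul(a, b):
    """Multiply two bit matrices over GF(2)."""
    cols = list(zip(*b))
    return [[sum(x & y for x, y in zip(row, col)) & 1 for col in cols]
            for row in a]


def mat_vec(a, v):
    """Multiply a bit matrix by a bit vector over GF(2)."""
    return [sum(x & y for x, y in zip(row, v)) & 1 for row in a]


def precompute(A, c, k):
    """Return (A^k, O), where row i of O is the row vector c * A^i."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    out_rows = []
    for _ in range(k):
        transposed = [list(col) for col in zip(*power)]
        out_rows.append(mat_vec(transposed, c))  # c * A^i as a row
        power = mat_mul(A, power)                # advance to A^(i+1)
    return power, out_rows


def run_serial(A, c, x, k):
    """Reference model: advance one step at a time, one output bit per step."""
    outputs = []
    for _ in range(k):
        outputs.append(sum(ci & xi for ci, xi in zip(c, x)) & 1)
        x = mat_vec(A, x)
    return outputs, x


def run_multistep(Ak, O, x):
    """k output bits and the k-step-ahead state from two matrix products."""
    return mat_vec(O, x), mat_vec(Ak, x)


# Hypothetical 4-bit circuit (an assumption, not taken from the abstract):
# s0' = s1, s1' = s2, s2' = s3, s3' = s0 XOR s1; the output bit is s0.
A = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 1, 0, 0]]
c = [1, 0, 0, 0]
Ak, O = precompute(A, c, 4)
outputs, next_state = run_multistep(Ak, O, [1, 0, 0, 0])
```

Every row of `Ak` and `O` yields one result bit by a dot product with the current state, so the rows do not depend on each other. That independence is what allows several state transitions and output bits to be produced in a single step.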
