Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Processing control for data transfer

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

712220000 - PROCESSING CONTROL

Patent class list (only not empty are listed)

Deeper subclasses:

Entries
DocumentTitleDate
20090193236CONDITIONAL MEMORY ORDERING - A system for conditional memory ordering implemented in a multiprocessor environment. A conditional memory ordering instruction executes locally using a release vector containing release numbers for each processor in the system. The instruction first determines whether a processor identifier of the release number is associated with the current processor. Where it is not, a conditional registered is examined and appropriate remote synchronization operations are commanded where necessary.07-30-2009
20100115246SYSTEM AND METHOD OF DATA PARTITIONING FOR PARALLEL PROCESSING OF DYNAMICALLY GENERATED APPLICATION DATA - An improved system and method of data partitioning for parallel processing of dynamically generated application data is provided. An application may send a request to partition the application data specified by a data partitioning policy and to process each of the data partitions according to processing instructions. The data partitioning policy may be flexibly defined by an application for partitioning data any number of ways, including balancing the data volume across each of the partitions or partitioning the data by data type. Asynchronous data partition processors may be instantiated to perform parallel processing of the partitioned data. The data may be partitioned according to the data partitioning policy and processed according to the processing instructions. And the results may be returned to the application.05-06-2010
20130080747PROCESSOR AND INSTRUCTION PROCESSING METHOD IN PROCESSOR - The present invention relates to a processor including: an instruction cache configured to store at least some of first instructions stored in an external memory and second instructions each including a plurality of micro instructions; a micro cache configured to store third instructions corresponding to the plurality of micro instructions included in the second instructions; and a core configured to read out the first and second instructions from the instruction cache and perform calculation, in which the core performs calculation by the first instructions from the instruction cache under a normal mode, and when the process enters a micro instruction mode, the core performs calculation by the third instructions corresponding to the plurality of micro instructions provided from the micro cache.03-28-2013
20100325400MICROPROCESSOR AND DATA WRITE-IN METHOD THEREOF - A microprocessor comprises a register set, a micro operations pool (Uops pool), a hazard detection unit, an execution unit, a dispatch unit, and a mask unit. The Uops pool receives a first micro operation and a second micro operation from a decoder, and reads at least one first operand of the first micro operation and at least one second operand of the second micro operation from the register set. The hazard detection unit detects that the first micro operation is in a write after write hazard state due to the second micro operation. The execution unit executes the first micro operation dispatched from the Uops pool to obtain a first operation result and executes the second micro operation dispatched from the Uops pool to obtain a second operation result. The mask unit protects the first operation result from writing back to the register set according to the write after write hazard state.12-23-2010
20090158014System and Method for Retiring Approximately Simultaneously a Group of Instructions in a Superscalar Microprocessor - An apparatus and method for executing instructions having a program order. The apparatus comprising a temporary buffer, tag assignment logic, a plurality of functional units, a plurality of data paths, a register array, a retirement control block, and a superscalar instruction retirement unit. The temporary buffer includes a plurality of temporary buffer locations to store result data for executed instructions, wherein the temporary buffer locations are arranged in a plurality of groups of temporary buffer locations. The tag assignment logic is configured to concurrently assign a tag to each instruction in a first set of instructions, wherein the tags are assigned such that the respective tag assigned to each of the instructions in the first set of instructions identifies a different one of the temporary buffer locations in a first one of the groups of temporary buffer locations.06-18-2009
20100106948MANAGING AN OUT-OF-ORDER ASYNCHRONOUS HETEROGENEOUS REMOTE DIRECT MEMORY ACCESS (RDMA) MESSAGE QUEUE - A system and method operable to manage a message queue is provided. This management may involve out-of-order asynchronous heterogeneous remote direct memory access (RDMA) to the message queue. This system includes a pair of processing devices, a primary processing device and an additional processing device, a memory in storage location and a data bus coupled to the processing devices. The processing devices cooperate to process queue data within a shared message queue wherein when an individual processing device successfully accesses queue data the queue data is locked for the exclusive use of the processing device. When the processing device acquires the queue data, the queue data is locked and the queue data acquired by the acquiring processing device includes the queue data for both the primary processing device and additional processing device such that the processing device has all queue data necessary to process the data and return processed queue data.04-29-2010
20120185679Endpoint-Based Parallel Data Processing With Non-Blocking Collective Instructions In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing by the parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.07-19-2012
20090125706Software Pipelining on a Network on Chip - A network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, where each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, the NOC also including a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID with each stage executing on a thread of execution on an IP block.05-14-2009
20100005279Data processor - The data processor executes an instruction having a direction for write to a reference register of other instruction flow and an instruction having a direction for reference register invalidation. The data processor is arranged as a data processor having typical functions as an integrated whole of processors (CPU01-07-2010
20090307467Performing An Allreduce Operation On A Plurality Of Compute Nodes Of A Parallel Computer - Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.12-10-2009
20090300337Instruction set design, control and communication in programmable microprocessor cases and the like - Improved instruction set and core design, control and communication for programmable microprocessors is disclosed, involving the strategy for replacing centralized program sequencing in present-day and prior art processors with a novel distributed program sequencing wherein each functional unit has its own instruction fetch and decode block, and each functional unit has its own local memory for program storage; and wherein computational hardware execution units and memory units are flexibly pipelined as programmable embedded processors with reconfigurable pipeline stages of different order in response to varying application instruction sequences that establish different configurations and switching interconnections of the hardware units.12-03-2009
20090031118Apparatus and method for controlling order of instruction - An apparatus includes an instruction generator which generates a load instruction and a first store instruction from a program, a processor which executes said load and store instruction, wherein said instruction generator analyzes a relevancy between said load instruction and said first store instruction with respect to memory addresses accessed by said instructions, specifies a second store instruction irrelevant to said load instruction with respect to said memory address, and notifies said second store instruction to said processor, wherein said processor executes said load instruction in advance of said second store instruction during said processor prepares to execute said second store instruction.01-29-2009
20130067206Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.03-14-2013
20090013159Queue Processor And Data Processing Method By The Queue Processor - A queue processor and its data processing method are provided. It can do high-speed data processing and decreases the electric energy consumption.01-08-2009
20090013158System and Method for Assigning Tags to Control Instruction Processing in a Superscalar Processor - A tag monitoring system for assigning tags to instructions embodied in software on a tangible computer-readable storage medium. A source supplies instructions to be executed by a functional unit. A queue having a plurality of slots containing tags which are used for tagging instructions. A register file stores information required for the execution of each instruction at a location in the register file defined by the tag assigned to that instruction. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order.01-08-2009
20090013157Management of Software Implemented Services in Processor-Based Devices - A service management system for devices with embedded processor systems manages use of memory by programs implementing the services by assigning services to classes and limiting the number of services per class that can be loaded into memory. Classes enable achieving predictable and stable system behavior, defining the services and service classes in a manifest that is downloaded to embedded devices operating on a network, such as a cable or satellite television network, telephone or computer network, and permit a system operator, administrator, or manager to manage the operation of the embedded devices while deploying new services implemented with applications downloaded from the network when the service is requested by a user.01-08-2009
20090013156PROCESSOR COMMUNICATION TOKENS - The invention provides a method of transmitting messages over an interconnect between processors, each message comprising a header token specifying a destination processor and at least one of a data token and a control token. The method comprises: executing a first instruction on a first one of the processors to generate a data token comprising a byte of data and at least one additional bit to identify that token as a data token, and outputting the data token from the first processor onto the interconnect as part of one of the messages. The method also comprises executing a second instruction on said first processor to generate a control token comprising a byte of control information and at least one additional bit to identify that token as a control token, and outputting the control token from the first processor onto the interconnect as part of one of the messages.01-08-2009
20090006824STRUCTURE FOR A CIRCUIT FUNCTION THAT IMPLEMENTS A LOAD WHEN RESERVATION LOST INSTRUCTION TO PERFORM CACHELINE POLLING - A design structure for a circuit function that implements a load when reservation lost instruction for performing cacheline polling is disclosed. Initially, a first process requests an action to be performed by a second process. The request is made via a store operation to a cacheable memory location. The first process then reads the cacheable memory location via a conditional load operation to determine whether or not the requested action has been completed by the second process, and the first process sets a reservation at the cacheable memory location if the requested action has not been completed by the second process. The conditional load operation of the first process is stalled until the reservation at the cacheable memory location has been lost. After the requested action has been completed, the reservation in the cacheable memory location is reset by the second process.01-01-2009
20090006823DESIGN STRUCTURE FOR SINGLE HOT FORWARD INTERCONNECT SCHEME FOR DELAYED EXECUTION PIPELINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor. The processor includes at least one cascaded delayed execution pipeline unit having a first and second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline, and if the first instruction being executed in the first pipeline modifies data in the data register which is accessed by the second instruction being executed in the second pipeline, forward the modified data from the first pipeline to the second pipeline.01-01-2009
20080294879Asynchronous Ripple Pipeline - An asynchronous ripple pipeline has a plurality of stages, each with a controller (11-27-2008
20100268921DATA COLLECTION PREFETCH DEVICE AND METHODS THEREOF - A method of retrieving information from a memory includes receiving an instruction associated with a data collection. In response to determining the instruction is a request to retrieve a first element of the data collection, an application program interface (API) generates an instruction to prefetch a second element of the data collection. In one embodiment, the second element to be prefetched is indicated by a pointer or other information associated with the first element. In response to the prefetch instruction, an execution core of the data processing device retrieves the second element from a memory module and stores the second element at a cache. By prefetching the second element before it has been explicitly requested by the application, the efficiency of the application can be increased.10-21-2010
20080209186Method for Reducing Buffer Capacity in a Pipeline Processor - The invention presents a method for a processor (08-28-2008
20090193235TRANSFER SYSTEM, AND TRANSFER METHOD - In response to a transfer request, for which a loading time at a transfer source and a loading time at a transfer target are designated by a production controller, there is created a transfer scenario, which contains a basic transfer (From) from the transfer source to a buffer near the transfer target, for example, and a basic transfer (To) from the buffer to the transfer target. In order that the basic transfers (From, To) may be able to be executed, the buffer is reserved, and a transfer vehicle is allocated. The time period for the transfer vehicle to run to the transfer source or the buffer and the time period for the transfer vehicle to run from the transfer source or the buffer are estimated to assign a transfer command to the transfer vehicle. The possibility that the loading and the loading time may deviate from a designated period is evaluated. In case this possibility is high, a production controller is informed that a just-in-time transfer is difficult.07-30-2009
20120110309Data Output Transfer To Memory - Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory.05-03-2012
20080313440SWITCHING TO ORIGINAL CODE COMPARISON OF MODIFIABLE CODE FOR TRANSLATED CODE VALIDITY WHEN FREQUENCY OF DETECTING MEMORY OVERWRITES EXCEEDS THRESHOLD - A method of translating instructions from a target instruction set to a host instruction set. In one embodiment, a plurality of first target instructions is translated into a plurality of first host instructions. After the translation, it is determined whether the plurality of first target instructions has changed. A copy of a second plurality of target instructions is stored and compared with the plurality of first target instructions if the determining slows the operation of the computer system. After comparing, the plurality of first host instructions is invalidated if there is a mismatch. According to one embodiment, the storing, the comparing and the invaliding is initiated when the determining indicates that a page contains at least one change to the plurality of first target instructions. In one embodiment, the determining is by examining a bit indicator associated with a memory location of the plurality of first target instructions.12-18-2008
20100122071SEMICONDUCTOR DEVICE AND DATA PROCESSING METHOD - A semiconductor device includes a first circuit that executes a first calculation, a second circuit that includes a first storage unit therein and executes a second calculation, a controller that outputs a first address for specifying a first execution circuit for the first calculation and a second execution circuit for the second calculation, to the first circuit and the second circuit, and controls input of data into the first circuit, and a bus that transfers a result of the first calculation executed by the first circuit to the second circuit, wherein the result of the first calculation can be conditionally used as an address for specifying the second execution circuit.05-13-2010
20090094442Storage medium storing load detecting program and load detecting apparatus - A load detecting apparatus includes a load controller, and judges a motion of a player on the basis of detected load values. Judgment timing for a motion of putting the feet on and down from the controller is decided on the basis of an elapsed time from an instruction of a motion. In a case that a step-up-and-down exercise is performed, a judgment timing of a motion for bringing about a state both of the feet are put down on a ground at a fourth step is decided on the basis of a judgment timing of a motion of putting a third step down.04-09-2009
20120272047METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set.10-25-2012
20110173422PAUSE PROCESSOR HARDWARE THREAD UNTIL PIN - A system and method for enhancing performance of a computer which includes a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processer. The processor processes instructions from the program. A wait state in the processor waits for receiving specified data. A thread in the processor has a pause state wherein the processor waits for specified data. A pin in the processor initiates a return to an active state from the pause state for the thread. A logic circuit is external to the processor, and the logic circuit is configured to detect a specified condition. The pin initiates a return to the active state of the thread when the specified condition is detected using the logic circuit.07-14-2011
20090282226Context Switching On A Network On Chip - A network on chip (‘NOC’) that includes IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to the network by an application messaging interconnect including an inbox and an outbox, one or more of the IP blocks including computer processors supporting a plurality of threads, the NOC also including an inbox and outbox controller configured to set pointers to the inbox and outbox, respectively, that identify valid message data for a current thread; and software running in the current thread that, upon a context switch to a new thread, is configured to: save the pointer values for the current thread, and reset the pointer values to identify valid message data for the new thread, where the inbox and outbox controller are further configured to retain the valid message data for the current thread in the boxes until context switches again to the current thread.11-12-2009
20120297172Multi-Threaded Processes for Opening and Saving Documents - Tools and techniques are described for multi-threaded processing for opening and saving documents. These tools may provide load processes for reading documents from storage devices, and for loading the documents into applications. These tools may spawn a load process thread for executing a given load process on a first processing unit, and an application thread may execute a given application on a second processing unit. A first pipeline may be created for executing the load process thread, with the first pipeline performing tasks associated with loading the document into the application. A second pipeline may be created for executing the application process thread, with the second pipeline performing tasks associated with operating on the documents. The tasks in the first pipeline are configured to pass tokens as input to the tasks in the second pipeline.11-22-2012
20090282225STORE QUEUE - Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache. If so, the system conditionally forwards data from the unrestricted entry or the restricted entry by: (1) forwarding data from an unrestricted entry that contains the youngest store that satisfies the load instruction when any number of unrestricted or restricted entries contain data that satisfies the load instruction; (2) forwarding data from an unrestricted entry when only one restricted entry and no unrestricted entries contain data that satisfies the load instruction; and (3) deferring the load instruction by placing the load instruction in a deferred queue when two or more restricted entries and no unrestricted entries contain data that satisfies the load instruction.11-12-2009
20090287911PROGRAMMABLE SIGNAL AND PROCESSING CIRCUIT AND METHOD OF DEPUNCTURING - A programmable signal processing circuit has an instruction processing circuit (11-19-2009
20090037702Processor and data load method using the same - A processor includes an instruction decoder, an instruction execution part and a register file. The instruction decoder is adapted to decode an instruction. The instruction execution part is adapted to execute processing corresponding to the instruction decoded by the instruction decoder. The register file is capable of storing load data from a data memory and supplying input data to the instruction execution part. The register file includes a plurality of registers, each of which is capable of holding a plurality of bits of data. Furthermore, the register file is configured to update the data held by the plurality of registers by shifting the data held by the plurality of registers among the plurality of registers.02-05-2009
20080244243Computer program product and system for altering execution flow of a computer program - A debugger alters the execution flow of a child computer program of the debugger at runtime by inserting jump statements determined by the insertion of breakpoint instructions. Breakpoints are used to force the child computer program to throw exceptions at specified locations. One or more instructions of the computer program are replaced by jump instructions. The jump destination addresses associated with the break instructions can be specified by input from a user. The debugger changes the instruction pointer of the child program to achieve the desired change in execution flow. No instructions are lost in the child program.10-02-2008
20080250232Data Processing Device, Data Processing Program, and Recording Medium Recording Data Processing Program - A dependence relationship storage unit M indicates from which input address and input value each of the output addresses and output values derives. An inter-line AND comparator MR performs AND between each of the line components stored in the dependence relationship storage unit M and sets an I/O group including an output pattern containing at least one output address and output value and an input pattern containing at least one input address and input value. Thus, it is possible to provide a data processing device capable of registering an I/O group appropriate for reuse in instruction section storage means.10-09-2008
20080288757Communicating Instructions and Data Between a Processor and External Devices - A mechanism for communicating instructions and data between a processor and external devices are provided. The mechanism makes use of a channel interface as the primary mechanism for communicating between the processor and a memory flow controller. The channel interface provides channels for communicating with processor facilities, memory flow control facilities, machine state registers, and external processor interrupt facilities, for example. These channels may be designated as blocking or non-blocking. With blocking channels, when no data is available to be read from the corresponding registers, or there is no space available to write to the corresponding registers, the processor is placed in a low power “stall” state. The processor is automatically awakened, via communication across the blocking channel, when data becomes available or space is freed. Thus, the channels of the present invention permit the processor to stay in a low power state.11-20-2008
20080313441METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING REGISTER BLOCK DATA FORMAT ROUTINES - A method (and structure) of executing a matrix operation, includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks is stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different relative to its original position in the matrix A.12-18-2008
20080313439PIPELINE DEVICE WITH A PLURALITY OF PIPELINED PROCESSING UNITS - In a pipeline device, the output of each of processing units is connected to a corresponding one of data output lines of data transfer lines. Input selectors are provided for the processing units, respectively. Each input selector selects one of the data transfer lines except for one data output line to which the output of a corresponding one processing unit is connected to thereby determine one of interconnection patterns among the processing units. The interconnection patterns correspond to data-processing tasks, respectively. Each input selector inputs, to a corresponding one of the processing units, data flowing through the selected one of the data transfer lines. Each processing unit individually performs a predetermined process based on data inputted thereto by a corresponding one of the input selectors to thereby perform, in pipeline, one of the data-processing tasks corresponding to the determined one of the interconnection patterns.12-18-2008
20090319760SINGLE-CYCLE LOW POWER CPU ARCHITECTURE - An n architecture for implementing an instruction pipeline within a CPU comprises an arithmetic logic unit (ALU), an address arithmetic unit (AAU), a program counter (PC), a read-only memory (ROM) coupled to the program counter, to an instruction register, and to an instruction decoder coupled to the arithmetic logic unit. A random access memory (RAM) is coupled to the instruction decoder, to the arithmetic logic unit, and to a RAM address register.12-24-2009
20080301415INFORMATION PROCESSING SYSTEM - An information processing system includes a first processor that accesses a first memory, a second processor that accesses a second memory, and a data transfer unit for executing data transfer between the first memory and the second memory. The first processor executes functions of translating an instruction out of instructions included in the program except a memory access instruction into an instruction for the second processor and translating the memory access instruction into an instruction sequence containing a call instruction of the program to transfer the access data on the first memory to the second memory via a data transfer unit.12-04-2008
20090089558ADJUSTMENT OF DATA COLLECTION RATE BASED ON ANOMALY DETECTION - Systems and methods that vary multiple data sampling rates, to collect sets of data with different levels of granularity for an industrial system. The data for such industrial system includes sets of data from the “internal” data stream(s) (e.g., history data collected from an industrial unit) and sets of data from an “external” (e.g., traffic data on network services) data stream(s), based in part on the criticality/importance criteria assigned to each collection stage. Each set of data can be assigned its own unique data collection rate. For example, a higher sample rate can be employed when collecting data from the network during an operation stage that is deemed more critical (e.g., dynamic attribution of predetermined importance factors) than the rest of the operation.04-02-2009
20090164763METHOD AND APPARATUS FOR A DOUBLE WIDTH LOAD USING A SINGLE WIDTH LOAD PORT - A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load.06-25-2009
20110060893CIRCUIT COMPRISING A MICROPROGRAMMED MACHINE FOR PROCESSING THE INPUTS OR THE OUTPUTS OF A PROCESSOR SO AS TO ENABLE THEM TO ENTER OR LEAVE THE CIRCUIT ACCORDING TO ANY COMMUNICATION PROTOCOL - A circuit having at least one processor and a microprogrammed machine for processing the data which enters or leaves the processor in order to input or output the data into/from the circuit in compliance with a communication protocol.03-10-2011
20090182992Load Relative and Store Relative Facility and Instructions Therefore - A method, system and program product for loading or storing memory data wherein the address of the memory operand is based an offset of the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction which is sign extended and is aligned as a halfword address when added to the value of the program counter.07-16-2009
20090049285INFORMATION DELIVERY APPARATUS, INFORMATION REPRODUCTION APPARATUS, AND INFORMATION PROCESSING METHOD - An information delivery apparatus includes an encoding information collection unit which collects information used to encode content information, a generation unit which predicts decode processes of the content information based on the collected information, and generates configuration information used to configure data paths required to execute the decode processes, an embedding unit which embeds the configuration information in the content information, and a delivery unit which delivers the content information embedded with the configuration information.02-19-2009
20090063828Systems and Methods for Communication between a PC Application and the DSP in a HDA Audio Codec - Systems and methods implemented in a PC for enabling communication between an application executing on the CPU and a DSP that is incorporated into a codec in the High Definition Audio (HDA) system, wherein the communication is carried out via the HDA bus. In one embodiment, an HDA codec includes one or more conventional HDA widgets coupled to a programmable processor such as a DSP. The codec includes a set of registers that are configured to store HDA verbs and data transmitted via the HDA bus. The programmable processor is configured to identify verbs that indicate associated information is a communication from an application executing on the CPU, read the associated information, and process the information according to the associated verbs. The information may be program instructions, parametric data, requests for information, etc.03-05-2009
20090198976METHOD AND STRUCTURE FOR HIGH-PERFORMANCE MATRIX MULTIPLICATION IN THE PRESENCE OF SEVERAL ARCHITECTURAL OBSTACLES - A method (and apparatus) for processing data on a computer having a memory to store the data and a processing unit to execute the processing, the processing unit having a plurality of registers available for an internal working space for a data processing occurring in the processing unit, includes configuring the plurality of registers to include at least two sets of registers. A first set of the at least two sets interfaces with the processing unit for the data processing in a current processing cycle. A second set of the at least two sets is used for removing data from the processing unit of a previous processing cycle to be stored in the memory and preloading data into the processing unit from the memory, to be used for a next processing cycle.08-06-2009
20090198975TERMINATION OF IN-FLIGHT ASYNCHRONOUS MEMORY MOVE - A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction initiates an asynchronous memory move operation that moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address a destination effective address; and (b) when the move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before completion of the AMM operation, and (3) a LD CMP instruction for checking a status of an AMM operation.08-06-2009
20090049284Parallel Subword Instructions With Distributed Results - The present invention provides for parallel subword instructions that cause results to be non-contiguously stored in a result register. For example, a targeting-type instruction can specify (implicitly or explicitly) a bit position and the result of each of the parallel subword compare operations can be stored at that bit position within the respective subword location of a result register. Alternatively, for a shifting-type instruction, pre-existing contents of a result register can be shifted one bit toward greater significance while the results are of the present operation are stored in the least-significant bits of respective result-register subword locations. This approach provides the results of multiple parallel subword compare instructions to be combined with relatively few instructions and reduces the maximum lateral movement of information—both of which can enhance performance.02-19-2009
20080263337INSTRUCTIONS FOR ORDERING EXECUTION IN PIPELINED PROCESSES - Ordering instructions for specifying the execution order of other instructions improve throughput in a pipelined multiprocessor. Memory write operations local to a CPU are allowed to occur in an arbitrary order, and constraints are placed on shared memory operations. Multiple sets of instructions are provided in which order of execution of the instructions is maintained through the use of CPU registers, write buffers in conjunction with assignment of sequence numbers to the instruction, or a hierarchical ordering system. The system ensures that an earlier designated instruction has reach a specified state of execution prior to a latter instruction reaching a specified state of execution. The ordering of operations allows memory operations local to a CPU to occur in conjunction with other memory operations that are not affected by such execution.10-23-2008
20090210679PROCESSOR AND METHOD FOR STORE DATA FORWARDING IN A SYSTEM WITH NO MEMORY MODEL RESTRICTIONS - A pipelined microprocessor includes circuitry for store forwarding by performing: for each store request, and while a write to one of a cache and a memory is pending; obtaining the most recent value for at least one complete block of data; merging store data from the store request with the complete block of data thus updating the block of data and forming a new most recent value and an updated complete block of data; and buffering the updated complete block of data into a store data queue; for each load request, where the load request may require at least one updated completed block of data: determining if store forwarding is appropriate for the load request on a block-by-block basis; if store forwarding is appropriate, selecting an appropriate block of data from the store data queue on a block-by-block basis; and forwarding the selected block of data to the load request.08-20-2009
20120198214N-WAY MEMORY BARRIER OPERATION COALESCING - One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group execution of subsequent memory operations for the first thread group are suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed.08-02-2012
20120198213PACKET HANDLER INCLUDING PLURALITY OF PARALLEL ACTION MACHINES - A packet handler for a packet processing system includes a plurality of parallel action machines, each of the plurality of parallel action machines being configured to perform a respective packet processing function; and a plurality of action machine input registers, wherein each of the plurality of parallel action machines is associated with one or more of the plurality of action machine input registers, and wherein an action machine of the plurality of parallel action machines is automatically triggered to perform its respective packet processing function in the event that data sufficient to perform the actions machine's respective packet processing function is written into the action machine's one or more respective action machine input registers.08-02-2012
20090254737INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SUPPORTING SERVER AND INFORMATION PROCESSING SYSTEM - In information processing device or the like is provided to output a suitable form of information from a view point of user's non-feeling of trouble. In the information processing system, whether a user issues an output instruction or not is confirmed only with respect to information that is not extracted out of information stored in a first storing unit (10-08-2009
20100274997Executing a Gather Operation on a Parallel Computer - Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer or the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node, if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data, if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data, and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network.10-28-2010
20090254736Data processing system for performing data rearrangement operations - An apparatus for processing data is provided comprising rearrangement circuitry having a plurality of rearrangement stages for rearranging a plurality N of input data elements, each rearrangement stage comprising at most N multiplexers arranged to select between M data elements where M is in integer less than N. Control circuitry is provided that is responsive to program instructions to control the rearrangement circuitry to perform rearrangement operations. The rearrangement circuitry is configurable by the control circuitry to perform a plurality of different rearrangement operations. The rearrangement circuitry comprises main rearrangement circuitry having a plurality of rearrangement stages in which there is a unique path between any given input element and any given output element and supplementary rearrangement circuitry in which from each input data element there is a path to at most C output data elements where 110-08-2009
20090249042GATEWAY APPARATUS, CONTROL INSTRUCTION PROCESSING METHOD, AND PROGRAM - A gateway apparatus includes a translator connected to a first network for one or more controllers to control one or more devices, and one or more aggregators. The translator includes an acquisition unit which acquires load information concerning a load on each of the controllers, a control instruction reception unit which receives a control instruction for a device from a client via the second network, a determination unit which determines whether the instruction is an aggregation target, based on the information, a first transfer unit which transfers the instruction to the aggregator corresponding to the instruction, and a second transfer unit which receives an aggregate control instruction from the aggregator. The aggregator includes a third transfer unit which receives the instruction from the translator, an aggregation unit which aggregates the plurality of instructions into one aggregate control instruction, and a fourth transfer unit which transfers the aggregate control instruction.10-01-2009
20080307207DATA EXCHANGE AND COMMUNICATION BETWEEN EXECUTION UNITS IN A PARALLEL PROCESSOR - A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or others of the execution lanes.12-11-2008
20100228956CONTROL CIRCUIT, INFORMATION PROCESSING DEVICE, AND METHOD OF CONTROLLING INFORMATION PROCESSING DEVICE - A control circuit for receiving data transmitted by a data transmitting circuit and transmitting the received data to a data receiving circuit includes: a data receiving unit for receiving the data transmitted by the data transmitting circuit; a packet analyzing unit for judging whether the data received from the data transmitting circuit is a packet including history acquisition information and reading the history acquisition information from the received data; a history acquisition executing unit for starting or stopping acquiring the history information of the transmission and reception of the data according to the history acquisition information read by the packet analyzing unit to store the history information acquired; and a data transmitting unit for transmitting the packet including the history acquisition information or a packet other than the packet including the history acquisition information to the data receiving circuit.09-09-2010
20100161947PACK UNICODE ZSERIES INSTRUCTIONS - Emulation methods are provided for two PACK instructions, one for Unicode data and the other for ASCII coded data in which processing is carried out in a block-by-block fashion as opposed to a byte-by-byte fashion as a way to provide superior performance in the face of the usual challenges facing the execution of emulated data processing machine instructions as opposed to native instructions.06-24-2010
20100241834METHOD OF ENCODING USING INSTRUCTION FIELD OVERLOADING - The method selects registers by a register instruction field having x bits. A first group of registers has up to 209-23-2010
20100223447Translate and Verify Instruction for a Processor - In an embodiment, a first instruction is defined that comprises at least a first operand from which the execution core is configured to determine a virtual address and a second operand that specifies one or more translation attributes that exist in a page table entry that defines a translation for the virtual address. A processor executing the instruction translates the virtual address, verifies whether or not the translation attributes in the page table entry match the specified translation attributes, faults the first instruction responsive to failing to locate a translation for the virtual address, and responsive to locating a translation for the virtual address in the page table entry but with the translation attributes in the entry failing to match the specified translation attributes.09-02-2010
20090037701Method of Updating Electronic Operationg Instructions of a Vehicle and an Operating Instructions Updating System - A method and system for updating electronic operating instructions of a vehicle is provided. Local operating instruction data objects are stored in a local storage device arranged in the vehicle so that they can be used by the driver. Corresponding current operating instruction data objects are stored in an external storage device. One data object category, respectively, is assigned to the operating instruction data objects. For updating, a current operating instruction data object is transmitted from the external storage device to the local storage device in order to modify the corresponding local operating instruction data object in the local storage device The frequency of the updating of a local operating instruction data object depends on the data object category assigned to the data object.02-05-2009
20100313001DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - An apparatus includes a plurality of processing modules which are connected to each other by corresponding communication unit and the modules transfer packets in a predetermined direction to execute a plurality of operations of pipeline processing. The module includes a storage unit for storing a first identification and a second identification for each of the plurality of operations, a reception unit for extracting data from a packet which has the first identification, a processing unit for processing the data extracted by the reception unit, and a transmission unit for storing the second identification corresponding to the first identification of the packet a packet and transmitting the packet to the module arranged in the predetermined direction.12-09-2010
20130138929PROCESS MAPPING IN PARALLEL COMPUTING - A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided.05-30-2013
20130138930COMPUTER SYSTEMS AND METHODS FOR REGISTER-BASED MESSAGE PASSING - Systems and methods are disclosed that include a plurality of processing units having a plurality of register file entries. Control logic identifies a first register entry as including a message address in response to receiving a first instruction. The control logic further identifies a second register entry to receive messages in response to receiving a second instruction.05-30-2013
20100332808MINIMIZING CODE DUPLICATION IN AN UNBOUNDED TRANSACTIONAL MEMORY SYSTEM - Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, and that employs hardware filter indicators (when available) to filter redundant applications of TM barriers, and that enables a compiled binary representation of the subject code to run correctly in any of the currently implemented set of transactional memory execution modes, including running the code outside of a transaction, and that enables the same compiled binary to continue to work with future TM implementations which may introduce as yet unknown future TM execution modes.12-30-2010
20110040955STORE-TO-LOAD FORWARDING BASED ON LOAD/STORE ADDRESS COMPUTATION SOURCE INFORMATION COMPARISONS - A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information.02-17-2011
20100223448Computer Configuration Virtual Topology Discovery and Instruction Therefore - In a logically partitioned host computer system comprising host processors (host CPUs), a facility and instruction for discovering topology of one or more guest processors (guest CPUs) of a guest configuration comprises a guest processor of the guest configuration fetching and executing a STORE SYSTEM INFORMATION instruction that obtains topology information of the computer configuration. The topology information comprising nesting information of processors of the configuration and the degree of dedication a host processor provides to a corresponding guest processor. The information is preferably stored in a single table in memory.09-02-2010
20090313459SYSTEM AND METHOD FOR PROCESSING LOW DENSITY PARITY CHECK CODES USING A DETERMINISTIC CACHING APPARATUS - A system, method and article of manufacture are disclosed for processing Low Density Parity Check (LDPC) codes. The system comprises a multitude of processing units for processing the codes; and a processor chip including an on-chip, multi-port data cache for temporarily storing the LDPC codes. This data cache includes a plurality of input ports for receiving the LDPC codes from some of the processing units, and a plurality of output ports for sending the LDPC codes to others of the processing units. An off-chip, external memory stores the LDPC codes and transmits the LDPC codes to and receives the LDPC codes from at least some of the processing units. A sequence processor controls the transmission of the LDPC codes between the processor units and the on-chip data cache so that the LDPC codes are processed by the processing units according to a given sequence.12-17-2009
20110087865Intermediate Register Mapper - A method, processor, and computer program product employing an intermediate register mapper within a register renaming mechanism. A logical register lookup determines whether a hit to a logical register associated with the dispatched instruction has occurred. In this regard, the logical register lookup searches within at least one register mapper from a group of register mappers, including an architected register mapper, a unified main mapper, and an intermediate register mapper. A single hit to the logical register is selected among the group of register mappers. If an instruction having a mapper entry in the unified main mapper has finished but has not completed, the mapping contents of the register mapper entry in the unified main mapper are moved to the intermediate register mapper, and the unified register mapper entry is released, thus increasing a number of unified main mapper entries available for reuse.04-14-2011
20090031119Method for the operation of a multiprocessor system in conjunction with a medical imaging system - The invention relates to a method for operating a multiprocessor system, especially in conjunction with a medical imaging system. The invention also relates to a medical imaging device which is designed to perform this method. The multiprocessor system in this case has at least two processing units, at least one control unit and operations which can be allocated to the processing units. Data provided from an input is processed by the processing unit and made available at an output. The at least one control unit enhances the named data with control data, which defines an allocation of the data to the respective operations for the purposes of processing.01-29-2009
20100070742EMBEDDED-DRAM DSP ARCHITECTURE HAVING IMPROVED INSTRUCTION SET - An embedded-DRAM processor architecture includes a DRAM array, a set of register files, set of functional units, and a data assembly unit. The data assembly unit includes a set of row-address registers and is responsive to commands to activate and deactivate DRAM rows and to control the movement of data throughout the system. A pipelined data assembly approach allowing the functional units to perform register-to-register operations, and allowing the data assembly unit to perform all load/store operations using wide data busses. Data masking and switching hardware allows individual data words or groups of words to be transferred between the registers and memory. Other aspects of the disclosure include a memory and logic structure and an associated method to extract data blocks from memory to accelerate, for example, operations related to image compression and decompression.03-18-2010
20090113187Processor architecture for executing instructions using wide operands - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.04-30-2009
20100306511COMMUNICATION DATA PROCESSOR AND COMMUNICATION DATA PROCESSING METHOD - There is a need for providing a communication data processor easily adaptable to network configurations required for industrial Ethernet. The apparatus successively analyzes received packets. The apparatus uses a register to determine whether or not to transmit the received packet as transmission data to another port. Rewritable memory saves a program code that provides control for analyzing a reception packet and generating a transmission packet. The apparatus is capable of complying with various communication protocols by changing the program code.12-02-2010
20120204015Sharing a Data Buffer - A computer-program product may have instructions that, when executed, cause a processor to perform operations including managing execution of application functions that access data in a shared buffer; determining if a first instruction that is stored at a first memory location causes, upon execution, data to be read from or written to the shared buffer; and when it is determined that the first instruction causes data to be read from or written to the shared buffer, 1) identify one or more replacement instructions to execute in place of the first instruction; 2) store the one or more replacement instructions; and 3) replace the first instruction at the first memory location with a second instruction that, when executed, causes the stored one or more replacement instructions to be executed.08-09-2012
20110153998Methods and Apparatus for Attaching Application Specific Functions Within an Array Processor - A multi-node video signal processor (VSP06-23-2011
20090204794METHODS COMPUTER PROGRAM PRODUCTS AND SYSTEMS FOR UNIFYING PROGRAM EVENT RECORDING FOR BRANCHES AND STORES IN THE SAME DATAFLOW - The present invention relates to a method for the unification of PER branch and PER store operations within the same dataflow. The method comprises determining a PER range, the PER range comprising a storage area defined by a designated storage starting area and a designated storage ending area, wherein the storage starting area is designated by a value of the contents of a first control register and the storage ending area is designated by a value of the contents of a second control register. The method also comprises retrieving register field content values that are stored at a plurality of registers, wherein the retrieved content values comprises a length field content value, and setting the length field content value to zero for a PER branch instruction, thereby enabling a PER branch instruction to performed similarly to a PER storage instruction.08-13-2009
20090138687MEMORY DEVICE HAVING DATA PROCESSING FUNCTION - A memory device having a data processing function is disclosed. The memory device can include a process area, in which process command information is written by a processor; a storage area, in which one or more data is written; an output area, in which display data selected by the processor from among the data written in the storage area is written; and a processing unit, which performs one or more processes of copying data, computing data, and transmitting display data to an external outputting device, in correspondence with the process command information. According to some aspects of the present invention, the memory device is able to independently perform commands received from the processor, and does not require a separate memory for storing data that will be transmitted to an external outputting device, so that the processing efficiency of the processor can be enhanced.05-28-2009
20090113188COORDINATOR SERVER, DATABASE SERVER, AND PIPELINE PROCESSING CONTROL METHOD - A first transmitting unit transmits a processing command to a plurality of parallelized database servers. A second transmitting unit integrates data sets transmitted from the database servers in response to the processing command, and transmits an integrated data set to a client. An integrating unit integrates data sets buffered in a buffer unit. A determining unit determines a transmission start or a transmission suspend of the data sets based on a data size in the buffer unit. A third transmitting unit transmits a control command for the transmission start or the transmission suspend to the database servers based on a result of determination by the first determining unit.04-30-2009
20110047361Load/Move Duplicate Instructions for a Processor - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.02-24-2011
20110047360PROCESSOR - The present application provides a method of randomly accessing a compressed structure in memory without the need for retrieving and decompressing the entire compressed structure.02-24-2011
20110055526METHOD AND APPARATUS FOR ACCESSING MEMORY ACCORDING TO PROCESSOR INSTRUCTION - There is provided a method and apparatus for accessing a memory according to a processor instruction. The apparatus includes: a stack offset extractor extracting an offset value from a stack pointer offset indicating a local variable in the processor instruction; a local stack storage including a plurality of items, each of which is formed of an activation bit indicating whether each item is activated, an offset storing an offset value of a stack pointer, and an element storing a local variable value of the stack pointer; an offset comparator comparing the extracted offset value with an offset value of each item and determining whether an item corresponding to the extracted offset value is present in the local stack storage; and a stack access controller controlling a processor to access the local stack storage or a cache memory according to a determining result of the offset comparator.03-03-2011
20080201559Switching Device and Corresponding Method for Activating a Load - A cost-effective safety concept for safety-relevant applications in motor vehicles accordingly activates a load not directly from a central unit, but instead indirectly via a switching device. The latter has a first and a second register for the acquisition of the same control data from the central unit, and a third register for outputting data to the load. A transmission device transmits data from the second register to the third register. A first comparison logic compares a content of the second register with that of the third register and sends an interrupt to the central unit, when the two contents are not identical. A second comparison logic compares the content of the first and second registers and enables the transmission device, when the contents of the two registers are identical and otherwise blocks the transmission device. The last held state is thus maintained in the event of an error.08-21-2008
20100293359General Purpose Register Cloning - A clone set of General Purpose Registers (GPRs) is created to be used by a set of helper thread binaries, which is created from a set of main thread binaries. When the set of main thread binaries enters a wait state, the set of helper thread binaries uses the clone set of GPRs to continue using unused execution units within a processor core. The set of helper threads are thus able to warm up local cache memory with data that will be needed when execution of the set of main thread binaries resumes.11-18-2010
20120311306DATA PROCESSING APPARATUS, DATA PROCESSING SYSTEM, PACKET, RECORDING MEDIUM, STORAGE DEVICE, AND DATA PROCESSING METHOD - Parallelism of processing can be improved while existing software resources are utilized substantially as they are.12-06-2012
20110093687MANAGING MULTIPLE SPECULATIVE ASSIST THREADS AT DIFFERING CACHE LEVELS - An illustrative embodiment provides a computer-implemented process for managing multiple speculative assist threads for data pre-fetching that sends a command from an assist thread of a first processor to second processor and a memory, wherein parameters of the command specify a processor identifier of the second processor, responsive to receiving the command, reply by the second processor indicating an ability to receive a cache line that is a target of a pre-fetch, responsive to receiving the command replying by the memory indicating a capability to provide the cache line, responsive to receiving replies from the second processor and the memory, sending, by the first processor, a combined response to the second processor and the memory, wherein the combined response indicates an action, and responsive to the action indicating a transaction can continue sending the requested cache line, by the memory, to the second processor into a target cache level on the second processor.04-21-2011
20110138157CONVOLUTION COMPUTATION FOR MANY-CORE PROCESSOR ARCHITECTURES - A convolution of the kernel over a layout in a multi-core processor system includes identifying a sector, called a dynamic band, of the layout including a plurality of evaluation points. Layout data specifying the sector of the layout is loaded in shared memory, which is shared by a plurality of processor cores. A convolution operation of the kernel and the evaluation points in the sector is executed. The convolution operation includes iteratively loading parts of the basis data set, called a stride, into space available in shared memory given the size of the layout data specifying the sector. A plurality of threads is executed concurrently using the layout data for the sector and the currently loaded part of the basis data set. The iteration for the loading basis data set proceeds through the entire data set until the convolution operation is completed.06-09-2011
20090292905Performing An Allreduce Operation On A Plurality Of Compute Nodes Of A Parallel Computer - Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores, that include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node.11-26-2009
20090172367PROCESSING UNIT - A processing unit has an extended register to which instruction extension information indicating an extension of an instruction can be set. An operation unit that, when instruction extension information is set to the extended register, executes a subsequent instruction following a first instruction for writing the instruction extension information into the extended register, extends the subsequent instruction based on the instruction extension information.07-02-2009
20090172366Enabling permute operations with flexible zero control - In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed.07-02-2009
20090172365Instructions and logic to perform mask load and store operations - In one embodiment, logic is provided to receive and execute a mask move instruction to transfer a vector data element including a plurality of packed data elements from a source location to a destination location, subject to mask information for the instruction. Other embodiments are described and claimed.07-02-2009
20090172364DEVICE, SYSTEM, AND METHOD FOR GATHERING ELEMENTS FROM MEMORY - A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.07-02-2009
20090172363MIXING INSTRUCTIONS WITH DIFFERENT REGISTER SIZES - When legacy instructions, that can only operate on smaller registers, are mixed with new instructions in a processor with larger registers, special handling and architecture are used to prevent the legacy instructions from causing problems with the data in the upper portion of the registers, i.e., the portion that they cannot directly access. In some embodiments, the upper portion of the registers are saved to temporary storage while the legacy instructions are operating, and restored to the upper portion of the registers when the new instructions are operating. A special instruction may also be used to disable this save/restore operation if the new instruction are not going to use the upper part of the registers.07-02-2009
20120210102Obtaining And Releasing Hardware Threads Without Hypervisor Involvement - A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.08-16-2012
20120011349DATA EXCHANGE AND COMMUNICATION BETWEEN EXECUTION UNITS IN A PARALLEL PROCESSOR - Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer pats are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel.01-12-2012
20110107069Processor Architecture for Executing Wide Transform Slice Instructions - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.05-05-2011
20110107068ELIMINATING REDUNDANT OPERATIONS FOR COMMON PROPERTIES USING SHARED REAL REGISTERS - One embodiment of a method for eliminating redundant operations establishing common properties includes identifying a first virtual register storing a first value having a common property. The method may assign the first virtual register to use a real register. The method may further identify a second virtual register storing a second value also having the common property. The method may assign the second virtual register to use the same real register after the first value is no longer live. As a result of assigning the second virtual register to the first real register, the method may eliminate an operation configured to establish the common property for the second virtual register since this operation is redundant and is no longer needed.05-05-2011
20110099357Utilizing a Bidding Model in a Microparallel Processor Architecture to Allocate Additional Registers and Execution Units for Short to Intermediate Stretches of Code Identified as Opportunities for Microparallelization - An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus. The enhanced mechanism enables a variable amount of parallelism to be implemented and yet provides correct program execution even if less parallelism is available than ideal for a given computer program.04-28-2011
20090132793System and Method of Selectively Accessing a Register File - In a particular embodiment, a method is disclosed that includes identifying a first block of bits within a result to be written to a destination register by an execution unit. The result includes a plurality of bits having the first block of bits and a second block of bits. The first block of bits has a value of zero. The method further includes providing an encoded bit value representing the first block of bits to a control register and selectively writing the second block of bits, but not the first block of bits, to the destination register. The destination register is sized to receive the first and second blocks of bits.05-21-2009
20090132794Method and apparatus for performing complex calculations in a multiprocessor array - A method and apparatus for performing complex mathematical calculations. The apparatus includes a multicore processor 05-21-2009
20090089559METHOD OF MANAGING DATA MOVEMENT AND CELL BROADBAND ENGINE PROCESSOR USING THE SAME - A method of managing data movement in a cell broadband engine processor, comprising: determining one or more idle synergistic processing elements among multiple SPEs in the cell broadband engine processor as a managing SPE, and informing a computing SPE among said multiple SPEs of a starting effective address of a LS of said managing SPE and an effective address for a command queue; and said managing SPE managing movement of data associated with computing of said computing SPE based on the command queue from the computing SPE.04-02-2009
20100205412CONTROL SEQUENCER - An audio codec (08-12-2010
20120254596METHOD AND SYSTEM FOR CONTROLLING MESSAGE TRAFFIC BETWEEN TWO PROCESSORS - A system and method for controlling messaging between a first processor and a second processor is disclosed. The second processor controls one or more peripheral devices on behalf of a plurality of predetermined tasks being executed by the first processor. The system includes a message control module that receives an input message intended for the second processor from the first processor and maintains a message history based on the received input message and previously received input messages. The message history indicates which peripheral devices of the system are to be on and which tasks of the plurality of tasks requested the peripheral devices to be on. The message control module is further configured to generate an output message that includes output instructions for the second processor based on the message history and an output duration based on the message history. The second processor executes the output instructions.10-04-2012
20100049953DATA CACHE RECEIVE FLOP BYPASS - A microprocessor includes an N-way cache and a logic block that selectively enables and disables the N-way cache for at least one clock cycle if a first register load instructions and a second register load instruction, following the first register load instruction, are detected as pointing to the same index line in which the requested data is stored. The logic block further provides a disabling signal to the N-way cache for at least one clock cycle if the first and second instructions are detected as pointing to the same cache way.02-25-2010
20120317402EXECUTING A START OPERATOR MESSAGE COMMAND - A facility is provided to enable operator message commands from multiple, distinct sources to be provided to a coupling facility of a computing environment for processing. These commands are used, for instance, to perform actions on the coupling facility, and may be received from consoles coupled to the coupling facility, as well as logical partitions or other systems coupled thereto. Responsive to performing the commands, responses are returned to the initiators of the commands.12-13-2012
20090327668Multi-Threaded Processes For Opening And Saving Documents - Tools and techniques are described for multi-threaded processing for opening and saving documents. These tools may provide load processes for reading documents from storage devices, and for loading the documents into applications. These tools may spawn a load process thread for executing a given load process on a first processing unit, and an application thread may execute a given application on a second processing unit. A first pipeline may be created for executing the load process thread, with the first pipeline performing tasks associated with loading the document into the application. A second pipeline may be created for executing the application process thread, with the second pipeline performing tasks associated with operating on the documents. The tasks in the first pipeline are configured to pass tokens as input to the tasks in the second pipeline.12-31-2009
20090327667System and Method to Perform Fast Rotation Operations - Systems and methods to perform fast rotation operations are disclosed. In a particular embodiment, a method includes executing a single instruction. The method includes receiving first data indicating a first coordinate and a second coordinate, receiving a first control value that indicates a first rotation value selected from a set of ninety degree multiples, and writing output data corresponding to the first data rotated by the first rotation value.12-31-2009
20090327666METHOD AND SYSTEM FOR HARDWARE-BASED SECURITY OF OBJECT REFERENCES - A method for managing data, including obtaining a first instruction for moving a first data item from a first source to a first destination, determining a data type of the first data item, determining a data type supported by the first destination, comparing the data type of the first data item with the data type supported by the first destination to test a validity of the first instruction, and moving the first data item from the first source to the first destination based on the validity of the first instruction.12-31-2009
20120260075CONDITIONAL ALU INSTRUCTION PRE-SHIFT-GENERATED CARRY FLAG PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor includes a hardware instruction translator that translates an architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the shift operation on the first source operand to generate the first result and a carry flag value and updates a non-architectural carry flag with the generated carry flag value. To execute the second microinstruction, it performs the second operation on the first result and the second operand to generate the second result and new condition flag values based on the second result. If a architectural condition flags satisfy the condition, it updates the architectural carry flag with the non-architectural carry flag value and updates at least one of the other architectural condition flags with the corresponding generated new condition flag values; otherwise, it updates the architectural condition flags with the current value of the architectural condition flags.10-11-2012
20120260074EFFICIENT CONDITIONAL ALU INSTRUCTION IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor having performs an architectural instruction that instructs it to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if its architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result. To execute the second microinstruction, it writes the destination register with the result generated by the first microinstruction if the architectural condition flags satisfy the condition, and writes the destination register with the current value of the destination register if the architectural condition flags do not satisfy the condition.10-11-2012
20090019269Methods and Apparatus for a Bit Rake Instruction - Techniques for performing a bit rake instruction in a programmable processor. The bit rake instruction extracts an arbitrary pattern of bits from a source register, based on a mask provided in another register, and packs and right justifies the bits into a target register. The bit rake instruction allows any set of bits from the source register to be packed together.01-15-2009
20080301416SYSTEM AND PROGRAM PRODUCT OF DOING PACK UNICODE Z SERIES INSTRUCTIONS - Emulation methods are provided for two PACK instructions, one for Unicode data and the other for ASCII coded data in which processing is carried out in a block-by-block fashion as opposed to a byte-by-byte fashion as a way to provide superior performance in the face of the usual challenges facing the execution of emulated data processing machine instructions as opposed to native instructions.12-04-2008
20120265971ALLOCATION OF COUNTERS FROM A POOL OF COUNTERS TO TRACK MAPPINGS OF LOGICAL REGISTERS TO PHYSICAL REGISTERS FOR MAPPER BASED INSTRUCTION EXECUTIONS - A mapper unit of an out-of-order processor assigns a particular counter currently in a counter free pool to count a number of mappings of logical registers to a particular physical register from among multiple physical registers, responsive to an execution of an instruction by the mapper unit mapping at least one logical register to the particular physical register. The number of counters is less than the number of physical registers. The mapper unit, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.10-18-2012
20110238960DISTRIBUTED PROCESSING SYSTEM, CONTROL UNIT, PROCESSING ELEMENT, DISTRIBUTED PROCESSING METHOD AND COMPUTER PROGRAM - A distributed processing system has a control unit and a plurality of processing elements and includes a control line through which control information is sent and received between the control unit and the processing elements, and a data line through which data to be processed is transmitted from at least one selected from the control unit and the processing elements to the processing element connected thereto. The data line is independent from the control line. The processing element has a processing content changing section that changes the content of processing in the processing element using processing content changing information received through the control line when a monitored processing context of the processing element meets a processing content changing condition received through the control line.09-29-2011
20110238959DISTRIBUTED CONTROLLER, DISTRIBUTED PROCESSING SYSTEM, AND DISTRIBUTED PROCESSING METHOD - A distributed controller is connected to two or more processing elements and controls the two or more processing elements to execute distributed processing. The distributed controller comprises a plurality of control modules, each of which is connected to at least one other control module. The distributed controller determines a processing path between processing elements using at least two of the plurality of control modules.09-29-2011
20110276791HANDLING A STORE INSTRUCTION WITH AN UNKNOWN DESTINATION ADDRESS DURING SPECULATIVE EXECUTION - The described embodiments provide a system for executing instructions in a processor. While executing instructions in an execute-ahead mode, the processor encounters a store instruction for which a destination address is unknown. The processor then defers the store instruction. Upon encountering a load instruction while the store instruction with the unknown destination address is deferred, the processor determines if the load instruction is to continue executing. If not, the processor defers the load instruction. Otherwise, the processor continues executing the load instruction.11-10-2011
20120096245COMPUTING DEVICE, PARALLEL COMPUTER SYSTEM, AND METHOD OF CONTROLLING COMPUTER DEVICE - A computing device includes a receiving unit that receives control information indicating an instruction to be executed on a process that is distributed or an instruction contained in the process that is distributed, from a control information creating device that transmits the control information to each computing device on a network. The computing device further includes a processor configured to suspend execution of an instruction when the instruction to be executed on the process occurs or the instruction contained in the process that is distributed is executed, and execute the suspended instruction when the suspended instruction is associated with the instruction indicated by the control information that is received by the receiving unit.04-19-2012
20120331276Instruction Execution - A method of executing an instruction set to select a set of registers, includes reading a first instruction of the instruction set; interpreting a first operand of the first instruction to represent a first register S to be selected; interpreting a second operand of the first instruction to represent a number N of registers to be selected; and selecting N consecutive registers starting at the first register S to form the set of registers.12-27-2012
20110320781DYNAMIC DATA SYNCHRONIZATION IN THREAD-LEVEL SPECULATION - In one embodiment, the present invention introduces a speculation engine to parallelize serial instructions by creating separate threads from the serial instructions and inserting processor instructions to set a synchronization bit before a dependence source and to clear the synchronization bit after a dependence source, where the synchronization bit is designed to stall a dependence sink from a thread running on a separate core. Other embodiments are described and claimed.12-29-2011
20110145551TWO-STAGE COMMIT (TSC) REGION FOR DYNAMIC BINARY OPTIMIZATION IN X86 - Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reording load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.06-16-2011
20130024673PROCESSSING UNIT AND MICRO CONTROLLER UNIT (MCU) - A technology capable of reducing load on both system processing and filter operation and improving power consumption and performance is provided. In a digital signal processor, a program memory, a program counter, and a control logic circuit are provided, and a bit field of each instruction includes instruction stop flag information and bit field information. Also, the control logic circuit carries out the control in such a manner that the instruction whose instruction stop flag information is cleared is executed as is to proceed to the next instruction processing, execution of the instruction whose instruction stop flag information is set is stopped if an execution resumption trigger condition corresponding to the bit field information is not satisfied, and the instruction whose instruction stop flag information is set is executed if the execution resumption trigger condition corresponding to bit field information is satisfied, to proceed to the next instruction processing.01-24-2013
20080244242Using a Register File as Either a Rename Buffer or an Architected Register File - A computer implemented method, apparatus, and computer usable program code are provided for implementing a set of architected register files as a set of temporary rename buffers. An instruction dispatch unit receives an instruction that includes instruction data. The instruction dispatch unit determines a thread mode under which a processor is operating. Responsive to determining the thread mode, the instruction dispatch unit determines an ability to use the set of architected register files as the set of temporary rename buffers. Responsive to the ability to use the set of architected register files as the set of temporary rename buffers, the instruction dispatch unit analyzes the instruction to determine an address of an architected register file in the set of architected register files where the instruction data is to be stored. The architected register file operating as a temporary rename buffer stores the instruction data as finished data.10-02-2008
20080235497Parallel Data Output - Multiple processing threads operate in parallel to convert data, produced by one or more electronic design automation processes in an initial format, into another data format for output. A processing thread accesses a portion of the initial results data produced by one or more electronic design automation processes in an initial format and in an initial organizational arrangement. The processing thread will then store data within this portion of the initial results data belonging to a target category of the desired output organizational arrangement, such as a cell, at a memory location corresponding to that target category. It will also convert the stored data from a first data format to another data format for output. The first data format may use a relatively low amount of compression, with the second data format may use a relatively high level of compression. Each of a plurality of processing threads may operate in this manner in parallel upon portions of the initial results data, until all of the initial results data has been converted to the desired data format for output. A processing thread can then collect the converted data from the various memory locations, and provide it as output data for the electronic design automation process or processes.09-25-2008
20130173892CONVERT TO ZONED FORMAT FROM DECIMAL FLOATING POINT FORMAT - Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location.07-04-2013
20110246752Emulating Execution of An Instruction For Discovering Virtual Topology of a Logical Partitioned Computer System - In a logically partitioned host computer system comprising host processors (host CPUs), a facility and instruction for discovering topology of one or more guest processors (guest CPUs) of a guest configuration comprises a guest processor of the guest configuration fetching and executing a STORE SYSTEM INFORMATION instruction that obtains topology information of the computer configuration. The topology information comprising nesting information of processors of the configuration and the degree of dedication a host processor provides to a corresponding guest processor. The information is preferably stored in a single table in memory.10-06-2011
20080222400Power Consumption of a Microprocessor Employing Speculative Performance Counting - Reduction of power consumption and chip area of a microprocessor employing speculative performance counting, comprising splitting a counter and a backup register of a speculative counting mechanism performing the speculative performance counting into first and second parts each, re-using an available storage within the microprocessor as first parts respectively; integrating at least one dedicated pre-counter into the microprocessor as second parts respectively; splitting the data handled by the speculative counting mechanism in high-order and low-order bits; storing the high order bits in the first parts; storing the low order bits in the second parts; updating the first parts periodically; and saving and propagating the carry-out from the second parts to high-order bits when a corresponding first part of the second parts is next updated respectively.09-11-2008
20130145133PROCESSOR, APPARATUS AND METHOD FOR GENERATING INSTRUCTIONS - A processor, apparatus and method to use a multiple store instruction based on physical addresses of registers are provided. The processor is configured to execute an instruction to store data of a plurality of registers in a memory, the instruction including a first area in which a physical address of each of the registers is written. An instruction generating apparatus is configured to generate an instruction to store data of a plurality of registers in a memory, the instruction including a first area in which a physical address of each of the registers is written. An instruction generating method includes detecting a code area that instructs to store data of a plurality of registers in a memory, from a program code. The instruction generating method further includes generating an instruction corresponding to the code area by mapping physical addresses of the registers to a first area of the instruction.06-06-2013
20100287360Task Processing Device - The speed of task scheduling by a multitask OS is increased. A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The save circuit is provided with a plurality of save registers respectively associated with a plurality of tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. The task control circuit switches between tasks for execution upon receipt of the system call signal, by saving, in the save register associated with a task being executed, the data in the processing register, selecting a task to be executed next, and loading data in the save register associated with the selected task into the processing register.11-11-2010
20110258420EXECUTION MIGRATION - An execution migration approach includes bringing the computation to the locus of the data: when a memory instruction requests an address not cached by the current core, the execution context (current program counter, register values, etc.) moves to the core where the data is cached.10-20-2011
20130151821METHOD AND INSTRUCTION SET INCLUDING REGISTER SHIFTS AND ROTATES FOR DATA PROCESSING - A method includes identifying at least one first register with M bits and identifying at least one second register with N bits. The process also includes shifting K bits, where K≦N, from the second register into the first register. The shifting operation executes a left shift or a right shift operation. For a left shift operation, bits K . . . N−1 from the first register are read, the bits K . . . N−1 are written into bit positions 0 . . . N-k−1 of the first register, the K bits from the second register are read, and the K bits from the second register are written into bit positions N-K . . . N−1 of first register. The right shift includes reading bits 0 . . . N-K−1 from the first register, writing the bits 0 . . . N-K−1 into bit position K . . . N−1 of the first register, reading the K bits from the second register, and writing the K bits from second register into bit positions 0 . . . K−1 of first register.06-13-2013
20130185545HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system includes a processor core and a cache control unit. The processor core can be coupled to a first memory containing data and a second memory with a faster speed than the first memory, and is configured to execute a segment of instructions having at least one instruction accessing the data from the second memory using a base register. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill the data from the first memory to the second memory before the processor core executes the instruction accessing the data, and is further configured to examine the segment of instructions to extract instruction information containing at least data access instruction information and last register updating instruction information and to create a track corresponding to the segment of instructions based on the extracted instruction information.07-18-2013
20130185544PROCESSOR WITH INSTRUCTION VARIABLE DATA DISTRIBUTION - A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory. The loaded registers corresponding to different execution units. Each of the load units is configured to distribute the data to the registers in accordance with an instruction selectable distribution. The instruction selectable distribution specifies one of plurality of distributions. Each of the distributions specifies a data sequence that differs from the sequence in which the data is stored in memory.07-18-2013
20130124836CONSTANT DATA ACESSING SYSTEM AND METHOD THEREOF - A constant data accessing system having a constant pool comprises a computer processor having a constant pool base register, a compiler having a constant pool handler, and an instruction set module having a constant pool instruction set unit. The constant pool base register is configured to store a value of constant pool base address of one or a plurality of subroutines which have constants to be accessed.05-16-2013
20120030452MODIFYING COMMANDS - The present disclosure includes methods, devices, modules, and systems for modifying commands. One device embodiment includes a memory controller including a channel, wherein the channel includes a command queue configured to hold commands, and circuitry configured to modify at least a number of commands in the queue and execute the modified commands.02-02-2012

Patent applications in class Processing control for data transfer