Entries
Document | Title | Date |
20080201559 | Switching Device and Corresponding Method for Activating a Load - A cost-effective safety concept for safety-relevant applications in motor vehicles accordingly activates a load not directly from a central unit, but instead indirectly via a switching device. The latter has a first and a second register for the acquisition of the same control data from the central unit, and a third register for outputting data to the load. A transmission device transmits data from the second register to the third register. A first comparison logic compares the content of the second register with that of the third register and sends an interrupt to the central unit when the two contents are not identical. A second comparison logic compares the content of the first and second registers and enables the transmission device when the contents of the two registers are identical, and otherwise blocks the transmission device. The last held state is thus maintained in the event of an error. | 08-21-2008 |
20080209186 | Method for Reducing Buffer Capacity in a Pipeline Processor - The invention presents a method for a processor. | 08-28-2008 |
20080222400 | Power Consumption of a Microprocessor Employing Speculative Performance Counting - Reduction of power consumption and chip area of a microprocessor employing speculative performance counting, comprising splitting a counter and a backup register of a speculative counting mechanism performing the speculative performance counting into first and second parts each, re-using available storage within the microprocessor as the first parts; integrating at least one dedicated pre-counter into the microprocessor as the second parts; splitting the data handled by the speculative counting mechanism into high-order and low-order bits; storing the high-order bits in the first parts; storing the low-order bits in the second parts; updating the first parts periodically; and saving and propagating the carry-out from the second parts to the high-order bits when the corresponding first part is next updated. | 09-11-2008 |
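The split-counter scheme in entry 20080222400 is, in essence, a counter whose low-order bits are updated on every event while carries into the high-order bits are propagated only periodically. A minimal C sketch of that general idea (field widths and names are assumptions, not the patent's):

```c
#include <stdint.h>

/* Low-order bits live in a small dedicated pre-counter that is updated on
   every event; high-order bits live in re-used storage and are only touched
   when the saved carry-out is propagated during a periodic update. */
typedef struct {
    uint8_t  low;    /* pre-counter: low-order bits, updated per event */
    uint8_t  carry;  /* carry-outs saved but not yet propagated */
    uint64_t high;   /* high-order bits kept in re-used storage */
} split_counter_t;

static void count_event(split_counter_t *c) {
    if (++c->low == 0)   /* pre-counter wrapped: save the carry-out */
        c->carry++;
}

static void periodic_update(split_counter_t *c) {
    c->high += c->carry; /* propagate saved carries into the high-order part */
    c->carry = 0;
}

static uint64_t read_counter(const split_counter_t *c) {
    /* full value = (propagated + pending carries) * 256 + low bits */
    return ((c->high + c->carry) << 8) | c->low;
}
```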
20080235497 | Parallel Data Output - Multiple processing threads operate in parallel to convert data, produced by one or more electronic design automation processes in an initial format, into another data format for output. A processing thread accesses a portion of the initial results data produced by one or more electronic design automation processes in an initial format and in an initial organizational arrangement. The processing thread will then store data within this portion of the initial results data belonging to a target category of the desired output organizational arrangement, such as a cell, at a memory location corresponding to that target category. It will also convert the stored data from a first data format to another data format for output. The first data format may use a relatively low amount of compression, while the second data format may use a relatively high level of compression. Each of a plurality of processing threads may operate in this manner in parallel upon portions of the initial results data, until all of the initial results data has been converted to the desired data format for output. A processing thread can then collect the converted data from the various memory locations, and provide it as output data for the electronic design automation process or processes. | 09-25-2008 |
20080244242 | Using a Register File as Either a Rename Buffer or an Architected Register File - A computer implemented method, apparatus, and computer usable program code are provided for implementing a set of architected register files as a set of temporary rename buffers. An instruction dispatch unit receives an instruction that includes instruction data. The instruction dispatch unit determines a thread mode under which a processor is operating. Responsive to determining the thread mode, the instruction dispatch unit determines an ability to use the set of architected register files as the set of temporary rename buffers. Responsive to the ability to use the set of architected register files as the set of temporary rename buffers, the instruction dispatch unit analyzes the instruction to determine an address of an architected register file in the set of architected register files where the instruction data is to be stored. The architected register file operating as a temporary rename buffer stores the instruction data as finished data. | 10-02-2008 |
20080244243 | Computer program product and system for altering execution flow of a computer program - A debugger alters the execution flow of a child computer program of the debugger at runtime by inserting jump statements determined by the insertion of breakpoint instructions. Breakpoints are used to force the child computer program to throw exceptions at specified locations. One or more instructions of the computer program are replaced by jump instructions. The jump destination addresses associated with the break instructions can be specified by input from a user. The debugger changes the instruction pointer of the child program to achieve the desired change in execution flow. No instructions are lost in the child program. | 10-02-2008 |
20080250232 | Data Processing Device, Data Processing Program, and Recording Medium Recording Data Processing Program - A dependence relationship storage unit M indicates from which input address and input value each of the output addresses and output values derives. An inter-line AND comparator MR performs AND between each of the line components stored in the dependence relationship storage unit M and sets an I/O group including an output pattern containing at least one output address and output value and an input pattern containing at least one input address and input value. Thus, it is possible to provide a data processing device capable of registering an I/O group appropriate for reuse in instruction section storage means. | 10-09-2008 |
20080263337 | INSTRUCTIONS FOR ORDERING EXECUTION IN PIPELINED PROCESSES - Ordering instructions for specifying the execution order of other instructions improve throughput in a pipelined multiprocessor. Memory write operations local to a CPU are allowed to occur in an arbitrary order, and constraints are placed on shared memory operations. Multiple sets of instructions are provided in which the order of execution of the instructions is maintained through the use of CPU registers, write buffers in conjunction with assignment of sequence numbers to the instructions, or a hierarchical ordering system. The system ensures that an earlier designated instruction has reached a specified state of execution prior to a later instruction reaching a specified state of execution. The ordering of operations allows memory operations local to a CPU to occur in conjunction with other memory operations that are not affected by such execution. | 10-23-2008 |
20080288757 | Communicating Instructions and Data Between a Processor and External Devices - A mechanism for communicating instructions and data between a processor and external devices is provided. The mechanism makes use of a channel interface as the primary mechanism for communicating between the processor and a memory flow controller. The channel interface provides channels for communicating with processor facilities, memory flow control facilities, machine state registers, and external processor interrupt facilities, for example. These channels may be designated as blocking or non-blocking. With blocking channels, when no data is available to be read from the corresponding registers, or there is no space available to write to the corresponding registers, the processor is placed in a low power “stall” state. The processor is automatically awakened, via communication across the blocking channel, when data becomes available or space is freed. Thus, the channels of the present invention permit the processor to stay in a low power state. | 11-20-2008 |
20080294879 | Asynchronous Ripple Pipeline - An asynchronous ripple pipeline has a plurality of stages, each with a controller. | 11-27-2008 |
20080301415 | INFORMATION PROCESSING SYSTEM - An information processing system includes a first processor that accesses a first memory, a second processor that accesses a second memory, and a data transfer unit for executing data transfer between the first memory and the second memory. The first processor translates each instruction of a program other than memory access instructions into an instruction for the second processor, and translates each memory access instruction into an instruction sequence containing a call instruction of a program that transfers the access data on the first memory to the second memory via the data transfer unit. | 12-04-2008 |
20080301416 | SYSTEM AND PROGRAM PRODUCT OF DOING PACK UNICODE Z SERIES INSTRUCTIONS - Emulation methods are provided for two PACK instructions, one for Unicode data and the other for ASCII coded data in which processing is carried out in a block-by-block fashion as opposed to a byte-by-byte fashion as a way to provide superior performance in the face of the usual challenges facing the execution of emulated data processing machine instructions as opposed to native instructions. | 12-04-2008 |
20080307207 | DATA EXCHANGE AND COMMUNICATION BETWEEN EXECUTION UNITS IN A PARALLEL PROCESSOR - A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes. | 12-11-2008 |
20080313439 | PIPELINE DEVICE WITH A PLURALITY OF PIPELINED PROCESSING UNITS - In a pipeline device, the output of each of processing units is connected to a corresponding one of data output lines of data transfer lines. Input selectors are provided for the processing units, respectively. Each input selector selects one of the data transfer lines except for one data output line to which the output of a corresponding one processing unit is connected to thereby determine one of interconnection patterns among the processing units. The interconnection patterns correspond to data-processing tasks, respectively. Each input selector inputs, to a corresponding one of the processing units, data flowing through the selected one of the data transfer lines. Each processing unit individually performs a predetermined process based on data inputted thereto by a corresponding one of the input selectors to thereby perform, in pipeline, one of the data-processing tasks corresponding to the determined one of the interconnection patterns. | 12-18-2008 |
20080313440 | SWITCHING TO ORIGINAL CODE COMPARISON OF MODIFIABLE CODE FOR TRANSLATED CODE VALIDITY WHEN FREQUENCY OF DETECTING MEMORY OVERWRITES EXCEEDS THRESHOLD - A method of translating instructions from a target instruction set to a host instruction set. In one embodiment, a plurality of first target instructions is translated into a plurality of first host instructions. After the translation, it is determined whether the plurality of first target instructions has changed. A copy of a second plurality of target instructions is stored and compared with the plurality of first target instructions if the determining slows the operation of the computer system. After comparing, the plurality of first host instructions is invalidated if there is a mismatch. According to one embodiment, the storing, the comparing and the invalidating are initiated when the determining indicates that a page contains at least one change to the plurality of first target instructions. In one embodiment, the determining is by examining a bit indicator associated with a memory location of the plurality of first target instructions. | 12-18-2008 |
20080313441 | METHOD AND STRUCTURE FOR PRODUCING HIGH PERFORMANCE LINEAR ALGEBRA ROUTINES USING REGISTER BLOCK DATA FORMAT ROUTINES - A method (and structure) of executing a matrix operation, includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks is stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different relative to its original position in the matrix A. | 12-18-2008 |
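The register-block format in entry 20080313441 is a standard blocking transformation: a matrix is stored block by block so that each p-by-q tile is contiguous in memory. An illustrative C routine under an assumed row-major layout (names mine, not the patent's):

```c
/* Copy a row-major m x n matrix A into contiguous p-by-q blocks.
   Assumes p divides m and q divides n; 'blocked' must hold m*n doubles. */
void to_block_format(const double *A, double *blocked,
                     int m, int n, int p, int q) {
    double *out = blocked;
    for (int bi = 0; bi < m; bi += p)          /* block row */
        for (int bj = 0; bj < n; bj += q)      /* block column */
            for (int i = 0; i < p; i++)        /* rows inside the block */
                for (int j = 0; j < q; j++)    /* columns inside the block */
                    *out++ = A[(bi + i) * n + (bj + j)];
}
```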
20090006823 | DESIGN STRUCTURE FOR SINGLE HOT FORWARD INTERCONNECT SCHEME FOR DELAYED EXECUTION PIPELINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding data in a processor is provided. The design structure includes a processor. The processor includes at least one cascaded delayed execution pipeline unit having a first and second pipeline, wherein the second pipeline is configured to execute instructions in a common issue group in a delayed manner relative to the first pipeline, and circuitry. The circuitry is configured to determine if a first instruction being executed in the first pipeline modifies data in a data register which is accessed by a second instruction being executed in the second pipeline, and if the first instruction being executed in the first pipeline modifies data in the data register which is accessed by the second instruction being executed in the second pipeline, forward the modified data from the first pipeline to the second pipeline. | 01-01-2009 |
20090006824 | STRUCTURE FOR A CIRCUIT FUNCTION THAT IMPLEMENTS A LOAD WHEN RESERVATION LOST INSTRUCTION TO PERFORM CACHELINE POLLING - A design structure for a circuit function that implements a load when reservation lost instruction for performing cacheline polling is disclosed. Initially, a first process requests an action to be performed by a second process. The request is made via a store operation to a cacheable memory location. The first process then reads the cacheable memory location via a conditional load operation to determine whether or not the requested action has been completed by the second process, and the first process sets a reservation at the cacheable memory location if the requested action has not been completed by the second process. The conditional load operation of the first process is stalled until the reservation at the cacheable memory location has been lost. After the requested action has been completed, the reservation in the cacheable memory location is reset by the second process. | 01-01-2009 |
20090013156 | PROCESSOR COMMUNICATION TOKENS - The invention provides a method of transmitting messages over an interconnect between processors, each message comprising a header token specifying a destination processor and at least one of a data token and a control token. The method comprises: executing a first instruction on a first one of the processors to generate a data token comprising a byte of data and at least one additional bit to identify that token as a data token, and outputting the data token from the first processor onto the interconnect as part of one of the messages. The method also comprises executing a second instruction on said first processor to generate a control token comprising a byte of control information and at least one additional bit to identify that token as a control token, and outputting the control token from the first processor onto the interconnect as part of one of the messages. | 01-08-2009 |
20090013157 | Management of Software Implemented Services in Processor-Based Devices - A service management system for devices with embedded processor systems manages use of memory by programs implementing the services by assigning services to classes and limiting the number of services per class that can be loaded into memory. Classes enable predictable and stable system behavior. The services and service classes are defined in a manifest that is downloaded to embedded devices operating on a network, such as a cable or satellite television network or a telephone or computer network, permitting a system operator, administrator, or manager to manage the operation of the embedded devices while deploying new services implemented with applications downloaded from the network when the service is requested by a user. | 01-08-2009 |
20090013158 | System and Method for Assigning Tags to Control Instruction Processing in a Superscalar Processor - A tag monitoring system for assigning tags to instructions, embodied in software on a tangible computer-readable storage medium. A source supplies instructions to be executed by a functional unit. A queue has a plurality of slots containing tags which are used for tagging instructions. A register file stores information required for the execution of each instruction at a location in the register file defined by the tag assigned to that instruction. A control unit monitors the completion of executed instructions and advances the tags in the queue upon completion of an executed instruction. The register file also contains a plurality of read address enable ports and corresponding read output ports. Each of the slots from the queue is coupled to a corresponding one of the read address enable ports. Thus, the information for each instruction can be read out of the register file in program order. | 01-08-2009 |
20090013159 | Queue Processor And Data Processing Method By The Queue Processor - A queue processor and its data processing method are provided. The processor can perform high-speed data processing with decreased electric energy consumption. | 01-08-2009 |
20090019269 | Methods and Apparatus for a Bit Rake Instruction - Techniques for performing a bit rake instruction in a programmable processor. The bit rake instruction extracts an arbitrary pattern of bits from a source register, based on a mask provided in another register, and packs and right justifies the bits into a target register. The bit rake instruction allows any set of bits from the source register to be packed together. | 01-15-2009 |
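The bit-rake operation of entry 20090019269 (extract the source bits selected by a mask, then pack them right-justified) can be spelled out bit by bit in portable C; on x86, the BMI2 PEXT instruction computes a comparable result in hardware. A sketch, with the function name invented here:

```c
#include <stdint.h>

/* Gather the source bits selected by 'mask' and pack them, right-justified,
   into the low end of the result. */
uint64_t bit_rake(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    int out = 0;                            /* next free target bit */
    for (int bit = 0; bit < 64; bit++) {
        if (mask & (1ULL << bit))           /* mask selects this source bit */
            result |= ((src >> bit) & 1ULL) << out++;
    }
    return result;
}
```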
20090031118 | Apparatus and method for controlling order of instruction - An apparatus includes an instruction generator which generates a load instruction and a first store instruction from a program, and a processor which executes said load and store instructions, wherein said instruction generator analyzes the relevancy between said load instruction and said first store instruction with respect to the memory addresses accessed by said instructions, specifies a second store instruction irrelevant to said load instruction with respect to said memory address, and notifies said processor of said second store instruction, and wherein said processor executes said load instruction in advance of said second store instruction while said processor prepares to execute said second store instruction. | 01-29-2009 |
20090031119 | Method for the operation of a multiprocessor system in conjunction with a medical imaging system - The invention relates to a method for operating a multiprocessor system, especially in conjunction with a medical imaging system. The invention also relates to a medical imaging device which is designed to perform this method. The multiprocessor system in this case has at least two processing units, at least one control unit, and operations which can be allocated to the processing units. Data provided from an input is processed by the processing units and made available at an output. The at least one control unit enhances said data with control data, which defines an allocation of the data to the respective operations for the purposes of processing. | 01-29-2009 |
20090037701 | Method of Updating Electronic Operating Instructions of a Vehicle and an Operating Instructions Updating System - A method and system for updating electronic operating instructions of a vehicle is provided. Local operating instruction data objects are stored in a local storage device arranged in the vehicle so that they can be used by the driver. Corresponding current operating instruction data objects are stored in an external storage device. One data object category, respectively, is assigned to the operating instruction data objects. For updating, a current operating instruction data object is transmitted from the external storage device to the local storage device in order to modify the corresponding local operating instruction data object in the local storage device. The frequency of the updating of a local operating instruction data object depends on the data object category assigned to the data object. | 02-05-2009 |
20090037702 | Processor and data load method using the same - A processor includes an instruction decoder, an instruction execution part and a register file. The instruction decoder is adapted to decode an instruction. The instruction execution part is adapted to execute processing corresponding to the instruction decoded by the instruction decoder. The register file is capable of storing load data from a data memory and supplying input data to the instruction execution part. The register file includes a plurality of registers, each of which is capable of holding a plurality of bits of data. Furthermore, the register file is configured to update the data held by the plurality of registers by shifting the data held by the plurality of registers among the plurality of registers. | 02-05-2009 |
20090049284 | Parallel Subword Instructions With Distributed Results - The present invention provides for parallel subword instructions that cause results to be non-contiguously stored in a result register. For example, a targeting-type instruction can specify (implicitly or explicitly) a bit position, and the result of each of the parallel subword compare operations can be stored at that bit position within the respective subword location of a result register. Alternatively, for a shifting-type instruction, pre-existing contents of a result register can be shifted one bit toward greater significance while the results of the present operation are stored in the least-significant bits of respective result-register subword locations. This approach enables the results of multiple parallel subword compare instructions to be combined with relatively few instructions and reduces the maximum lateral movement of information, both of which can enhance performance. | 02-19-2009 |
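The shifting-type variant in entry 20090049284 can be modeled directly: each subword of the accumulator shifts one bit toward greater significance and takes the new compare bit in its least-significant position, so several compares accumulate without separate packing instructions. A C sketch under assumed 8-bit subwords:

```c
#include <stdint.h>

/* One shifting-type step: each 8-bit subword of the accumulator shifts
   one bit up and receives the new compare result (a == b per subword)
   in its least-significant bit. */
uint64_t subword_cmp_shift(uint64_t acc, uint64_t a, uint64_t b) {
    uint64_t out = 0;
    for (int s = 0; s < 8; s++) {                   /* eight 8-bit subwords */
        uint8_t sa  = (uint8_t)(a   >> (8 * s));
        uint8_t sb  = (uint8_t)(b   >> (8 * s));
        uint8_t old = (uint8_t)(acc >> (8 * s));
        uint8_t now = (uint8_t)((old << 1) | (sa == sb));
        out |= (uint64_t)now << (8 * s);
    }
    return out;
}
```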
20090049285 | INFORMATION DELIVERY APPARATUS, INFORMATION REPRODUCTION APPARATUS, AND INFORMATION PROCESSING METHOD - An information delivery apparatus includes an encoding information collection unit which collects information used to encode content information, a generation unit which predicts decode processes of the content information based on the collected information, and generates configuration information used to configure data paths required to execute the decode processes, an embedding unit which embeds the configuration information in the content information, and a delivery unit which delivers the content information embedded with the configuration information. | 02-19-2009 |
20090063828 | Systems and Methods for Communication between a PC Application and the DSP in a HDA Audio Codec - Systems and methods implemented in a PC for enabling communication between an application executing on the CPU and a DSP that is incorporated into a codec in the High Definition Audio (HDA) system, wherein the communication is carried out via the HDA bus. In one embodiment, an HDA codec includes one or more conventional HDA widgets coupled to a programmable processor such as a DSP. The codec includes a set of registers that are configured to store HDA verbs and data transmitted via the HDA bus. The programmable processor is configured to identify verbs that indicate associated information is a communication from an application executing on the CPU, read the associated information, and process the information according to the associated verbs. The information may be program instructions, parametric data, requests for information, etc. | 03-05-2009 |
20090089558 | ADJUSTMENT OF DATA COLLECTION RATE BASED ON ANOMALY DETECTION - Systems and methods that vary multiple data sampling rates to collect sets of data with different levels of granularity for an industrial system. The data for such an industrial system includes sets of data from “internal” data stream(s) (e.g., history data collected from an industrial unit) and sets of data from “external” data stream(s) (e.g., traffic data on network services), based in part on the criticality/importance criteria assigned to each collection stage. Each set of data can be assigned its own unique data collection rate. For example, a higher sample rate can be employed when collecting data from the network during an operation stage that is deemed more critical (e.g., dynamic attribution of predetermined importance factors) than the rest of the operation. | 04-02-2009 |
20090089559 | METHOD OF MANAGING DATA MOVEMENT AND CELL BROADBAND ENGINE PROCESSOR USING THE SAME - A method of managing data movement in a cell broadband engine processor, comprising: determining one or more idle synergistic processing elements among multiple SPEs in the cell broadband engine processor as a managing SPE, and informing a computing SPE among said multiple SPEs of a starting effective address of a LS of said managing SPE and an effective address for a command queue; and said managing SPE managing movement of data associated with computing of said computing SPE based on the command queue from the computing SPE. | 04-02-2009 |
20090094442 | Storage medium storing load detecting program and load detecting apparatus - A load detecting apparatus includes a load controller and judges a motion of a player on the basis of detected load values. The judgment timing for a motion of putting the feet on and taking them off the controller is decided on the basis of the elapsed time from the instruction of the motion. When a step-up-and-down exercise is performed, the judgment timing of the motion that brings about the state in which both feet are put down on the ground at the fourth step is decided on the basis of the judgment timing of the motion of putting the third step down. | 04-09-2009 |
20090113187 | Processor architecture for executing instructions using wide operands - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers. | 04-30-2009 |
20090113188 | COORDINATOR SERVER, DATABASE SERVER, AND PIPELINE PROCESSING CONTROL METHOD - A first transmitting unit transmits a processing command to a plurality of parallelized database servers. A second transmitting unit integrates data sets transmitted from the database servers in response to the processing command, and transmits an integrated data set to a client. An integrating unit integrates data sets buffered in a buffer unit. A determining unit determines a transmission start or a transmission suspend of the data sets based on a data size in the buffer unit. A third transmitting unit transmits a control command for the transmission start or the transmission suspend to the database servers based on a result of determination by the determining unit. | 04-30-2009 |
20090125706 | Software Pipelining on a Network on Chip - A network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, with each memory communications controller controlling communications between an IP block and memory, and each network interface controller controlling inter-IP block communications through routers, the NOC also including a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, with each stage executing on a thread of execution on an IP block. | 05-14-2009 |
20090132793 | System and Method of Selectively Accessing a Register File - In a particular embodiment, a method is disclosed that includes identifying a first block of bits within a result to be written to a destination register by an execution unit. The result includes a plurality of bits having the first block of bits and a second block of bits. The first block of bits has a value of zero. The method further includes providing an encoded bit value representing the first block of bits to a control register and selectively writing the second block of bits, but not the first block of bits, to the destination register. The destination register is sized to receive the first and second blocks of bits. | 05-21-2009 |
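The selective write of entry 20090132793 amounts to recording, in a control code, which block of the result was all zeros and writing only the other block. A toy C model (the code values and the 32-bit block split are my assumptions, not the patent's encoding):

```c
#include <stdint.h>

typedef struct {
    uint32_t stored; /* the block actually written to the destination */
    uint8_t  code;   /* control-register code: 0 = no zero block,
                        1 = high block zero, 2 = low block zero */
} sel_write_t;

sel_write_t sel_write(uint64_t result) {
    sel_write_t w;
    if ((uint32_t)(result >> 32) == 0) { w.code = 1; w.stored = (uint32_t)result; }
    else if ((uint32_t)result == 0)    { w.code = 2; w.stored = (uint32_t)(result >> 32); }
    else                               { w.code = 0; w.stored = 0; /* full-width write path */ }
    return w;
}

uint64_t sel_read(const sel_write_t *w) {
    if (w->code == 1) return (uint64_t)w->stored;        /* high block was zero */
    if (w->code == 2) return (uint64_t)w->stored << 32;  /* low block was zero */
    return 0;  /* code 0: value comes from the ordinary full-width write */
}
```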
20090132794 | Method and apparatus for performing complex calculations in a multiprocessor array - A method and apparatus for performing complex mathematical calculations. The apparatus includes a multicore processor. | 05-21-2009 |
20090138687 | MEMORY DEVICE HAVING DATA PROCESSING FUNCTION - A memory device having a data processing function is disclosed. The memory device can include a process area, in which process command information is written by a processor; a storage area, in which one or more data is written; an output area, in which display data selected by the processor from among the data written in the storage area is written; and a processing unit, which performs one or more processes of copying data, computing data, and transmitting display data to an external outputting device, in correspondence with the process command information. According to some aspects of the present invention, the memory device is able to independently perform commands received from the processor, and does not require a separate memory for storing data that will be transmitted to an external outputting device, so that the processing efficiency of the processor can be enhanced. | 05-28-2009 |
20090158014 | System and Method for Retiring Approximately Simultaneously a Group of Instructions in a Superscalar Microprocessor - An apparatus and method for executing instructions having a program order. The apparatus comprises a temporary buffer, tag assignment logic, a plurality of functional units, a plurality of data paths, a register array, a retirement control block, and a superscalar instruction retirement unit. The temporary buffer includes a plurality of temporary buffer locations to store result data for executed instructions, wherein the temporary buffer locations are arranged in a plurality of groups of temporary buffer locations. The tag assignment logic is configured to concurrently assign a tag to each instruction in a first set of instructions, wherein the tags are assigned such that the respective tag assigned to each of the instructions in the first set of instructions identifies a different one of the temporary buffer locations in a first one of the groups of temporary buffer locations. | 06-18-2009 |
20090164763 | METHOD AND APPARATUS FOR A DOUBLE WIDTH LOAD USING A SINGLE WIDTH LOAD PORT - A single micro-instruction to perform either an N-bit or a 2N-bit load is provided. A microprocessor having an N-bit load port performs either an N-bit load or a 2N-bit load in a single cycle with the same micro-instruction being used for both the N-bit and the 2N-bit load. | 06-25-2009 |
20090172363 | MIXING INSTRUCTIONS WITH DIFFERENT REGISTER SIZES - When legacy instructions, which can only operate on smaller registers, are mixed with new instructions in a processor with larger registers, special handling and architecture are used to prevent the legacy instructions from causing problems with the data in the upper portion of the registers, i.e., the portion that they cannot directly access. In some embodiments, the upper portions of the registers are saved to temporary storage while the legacy instructions are operating, and restored to the upper portions of the registers when the new instructions are operating. A special instruction may also be used to disable this save/restore operation if the new instructions are not going to use the upper part of the registers. | 07-02-2009 |
20090172364 | DEVICE, SYSTEM, AND METHOD FOR GATHERING ELEMENTS FROM MEMORY - A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed. | 07-02-2009 |
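The completion mask of entry 20090172364 lets a gather resume after a fault: each mask field starts in the "not yet written" state and flips once its element lands in the destination, so a re-run fetches only the outstanding elements. A minimal C sketch, with the 1/0 encoding and array types assumed:

```c
#include <stdint.h>
#include <stddef.h>

/* mask[i] == 1: element i not yet written; 0: already written. */
void masked_gather(const int32_t *mem, const uint32_t *indices,
                   uint8_t *mask, int32_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (mask[i]) {
            dst[i]  = mem[indices[i]];  /* gather this element */
            mask[i] = 0;                /* flip the field: done */
        }
    }
}
```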
20090172365 | Instructions and logic to perform mask load and store operations - In one embodiment, logic is provided to receive and execute a mask move instruction to transfer a vector data element including a plurality of packed data elements from a source location to a destination location, subject to mask information for the instruction. Other embodiments are described and claimed. | 07-02-2009 |
20090172366 | Enabling permute operations with flexible zero control - In one embodiment, the present invention includes logic to receive a permute instruction, first and second source operands, and control values, and to perform a permute operation based on an operation between at least two of the control values. Multiple permute instructions may be combined to perform efficient table lookups. Other embodiments are described and claimed. | 07-02-2009 |
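A permute with zero control, as in entry 20090172366, typically lets each control value either select a source element or force the output element to zero, which is what makes masked table lookups cheap. An illustrative byte-granularity C sketch (control encoding assumed, not taken from the patent):

```c
#include <stdint.h>

/* Each control byte either selects a source byte (low 4 bits) or, when
   its high bit is set, forces the output byte to zero. */
void permute_with_zero(const uint8_t src[16], const uint8_t ctl[16],
                       uint8_t dst[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
}
```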
20090172367 | PROCESSING UNIT - A processing unit has an extended register to which instruction extension information indicating an extension of an instruction can be set. When instruction extension information is set in the extended register, an operation unit executes a subsequent instruction following a first instruction that writes the instruction extension information into the extended register, extending the subsequent instruction based on the instruction extension information. | 07-02-2009 |
20090182992 | Load Relative and Store Relative Facility and Instructions Therefore - A method, system and program product for loading or storing memory data wherein the address of the memory operand is based on an offset from the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction, which is sign-extended and aligned as a halfword address when added to the value of the program counter. | 07-16-2009 |
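As a worked example of that address formation (framing mine, not the architecture's formal definition), the immediate is sign-extended and scaled to halfwords before being added to the program counter:

```c
#include <stdint.h>

/* Effective address = PC + (sign-extended immediate * 2), i.e. the
   immediate counts halfwords. E.g. imm = -3 addresses pc - 6. */
uint64_t load_relative_addr(uint64_t pc, int32_t imm) {
    return pc + (uint64_t)((int64_t)imm * 2);
}
```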
20090193235 | TRANSFER SYSTEM, AND TRANSFER METHOD - In response to a transfer request, for which a loading time at a transfer source and a loading time at a transfer target are designated by a production controller, a transfer scenario is created, which contains a basic transfer (From) from the transfer source to a buffer near the transfer target, for example, and a basic transfer (To) from the buffer to the transfer target. So that the basic transfers (From, To) can be executed, the buffer is reserved and a transfer vehicle is allocated. The time period for the transfer vehicle to run to the transfer source or the buffer and the time period for the transfer vehicle to run from the transfer source or the buffer are estimated in order to assign a transfer command to the transfer vehicle. The possibility that the loading or the loading time may deviate from the designated period is evaluated. If this possibility is high, the production controller is informed that a just-in-time transfer is difficult. | 07-30-2009 |
20090193236 | CONDITIONAL MEMORY ORDERING - A system for conditional memory ordering implemented in a multiprocessor environment. A conditional memory ordering instruction executes locally using a release vector containing release numbers for each processor in the system. The instruction first determines whether the processor identifier of the release number is associated with the current processor. Where it is not, a conditional register is examined and appropriate remote synchronization operations are commanded where necessary. | 07-30-2009 |
20090198975 | TERMINATION OF IN-FLIGHT ASYNCHRONOUS MEMORY MOVE - A data processing system has a processor, a memory, and an instruction set architecture (ISA) that includes: (1) an asynchronous memory mover (AMM) store (ST) instruction that initiates an asynchronous memory move operation that moves data from a first memory location having a first real address to a second memory location having a second real address by: (a) first performing a move of the data in virtual address space utilizing a source effective address and a destination effective address; and (b) when the move is completed, completing a physical move of the data to the second memory location, independent of the processor. The ISA further provides (2) an AMM terminate ST instruction for stopping an ongoing AMM operation before completion of the AMM operation, and (3) a LD CMP instruction for checking the status of an AMM operation. | 08-06-2009 |
20090198976 | METHOD AND STRUCTURE FOR HIGH-PERFORMANCE MATRIX MULTIPLICATION IN THE PRESENCE OF SEVERAL ARCHITECTURAL OBSTACLES - A method (and apparatus) for processing data on a computer having a memory to store the data and a processing unit to execute the processing, the processing unit having a plurality of registers available for an internal working space for a data processing occurring in the processing unit, includes configuring the plurality of registers to include at least two sets of registers. A first set of the at least two sets interfaces with the processing unit for the data processing in a current processing cycle. A second set of the at least two sets is used for removing data from the processing unit of a previous processing cycle to be stored in the memory and preloading data into the processing unit from the memory, to be used for a next processing cycle. | 08-06-2009 |
20090204794 | METHODS COMPUTER PROGRAM PRODUCTS AND SYSTEMS FOR UNIFYING PROGRAM EVENT RECORDING FOR BRANCHES AND STORES IN THE SAME DATAFLOW - The present invention relates to a method for the unification of PER branch and PER store operations within the same dataflow. The method comprises determining a PER range, the PER range comprising a storage area defined by a designated storage starting area and a designated storage ending area, wherein the storage starting area is designated by the value of the contents of a first control register and the storage ending area is designated by the value of the contents of a second control register. The method also comprises retrieving register field content values that are stored at a plurality of registers, wherein the retrieved content values comprise a length field content value, and setting the length field content value to zero for a PER branch instruction, thereby enabling a PER branch instruction to be performed similarly to a PER storage instruction. | 08-13-2009 |
20090210679 | PROCESSOR AND METHOD FOR STORE DATA FORWARDING IN A SYSTEM WITH NO MEMORY MODEL RESTRICTIONS - A pipelined microprocessor includes circuitry for store forwarding by performing: for each store request, and while a write to one of a cache and a memory is pending; obtaining the most recent value for at least one complete block of data; merging store data from the store request with the complete block of data thus updating the block of data and forming a new most recent value and an updated complete block of data; and buffering the updated complete block of data into a store data queue; for each load request, where the load request may require at least one updated completed block of data: determining if store forwarding is appropriate for the load request on a block-by-block basis; if store forwarding is appropriate, selecting an appropriate block of data from the store data queue on a block-by-block basis; and forwarding the selected block of data to the load request. | 08-20-2009 |
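A software model of block-based store forwarding in the spirit of entry 20090210679 might look as follows; the 16-byte block size, single-entry queue, and helper names are illustrative assumptions:

```c
#include <stdint.h>
#include <string.h>

#define BLK 16  /* assumed block size */

typedef struct { uint64_t blk_addr; uint8_t data[BLK]; } sq_entry_t;

/* Merge a store's bytes into the most recent value of its whole block
   (off + len must not exceed BLK) and buffer the updated block. */
void sq_push(sq_entry_t *e, uint64_t blk_addr, const uint8_t *recent_block,
             unsigned off, const uint8_t *store_data, unsigned len) {
    e->blk_addr = blk_addr;
    memcpy(e->data, recent_block, BLK);      /* most recent block value */
    memcpy(e->data + off, store_data, len);  /* overlay the store's bytes */
}

/* Forward the updated block to a load that hits it; returns 0 on miss. */
int sq_forward(const sq_entry_t *e, uint64_t blk_addr, uint8_t *out) {
    if (e->blk_addr != blk_addr)
        return 0;
    memcpy(out, e->data, BLK);
    return 1;
}
```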
20090249042 | GATEWAY APPARATUS, CONTROL INSTRUCTION PROCESSING METHOD, AND PROGRAM - A gateway apparatus includes a translator connected to a first network for one or more controllers to control one or more devices, and one or more aggregators. The translator includes an acquisition unit which acquires load information concerning a load on each of the controllers, a control instruction reception unit which receives a control instruction for a device from a client via the second network, a determination unit which determines whether the instruction is an aggregation target, based on the information, a first transfer unit which transfers the instruction to the aggregator corresponding to the instruction, and a second transfer unit which receives an aggregate control instruction from the aggregator. The aggregator includes a third transfer unit which receives the instruction from the translator, an aggregation unit which aggregates the plurality of instructions into one aggregate control instruction, and a fourth transfer unit which transfers the aggregate control instruction. | 10-01-2009 |
20090254736 | Data processing system for performing data rearrangement operations - An apparatus for processing data is provided comprising rearrangement circuitry having a plurality of rearrangement stages for rearranging a plurality N of input data elements, each rearrangement stage comprising at most N multiplexers arranged to select between M data elements, where M is an integer less than N. Control circuitry is provided that is responsive to program instructions to control the rearrangement circuitry to perform rearrangement operations. The rearrangement circuitry is configurable by the control circuitry to perform a plurality of different rearrangement operations. The rearrangement circuitry comprises main rearrangement circuitry having a plurality of rearrangement stages in which there is a unique path between any given input element and any given output element and supplementary rearrangement circuitry in which from each input data element there is a path to at most C output data elements, where 1 | 10-08-2009 |
20090254737 | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SUPPORTING SERVER AND INFORMATION PROCESSING SYSTEM - An information processing device or the like is provided to output information in a form suitable from the viewpoint of not troubling the user. In the information processing system, whether a user issues an output instruction or not is confirmed only with respect to information that is not extracted out of the information stored in a first storing unit. | 10-08-2009 |
20090282225 | STORE QUEUE - Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache. If so, the system conditionally forwards data from the unrestricted entry or the restricted entry by: (1) forwarding data from an unrestricted entry that contains the youngest store that satisfies the load instruction when any number of unrestricted or restricted entries contain data that satisfies the load instruction; (2) forwarding data from an unrestricted entry when only one restricted entry and no unrestricted entries contain data that satisfies the load instruction; and (3) deferring the load instruction by placing the load instruction in a deferred queue when two or more restricted entries and no unrestricted entries contain data that satisfies the load instruction. | 11-12-2009 |
20090282226 | Context Switching On A Network On Chip - A network on chip (‘NOC’) that includes IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to the network by an application messaging interconnect including an inbox and an outbox, one or more of the IP blocks including computer processors supporting a plurality of threads, the NOC also including an inbox and outbox controller configured to set pointers to the inbox and outbox, respectively, that identify valid message data for a current thread; and software running in the current thread that, upon a context switch to a new thread, is configured to: save the pointer values for the current thread, and reset the pointer values to identify valid message data for the new thread, where the inbox and outbox controller are further configured to retain the valid message data for the current thread in the boxes until context switches again to the current thread. | 11-12-2009 |
20090287911 | PROGRAMMABLE SIGNAL AND PROCESSING CIRCUIT AND METHOD OF DEPUNCTURING - A programmable signal processing circuit has an instruction processing circuit. | 11-19-2009 |
20090292905 | Performing An Allreduce Operation On A Plurality Of Compute Nodes Of A Parallel Computer - Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer, each node including at least two processing cores. The methods include: performing, for each node, a local reduction operation using allreduce contribution data for the cores of that node, yielding, for each node, a local reduction result for one or more representative cores for that node; establishing one or more logical rings among the nodes, each logical ring including only one of the representative cores from each node; performing, for each logical ring, a global allreduce operation using the local reduction result for the representative cores included in that logical ring, yielding a global allreduce result for each representative core included in that logical ring; and performing, for each node, a local broadcast operation using the global allreduce results for each representative core on that node. | 11-26-2009 |
20090300337 | Instruction set design, control and communication in programmable microprocessor cases and the like - Improved instruction set and core design, control and communication for programmable microprocessors is disclosed, involving the strategy for replacing centralized program sequencing in present-day and prior art processors with a novel distributed program sequencing wherein each functional unit has its own instruction fetch and decode block, and each functional unit has its own local memory for program storage; and wherein computational hardware execution units and memory units are flexibly pipelined as programmable embedded processors with reconfigurable pipeline stages of different order in response to varying application instruction sequences that establish different configurations and switching interconnections of the hardware units. | 12-03-2009 |
20090307467 | Performing An Allreduce Operation On A Plurality Of Compute Nodes Of A Parallel Computer - Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node. | 12-10-2009 |
20090313459 | SYSTEM AND METHOD FOR PROCESSING LOW DENSITY PARITY CHECK CODES USING A DETERMINISTIC CACHING APPARATUS - A system, method and article of manufacture are disclosed for processing Low Density Parity Check (LDPC) codes. The system comprises a multitude of processing units for processing the codes; and a processor chip including an on-chip, multi-port data cache for temporarily storing the LDPC codes. This data cache includes a plurality of input ports for receiving the LDPC codes from some of the processing units, and a plurality of output ports for sending the LDPC codes to others of the processing units. An off-chip, external memory stores the LDPC codes and transmits the LDPC codes to and receives the LDPC codes from at least some of the processing units. A sequence processor controls the transmission of the LDPC codes between the processor units and the on-chip data cache so that the LDPC codes are processed by the processing units according to a given sequence. | 12-17-2009 |
20090319760 | SINGLE-CYCLE LOW POWER CPU ARCHITECTURE - An architecture for implementing an instruction pipeline within a CPU comprises an arithmetic logic unit (ALU), an address arithmetic unit (AAU), a program counter (PC), and a read-only memory (ROM) coupled to the program counter, to an instruction register, and to an instruction decoder coupled to the arithmetic logic unit. A random access memory (RAM) is coupled to the instruction decoder, to the arithmetic logic unit, and to a RAM address register. | 12-24-2009 |
20090327666 | METHOD AND SYSTEM FOR HARDWARE-BASED SECURITY OF OBJECT REFERENCES - A method for managing data, including obtaining a first instruction for moving a first data item from a first source to a first destination, determining a data type of the first data item, determining a data type supported by the first destination, comparing the data type of the first data item with the data type supported by the first destination to test a validity of the first instruction, and moving the first data item from the first source to the first destination based on the validity of the first instruction. | 12-31-2009 |
20090327667 | System and Method to Perform Fast Rotation Operations - Systems and methods to perform fast rotation operations are disclosed. In a particular embodiment, a method includes executing a single instruction. The method includes receiving first data indicating a first coordinate and a second coordinate, receiving a first control value that indicates a first rotation value selected from a set of ninety degree multiples, and writing output data corresponding to the first data rotated by the first rotation value. | 12-31-2009 |
20090327668 | Multi-Threaded Processes For Opening And Saving Documents - Tools and techniques are described for multi-threaded processing for opening and saving documents. These tools may provide load processes for reading documents from storage devices, and for loading the documents into applications. These tools may spawn a load process thread for executing a given load process on a first processing unit, and an application thread may execute a given application on a second processing unit. A first pipeline may be created for executing the load process thread, with the first pipeline performing tasks associated with loading the document into the application. A second pipeline may be created for executing the application process thread, with the second pipeline performing tasks associated with operating on the documents. The tasks in the first pipeline are configured to pass tokens as input to the tasks in the second pipeline. | 12-31-2009 |
20100005279 | Data processor - The data processor executes an instruction having a direction for write to a reference register of another instruction flow and an instruction having a direction for reference register invalidation. The data processor is arranged as a data processor having typical functions as an integrated whole of processors. | 01-07-2010 |
20100049953 | DATA CACHE RECEIVE FLOP BYPASS - A microprocessor includes an N-way cache and a logic block that selectively enables and disables the N-way cache for at least one clock cycle if a first register load instruction and a second register load instruction, following the first register load instruction, are detected as pointing to the same index line in which the requested data is stored. The logic block further provides a disabling signal to the N-way cache for at least one clock cycle if the first and second instructions are detected as pointing to the same cache way. | 02-25-2010 |
20100070742 | EMBEDDED-DRAM DSP ARCHITECTURE HAVING IMPROVED INSTRUCTION SET - An embedded-DRAM processor architecture includes a DRAM array, a set of register files, a set of functional units, and a data assembly unit. The data assembly unit includes a set of row-address registers and is responsive to commands to activate and deactivate DRAM rows and to control the movement of data throughout the system. A pipelined data assembly approach allows the functional units to perform register-to-register operations and allows the data assembly unit to perform all load/store operations using wide data busses. Data masking and switching hardware allows individual data words or groups of words to be transferred between the registers and memory. Other aspects of the disclosure include a memory and logic structure and an associated method to extract data blocks from memory to accelerate, for example, operations related to image compression and decompression. | 03-18-2010 |
20100106948 | MANAGING AN OUT-OF-ORDER ASYNCHRONOUS HETEROGENEOUS REMOTE DIRECT MEMORY ACCESS (RDMA) MESSAGE QUEUE - A system and method operable to manage a message queue is provided. This management may involve out-of-order asynchronous heterogeneous remote direct memory access (RDMA) to the message queue. The system includes a pair of processing devices (a primary processing device and an additional processing device), a memory storage location, and a data bus coupled to the processing devices. The processing devices cooperate to process queue data within a shared message queue, wherein when an individual processing device successfully accesses queue data, the queue data is locked for the exclusive use of that processing device. When the processing device acquires the queue data, the queue data is locked, and the queue data acquired by the acquiring processing device includes the queue data for both the primary processing device and the additional processing device, such that the processing device has all queue data necessary to process the data and return processed queue data. | 04-29-2010 |
20100115246 | SYSTEM AND METHOD OF DATA PARTITIONING FOR PARALLEL PROCESSING OF DYNAMICALLY GENERATED APPLICATION DATA - An improved system and method of data partitioning for parallel processing of dynamically generated application data is provided. An application may send a request to partition the application data specified by a data partitioning policy and to process each of the data partitions according to processing instructions. The data partitioning policy may be flexibly defined by an application for partitioning data any number of ways, including balancing the data volume across each of the partitions or partitioning the data by data type. Asynchronous data partition processors may be instantiated to perform parallel processing of the partitioned data. The data may be partitioned according to the data partitioning policy and processed according to the processing instructions. And the results may be returned to the application. | 05-06-2010 |
20100122071 | SEMICONDUCTOR DEVICE AND DATA PROCESSING METHOD - A semiconductor device includes a first circuit that executes a first calculation, a second circuit that includes a first storage unit therein and executes a second calculation, a controller that outputs a first address for specifying a first execution circuit for the first calculation and a second execution circuit for the second calculation, to the first circuit and the second circuit, and controls input of data into the first circuit, and a bus that transfers a result of the first calculation executed by the first circuit to the second circuit, wherein the result of the first calculation can be conditionally used as an address for specifying the second execution circuit. | 05-13-2010 |
20100161947 | PACK UNICODE ZSERIES INSTRUCTIONS - Emulation methods are provided for two PACK instructions, one for Unicode data and the other for ASCII coded data, in which processing is carried out in a block-by-block fashion rather than byte-by-byte, so as to provide superior performance in the face of the usual challenges of executing emulated data processing machine instructions as opposed to native instructions. | 06-24-2010 |
20100205412 | CONTROL SEQUENCER - An audio codec ( | 08-12-2010 |
20100223447 | Translate and Verify Instruction for a Processor - In an embodiment, a first instruction is defined that comprises at least a first operand from which the execution core is configured to determine a virtual address and a second operand that specifies one or more translation attributes that exist in a page table entry that defines a translation for the virtual address. A processor executing the instruction translates the virtual address and verifies whether or not the translation attributes in the page table entry match the specified translation attributes. The processor faults the first instruction either responsive to failing to locate a translation for the virtual address, or responsive to locating a translation in the page table entry whose translation attributes fail to match the specified translation attributes. | 09-02-2010 |
20100223448 | Computer Configuration Virtual Topology Discovery and Instruction Therefore - In a logically partitioned host computer system comprising host processors (host CPUs), a facility and instruction for discovering the topology of one or more guest processors (guest CPUs) of a guest configuration comprises a guest processor of the guest configuration fetching and executing a STORE SYSTEM INFORMATION instruction that obtains topology information of the computer configuration. The topology information comprises nesting information of processors of the configuration and the degree of dedication a host processor provides to a corresponding guest processor. The information is preferably stored in a single table in memory. | 09-02-2010 |
20100228956 | CONTROL CIRCUIT, INFORMATION PROCESSING DEVICE, AND METHOD OF CONTROLLING INFORMATION PROCESSING DEVICE - A control circuit for receiving data transmitted by a data transmitting circuit and transmitting the received data to a data receiving circuit includes: a data receiving unit for receiving the data transmitted by the data transmitting circuit; a packet analyzing unit for judging whether the data received from the data transmitting circuit is a packet including history acquisition information and reading the history acquisition information from the received data; a history acquisition executing unit for starting or stopping acquiring the history information of the transmission and reception of the data according to the history acquisition information read by the packet analyzing unit to store the history information acquired; and a data transmitting unit for transmitting the packet including the history acquisition information or a packet other than the packet including the history acquisition information to the data receiving circuit. | 09-09-2010 |
20100241834 | METHOD OF ENCODING USING INSTRUCTION FIELD OVERLOADING - The method selects registers by a register instruction field having x bits. A first group of registers has up to 2 | 09-23-2010 |
20100268921 | DATA COLLECTION PREFETCH DEVICE AND METHODS THEREOF - A method of retrieving information from a memory includes receiving an instruction associated with a data collection. In response to determining the instruction is a request to retrieve a first element of the data collection, an application program interface (API) generates an instruction to prefetch a second element of the data collection. In one embodiment, the second element to be prefetched is indicated by a pointer or other information associated with the first element. In response to the prefetch instruction, an execution core of the data processing device retrieves the second element from a memory module and stores the second element at a cache. By prefetching the second element before it has been explicitly requested by the application, the efficiency of the application can be increased. | 10-21-2010 |
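For illustration, here is a minimal C sketch of the prefetch-on-first-element idea described above, using GCC/Clang's __builtin_prefetch builtin; the node layout and function names are hypothetical, not taken from the application:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical linked-collection node; not from the application. */
struct node {
    int value;
    struct node *next;
};

/* Return the requested element and prefetch its successor, so the cache
 * line holding the next node is (likely) resident before the application
 * explicitly asks for it. */
static struct node *get_and_prefetch(struct node *n)
{
    if (n && n->next)
        __builtin_prefetch(n->next, /*rw=*/0, /*locality=*/3);
    return n;
}

int main(void)
{
    struct node b = { 2, NULL }, a = { 1, &b };
    for (struct node *p = get_and_prefetch(&a); p; p = get_and_prefetch(p->next))
        printf("%d\n", p->value);
    return 0;
}
```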
20100274997 | Executing a Gather Operation on a Parallel Computer - Methods, apparatus, and computer program products are disclosed for executing a gather operation on a parallel computer according to embodiments of the present invention. Embodiments include configuring, by the logical root, a result buffer on the logical root, the result buffer having positions, each position corresponding to a ranked node in the operational group and for storing contribution data gathered from that ranked node. Embodiments also include, repeatedly for each position in the result buffer: determining, by each compute node of an operational group, whether the current position in the result buffer corresponds with the rank of the compute node; if the current position in the result buffer corresponds with the rank of the compute node, contributing, by that compute node, the compute node's contribution data; if the current position in the result buffer does not correspond with the rank of the compute node, contributing, by that compute node, a value of zero for the contribution data; and storing, by the logical root in the current position in the result buffer, results of a bitwise OR operation of all the contribution data by all compute nodes of the operational group for the current position, the results received through the global combining network. | 10-28-2010 |
20100287360 | Task Processing Device - The speed of task scheduling by a multitask OS is increased. A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The save circuit is provided with a plurality of save registers respectively associated with a plurality of tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. The task control circuit switches between tasks for execution upon receipt of the system call signal, by saving, in the save register associated with a task being executed, the data in the processing register, selecting a task to be executed next, and loading data in the save register associated with the selected task into the processing register. | 11-11-2010 |
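A rough C sketch of the save-register scheme described above, modeling the hardware register banks as arrays; the sizes and names are illustrative assumptions, not the patented circuit:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_TASKS 4
#define NUM_REGS  8

/* Software stand-ins for the hardware structures in the abstract:
 * one processing register file, plus one save-register bank per task. */
static uint32_t processing_regs[NUM_REGS];
static uint32_t save_regs[NUM_TASKS][NUM_REGS];
static int current_task = 0;

/* On a system call, save the running task's registers into its bank,
 * select the next task, and load that task's bank into the register file. */
static void switch_task(int next_task)
{
    memcpy(save_regs[current_task], processing_regs, sizeof processing_regs);
    current_task = next_task;
    memcpy(processing_regs, save_regs[current_task], sizeof processing_regs);
}

int main(void)
{
    processing_regs[0] = 111;     /* task 0 computes something */
    switch_task(1);               /* system call: switch to task 1 */
    processing_regs[0] = 222;     /* task 1 computes something */
    switch_task(0);               /* back to task 0 */
    printf("task 0 r0 = %u\n", processing_regs[0]);  /* prints 111 */
    return 0;
}
```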
20100293359 | General Purpose Register Cloning - A clone set of General Purpose Registers (GPRs) is created to be used by a set of helper thread binaries, which is created from a set of main thread binaries. When the set of main thread binaries enters a wait state, the set of helper thread binaries uses the clone set of GPRs to continue using unused execution units within a processor core. The set of helper threads are thus able to warm up local cache memory with data that will be needed when execution of the set of main thread binaries resumes. | 11-18-2010 |
20100306511 | COMMUNICATION DATA PROCESSOR AND COMMUNICATION DATA PROCESSING METHOD - There is a need for providing a communication data processor easily adaptable to network configurations required for industrial Ethernet. The apparatus successively analyzes received packets. The apparatus uses a register to determine whether or not to transmit the received packet as transmission data to another port. Rewritable memory saves a program code that provides control for analyzing a reception packet and generating a transmission packet. The apparatus is capable of complying with various communication protocols by changing the program code. | 12-02-2010 |
20100313001 | DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - An apparatus includes a plurality of processing modules which are connected to each other by corresponding communication units, and the modules transfer packets in a predetermined direction to execute a plurality of operations of pipeline processing. Each module includes a storage unit for storing a first identification and a second identification for each of the plurality of operations, a reception unit for extracting data from a packet which has the first identification, a processing unit for processing the data extracted by the reception unit, and a transmission unit for storing, in a packet, the second identification corresponding to the first identification and transmitting the packet to the module arranged in the predetermined direction. | 12-09-2010 |
20100325400 | MICROPROCESSOR AND DATA WRITE-IN METHOD THEREOF - A microprocessor comprises a register set, a micro operations pool (Uops pool), a hazard detection unit, an execution unit, a dispatch unit, and a mask unit. The Uops pool receives a first micro operation and a second micro operation from a decoder, and reads at least one first operand of the first micro operation and at least one second operand of the second micro operation from the register set. The hazard detection unit detects that the first micro operation is in a write-after-write hazard state due to the second micro operation. The execution unit executes the first micro operation dispatched from the Uops pool to obtain a first operation result and executes the second micro operation dispatched from the Uops pool to obtain a second operation result. The mask unit prevents the first operation result from being written back to the register set according to the write-after-write hazard state. | 12-23-2010 |
20100332808 | MINIMIZING CODE DUPLICATION IN AN UNBOUNDED TRANSACTIONAL MEMORY SYSTEM - Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, employs hardware filter indicators (when available) to filter redundant applications of TM barriers, enables a compiled binary representation of the subject code to run correctly in any of the currently implemented transactional memory execution modes (including running the code outside of a transaction), and enables the same compiled binary to continue to work with future TM implementations that may introduce as-yet-unknown TM execution modes. | 12-30-2010 |
20110040955 | STORE-TO-LOAD FORWARDING BASED ON LOAD/STORE ADDRESS COMPUTATION SOURCE INFORMATION COMPARISONS - A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information. | 02-17-2011 |
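To make the prediction idea concrete, here is a small C sketch that matches a load's address-computation sources against queued store entries without computing either address; the three-field source encoding (base register, index register, displacement) is an assumption for illustration:

```c
#include <stdbool.h>
#include <stdio.h>

/* Address-computation sources: base register, index register, displacement.
 * These field names are illustrative, not taken from the application. */
struct addr_sources { int base_reg, index_reg; long disp; };

struct store_entry { bool valid; struct addr_sources src; };

#define QUEUE_LEN 8
static struct store_entry queue[QUEUE_LEN];

/* Predict forwarding when the load's address sources match a valid queued
 * store's address sources, before either address has been calculated. */
static bool predict_forward(struct addr_sources load)
{
    for (int i = 0; i < QUEUE_LEN; i++)
        if (queue[i].valid &&
            queue[i].src.base_reg  == load.base_reg &&
            queue[i].src.index_reg == load.index_reg &&
            queue[i].src.disp      == load.disp)
            return true;
    return false;
}

int main(void)
{
    queue[0] = (struct store_entry){ true, { 5, 2, 16 } };
    struct addr_sources ld = { 5, 2, 16 };
    printf("forward predicted: %d\n", predict_forward(ld));  /* prints 1 */
    return 0;
}
```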
20110047360 | PROCESSOR - The present application provides a method of randomly accessing a compressed structure in memory without the need for retrieving and decompressing the entire compressed structure. | 02-24-2011 |
20110047361 | Load/Move Duplicate Instructions for a Processor - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register. | 02-24-2011 |
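A minimal C sketch of the load-duplicate semantics, assuming a 64-bit first portion and a 128-bit destination, as in the analogous SSE3 MOVDDUP instruction; the split width is an assumption, not stated in the abstract:

```c
#include <stdint.h>
#include <stdio.h>

/* Destination register modeled as two 64-bit lanes. */
struct xmm { uint64_t lo, hi; };

/* Load the first portion (low 64 bits) of the source and duplicate it
 * into the subsequent portion of the destination. */
static struct xmm load_dup(const uint64_t *src)
{
    struct xmm d;
    d.lo = *src;
    d.hi = *src;   /* duplicate of the first portion */
    return d;
}

int main(void)
{
    uint64_t mem = 0xDEADBEEFCAFEF00Dull;
    struct xmm r = load_dup(&mem);
    printf("%llx %llx\n", (unsigned long long)r.lo, (unsigned long long)r.hi);
    return 0;
}
```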
20110055526 | METHOD AND APPARATUS FOR ACCESSING MEMORY ACCORDING TO PROCESSOR INSTRUCTION - There is provided a method and apparatus for accessing a memory according to a processor instruction. The apparatus includes: a stack offset extractor extracting an offset value from a stack pointer offset indicating a local variable in the processor instruction; a local stack storage including a plurality of items, each of which is formed of an activation bit indicating whether each item is activated, an offset storing an offset value of a stack pointer, and an element storing a local variable value of the stack pointer; an offset comparator comparing the extracted offset value with an offset value of each item and determining whether an item corresponding to the extracted offset value is present in the local stack storage; and a stack access controller controlling a processor to access the local stack storage or a cache memory according to a determining result of the offset comparator. | 03-03-2011 |
20110060893 | CIRCUIT COMPRISING A MICROPROGRAMMED MACHINE FOR PROCESSING THE INPUTS OR THE OUTPUTS OF A PROCESSOR SO AS TO ENABLE THEM TO ENTER OR LEAVE THE CIRCUIT ACCORDING TO ANY COMMUNICATION PROTOCOL - A circuit having at least one processor and a microprogrammed machine for processing the data which enters or leaves the processor in order to input or output the data into/from the circuit in compliance with a communication protocol. | 03-10-2011 |
20110087865 | Intermediate Register Mapper - A method, processor, and computer program product employing an intermediate register mapper within a register renaming mechanism. A logical register lookup determines whether a hit to a logical register associated with the dispatched instruction has occurred. In this regard, the logical register lookup searches within at least one register mapper from a group of register mappers, including an architected register mapper, a unified main mapper, and an intermediate register mapper. A single hit to the logical register is selected among the group of register mappers. If an instruction having a mapper entry in the unified main mapper has finished but has not completed, the mapping contents of the register mapper entry in the unified main mapper are moved to the intermediate register mapper, and the unified register mapper entry is released, thus increasing a number of unified main mapper entries available for reuse. | 04-14-2011 |
20110093687 | MANAGING MULTIPLE SPECULATIVE ASSIST THREADS AT DIFFERING CACHE LEVELS - An illustrative embodiment provides a computer-implemented process for managing multiple speculative assist threads for data pre-fetching. An assist thread of a first processor sends a command to a second processor and a memory, wherein parameters of the command specify a processor identifier of the second processor. Responsive to receiving the command, the second processor replies indicating an ability to receive a cache line that is a target of a pre-fetch, and the memory replies indicating a capability to provide the cache line. Responsive to receiving the replies from the second processor and the memory, the first processor sends a combined response to the second processor and the memory, wherein the combined response indicates an action. Responsive to the action indicating that the transaction can continue, the memory sends the requested cache line to the second processor, into a target cache level on the second processor. | 04-21-2011 |
20110099357 | Utilizing a Bidding Model in a Microparallel Processor Architecture to Allocate Additional Registers and Execution Units for Short to Intermediate Stretches of Code Identified as Opportunities for Microparallelization - An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus. The enhanced mechanism enables a variable amount of parallelism to be implemented and yet provides correct program execution even if less parallelism is available than ideal for a given computer program. | 04-28-2011 |
20110107068 | ELIMINATING REDUNDANT OPERATIONS FOR COMMON PROPERTIES USING SHARED REAL REGISTERS - One embodiment of a method for eliminating redundant operations establishing common properties includes identifying a first virtual register storing a first value having a common property. The method may assign the first virtual register to use a real register. The method may further identify a second virtual register storing a second value also having the common property. The method may assign the second virtual register to use the same real register after the first value is no longer live. As a result of assigning the second virtual register to the first real register, the method may eliminate an operation configured to establish the common property for the second virtual register since this operation is redundant and is no longer needed. | 05-05-2011 |
20110107069 | Processor Architecture for Executing Wide Transform Slice Instructions - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers. | 05-05-2011 |
20110138157 | CONVOLUTION COMPUTATION FOR MANY-CORE PROCESSOR ARCHITECTURES - A convolution of the kernel over a layout in a multi-core processor system includes identifying a sector, called a dynamic band, of the layout including a plurality of evaluation points. Layout data specifying the sector of the layout is loaded in shared memory, which is shared by a plurality of processor cores. A convolution operation of the kernel and the evaluation points in the sector is executed. The convolution operation includes iteratively loading parts of the basis data set, called a stride, into space available in shared memory given the size of the layout data specifying the sector. A plurality of threads is executed concurrently using the layout data for the sector and the currently loaded part of the basis data set. The iteration for the loading basis data set proceeds through the entire data set until the convolution operation is completed. | 06-09-2011 |
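A simplified, single-threaded C sketch of the stride-wise loop structure described above; on a many-core device the inner per-point loop would run as concurrent threads, the arrays stand in for shared memory, and the sizes and the accumulation formula are illustrative:

```c
#include <stdio.h>

#define POINTS  8      /* evaluation points in one "dynamic band" */
#define KERNEL  16     /* total kernel (basis) coefficients */
#define STRIDE  4      /* coefficients that fit in shared memory at once */

/* Stand-ins for on-chip shared memory: the band's layout data plus the
 * currently loaded stride of the basis data set. */
static double shared_layout[POINTS];
static double shared_stride[STRIDE];

int main(void)
{
    double basis[KERNEL], result[POINTS] = { 0 };

    for (int i = 0; i < POINTS; i++) shared_layout[i] = 1.0 + i;
    for (int k = 0; k < KERNEL; k++) basis[k] = 0.5;

    /* Iterate over the basis data in strides, as the abstract describes:
     * load a stride into shared memory, let every evaluation point consume
     * it, then load the next, until the whole data set has been covered.
     * (A real kernel would also index coefficients by spatial offset.) */
    for (int s = 0; s < KERNEL; s += STRIDE) {
        for (int k = 0; k < STRIDE; k++)
            shared_stride[k] = basis[s + k];
        for (int p = 0; p < POINTS; p++)            /* the "threads" */
            for (int k = 0; k < STRIDE; k++)
                result[p] += shared_layout[p] * shared_stride[k];
    }

    printf("result[0] = %g\n", result[0]);  /* 1.0 * 0.5 * 16 = 8 */
    return 0;
}
```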
20110145551 | TWO-STAGE COMMIT (TSC) REGION FOR DYNAMIC BINARY OPTIMIZATION IN X86 - Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically, and in the second stage, store operations in the region may be committed atomically. | 06-16-2011 |
20110153998 | Methods and Apparatus for Attaching Application Specific Functions Within an Array Processor - A multi-node video signal processor (VSP | 06-23-2011 |
20110173422 | PAUSE PROCESSOR HARDWARE THREAD UNTIL PIN - A system and method for enhancing performance of a computer, which includes a computer system with a data storage device. The computer system includes a program stored in the data storage device, and steps of the program are executed by a processor. The processor processes instructions from the program. A wait state in the processor waits for receiving specified data. A thread in the processor has a pause state wherein the processor waits for specified data. A pin in the processor initiates a return to an active state from the pause state for the thread. A logic circuit is external to the processor and is configured to detect a specified condition. The pin initiates a return to the active state of the thread when the specified condition is detected using the logic circuit. | 07-14-2011 |
20110238959 | DISTRIBUTED CONTROLLER, DISTRIBUTED PROCESSING SYSTEM, AND DISTRIBUTED PROCESSING METHOD - A distributed controller is connected to two or more processing elements and controls the two or more processing elements to execute distributed processing. The distributed controller comprises a plurality of control modules, each of which is connected to at least one other control module. The distributed controller determines a processing path between processing elements using at least two of the plurality of control modules. | 09-29-2011 |
20110238960 | DISTRIBUTED PROCESSING SYSTEM, CONTROL UNIT, PROCESSING ELEMENT, DISTRIBUTED PROCESSING METHOD AND COMPUTER PROGRAM - A distributed processing system has a control unit and a plurality of processing elements and includes a control line through which control information is sent and received between the control unit and the processing elements, and a data line through which data to be processed is transmitted from at least one selected from the control unit and the processing elements to the processing element connected thereto. The data line is independent from the control line. The processing element has a processing content changing section that changes the content of processing in the processing element using processing content changing information received through the control line when a monitored processing context of the processing element meets a processing content changing condition received through the control line. | 09-29-2011 |
20110246752 | Emulating Execution of An Instruction For Discovering Virtual Topology of a Logical Partitioned Computer System - In a logically partitioned host computer system comprising host processors (host CPUs), a facility and instruction for discovering the topology of one or more guest processors (guest CPUs) of a guest configuration comprises a guest processor of the guest configuration fetching and executing a STORE SYSTEM INFORMATION instruction that obtains topology information of the computer configuration. The topology information comprises nesting information of processors of the configuration and the degree of dedication a host processor provides to a corresponding guest processor. The information is preferably stored in a single table in memory. | 10-06-2011 |
20110258420 | EXECUTION MIGRATION - An execution migration approach includes bringing the computation to the locus of the data: when a memory instruction requests an address not cached by the current core, the execution context (current program counter, register values, etc.) moves to the core where the data is cached. | 10-20-2011 |
20110276791 | HANDLING A STORE INSTRUCTION WITH AN UNKNOWN DESTINATION ADDRESS DURING SPECULATIVE EXECUTION - The described embodiments provide a system for executing instructions in a processor. While executing instructions in an execute-ahead mode, the processor encounters a store instruction for which a destination address is unknown. The processor then defers the store instruction. Upon encountering a load instruction while the store instruction with the unknown destination address is deferred, the processor determines if the load instruction is to continue executing. If not, the processor defers the load instruction. Otherwise, the processor continues executing the load instruction. | 11-10-2011 |
20110320781 | DYNAMIC DATA SYNCHRONIZATION IN THREAD-LEVEL SPECULATION - In one embodiment, the present invention introduces a speculation engine to parallelize serial instructions by creating separate threads from the serial instructions and inserting processor instructions to set a synchronization bit before a dependence source and to clear the synchronization bit after a dependence source, where the synchronization bit is designed to stall a dependence sink from a thread running on a separate core. Other embodiments are described and claimed. | 12-29-2011 |
20120011349 | DATA EXCHANGE AND COMMUNICATION BETWEEN EXECUTION UNITS IN A PARALLEL PROCESSOR - Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. | 01-12-2012 |
20120030452 | MODIFYING COMMANDS - The present disclosure includes methods, devices, modules, and systems for modifying commands. One device embodiment includes a memory controller including a channel, wherein the channel includes a command queue configured to hold commands, and circuitry configured to modify at least a number of commands in the queue and execute the modified commands. | 02-02-2012 |
20120096245 | COMPUTING DEVICE, PARALLEL COMPUTER SYSTEM, AND METHOD OF CONTROLLING COMPUTER DEVICE - A computing device includes a receiving unit that receives control information indicating an instruction to be executed on a process that is distributed or an instruction contained in the process that is distributed, from a control information creating device that transmits the control information to each computing device on a network. The computing device further includes a processor configured to suspend execution of an instruction when the instruction to be executed on the process occurs or the instruction contained in the process that is distributed is executed, and execute the suspended instruction when the suspended instruction is associated with the instruction indicated by the control information that is received by the receiving unit. | 04-19-2012 |
20120110309 | Data Output Transfer To Memory - Methods, systems, and computer readable media for improved transfer of processing data outputs to memory are disclosed. According to an embodiment, a method for transferring outputs of a plurality of threads concurrently executing in one or more processing units to a memory includes: forming, based upon one or more of the outputs, a combined memory export instruction comprising one or more data elements and one or more control elements; and sending the combined memory export instruction to the memory. The combined memory export instruction can be sent to memory in a single clock cycle. Another method includes: forming, based upon outputs from two or more of the threads, a memory export instruction comprising two or more data elements; embedding at least one address representative of the two or more of the outputs in a second memory instruction; and sending the memory export instruction and the second memory instruction to the memory. | 05-03-2012 |
20120185679 | Endpoint-Based Parallel Data Processing With Non-Blocking Collective Instructions In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing by the parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation. | 07-19-2012 |
20120198213 | PACKET HANDLER INCLUDING PLURALITY OF PARALLEL ACTION MACHINES - A packet handler for a packet processing system includes a plurality of parallel action machines, each of the plurality of parallel action machines being configured to perform a respective packet processing function; and a plurality of action machine input registers, wherein each of the plurality of parallel action machines is associated with one or more of the plurality of action machine input registers, and wherein an action machine of the plurality of parallel action machines is automatically triggered to perform its respective packet processing function in the event that data sufficient to perform the action machine's respective packet processing function is written into the action machine's one or more respective action machine input registers. | 08-02-2012 |
20120198214 | N-WAY MEMORY BARRIER OPERATION COALESCING - One embodiment sets forth a technique for N-way memory barrier operation coalescing. When a first memory barrier is received for a first thread group, execution of subsequent memory operations for the first thread group is suspended until the first memory barrier is executed. Subsequent memory barriers for different thread groups may be coalesced with the first memory barrier to produce a coalesced memory barrier that represents memory barrier operations for multiple thread groups. When the coalesced memory barrier is being processed, execution of subsequent memory operations for the different thread groups is also suspended. However, memory operations for other thread groups that are not affected by the coalesced memory barrier may be executed. | 08-02-2012 |
20120204015 | Sharing a Data Buffer - A computer-program product may have instructions that, when executed, cause a processor to perform operations including managing execution of application functions that access data in a shared buffer; determining if a first instruction that is stored at a first memory location causes, upon execution, data to be read from or written to the shared buffer; and when it is determined that the first instruction causes data to be read from or written to the shared buffer, 1) identify one or more replacement instructions to execute in place of the first instruction; 2) store the one or more replacement instructions; and 3) replace the first instruction at the first memory location with a second instruction that, when executed, causes the stored one or more replacement instructions to be executed. | 08-09-2012 |
20120210102 | Obtaining And Releasing Hardware Threads Without Hypervisor Involvement - A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly. | 08-16-2012 |
20120254596 | METHOD AND SYSTEM FOR CONTROLLING MESSAGE TRAFFIC BETWEEN TWO PROCESSORS - A system and method for controlling messaging between a first processor and a second processor is disclosed. The second processor controls one or more peripheral devices on behalf of a plurality of predetermined tasks being executed by the first processor. The system includes a message control module that receives an input message intended for the second processor from the first processor and maintains a message history based on the received input message and previously received input messages. The message history indicates which peripheral devices of the system are to be on and which tasks of the plurality of tasks requested the peripheral devices to be on. The message control module is further configured to generate an output message that includes output instructions for the second processor based on the message history and an output duration based on the message history. The second processor executes the output instructions. | 10-04-2012 |
20120260074 | EFFICIENT CONDITIONAL ALU INSTRUCTION IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor performs an architectural instruction that instructs it to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if its architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result. To execute the second microinstruction, it writes the destination register with the result generated by the first microinstruction if the architectural condition flags satisfy the condition, and writes the destination register with the current value of the destination register if the architectural condition flags do not satisfy the condition. | 10-11-2012 |
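A C sketch of the two-microinstruction split described above: the first microinstruction always computes the operation, and the second selects between that result and the destination's current value, so the destination is written unconditionally either way; the names and the choice of an add operation are illustrative:

```c
#include <stdbool.h>
#include <stdio.h>

/* First microinstruction: always perform the operation, writing a
 * temporary rather than the architectural destination. */
static int uop1_add(int a, int b) { return a + b; }

/* Second microinstruction: select between the computed result and the
 * destination's current value, depending on the condition flags. */
static int uop2_select(bool cond_satisfied, int result, int dest_current)
{
    return cond_satisfied ? result : dest_current;
}

int main(void)
{
    int dest = 7;
    int tmp = uop1_add(3, 5);               /* uop 1: compute 8 */
    dest = uop2_select(false, tmp, dest);   /* condition fails */
    printf("%d\n", dest);                   /* still 7 */
    dest = uop2_select(true, tmp, dest);    /* condition holds */
    printf("%d\n", dest);                   /* now 8 */
    return 0;
}
```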
20120260075 | CONDITIONAL ALU INSTRUCTION PRE-SHIFT-GENERATED CARRY FLAG PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor includes a hardware instruction translator that translates an architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the shift operation on the first source operand to generate the first result and a carry flag value and updates a non-architectural carry flag with the generated carry flag value. To execute the second microinstruction, it performs the second operation on the first result and the second operand to generate the second result and new condition flag values based on the second result. If the architectural condition flags satisfy the condition, it updates the architectural carry flag with the non-architectural carry flag value and updates at least one of the other architectural condition flags with the corresponding generated new condition flag values; otherwise, it retains the current values of the architectural condition flags. | 10-11-2012 |
20120265971 | ALLOCATION OF COUNTERS FROM A POOL OF COUNTERS TO TRACK MAPPINGS OF LOGICAL REGISTERS TO PHYSICAL REGISTERS FOR MAPPER BASED INSTRUCTION EXECUTIONS - A mapper unit of an out-of-order processor assigns a particular counter currently in a counter free pool to count a number of mappings of logical registers to a particular physical register from among multiple physical registers, responsive to an execution of an instruction by the mapper unit mapping at least one logical register to the particular physical register. The number of counters is less than the number of physical registers. The mapper unit, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool. | 10-18-2012 |
20120272047 | METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set, and a zero is placed into the associated resultant data element position if its flush to zero field is set. | 10-25-2012 |
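A C sketch of the shuffle-with-flush-to-zero semantics; the element count and the choice of bit 7 as the flush-to-zero field are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define L 8   /* number of data and control elements; illustrative */

/* Shuffle: each control element selects a source element by index; if
 * its flush-to-zero bit (bit 7 here, by assumption) is set, a zero is
 * placed into the resultant element position instead. */
static void shuffle(const uint8_t src[L], const uint8_t ctl[L], uint8_t dst[L])
{
    for (int i = 0; i < L; i++)
        dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] % L];
}

int main(void)
{
    uint8_t src[L] = { 10, 11, 12, 13, 14, 15, 16, 17 };
    uint8_t ctl[L] = { 7, 6, 5, 4, 0x80, 2, 1, 0 };
    uint8_t dst[L];
    shuffle(src, ctl, dst);
    for (int i = 0; i < L; i++) printf("%d ", dst[i]);  /* 17 16 15 14 0 12 11 10 */
    printf("\n");
    return 0;
}
```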
20120297172 | Multi-Threaded Processes for Opening and Saving Documents - Tools and techniques are described for multi-threaded processing for opening and saving documents. These tools may provide load processes for reading documents from storage devices, and for loading the documents into applications. These tools may spawn a load process thread for executing a given load process on a first processing unit, and an application thread may execute a given application on a second processing unit. A first pipeline may be created for executing the load process thread, with the first pipeline performing tasks associated with loading the document into the application. A second pipeline may be created for executing the application process thread, with the second pipeline performing tasks associated with operating on the documents. The tasks in the first pipeline are configured to pass tokens as input to the tasks in the second pipeline. | 11-22-2012 |
20120311306 | DATA PROCESSING APPARATUS, DATA PROCESSING SYSTEM, PACKET, RECORDING MEDIUM, STORAGE DEVICE, AND DATA PROCESSING METHOD - Parallelism of processing can be improved while existing software resources are utilized substantially as they are. | 12-06-2012 |
20120317402 | EXECUTING A START OPERATOR MESSAGE COMMAND - A facility is provided to enable operator message commands from multiple, distinct sources to be provided to a coupling facility of a computing environment for processing. These commands are used, for instance, to perform actions on the coupling facility, and may be received from consoles coupled to the coupling facility, as well as logical partitions or other systems coupled thereto. Responsive to performing the commands, responses are returned to the initiators of the commands. | 12-13-2012 |
20120331276 | Instruction Execution - A method of executing an instruction set to select a set of registers, includes reading a first instruction of the instruction set; interpreting a first operand of the first instruction to represent a first register S to be selected; interpreting a second operand of the first instruction to represent a number N of registers to be selected; and selecting N consecutive registers starting at the first register S to form the set of registers. | 12-27-2012 |
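A C sketch of interpreting the two operands as a start register S and a count N; the register-file size and representation are illustrative:

```c
#include <stdio.h>

#define NUM_REGS 32

/* Interpret the two operands of the instruction: first register S and
 * count N, selecting N consecutive registers starting at S. */
static int select_registers(int s, int n, int out[])
{
    int count = 0;
    for (int i = 0; i < n && s + i < NUM_REGS; i++)
        out[count++] = s + i;
    return count;
}

int main(void)
{
    int regs[NUM_REGS];
    int n = select_registers(4, 3, regs);   /* selects r4, r5, r6 */
    for (int i = 0; i < n; i++) printf("r%d ", regs[i]);
    printf("\n");
    return 0;
}
```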
20130024673 | PROCESSING UNIT AND MICRO CONTROLLER UNIT (MCU) - A technology capable of reducing load on both system processing and filter operation and improving power consumption and performance is provided. In a digital signal processor, a program memory, a program counter, and a control logic circuit are provided, and a bit field of each instruction includes instruction stop flag information and bit field information. The control logic circuit carries out control such that an instruction whose instruction stop flag information is cleared is executed as is, proceeding to the next instruction processing; execution of an instruction whose instruction stop flag information is set is stopped if an execution resumption trigger condition corresponding to the bit field information is not satisfied; and an instruction whose instruction stop flag information is set is executed, proceeding to the next instruction processing, if the execution resumption trigger condition corresponding to the bit field information is satisfied. | 01-24-2013 |
20130067206 | Endpoint-Based Parallel Data Processing In A Parallel Active Messaging Interface Of A Parallel Computer - Endpoint-based parallel data processing in a parallel active messaging interface (‘PAMI’) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks. | 03-14-2013 |
20130080747 | PROCESSOR AND INSTRUCTION PROCESSING METHOD IN PROCESSOR - The present invention relates to a processor including: an instruction cache configured to store at least some of first instructions stored in an external memory and second instructions each including a plurality of micro instructions; a micro cache configured to store third instructions corresponding to the plurality of micro instructions included in the second instructions; and a core configured to read out the first and second instructions from the instruction cache and perform calculation, in which the core performs calculation by the first instructions from the instruction cache under a normal mode, and when the processor enters a micro instruction mode, the core performs calculation by the third instructions corresponding to the plurality of micro instructions provided from the micro cache. | 03-28-2013 |
20130111192 | Adjusting acknowledgement requests for remote control transmissions based on previous acknowledgements | 05-02-2013 |
20130117546 | Load Pair Disjoint Facility and Instruction Therefore - A Load/Store Disjoint instruction, when executed by a CPU, accesses operands from two disjoint memory locations and sets condition code indicators to indicate whether or not the two operands appeared to be accessed atomically by means of block-concurrent interlocked fetch with no intervening stores to the operands from other CPUs. In a Load Pair Disjoint form of the instruction, the accesses are loads and the disjoint data is stored in general registers. | 05-09-2013 |
20130117547 | Method and Apparatus for Unpacking and Moving Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element. | 05-09-2013 |
20130124836 | CONSTANT DATA ACCESSING SYSTEM AND METHOD THEREOF - A constant data accessing system having a constant pool comprises a computer processor having a constant pool base register, a compiler having a constant pool handler, and an instruction set module having a constant pool instruction set unit. The constant pool base register is configured to store a value of the constant pool base address of one or a plurality of subroutines which have constants to be accessed. | 05-16-2013 |
20130138929 | PROCESS MAPPING IN PARALLEL COMPUTING - A method of mapping processes to processors in a parallel computing environment where a parallel application is to be run on a cluster of nodes wherein at least one of the nodes has multiple processors sharing a common memory, the method comprising using compiler based communication analysis to map Message Passing Interface processes to processors on the nodes, whereby at least some more heavily communicating processes are mapped to processors within nodes. Other methods, apparatus, and computer readable media are also provided. | 05-30-2013 |
20130138930 | COMPUTER SYSTEMS AND METHODS FOR REGISTER-BASED MESSAGE PASSING - Systems and methods are disclosed that include a plurality of processing units having a plurality of register file entries. Control logic identifies a first register entry as including a message address in response to receiving a first instruction. The control logic further identifies a second register entry to receive messages in response to receiving a second instruction. | 05-30-2013 |
20130145133 | PROCESSOR, APPARATUS AND METHOD FOR GENERATING INSTRUCTIONS - A processor, apparatus and method to use a multiple store instruction based on physical addresses of registers are provided. The processor is configured to execute an instruction to store data of a plurality of registers in a memory, the instruction including a first area in which a physical address of each of the registers is written. An instruction generating apparatus is configured to generate an instruction to store data of a plurality of registers in a memory, the instruction including a first area in which a physical address of each of the registers is written. An instruction generating method includes detecting a code area that instructs to store data of a plurality of registers in a memory, from a program code. The instruction generating method further includes generating an instruction corresponding to the code area by mapping physical addresses of the registers to a first area of the instruction. | 06-06-2013 |
20130151821 | METHOD AND INSTRUCTION SET INCLUDING REGISTER SHIFTS AND ROTATES FOR DATA PROCESSING - A method includes identifying at least one first register with M bits and identifying at least one second register with N bits. The process also includes shifting K bits, where K≦N, from the second register into the first register. The shifting operation executes a left shift or a right shift operation. For a left shift operation, bits K...N−1 from the first register are read, those bits are written into bit positions 0...N−K−1 of the first register, the K bits from the second register are read, and the K bits from the second register are written into bit positions N−K...N−1 of the first register. The right shift includes reading bits 0...N−K−1 from the first register, writing those bits into bit positions K...N−1 of the first register, reading the K bits from the second register, and writing the K bits from the second register into bit positions 0...K−1 of the first register. | 06-13-2013 |
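A C transcription of the bit movements spelled out above, for N = 64 and 0 < K < N; the bit-position numbering follows the abstract literally, so what it calls a left shift fills the high-order positions from the second register:

```c
#include <stdint.h>
#include <stdio.h>

#define N 64   /* register width; the abstract leaves M and N generic */

/* "Left shift" per the abstract: bits K..N-1 of the first register move
 * to positions 0..N-K-1, and K bits from the second register fill
 * positions N-K..N-1.  Valid for 0 < k < N. */
static uint64_t shift_left(uint64_t first, uint64_t second, unsigned k)
{
    uint64_t fill = second & ((1ull << k) - 1);   /* low K bits of second */
    return (first >> k) | (fill << (N - k));
}

/* "Right shift" per the abstract: bits 0..N-K-1 of the first register
 * move to positions K..N-1, and K bits from the second register fill
 * positions 0..K-1.  Valid for 0 < k < N. */
static uint64_t shift_right(uint64_t first, uint64_t second, unsigned k)
{
    uint64_t fill = second & ((1ull << k) - 1);
    return (first << k) | fill;
}

int main(void)
{
    printf("%016llx\n", (unsigned long long)shift_left(0xFF, 0xA, 4));
    printf("%016llx\n", (unsigned long long)shift_right(0xFF, 0xA, 4));
    return 0;
}
```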
20130173892 | CONVERT TO ZONED FORMAT FROM DECIMAL FLOATING POINT FORMAT - Machine instructions, referred to herein as a long Convert from Zoned instruction (CDZT) and extended Convert from Zoned instruction (CXZT), are provided that read EBCDIC or ASCII data from memory, convert it to the appropriate decimal floating point format, and write it to a target floating point register or floating point register pair. Further, machine instructions, referred to herein as a long Convert to Zoned instruction (CZDT) and extended Convert to Zoned instruction (CZXT), are provided that convert a decimal floating point (DFP) operand in a source floating point register or floating point register pair to EBCDIC or ASCII data and store it to a target memory location. | 07-04-2013 |
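For background, a C sketch of decoding EBCDIC zoned-decimal bytes (the low nibble of each byte holds a digit; the zone nibble of the rightmost byte holds the sign, with 0xD meaning negative) into a signed integer; the final conversion to the DFP encoding is omitted:

```c
#include <stdio.h>

/* Decode EBCDIC zoned-decimal bytes to a signed integer value. */
static long zoned_to_long(const unsigned char *z, int len)
{
    long v = 0;
    for (int i = 0; i < len; i++)
        v = v * 10 + (z[i] & 0x0F);          /* digit is the low nibble */
    return ((z[len - 1] >> 4) == 0xD) ? -v : v;  /* sign zone of last byte */
}

int main(void)
{
    unsigned char pos[] = { 0xF1, 0xF2, 0xC3 };  /* "12" + positive 3 -> 123 */
    unsigned char neg[] = { 0xF4, 0xD5 };        /* "4" + negative 5 -> -45 */
    printf("%ld %ld\n", zoned_to_long(pos, 3), zoned_to_long(neg, 2));
    return 0;
}
```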
20130185544 | PROCESSOR WITH INSTRUCTION VARIABLE DATA DISTRIBUTION - A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory, the loaded registers corresponding to different execution units. Each of the load units is configured to distribute the data to the registers in accordance with an instruction selectable distribution. The instruction selectable distribution specifies one of a plurality of distributions. Each of the distributions specifies a data sequence that differs from the sequence in which the data is stored in memory. | 07-18-2013 |
20130185545 | HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system includes a processor core and a cache control unit. The processor core can be coupled to a first memory containing data and a second memory with a faster speed than the first memory, and is configured to execute a segment of instructions having at least one instruction accessing the data from the second memory using a base register. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill the data from the first memory to the second memory before the processor core executes the instruction accessing the data, and is further configured to examine the segment of instructions to extract instruction information containing at least data access instruction information and last register updating instruction information and to create a track corresponding to the segment of instructions based on the extracted instruction information. | 07-18-2013 |
20130232322 | UNIFORM LOAD PROCESSING FOR PARALLEL THREAD SUB-SETS - One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available. | 09-05-2013 |
20130238881 | DATA TRANSMISSION DEVICE, DATA TRANSMISSION METHOD, AND COMPUTER PROGRAM PRODUCT - According to an embodiment, a data transmission device includes a processor, a memory controller, an information memory, a generation unit, and an instruction unit. The memory controller controls read and write operations during data transfer between memory devices. The information memory stores transfer instruction information generated by the processor. The transfer instruction information includes positional information of a position where data is stored in the memory device as a read source or a write destination of the data, and size information of the data. The generation unit divides the transfer instruction information into pieces of a predetermined data size to generate plural pieces of partial transfer instruction information. The instruction unit instructs the memory controller to acquire a piece of the partial transfer instruction information. | 09-12-2013 |
20130246761 | REGISTER SHARING IN AN EXTENDED PROCESSOR ARCHITECTURE - Systems and methods are disclosed for sharing one or more registers in an extended processor architecture. The method comprises executing a first thread and a second thread on a processor core supported by an extended register file, wherein one or more registers in the extended register file are accessible by said first and second threads; loading first data for use by the first thread into a first set of physical registers mapped to a first set of logical registers associated with the first thread; and providing the first data for use by the second thread by maintaining the first data in the first set of physical registers and mapping the first set of physical registers to a second set of logical registers associated with the second thread. | 09-19-2013 |
20130246762 | INSTRUCTION TO LOAD DATA UP TO A DYNAMICALLY DETERMINED MEMORY BOUNDARY - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary is dynamically determined based on a specified type of boundary and one or more characteristics of the processor executing the instruction, such as cache line size or page size used by the processor. | 09-19-2013 |
20130246763 | INSTRUCTION TO COMPUTE THE DISTANCE TO A SPECIFIED MEMORY BOUNDARY - A Load Count to Block Boundary instruction is provided that returns the distance from a specified memory address to a specified memory boundary. The memory boundary is a boundary that is not to be crossed in loading data. The boundary may be specified in a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary; or it may be dynamically determined. | 09-19-2013 |
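The distance computation itself is a one-liner for power-of-two boundaries; a C sketch, with a 64-byte cache line and a 4 KiB page as example boundaries:

```c
#include <stdint.h>
#include <stdio.h>

/* Distance from an address to the next boundary, for a power-of-two
 * boundary size such as a 64-byte cache line or a 4 KiB page. */
static uint64_t bytes_to_boundary(uint64_t addr, uint64_t boundary)
{
    return boundary - (addr & (boundary - 1));
}

int main(void)
{
    /* 17 bytes until the next 64-byte line; 4096 when already aligned. */
    printf("%llu\n", (unsigned long long)bytes_to_boundary(0x102F, 64));
    printf("%llu\n", (unsigned long long)bytes_to_boundary(0x2000, 4096));
    return 0;
}
```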
20130246764 | INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary may be specified in a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register based boundary. | 09-19-2013 |
20130262838 | Memory Disambiguation Hardware To Support Software Binary Translation - A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of the memory operations is determined. Then, possible reordering problems are detected and identified in software. A reordering problem arises when a first memory operation has been reordered prior to, and aliases with, a second memory operation with respect to the original order of memory operations. The reordering problem is addressed, and a relative order of memory operations is communicated to the processor. | 10-03-2013 |
20130275732 | PROCESSORS - A processing apparatus comprises a plurality of processors | 10-17-2013 |
20130275733 | MULTI-LEVEL TRACKING OF IN-USE STATE OF CACHE LINES - This disclosure includes tracking of in-use states of cache lines to improve throughput of pipelines and thus increase performance of processors. Access data for a number of sets of instructions stored in an instruction cache may be tracked using a first array as the in-use array until the data for one or more of those sets reaches a threshold condition. A second array may then be used as the in-use array to track the sets of instructions after a micro-operation is inserted into the pipeline. When the micro-operation retires from the pipeline, the first array may be cleared. The process may repeat after the second array reaches the threshold condition. During the tracking, the in-use state for an instruction line may be detected by inspecting a corresponding bit in each of the arrays. Additional arrays may also be used to track the in-use state. | 10-17-2013 |
20130283017 | HARD OBJECT: CONSTRAINING CONTROL FLOW AND PROVIDING LIGHTWEIGHT KERNEL CROSSINGS - A method providing simple fine-grain hardware primitives with which software engineers can efficiently implement enforceable separation of programs into modules and constraints on control flow, thereby providing fine-grain locality of causality to the world of software. Additionally, a mechanism is provided to mark some modules, or parts thereof, as having kernel privileges, thereby allowing the provision of kernel services through normal function calls and obviating the expensive prior-art mechanism of system calls. Together with software changes, Object Oriented encapsulation semantics and control flow integrity are enforced in hardware. | 10-24-2013 |
20130283018 | Packed Data Rearrangement Control Indexes Generation Processors, Methods, Systems and Instructions - A method of an aspect includes receiving a packed data rearrangement control indexes generation instruction. The packed data rearrangement control indexes generation instruction indicates a destination storage location. A result is stored in the destination storage location in response to the packed data rearrangement control indexes generation instruction. The result includes a sequence of at least four non-negative integers representing packed data rearrangement control indexes. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed. | 10-24-2013 |
20130283019 | PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO GENERATE SEQUENCES OF INTEGERS IN NUMERICAL ORDER THAT DIFFER BY A CONSTANT STRIDE - A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers in numerical order with all integers in consecutive positions differing by a constant stride of at least two. In an aspect, storing the result including the sequence of the at least four integers is performed without calculating the at least four integers using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed. | 10-24-2013 |
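Since the entry above describes generating an arithmetic sequence directly, without consuming a prior result, a scalar C model makes the semantics concrete. This is a sketch of the described result only; the function name and lane count are assumptions.

```c
#include <stdint.h>

/* Software model of the stride-sequence result: n non-negative
 * integers in numerical order, consecutive entries differing by a
 * constant stride (the abstract requires n >= 4 and stride >= 2). */
void stride_sequence(uint32_t *dst, int n, uint32_t stride)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint32_t)i * stride;  /* 0, stride, 2*stride, ... */
}
```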
20130318330 | PREDICTING AND AVOIDING OPERAND-STORE-COMPARE HAZARDS IN OUT-OF-ORDER MICROPROCESSORS - A method and information processing system manage load and store operations that can be executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag, the load instruction becomes dependent upon all store instructions associated with a substantially similar hazard indicating flag. | 11-28-2013 |
20130318331 | START CONTROL APPARATUS, INFORMATION DEVICE, AND START CONTROL METHOD - A CPU includes a code write unit, an instruction transfer unit, and an instruction execution unit. The code write unit writes an interrupt generation code, a code for generating a software interrupt, into any page of the instruction area, an area of the volatile memory into which instructions are written, that does not yet hold the instructions stored in the non-volatile memory. The instruction execution unit executes the instructions stored in the instruction area and, when the interrupt generation code is executed, generates a software interrupt. When the software interrupt is generated by the interrupt generation code, the instruction transfer unit transfers the instructions to be stored in the corresponding page, the page holding the interrupt generation code that generated the software interrupt, from the non-volatile memory to that page of the volatile memory. | 11-28-2013 |
20130326200 | INTEGRATED CIRCUIT DEVICES AND METHODS FOR SCHEDULING AND EXECUTING A RESTRICTED LOAD OPERATION - An integrated circuit device comprises at least one instruction processing module arranged to compare validation data with data stored within a target register upon receipt of a load validation instruction. The instruction processing module is further arranged to proceed with execution of the next sequential instruction if the validation data matches the stored data within the target register, and to load the validation data into the target register if it does not. | 12-05-2013 |
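The validation semantics above resemble an inverted compare-and-swap: execution proceeds on a match, and the register is reloaded on a mismatch. A minimal C model, with the register represented as a plain variable and all names hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if execution may proceed to the next sequential
 * instruction; on a mismatch, the validation data is loaded into
 * the target register instead. */
bool load_validation(uint64_t *target_reg, uint64_t validation_data)
{
    if (*target_reg == validation_data)
        return true;
    *target_reg = validation_data;
    return false;
}
```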
20130326201 | PROCESSOR-BASED APPARATUS AND METHOD FOR PROCESSING BIT STREAMS - An apparatus and method are described for processing bit streams using bit-oriented instructions. For example, a method according to one embodiment includes the operations of: executing an instruction to get bits for an operation, the instruction identifying a start bit address and a number of bits to be retrieved; retrieving the bits identified by the start bit address and number of bits from a bit-oriented register or cache; and performing a sequence of specified bit operations on the retrieved bits to generate results. | 12-05-2013 |
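A get-bits operation like the one described can be modeled in C as extraction from a byte buffer at an absolute bit address. The LSB-first bit ordering below is an assumption; a bit-oriented register or cache could number bits either way.

```c
#include <stdint.h>

/* Fetch `count` bits (count <= 64) starting at absolute bit
 * address `start` in buf, LSB-first within each byte. */
uint64_t get_bits(const uint8_t *buf, uint64_t start, unsigned count)
{
    uint64_t out = 0;
    for (unsigned i = 0; i < count; i++) {
        uint64_t bit = start + i;
        out |= (uint64_t)((buf[bit >> 3] >> (bit & 7)) & 1u) << i;
    }
    return out;
}
```

The specified sequence of bit operations (masking, testing, shifting) would then run on the returned value.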
20130332708 | PROGRAMMABLE PARTITIONABLE COUNTER - An integrated circuit device for receiving packets. The integrated circuit device includes a programmable partitionable counter that includes a first counter partition for counting a number of the packets, and a second counter partition for counting bytes of the packets. The first counter partition and the second counter partition are configured to be incremented by a single command from a packet processor. | 12-12-2013 |
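The partitioning described above amounts to one command updating two logically separate counts. A toy C model, with field names and widths assumed:

```c
#include <stdint.h>

struct packet_counter {
    uint64_t packets;  /* first partition: packet count      */
    uint64_t bytes;    /* second partition: cumulative bytes */
};

/* A single logical command increments both partitions. */
static inline void count_packet(struct packet_counter *c, uint64_t len)
{
    c->packets += 1;
    c->bytes   += len;
}
```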
20130332709 | SET SAMPLING CONTROLS INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss. | 12-12-2013 |
20130339679 | METHOD AND APPARATUS FOR REDUCING AREA AND COMPLEXITY OF INSTRUCTION WAKEUP LOGIC IN A MULTI-STRAND OUT-OF-ORDER PROCESSOR - A computer system, a computer processor and a method executable on a computer processor involve placing each sequence of a plurality of sequences of computer instructions being scheduled for execution in the processor into a separate queue. The head instruction from each queue is stored into a first storage unit prior to determining whether the head instruction is ready for scheduling. For each instruction in the first storage unit that is determined to be ready, the instruction is moved from the first storage unit to a second storage unit. During a first processor cycle, each instruction in the first storage unit that is determined to be not ready is retained in the first storage unit, and the determining of whether the instruction is ready is repeated during the next processor cycle. Scheduling logic performs scheduling of instructions contained in the second storage unit. | 12-19-2013 |
20130339680 | NONTRANSACTIONAL STORE INSTRUCTION - A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction. | 12-19-2013 |
20130339681 | Temporal Multithreading - Systems and methods for temporal multithreading are described. In some embodiments, a method may include directing a first instruction received from a first of a plurality of pipeline stages to a first register set storing a first thread context. The method may also include, in response to a command to initiate execution of a second thread, directing a second instruction received from the first of the plurality of pipeline stages to a second register set storing a second thread context while concurrently directing a third instruction received from a second of the plurality of pipeline stages to the first register set. In some embodiments, various techniques disclosed herein may be implemented via a microprocessor, microcontroller, or the like. | 12-19-2013 |
20140013087 | PROCESSOR SYSTEM WITH PREDICATE REGISTER, COMPUTER SYSTEM, METHOD FOR MANAGING PREDICATES AND COMPUTER PROGRAM PRODUCT - A processor system is adapted to carry out a predicate swap instruction of an instruction set to swap, via a data pathway, predicate data in a first predicate data location of a predicate register with data in a corresponding additional predicate data location of a first additional predicate data container and to swap, via a data pathway, predicate data in a second predicate storage location of the predicate register with data in a corresponding additional predicate data location in a second additional predicate data container. | 01-09-2014 |
20140019729 | Method for Processing Data Sets, a Pipelined Stream Processor for Processing Data Sets, and a Computer Program for Programming a Pipelined Stream Processor - There is provided a method for processing data sets in a processor. The processor has a pipelined data path including an input, an output, and at least one discrete stage. The pipeline is configured to enable one or more data sets, each comprising one or more data items, to enter the pipeline from the input, propagate through the pipeline, and exit the pipeline through the output. Each discrete stage represents an operation to be performed on the data item occupying the discrete stage. The method comprises defining one or more non-overlapping sections of the pipeline corresponding to portions of the pipeline occupied by the data items of at least one data set. In addition, the method comprises providing one or more logic units, each dedicated to control the progress of the data items of the at least one data set through the pipeline as the section advances through the pipeline. | 01-16-2014 |
20140019730 | Method and Device for Data Transmission Between Register Files - The present disclosure discloses a method and device for data transmission between register files. The method includes that: data in a source register file are read at a Stage i of a pipeline; and the read data are transmitted to a destination register file using an idle instruction pipeline. With the method of the present disclosure, data and mask information are transmitted using an idle instruction pipeline, without addition of extra registers for data and control information buffering, thus reducing logic consumption as well as increasing utilization of an existing functional unit. | 01-16-2014 |
20140025935 | Programmable Queuing - A traffic manager includes an execution unit that is responsive to instructions related to queuing of data in memory. The instructions may be provided by a network processor that is programmed to generate such instructions, depending on the data. Examples of such instructions include (1) writing of data units (of fixed size or variable size) without linking to a queue, (2) re-sequencing of the data units relative to one another without moving the data units in memory, and (3) linking the previously-written data units to a queue. The network processor and traffic manager may be implemented in a single chip. | 01-23-2014 |
20140025936 | CONTROL APPARATUS - A control apparatus configured to receive instruction data from a transmission unit and to control a controlled apparatus based on the instruction data includes a determination unit configured to determine an error in reception of the instruction data from the transmission unit, a communication unit configured to receive the instruction data from the transmission unit and to transmit reply data according to a result of determination of the determination unit to the transmission unit, a module configured to control the controlled apparatus based on the instruction data, and a control unit configured to, if a content of current instruction data received by the communication unit matches a content of previous instruction data received by the communication unit, control the module not to control the controlled apparatus based on the current instruction data. | 01-23-2014 |
20140040604 | PACKED ROTATE PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a masked packed rotate instruction. The instruction indicates a first source packed data including a plurality of packed data elements, a packed data operation mask having a plurality of mask elements, at least one rotation amount, and a destination storage location. A result packed data is stored in the destination storage location in response to the instruction. The result packed data includes result data elements that each correspond to a different one of the mask elements in a corresponding relative position. Result data elements that are not masked out by the corresponding mask element include one of the data elements of the first source packed data in a corresponding position that has been rotated. Result data elements that are masked out by the corresponding mask element include a masked out value. Other methods, apparatus, systems, and instructions are disclosed. | 02-06-2014 |
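A scalar C sketch of the masked packed rotate semantics above, using 32-bit lanes, a single rotation amount, and zeroing for masked-out lanes (the abstract leaves the masked-out value open, so zeroing is an assumption):

```c
#include <stdint.h>

void masked_rotate32(uint32_t *dst, const uint32_t *src,
                     uint8_t mask, unsigned rot, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        /* Rotate left by rot; the (32 - rot) & 31 form avoids
         * undefined behavior when rot is 0. */
        uint32_t r = (src[i] << (rot & 31)) |
                     (src[i] >> ((32 - rot) & 31));
        dst[i] = ((mask >> i) & 1) ? r : 0;
    }
}
```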
20140059329 | ALLOCATION OF COUNTERS FROM A POOL OF COUNTERS TO TRACK MAPPINGS OF LOGICAL REGISTERS TO PHYSICAL REGISTERS FOR MAPPER BASED INSTRUCTION EXECUTIONS - A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool. | 02-27-2014 |
20140068232 | GLOBAL REGISTER PROTECTION IN A MULTI-THREADED PROCESSOR - Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor; in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local registers contains such an identifier is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed. | 03-06-2014 |
20140068233 | INFORMATION PROCESSING APPARATUS AND COPY CONTROL METHOD - The information processing apparatus includes a creating unit and a control unit. On receiving an offloaded data transfer instruction, the creating unit creates a copy session for transferring data from a transfer-source memory area of a transfer-source memory apparatus to a transfer-destination memory area of a transfer-destination memory apparatus. When detecting no overload incurred by asynchronous execution control under which the data transfer is executed out of sync with the offloaded data transfer instruction, the control unit determines that the data transfer of the created copy session is to be executed under the asynchronous execution control. On the other hand, when detecting overload incurred by the asynchronous execution control, the control unit determines that the data transfer of the created copy session is to be executed under synchronous execution control in which the data transfer is executed in synchronization with the offloaded data transfer instruction. | 03-06-2014 |
20140075163 | LOAD-MONITOR MWAIT - Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction. | 03-13-2014 |
20140089645 | PROCESSOR WITH EXECUTION UNIT INTEROPERATION - A processor includes a plurality of execution units. Each of the execution units includes processing logic configured to process data, and registers accessible by the processing logic. At least one of the execution units is configured to execute a first instruction that causes the at least one execution unit to: route a value from a first register of the registers of one of the execution units to the processing logic of one of the execution units, to process the value in the processing logic to generate a result, and to store the result in a second register of the registers of one of the execution units. At least one of the first register, the second register, and the processing logic are located in a different one of the execution units from the at least one of the execution units. | 03-27-2014 |
20140122840 | EFFICIENT USAGE OF A MULTI-LEVEL REGISTER FILE UTILIZING A REGISTER FILE BYPASS - A processor includes an execution unit, a first level register file, a second level register file, a plurality of storage locations and a register file bypass controller. The first and second level register files are comprised of physical registers, with the first level register file more efficiently accessed relative to the second level register file. The register file bypass controller is coupled with the execution unit and second level register file. The register file bypass controller determines whether an instruction indicates a logical register is unmapped from a physical register in the first level register file. The register file controller also loads data into one of the storage locations and selects one of the storage locations as input to the execution unit, without mapping the logical register to one of the physical registers in the first level register file. | 05-01-2014 |
20140122841 | EFFICIENT USAGE OF A REGISTER FILE MAPPER AND FIRST-LEVEL DATA REGISTER FILE - A processor includes a first level register file, second level register file, and register file mapper. The first and second level register files are comprised of physical registers, with the first level register file more efficiently accessed relative to the second level register file. The register file mapper is coupled with the first and second level register files. The register file mapper comprises a mapping structure and register file mapper controller. The mapping structure hosts mappings between logical registers and physical registers of the first level register file. The register file mapper controller determines whether to map a destination logical register of an instruction to a physical register in the first level register file. The register file mapper controller also determines, based on metadata associated with the instruction, whether to write data associated with the destination logical register to one of the physical registers of the second level register file. | 05-01-2014 |
20140122842 | EFFICIENT USAGE OF A REGISTER FILE MAPPER MAPPING STRUCTURE - A processor with a register file mapper can use a hasher to improve the distribution of mappings within a mapping structure. The hasher generates a value based, at least in part, on a thread identifier and logical register identifier. The hash value is used as an index value into the mapping structure. The hashing algorithm is chosen to provide a more even distribution of mappings within the mapping structure, reducing the amount of data written from a first level register file to a second level register file. | 05-01-2014 |
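The hashing idea above can be illustrated with a toy index function; the mixing constants and the power-of-two table size are assumptions chosen only to show the thread-id/register-id combination, not the patent's algorithm.

```c
#include <stdint.h>

/* Index into the mapping structure derived from both the thread
 * identifier and the logical register identifier. */
static inline unsigned map_index(unsigned tid, unsigned lreg,
                                 unsigned table_size /* power of two */)
{
    uint32_t h = (tid * 2654435761u) ^ (lreg * 40503u);
    return h & (table_size - 1);
}
```

Spreading different threads' registers across the structure this way reduces clustering, and with it the amount of data written from the first-level register file to the second-level register file.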
20140122843 | CONDITIONAL STORE INSTRUCTIONS IN AN OUT-OF-ORDER EXECUTION MICROPROCESSOR - An instruction translator translates a conditional store instruction (specifying data register, base register, and offset register of the register file) into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives a base value and an offset from the register file and generates a first result as a function of the base value and offset. The first result specifies the memory location address. To execute a second microinstruction, an execution unit receives the first result and writes the first result to an allocated entry in the store queue if the condition flags satisfy the condition (the store queue subsequently writes the data to the memory location specified by the address), and otherwise kills the allocated store queue entry so that the store queue does not write the data to the memory location specified by the address. | 05-01-2014 |
20140129808 | MIGRATING TASKS BETWEEN ASYMMETRIC COMPUTING ELEMENTS OF A MULTI-CORE PROCESSOR - In one embodiment, the present invention includes a multicore processor having first and second cores to independently execute instructions, the first core visible to an operating system (OS) and the second core transparent to the OS and heterogeneous from the first core. A task controller, which may be included in or coupled to the multicore processor, can cause dynamic migration of a first process scheduled by the OS to the first core to the second core transparently to the OS. Other embodiments are described and claimed. | 05-08-2014 |
20140156974 | DEVICE OPERATING INFORMATION PROVIDING DEVICE AND DEVICE OPERATING INFORMATION PROVIDING METHOD - A device operating information providing device includes a control information storing portion that produces and stores, each time an instruction value evaluating portion evaluates that an instruction control value has changed to a different value, control information including the instruction control value after the change, a most recent execution control value acquired by an execution value acquiring portion at the time of the change, and the timing of the change, an operating status evaluating portion that evaluates a device operating status using a change status of the instruction control value and a change status of the execution control value within a specific time interval, which can be specified by control information stored by the control information storing portion, and an operating information transmitting portion that produces, and transmits to a higher-level device side, operating information for the device, using the operating status evaluated by the operating status evaluating portion. | 06-05-2014 |
20140164744 | Tracking Multiple Conditions in a General Purpose Register and Instruction Therefor - An operate-and-insert instruction of a program, when executed, performs an operation based on one or more operands; the results of an instruction-specified test of that operation are stored in an instruction-specified location of an instruction-specified general register. The general register is therefore able to hold the results of many operate-and-insert instructions. The program can then use non-branch instructions to evaluate the conditions saved in the register, avoiding the performance penalty of branch instructions. | 06-12-2014 |
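In C terms, each operate-and-insert deposits one test result into a chosen bit of a flags register, so several conditions accumulate and can be checked together without a branch per condition. A minimal sketch with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a single test result into bit `bit` of reg. */
static inline uint64_t insert_cond(uint64_t reg, unsigned bit, bool test)
{
    return (reg & ~(1ull << bit)) | ((uint64_t)test << bit);
}

/* Usage: accumulate conditions, then evaluate them at once.
 *   flags = insert_cond(flags, 0, a == 0);
 *   flags = insert_cond(flags, 1, b < c);
 *   if ((flags & 0x3) == 0x3) { ... }   -- one combined check */
```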
20140173258 | TECHNIQUE FOR PERFORMING MEMORY ACCESS OPERATIONS VIA TEXTURE HARDWARE - A texture processing pipeline can be configured to service memory access requests that represent texture data access operations or generic data access operations. When the texture processing pipeline receives a memory access request that represents a texture data access operation, the texture processing pipeline may retrieve texture data based on texture coordinates. When the memory access request represents a generic data access operation, the texture pipeline extracts a virtual address from the memory access request and then retrieves data based on the virtual address. The texture processing pipeline is also configured to cache generic data retrieved on behalf of a group of threads and to then invalidate that generic data when the group of threads exits. | 06-19-2014 |
20140181482 | STORE-TO-LOAD FORWARDING - An arithmetic unit performs store-to-load forwarding based on predicted dependencies between store instructions and load instructions. In some embodiments, the arithmetic unit maintains a table of store instructions that are awaiting movement to a load/store unit of the instruction pipeline. In response to receiving a load instruction that is predicted to be dependent on a store instruction stored at the table, the arithmetic unit causes the data associated with the store instruction to be placed into the physical register targeted by the load instruction. In some embodiments, the arithmetic unit performs the forwarding by mapping the physical register targeted by the load instruction to the physical register where the data associated with the store instruction is located. | 06-26-2014 |
20140181483 | Computation Memory Operations in a Logic Layer of a Stacked Memory - Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various computation operations. This functionality would be desired where performing the operations locally near the memory devices would allow increased performance and/or power efficiency by avoiding transmission of data across the interface to the host processor. | 06-26-2014 |
20140189324 | PHYSICAL REGISTER TABLE FOR ELIMINATING MOVE INSTRUCTIONS - Embodiments of an invention for a physical register table for eliminating move instructions are disclosed. In one embodiment, a processor includes a physical register file, a register allocation table, and a physical register table. The register allocation table is to store mappings of logical registers to physical registers. The physical register table is to store entries including pointers to physical registers in the mappings. The number of entry locations in the physical register table is less than the number of physical registers in the physical register file. | 07-03-2014 |
20140189325 | PAGING IN SECURE ENCLAVES - Embodiments of an invention for paging in secure enclaves are disclosed. In one embodiment, a processor includes an instruction unit and an execution unit. The instruction unit is to receive a first instruction. The execution unit is to execute the first instruction, wherein execution of the first instruction includes evicting a first page from an enclave page cache. | 07-03-2014 |
20140189326 | MEMORY MANAGEMENT IN SECURE ENCLAVES - Embodiments of an invention for memory management in secure enclaves are disclosed. In one embodiment, a processor includes an instruction unit and an execution unit. The instruction unit is to receive a first instruction and a second instruction. The execution unit is to execute the first instruction, wherein execution of the first instruction includes allocating a page in an enclave page cache to a secure enclave. The execution unit is also to execute the second instruction, wherein execution of the second instruction includes confirming the allocation of the page. | 07-03-2014 |
20140189327 | ACKNOWLEDGEMENT FORWARDING - A method for processing data packets in a pipeline and executed by a network processor. The pipeline includes a plurality of logical blocks, each logical block configured to process one stage of the pipeline. Each data packet includes a descriptor and a data. The network processor is coupled to a resource for storing the data. The method reduces latency and enables non-blocking processing of data packets by forwarding a unique identification of a write request from a first logical block to a subsequent second logical block in the pipeline, the write request to modify the data in the resource. The method includes receiving the descriptor for processing at the first logical block, generating the write request and the unique identification for the write request, transmitting the write request to the resource, and transmitting the unique identification towards the second logical block before an acknowledgement is returned by the resource. | 07-03-2014 |
20140195784 | METHOD, DEVICE AND SYSTEM FOR CONTROLLING EXECUTION OF AN INSTRUCTION SEQUENCE IN A DATA STREAM ACCELERATOR - Techniques and mechanisms for controlling execution of an instruction sequence in a data stream processing engine. In an embodiment, a control unit of the data stream processing engine detects that execution of a first instruction in an instruction sequence has ended. In another embodiment, the control unit determines information regarding a next instruction of the instruction sequence which is to be executed. Control signals may be sent from the control unit to form a data path set for execution of the next instruction in the instruction sequence. | 07-10-2014 |
20140223148 | METHOD OF ENTROPY RANDOMIZATION ON A PARALLEL COMPUTER - Method, system, and computer program product for randomizing entropy on a parallel computing system using network arithmetic logic units (ALUs). In one embodiment, network ALUs on nodes of the parallel computing system pseudorandomly modify entropy data during broadcast operations through application of arithmetic and/or logic operations. That is, each compute node's ALU may modify the entropy data during broadcasts, thereby mixing, and thus improving, the entropy data with every hop of entropy data packets from one node to another. At each compute node, the respective ALUs may further deposit modified entropy data in, e.g., local entropy pools such that software running on the compute nodes and needing entropy data may fetch it from the entropy pools. In some embodiments, entropy data may be broadcast via dedicated packets or included in unused portions of existing broadcast packets. | 08-07-2014 |
20140223149 | METHOD OF ENTROPY RANDOMIZATION ON A PARALLEL COMPUTER - Method, system, and computer program product for randomizing entropy on a parallel computing system using network arithmetic logic units (ALUs). In one embodiment, network ALUs on nodes of the parallel computing system pseudorandomly modify entropy data during broadcast operations through application of arithmetic and/or logic operations. That is, each compute node's ALU may modify the entropy data during broadcasts, thereby mixing, and thus improving, the entropy data with every hop of entropy data packets from one node to another. At each compute node, the respective ALUs may further deposit modified entropy data in, e.g., local entropy pools such that software running on the compute nodes and needing entropy data may fetch it from the entropy pools. In some embodiments, entropy data may be broadcast via dedicated packets or included in unused portions of existing broadcast packets. | 08-07-2014 |
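The per-hop mixing in the two entries above can be pictured as each node's network ALU combining the passing entropy payload with node-local state. The XOR-and-rotate below is an assumed stand-in for whatever arithmetic/logic operation a given implementation applies:

```c
#include <stdint.h>

/* Mix an entropy packet's payload with node-local state at one
 * broadcast hop; repeated hops compound the mixing. */
uint64_t mix_entropy_hop(uint64_t payload, uint64_t node_state)
{
    uint64_t x = payload ^ node_state;
    return (x << 13) | (x >> 51);  /* 64-bit rotate by 13 */
}
```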
20140244983 | EXECUTING AN OPERATING SYSTEM ON PROCESSORS HAVING DIFFERENT INSTRUCTION SET ARCHITECTURES - An apparatus includes a first processor having a first instruction set and a second processor having a second instruction set that is different than the first instruction set. The apparatus also includes a memory storing at least a portion of an operating system. The operating system is concurrently executable on the first processor and the second processor. | 08-28-2014 |
20140244984 | ELIGIBLE STORE MAPS FOR STORE-TO-LOAD FORWARDING - The present invention provides a method and apparatus for generating eligible store maps for store-to-load forwarding. Some embodiments of the method include generating information associated with a load instruction in a load queue. The information indicates whether one or more store instructions in a store queue is older than the load instruction and whether the store instruction(s) overlap with any younger store instructions in the store queue that are older than the load instruction. Some embodiments of the method also include determining whether to forward data associated with a store instruction to the load instruction based on the information. Some embodiments of the apparatus include a load-store unit that implements embodiments of the method. | 08-28-2014 |
20140258690 | APPARATUS AND METHOD FOR NON-BLOCKING EXECUTION OF STATIC SCHEDULED PROCESSOR - An apparatus and method for non-blocking execution of a static scheduled processor, the apparatus including a processor to process at least one operation using transferred input data, and an input buffer used to transfer the input data to the processor and to store a result of processing the at least one operation, wherein the processor may include at least one functional unit (FU) to execute the at least one operation, and the at least one FU may process the transferred input data using at least one of a regular latency operation and an irregular latency operation. | 09-11-2014 |
20140258691 | THREAD TRANSITION MANAGEMENT - Various systems, processes, products, and techniques may be used to manage thread transitions. In particular implementations, a system and process for managing thread transitions may include the ability to determine that a transition is to be made regarding the relative use of two data register sets and determine, based on the transition determination, whether to move thread data in at least one of the data register sets to second-level registers. The system and process may also include the ability to move the thread data from at least one data register set to second-level registers based on the move determination. | 09-11-2014 |
20140281423 | PROCESSOR AND METHOD FOR PROCESSING INSTRUCTIONS USING AT LEAST ONE PROCESSING PIPELINE - A processor has a processing pipeline with first, second and third stages. An instruction at the first stage takes fewer cycles to reach the second stage than the third stage. The second and third stages each have a duplicated processing resource. For a pending instruction which requires the duplicated resource and can be processed using the duplicated resource at either of the second and third stages, the first stage determines whether a required operand would be available when the pending instruction would reach the second stage. If the operand would be available, then the pending instruction is processed using the duplicated resource at the second stage, while if the operand would not be available in time then the instruction is processed using the duplicated resource in the third pipeline stage. This technique helps to reduce delays caused by data dependency hazards. | 09-18-2014 |
20140281424 | TRACKING CONTROL FLOW OF INSTRUCTIONS - A mechanism for tracking the control flow of instructions in an application and performing one or more optimizations of a processing device, based on the control flow of the instructions in the application, is disclosed. Control flow data is generated to indicate the control flow of blocks of instructions in the application. The control flow data may include annotations that indicate whether optimizations may be performed for different blocks of instructions. The control flow data may also be used to track the execution of the instructions to determine whether an instruction in a block of instructions is assigned to a thread, a process, and/or an execution core of a processor, and to determine whether errors have occurred during the execution of the instructions. | 09-18-2014 |
20140281425 | LIMITED RANGE VECTOR MEMORY ACCESS INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS - A processor of an aspect includes a plurality of packed data registers. The processor also includes a unit coupled with the packed data registers. The unit is operable in response to a limited range vector memory access instruction. The instruction is to indicate a source of packed memory indices, which is to have a plurality of packed memory indices, which are to be selected from 8-bit memory indices and 16-bit memory indices. The unit is operable to access memory locations, in only a limited range of a memory, in response to the limited range vector memory access instruction. Other processors are disclosed, as are methods, systems, and instructions. | 09-18-2014 |
20140281426 | METHOD FOR POPULATING A SOURCE VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating a source view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates; and determining which of the plurality of instruction blocks are ready for dispatch by using the populated source view data structure. | 09-18-2014 |
20140281427 | METHOD FOR IMPLEMENTING A REDUCED SIZE REGISTER VIEW DATA STRUCTURE IN A MICROPROCESSOR - A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a register view data structure, wherein the register view data structure stores destinations corresponding to the instruction blocks as recorded by the plurality of register templates; and using the register view data structure to track a machine state in accordance with the execution of the plurality of instruction blocks, wherein the register view data structure is a reduced size register view data structure by only storing register template snapshots containing branches or by storing deltas between changing register template snapshots. | 09-18-2014 |
20140281428 | METHOD FOR POPULATING REGISTER VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating a register view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a register view data structure, wherein the register view data structure stores destinations corresponding to the instruction blocks as recorded by the plurality of register templates; and using the register view data structure to track a machine state in accordance with the execution of the plurality of instruction blocks. | 09-18-2014 |
20140331032 | STREAMING MEMORY TRANSPOSE OPERATIONS - According to one general aspect, an apparatus may include a load/store unit, an execution unit, and a first and a second data path. The load/store unit may be configured to load/store data from/to a memory and transmit the data to/from an execution unit, wherein the data includes a plurality of elements. The execution unit may be configured to perform an operation upon the data. The load/store unit may be configured to transmit the data to/from the execution unit via either a first data path configured to communicate, without transposition, the data between the load/store unit and the execution unit, or a second data path configured to communicate, with transposition, the data between the load/store unit and the execution unit, wherein transposition includes dynamically distributing portions of the data amongst a plurality of elements according to an instruction. | 11-06-2014 |
20140351567 | UNIQUE PACKED DATA ELEMENT IDENTIFICATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a unique packed data element identification instruction. The unique packed data element identification instruction indicates a source packed data having a plurality of packed data elements and indicates a destination storage location. A unique packed data element identification result is stored in the destination storage location in response to the unique packed data element identification instruction. The unique packed data element identification result indicates which of the plurality of the packed data elements are unique in the source packed data. Other methods, apparatus, systems, and instructions are disclosed. | 11-27-2014 |
20140380026 | CONTROL DEVICE AND ACCESS SYSTEM UTILIZING THE SAME - A control device coupled between a first memory and a second memory and including an execution unit, a first storage unit, a second storage unit, a selection unit and a processing unit is disclosed. The execution unit executes a specific instruction set to access the first and the second memories. The first storage unit is configured to store a first instruction set. The second storage unit is configured to store a second instruction set. The selection unit outputs one of the first and the second instruction sets to serve as the specific instruction set according to a control signal. The processing unit generates the control signal according to an execution state of the execution unit. | 12-25-2014 |
20150012732 | METHOD AND DEVICE FOR RECOMBINING RUNTIME INSTRUCTION - A method for recombining runtime instructions comprises: buffering an instruction running environment; obtaining the machine instruction segment to be scheduled; inserting, before the last instruction of the obtained machine instruction segment, a second jump instruction directed at an entry address of an instruction recombining platform, to generate a recombined instruction segment comprising the address A″; modifying the value A of the address register of the buffered instruction running environment to the address A″; and recovering the instruction running environment. A device for recombining runtime instructions comprises: an instruction running environment buffering and recovering unit suitable for buffering and recovering the instruction running environment; an instruction obtaining unit suitable for obtaining the machine instruction segment to be scheduled; an instruction recombining unit suitable for generating the recombined instruction segment comprising the address A″; and an instruction replacing unit suitable for modifying the value of the address register of the buffered instruction running environment to the address of the recombined instruction segment. Monitoring and control of the runtime instructions of the computing device is thereby achieved. | 01-08-2015 |
20150026438 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR COOPERATIVE MULTI-THREADING FOR VECTOR THREADS - A system, method, and computer program product for ensuring forward progress of threads that implement divergent operations in a single-instruction, multiple data (SIMD) architecture is disclosed. The method includes the steps of allocating a queue data structure to a thread block including a plurality of threads, determining that a current instruction specifies a yield operation, pushing a token onto the second side of the queue data structure, disabling any active threads in the thread block, popping a next pending token from the first side of the queue data structure, and activating one or more threads in the thread block according to a mask included in the next pending token. | 01-22-2015 |
20150033001 | METHOD, DEVICE AND SYSTEM FOR CONTROL SIGNALLING IN A DATA PATH MODULE OF A DATA STREAM PROCESSING ENGINE - Techniques and mechanisms for exchanging control signals in a data path module of a data stream processing engine. In an embodiment, the data path module may be configured to form a set of one or more data paths corresponding to an instruction which is to be executed. In another embodiment, data processing units of the data path module may be configured to exchange one or more control signals for elastic execution of the instruction. | 01-29-2015 |
20150039867 | INSTRUCTION SOURCE SPECIFICATION - Techniques are disclosed relating to specification of instruction operands. In some embodiments, this may involve assigning operands to source inputs. In one embodiment, an instruction includes one or more mapping values, each of which corresponds to a source of the instruction and each of which specifies a location value. In this embodiment, the instruction includes one or more location values that are each usable to identify an operand for the instruction. In this embodiment, a method may include accessing operands using the location values and assigning accessed operands to sources using the mapping values. In one embodiment, the sources may correspond to inputs of an execution block. In one embodiment, a destination mapping value in the instruction may specify a location value that indicates a destination for storing an instruction result. | 02-05-2015 |
20150046687 | Hardware Streaming Unit - A processor having a streaming unit is disclosed. In one embodiment, a processor includes one or more execution units configured to execute instructions of a processor instruction set. The processor further includes a streaming unit configured to execute a first instruction of the processor instruction set, wherein executing the first instruction comprises the streaming unit loading a first data stream from a memory of a computer system. The first data stream comprises a plurality of data elements. The first instruction includes a first argument indicating a starting address of the first stream, a second argument indicating a stride between the data elements, and a third argument indicative of an ending address of the stream. The streaming unit is configured to output a second data stream corresponding to the first data stream. | 02-12-2015 |
20150067305 | SPECIALIZED MEMORY DISAMBIGUATION MECHANISMS FOR DIFFERENT MEMORY READ ACCESS TYPES - A system and method for efficiently predicting and processing memory access dependencies. A computing system includes control logic that marks a detected load instruction as a first type responsive to predicting the load instruction has high locality and is a candidate for store-to-load (STL) data forwarding. The control logic marks the detected load instruction as a second type responsive to predicting the load instruction has low locality and is not a candidate for STL data forwarding. The control logic processes a load instruction marked as the first type as if the load instruction is dependent on an older store operation. The control logic processes a load instruction marked as the second type as if the load instruction is independent of any older store operation. | 03-05-2015 |
20150067306 | INTER-CORE COMMUNICATION VIA UNCORE RAM - A microprocessor includes a plurality of processing cores and an uncore random access memory (RAM) readable and writable by each of the plurality of processing cores. Each core of the plurality of processing cores comprises microcode run by the core that implements architectural instructions of an instruction set architecture of the microprocessor. The microcode is configured to both read and write the uncore RAM to accomplish inter-core communication between the plurality of processing cores. | 03-05-2015 |
20150082010 | SHIFT INSTRUCTION WITH PER-ELEMENT SHIFT COUNTS AND FULL-WIDTH SOURCES - Techniques for packing and unpacking data from a source register using a particular shift instruction are provided. The shift instruction takes, as input, a source register that contains a plurality of elements and a shift count register that contains a plurality of shift counts. Each shift count indicates how much to shift bits from the source register. Where "source" bits are shifted (or copied) to in an output register depends on the position of the shift count in the shift count register. The shift counts may correspond to one or more bytes from the source register. The shift instruction may initiate a left shift operation or a right shift operation. | 03-19-2015 |
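A scalar C model of per-element shifting, using 16-bit lanes and a left shift; the lane width, direction, and the clamp of each count to the lane width are assumptions for illustration:

```c
#include <stdint.h>

void shift_per_element16(uint16_t *dst, const uint16_t *src,
                         const uint16_t *counts, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        unsigned c = counts[i] & 15;      /* clamp to lane width */
        dst[i] = (uint16_t)(src[i] << c); /* each lane shifts by
                                             its own count */
    }
}
```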
20150089207 | TECHNIQUE FOR COUNTING VALUES IN A REGISTER - A parallel counter accesses data generated by an application and stored within a register. The register includes different segments that include different portions of the application data. The parallel counter is configured to count the number of values within each segment that have a particular characteristic in a parallel fashion. The parallel counter may then return the individual segment counts to the application, or combine those segment counts and return a register count to the application. Advantageously, applications that rely on population count operations may be accelerated. Further, increasing the number of segments in a given register may reduce the time needed to count the values in that register, thereby providing a scalable solution to population counting. Additionally, the architecture of the parallel counter is sufficiently flexible to allow both register counting and segment counting, thereby combining two separate functionalities into just one hardware unit. | 03-26-2015 |
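Segment-wise population counting as described can be modeled by splitting a 64-bit register into fixed segments, counting each independently, and optionally summing, mirroring the two modes above (per-segment counts and a whole-register count). The 16-bit segment width is an assumption:

```c
#include <stdint.h>

/* Count set bits per 16-bit segment of reg, and in total. */
void segment_popcount(uint64_t reg, unsigned counts[4], unsigned *total)
{
    *total = 0;
    for (int s = 0; s < 4; s++) {
        uint16_t seg = (uint16_t)(reg >> (16 * s));
        unsigned c = 0;
        while (seg) { seg &= seg - 1; c++; }  /* clear lowest set bit */
        counts[s] = c;
        *total += c;
    }
}
```

In hardware the segment counters operate in parallel, which is why adding segments to a given register can reduce the time needed to count it.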
20150100765 | DISAMBIGUATION-FREE OUT OF ORDER LOAD STORE QUEUE - In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching. The method further includes, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match; and forwarding data from the first match to the subsequent load. | 04-09-2015 |
20150100766 | REORDERED SPECULATIVE INSTRUCTION SEQUENCES WITH A DISAMBIGUATION-FREE OUT OF ORDER LOAD STORE QUEUE - In a processor, a disambiguation-free out of order load store queue method. The method includes implementing a memory resource that can be accessed by a plurality of asynchronous cores; implementing a store retirement buffer, wherein stores from a store queue have entries in the store retirement buffer in original program order; and implementing speculative execution, wherein results of speculative execution can be saved in the store retirement/reorder buffer as a speculative state. The method further includes, upon dispatch of a subsequent load from a load queue, searching the store retirement buffer for address matching; and, in cases where there are a plurality of address matches, locating a correct forwarding entry by scanning the store retirement buffer for a first match, and forwarding data from the first match to the subsequent load. Once speculative outcomes are known, the speculative state is retired to memory. | 04-09-2015 |
20150106597 | Computer Processor With Deferred Operations - A computer processor and corresponding method of operation employs execution logic that includes at least one functional unit and operand storage that stores data that is produced and consumed by the at least one functional unit. The at least one functional unit is configured to execute a deferred operation whose execution produces result data. The execution logic further includes a retire station that is configured to store and retire the result data of the deferred operation in order to store such result data in the operand storage, wherein the retire of such result data occurs at a machine cycle following issue of the deferred operation as controlled by statically-assigned parameter data included in the encoding of the deferred operation. | 04-16-2015 |
20150106598 | Computer Processor Employing Efficient Bypass Network For Result Operand Routing - A computer processor is provided with a plurality of functional units that perform operations specified by the at least one instruction over multiple machine cycles, wherein the operations produce result operands. The processor also includes circuitry that generates result tags dynamically according to the number of operations that produce result operands in a given machine cycle. A bypass network is configured to provide data paths for transfer of operand data between the plurality of functional units according to the result tags. | 04-16-2015 |
20150106599 | EXECUTION OF A PERFORM FRAME MANAGEMENT FUNCTION INSTRUCTION - Optimizations are provided for frame management operations, including a clear operation and/or a set storage key operation, requested by pageable guests. The operations are performed, absent host intervention, on frames not resident in host memory. The operations may be specified in an instruction issued by the pageable guests. | 04-16-2015 |
20150113254 | EFFICIENCY THROUGH A DISTRIBUTED INSTRUCTION SET ARCHITECTURE - A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include the power efficiency with which the instruction can be executed and the availability of execution units to support execution of the instruction. | 04-23-2015 |
20150121045 | READING A REGISTER PAIR BY WRITING A WIDE REGISTER - A read operation is initiated to obtain a wide input operand. Based on the initiating, a determination is made as to whether the wide input operand is available in a wide register or in two narrow registers. Based on determining the wide input operand is not available in the wide register, merging at least a portion of contents of the two narrow registers to obtain merged contents, writing the merged contents into the wide register, and continuing the read operation to obtain the wide input operand. Based on determining the wide input operand is available in the wide register, obtaining the wide input operand from the wide register. | 04-30-2015 |
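The merge of two narrow registers into one wide register can be sketched in C with a struct standing in for the wide register; which narrow register supplies which half is an assumption, since architectures pair registers differently:

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } wide_reg;

/* Combine a narrow register pair into the wide register so a
 * subsequent read can obtain the wide operand directly. */
wide_reg merge_narrow_pair(uint64_t even_narrow, uint64_t odd_narrow)
{
    wide_reg w;
    w.hi = even_narrow;
    w.lo = odd_narrow;
    return w;
}
```

Once the merged contents are written back, later reads find the operand in the wide register and skip the merge, which is the point of the optimization.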
20150121046 | ORDERING AND BANDWIDTH IMPROVEMENTS FOR LOAD AND STORE UNIT AND DATA CACHE - The present invention provides a method and apparatus for supporting an out-of-order load-to-load queue structure. One embodiment of the apparatus includes a load queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes a load order queue for cacheable operations that are ordered for a particular address. | 04-30-2015 |
20150127927 | EFFICIENT HARDWARE DISPATCHING OF CONCURRENT FUNCTIONS IN MULTICORE PROCESSORS, AND RELATED PROCESSOR SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA - Embodiments of the disclosure provide efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media. In one embodiment, a first instruction indicating an operation requesting a concurrent transfer of program control is detected in a first hardware thread of a multicore processor. A request for the concurrent transfer of program control is enqueued in a hardware first-in-first-out (FIFO) queue. A second instruction indicating an operation dispatching the request for the concurrent transfer of program control in the hardware FIFO queue is detected in a second hardware thread of the multicore processor. The request for the concurrent transfer of program control is dequeued from the hardware FIFO queue, and the concurrent transfer of program control is executed in the second hardware thread. In this manner, functions may be efficiently and concurrently dispatched in the context of multiple hardware threads, while minimizing contention management overhead. | 05-07-2015 |
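The enqueue/dispatch split can be modeled in software with an ordinary thread-safe FIFO; the sketch below stands in for the hardware queue and the two hardware threads, and is only an analogy for the mechanism, not the hardware itself.

    # Illustrative software model: one thread enqueues a request for a
    # concurrent transfer of control; another dequeues and executes it.
    import queue, threading

    dispatch_fifo = queue.Queue()  # stands in for the hardware FIFO

    def producer():
        # first instruction: request a concurrent transfer of program control
        dispatch_fifo.put(lambda: print("function dispatched on second thread"))

    def consumer():
        # second instruction: dispatch a queued request
        fn = dispatch_fifo.get()
        fn()

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start(); t1.join(); t2.join()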
20150134937 | SIMD VARIABLE SHIFT AND ROTATE USING CONTROL MANIPULATION - Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size. | 05-14-2015 |
20150134938 | IMAGE PROCESSING DEVICE, INSTRUCTION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT - An image processing device includes an operation unit and is able to receive a plurality of operation instructions in parallel from the operation unit and a portable information processing terminal. The image processing device includes an instruction processing unit that executes processing according to the received operation instructions. When the operation instruction is a predetermined instruction generated by an operation on the operation unit included in the image processing device, the instruction processing unit executes the predetermined processing according to the predetermined instruction; when the operation instruction is a predetermined instruction generated by an operation on the information processing terminal, it stores the predetermined instruction in a storage medium of the image processing device; and when the operation instruction is a processing-execution-permission instruction, it executes the predetermined processing according to the predetermined instruction corresponding to that permission instruction among the predetermined instructions stored in the storage medium. | 05-14-2015 |
20150293767 | ROTATING REGISTER FILE WITH BIT EXPANSION SUPPORT - A method and system for implementing rotating register files with bit expansion support may enable a plurality of register pointers, each with a respective increment value, to be implemented in a register file of a processor. The register pointers may correspond to different regions of the register file. Each region of the register file may have a different size. | 10-15-2015 |
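A small sketch of one plausible reading of the pointer mechanics: each register pointer owns a region of the register file, advances by its own increment, and wraps within its region. The region layout and wrap policy are assumptions, not details from the abstract.

    # Sketch: a rotating pointer over region [base, base+size) with its own step.
    class RotatingPointer:
        def __init__(self, base, size, increment):
            self.base, self.size, self.increment = base, size, increment
            self.offset = 0

        def current(self):
            return self.base + self.offset   # physical register currently named

        def rotate(self):
            self.offset = (self.offset + self.increment) % self.size  # wrap

    p = RotatingPointer(base=16, size=8, increment=2)
    for _ in range(5):
        print(p.current()); p.rotate()  # 16, 18, 20, 22, 16 (wraps around)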
20150309792 | REDUCING LATENCY FOR POINTER CHASING LOADS - Systems, methods, and apparatuses for reducing the load to load/store address latency in an out-of-order processor. When a producer load is detected in the processor pipeline, the processor predicts whether the producer load is going to hit in the store queue. If the producer load is predicted not to hit in the store queue, then a dependent load or store can be issued early. The result data of the producer load is then bypassed forward from the data cache directly to the address generation unit. This result data is then used to generate an address for the dependent load or store, reducing the latency of the dependent load or store by one clock cycle. | 10-29-2015 |
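One way to realize such a predictor is a per-PC saturating counter, sketched below; the counter width and training policy are illustrative choices rather than details from the abstract.

    # Sketch: predict, per load PC, whether the producer load hits the store
    # queue; a predicted miss lets the dependent load/store issue a cycle early.
    from collections import defaultdict

    counters = defaultdict(int)  # per-PC 2-bit saturating counters (assumed)

    def predict_sq_hit(pc):
        return counters[pc] >= 2          # >= 2 means "likely store-queue hit"

    def train(pc, hit):
        counters[pc] = min(3, counters[pc] + 1) if hit else max(0, counters[pc] - 1)

    pc = 0x400A10
    for actual_hit in [True, True, False, True]:
        early_issue = not predict_sq_hit(pc)  # issue dependent op early on miss
        print(early_issue)                    # True, True, False, True
        train(pc, actual_hit)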
20150309793 | RESOURCE LOCKING FOR LOAD STORE SCHEDULING IN A VLIW PROCESSOR - A load/store unit including a memory queue configured to store a plurality of memory instructions and state information indicating whether each memory instruction of the plurality of memory instructions can be performed independently with, separately from, or after older pending instructions; and a state-selection circuit configured to set the state information of each memory instruction of the plurality of memory instructions in view of an older pending instruction in the memory queue. | 10-29-2015 |
20150324196 | UTILIZING PIPELINE REGISTERS AS INTERMEDIATE STORAGE - In one example, a method includes responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR, copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register, copying, by the initial logic and during a second clock cycle, the second value to the initial pipeline register, copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR, and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR. | 11-12-2015 |
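The cycle-by-cycle flow can be modeled as a two-stage shift: values enter the initial pipeline register on successive cycles and drain from the final pipeline register to the target GPRs. The toy model below assumes one value per stage per cycle; all names are invented for illustration.

    # Toy two-stage model of using pipeline registers as intermediate storage.
    gprs = {"g0": 7, "g1": 9, "g2": None, "g3": None}
    pipe = [None, None]          # [initial register, final register]
    moves = [("g0", "g2"), ("g1", "g3")]
    pending_dsts = []            # destinations, in pipeline order

    for cycle in range(4):
        if pipe[1] is not None:                 # final logic unit: drain to GPR
            gprs[pending_dsts.pop(0)] = pipe[1]
        pipe[1], pipe[0] = pipe[0], None        # advance one stage per cycle
        if moves:                               # initial logic unit: load value
            src, dst = moves.pop(0)
            pipe[0] = gprs[src]
            pending_dsts.append(dst)

    print(gprs)  # {'g0': 7, 'g1': 9, 'g2': 7, 'g3': 9}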
20150347130 | COMPUTER PROCESSOR EMPLOYING SPLIT-STREAM ENCODING - A computer processor is operably coupled to a memory system. The memory system is configured to store instruction blocks, wherein each instruction block is associated with an entry address and multiple distinct instruction streams within the instruction block. The multiple distinct instruction streams include at least a first instruction stream and a second instruction stream. The first instruction stream has an instruction order that logically extends in a direction of increasing memory space relative to the entry address of the instruction block. The second instruction stream has an instruction order that logically extends in a direction of decreasing memory space relative to the entry address of the instruction block. The computer processor includes a number of multi-stage instruction processing components corresponding to the multiple distinct instruction streams within each instruction block. The number of multi-stage instruction processing components are configured to access and process in parallel instructions belonging to multiple distinct instruction streams of a particular instruction block stored in the memory system. | 12-03-2015 |
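A sketch of the split-stream idea, under the assumption of one fixed-size instruction per slot: both streams share the entry address, one decoded toward increasing addresses and one toward decreasing addresses, so two fetch/decode pipelines can work in parallel.

    # Sketch: fetch two streams of an instruction block from one entry point.
    def fetch_streams(block, entry, n):
        up   = [block[entry + i] for i in range(n)]       # increasing addresses
        down = [block[entry - 1 - i] for i in range(n)]   # decreasing addresses
        return up, down

    block = {i: f"insn{i}" for i in range(8)}
    up, down = fetch_streams(block, entry=4, n=3)
    print(up)    # ['insn4', 'insn5', 'insn6']
    print(down)  # ['insn3', 'insn2', 'insn1']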
20150363199 | PERFORMING A CLEAR OPERATION ABSENT HOST INTERVENTION - Optimizations are provided for frame management operations, including a clear operation and/or a set storage key operation, requested by pageable guests. The operations are performed, absent host intervention, on frames not resident in host memory. The operations may be specified in an instruction issued by the pageable guests. | 12-17-2015 |
20150363200 | QoS Based Dynamic Execution Engine Selection - In one embodiment, a processor includes plural processing cores, and plural instruction stores, each instruction store storing at least one instruction, each instruction having a corresponding group number, each instruction store having a unique identifier. The processor also includes a group execution matrix having a plurality of group execution masks and a store execution matrix comprising a plurality of store execution masks. The processor further includes a core selection unit that, for each instruction within each instruction store, selects a store execution mask from the store execution matrix. The core selection unit for each instruction within each instruction store selects at least one group execution mask from the group execution matrix. The core selection unit performs logic operations to create a core request mask. The processor includes an arbitration unit that determines instruction priority among each instruction, assigns an instruction for each available core, and signals the instruction store. | 12-17-2015 |
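The mask combination reduces to bitwise logic; the sketch below assumes one bit per core and picks the lowest-numbered free core, which is an illustrative arbitration policy rather than the patented one.

    # Sketch: a core may run an instruction only if its bit is set in both the
    # store execution mask and the group execution mask.
    STORE_MASK = 0b1110   # cores usable by this instruction store (illustrative)
    GROUP_MASK = 0b0111   # cores usable by this instruction's group
    core_request_mask = STORE_MASK & GROUP_MASK   # 0b0110: cores 1 and 2

    def pick_core(request_mask, busy_mask):
        available = request_mask & ~busy_mask
        if available == 0:
            return None                                    # arbitration waits
        return (available & -available).bit_length() - 1   # lowest free core

    print(pick_core(core_request_mask, busy_mask=0b0010))  # core 2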
20150370558 | RELOCATION OF INSTRUCTIONS THAT USE RELATIVE ADDRESSING - Relocation of instructions that use relative addressing. Metadata relating to an instruction that uses relative addressing to access data and is to be relocated is stored prior to relocation. Based on relocating the instruction from one memory location to another memory location, a determination is made of an address to be used to access the data by the instruction. The determining is based on at least one of the metadata or an address of the another memory location. The instruction is executed at the another memory location, and the determined address is used to access the data. | 12-24-2015 |
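If the stored metadata is the instruction's original absolute target, the fix-up at the new location is a single subtraction, as this hypothetical helper shows.

    # Sketch: recompute a relative displacement after relocation so the
    # instruction still addresses the same data.
    def relocate(old_loc, disp, new_loc):
        target = old_loc + disp        # metadata captured before the move
        return target - new_loc        # displacement valid at the new location

    print(hex(relocate(old_loc=0x1000, disp=0x40, new_loc=0x3000)))  # -0x1fc0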
20150370559 | ENDIAN-MODE-INDEPENDENT MEMORY ACCESS IN A BI-ENDIAN-MODE PROCESSOR ARCHITECTURE - Embodiments relate to vector processors. An aspect includes endian-mode-sensitive memory instructions for a vector processor. One embodiment includes a computer-implemented method for copying data between a vector register that includes byte elements 0 to S and a memory that is byte addressable. The computer-implemented method includes obtaining a vector instruction by a processor in a computer. The processor determines that the vector instruction is a memory access instruction specifying the vector register and a memory address. In response to the determination that the instruction is a memory access instruction, and independent of a current global endian mode setting that is selectable in the processor, the processor executes the memory access instruction by copying the byte data between the memory and the vector register so that the byte element n of the vector register corresponds to the memory address+n for n=0 to S. | 12-24-2015 |
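The invariant is easy to state in code: byte element n always maps to memory address + n, whatever the global endian mode. A minimal model, with a dict standing in for byte-addressable memory:

    # Sketch of the endian-independent rule: element n <-> address + n, n=0..S.
    def vload(memory, addr, num_bytes):
        return [memory[addr + n] for n in range(num_bytes)]   # element n <- addr+n

    def vstore(memory, addr, vreg):
        for n, byte in enumerate(vreg):
            memory[addr + n] = byte                           # addr+n <- element n

    mem = {0x100 + i: i * 2 for i in range(16)}
    v = vload(mem, 0x100, 16)
    vstore(mem, 0x200, v)
    print(v[3], mem[0x203])  # both 6: same byte ends up in the same position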
20150378737 | SENDING PACKETS USING OPTIMIZED PIO WRITE SEQUENCES WITHOUT SFENCES - Method and apparatus for sending packets using optimized PIO write sequences without sfences. Sequences of Programmed Input/Output (PIO) write instructions to write packet data to a PIO send memory are received at a processor supporting out of order execution. The PIO write instructions are received in an original order and executed out of order, with each PIO write instruction writing a store unit of data to a store buffer or a store block of data to the store buffer. Logic is provided for the store buffer to detect when store blocks are filled, resulting in the data in those store blocks being drained via PCIe posted writes that are written to send blocks in the PIO send memory at addresses defined by the PIO write instructions. Logic is employed for detecting the fill size of packets and when a packet's send blocks have been filled, enabling the packet data to be eligible for egress. | 12-31-2015 |
20150378738 | ACCURATE TRACKING OF TRANSACTIONAL READ AND WRITE SETS WITH SPECULATION - Tracking of the read sets and write sets associated with cache lines of a transaction is improved in a pipelined processor executing memory instructions having the read sets and write sets associated with the cache lines. Active read-set and write-set cache indicators, associated with the memory operations of executing memory instructions and with a recovery pool, are updated for non-speculative memory instructions, that is, when the memory instruction is not newer in program order than an unresolved branch instruction. Based on encountering a speculative branch instruction in the processor pipeline, a representation of the active read sets and write sets is copied to the recovery pool. Based on completing the speculative branch instruction, the active read sets and write sets are updated from the representations copied to the recovery pool associated with that branch instruction upon detection of a misprediction. | 12-31-2015 |
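A toy model of the snapshot/restore discipline: the active sets are copied into the recovery pool at each speculative branch and restored on a misprediction. Structure names are invented for illustration.

    # Sketch: snapshot active read/write sets per speculative branch; restore
    # the snapshot if the branch turns out to be mispredicted.
    active = {"read": set(), "write": set()}
    recovery_pool = []

    def track(kind, line):
        active[kind].add(line)

    def on_speculative_branch():
        recovery_pool.append({k: set(v) for k, v in active.items()})  # deep copy

    def on_branch_resolved(mispredicted):
        snapshot = recovery_pool.pop()
        if mispredicted:
            active.update(snapshot)  # discard tracking done down the wrong path

    track("read", 0x80)
    on_speculative_branch()
    track("write", 0xC0)             # speculative update
    on_branch_resolved(mispredicted=True)
    print(active)                    # write set no longer contains 0xC0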
20150378739 | ACCURATE TRACKING OF TRANSACTIONAL READ AND WRITE SETS WITH SPECULATION - Tracking of the read sets and write sets associated with cache lines of a transaction is improved in a pipelined processor executing memory instructions having the read sets and write sets associated with the cache lines. Active read-set and write-set cache indicators, associated with the memory operations of executing memory instructions and with a recovery pool, are updated for non-speculative memory instructions, that is, when the memory instruction is not newer in program order than an unresolved branch instruction. Based on encountering a speculative branch instruction in the processor pipeline, a representation of the active read sets and write sets is copied to the recovery pool. Based on completing the speculative branch instruction, the active read sets and write sets are updated from the representations copied to the recovery pool associated with that branch instruction upon detection of a misprediction. | 12-31-2015 |
20160011876 | MANAGING INSTRUCTION ORDER IN A PROCESSOR PIPELINE | 01-14-2016 |
20160019067 | MECHANISM FOR INSTRUCTION SET BASED THREAD EXECUTION ON A PLURALITY OF INSTRUCTION SEQUENCERS - In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application-level program. A first user-level thread runs on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction has at least one of: 1) a field that makes reference to one or more instruction sequencers, or 2) an implicit reference, via a pointer to code that specifically addresses one or more instruction sequencers when the code is executed. | 01-21-2016 |
20160098274 | LOAD-MONITOR MWAIT - Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction. | 04-07-2016 |
20160124748 | NONTRANSACTIONAL STORE INSTRUCTION - A NONTRANSACTIONAL STORE instruction, executed in transactional execution mode, performs stores that are retained, even if a transaction associated with the instruction aborts. The stores include user-specified information that may facilitate debugging of an aborted transaction. | 05-05-2016 |
20160147532 | METHOD FOR HANDLING INTERRUPTS - Provided is a method for handling interrupts. The method includes receiving a first interrupt, and allocating the first interrupt to a first task queue of a first processing unit among a plurality of processing units, receiving a second interrupt, and allocating the second interrupt to the first task queue, handling the first interrupt allocated to the first task queue on the first processing unit, selecting a second processing unit that will handle the second interrupt among the plurality of processing units while the first interrupt is handled, and transferring the second interrupt allocated to the first task queue to a second task queue of the selected second processing unit. | 05-26-2016 |
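A software sketch of the queue migration, assuming a shortest-queue selection policy for the second processing unit (the method itself only requires that some second unit be selected):

    # Sketch: while CPU0 handles the first interrupt, a second interrupt queued
    # behind it is migrated to the task queue of a less busy CPU.
    from collections import deque

    task_queues = {0: deque(), 1: deque()}

    def allocate(irq, cpu=0):
        task_queues[cpu].append(irq)          # both interrupts land on CPU0 first

    def rebalance():
        while len(task_queues[0]) > 1:
            target = min(task_queues, key=lambda c: len(task_queues[c]))
            if target == 0:
                break                          # no better unit available
            task_queues[target].append(task_queues[0].pop())

    allocate("irq_a"); allocate("irq_b")
    rebalance()
    print(task_queues)  # irq_a stays on CPU0; irq_b transferred to CPU1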
20160147533 | INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION - A Load to Block Boundary instruction is provided that loads a variable number of bytes of data into a register while ensuring that a specified memory boundary is not crossed. The boundary may be specified in a number of ways, including, but not limited to, a variable value in the instruction text, a fixed instruction text value encoded in the opcode, or a register-based boundary. | 05-26-2016 |
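The length computation itself is small; assuming a power-of-two boundary and a 16-byte maximum load (both illustrative choices), it is:

    # Sketch: load at most max_len bytes, but never cross the given boundary.
    def bytes_to_load(addr, boundary=64, max_len=16):
        to_boundary = boundary - (addr % boundary)  # bytes left in this block
        return min(max_len, to_boundary)

    print(bytes_to_load(0x1000))  # 16: far from the boundary, full load
    print(bytes_to_load(0x103B))  # 5: only 5 bytes until the 64-byte boundary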
20160154650 | EFFICIENT USAGE OF A MULTI-LEVEL REGISTER FILE UTILIZING A REGISTER FILE BYPASS | 06-02-2016 |
20160170767 | TEMPORARY TRANSFER OF A MULTITHREADED IP CORE TO SINGLE OR REDUCED THREAD CONFIGURATION DURING THREAD OFFLOAD TO CO-PROCESSOR | 06-16-2016 |
20160179517 | NON-SERIALIZED PUSH INSTRUCTION FOR PUSHING A MESSAGE PAYLOAD FROM A SENDING THREAD TO A RECEIVING THREAD | 06-23-2016 |
20160179518 | NON-SERIALIZED PUSH INSTRUCTION FOR PUSHING A MESSAGE PAYLOAD FROM A SENDING THREAD TO A RECEIVING THREAD | 06-23-2016 |
20160188340 | SYSTEM AND METHOD FOR PERFORMING PARALLEL OPERATIONS USING A PLURALITY OF NODES - A root node is connected to each of a plurality of leaf nodes, directly or via one or more relay nodes, in a hierarchical topology. A processor in each relay node stores, in a queue, a first instruction holding a result of performing a first portion of a predetermined operation. A downstream node, which is connected to each relay node and positioned on the leaf-node side of that relay node in the hierarchical topology, generates a second instruction including data held in the downstream node, and transmits the generated second instruction to the relay node. An interface unit of each relay node performs the first portion of the predetermined operation on an intermediate result stored in the first instruction in the queue, based on data included in the second instruction. The root node performs a second operation by using the results from the one or more relay nodes that have performed the first operation. | 06-30-2016 |
20160202976 | MEMORY MANAGEMENT IN SECURE ENCLAVES | 07-14-2016 |
20160202983 | PROCESSOR SYSTEM AND METHOD BASED ON INSTRUCTION READ BUFFER | 07-14-2016 |
20160378476 | NON-DEFAULT INSTRUCTION HANDLING WITHIN TRANSACTION - Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner. | 12-29-2016 |
20180024933 | INSTRUCTION TO QUERY CACHE RESIDENCY | 01-25-2018 |
20190146794 | CONFLICT MASK GENERATION | 05-16-2019 |