INSTRUCTION FETCHING

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
712207000 | Prefetching | 67
712206000 | Of multiple instructions simultaneously | 25
Entries
Document | Title | Date
20120173849Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processor containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism.07-05-2012
20090193232Apparatus, processor and method of cache coherency control - An apparatus includes a plurality of processors each of which includes a cache memory, and a controller which suspends a request of at least one of the processors during a predetermined period when a processor fetches data from a main memory to the cache memory, wherein the controller suspends the request of at least one of the processors except the processor which fetches the data from the main memory to the cache memory.07-30-2009
20090193231METHOD AND APPARATUS FOR THREAD PRIORITY CONTROL IN A MULTI-THREADED PROCESSOR OF AN INFORMATION HANDLING SYSTEM - An information handling system employs a processor that includes a thread priority controller. An issue unit in the processor sends branch issue information to the thread priority controller when a branch instruction of an instruction thread issues. In one embodiment, if the branch issue information indicates low confidence in a branch prediction for the branch instruction, the thread priority controller speculatively increases or boosts the priority of the instruction thread containing this low confidence branch instruction. In this manner, should a branch redirect actually occur due to a mispredict, a fetcher is ready to access a redirect address in a memory array sooner than would otherwise be possible.07-30-2009
20110202748LOAD PAIR DISJOINT FACILITY AND INSTRUCTION THEREFORE - A Load/Store Disjoint instruction, when executed by a CPU, accesses operands from two disjoint memory locations and sets condition code indicators to indicate whether or not the two operands appeared to be accessed atomically by means of block-concurrent interlocked fetch with no intervening stores to the operands from other CPUs. In a Load Pair Disjoint form of the instruction, the accesses are loads and the disjoint data is stored in general registers.08-18-2011
20120246446Dynamically Determining the Profitability of Direct Fetching in a Multicore Architecture - Technologies are generally described herein for determining a profitability of direct fetching in a multicore processor. The multicore processor may include a first and a second tile. The first tile may include a first core and a first cache. The second tile may include a second core, a second cache, and a fetch location pointer register (FLPR). The multicore processor may migrate a thread executing on the first core to the second core. The multicore processor may store a location of the first cache in the FLPR. The multicore processor may execute the thread on the second core. The multicore processor may identify a cache miss for a block in the second cache. The multicore processor may determine whether a profitability of direct fetching of the block indicates direct fetching or directory-based fetching. The multicore processor may perform direct fetching or directory-based fetching based on the determination.09-27-2012
20120246445CENTRAL PROCESSING UNIT AND MICROCONTROLLER - A program data area 09-27-2012
20100077180GENERATING PREDICATE VALUES BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.03-25-2010
20100115239VARIABLE INSTRUCTION WIDTH DIGITAL SIGNAL PROCESSOR - A DSP architecture achieves high code density and performance by using 16 bit encoding/decoding of three-register instructions and including orthogonal 64 register selection fields within a 32-bit instruction. A 64 entry register file allows high performance, while the 16-bit instruction size provides excellent code density in control type applications.05-06-2010
20100115238STREAM PROCESSING SYSTEM HAVING A RECONFIGURABLE MEMORY MODULE - A stream processing system includes a stream processing module coupled to a memory module and operable so as to fetch stream elements from the memory module, to process the stream elements fetched thereby, and to store processed stream elements in the memory module. The stream processing module includes a number (N) of stream processing units, and the memory module is configured with a number (N) of memory bank units each corresponding to a respective one of the stream processing units. The memory module is reconfigurable based on a desired inter-level configuration so that each of the memory bank units is configured to have a memory size sufficient to meet processing requirement of the respective one of the stream processing units.05-06-2010
20130086360FIFO Load Instruction - An instruction identifies a register and a memory location. Upon execution of the instruction by a processor, an item is loaded from the memory location and a shift and insert operation is performed to shift data in the register and to insert the item into the register.04-04-2013
20130086362Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand First-Use Information - A prefix instruction is executed and passes operands to a next instruction without storing the operands in an architected resource, such that the execution of the next instruction uses the operands provided by the prefix instruction to perform an operation. The operands may be a prefix instruction immediate field or a target register of the prefix instruction execution.04-04-2013
20130086361Scalable Decode-Time Instruction Sequence Optimization of Dependent Instructions - Producer-consumer instructions, comprising a first instruction and a second instruction in program order, are fetched requiring in-order execution, the second instruction is modified by the processor so that the first instruction and second instruction can be completed out-of-order, the modification comprising any one of extending an immediate field of the second instruction using immediate field information of the first instruction or providing a source location of the first instruction as an additional source location to source locations of the second instruction.04-04-2013
20130086359Processor Hardware Pipeline Configured for Single-Instruction Address Extraction and Memory Access Operation - Memory access instructions, such as load and store instructions, are processed in a processor-based system. Processor hardware pipeline configurations enable efficient performance of memory access instructions, such as a pipeline configuration that enables, for a memory access operation request by a register-operand based virtual machine, computation of the memory location corresponding to a virtual-machine register by extracting a bit-field from the virtual-machine instruction and accessing (load or store) the computed memory location that represents a virtual register of the virtual-machine, in a single pass through the pipeline. Thus this processor hardware pipeline configuration enables a virtual machine register read/write operation to be performed by a single hardware processor instruction through a single pass in the processor hardware pipeline, for a register-operand based virtual machine.04-04-2013
20130080740FAST CONDITION CODE GENERATION FOR ARITHMETIC LOGIC UNIT - In one embodiment, a microprocessor includes fetch logic for retrieving an instruction, decode logic configured to identify an arithmetic operation specified in the instruction, and execution logic configured to receive operands specified by the instruction. The execution logic includes a primary logic path configured to perform the arithmetic operation on such operands and a secondary parallel logic path configured to output metadata associated with the result of the arithmetic operation.03-28-2013
20130036294SYSTEM AND METHOD FOR INSTRUCTION SETS WITH RUN-TIME CONSISTENCY CHECK - A system and method includes modules for determining whether an instruction is a target of a non-sequential fetch operation with an expected numerical property value, and avoiding execution of the instruction if it is the target of the non-sequential fetch operation and does not have the expected numerical property. Other embodiments include encoding an instruction with a functionality that is a target of a non-sequential fetch operation with an expected numerical property value. Instructions with the same functionality that are not targets of non-sequential fetch operations can be encoded with a different numerical property value. More specific embodiments can include a numerical property of parity, determining whether the instruction is valid, and throwing an exception, setting status bits, sending an interrupt to a control processor, and a combination thereof to avoid execution.02-07-2013
20100169612Data-Processing Unit for Nested-Loop Instructions - A data-processing unit has a fetching circuitry (07-01-2010
20090119486Method and a System for Accelerating Procedure Return Sequences - A method for retrieving a return address from a link stack when returning from a procedure in a pipeline processor is disclosed. The method identifies a retrieve instruction operable to retrieve a return address from a software stack. The method further identifies a branch instruction operable to branch to the return address. The method retrieves the return address from the link stack in response to both the retrieve instruction and the branch instruction being identified, and fetches instructions using the return address.05-07-2009
20130042089WORD LINE LATE KILL IN SCHEDULER - A method for picking an instruction for execution by a processor includes providing a multiple-entry vector, each entry in the vector including an indication of whether a corresponding instruction is ready to be picked. The vector is partitioned into equal-sized groups, and each group is evaluated starting with a highest priority group. The evaluating includes logically canceling all other groups in the vector when a group is determined to include an indication that an instruction is ready to be picked, whereby the vector only includes a positive indication for the one instruction that is ready to be picked.02-14-2013
20100100710Information processing apparatus, cache memory controlling apparatus, and memory access order assuring method - According to an aspect of the embodiment, when data on a cache RAM is rewritten in a storage processing of one thread, a determination unit searches a fetch port which holds a request of another thread, checks whether a request exists whose processing is completed, whose instruction is a load type instruction, and whose target address corresponds to a target address in a storage processing. When the corresponding request is detected, the determination unit sets a re-execution request flag to all the entries of the fetch port from the next entry of the entry which holds the oldest request to the entry which holds the detected request. When the processing of the oldest request is executed, a re-execution request unit transfers a re-execution request of an instruction to an instruction control unit for the request held in the entry in which the re-execution request flag is set.04-22-2010
20100042811METHOD FOR MANAGING BRANCH INSTRUCTIONS AND A DEVICE HAVING BRANCH INSTRUCTION MANAGEMENT CAPABILITIES - A method for managing branch instructions, the method includes: providing, to pipeline stages of a processor, multiple variable length groups of instructions; wherein each pipeline stage executes a group of instructions during a single execution cycle; receiving, at a certain execution cycle, multiple instruction fetch requests from multiple pipeline stages, each pipeline stage that generates an instruction fetch request stores a variable length group of instructions that comprises a branch instruction; sending to the fetch unit an instruction fetch command that is responsive to a first in order branch instruction in the pipeline stages; wherein if the first in order fetch command is a conditional fetch command then the instruction fetch command comprises a resolved target address; wherein the sending of the instruction fetch command is restricted to a single instruction fetch command per a single execution cycle.02-18-2010
20100106943Processing device - It is possible to realize fetch of instructions constituting a loop by using a simple configuration without fixing a loop start point. Provided is a processing method performed by a processing device including: an instruction buffer; an instruction decoder; a pointer arranged to correspond to the instruction buffer and indicating a connection relationship between one instruction buffer from which an instruction stream is read out and another instruction buffer containing the next instruction stream to be read out, according to an identifier of the other instruction buffer; a start point storage unit containing an identifier of the instruction buffer containing an instruction stream serving as a start point of repetition when performing an instruction fetch of such a predetermined instruction that processing of an instruction stream is repeated in a loop. When an instruction stream is read out from the instruction buffer and the predetermined instruction is detected from the instruction stream, the identifier stored in the start point storage unit is set as a pointer of the identifier of the next instruction buffer from which an instruction is to be read out.04-29-2010
20120191948Nested Virtualization Performance In A Computer System - A virtualization architecture for improving the performance of nested virtualization in a computer system. A virtualization instruction reads or writes data in a control structure used by a virtual machine monitor (VMM) to maintain state on a virtual machine (VM) to support transitions between a root mode of operation of a CPU in which the VMM executes and a non-root mode of operation of the CPU in which the VM executes. A privileged data access is made to a primary control structure according to the virtualization instruction if the CPU is in the root mode. A non-privileged data access is made to a secondary control structure according to the virtualization instruction if the CPU is in the non-root mode and a secondary control structure field in the primary control structure is enabled.07-26-2012
20130073834MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.03-21-2013
20130073833REDUCING STORE-HIT-LOADS IN AN OUT-OF-ORDER PROCESSOR - A technique for reducing store-hit-loads in an out-of-order processor includes storing a store address of a store instruction associated with a store-hit-load (SHL) pipeline flush in an SHL entry. In response to detecting another SHL pipeline flush for the store address, a current count associated with the SHL entry is updated. In response to the current count associated with the SHL entry reaching a first terminal count, a dependency for the store instruction is created such that execution of a younger load instruction with a load address that overlaps the store address stalls until the store instruction executes.03-21-2013
20090138679Enhanced Boolean Processor - A processor including a Boolean logic unit, wherein the Boolean logic unit is operated for performing the short-circuit evaluation of a Normal Form Boolean expression/operation, a plurality of input/output interfaces in communication with the Boolean logic unit, wherein the plurality of input/output interfaces are operated for receiving a plurality of compiled Boolean expressions/operations and transmitting a plurality of compiled results, and a plurality of registers coupled to the plurality of input/output interface circuits, wherein the plurality of multi-bit registers include an instruction register, a first address register and a second address register.05-28-2009
20090282220Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions and can be used to unify one or more ISA extensions such as application specific ASEs. The re-encoded ISA maintains assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions.11-12-2009
20130067200MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.03-14-2013
20090182983Compare and Branch Facility and Instruction Therefore - An atomic compare and branch instruction is executed that combines the function of a compare instruction having an option field with a conditional branch or jump instruction such that condition codes are preserved rather than setting condition codes to a value representative of the compare results. One comparand is obtained from any one of a memory location or an immediate field and the other comparand is obtained from a register field.07-16-2009
20090235051System and Method of Selectively Committing a Result of an Executed Instruction - In a particular embodiment, a method is disclosed that includes receiving an instruction packet including a first instruction and a second instruction that is dependent on the first instruction at a processor having a plurality of parallel execution pipelines, including a first execution pipeline and a second execution pipeline. The method further includes executing in parallel at least a portion of the first instruction and at least a portion of the second instruction. The method also includes selectively committing a second result of executing the at least a portion of the second instruction with the second execution pipeline based on a first result related to execution of the first instruction with the first execution pipeline.09-17-2009
20120272043REQUEST COALESCING FOR INSTRUCTION STREAMS - Sequential fetch requests from a set of fetch requests are combined into longer coalesced requests that match the width of a system memory interface in order to improve memory access efficiency for reading the data specified by the fetch requests. The fetch requests may be of different classes and each data class is coalesced separately, even when intervening fetch requests are of a different class. Data read from memory is ordered according to the order of the set of fetch requests to produce an instruction stream that includes the fetch requests for the different classes.10-25-2012
20090006812Method and Apparatus for Accessing a Cache With an Effective Address - A method and apparatus for accessing a processor cache. The method includes executing an access instruction in a processor core of the processor. The access instruction provides an untranslated effective address of data to be accessed by the access instruction. The method also includes determining whether a level one cache for the processor core includes the data corresponding to the effective address of the access instruction. The effective address of the access instruction is used without address translation to determine whether the level one cache for the processor core includes the data corresponding to the effective address. If the level one cache includes the data corresponding to the effective address, the data for the access instruction is provided from the level one cache.01-01-2009
20130166880PROCESSING DEVICE AND METHOD FOR CONTROLLING PROCESSING DEVICE - A processing device has an instruction buffer retaining one or more instructions obtained by an instruction fetch request, an instruction execution control unit decoding and executing an instruction, a branch prediction mechanism retaining one or more branch histories including a distance flag indicating a difference between a branch instruction address and a branch destination instruction address and performing a branch prediction of an instruction, and an instruction fetch control unit issuing the instruction fetch request. When a branch prediction result is a branch taken and it is judged based on the distance flag that the instruction fetch request for the branch destination instruction address is included in the instruction fetch requests in a sequential direction which are issued until the branch prediction result is outputted, the control unit causes an instruction retained in the instruction buffer to be output without issuing an instruction fetch request for the branch destination instruction address.06-27-2013
20090235052Data Processing Device and Electronic Equipment - A data processing device is provided using pipeline architecture to reduce a time loss due to a branch without causing an increase in circuit scale. The data processing device uses pipeline control. The data processing device includes an instruction queue in which a plurality of instruction codes can be fetched, a fetch address operation circuit which calculates a fetch address, a fetch circuit which fetches an instruction code based on the fetch address, and a branch information setting circuit which decodes a branch setting instruction, stores a branch address in a branch address storage register, and stores a branch target address in a branch target address storage register. The fetch address operation circuit compares either a previous fetch address or an expected next fetch address with a value stored in the branch address storage register, and determines a next fetch address to be output, based on the comparison result.09-17-2009
20090119485Predecode Repair Cache For Instructions That Cross An Instruction Cache Line - A predecode repair cache is described in a processor capable of fetching and executing variable length instructions having instructions of at least two lengths which may be mixed in a program. An instruction cache is operable to store in an instruction cache line instructions having at least a first length and a second length, the second length longer than the first length. A predecoder is operable to predecode instructions fetched from the instruction cache that have invalid predecode information to form repaired predecode information. A predecode repair cache is operable to store the repaired predecode information associated with instructions of the second length that span across two cache lines in the instruction cache. Methods for filling the predecode repair cache and for executing an instruction that spans across two cache lines are also described.05-07-2009
20110289299System and Method to Evaluate a Data Value as an Instruction - A system and method to evaluate a data value as an instruction is disclosed. For example, an apparatus configured to execute program code includes an execute unit configured to execute a first instruction associated with a location of a second instruction. The first instruction is identified by a program counter. The apparatus also includes a decode unit configured to receive the second instruction from the location and to decode the second instruction to generate a decoded second instruction without changing the program counter to point to the second instruction.11-24-2011
20110289300Indirect Branch Target Predictor that Prevents Speculation if Mispredict Is Expected - In one embodiment, a processor implements an indirect branch target predictor to predict target addresses of indirect branch instructions. The indirect branch target predictor may store target addresses generated during previous executions of indirect branches, and may use the stored target addresses as predictions for current indirect branches. The indirect branch target predictor may also store a validation tag corresponding to each stored target address. The validation tag may be compared to similar data corresponding to the current indirect branch being predicted. If the validation tag does not match, the indirect branch is presumed to be mispredicted (since the branch target address actually belongs to a different instruction). The indirect branch target predictor may inhibit speculative execution subsequent to the mispredicted indirect branch until the redirect is signalled for the mispredicted indirect branch.11-24-2011
20090300330DATA PROCESSING METHOD AND SYSTEM BASED ON PIPELINE - A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage.12-03-2009
20090006811Method and System for Expanding a Conditional Instruction into a Unconditional Instruction and a Select Instruction - A method of expanding a conditional instruction having a plurality of operands within a pipeline processor is disclosed. The method identifies the conditional instruction prior to an issue stage and determines if the plurality of operands exceeds a predetermined threshold. The method expands the conditional instruction into a non-conditional instruction and a select instruction. The method further executes the non-conditional instruction and the select instruction in separate pipelines.01-01-2009
20090287907System for providing trace data in a data processor having a pipelined architecture - The invention is a method and system for providing trace data in a pipelined data processor. Aspects of the invention include providing a trace pipeline in parallel to the execution pipeline, providing trace information on whether conditional instructions complete or not, providing trace information on the interrupt status of the processor, replacing instructions in the processor with functionally equivalent instructions that also produce trace information and modifying the scheduling of instructions in the processor based on the occupancy of a trace output buffer.11-19-2009
20100169611BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS - A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.07-01-2010
20090187740Reducing errors in pre-decode caches - In a data processing system, data representing program instructions is fetched from memory, each instruction being from one of a plurality of sets of instructions including at least first and second sets of instructions and each program instruction within the fetched data comprising one or more blocks to be pre-decoded, each block representing a portion of an instruction. Pre-decoding circuitry is configured to perform pre-decoding operations on the blocks. For at least one portion of an instruction from the first set of instructions and at least one portion of an instruction from the second set of instructions the pre-decoding operation performed on a block fetched from memory is independent of whether the block is identified as representing the at least one portion of an instruction from the first set of instructions or as the at least one portion of an instruction from the second set of instructions.07-23-2009
20090113177INTEGRATED CIRCUIT WITH DMA MODULE FOR LOADING PORTIONS OF CODE TO A CODE MEMORY FOR EXECUTION BY A HOST PROCESSOR THAT CONTROLS A VIDEO DECODER - A system, method, and apparatus for dynamically booting processor code memory with a wait instruction is presented herein. A wait instruction precedes the transfer of a new code portion to the code memory. The wait instruction causes the processor to temporarily cease using the code memory. When the processor ceases using the code memory, the processor signals a direct memory access (DMA) module to transfer a new code portion to the code memory. The DMA module transfers the new code portion to the code memory and transmits a signal to the processor when the transfer is completed. The signal causes the processor to resume. When the processor resumes, the processor begins executing the instructions at the next code address.04-30-2009
20100017580Pre-decode checking for pre-decoded instructions that cross cache line boundaries - A data processing apparatus and method are provided for pre-decoding instructions. The data processing apparatus has pre-decoding circuitry for receiving instructions fetched from a memory and for performing a pre-decoding operation to generate corresponding pre-decoded instructions, which are then stored in the cache for access by the processing circuitry. If a pre-decoded instruction crosses a cache line boundary, then checking circuitry in respect of selected types of pre-decoded instruction checks for consistency between the first portion of the pre-decoded instruction stored within a first cache line and a contiguous second portion of the pre-decoded instruction stored within a second cache line. If this consistency check is passed such that the two portions are self-consistent, then the pre-decoded instruction can be further decoded and issued. If the consistency check is failed, or the pre-decoded instruction is not of a type for which consistency checking is supported, then re-generation of the pre-decoded instruction is triggered.01-21-2010
20100122067ACROSS-THREAD OUT-OF-ORDER INSTRUCTION DISPATCH IN A MULTITHREADED MICROPROCESSOR - Instruction dispatch in a multithreaded microprocessor such as a graphics processor is not constrained by an order among the threads. Instructions for each thread are fetched, and a dispatch circuit determines which instructions in the buffer are ready to execute. The dispatch circuit may issue any ready instruction for execution, and an instruction from one thread may be issued prior to an instruction from another thread regardless of which instruction was fetched first. If multiple functional units are available, multiple instructions can be dispatched in parallel.05-13-2010
20100122066INSTRUCTION METHOD FOR FACILITATING EFFICIENT CODING AND INSTRUCTION FETCH OF LOOP CONSTRUCT - Instruction set techniques have been developed to identify explicitly the beginning of a loop body and to code a conditional loop-end in ways that allow a processor implementation to efficiently manage an instruction fetch buffer and/or entries in an instruction cache. In particular, for some computations and processor implementations, a machine instruction is defined that identifies a loop start, stores a corresponding loop start address on a return stack (or in other suitable storage) and directs fetch logic to take advantage of the identification by retaining in a fetch buffer or instruction cache the instruction(s) beginning at the loop start address, thereby avoiding usual branch delays on subsequent iterations of the loop. A conditional loop-end instruction can be used in conjunction with the loop start instruction to discard (or simply mark as no longer needed) the loop start address and the loop body instructions retained in the fetch buffer or instruction cache.05-13-2010
20100125720INSTRUCTION MODE IDENTIFICATION APPARATUS AND METHOD - An instruction mode identification apparatus includes a program counter and a processor. The program counter stores an instruction address, which comprises a plurality of bits for indicating an address of an instruction currently executed or to be executed. At least one of the plurality of bits is a redundant bit. The processor identifies an instruction mode according to the redundant bit. The instruction mode represents an execution mode of the current instruction. An instruction mode identification method is also disclosed.05-20-2010
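As a rough illustration of the redundant-bit idea in the entry above, the sketch below splits a program-counter value into a fetch address and a mode. It assumes instructions are at least 2-byte aligned, so bit 0 carries no address information; the bit position, the function name `decode_pc` and the mode labels are hypothetical and are not taken from the application.

```python
# Hypothetical sketch: assumes bit 0 of the program counter is the redundant bit.
MODE_BIT = 0x1

def decode_pc(pc):
    """Split a program-counter value into (fetch address, instruction mode)."""
    # The redundant bit selects the mode; the labels are placeholders.
    mode = "mode_b" if pc & MODE_BIT else "mode_a"
    # Clear the redundant bit to recover the aligned fetch address.
    return pc & ~MODE_BIT, mode
```

For example, `decode_pc(0x1001)` yields the aligned address `0x1000` together with the second mode label.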
20090210659PROCESSOR AND METHOD FOR WORKAROUND TRIGGER ACTIVATED EXCEPTIONS - A processor includes a microarchitecture for working around a processing flaw, the microarchitecture including: at least one detector adapted for detecting a predetermined state associated with the processing flaw; and at least one mechanism to modify default processor processing behavior; and upon modification of processing behavior, the processing of an instruction involving the processing flaw can be completed by avoiding the processing flaw.08-20-2009
20100082947VERY-LONG INSTRUCTION WORD ARCHITECTURE WITH MULTIPLE PROCESSING UNITS - A processor may include a plurality of processing units for processing instructions, where each processing unit is associated with a discrete instruction queue. Data is read from a data queue selected by each instruction, and a sequencer manages distribution of instructions to the plurality of discrete instruction queues.04-01-2010
20100082946Microcomputer and its instruction execution method - A microcomputer in accordance with an exemplary embodiment of the present invention includes an instruction decoder 04-01-2010
20100100709Instruction control apparatus and instruction control method - In a CPU having a SMT function of executing plural threads composed of a series of instructions representing processing, there are provided a decode section for decoding processing represented by instructions of plural threads, an instruction buffer for obtaining instructions from a thread and holding the instructions, and inputting the held instructions to the decode section in order in the thread, and an execution pipeline for executing processing of instructions decoded by the decode section. The decode section checks whether or not an executable condition is ready for an instruction when the instruction is decoded and requests that the instructions held in the instruction buffer and an instruction subsequent to an instruction that is not ready with an executable condition are inputted again to the decode section.04-22-2010
20100100708Processing device - A processing device which can execute a plurality of threads includes: an execution unit which executes a command; a supply unit which supplies a command to the execution unit; a buffer unit which holds the command supplied from the supply unit; and a control unit which manages the buffer unit. The buffer unit has a set of buffer elements. Each of the buffer elements has a data unit for storing a command and a pointer unit for defining a connection relationship between the buffer elements. The control unit has a thread allocation unit which allocates a sequence of buffer elements whose connection relationship has been defined by the pointer unit for respective threads executed by the processing device.04-22-2010
20090300329VOLTAGE DROOP MITIGATION THROUGH INSTRUCTION ISSUE THROTTLING - A system and method for providing a digital real-time voltage droop detection and subsequent voltage droop reduction. A scheduler within a reservation station may store a weight value for each instruction corresponding to node capacitance switching activity for the instruction derived from pre-silicon power modeling analysis. For instructions picked with available source data, the corresponding weight values are summed together to produce a local current consumption value and this value is summed with any existing global current consumption values from corresponding schedulers of other processor cores yielding an activity event. The activity event is stored. Hashing functions within the scheduler are used to determine both a recent and an old activity average using the calculated activity event and stored older activity events. Instruction issue throttling occurs if either a difference between the old activity average and the recent activity average exceed a first threshold or the recent activity average exceeds a second threshold.12-03-2009
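The throttling condition in the entry above can be sketched as a small decision function. This is only an illustration under stated assumptions: the averages are passed in directly (the hashing functions mentioned in the abstract are not modelled), the "difference" is taken as an absolute value, and the names `activity_event` and `should_throttle` are hypothetical.

```python
def activity_event(picked_weights, global_current=0):
    """Sum the weights of the picked instructions plus any global consumption value."""
    return sum(picked_weights) + global_current

def should_throttle(old_avg, recent_avg, diff_threshold, level_threshold):
    """Return True when instruction issue should be throttled."""
    # Assumption: the difference between the averages is compared as an absolute value.
    swing_too_large = abs(old_avg - recent_avg) > diff_threshold
    # The second condition looks only at the recent activity level.
    level_too_high = recent_avg > level_threshold
    return swing_too_large or level_too_high
```

Either condition alone is enough to trigger throttling, which matches the "either ... or" wording of the abstract.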
20090119487ARITHMETIC PROCESSING APPARATUS FOR EXECUTING INSTRUCTION CODE FETCHED FROM INSTRUCTION CACHE MEMORY - An arithmetic processing apparatus includes a cache block which stores a plurality of instruction codes from a main memory, a central processing unit which fetch-accesses the cache block and sequentially loads and executes the plurality of instruction codes, and a repeat buffer which stores an instruction code group corresponding to a buffer size, the instruction code group ranging from a head instruction code to a terminal instruction code among the head instruction code to an end instruction code of a repeat block repeatedly executed in the processing program, in the plurality of instruction codes stored in the cache block. The arithmetic processing apparatus further includes an instruction cache control unit which performs control so that the instruction code group stored in the repeat buffer is selected and supplied to the central processing unit when the repeat block is repeatedly executed.05-07-2009
20080276071REDUCING THE FETCH TIME OF TARGET INSTRUCTIONS OF A PREDICTED TAKEN BRANCH INSTRUCTION - A method and processor for reducing the fetch time of target instructions of a predicted taken branch instruction. Each entry in a buffer, referred to herein as a “branch target buffer”, may store an address of a branch instruction predicted taken and the instructions beginning at the target address of the branch instruction predicted taken. When an instruction is fetched from the instruction cache, a particular entry in the branch target buffer is indexed using particular bits of the fetched instruction. The address of the branch instruction in the indexed entry is compared with the address of the instruction fetched from the instruction cache. If there is a match, then the instructions beginning at the target address of that branch instruction are dispatched directly behind the branch instruction. In this manner, the fetch time of target instructions of a predicted taken branch instruction is reduced.11-06-2008
20080276070REDUCING THE FETCH TIME OF TARGET INSTRUCTIONS OF A PREDICTED TAKEN BRANCH INSTRUCTION - A method and processor for reducing the fetch time of target instructions of a predicted taken branch instruction. Each entry in a buffer, referred to herein as a “branch target buffer”, may store an address of a branch instruction predicted taken and the instructions beginning at the target address of the branch instruction predicted taken. When an instruction is fetched from the instruction cache, a particular entry in the branch target buffer is indexed using particular bits of the fetched instruction. The address of the branch instruction in the indexed entry is compared with the address of the instruction fetched from the instruction cache. If there is a match, then the instructions beginning at the target address of that branch instruction are dispatched directly behind the branch instruction. In this manner, the fetch time of target instructions of a predicted taken branch instruction is reduced.11-06-2008
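The lookup described in the two entries above can be sketched as a direct-mapped table. This is a minimal sketch, not the claimed design: it assumes the index is taken from low-order bits of the fetch address, 4-byte instructions, and a power-of-two table size; `BTBEntry`, `BranchTargetBuffer`, `install` and `lookup` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class BTBEntry:
    branch_addr: int      # address of a branch instruction predicted taken
    target_instrs: list   # instructions beginning at the branch target address

class BranchTargetBuffer:
    def __init__(self, num_entries=16):
        # Assumption: num_entries is a power of two so a bit mask selects the entry.
        self.mask = num_entries - 1
        self.entries = [None] * num_entries

    def _index(self, addr):
        # Assumption: 4-byte instructions, so the two low address bits are skipped.
        return (addr >> 2) & self.mask

    def install(self, branch_addr, target_instrs):
        self.entries[self._index(branch_addr)] = BTBEntry(branch_addr, list(target_instrs))

    def lookup(self, fetch_addr):
        """Return the stored target instructions on a match, else None."""
        entry = self.entries[self._index(fetch_addr)]
        # Compare the stored branch address with the address just fetched.
        if entry is not None and entry.branch_addr == fetch_addr:
            return entry.target_instrs
        return None
```

On a hit, the returned instructions are the ones that would be dispatched directly behind the branch; a miss (including an index collision with a different branch address) returns `None`.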
20090063819Method and Apparatus for Dynamically Managing Instruction Buffer Depths for Non-Predicted Branches - A method and apparatus for dynamically managing instruction buffer depths for non-predicted branches reduces wasted energy and resources associated with low confidence branch prediction conditions. A portion of the instruction buffer for an instruction thread is allocated for storing predicted branch instruction streams and another portion, which may be zero-sized during high prediction confidence conditions, is allocated to the non-predicted branch instruction stream. The size of the buffers is adjusted dynamically in conformity with an on-going prediction confidence that provides a measure of how well branch prediction mechanisms are working for a given instruction thread. An alternate instruction fetch address table can be maintained and multiplexed with the main fetch address register for addressing the instruction cache, so that the instruction stream can be quickly shifted to the non-predicted path when a branch instruction is resolved to the non-predicted path.03-05-2009
20080313431Method and System for Altering Processor Execution of a Group of Instructions - An embodiment of the invention is a processor for detecting one or more groups of instructions and initiating a processor action upon detecting one or more groups of instructions. The processor includes an instruction unit for fetching and decoding a group of instructions. An instruction register receives the group of instructions having at least one instruction opcode. A control register includes a control word including a control opcode and an action field defining a processor action. An execution unit includes compare logic for comparing the instruction opcode and the control opcode. The execution unit initiates the processor action upon the compare logic detecting a hit between the instruction opcode and the control opcode.12-18-2008
20080244229Information processing apparatus - In an information processing apparatus, a fetch to a storage address of a first storage unit which stores a first instruction executed at first within a plurality of instructions that is included in a software and executed when a processor starts the software via the channel is detected. It is detected that the processor executed a specific instruction within the plurality of instructions via the channel. It is determined whether a predetermined time has passed since the detection of the fetch to the storage address until the detection of the execution of the specific instruction. When it is determined that the predetermined time has not passed, it is determined whether an interrupt to the processor is prohibited based on a result of the processor executing the specific instruction, and an access is released to the process according to a result of determination.10-02-2008
20100100707DATA STRUCTURE FOR CONTROLLING AN ALGORITHM PERFORMED ON A UNIT OF WORK IN A HIGHLY THREADED NETWORK ON A CHIP - A computer-implemented method, system and computer program product for controlling an algorithm that is performed on a unit of work in a subsequent software pipeline stage in a Network On a Chip (NOC) is presented. In one embodiment, the method executes a first operation in a first node of the NOC. The first node generates payload, and then loads that payload into a message. The message with the payload is transmitted to a nanokernel that controls a second node in the NOC. The nanokernel calls an algorithm that is needed by a second operation in a second node in the NOC, which uses the algorithm to execute the second operation.04-22-2010
20080270757Fetch and Dispatch Disassociation Apparatus for Multistreaming Processors - A dynamic multistreaming processor has instruction queues, each instruction queue corresponding to an instruction stream, and execution units. The dynamic multistreaming processor also has a dispatch stage to select at least one instruction from one of the instruction queues and to dispatch the selected at least one instruction to one of the execution units. Lastly the dynamic multistreaming processor has a queue counter, associated with each instruction queue, for indicating the number of instructions in each queue, and a fetch counter, associated with each instruction queue, for indicating an address from which to obtain instructions when the associated instruction queue is not full. The dynamic multistreaming processor might also have fetch counters for indicating a next instruction address from which to obtain at least one instruction when the associated instruction queue is not full. The dynamic multistreaming processor could also have a second counter for indicating a next instruction address.10-30-2008
20090182984Execute Relative Long Facility and Instructions Therefore - A method, system and program product for an execute relative instruction, which when executed fetches and executes a target instruction at a relative address and then returns processing to the next instruction following the execute relative instruction. The relative address is formed by adding the value of the program counter to a sign extended immediate field. The fetched target instruction is optionally modified before execution by OR'ing bits into predetermined bits of the target instruction.07-16-2009
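The address arithmetic in the entry above can be sketched as follows. This is an illustration only: the 32-bit immediate width, the 64-bit address width, the absence of any scaling of the immediate, and the function names are assumptions that do not come from the abstract.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def execute_relative_target(pc, imm, imm_bits=32, addr_bits=64):
    """Relative address = program counter + sign-extended immediate (widths assumed)."""
    return (pc + sign_extend(imm, imm_bits)) & ((1 << addr_bits) - 1)

def modify_target(instr, or_bits):
    """Optionally OR bits into the fetched target instruction before execution."""
    return instr | or_bits
```

With these assumptions, an immediate of `0xFFFFFFF0` is read as -16, so a program counter of `0x1000` gives a target address of `0xFF0`.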
20090182985Move Facility and Instructions Therefore - A move instruction, having a signed immediate field, copies a sign extended signed immediate field value to an operand location in memory. The size of the operand is determined by the opcode of the instruction. Preferably, the address of the operand is determined by adding a displacement field of the instruction to a value associated with a register field of the instruction.07-16-2009
20090138678Multifunction Hexadecimal Instruction Form System and Program Product - A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and MULTIPLY and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle.05-28-2009
20110225395DATA PROCESSING SYSTEM AND CONTROL METHOD THEREOF - In a data processing system which includes a processor performing a processing in correspondence with a fetched instruction and a DRC capable of dynamically reconfiguring a circuit configuration in correspondence with configuration data, when the processor fetches the instruction, a configuration data decoder identifies whether or not the instruction is a configuration data instruction. When the instruction is the configuration data instruction, the configuration data is read from the configuration data memory in which the configuration data is housed and supplied to the DRC based on address information included in the configuration data instruction, thereby to enable the configuration data to be supplied to the DRC at a timing the same as a timing at which the instruction is fetched, so that the configuration data can be supplied at a high speed.09-15-2011
20090198963COMPLETION OF ASYNCHRONOUS MEMORY MOVE IN THE PRESENCE OF A BARRIER OPERATION - A method within a data processing system by which a processor executes an asynchronous memory move (AMM) store (ST) instruction to complete a corresponding AMM operation in parallel with an ongoing (not yet completed), previously issued barrier operation. The processor receives the AMM ST instruction after executing the barrier operation (or SYNC instruction) and before the completion of the barrier operation or SYNC on the system fabric. The processor continues executing the AMM ST instruction, which performs a move in virtual address space and then triggers the generation of the AMM operation. The AMM operation proceeds while the barrier operation continues, independent of the processor. The processor stops further execution of all other memory access requests, excluding AMM ST instructions that are received after the barrier operation, but before completion of the barrier operation.08-06-2009
20100161941METHOD AND SYSTEM FOR IMPROVED FLASH CONTROLLER COMMANDS SELECTION - A system for selecting a subset of issued flash storage commands to improve processing time for command execution. A plurality of ports stores a first plurality of command identifiers and are associated with the plurality of ports. Each of the first plurality of arbiters selects an oldest command identifier among command identifiers within each corresponding port resulting in a second plurality of command identifiers. A second arbiter makes a plurality of selections from the second plurality of command identifiers based on command identifier age and the priority of the port. A session identifier queue stores commands associated with the plurality of selections among other commands forming a third plurality of commands. A microcontroller selects an executable command from the third plurality of commands for execution based on an execution optimization heuristic. After execution of the command, the command identifier in the port is cleared.06-24-2010
20120079241INSTRUCTION EXECUTION BASED ON OUTSTANDING LOAD OPERATIONS - One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads called strands to hide function unit pipeline latency and local memory access latency. The strands are a sub-set of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands and strands are demoted to pending threads based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread is expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.03-29-2012
20110231632SEMICONDUCTOR DEVICE - Receiving a request for canceling setting, a control circuit erases data stored in a corresponding block, changes a value of a protection flag, and cancels protection setting. When an overall protection is set for any block, the control circuit prohibits access to all blocks, except when it is an operation mode for activating a memory program contained in the microcomputer. Further, the control circuit permits an access to a block M only when partial protection is set, the CPU is in the mode for activating a memory program contained in the microcomputer and the access is for reading an instruction code in accordance with an instruction fetch.09-22-2011
20090089547SYSTEM AND METHOD FOR MONITORING DEBUG EVENTS - A system has a pipelined processor for executing a plurality of instructions by sequentially fetching, decoding, executing and writing results associated with execution of each instruction. Debug circuitry is coupled to the pipelined processor for monitoring execution of the instructions to determine when a debug event occurs. The debug circuitry generates a debug exception to interrupt instruction processing flow. The debug circuitry has control circuitry for indicating a number of instructions, if any, that complete instruction execution between an instruction that caused the debug event and a point in instruction execution when the exception is taken.04-02-2009
20090198962DATA PROCESSING SYSTEM, PROCESSOR AND METHOD OF DATA PROCESSING HAVING BRANCH TARGET ADDRESS CACHE INCLUDING ADDRESS TYPE TAG BIT - In at least one embodiment, a processor includes an execution unit and instruction sequencing logic that fetches instructions from a memory system for execution by the execution unit. The instruction sequencing logic includes branch logic that outputs predicted branch target addresses for use as instruction fetch addresses. The branch logic includes a branch target address prediction circuitry concurrently holding a first entry providing storage for a first branch target address prediction associating a first instruction fetch address with a first branch target address to be used as an instruction fetch address and a second entry providing storage for a second branch target address prediction associating the first instruction fetch address with a different second branch target address. The first entry indicates a first instruction address type for the first instruction fetch address, and the second entry indicates a second instruction address type for the first instruction fetch address.08-06-2009
20110225394INSTRUCTION BREAKPOINTS IN A MULTI-CORE, MULTI-THREAD NETWORK COMMUNICATIONS PROCESSOR ARCHITECTURE - Described embodiments provide a packet classifier for a network processor that generates tasks corresponding to each received packet. The packet classifier includes a scheduler to generate threads of contexts corresponding to tasks received by the packet classifier from a plurality of processing modules of the network processor. A multi-thread instruction engine processes instructions corresponding to threads received from the scheduler. The multi-thread instruction engine executes instructions by fetching an instruction of the thread from an instruction memory of the packet classifier and determining whether a breakpoint mode of the network processor is enabled. If the breakpoint mode is enabled, and a breakpoint indicator of the fetched instruction is set, the packet classifier enters a breakpoint mode. Otherwise, if the breakpoint indicator of the fetched instruction is not set, the multi-thread instruction engine executes the fetched instruction.09-15-2011
20090210660Prioritising of instruction fetching in microprocessor systems - A method and system are provided for prioritising the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of threads buffered for execution in an instruction buffer 08-20-2009
20090222645METRIC FOR SELECTIVE BRANCH TARGET BUFFER (BTB) ALLOCATION - A method and data processing system allocates entries in a branch target buffer (BTB). Instructions are fetched from a plurality of instructions and one of the plurality of instructions is determined to be a branch instruction. A corresponding branch target address is determined. A determination is made whether the branch target address is stored in a branch target buffer (BTB). When the branch target address is not stored in the branch target buffer, an entry in the branch target buffer is identified for allocation to receive the branch target address based upon stored metrics such as data processing cycle saving information and branch prediction state. In one form the stored metrics are stored in predetermined fields of the entries of the BTB.09-03-2009
20090249033Data processing apparatus and method for handling instructions to be executed by processing circuitry - A data processing apparatus and method are provided for handling instructions to be executed by processing circuitry. The processing circuitry has a plurality of processor states, each processor state having a different instruction set associated therewith. Pre-decoding circuitry receives the instructions fetched from the memory and performs a pre-decoding operation to generate corresponding pre-decoded instructions, with those pre-decoded instructions then being stored in a cache for access by the processing circuitry. The pre-decoding circuitry performs the pre-decoding operation assuming a speculative processor state, and the cache is arranged to store an indication of the speculative processor state in association with the pre-decoded instructions. The processing circuitry is then arranged only to execute an instruction in the sequence using the corresponding pre-decoded instruction from the cache if a current processor state of the processing circuitry matches the indication of the speculative processor state stored in the cache for that instruction. This provides a simple and effective mechanism for detecting instructions that have been corrupted by the pre-decoding operation due to an incorrect assumption of processor state.10-01-2009
20100180103MECHANISM FOR INCREASING THE EFFECTIVE CAPACITY OF THE WORKING REGISTER FILE - A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file credit indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read.07-15-2010
20100161942INFORMATION HANDLING SYSTEM INCLUDING A PROCESSOR WITH A BIFURCATED ISSUE QUEUE - An information handling system includes a processor with a bifurcated unified issue queue that may perform unified issue queue VSU store instruction dependency operations. The bifurcated unified issue queue BUIQ maintains VSU store instructions in the form of internal operations data. The BUIQ includes a unified issue queue UIQ 06-24-2010
20080307201Method and Apparatus for Cooperative Software Multitasking In A Processor System with a Partitioned Register File - A processor system executes multiple applet programs within a software application program in an information handling system. The information handling system includes operating system software that manages processor system hardware and software in a multi-tasking environment. In particular, the operating system software manages partitioning of a register file in the processor system to achieve a cooperative relationship among multiple applet programs within respective partitions of the register file. In one embodiment, the operating system software manages unique applet ID's to modify register file partition sizes and locations during applet program instruction text execution. In one embodiment, applet ID masking hardware provides sharing of register file space among multiple copies of applet program code.12-11-2008
20100180102Enhancing processing efficiency in large instruction width processors - A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations.07-15-2010
20100161943PROCESSOR CAPABLE OF POWER CONSUMPTION SCALING - The present invention relates to a processor capable of power consumption scaling, and more particularly, to a technique that variably controls the energy consumption of a processor according to the energy capacity being supplied by providing a pipeline register with a bypass function so as to control the operating frequency of the processor.06-24-2010
20100262806Tracking Effective Addresses in an Out-of-Order Processor - Mechanisms, in a data processing system, are provided for tracking effective addresses through a processor pipeline of the data processing system. The mechanisms comprise logic for fetching an instruction from an instruction cache and associating, by an effective address table logic in the data processing system, an entry in an effective address table (EAT) data structure with the fetched instruction. The mechanisms further comprise logic for associating an effective address tag (eatag) with the fetched instruction, the eatag comprising a base eatag that points to the entry in the EAT and an eatag offset. Moreover, the mechanisms comprise logic for processing the instruction through the processor pipeline by processing the eatag.10-14-2010
20100185834Data Storing Method and Processor Using the Same - A data storing method applied to a processor having a pipelined processing unit is provided. The pipelined processing unit includes stages. The stages include a source operand fetch stage and a write-back stage. The method includes the following steps. Firstly, a storing instruction is fetched and decoded. Next, the storing instruction is entered to the source operand fetch stage, and whether there is a late-done instruction in the pipelined processing unit is determined. The late-done instruction not lagged behind the storing instruction generates a late-coming result before entering the write-back stage. If it is determined that there is a late-done instruction in the pipelined processing unit, then the late-coming result is fetched before the storing instruction is entered to the write-back stage. Thereafter, the storing instruction is entered to the write-back stage, and the late-coming result is stored to a target memory which the storing instruction corresponds to.07-22-2010
20100228953REDUCING DATA HAZARDS IN PIPELINED PROCESSORS TO PROVIDE HIGH PROCESSOR UTILIZATION - A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.09-09-2010
20100153688APPARATUS AND METHOD FOR DATA PROCESS - An exemplary aspect of the present invention is a data processing apparatus for processing a loop in a pipeline that includes an instruction memory and a fetch circuit that fetches an instruction stored in the instruction memory. The fetch circuit includes an instruction queue that stores an instruction to be output from the fetch circuit, an evacuation queue that stores an instruction fetched from the instruction memory, a selector that selects one of the instruction output from the instruction queue and the instruction output from the evacuation queue, and a loop queue that stores the instruction selected by the selector and outputs to the instruction queue.06-17-2010
20100211762Mechanism for Efficient Implementation of Software Pipelined Loops in VLIW Processors - A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.08-19-2010
20100241832Instruction fetching following changes in program flow - This application is concerned with a device and method for fetching instructions from a data store for processing by a data processor. The device comprises: a register for storing an address of an instruction to be processed by said data processor; a fetch unit responsive to an address input to said fetch unit to fetch an instruction stored at said address; an adder for adding a predetermined amount to said address stored in said register prior to sending said address to said fetch unit, said predetermined amount determining a position in a program flow said fetched instruction has with respect to said instruction addressed in said register; said adder being responsive to detection of a change in program flow to reset said predetermined amount to an initial value, and to increase said predetermined amount for subsequent fetches by an amount equal to the separation between addresses such that consecutive addresses are fetched up to a maximum predetermined amount.09-23-2010
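The adder behaviour described in the abstract above can be pictured with a small software model. This is only a sketch under stated assumptions: the class name, the 4-byte address separation, and the concrete maximum offset are hypothetical choices for illustration and are not taken from the application.

```python
class FetchAddressGenerator:
    """Toy model of the register + adder arrangement (names are hypothetical)."""

    def __init__(self, step=4, max_offset=8, initial_offset=0):
        self.step = step                  # assumed separation between addresses
        self.max_offset = max_offset      # assumed maximum predetermined amount
        self.initial_offset = initial_offset
        self.offset = initial_offset      # the "predetermined amount" held by the adder

    def next_fetch_address(self, register_address, flow_changed):
        # A detected change in program flow resets the amount to its initial value.
        if flow_changed:
            self.offset = self.initial_offset
        address = register_address + self.offset
        # For subsequent fetches the amount grows by one address separation,
        # so consecutive addresses are fetched, up to the maximum amount.
        if self.offset < self.max_offset:
            self.offset = min(self.offset + self.step, self.max_offset)
        return address


gen = FetchAddressGenerator()
first = gen.next_fetch_address(0x100, flow_changed=True)    # fetch at the new target
second = gen.next_fetch_address(0x100, flow_changed=False)  # one step ahead
third = gen.next_fetch_address(0x100, flow_changed=False)   # offset now at the maximum
```

In this model the offset stays at the maximum once reached, so the fetch unit runs a fixed distance ahead of the register until the next change in program flow.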
20080263326METHOD AND APPARATUS FOR AN EFFICIENT MULTI-PATH TRACE CACHE DESIGN - A novel trace cache design and organization to efficiently store and retrieve multi-path traces. A goal is to design a trace cache, which is capable of storing multi-path traces without significant duplication in the traces. Furthermore, the effective access latency of these traces is reduced.10-23-2008
20100211761DIGITAL SIGNAL PROCESSOR (DSP) WITH VECTOR MATH INSTRUCTION - In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. A vector math instruction decoded by the instruction decode unit causes input vectors and output vectors to be aligned with a maximum boundary of the register set and causes parallel operations by the work units.08-19-2010
20120144163DATA PROCESSING METHOD AND SYSTEM BASED ON PIPELINE - A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage.06-07-2012
20090327657GENERATING AND PERFORMING DEPENDENCY CONTROLLED FLOW COMPRISING MULTIPLE MICRO-OPERATIONS (uops) - A processor to perform an out-of-order (OOO) processing in which a reservation station (RS) may generate and process a dependency controlled flow comprising multiple micro-operations (uops) with specific clock based dispatch scheme. The RS may either combine two or more uops into a single RS entry or make a direct connection between two or more RS entries. The RS may allow more than two source values to be associated with a single RS by combining sources from the two or more uops. One or more execution units may be provisioned to perform the function defined by the uops. The execution units may receive more than two sources at a given time point and produce two or more results on different ports.12-31-2009
20090327658COMPARE, SWAP AND STORE FACILITY WITH NO EXTERNAL SERIALIZATION - A compare, swap and store facility is provided that does not require external serialization. A compare and swap operation is performed using an interlocked update operation. If the comparison indicates equality, a store operation is performed. The compare, swap and store operations are performed as a single unit of operation.12-31-2009
20090254734Partial Load/Store Forward Prediction - In one embodiment, a processor comprises a prediction circuit and another circuit coupled to the prediction circuit. The prediction circuit is configured to predict whether or not a first load instruction will experience a partial store to load forward (PSTLF) event during execution. A PSTLF event occurs if a plurality of bytes, accessed responsive to the first load instruction during execution, include at least a first byte updated responsive to a previous uncommitted store operation and also include at least a second byte not updated responsive to the previous uncommitted store operation. Coupled to receive the first load instruction, the circuit is configured to generate one or more load operations responsive to the first load instruction. The load operations are to be executed in the processor to execute the first load instruction, and a number of the load operations is dependent on the prediction by the prediction circuit.10-08-2009
20120173848PIPELINE FLUSH FOR PROCESSOR THAT MAY EXECUTE INSTRUCTIONS OUT OF ORDER - An embodiment of an instruction pipeline includes first and second sections. The first section is operable to provide first and second ordered instructions, and the second section is operable, in response to the second instruction, to read first data from a data-storage location, is operable, in response to the first instruction, to write second data to the data-storage location after reading the first data, and is operable, in response to the writing of the second data after reading the first data, to cause the flushing of some, but not all, of the pipeline. Such an instruction pipeline may reduce the processing time lost and the energy expended due to a pipeline flush by flushing only a portion of the pipeline instead of flushing the entire pipeline.07-05-2012
20090113179OPERATIONAL PROCESSING APPARATUS, PROCESSOR, PROGRAM CONVERTING APPARATUS AND PROGRAM - The present invention provides an operational processing apparatus which can guarantee a period for executing instructions in the shortest cycle when the operational processing apparatus synchronizes with a hardware accelerator. A processor in the present invention simultaneously issues and executes instructions including instruction groups having a simultaneously issuable instruction. The processor executes a program including a specific instruction. The specific instruction instructs to exclude an instruction subsequent to the specific instruction out of the instruction groups including the specific instruction, and to suspend issuing the instruction subsequent to the specific instruction only during a predetermined period immediately after the specific instruction is issued.04-30-2009
20090113178Microprocessor based on event-processing instruction set and event-processing method using the same - Provided are a microprocessor based on event-processing instruction set and an event-processing method using the same. The microprocessor includes an event register controlling an event according to an event-processing instruction set provided in an instruction set architecture (ISA) and an event controller transmitting externally generated events into the microprocessor. Therefore, the microprocessor may be useful to reduce its unnecessary power consumption by suspending the execution of its program when an instruction decoded to execute the program is an event-processing instruction, and also to cut off its unnecessary power consumption that is caused for an interrupt delay period since the program of the microprocessor may be executed again by immediately re-running the microprocessor with the operation of the event register and the event controller when external events are generated.04-30-2009
20080276069METHOD AND APPARATUS FOR PREDICTIVE DECODING - Predictive decoding is achieved by fetching an instruction, accessing a predictor containing predictor information including prior instruction execution characteristics, obtaining predictor information for the fetched instruction from the predictor; and generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction. The decode operation stream is selected based on the predictor information.11-06-2008
20090106533DATA PROCESSING APPARATUS - The data processing apparatus includes two or more execution resources, each enabling a predetermined process for executing an instruction. The execution resources enable a pipeline process. Each execution resource treats instructions according to an in-order system following the instructions' flow order in case that the execution resource is in charge of the instructions. Also, each execution resource treats instructions according to an out-of-order system regardless of the instructions' flow order in case that the instructions are treated by different execution resources. Thus, local processes in the execution resources can be simplified and materialized in a small-scale of hardware. Consequently, the need for the whole synchronization in processing across execution resources is eliminated, and the locality of processes and the efficiency of electric power are increased.04-23-2009
20090037696PROCESSOR - A processor (02-05-2009
20090070555DEVICE AND METHOD FOR FINDING EXTREME VALUES IN A DATA BLOCK - A method for locating an extreme value data chunk within a data block, the method includes: fetching, by a processor, an instruction; fetching, in response to a content of the instruction, a data unit that comprises multiple data chunks; selectively masking the fetched data chunks in response to a value of a mask; comparing, by a hardware accelerator, between values of valid data chunks to provide an extreme value data chunk; wherein valid data chunks include un-masked data chunks that belong to the data block; updating the value of the mask and jumping to the stage of fetching a new data unit, until the whole data block is fetched.03-12-2009
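The fetch, mask, and compare loop in the abstract above might look roughly like the following in software. It is a minimal sketch that assumes four chunks per data unit, a maximum (rather than minimum) as the extreme value, and a mask derived from the block boundaries; the function name and parameters are hypothetical, and the comparison that the abstract assigns to a hardware accelerator is done here in an ordinary loop.

```python
def find_max_chunk(memory, block_start, block_len, chunks_per_unit=4):
    """Return (value, index) of the largest chunk inside the data block."""
    block_end = block_start + block_len
    best_value, best_index = None, None
    # Data units are assumed to be fetched on unit-aligned boundaries.
    unit_base = (block_start // chunks_per_unit) * chunks_per_unit
    while unit_base < block_end:
        unit = memory[unit_base:unit_base + chunks_per_unit]
        # Mask out chunks of this unit that lie outside the data block.
        mask = [block_start <= unit_base + i < block_end for i in range(len(unit))]
        for i, (chunk, valid) in enumerate(zip(unit, mask)):
            if valid and (best_value is None or chunk > best_value):
                best_value, best_index = chunk, unit_base + i
        unit_base += chunks_per_unit   # move on to the next data unit
    return best_value, best_index


memory = [99, 3, 7, 1, 8, 2, 50, 60]
result = find_max_chunk(memory, block_start=1, block_len=5)
```

The values 99, 50, and 60 are larger than anything in the block but sit outside it, so the mask keeps them out of the comparison.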
20090070554Register File System and Method for Pipelined Processing - The present disclosure includes a multi-threaded processor that includes a first register file associated with a first thread and a second register file associated with a second thread. At least one hardware resource is shared by the first and second register files. In addition, the first thread may have a pipeline access position that is non-sequential to the second thread. A method of accessing a plurality of register files is also disclosed. The method includes reading data from a first register file while concurrently reading data from a second register file. The first register file is associated with a first instruction stream and the second register file is associated with a second instruction stream. The first instruction stream is sequential to the second instruction stream in an execution pipeline of a processor, and the first register file is in a non-adjacent location with respect to the second register file.03-12-2009
20130138924EFFICIENT MICROCODE INSTRUCTION DISPATCH - An apparatus and method for avoiding bubbles and maintaining a maximum instruction throughput rate when cracking microcode instructions. A lookahead pointer scans the newest entries of a dispatch queue for microcode instructions. A detected microcode instruction is conveyed to a microcode engine to be cracked into a sequence of micro-ops. Then, the sequence of micro-ops is placed in a queue, and when the original microcode instruction entry in the dispatch queue is selected for dispatch, the sequence of micro-ops is dispatched to the next stage of the processor pipeline.05-30-2013
20130145123Computing Core Application Access Utilizing Dispersed Storage - A computing core includes a processing module, main memory, and a memory controller. The memory controller receives a request to fetch an instruction from the processing module and determines whether the instruction is currently stored in the main memory. When the instruction is not currently stored in the main memory, the memory controller determines whether the instruction is stored in a distributed storage network (DSN) memory as one or more sets of encoded instruction slices; and, when it is, the memory controller addresses the DSN memory to retrieve the one or more sets of encoded instruction slices. When at least a threshold number of encoded instruction slices are retrieved for each of the one or more sets of encoded instruction slices, the one or more sets of encoded instruction slices are decoded using a dispersed storage error coding function to reconstruct the instruction, which is provided to the processing module.06-06-2013
20110029758CENTRAL PROCESSING UNIT MEASUREMENT FACILITY - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.02-03-2011
20110087862Multiprocessor resource optimization - Embodiments include a device and a method. In an embodiment, a method applies a first resource management strategy to a first resource associated with a first processor and executes an instruction block in a first processor. The method also applies a second resource management strategy to a second resource of a similar type as the first resource and executes the instruction block in a second processor. The method further selects a resource management strategy likely to provide a substantially optimum execution of the instruction group from the first resource management strategy and the second resource management strategy.04-14-2011
20090037695DATA FETCH CIRCUIT AND METHOD THEREOF - A data fetch circuit and a method thereof are provided. A multi-phase clock signal is generated according to an input clock, and an input data is over-sampled according to the multi-phase clock signal in order to detect transition points of the input data. One of reference phases of the multi-phase clock signal is selected according to the detected transition point for fetching the input data and obtaining enough setup/hold time margin. Accordingly, appropriate data fetch points are found without complicated negative feed-back mechanism. Besides, a periodical monitoring mechanism may be further adopted for improving the accuracy of data fetch.02-05-2009
20100037037METHOD FOR INSTRUCTION PIPELINING ON IRREGULAR REGISTER FILES - A method for pipelining instructions on a PAC processor includes determining a minimum initial interval, and grouping the instructions so that the operands of dependent instructions are assigned to the same local register file. The virtual registers of the instructions that have data dependency across the first functional unit and the second functional unit are assigned to a global register file. The instructions are then modulo scheduled based on a current value of initial interval. The virtual registers of the scheduled instructions are allocated to the corresponding register files. If the allocation fails, a set of virtual registers is transferred from the first or second register file to the global register file.02-11-2010
20100037036METHOD TO IMPROVE BRANCH PREDICTION LATENCY - An apparatus to generate a branch prediction of an instruction based at least in part on the address of the previous branch instruction, wherein the previous instruction is prior to the instruction in a program order. The prediction can also be based on a branch history value with respect to the previous branch instruction and one or more previous branch predictions.02-11-2010
20090217002SYSTEM AND METHOD FOR PROVIDING ASYNCHRONOUS DYNAMIC MILLICODE ENTRY PREDICTION - A system and method for asynchronous dynamic millicode entry prediction in a processor are provided. The system includes a branch target buffer (BTB) to hold branch information. The branch information includes: a branch type indicating that the branch represents a millicode entry (mcentry) instruction targeting a millicode subroutine, and an instruction length code (ILC) associated with the mcentry instruction. The system also includes search logic to perform a method. The method includes locating a branch address in the BTB for the mcentry instruction targeting the millicode subroutine, and determining a return address to return from the millicode subroutine as a function of an instruction address of the mcentry instruction and the ILC. The system further includes instruction fetch controls to fetch instructions of the millicode subroutine asynchronous to the search logic. The search logic may also operate asynchronous with respect to an instruction decode unit.08-27-2009
20090217003METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR REDUCING CACHE MEMORY POLLUTION - A method for reducing cache memory pollution including fetching an instruction stream from a cache line, preventing a fetching for the instruction stream from a sequential cache line, searching for a next predicted taken branch instruction, determining whether a length of the instruction stream extends beyond a length of the cache line based on the next predicted taken branch instruction, continuing preventing the fetching for the instruction stream from the sequential cache line if the length of the instruction stream does not extend beyond the length of the cache line, and allowing the fetching for the instruction stream from the sequential cache line if the length of the instruction stream extends beyond the length of the cache line, whereby the fetching from the sequential cache line and a resulting polluting of a cache memory that stores the instruction stream is minimized. A corresponding system and computer program product.08-27-2009
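The decision in the abstract above, whether to allow a fetch from the sequential cache line, can be reduced to one comparison. The sketch below assumes 64-byte cache lines, byte addresses, and that the absence of a predicted taken branch means the stream runs past the line; the abstract does not specify any of these, and the function name is hypothetical.

```python
def allow_sequential_fetch(fetch_address, next_taken_branch_address, line_size=64):
    """Return True if the instruction stream extends beyond the current cache line."""
    # First address of the sequential (next) cache line.
    line_end = (fetch_address // line_size + 1) * line_size
    if next_taken_branch_address is None:
        # Assumption: with no predicted taken branch, the stream continues
        # sequentially, so the next line is needed.
        return True
    # If the next predicted taken branch lies inside the current line, the
    # stream leaves the line at that branch and the sequential fetch stays blocked.
    return next_taken_branch_address >= line_end


blocked = allow_sequential_fetch(0x1000, 0x1020)   # branch is within the line
allowed = allow_sequential_fetch(0x1000, 0x1048)   # branch is in a later line
```

Blocking the sequential fetch in the first case is what keeps an unneeded line out of the cache.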
20080256334Processing System and Method for Executing Instructions - A processing system for executing instructions comprises a first part (10-16-2008
20110252221MICROCOMPUTER AND INTERRUPT CONTROL METHOD - A microcomputer includes: a plurality of register lists having a plurality of register patterns, respectively, wherein each of plurality of register patterns designates registers, data of which are to be saved in a data memory; an instruction fetch control circuit configured to fetch instruction code from an instruction memory in response to an interrupt request issued based on occurrence of an interrupt factor; and a register data saving control circuit configured to acquire one register pattern from one of the plurality of register lists in response to the interrupt request, and issue a microinstruction based on the acquired register pattern in response to the interrupt request. An instruction executing section is configured to execute the microinstruction prior to the fetched instruction code, to save the data of registers designated based on the acquired register pattern in the data memory.10-13-2011
20100125719Instruction Target History Based Register Address Indexing - A circuit arrangement and method support instruction target history based register address indexing, whereby register addresses to be used by an instruction are decoded using a target history table of previous target register addresses, and an index into the target history table supplied by an index value in the instruction. An instruction may include at least one index value that identifies a previously used register address. During execution of the instruction, the index is retrieved from the instruction, and then a register address is retrieved from the target history table using the index.05-20-2010
20120303934METHOD AND APPARATUS FOR GENERATING AN ENHANCED PROCESSOR RESYNC INDICATOR SIGNAL USING HASH FUNCTIONS AND A LOAD TRACKING UNIT - A method and apparatus are described for generating a signal to resync a processor. In one embodiment, a particular load operation is picked from a load queue in the processor, and the particular load operation is completed out of order with respect to other load operations in the load queue. A load ordering block (LOB) in the processor receives a physical address of the completed load operation, and receives a probe data address that indicates an address of a requested data line. The LOB generates a signal to resync the processor when the physical address of the completed load operation matches the probe data address, (i.e., when bits, that have been set in a bit vector (e.g., Bloom filter) of the LOB by hashing the physical address of the completed load operation, match bits generated by hashing the probe data address).11-29-2012
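The hashed bit-vector match in the abstract above behaves like a Bloom filter lookup. The following is a sketch only: the 64-bit vector, the 64-byte line granularity, and the two hash functions are assumptions made for illustration, and the class and method names are hypothetical.

```python
class LoadOrderingBlock:
    """Toy model of the LOB bit vector (sizes and hashes are assumed)."""

    def __init__(self, num_bits=64):
        self.num_bits = num_bits
        self.bit_vector = 0

    def _hash_bits(self, address):
        line = address >> 6                        # assumed 64-byte data lines
        h1 = line % self.num_bits                  # first illustrative hash
        h2 = (line * 31 + 7) % self.num_bits       # second illustrative hash
        return (1 << h1) | (1 << h2)

    def record_completed_load(self, physical_address):
        # Set the bits obtained by hashing the completed load's physical address.
        self.bit_vector |= self._hash_bits(physical_address)

    def probe_requires_resync(self, probe_address):
        # Signal a resync when every bit hashed from the probe address is set.
        bits = self._hash_bits(probe_address)
        return (self.bit_vector & bits) == bits


lob = LoadOrderingBlock()
before = lob.probe_requires_resync(0x1000)
lob.record_completed_load(0x1000)
after = lob.probe_requires_resync(0x1008)    # same 64-byte line as the load
```

As with any Bloom filter, a match can be a false positive, which in this setting would cost an unnecessary resync, but a recorded address is never missed.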
20110072242CONFIGURABLE PROCESSING APPARATUS AND SYSTEM THEREOF - A configurable processing apparatus includes a plurality of processing units, at least an instruction synchronization control circuit, and at least a configuration memory. Each processing apparatus has a stall-output signal generating circuit to output a stall-output signal, wherein the stall-output signal indicates that an unexpected stall has occurred in the processing unit. The processing unit has a stall-in signal, and an external circuit of the processing unit can control whether the processing unit is stalled according to the stall-in signal. The instruction synchronization control circuit generates the stall-in signals to the processing units in response to a content stored in the configuration memory and the stall-output signals of the processing units, so as to determine operation modes and instruction synchronization of the processing units.03-24-2011
20120066478METHOD FOR FAST PARALLEL INSTRUCTION LENGTH DETERMINATION - The present invention provides a method and apparatus that may be used for parallel instruction length decoding. One embodiment of the method includes concurrently determining a plurality of masks identifying bytes in a plurality of candidate instructions. Each mask uses a different byte in a first fetch window as a starting byte and the corresponding one of the plurality of candidate instructions includes the starting byte. This embodiment of the method also includes selecting one of the masks to identify one of the candidate instructions as a first instruction using information indicating an ending byte of a previous instruction.03-15-2012
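The two steps in the abstract above, computing one candidate mask per starting byte and then selecting a mask from the end of the previous instruction, can be sketched as follows. The encoding is a made-up toy (instruction length taken from the low two bits of the first byte), all names are hypothetical, and the per-byte loop stands in for work the abstract describes as concurrent.

```python
def toy_length(first_byte):
    # Hypothetical encoding: the low two bits of the first byte give length - 1.
    return (first_byte & 0x3) + 1


def candidate_masks(window, length_of=toy_length):
    """One byte mask per possible starting byte in the fetch window."""
    masks = []
    for start in range(len(window)):
        length = length_of(window[start])
        end = min(start + length, len(window))   # clip at the window edge
        masks.append(((1 << (end - start)) - 1) << start)
    return masks


def select_first_instruction(masks, prev_end_byte):
    # The first instruction starts right after the previous instruction's
    # ending byte; -1 means the previous instruction ended before this window.
    return masks[prev_end_byte + 1]


window = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x00])
masks = candidate_masks(window)
first_mask = select_first_instruction(masks, prev_end_byte=-1)
```

Because every candidate mask is available before the previous instruction's end is known, the selection step is a simple lookup rather than a serial decode.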
20100262807Partial Flush Handling with Multiple Branches Per Group - Mechanisms are provided for partial flush handling with multiple branches per instruction group. The instruction fetch unit sorts instructions into groups. A group may include a floating branch instruction and a boundary branch instruction. For each group of instructions, the instruction sequencing unit creates an entry in a global completion table (GCT), which may also be referred to herein as a group completion table. The instruction sequencing unit uses the GCT to manage completion of instructions within each outstanding group. Because each group may include up to two branches, the instruction sequencing unit may dispatch instructions beyond the first branch, i.e. the floating branch. Therefore, if the floating branch results in a misprediction, the processor performs a partial flush of that group, as well as a flush of every group younger than that group.10-14-2010
20100058032Effective Use of a BHT in Processor Having Variable Length Instruction Set Execution Modes - In a processor executing instructions in at least a first instruction set execution mode having a first minimum instruction length and a second instruction set execution mode having a smaller, second minimum instruction length, line and counter index addresses are formed that access every counter in a branch history table (BHT), and reduce the number of index address bits that are multiplexed based on the current instruction set execution mode. In one embodiment, counters within a BHT line are arranged and indexed in such a manner that half of the BHT can be powered down for each access in one instruction set execution mode.03-04-2010
20120204004Processor with a Hybrid Instruction Queue - A queuing apparatus having a hierarchy of queues, in one of a number of aspects, is configured to control backpressure between processors in a multiprocessor system. A fetch queue is coupled to an instruction cache and configured to store first instructions for a first processor and second instructions for a second processor in an order fetched from the instruction cache. An in-order queue is coupled to the fetch queue and configured to store the second instructions accepted from the fetch queue in response to a write indication. An out-of-order queue is coupled to the fetch queue and to the in-order queue and configured to store the second instructions accepted from the fetch queue in response to an indication that space is available in the out-of-order queue, wherein the second instructions may be accessed out-of-order with respect to other second instructions executing on different execution pipelines.08-09-2012
20100005276INFORMATION PROCESSING DEVICE AND METHOD OF CONTROLLING INSTRUCTION FETCH - An information processing device includes an instruction fetch unit, an instruction buffer, an instruction executing unit, and an instruction fetch control unit. The instruction fetch unit supplies a fetch address to an instruction memory. The instruction buffer stores an instruction read out from the instruction memory. The instruction executing unit decodes and executes the instruction supplied from the instruction buffer. The instruction fetch control unit stops supply of the fetch address to the instruction memory by the instruction fetch unit when the fetch address corresponds to a first address or an address after the first address while the instruction executing unit executes loop processing. The loop processing is repeatedly executed for a predetermined number of times in accordance with decoding of the loop instruction by the instruction executing unit. The first address is an address after an address of an end instruction included in the loop processing.01-07-2010
20120204005Processor with a Coprocessor having Early Access to Not-Yet Issued Instructions - Apparatus and methods provide early access of instructions. A fetch queue is coupled to an instruction cache and configured to store a mix of processor instructions for a first processor and coprocessor instructions for a second processor. A coprocessor instruction selector is coupled to the fetch queue and configured to copy coprocessor instructions from the fetch queue. A queue is coupled to the coprocessor instruction selector and from which coprocessor instructions are accessed for execution before the coprocessor instruction is issued to the first processor. Execution of the copied coprocessor instruction is started in the coprocessor before the coprocessor instruction is issued to a processor. The execution of the copied coprocessor instruction is completed based on information received from the processor after the coprocessor instruction has been issued to the processor.08-09-2012
20090240918METHOD, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR ELIMINATING OR REDUCING OPERAND LINE CROSSING PENALTY - Eliminating or reducing an operand line crossing penalty by performing an initial fetch for an operand from a data cache of a processor. The initial fetch is performed by allowing or permitting the initial fetch to occur unaligned with reference to a quadword boundary. A plurality of subsequent fetches for a corresponding plurality of operands from the data cache are performed wherein each of the plurality of subsequent fetches is aligned to any of a plurality of quadword boundaries to prevent each of a plurality of individual fetch requests from spanning a plurality of lines in the data cache. A steady stream of data is maintained by placing an operand buffer at an output of the data cache to store and merge data from the initial fetch and the plurality of subsequent fetches, and to return the stored and merged data to the processor.09-24-2009
20110153986PREDICTING AND AVOIDING OPERAND-STORE-COMPARE HAZARDS IN OUT-OF-ORDER MICROPROCESSORS - A method and information processing system manage load and store operations executed out-of-order. At least one of a load instruction and a store instruction is executed. A determination is made that an operand store compare hazard has been encountered. An entry within an operand store compare hazard prediction table is created based on the determination. The entry includes at least an instruction address of the instruction that has been executed and a hazard indicating flag associated with the instruction. The hazard indicating flag indicates that the instruction has encountered the operand store compare hazard. When a load instruction is associated with the hazard indicating flag the load instruction becomes dependent upon all store instructions associated with a substantially similar flag.06-23-2011
20080320281PROCESSING MODULE WITH MMW TRANSCEIVER INTERCONNECTION - A processing module includes a fetch and decode module, an instruction register, a data register, an execution module, and a MMW transceiver section. The fetch and decode module is operable to fetch and decode an instruction of a program and to identify data associated with the instruction. The execution module is operable to execute the instruction upon the data associated with the instruction. The MMW transceiver section is operable to wirelessly receive at least one of the instruction and the data associated with the instruction from memory.12-25-2008
20080229067DATA POINTERS WITH FAST CONTEXT SWITCHING - An apparatus and method are disclosed for multiple data pointer registers and a means for quickly switching active context between the data pointer registers.09-18-2008
20110161630GENERAL PURPOSE HARDWARE TO REPLACE FAULTY CORE COMPONENTS THAT MAY ALSO PROVIDE ADDITIONAL PROCESSOR FUNCTIONALITY - An apparatus and method is described herein for replacing faulty core components. General purpose hardware is provided to replace core pipeline components, such as execution units. In the embodiment of execution unit replacement, a proxy unit is provided, such that mapping logic is able to map instruction/operations, which correspond to faulty execution units, to the proxy unit. As a result, the proxy unit is able to receive the operations, send them to general purpose hardware for execution, and subsequently write-back the execution results to a register file; it essentially replaces the defective execution unit allowing a processor with defective units to be sold or continue operation.06-30-2011
20120151185FINE-GRAINED PRIVILEGE ESCALATION - A processor and a method for privilege escalation in a processor are provided. The method may comprise fetching an instruction from a fetch address, where the instruction requires the processor to be in supervisor mode for execution, and determining whether the fetch address is within a predetermined address range. The instruction is filtered through an instruction mask and then it is determined whether the instruction, after being filtered through the mask, equals the value in an instruction value compare register. The processor privilege is raised to supervisor mode for execution of the instruction in response to the fetch address being within the predetermined address range and the filtered instruction equaling the value in the instruction value compare register, wherein the processor privilege is raised to supervisor mode without use of an interrupt. The processor privilege returns to its previous level after execution of the instruction.06-14-2012
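The escalation check in the entry above combines two tests: the fetch address must lie in a predetermined range, and the masked instruction must equal a compare value. A minimal Python sketch follows; the `EscalationRule` type, its field names, and the example bit patterns are assumptions made for illustration, not details from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    """Hypothetical register set controlling privilege escalation."""
    range_start: int       # first address of the predetermined range
    range_end: int         # last address of the range (inclusive, by assumption)
    instruction_mask: int  # bits of the instruction that take part in the compare
    compare_value: int     # contents of the instruction value compare register


def may_escalate(rule: EscalationRule, fetch_address: int, instruction: int) -> bool:
    """True when both conditions from the abstract hold."""
    in_range = rule.range_start <= fetch_address <= rule.range_end
    matches = (instruction & rule.instruction_mask) == rule.compare_value
    return in_range and matches


# Example rule: only the top opcode byte is compared (illustrative values).
rule = EscalationRule(0x1000, 0x1FFF, 0xFF000000, 0x7C000000)
```

In this model the processor would run the instruction in supervisor mode only when `may_escalate` returns `True`, and then drop back to its previous privilege level.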
20080256333SYSTEM AND METHOD FOR IGNORING FETCH PROTECTION - A system, method, and program product is provided that receives an instruction to fetch data from a data page. The data page is associated with a storage key and a fetch protection bit, and the instruction is pointed to by the program status word (PSW) that includes a PSW key and an ignore fetch protection bit. The data is fetched from the data page when the PSW key is a non-zero value, the PSW key is different than the storage key, and both the fetch protection bit and the ignore fetch protection bit are set ON. However, the data is not fetched from the data page when the PSW key is a non-zero value, the PSW key is different than the storage key, the fetch protection bit is ON, and the ignore fetch protection bit is OFF.10-16-2008
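The two cases spelled out in the entry above can be written as a small predicate. This Python sketch is an assumption-labelled reading of the abstract only: the function name `fetch_allowed` is hypothetical, and it returns `None` for combinations the abstract does not describe rather than guessing their outcome.

```python
from typing import Optional


def fetch_allowed(
    psw_key: int,
    storage_key: int,
    fetch_protection: bool,
    ignore_fetch_protection: bool,
) -> Optional[bool]:
    """Decide whether data is fetched, for the cases the abstract states."""
    if psw_key != 0 and psw_key != storage_key and fetch_protection:
        # Fetch protection is ON and the keys mismatch: the ignore bit decides.
        return ignore_fetch_protection
    # Zero key, matching keys, or fetch protection OFF: not covered by the abstract.
    return None
```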
20080250228Integrated circuit with restricted data access - A semiconductor integrated circuit includes a hardware mechanism arranged to ensure that associations between instructions and data are enforced so that a processor cannot fetch data from an instruction that is not authorised to do so. A Memory Protection Unit stores entries comprising instructions and associated data memory ranges. A hardware arrangement impairs the operation of the circuit if the CPU attempts to make a data fetch from an instruction that is outside the range associated with data in a Memory Protection Unit. Such functioning may be by issuing a chip reset. The Memory Protection Unit may be implemented in a Memory Management Unit having an extension so as to store a validity flag. The validity flag may only be set by a secure process such as the CPU well entrusted code or by a separate trusted hardware source. In this way, an operating system may function as normal referring to the Memory Management Unit as necessary, but security may be enforced through hardware.10-09-2008
20080229068ADAPTIVE FETCH GATING IN MULTITHREADED PROCESSORS, FETCH CONTROL AND METHOD OF CONTROLLING FETCHES - A multithreaded processor, fetch control for a multithreaded processor and a method of fetching in the multithreaded processor. Processor event and use (EU) signals are monitored for downstream pipeline conditions indicating pipeline execution thread states. Instruction cache fetches are skipped for any thread that is incapable of receiving fetched cache contents, e.g., because the thread is full or stalled. Also, consecutive fetches may be selected for the same thread, e.g., on a branch mis-predict. Thus, the processor avoids wasting power on unnecessary or place keeper fetches.09-18-2008
20110179254LIMITING SPECULATIVE INSTRUCTION FETCHING IN A PROCESSOR - The described embodiments relate to a processor that speculatively executes instructions. During operation, the processor often executes instructions in a speculative-execution mode. Upon detecting an impending pipe-clearing event while executing instructions in the speculative-execution mode, the processor stalls an instruction fetch unit to prevent the instruction fetch unit from fetching instructions. In some embodiments, the processor stalls the instruction fetch unit until a condition that originally caused the processor to operate in the speculative-execution mode is resolved. In alternative embodiments, the processor maintains the stall of the instruction fetch unit until the pipe-clearing event has been completed (i.e., has been handled in the processor).07-21-2011
20100293357Method and apparatus for providing platform independent secure domain - A platform independent secure domain providing apparatus, which determines whether an execution environment is to be in a secure domain and a non-secure domain by a secure bit. The apparatus includes a secure monitor that is adapted to generate a branch instruction when a call to a secure code is sensed, turn on the secure bit when the branch instruction has been successfully executed, and turn off the secure bit when the execution of the secure code is finished, an instruction bypass read only memory (ROM) adapted to receive the branch instruction from the secure monitor, and a processor adapted to execute the branch instruction that is fetched from the instruction bypass ROM.11-18-2010
20100312991Microprocessor with Compact Instruction Set Architecture - A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions, and the re-encoded ISA can maintain assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions, including instructions with encoded arguments determined by statistical analysis and instructions that have the effect of combinations of instructions.12-09-2010
20120311304PROCESSOR, COMPUTER PRODUCT, COMPRESSION APPARATUS, AND COMPRESSION METHOD - A processor accesses memory storing a compressed instruction sequence that includes compression information indicating that an instruction that with respect to the preceding instruction, has identical operation code and operand continuity is compressed. The processor includes a fetcher that fetches a bit string from the memory and determines whether the bit string is a non-compressed instruction, where if so, transfers the given bit string and if not, transfers the compression information; and a decoder that upon receiving the non-compressed instruction, holds in a buffer, instruction code and an operand pattern of the non-compressed instruction and executes processing to set to an initial value, the value of an instruction counter that indicates a count of consecutive instructions having identical operation code and operand continuity, and upon receiving the compression information, restores the instruction code based on the instruction code held in the buffer, the instruction counter value, and the operand pattern.12-06-2012
20120151186CONTROLLING SIMULATION OF A MICROPROCESSOR INSTRUCTION FETCH UNIT THROUGH MANIPULATION OF INSTRUCTION ADDRESSES - Instruction fetch unit (IFU) verification is improved by dynamically monitoring the current state of the IFU model and detecting any predetermined states of interest. The instruction address sequence is automatically modified to force a selected address to be fetched next by the IFU model. The instruction address sequence may be modified by inserting one or more new instruction addresses, or by jumping to a non-sequential address in the instruction address sequence. In exemplary implementations, the selected address is a corresponding address for an existing instruction already loaded in the IFU cache, or differs only in a specific field from such an address. The instruction address control is preferably accomplished without violating any rules of the processor architecture by sending a flush signal to the IFU model and overwriting an address register corresponding to a next address to be fetched.06-14-2012
20110264892DATA PROCESSING DEVICE - Provided is a data processing device (10-27-2011
20110307686Method for Instructing a data processor to process data - A data processor which executes instructions described in first and second instruction formats. The first instruction format defines a register-addressing field of a predetermined size, while the second instruction format defines a register-addressing field of a size larger than that of the register-addressing field defined by the first instruction format. The data processor includes: instruction-type identifier, responsive to an instruction, for identifying the received instruction as being described in the first or second instruction format by the instruction itself; a first register file including a plurality of registers; and a second register file also including a plurality of registers, the number of the registers included in the second register file being larger than that of the registers included in the first register file.12-15-2011
20090172355INSTRUCTIONS WITH FLOATING POINT CONTROL OVERRIDE - Methods and apparatus relating to instructions with floating point control override are described. In an embodiment, floating point operation settings indicated by a floating point control register may be overridden on a per instruction basis. Other embodiments are also described.07-02-2009
20110167242Multiple instruction execution mode resource-constrained device - A resource-constrained device comprises a processor configured to execute multiple instruction streams comprising multiple instructions having an opcode and zero or more operands. Each of the multiple instruction streams is associated with one of multiple instruction execution modes having an instruction set comprising multiple instruction implementations. At least one of the multiple instruction implementations is configured to change the processor from a first instruction execution mode to a second instruction execution mode. The processor comprises an instruction fetcher configured to fetch an instruction from one of the multiple instruction streams based at least in part upon a current instruction execution mode.07-07-2011
20080215853System and Method for Line Rate Frame Processing Engine Using a Generic Instruction Set - A system comprises a frame parser and lookup engine operable to receive an incoming data frame, extract control data from payload data in the data frame, and using the control data to access a memory to fetch a plurality of instructions, a destination and tag management module operable to receive the fetched instructions and execute the instructions to transform the data frame control data, and an assemble module operable to assemble the transformed control data and the payload data.09-04-2008
20120023312SUBPROCESSOR, INTEGRATED CIRCUIT DEVICE, AND ELECTRONIC APPARATUS - A subprocessor, an integrated circuit device, and an electronic apparatus or the like capable of performing data processing efficiently are provided. A subprocessor is connected to a host processor through a bus controller. The subprocessor includes: a command fetch unit that fetches a command from a subprocessor program; a register unit; a command decoding unit that decodes the command; and an operation unit that performs command execution processing. The host processor sets a program counter value indicating a storage destination of the subprocessor program and a processing start command for the processing of the subprocessor to the register unit. The command fetch unit fetches a command designated by the program counter value, the command decoding unit decodes the command, and the operation unit performs command execution processing.01-26-2012
20120023311PROCESSOR APPARATUS AND MULTITHREAD PROCESSOR APPARATUS - A processor apparatus according to the present invention is a processor apparatus which shares hardware resources between a plurality of processors, and includes: a first determination unit which determines whether or not a register in each of the hardware resources holds extension context data of a program that is currently executed; a second determination unit which determines to which processor the extension context data in the hardware resource corresponds; a first transfer unit which saves and restores the extension context data between programs in the processor; and a second transfer unit which saves and restores the extension context data between programs between different processors.01-26-2012
20120159124METHOD AND SYSTEM FOR COMPUTATIONAL ACCELERATION OF SEISMIC DATA PROCESSING - A computer-implemented method and a system for computational acceleration of seismic data processing are described. The method includes defining a specific non-uniform memory access (NUMA) scheduling for a plurality of cores in a processor according to data to be processed; and running two or more threads through each of the plurality of cores.06-21-2012
20120159125EFFICIENCY OF SHORT LOOP INSTRUCTION FETCH - A method, system and computer program product for instruction fetching within a processor instruction unit, utilizing a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. During instruction fetch, modified instruction buffers coupled to an instruction cache (I-cache) temporarily store instructions from a single branch, backwards short loop. The modified instruction buffers may be a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. Instructions are stored in the modified instruction buffers for the length of the loop cycle. The instruction fetch within the instruction unit of a processor retrieves the instructions for the short loop from the modified buffers during the loop cycle, rather than from the instruction cache.06-21-2012
20120072701Method Macro Expander - One embodiment of the present invention sets forth a [TODO once claims are reviewed]03-22-2012
20120072700MULTI-LEVEL REGISTER FILE SUPPORTING MULTIPLE THREADS - A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.03-22-2012
20110066827MULTIPROCESSOR - A multiprocessor of a single processor, including a pipeline processing unit which successively fetches an instruction sequence to be independently processed on each of the multiprocessor with a shifted phase in one cycle.03-17-2011
20110107062Interrupt Handling - Techniques for handling interrupts of multiple instruction threads within a multi-thread processing environment. The techniques include: interleavingly fetching and issuing instructions of (i) a first instruction execution thread and (ii) a second instruction thread for execution by an execution block of the multi-thread processing environment; providing a first interrupt signal via a first interrupt signal line within the multi-thread processing environment to interrupt fetching and issuing of instructions of the first instruction execution thread; and providing a second interrupt signal via a second interrupt signal line within the multi-thread processing environment to interrupt fetching and issuing of instructions of the second instruction execution thread. The first interrupt signal line and the second interrupt signal line are physically separate and distinct signal lines that are not directly electrically coupled to one another.05-05-2011
20120124335System Core for Transferring Data Between an External Device and Memory - Details of a highly cost effective and efficient implementation of a manifold array (ManArray) architecture and instruction syntax for use therewith are described herein. Various aspects of this approach include the regularity of the syntax, the relative ease with which the instruction set can be represented in database form, the ready ability with which tools can be created, the ready generation of self-checking codes and parameterized test cases. Parameterizations can be fairly easily mapped and system maintenance is significantly simplified.05-17-2012
20090132790SYSTEM AND METHOD FOR PROCESSOR WITH PREDICTIVE MEMORY RETRIEVAL ASSIST - A system and method are described for a memory management processor which, using a table of reference addresses embedded in the object code, can open the appropriate memory pages to expedite the retrieval of information from memory referenced by instructions in the execution pipeline. A suitable compiler parses the source code and collects references to branch addresses, calls to other routines, or data references, and creates reference tables listing the addresses for these references at the beginning of each routine. These tables are received by the memory management processor as the instructions of the routine are beginning to be loaded into the execution pipeline, so that the memory management processor can begin opening memory pages where the referenced information is stored. Opening the memory pages where the referenced information is located before the instructions reach the instruction processor helps lessen memory latency delays which can greatly impede processing performance.05-21-2009
20120166767SYSTEM, APPARATUS, AND METHOD FOR SEGMENT REGISTER READ AND WRITE REGARDLESS OF PRIVILEGE LEVEL - Embodiments of systems, apparatuses, and methods for performing privilege agnostic segment base register read or write instruction are described. An exemplary method may include fetching the privilege agnostic segment base register write instruction, wherein the privilege agnostic write instruction includes a 64-bit data source operand, decoding the fetched privilege agnostic segment base register write instruction, and executing the decoded privilege agnostic segment base register write instruction to write the 64-bit data of the source operand into the segment base register identified by the opcode of the privilege agnostic segment base register write instruction.06-28-2012
20120216020INSTRUCTION SUPPORT FOR PERFORMING STREAM CIPHER - Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2′, and to perform a modular addition using a value R2 to produce a value R1'.08-23-2012
20100228952APPARATUS AND METHOD FOR FAST CORRECT RESOLUTION OF CALL AND RETURN INSTRUCTIONS USING MULTIPLE CALL/RETURN STACKS IN THE PRESENCE OF SPECULATIVE CONDITIONAL INSTRUCTION EXECUTION IN A PIPELINED MICROPROCESSOR - A microprocessor having a plurality of call/return stacks (CRS) correctly resolves a call or return instruction rather than issuing the instruction to execution units of the microprocessor to be resolved. The microprocessor fetches a call or return instruction and determines whether a state exists in which the instruction is the first call or return instruction fetched after fetching a conditional branch instruction that has yet to be resolved. The microprocessor copies the contents of a current CRS to another CRS and designates the other CRS as the current CRS, if the state exists. The microprocessor pushes the address of the next sequential instruction following the call instruction onto the current CRS and fetches an instruction at the call instruction target address if the instruction is a call instruction. The microprocessor pops a second return address from the current CRS and fetches an instruction at the second return address, if the instruction is a return instruction.09-09-2010
20100205405STATIC BRANCH PREDICTION METHOD AND CODE EXECUTION METHOD FOR PIPELINE PROCESSOR, AND CODE COMPILING METHOD FOR STATIC BRANCH PREDICTION - A static branch prediction method and code execution method for a pipeline processor, and a code compiling method for static branch prediction, are provided herein. The static branch prediction method includes predicting a conditional branch code as taken or not-taken, adding the prediction information, converting the conditional branch code into a jump target address setting (JTS) code including target address information, branch time information, and a test code, and scheduling codes in a block. The test code may be scheduled into a last slot of the block, and the JTS code may be scheduled into an empty slot after all the other codes in the block are scheduled. When the conditional branch code is predicted as taken in the prediction operation, a target address indicated by the target address information may be fetched at a cycle time indicated by the branch time information.08-12-2010
20100205404PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC MICROCODE-IMPLEMENTED INSTRUCTION STATE - A microprocessor includes a memory that stores instructions of a non-user program to implement a user program instruction of the user-visible instruction set of the microprocessor. The non-user program includes a conditional branch instruction. A first fetch unit fetches instructions of the user program that includes the instruction that is implemented by the non-user program. An instruction decoder decodes the user program instructions and saves a state in response to decoding the user program instruction that is implemented by the non-user program. An execution unit executes the user program instructions fetched by the first fetch unit and executes instructions of the non-user program other than the conditional branch instruction. A second fetch unit fetches the non-user program instructions from the memory and resolves the conditional branch instruction based on the saved state without sending the conditional branch instruction to the execution unit to resolve the conditional branch instruction.08-12-2010
20100205403PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC EXCEPTION STATE - A microprocessor includes a memory that stores an exception handler to handle an exception condition. The exception handler is a non-user program private to the microprocessor and includes a conditional branch instruction. A first fetch unit fetches instructions of a user program that includes a user program instruction that causes the exception condition. An execution unit executes the user program instructions fetched by the first fetch unit and executes instructions of the exception handler. The execution unit also saves a state in response to detecting the exception condition caused by the user program instruction. A second fetch unit fetches the exception handler instructions from the memory and resolves the conditional branch instruction based on the saved state without sending the conditional branch instruction to the execution unit to resolve the conditional branch instruction.08-12-2010
20100205402PIPELINED MICROPROCESSOR WITH NORMAL AND FAST CONDITIONAL BRANCH INSTRUCTIONS - A microprocessor includes a first branch condition state and a second branch condition state. The microprocessor also includes a conditional branch instruction of a first type that instructs the microprocessor to wait to correctly resolve the conditional branch instruction of the first type based on the first branch condition state until other instructions within the microprocessor that update the first branch condition state and that are older than the conditional branch instruction of the first type have updated the first branch condition state. A conditional branch instruction of a second type instructs the microprocessor to correctly resolve the conditional branch instruction of the second type based on the second branch condition state without regard to whether other instructions within the microprocessor that update the second branch condition state and that are older than the conditional branch instruction of the second type have yet updated the second branch condition state.08-12-2010
20100205401PIPELINED MICROPROCESSOR WITH FAST NON-SELECTIVE CORRECT CONDITIONAL BRANCH INSTRUCTION RESOLUTION - A microprocessor includes a register that stores a state and a fetch unit that fetches instructions of a program. The program includes a first instruction followed non-immediately by a second instruction. The first instruction instructs the microprocessor to update the state in the register. The second instruction is a conditional branch instruction that specifies a branch condition based on the register state. The fetch unit dispatches the first instruction for execution but refrains from dispatching the second instruction for execution. Execution units receive the first instruction from the fetch unit and responsively update the register state. The fetch unit non-selectively correctly resolves the conditional branch instruction based on the register state when the execution units have updated the register state. The fetch unit also non-selectively refrains from sending the conditional branch instruction to the execution units to be resolved regardless of whether the execution units have updated the register state.08-12-2010
20100205400EXECUTING ROUTINES BETWEEN AN EMULATED OPERATING SYSTEM AND A HOST OPERATING SYSTEM - Approaches for emulating an operating system. A method includes executing a first operating system (OS) on an instruction processor. The first OS includes instructions of a first instruction set that are native to the instruction processor. A second OS is emulated on the first OS and includes instructions of a second instruction set that are not native to the instruction processor. An emulated transfer-of-control instruction is determined during emulation of the second OS to target either instructions of the first set or the second set. In response to determining that instructions of the first set are targeted, control is transferred to the targeted instructions of the first set on the instruction processor. In response to determining that instructions of the second set are targeted, the targeted instructions of the second set are retrieved and emulated.08-12-2010
20100174888Memory System - A memory system includes a storage device storing a plurality of instructions and a central processing unit processing an instruction fetched from the storage device, wherein the central processing unit detects a change in the instruction fetched from the storage device while processing the instruction.07-08-2010
20120254593SYSTEMS, APPARATUSES, AND METHODS FOR JUMPS USING A MASK REGISTER - Embodiments of systems, apparatuses, and methods for performing a jump instruction in a computer processor are described. In some embodiments, the execution of a jump instruction causes a conditional jump to an address of a target instruction when all bits of a writemask are zero, wherein the address of the target instruction is calculated using an instruction pointer of the instruction and a relative offset.10-04-2012
20120254590SYSTEM AND TECHNIQUE FOR A PROCESSOR HAVING A STAGED EXECUTION PIPELINE - A technique includes receiving a request from a processor to retrieve a first instruction from a memory for a staged execution pipeline. The technique includes selectively retrieving the first instruction from the memory in response to the request based on a determination of whether the processor will execute the first instruction.10-04-2012
20110208949HARDWARE THREAD DISABLE WITH STATUS INDICATING SAFE SHARED RESOURCE CONDITION - A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicate whether any of the instructions pending in the pipeline impact the shared resources and, if not, the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.08-25-2011
20100049946PROCESSOR, COMPUTER READABLE RECORDING MEDIUM, AND STORAGE DEVICE - A processor includes: a first storage part that stores instructions of a program including sets of instruction groups, which sets are hierarchically structured; a second storage part that stores an address value of the first storage part in which an instruction to be read next is stored; a third storage part that includes storage areas respectively corresponding to hierarchical levels of the program; and a control part that executes, when an instruction read from the first storage part is a call instruction that calls a different one of the sets of instruction groups, a control to store the address value in the second storage part in one of the storage areas of the third storage part that corresponds to one of the hierarchical levels with which the different one of the sets of instruction groups being executed is associated.02-25-2010
20090113180Fetch Director Employing Barrel-Incrementer-Based Round-Robin Apparatus For Use In Multithreading Microprocessor - A fetch director in a multithreaded microprocessor that concurrently executes instructions of N threads is disclosed. The N threads request to fetch instructions from an instruction cache. In a given selection cycle, some of the threads may not be requesting to fetch instructions. The fetch director includes a circuit for selecting one of the threads in a round-robin fashion to provide its fetch address to the instruction cache. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the threads is selected next. The first addend is an N-bit vector where each bit is false if the corresponding thread is requesting to fetch instructions from the instruction cache. The second addend is a 1-hot vector indicating the last selected thread. In one embodiment, threads with an empty instruction buffer are selected at highest priority; a last dispatched but not fetched thread at middle priority; all other threads at lowest priority. The threads are selected round-robin within the highest and lowest priorities.04-30-2009
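The selection arithmetic in the entry above can be tried out in software. The Python sketch below is one reading of the abstract, under two stated assumptions: "1-bit left rotatively" is taken to mean that the last-selected 1-hot vector is rotated left by one position before the add, and "rotatively increments" is modelled as an add whose carry out of bit N-1 wraps around into bit 0. The function names are hypothetical, and the priority levels mentioned in the abstract are not modelled.

```python
def rotl1(value: int, n: int) -> int:
    """Rotate an n-bit value left by one bit position."""
    mask = (1 << n) - 1
    return ((value << 1) | (value >> (n - 1))) & mask


def select_next(requesting: int, last_selected: int, n: int) -> int:
    """Return a 1-hot vector for the next thread, or 0 if none is requesting.

    requesting:    bit i is 1 when thread i wants to fetch.
    last_selected: 1-hot vector marking the thread chosen last time.
    """
    mask = (1 << n) - 1
    first_addend = ~requesting & mask        # bit is false if the thread requests
    second_addend = rotl1(last_selected, n)  # start just past the last winner
    total = first_addend + second_addend
    if total >> n:                           # carry out of the top bit...
        total = (total & mask) + 1           # ...wraps around to bit 0 (assumption)
    total &= mask
    # Keep only the bit that landed on a requesting thread.
    return total & ~first_addend & mask
```

With four threads, threads 0 and 2 requesting, and thread 0 selected last, the carry ripples through the non-requesting thread 1 and stops at thread 2, so `select_next(0b0101, 0b0001, 4)` gives `0b0100`.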
20120084534SYSTEM AND METHOD FOR FAST BRANCHING USING A PROGRAMMABLE BRANCH TABLE - Methods and systems consistent with the present invention provide a programmable table which allows software to define a plurality of branching functions, each of which maps a vector of condition codes to a branch offset. This technique allows for a flexible multi-way branching functionality, using a conditional branch outcome table that can be specified by a programmer. Any instruction can specify the evaluation of arbitrary conditional expressions to compute the values for the condition codes, and can choose a particular branching function. When the processor executes the instruction, the processor's arithmetic/logical functional units evaluate the conditional expressions and then the processor performs the branch operation, according to the specified branching function.04-05-2012
20120191947COMPUTER OPERATION CONTROL METHOD, PROGRAM AND SYSTEM - A computer implemented control method, article of manufacture, and computer implemented system for determining whether stack allocation is possible. The method includes: allocating an object created by a method frame to a stack. The allocation is performed in response to: calling a first and second instruction in the method frame; the first instruction causes an escape of the object, and the second instruction cancels the escape of the object; the object does not escape to a thread other than a thread to which the object has escaped, at the point in time when the escape is cancelled; the first instruction has been called before the second instruction is called; and the object does not escape in accordance with an instruction other than the first instruction in the method frame, regardless of whether the object escapes in accordance with the first instruction.07-26-2012
20110004742Variable-Cycle, Event-Driven Multi-Execution Flash Processor - A Multi-Execution Flash Processor core performs operations associated with accessing non-volatile semiconductor based memory units. Execution units included in the core can execute instructions requiring different numbers of clock cycles to complete by generating an event control signal in response to completing an instruction. The core can be used in a controller to access and control external memory units. Data memory access operations include using an instruction decoder to select one or more execution units to perform an operation associated with the instruction, and generating an event control signal upon completion of the operation. In some cases, executing the instruction includes selecting a second execution unit.01-06-2011
20110131394APPARATUS AND METHOD FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored.06-02-2011
20120239908DUAL THREAD PROCESSOR - Pipeline processor architectures, processors, and methods are provided. A described processor includes thread allocation counters for corresponding processor threads. For example, a first counter is configured to store a first processor time allocation that controls first periods of processor time for a first processor thread, the first processor thread retaining control of the processor during each of the first periods of processor time. The processor causes data associated with the first processor thread to pass through the processor's pipeline during the first periods of processor time. A second counter is similarly configured. The processor can be configured to receive an input defining processor time to be allocated to one or more processor threads and to use the input to change one or more of the counters such that subsequent periods of processor times for the one or more processor threads are affected.09-20-2012
20120272044PROCESSOR FOR EXECUTING HIGHLY EFFICIENT VLIW - A 32-bit instruction 10-25-2012
20110238952INSTRUCTION FETCH APPARATUS, PROCESSOR AND PROGRAM COUNTER ADDITION CONTROL METHOD - An instruction fetch apparatus is disclosed which includes: a program counter configured to manage the address of an instruction targeted to be executed in a program in which instructions belonging to a plurality of instruction sequences are placed sequentially; a change designation register configured to designate a change of an increment value on the program counter; an increment value register configured to hold the changed increment value; and an addition control section configured such that if the change designation register designates the change of the increment value on the program counter, then the addition control section increments the program counter based on the changed increment value held in the increment value register, the addition control section further incrementing the program counter by an instruction word length if the change designation register does not designate any change of the increment value on the program counter.09-29-2011
20120089816QUERY SAMPLING INFORMATION INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.04-12-2012
20110276785BYTE CODE CONVERSION ACCELERATION DEVICE AND A METHOD FOR THE SAME - Provided is a bytecode conversion acceleration device and a method for the same: allowing a reduction in the size of a storage unit for a look-up table including a decoding table, a link table and a native code table; increasing the number of bytecodes that can be processed by hardware by using the look-up table to thereby enhance the overall performance of a virtual machine; and allowing an execution portion to immediately execute the first native code to thereby enhance performance of the virtual machine.11-10-2011
20120096239Low Power Execution of a Multithreaded Program - Technologies for low power execution of one or more threads of a multithreaded program by one or more processing elements are generally disclosed.04-19-2012
20110296142PROCESSOR AND METHOD PROVIDING INSTRUCTION SUPPORT FOR INSTRUCTIONS THAT UTILIZE MULTIPLE REGISTER WINDOWS - A processor including instruction support for large-operand instructions that use multiple register windows may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may also include an instruction execution unit that, during operation, receives instructions for execution from the instruction fetch unit and executes a large-operand instruction defined within the ISA, where execution of the large-operand instruction is dependent upon a plurality of registers arranged within a plurality of register windows. The processor may further include control circuitry (which may be included within the fetch unit, the execution unit, or elsewhere within the processor) that determines whether one or more of the register windows depended upon by the large-operand instruction are not present. In response to determining that one or more of these register windows are not present, the control circuitry causes them to be restored.12-01-2011
20110320775ACCELERATING EXECUTION OF COMPRESSED CODE - Methods and apparatus relating to accelerating execution of compressed code are described. In one embodiment, a two-level embedded code decompression scheme is utilized which eliminates bubbles, which may increase speed and/or reduce power consumption. Other embodiments are also described and claimed.12-29-2011
20110320774OPERAND FETCHING CONTROL AS A FUNCTION OF BRANCH CONFIDENCE - A system for data operand fetching control includes a computer processor that includes a control unit for determining memory access operations. The control unit is configured to perform a method. The method includes calculating a summation weight value for each instruction in a pipeline, the summation weight value calculated as a function of branch uncertainty and a pendency in which the instruction resides in the pipeline relative to other instructions in the pipeline. The method also includes mapping the summation weight value of a selected instruction that is attempting to access system memory to a memory access control, each memory access control specifying a manner of handling data fetching operations. The method further includes performing a memory access operation for the selected instruction based upon the mapping.12-29-2011
20110320773FUNCTION VIRTUALIZATION FACILITY FOR BLOCKING INSTRUCTION FUNCTION OF A MULTI-FUNCTION INSTRUCTION OF A VIRTUAL PROCESSOR - In a processor supporting execution of a plurality of functions of an instruction, an instruction blocking value is set for blocking one or more of the plurality of functions, such that an attempt to execute one of the blocked functions, will result in a program exception and the instruction will not execute, however the same instruction will be able to execute any of the functions that are not blocked functions.12-29-2011
20110320772CONTROLLING THE SELECTIVELY SETTING OF OPERATIONAL PARAMETERS FOR AN ADAPTER - An instruction is provided to establish various operational parameters for an adapter. These parameters include adapter interruption parameters, input/output address translation parameters, resetting error indications, setting measurement parameters, and setting an interception control, as examples. The instruction specifies a function information block, which is a program representation of a device table entry used by the adapter, to be used in certain situations in establishing the parameters. A store instruction is also provided that stores the current contents of the function information block.12-29-2011
20120290817BRANCH TARGET STORAGE AND RETRIEVAL IN AN OUT-OF-ORDER PROCESSOR - A processor configured to facilitate transfer and storage of predicted targets for control transfer instructions (CTIs). In certain embodiments, the processor may be multithreaded and support storage of predicted targets for multiple threads. In some embodiments, a CTI branch target may be stored by one element of a processor and a tag may indicate the location of the stored target. The tag may be associated with the CTI rather than associating the complete target address with the CTI. When the CTI reaches an execution stage of the processor, the tag may be used to retrieve the predicted target address. In some embodiments using a tag to retrieve a predicted target, CTI instructions from different processor threads may be interleaved without affecting retrieval of predicted targets.11-15-2012
20130013895BYTE-ORIENTED MICROCONTROLLER HAVING WIDER PROGRAM MEMORY BUS SUPPORTING MACRO INSTRUCTION EXECUTION, ACCESSING RETURN ADDRESS IN ONE CLOCK CYCLE, STORAGE ACCESSING OPERATION VIA POINTER COMBINATION, AND INCREASED POINTER ADJUSTMENT AMOUNT - An exemplary byte-oriented microcontroller includes a program memory, a program memory bus, and a core circuit. The program memory bus has a bus width wider than one instruction byte, and the core circuit is coupled to the program memory through the program memory bus for executing at least one instruction by processing a plurality of instruction bytes fetched from the program memory. The core circuit includes a fetch unit, for fetching the instruction bytes through the program memory bus and re-ordering the fetched instruction bytes to form a complete instruction.01-10-2013
20130013896LOAD/MOVE AND DUPLICATE INSTRUCTIONS FOR A PROCESSOR - A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicating that first portion of bits in a subsequent portion of the destination register.01-10-2013
20130024661HARDWARE ACCELERATION COMPONENTS FOR TRANSLATING GUEST INSTRUCTIONS TO NATIVE INSTRUCTIONS - A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.01-24-2013
20080256335MICROPROCESSOR, MICROCOMPUTER, AND ELECTRONIC INSTRUMENT - A microprocessor includes a pipeline control section which controls a pipeline process. The pipeline control section decodes an instruction code of an interrupt instruction and causes an immediate generation section to generate a vector address used for referring to information relating to a branch destination address corresponding to the interrupt instruction stored in a vector table based on a decoding result in a first instruction execution stage of the interrupt instruction. The pipeline control section controls the pipeline process so that the vector address is set in a pipeline register 10-16-2008
20130173885Processor and Methods of Adjusting a Branch Misprediction Recovery Mode - A processor core includes a fetch control unit for fetching instructions and placing the instructions into an instruction queue and includes a branch predictor for controlling the fetch control unit to speculatively fetch at least one instruction subsequent to an unresolved branch instruction. The processor further includes a controller configured to dispatch instructions from the instruction queue and, in response to a branch misprediction of an unresolved control instruction, to apply a selected one of a checkpointing-based recovery mode and a commit-time-based recovery mode.07-04-2013
20110264893DATA PROCESSOR AND IC CARD - The data processor includes: a memory device for storing a program compiled by a compiler; and CPU operable to fetch an instruction code included by a program stored in the memory device. Further, the data processor has a filter for judging an instruction code which the compiler never outputs to limit, in action, CPU in case that CPU fetches the instruction code, which limits, in action, CPU in the case where the program is rewritten by not only an undefined instruction, but also an instruction other than an undefined instruction. The level of security is increased by limiting, in action, CPU.10-27-2011
20080222393Method and arrangements for pipeline processing of instructions - In one embodiment a method for operating a processing pipeline is disclosed. The method can include fetching an instruction in a first clock cycle, decoding the instruction in a second clock cycle and fetching an instruction data associated with the instruction in the second clock cycle. The method can also include associating the instruction data with the instruction and feeding the instruction and the instruction data to a processing unit utilizing the association. The method can also include loading a register with instruction data wherein the number of bits of instruction data loaded per clock cycle varies based on the amount of instruction data required to execute at least one instruction in a clock cycle.09-11-2008
20080222392Method and arrangements for pipeline processing of instructions - In one embodiment a method for parallel processing in a processing pipeline is disclosed. The method can include determining that a jump instruction is loaded in a main path of a processing pipeline prior to the jump instruction being executed. The method can load a jump hit target instruction in a bypass path of the pipeline in response to determining that the jump instruction is loaded in the main path. The bypass path can bypass at least one stage of the processing pipeline and couple into the main path in a stage that is prior to the execute stage. The method can switch the jump hit target instruction into the main path in response to a successful jump-hit condition. The bypass path and the main path can operate concurrently and in parallel.09-11-2008
20120254594Hardware Assist Thread for Increasing Code Parallelism - Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread.10-04-2012
20120254592SYSTEMS, APPARATUSES, AND METHODS FOR EXPANDING A MEMORY SOURCE INTO A DESTINATION REGISTER AND COMPRESSING A SOURCE REGISTER INTO A DESTINATION MEMORY LOCATION - Embodiments of systems, apparatuses, and methods for performing an expand and/or compress instruction in a computer processor are described. In some embodiments, the execution of an expand instruction causes the selection of elements from a source that are to be sparsely stored in a destination based on values of the writemask and store each selected data element of the source as a sparse data element into a destination location, wherein the destination locations correspond to each writemask bit position that indicates that the corresponding data element of the source is to be stored.10-04-2012
20120254591SYSTEMS, APPARATUSES, AND METHODS FOR STRIDE PATTERN GATHERING OF DATA ELEMENTS AND STRIDE PATTERN SCATTERING OF DATA ELEMENTS - Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instructions in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes conditional storage of strided data elements from memory into the destination register according to at least some of the bit values of a writemask.10-04-2012
20130124827INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.05-16-2013
20110276784HIERARCHICAL MULTITHREADED PROCESSING - In one embodiment, a current candidate thread is selected from each of multiple first groups of threads using a low granularity selection scheme, where each of the first groups includes multiple threads and first groups are mutually exclusive. A second group of threads is formed comprising the current candidate thread selected from each of the first groups of threads. A current winning thread is selected from the second group of threads using a high granularity selection scheme. An instruction is fetched from a memory based on a fetch address for a next instruction of the current winning thread. The instruction is then dispatched to one of the execution units for execution, whereby execution stalls of the execution units are reduced by fetching instructions based on the low granularity and high granularity selection schemes.11-10-2011
20130179661Performing A Multiply-Multiply-Accumulate Instruction - In one embodiment, the present invention includes a processor having multiple execution units, at least one of which includes a circuit having a multiply-accumulate (MAC) unit including multiple multipliers and adders, and to execute a user-level multiply-multiply-accumulate instruction to populate a destination storage with a plurality of elements each corresponding to an absolute value for a pixel of a pixel block. Other embodiments are described and claimed.07-11-2013
20120284488Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 11-08-2012
20130185541BITSTREAM BUFFER MANIPULATION WITH A SIMD MERGE INSTRUCTION - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.07-18-2013
20130124826Optimizing System Throughput By Automatically Altering Thread Co-Execution Based On Operating System Directives - A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads.05-16-2013
20110314260HIGH-WORD FACILITY FOR EXTENDING THE NUMBER OF GENERAL PURPOSE REGISTERS AVAILABLE TO INSTRUCTIONS - A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in a Large GPR mode, access the full GPR, however programs such as Applications operating in Small GPR mode, only have access to a portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed.12-22-2011
20130198490SYSTEMS AND METHODS FOR REDUCING BRANCH MISPREDICTION PENALTY - In a processing system capable of single and multi-thread execution, a branch prediction unit can be configured to detect hard to predict branches and loop instructions. In a dual-threading (simultaneous multi-threading) configuration, one instruction queue (IQ) is used for each thread and instructions are alternately sent from each IQ to decode units. In single thread mode, the second IQ can be used to store the “not predicted path” of the hard-to-predict branch or the “fall-through” path of the loop. On mis-prediction, the mis-prediction penalty is reduced by getting the instructions from IQ instead of instruction cache.08-01-2013
20120066479METHODS AND APPARATUS FOR HANDLING SWITCHING AMONG THREADS WITHIN A MULTITHREAD PROCESSOR - A system, apparatus and method for handling switching among threads within a multithread processor are described herein. Embodiments of the present invention provide a method for multithread handling that includes fetching and issuing one or more instructions, corresponding to a first instruction execution thread, to an execution block for execution during a cycle count associated with the first instruction execution thread and when the instruction execution thread is in an active mode. The method further includes switching a second instruction execution thread to the active mode when the cycle count corresponding to the first instruction execution thread is complete, and fetching and issuing one or more instructions, corresponding to the second instruction execution thread, to the execution block for execution during a cycle count associated with the second instruction execution thread. The method additionally includes resetting the cycle counts when a master instruction execution thread is in the active mode.03-15-2012
20130205116MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY - A computer-implemented method for instruction execution in a pipeline, includes fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions, for each of the plurality of branch instructions, assigning a branch uncertainty to each of the plurality of branch instructions, for each of the plurality of instructions, assigning an instruction uncertainty that is a summation of branch uncertainties of older unresolved branches, and balancing the instructions, based on a current summation of instruction uncertainty, in the pipeline.08-08-2013
20130205117MFENCE AND LFENCE MICRO-ARCHITECTURAL IMPLEMENTATION METHOD AND SYSTEM - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.08-08-2013
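
Illustrative note on entry 20090113180 (barrel-incrementer round-robin): the selection arithmetic in that abstract can be sketched in software as below. This is only a sketch under stated assumptions, not the patented circuit: the function names (rotl1, select_next_thread), the use of integers as bit vectors with bit i standing for thread i, and the reading of "rotatively increments" as an addition whose carry out of the top bit wraps around to bit 0 are all assumptions made for illustration.

```python
def rotl1(x, n):
    """Rotate the n-bit value x left by one bit position."""
    mask = (1 << n) - 1
    return ((x << 1) | (x >> (n - 1))) & mask


def select_next_thread(requesting, last_selected, n):
    """Return a 1-hot vector naming the next thread in round-robin order.

    requesting    -- n-bit vector, bit i set if thread i wants to fetch
    last_selected -- 1-hot n-bit vector marking the last selected thread
    Returns 0 when no thread is requesting.
    """
    mask = (1 << n) - 1
    # First addend: a bit is false (0) where the thread IS requesting.
    first_addend = ~requesting & mask
    # Second addend: last-selected vector, moved one position to the left.
    second_addend = rotl1(last_selected, n)
    total = first_addend + second_addend
    # Assumed "rotative" behaviour: carry out of bit n-1 re-enters at bit 0.
    total = (total & mask) + (total >> n)
    # AND with the inverse of the first addend, i.e. the requesting vector.
    return total & requesting


# Threads 0 and 2 are requesting; thread 0 was selected last.
print(bin(select_next_thread(0b0101, 0b0001, 4)))
```

The carry ripples through the non-requesting positions (ones in the first addend) and stops at the first requesting position after the last selected thread, which is why the final AND leaves a single bit set.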
