Class / Patent application number | Description | Number of patent applications / Date published |
712206000 | Of multiple instructions simultaneously | 62 |
20080244230 | SCALABLE PROCESSING ARCHITECTURE - A computation node according to various embodiments of the invention includes at least one input port capable of being coupled to at least one first other | 10-02-2008 |
20080270758 | Multiple thread instruction fetch from different cache levels - A data processing apparatus is provided wherein processing circuitry executes multiple program threads including at least one high priority thread and at least one lower priority thread. Instructions required by the threads are retrieved from a cache memory hierarchy comprising multiple cache levels. The cache memory hierarchy includes a bypass path for omitting a predetermined level of the cache memory hierarchy when performing a lookup procedure for a required instruction and for bypassing said predetermined level of the cache memory hierarchy when returning said required instruction to said processing circuitry. The bypass path is used by default when the requested instruction is for a lower priority thread. | 10-30-2008 |
20080276072 | System and Method for using a Local Condition Code Register for Accelerating Conditional Instruction Execution in a Pipeline Processor - A method of executing a conditional instruction within a pipeline processor having a plurality of pipelines, the processor having a first condition code register associated with a first pipeline and a second condition code register associated with a second pipeline is disclosed. The method saves a most recent condition code value to either the first condition code register or the second condition code register. The method further sets an indicator indicating whether the second condition code register has the most recent condition code value and retrieves the most recent condition code value from either the first or second condition code register based on the indicator. The method uses the most recent condition code value to determine if the conditional instruction should be executed. | 11-06-2008 |
20080301409 | SCHEDULING THREADS IN A PROCESSOR - The invention provides a processor for executing threads, each thread comprising a sequence of instructions, said instructions defining operations and at least some of those instructions defining a memory access operation. The processor comprises: a plurality of instruction buffers, each for holding at least one instruction of a thread associated with that buffer; an instruction issue stage for issuing instructions from the instruction buffers; and a memory access stage connected to a memory and arranged to receive instructions issued by the instruction issue stage. The memory access stage comprises: detecting logic adapted to detect whether a memory access operation is defined in each issued instruction; and instruction fetch logic adapted to instigate an instruction fetch to fetch an instruction of a thread when no memory access operation is detected. | 12-04-2008 |
20090210661 | METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR AN IMPLICIT PREDICTED RETURN FROM A PREDICTED SUBROUTINE - A method, system and computer program product for performing an implicit predicted return from a predicted subroutine are provided. The system includes a branch history table/branch target buffer (BHT/BTB) to hold branch information, including a target address of a predicted subroutine and a branch type. The system also includes instruction buffers, and instruction fetch controls to perform a method including fetching a branch instruction at a branch address and a return-point instruction. The method also includes receiving the target address and the branch type, and fetching a fixed number of instructions in response to the branch type. The method further includes referencing the return-point instruction within the instruction buffers such that the return-point instruction is available upon completing the fetching of the fixed number of instructions absent a re-fetch of the return-point instruction. | 08-20-2009 |
20090287908 | PREDICATION SUPPORT IN AN OUT-OF-ORDER PROCESSOR BY SELECTIVELY EXECUTING AMBIGUOUSLY RENAMED WRITE OPERATIONS - A predication technique for out-of-order instruction processing provides efficient out-of-order execution with low hardware overhead. A special op-code demarks unified regions of program code that contain predicated instructions that depend on the resolution of a condition. Field(s) or operand(s) associated with the special op-code indicate the number of instructions that follow the op-code and also contain an indication of the association of each instruction with its corresponding conditional path. Each conditional register write in a region has a corresponding register write for each conditional path, with additional register writes inserted by the compiler if symmetry is not already present, forming a coupled set of register writes. Therefore, a unified instruction stream can be decoded and dispatched with the register writes all associated with the same re-name resource, and the conditional register write is resolved by executing the particular instruction specified by the resolved condition. | 11-19-2009 |
20100031006 | THREAD COMPLETION RATE CONTROLLED SCHEDULING - A method, processor and processing system provide management of per-thread pipeline resource allocation in a simultaneous multi-threaded (SMT) processor by counting indications of instruction completion for each of the threads. The indication may be the commit phase of the pipeline, which indicates results of the pipeline instruction execution are ready for write-back. The completion counts are used in a relative or absolute form to control the pipeline resource allocation. The decode or fetch rates of instructions for the threads can be controlled from the relative or absolute completion counts, providing control of scheduling instructions among the threads for execution by execution pipeline(s). Alternatively, or in combination, the thread priority registers in any thread priority management scheme can be controlled by comparison and/or scaling of the completion counts. | 02-04-2010 |
20100064118 | Method and Apparatus for Reducing Latency Associated with Executing Multiple Instruction Groups - A method and apparatus for reducing latency in computer processors. The method incorporates a special instruction set that provides an indication of whether a particular instruction is capable of being executed nearly simultaneously with a preceding instruction in the same group. In such a situation, multiple instructions may be executed at a rate faster than expected. A simple apparatus for accomplishing this method is illustrated. | 03-11-2010 |
20100228954 | GENERAL PURPOSE EMBEDDED PROCESSOR - The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from. A pipeline control unit launches instructions from plural threads for concurrent execution on plural execution units. | 09-09-2010 |
20100299499 | DYNAMIC ALLOCATION OF RESOURCES IN A THREADED, HETEROGENEOUS PROCESSOR - Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads. | 11-25-2010 |
20110010527 | PROCESSOR FOR EXECUTING INSTRUCTION STREAM AT LOW COST, METHOD FOR THE EXECUTION, AND PROGRAM FOR THE EXECUTION - A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams. The processor may set instruction priority order for each of the instruction streams, designate a memory bank to be used by each of the instruction streams from the memory banks based on bank number information, which indicates a number of memory banks each instruction stream uses, and an instruction address of each of the instruction streams, determine a memory bank to be used in descending priority order based on the instruction stream priority order when a plurality of instruction streams are to use a same memory bank, and supply an instruction | 01-13-2011 |
20110107063 | VECTOR PROCESSING APPARATUS AND METHOD - There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation. | 05-05-2011 |
20110225396 | Methods and Apparatus for Storing Expanded Width Instructions in a VLIW Memory for Deferred Execution - Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units. | 09-15-2011 |
20120060017 | PROCESSOR - A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units. | 03-08-2012 |
20120066480 | Processor - A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding. | 03-15-2012 |
20120110306 | TRANSLATED MEMORY PROTECTION APPARATUS FOR AN ADVANCED MICROPROCESSOR - A method of responding to an attempt to write a memory address including a target instruction which has been translated to a host instruction for execution by a host processor including the steps of marking a memory address including a target instruction which has been translated to a host instruction, detecting a memory address which has been marked when an attempt is made to write to the memory address, and responding to the detection of a memory address which has been marked by protecting a target instruction at the memory address until it has been assured that translations associated with the memory address will not be utilized before being updated. | 05-03-2012 |
20120144164 | PROCESSOR REGISTER RECOVERY AFTER FLUSH OPERATION - An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction, causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation. | 06-07-2012 |
20120198209 | GUEST INSTRUCTION BLOCK WITH NEAR BRANCHING AND FAR BRANCHING SEQUENCE CONSTRUCTION TO NATIVE INSTRUCTION BLOCK - A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions comprising at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block, wherein at least one native far branch corresponds to the at least one guest far branch and wherein the at least one native far branch includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a misprediction, a correct instruction sequence is obtained by accessing the opposite guest address. | 08-02-2012 |
20120233441 | MULTI-THREADED INSTRUCTION BUFFER DESIGN - An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read. | 09-13-2012 |
20120297168 | PROCESSING INSTRUCTION GROUPING INFORMATION - Processing instruction grouping information is provided that includes: reading addresses of machine instructions grouped by a processor at runtime from a buffer to form an address file; analyzing the address file to obtain grouping information of the machine instructions; converting the machine instructions in the address file into readable instructions; and obtaining grouping information of the readable instructions based on the grouping information of the machine instructions and the readable instructions resulting from conversion. Status of grouping and processing performed on instructions by a processor at runtime can be acquired dynamically, such that processing capability of the processor can be better utilized. | 11-22-2012 |
20130117535 | Selective Writing of Branch Target Buffer - A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold. | 05-09-2013 |
20130117536 | RECONFIGURABLE INSTRUCTION ENCODING METHOD AND PROCESSOR ARCHITECTURE - A reconfigurable instruction encoding method includes the following steps. An instruction distribution of an application is counted, and multiple instruction pairs with higher utilization rates are accordingly found. Multiple instructions of the instruction pairs are duplicately encoded according to multiple reserved sections of an original instruction table, so that the instructions have corresponding reconfigured codes and a reconfigured instruction table extended from the original instruction table and including the reconfigured codes is obtained. A compiler is utilized to generate multiple machine codes according to the reconfigured instruction table and consecutive execution instructions. The Hamming distances of the machine codes corresponding to the reconfigured instruction table and the execution instructions are no greater than the Hamming distances of the machine codes generated according to the original instruction table and the execution instructions. | 05-09-2013 |
20130151816 | DELAY IDENTIFICATION IN DATA PROCESSING SYSTEMS - Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register. | 06-13-2013 |
20130166881 | METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS USING PRE-DECODE DATA - Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction. | 06-27-2013 |
20130166882 | METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS WITHOUT INSTRUCTION DECODE - Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction. | 06-27-2013 |
20130179662 | Method and System for Resolving Thread Divergences - An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution. | 07-11-2013 |
20130205118 | MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY - A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method including: fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions; for each of the plurality of branch instructions, assigning a branch uncertainty to each of the plurality of branch instructions; for each of the plurality of instructions, assigning an instruction uncertainty that is a summation of branch uncertainties of older unresolved branches; and balancing the instructions, based on a current summation of instruction uncertainty, in the pipeline. | 08-08-2013 |
20130262825 | Obfuscated Hardware Multi-Threading - Obfuscating a multi-threaded computer program is carried out using an instruction pipeline in a computer processor by streaming first instructions of a first thread of a multi-threaded computer application program into the pipeline, the first instructions entering the pipeline at the fetch stage, detecting a stall signal indicative of a stall condition in the pipeline, and responsively to the stall signal injecting second instructions of a second thread of the multi-threaded computer application program into the pipeline. The injected second instructions enter the pipeline at an injection stage that is disposed downstream from the fetch stage up to and including the register stage for processing therein. The stall condition exists at one of the stages that is located upstream from the injection stage. | 10-03-2013 |
20130290677 | EFFICIENT EXTRACTION OF EXECUTION SETS FROM FETCH SETS - An apparatus having a buffer and a circuit is disclosed. The buffer may be configured to store a plurality of fetch sets. Each fetch set generally includes a prefix word and a plurality of instruction words. Each prefix word may include a plurality of symbols. Each symbol generally corresponds to a respective one of the instruction words. The circuit may be configured to (i) identify each of the symbols in each of the fetch sets having a predetermined value and (ii) parse the fetch sets into a plurality of execution sets in response to the symbols having the predetermined value. | 10-31-2013 |
20140019720 | METHODS, APPARATUS, AND INSTRUCTIONS FOR CONVERTING VECTOR DATA - A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed. | 01-16-2014 |
20140075157 | Methods and Apparatus for Adapting Pipeline Stage Latency Based on Instruction Type - Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes. Application code can be given new degrees of optimization freedom where instruction class and the mix of instructions can be chosen based on performance and power requirements. | 03-13-2014 |
20140181474 | ATOMIC WRITE AND READ MICROPROCESSOR INSTRUCTIONS - Methods and apparatus for performing an atomic hardware operation (HWOP) instruction. According to a method in a computer processor coupled to a memory, the method includes fetching, decoding, and executing the atomic HWOP instruction. The instruction includes a source operand indicating a source location and a destination operand indicating a destination location, wherein each of the source location and the destination location is either a register of the computer processor or an address of the memory. Executing the atomic HWOP instruction includes sending a message to an external agent to cause the external agent to atomically access a set of one or more memory locations of the memory based upon a value stored at the source location, and return a result obtained from said atomic access of the set of memory locations to the destination location. The external agent is external to the computer processor. | 06-26-2014 |
20140208074 | INSTRUCTION SCHEDULING FOR A MULTI-STRAND OUT-OF-ORDER PROCESSOR - In one embodiment, a multi-strand system with a pipeline includes a front-end unit, an instruction scheduling unit (ISU), and a back-end unit. The front-end unit performs an out-of-order fetch of interdependent instructions queued using a front-end buffer. The ISU dedicates two hardware entries per strand for checking operand-readiness of an instruction and for determining an execution port to which the instruction is dispatched. The back-end unit receives instructions dispatched from the hardware device and stores the instructions until they are executed. Other embodiments are described and claimed. | 07-24-2014 |
20140215187 | SOLUTION TO DIVERGENT BRANCHES IN A SIMD CORE USING HARDWARE POINTERS - A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code that, when executed, determines a size of a next very large instruction word (VLIW) to process and determines multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes. | 07-31-2014 |
20140281389 | METHODS AND APPARATUS FOR FUSING INSTRUCTIONS TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY ON MULTIPLE TEST SOURCES - Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler. Some embodiments also fuse the novel test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set. | 09-18-2014 |
20140317383 | APPARATUS AND METHOD FOR COMPRESSING INSTRUCTION FOR VLIW PROCESSOR, AND APPARATUS AND METHOD FOR FETCHING INSTRUCTION - Provided are an instruction compression apparatus and method for a very long instruction word (VLIW) processor, and an instruction fetching apparatus and method. The instruction compression apparatus includes: an indicator generator configured to generate an indicator code that indicates an issue width of an instruction bundle to be executed in the VLIW processor, and a number of No-Operation (NOP) instruction bundles following the instruction bundle; an instruction compressor configured to compress the instruction bundle by removing at least one of NOP instructions from the instruction bundle and the NOP instruction bundles following the instruction bundle; and an instruction converter configured to include the generated indicator code in the compressed instruction bundle. | 10-23-2014 |
20150019840 | Highly Integrated Scalable, Flexible DSP Megamodule Architecture - This invention implements a range of interesting technologies in a single block. Each DSP CPU has a streaming engine (SE). The streaming engines include: an SE to L2 interface that can request 512 bits/cycle from L2; a loose binding between SE and L2 interface, to allow a single stream to peak at 1024 bits/cycle; one-way coherence where the SE sees all earlier writes cached in the system, but not writes that occur after the stream opens; full protection against single-bit data errors within its internal storage via single-bit parity with semi-automatic restart on parity error. | 01-15-2015 |
20150032997 | TRACKING LONG GHV IN HIGH PERFORMANCE OUT-OF-ORDER SUPERSCALAR PROCESSORS - Tracking a global history vector (GHV) in high-performance out-of-order superscalar processors may, in one aspect, comprise providing a shift register that stores the global history vector, which holds branch predictions and outcomes. A counter is maintained to determine the number of bits by which to shift the shift register to recover branch history. In another aspect, the global history vector may be implemented with a circular buffer structure. Youngest and oldest pointers to the circular buffer are maintained and used in recovery. | 01-29-2015 |
20150039859 | MICROPROCESSOR ACCELERATED CODE OPTIMIZER - A method for accelerating code optimization in a microprocessor. The method includes fetching an incoming macroinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit on the optimized microinstruction sequence. | 02-05-2015 |
20150095615 | INSTRUCTION DEFINITION TO IMPLEMENT LOAD STORE REORDERING AND OPTIMIZATION - A method for forwarding data from store instructions to a corresponding load instruction in an out-of-order processor. The method includes accessing an incoming sequence of instructions and, of said sequence of instructions, splitting store instructions into a store address instruction and a store data instruction, wherein the store address performs address calculation and fetch, and wherein the store data performs a load of register contents to a memory address. The method further includes, of said sequence of instructions, splitting load instructions into a load address instruction and a load data instruction, wherein the load address performs address calculation and fetch, and wherein the load data performs a load of memory address contents into a register, and reordering the store address and load address instructions earlier and further away from the LD/SD in the instruction sequence to enable earlier dispatch and execution of the loads and the stores. | 04-02-2015 |
20150100760 | PROCESSOR TO PERFORM A BIT RANGE ISOLATION INSTRUCTION - Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits, having a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position; and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable media storing such an instruction are also disclosed. | 04-09-2015 |
20150100761 | SYSTEM-ON-CHIP (SoC) TO PERFORM A BIT RANGE ISOLATION INSTRUCTION - Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) a first range of bits, having a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position; and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable media storing such an instruction are also disclosed. | 04-09-2015 |
20150309800 | Instruction That Performs A Scatter Write - A processor is described having an instruction execution pipeline. The instruction execution pipeline has an instruction fetch stage to fetch an instruction specifying multiple target resultant registers. The instruction execution pipeline has an instruction decode stage to decode the instruction. The instruction execution pipeline has a functional unit to prepare resultant content specific to each of the multiple target resultant registers. The instruction execution pipeline has a write-back stage to write back said resultant content specific to each of said multiple target resultant registers. | 10-29-2015 |
20150370561 | SKIP INSTRUCTION TO SKIP A NUMBER OF INSTRUCTIONS ON A PREDICATE - A pipelined run-to-completion processor executes a conditional skip instruction. If a predicate condition as specified by a predicate code field of the skip instruction is true, then the skip instruction causes execution of a number of instructions following the skip instruction to be “skipped”. The number of instructions to be skipped is specified by a skip count field of the skip instruction. In some examples, the skip instruction includes a “flag don't touch” bit. If this bit is set, then neither the skip instruction nor any of the skipped instructions can change the values of the flags. Both the skip instruction and following instructions to be skipped are decoded one by one in sequence and pass through the processor pipeline, but the execution stage is prevented from carrying out the instruction operation of a following instruction if the predicate condition of the skip instruction was true. | 12-24-2015 |
20150370562 | EFFICIENT CONDITIONAL INSTRUCTION HAVING COMPANION LOAD PREDICATE BITS INSTRUCTION - A pipelined run-to-completion processor can decode three instructions in three consecutive clock cycles, and can also execute the instructions in three consecutive clock cycles. The first instruction causes the ALU to generate a value which is then loaded due to execution of the first instruction into a register of a register file. The second instruction accesses the register and loads the value into predicate bits in a register file read stage. The predicate bits are loaded in the very next clock cycle following the clock cycle in which the second instruction was decoded. The third instruction is a conditional instruction that uses the values of the predicate bits as a predicate code to determine a predicate function. If a predicate condition (as determined by the predicate function as applied to flags) is true then an instruction operation of the third instruction is carried out, otherwise it is not carried out. | 12-24-2015 |
20150370571 | PROCESSOR HAVING A TRIPWIRE BUS PORT AND EXECUTING A TRIPWIRE INSTRUCTION - A pipelined run-to-completion processor has a special tripwire bus port and executes a novel tripwire instruction. Execution of the tripwire instruction causes the processor to output a tripwire value onto the port during a clock cycle when the tripwire instruction is being executed. A first multi-bit value of the tripwire value is data that is output from registers, and/or flags, and/or pointers, and/or data values stored in the pipeline. A field of the tripwire instruction specifies what particular stored values will be output as the first multi-bit value. A second multi-bit value of the tripwire value is a number that identifies the particular processor that output the tripwire value. The processor has a TE enable/disable control bit. This bit is programmable by a special instruction to disable all tripwire instructions. If disabled, a tripwire instruction is fetched and decoded but does not cause the output of a tripwire value. | 12-24-2015 |
20160011875 | UNDEFINED INSTRUCTION RECODING | 01-14-2016 |
20160085551 | HETEROGENEOUS FUNCTION UNIT DISPATCH IN A GRAPHICS PROCESSING UNIT - A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs. | 03-24-2016 |
20160117173 | PROCESSOR CORE INCLUDING PRE-ISSUE LOAD-HIT-STORE (LHS) HAZARD PREDICTION TO REDUCE REJECTION OF LOAD INSTRUCTIONS - A processor core supporting out-of-order execution (OOE) includes load-hit-store (LHS) hazard prediction at the instruction execution phase, reducing load instruction rejections and queue flushes at the dispatch phase. The instruction dispatch unit (IDU) detects likely LHS hazards by generating entries for pending stores in an LHS detection table. The entries in the table contain an address field (generally the immediate field) of the store instruction and the register number of the store. The ISU compares the address field and register number for each load with entries in the table to determine whether a likely LHS hazard exists. If an LHS hazard is detected, the load is dispatched to the issue queue of the load-store unit (LSU) with a tag corresponding to the matching store instruction, causing the LSU to dispatch the load only after the corresponding store has been dispatched for execution. | 04-28-2016 |
20160117174 | PROCESSING METHOD INCLUDING PRE-ISSUE LOAD-HIT-STORE (LHS) HAZARD PREDICTION TO REDUCE REJECTION OF LOAD INSTRUCTIONS - A processing method supporting out-of-order execution (OOE) includes load-hit-store (LHS) hazard prediction at the instruction execution phase, reducing load instruction rejections and queue flushes at the dispatch phase. The instruction dispatch unit (IDU) detects likely LHS hazards by generating entries for pending stores in an LHS detection table. The entries in the table contain an address field (generally the immediate field) of the store instruction and the register number of the store. The ISU compares the address field and register number for each load with entries in the table to determine whether a likely LHS hazard exists. If an LHS hazard is detected, the load is dispatched to the issue queue of the load-store unit (LSU) with a tag corresponding to the matching store instruction, causing the LSU to dispatch the load only after the corresponding store has been dispatched for execution. | 04-28-2016 |
20160132338 | DEVICE AND METHOD FOR MANAGING SIMD ARCHITECTURE BASED THREAD DIVERGENCE - Provided are an apparatus and a method for effectively managing threads diverged by a conditional branch in a Single Instruction Multiple Data (SIMD) architecture. The apparatus includes: a plurality of Front End Units (FEUs) configured to fetch, for execution by SIMD lanes, instructions of thread groups of a program flow; and a controller configured to schedule a thread group based on SIMD lane availability information, activate an FEU of the plurality of FEUs, and control the activated FEU to fetch an instruction for processing the scheduled thread group. | 05-12-2016 |
20160202986 | PARALLEL SLICE PROCESSOR HAVING A RECIRCULATING LOAD-STORE QUEUE FOR FAST DEALLOCATION OF ISSUE QUEUE ENTRIES | 07-14-2016 |
20160202987 | Instruction and logic to test transactional execution status | 07-14-2016 |
20160253178 | PROCESSOR AND INSTRUCTION CODE GENERATION DEVICE | 09-01-2016 |
20160378492 | Decoding Information About a Group of Instructions Including a Size of the Group of Instructions - A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes decoding at least one of a first instruction or a second instruction, where: (1) decoding the first instruction results in a processing of information about a group of instructions, including information about a size of the group of instructions, and (2) decoding the second instruction results in a processing of at least one of: (a) a reference to a memory location having the information about the group of instructions, including information about the size of the group of instructions or (b) a processor status word having information about the group of instructions, including information about the size of the group of instructions. | 12-29-2016 |
20160378493 | BULK ALLOCATION OF INSTRUCTION BLOCKS TO A PROCESSOR INSTRUCTION WINDOW - A processor core in an instruction block-based microarchitecture includes a control unit that allocates instructions into an instruction window in bulk by fetching blocks of instructions and associated resources including control bits and operands at once. Such bulk allocation supports increased efficiency in processor core operations by enabling consistent management and policy implementation across all the instructions in the block during execution. For example, when an instruction block branches back on itself, it may be reused in a refresh process rather than being re-fetched from the instruction cache. As all of the resources for that instruction block are in one place, the instructions can remain in place and only valid bits need to be cleared. Bulk allocation also facilitates operand sharing by instructions in a block and explicit messaging among instructions. | 12-29-2016 |
20160378494 | Processing Encoding Format to Interpret Information Regarding a Group of Instructions - A method including fetching information regarding a group of instructions, where the group of instructions is configured to execute atomically by a processor, including an encoding format for the information regarding the group of instructions, is provided. The method further includes processing the encoding format to interpret the information regarding the group of instructions. | 12-29-2016 |
20160378496 | Explicit Instruction Scheduler State Information for a Processor - A method including fetching a group of instructions, where the group of instructions is configured to execute atomically by a processor, is provided. The method further includes scheduling at least one of the group of instructions for execution by the processor before decoding the at least one of the group of instructions based at least on pre-computed ready state information associated with the at least one of the group of instructions. | 12-29-2016 |
20160378500 | SPLIT-LEVEL HISTORY BUFFER IN A COMPUTER PROCESSING UNIT - A split-level history buffer in a central processing unit is provided. A history buffer is partitioned into a first portion and a second portion, wherein the first portion includes a first tagged instruction. A result is generated for the first tagged instruction. A determination is made whether a second tagged instruction is to be stored in the first portion of the history buffer. Responsive to the determination that the second tagged instruction is to be stored in the first portion of the history buffer, the first tagged instruction and the generated result for the first tagged instruction are written to the second portion of the history buffer. | 12-29-2016 |
20160378501 | SPLIT-LEVEL HISTORY BUFFER IN A COMPUTER PROCESSING UNIT - A split-level history buffer in a central processing unit is provided. A history buffer is partitioned into a first portion and a second portion, wherein the first portion includes a first tagged instruction. A result is generated for the first tagged instruction. A determination is made whether a second tagged instruction is to be stored in the first portion of the history buffer. Responsive to the determination that the second tagged instruction is to be stored in the first portion of the history buffer, the first tagged instruction and the generated result for the first tagged instruction are written to the second portion of the history buffer. | 12-29-2016 |
20170235575 | UNIFIED REGISTER FILE FOR SUPPORTING SPECULATIVE ARCHITECTURAL STATES | 08-17-2017 |
20170235579 | PROCESSOR FOR SPECULATIVE EXECUTION EVENT COUNTER CHECKPOINTING AND RESTORING | 08-17-2017 |
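
As an illustration of the bit range isolation entries above (20150100760 and 20150100761), the following is a minimal Python sketch of the described result, not the patented implementation. It assumes details the abstracts leave open: the implicit end of the kept range is bit 0, the fill value for the remaining bits is 0, and the operand is 64 bits wide. The function name `isolate_low_bits` and its `end` and `width` parameters are hypothetical.

```python
def isolate_low_bits(src: int, end: int, width: int = 64) -> int:
    """Keep bits [0, end) of src in their original positions; zero the rest.

    Assumptions (not stated in the abstracts): the kept range starts at
    bit 0, the other bits are filled with 0, and operands are `width` bits.
    """
    if not 0 <= end <= width:
        raise ValueError("end must lie within the operand width")
    keep_mask = (1 << end) - 1        # ones in positions 0 .. end-1
    width_mask = (1 << width) - 1     # truncate to the operand width
    # A single AND: the kept bits are never shifted relative to the source.
    return src & keep_mask & width_mask


if __name__ == "__main__":
    # Bits 0..3 of 0b1011_0110 are 0b0110; everything above is cleared.
    print(bin(isolate_low_bits(0b1011_0110, 4)))  # 0b110
```

The sketch uses one mask and no shift of the data, mirroring the abstracts' point that the kept range stays in the same bit positions as in the source operand.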