
INSTRUCTION DECODING (E.G., BY MICROINSTRUCTION, START ADDRESS GENERATOR, HARDWIRED)

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
712210000 | Decoding instruction to accommodate variable length instruction or operand | 30
712213000 | Predecoding of instruction component | 17
712209000 | Decoding instruction to accommodate plural instruction interpretations (e.g., different dialects, languages, emulation, etc.) | 8
712212000 | Decoding by plural parallel decoders | 6
712211000 | Decoding instruction to generate an address of a microroutine | 4
20110022824Address Generation Unit with Pseudo Sum to Accelerate Load/Store Operations - In an embodiment, an address generation unit (AGU) is configured to generate a pseudo sum from an index portion of two or more operands. The pseudo sum may equal the index if the carry-in of the actual sum to the least significant bit of the index is a selected value (e.g. zero). The AGU may also include circuitry coupled to receive the operands and to generate the actual carry-in to the least significant bit of the index. The AGU may transmit the pseudo sum and the carry-in to a decode block for a memory array. The decode block may decode the pseudo sum into one or more one-hot vectors. The one-hot vectors may be input to muxes, and the one-hot vectors rotated by one position may be the other input. The actual carry-in may be the selection control of the mux.01-27-2011
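The pseudo-sum idea above is easier to follow in software. Below is a minimal C sketch of the dataflow; the index-field width, bit positions, and all identifiers are assumptions invented for the example, not details taken from the application:

```c
#include <stdint.h>
#include <stdio.h>

#define IDX_SHIFT 6                         /* assumed index field at bits [11:6] */
#define IDX_BITS  6                         /* 64-entry memory array decode       */
#define IDX_MASK  ((1u << IDX_BITS) - 1u)

int main(void) {
    uint32_t base = 0x00001fe0, offset = 0x00000068;   /* example operands */

    /* Pseudo sum: add only the index fields, assuming a carry-in of zero. */
    uint32_t idx_a  = (base   >> IDX_SHIFT) & IDX_MASK;
    uint32_t idx_b  = (offset >> IDX_SHIFT) & IDX_MASK;
    uint32_t pseudo = (idx_a + idx_b) & IDX_MASK;

    /* Actual carry into the least-significant index bit, from the bits below it. */
    uint32_t low_mask = (1u << IDX_SHIFT) - 1u;
    uint32_t carry_in = ((base & low_mask) + (offset & low_mask)) >> IDX_SHIFT;

    /* Decode block: one-hot vector for the pseudo sum, the same vector rotated
       by one position, and a mux controlled by the actual carry-in. */
    uint64_t onehot   = 1ull << pseudo;
    uint64_t rotated  = (onehot << 1) | (onehot >> 63);
    uint64_t selected = carry_in ? rotated : onehot;

    uint32_t true_row = ((base + offset) >> IDX_SHIFT) & IDX_MASK;
    printf("pseudo=%u carry_in=%u selected one-hot matches row %u: %s\n",
           pseudo, carry_in, true_row,
           selected == (1ull << true_row) ? "yes" : "no");
    return 0;
}
```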
20090024836MULTIPLE-CORE PROCESSOR WITH HIERARCHICAL MICROCODE STORE - A multiple-core processor having a hierarchical microcode store. A processor may include multiple processor cores, each configured to independently execute instructions defined according to a programmer-visible instruction set architecture (ISA). Each core may include a respective local microcode unit configured to store microcode entries. The processor may also include a remote microcode unit accessible by each of the processor cores. Any given one of the processor cores may be configured to generate a given microcode entrypoint corresponding to a particular microcode entry including one or more operations to be executed by the given processor core, and to determine whether the particular microcode entry is stored within the respective local microcode unit of the given core. In response to determining that the particular microcode entry is not stored within the respective local microcode unit, the given core may convey a request for the particular microcode entry to the remote microcode unit.01-22-2009
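A rough C sketch of the two-level lookup described above (probe the core's local unit first, request from the shared remote unit on a miss); the table sizes, structure fields, and fill policy are illustrative assumptions, not details from the application:

```c
#include <stdint.h>
#include <stdio.h>

#define LOCAL_ENTRIES  8      /* small per-core store (assumed size) */
#define REMOTE_ENTRIES 64     /* larger shared store (assumed size)  */

typedef struct { uint32_t entrypoint; const char *ops; int valid; } ucode_entry;

static ucode_entry local_store[LOCAL_ENTRIES];
static ucode_entry remote_store[REMOTE_ENTRIES];

/* A core generates an entrypoint, probes its local unit, and falls back
   to the shared remote unit on a miss (filling the local unit). */
static const char *fetch_ucode(uint32_t entrypoint) {
    unsigned slot = entrypoint % LOCAL_ENTRIES;
    if (local_store[slot].valid && local_store[slot].entrypoint == entrypoint)
        return local_store[slot].ops;                       /* local hit */

    unsigned rslot = entrypoint % REMOTE_ENTRIES;           /* request to remote unit */
    if (remote_store[rslot].valid && remote_store[rslot].entrypoint == entrypoint) {
        local_store[slot] = remote_store[rslot];            /* fill local copy */
        return remote_store[rslot].ops;
    }
    return NULL;                                            /* undefined entrypoint */
}

int main(void) {
    remote_store[5] = (ucode_entry){ 5, "load; add; store", 1 };
    printf("first fetch:  %s\n", fetch_ucode(5));   /* local miss, remote hit */
    printf("second fetch: %s\n", fetch_ucode(5));   /* local hit */
    return 0;
}
```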
20120079248Aliased Parameter Passing Between Microcode Callers and Microcode Subroutines - An apparatus of an aspect includes a plurality of microcode alias locations and a microcode storage. A microinstruction of a microcode subroutine is stored in the microcode storage. The microinstruction has an indication of a microcode alias location. A microcode caller of the microcode subroutine is also stored in the microcode storage. The microcode caller is operable to specify a location of a parameter in the microcode alias location that is indicated by the microinstruction of the microcode subroutine. The apparatus also includes parameter location determination logic that is coupled with the microcode alias locations. The parameter location determination logic is operable, responsive to the microinstruction of the microcode subroutine, to receive the indication of the microcode alias location from the microinstruction and determine the location of the parameter specified in the microcode alias location indicated by the microinstruction.03-29-2012
20100070741MICROPROCESSOR WITH FUSED STORE ADDRESS/STORE DATA MICROINSTRUCTION - A microprocessor includes an instruction translator that translates a store macroinstruction into exactly one fused store microinstruction. The store macroinstruction in the microprocessor's macroarchitecture macroinstruction set instructs the microprocessor to store data from a general purpose register of the microprocessor to a memory location. The fused store microinstruction is an instruction in the microprocessor's microarchitecture microinstruction set. A reorder buffer (ROB) receives the fused store microinstruction from the instruction translator into exactly one of its plurality of entries. An instruction dispatcher dispatches for execution a store address microinstruction and a store data microinstruction to different respective execution units of the microprocessor in response to receiving the fused store microinstruction. Neither the store address microinstruction nor the store data microinstruction occupy any of the ROB entries. The ROB retires the fused store microinstruction after being notified that both the store address microinstruction and the store data microinstruction have been executed.03-18-2010
Entries
Document | Title | Date
20130080741HARDWARE CONTROL OF INSTRUCTION OPERANDS IN A PROCESSOR - An apparatus generally having a first circuit, a second circuit and a third circuit is disclosed. The first circuit may have a counter and may be configured to adjust at least one control signal in response to a current value of the counter. The first circuit may be implemented only in hardware. The counter generally counts a number of loops in which a plurality of instructions are executed. The second circuit may be configured to set the counter to an initial value. The third circuit may be configured to execute the instructions using a plurality of data items as a plurality of operands such that at least two of the instructions use different ones of the operands. The data items may be routed to the third circuit in response to the control signal. The apparatus generally forms a processor.03-28-2013
20130080742METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates, responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.03-28-2013
20090119489Methods and Apparatus for Transforming, Loading, and Executing Super-Set Instructions - Techniques are described for loading decoded instructions and super-set instructions in a memory for later access. For loading a decoded instruction, the decoded instruction is a transformed form of an original instruction that was stored in the program memory. The transformation is from an encoded assembly level format to a binary machine level format. In one technique, the transformation mechanism is invoked by a transform and load instruction that causes an instruction retrieved from program memory to be transformed into a new language format and then loaded into a transformed instruction memory. The format of the transformed instruction may be optimized to the implementation requirements, such as improving critical path timing. The transformation of instructions may extend to other needs beyond timing path improvement, for example, requiring super-set instructions for increased functionality and improvements to instruction level parallelism. Techniques for transforming, loading, and executing super-set instructions are described.05-07-2009
20100042812Data Dependent Instruction Decode - A circuit arrangement and method support data dependent instruction decoding, whereby instructions are decoded, in part, using decode data that is stored in operand registers identified by such instructions. An instruction may include an opcode and at least one operand that identifies a register. During execution of the instruction, the instruction is first decoded using the opcode, and then decode data stored in the operand register is retrieved and used to further decode the instruction, e.g., to select from among a plurality of operations or instruction types associated with the same opcode.02-18-2010
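The two-step decode described above can be sketched in a few lines of C; the opcode value, the two-bit decode field, and the selected operations are invented for illustration:

```c
#include <stdint.h>
#include <stdio.h>

static uint32_t regs[8] = { 0 };

/* One shared opcode; the low two bits of the first operand register refine it. */
static void execute(uint8_t opcode, uint8_t ra, uint8_t rb, uint8_t rd) {
    if (opcode == 0x2A) {                       /* assumed "ALU-group" opcode   */
        switch (regs[ra] & 0x3) {               /* decode data in operand reg   */
        case 0:  regs[rd] = regs[rb] + 1; break; /* increment  */
        case 1:  regs[rd] = regs[rb] - 1; break; /* decrement  */
        case 2:  regs[rd] = ~regs[rb];    break; /* complement */
        default: regs[rd] = 0;            break;
        }
    }
}

int main(void) {
    regs[1] = 0x1;      /* decode data: select "decrement" */
    regs[2] = 10;
    execute(0x2A, 1, 2, 3);
    printf("r3 = %u\n", regs[3]);   /* prints 9 */
    return 0;
}
```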
20100332802Priority circuit, processor, and processing method - A priority circuit is connected to a reservation station and a plurality of arithmetic units that processes different operations and dispatches, when it is determined that an executable flag indicating that an instruction can be executed by only a specific arithmetic unit is on, an instruction to an arithmetic unit that is different from the specific arithmetic unit and of which a queue is vacant in accordance with the input performed by an instruction decoder and the reservation station.12-30-2010
20100106944Data processing apparatus and method for performing rearrangement operations - A data processing apparatus and method are provided for performing rearrangement operations. The data processing apparatus has a register data store with a plurality of registers, each register storing a plurality of data elements. Processing circuitry is responsive to control signals to perform processing operations on the data elements. An instruction decoder is responsive to at least one but no more than N rearrangement instructions, where N is an odd plural number, to generate control signals to control the processing circuitry to perform a rearrangement process at least equivalent to: obtaining as source data elements the data elements stored in N registers of said register data store as identified by the at least one re-arrangement instruction; performing a rearrangement operation to rearrange the source data elements between a regular N-way interleaved order and a de-interleaved order in order to produce a sequence of result data elements; and outputting the sequence of result data elements for storing in the register data store. This provides a particularly efficient technique for performing N-way interleave and de-interleave operations, where N is an odd number, resulting in high performance, low energy consumption, and reduced register use when compared with known prior art techniques.04-29-2010
20130046960METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATION - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.02-21-2013
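As a rough illustration of the compare-and-set-flag behavior described above (in the spirit of a PTEST-style operation), the C sketch below models the 128-bit register as two 64-bit halves; the representation and the branch condition are assumptions made for the example:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t lo, hi; } xmm_t;   /* 128-bit packed register stand-in */

/* Set a "zero flag" if the AND of the two packed sources is all zero;
   a branching unit could then consume this single bit. */
static int packed_test(xmm_t a, xmm_t b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

int main(void) {
    xmm_t mask   = { 0x00000000000000ffull, 0 };
    xmm_t sample = { 0x0000000000001100ull, 0 };

    if (packed_test(sample, mask))          /* branch support action */
        printf("no masked bits set: take the branch\n");
    else
        printf("masked bits set: fall through\n");
    return 0;
}
```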
20130046958Systems and Methods for Local Iteration Adjustment - Various embodiments of the present invention provide systems and methods for data processing. As an example, a data processing circuit is disclosed that includes: a data decoder circuit and a local iteration adjustment circuit. The data decoder circuit is operable to perform a number of local iterations on a decoder input to yield a data output. The local iteration adjustment circuit is operable to generate a limit on the number of local iterations performed by the data decoder circuit.02-21-2013
20130046956SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS OF IN-ORDER AND OUT-OF-ORDER EXECUTION QUEUES - Systems and methods are disclosed that can include a processor having an instruction unit, a decode/issue unit, a first execution queue configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction (IMUL) of the second instruction type is received. The first instruction is decoded by the decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction (Id) of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. In response to determining that the operands of the first instruction include the dependency on the second instruction: a synchronization indicator corresponding to the first instruction in a second entry of the first execution queue is set immediately adjacent the first entry of the first execution queue, which indicates that the first instruction is stored in another execution queue. A synchronization pending indicator is set in the first entry of the second execution queue to indicate that the first instruction has a corresponding synchronization indicator stored in another execution queue.02-21-2013
20130046959METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATION - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.02-21-2013
20130046957SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS OF IN-ORDER AND OUT-OF-ORDER EXECUTION QUEUES - Processing systems and methods are disclosed that can include an instruction unit which provides instructions for execution by the processor; a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions; and a plurality of execution queues coupled to the decode/issue unit, wherein each issued instruction from the decode/issue unit can be stored into an entry of at least one queue of the plurality of execution queues. The plurality of queues can comprise an independent execution queue, a dependent execution queue, and a plurality of execution units coupled to receive instructions for execution from the plurality of execution queues. The plurality of execution units can comprise a first execution unit, coupled to receive instructions from the dependent execution queue and the independent execution queue which have been selected for execution. When a multi-cycle instruction at a bottom entry of the dependent execution queue is selected for execution, it may not be removed from the dependent execution queue until a result is received from the first execution unit. When a multi-cycle instruction at a bottom entry of the independent execution queue is selected for execution, it can be removed from the independent execution queue without waiting to receive a result from the first execution unit.02-21-2013
20120191949PREDICTING A RESULT OF A DEPENDENCY-CHECKING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments include a processor that executes a vector instruction. In the described embodiments, while dispatching instructions at runtime, the processor encounters a dependency-checking instruction. Upon determining that a result of the dependency-checking instruction is predictable, the processor dispatches a prediction micro-operation associated with the dependency-checking instruction, wherein the prediction micro-operation generates a predicted result vector for the dependency-checking instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, the processor sets each element of the predicted result vector to zero (or, if a predicate vector is received, each element for which the predicate vector is active).07-26-2012
20110016292OUT-OF-ORDER EXECUTION IN-ORDER RETIRE MICROPROCESSOR WITH BRANCH INFORMATION TABLE TO ENJOY REDUCED REORDER BUFFER SIZE - An out-of-order execution in-order retire microprocessor includes a branch information table comprising N entries. Each of the N entries stores information associated with a branch instruction. The microprocessor also includes a reorder buffer comprising M entries. Each of the M entries stores information associated with an unretired instruction within the microprocessor. Each of the M entries includes a field that indicates whether the unretired instruction is a branch instruction and, if so, a tag identifying one of the N entries in the branch information table storing information associated with the branch instruction. N is significantly less than M such that the overall die space and power consumption is reduced over a processor in which each reorder buffer entry stores the branch information.01-20-2011
20130073835PRIMITIVES TO ENHANCE THREAD-LEVEL SPECULATION - A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed.03-21-2013
20110047355Offset Based Register Address Indexing - A circuit arrangement and method support offset based register address indexing, wherein register addresses to be used by an instruction are calculated using offsets to the full target register address, and the offsets are contained in the instruction and occupy less instruction space than the full address widths. An instruction may include at least one offset value that identifies a register address. During decoding of the instruction, an offset and a full target address are retrieved from the instruction, and then a register address is calculated by addition of the offset to the full target address.02-24-2011
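A small C sketch of the offset-based addressing described above; the register-file size, offset width, and instruction fields are assumptions chosen for the example:

```c
#include <stdint.h>
#include <stdio.h>

#define REGFILE_SIZE 256   /* large register file: a full address is 8 bits */
#define OFFSET_BITS  3     /* offsets in the instruction are only 3 bits    */

typedef struct {
    uint8_t target;        /* full 8-bit address of the target register     */
    uint8_t off_a, off_b;  /* 3-bit offsets for the two source registers    */
} offset_insn;

/* Reconstruct the source register addresses by adding offsets to the target. */
static void decode_addresses(offset_insn in, uint8_t *ra, uint8_t *rb) {
    *ra = (uint8_t)((in.target + in.off_a) % REGFILE_SIZE);
    *rb = (uint8_t)((in.target + in.off_b) % REGFILE_SIZE);
}

int main(void) {
    offset_insn in = { 200, 1, 5 };   /* sources live near the target */
    uint8_t ra, rb;
    decode_addresses(in, &ra, &rb);
    printf("target=%u ra=%u rb=%u (offsets cost %u bits each instead of 8)\n",
           (unsigned)in.target, (unsigned)ra, (unsigned)rb, (unsigned)OFFSET_BITS);
    return 0;
}
```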
20120117359NO-DELAY MICROSEQUENCER - An apparatus generally including a memory and a circuit is disclosed. The memory may be configured to store a plurality of instructions. Each of the instructions generally includes a corresponding command and a corresponding command repeat count. At least one of the instructions may include a subprocedure call. The circuit may be configured to (i) decode the instructions one at a time and (ii) present a sequence of the commands at an interface. The sequence (i) may be based on the decoding and (ii) may have no delays between consecutive commands at the interface.05-10-2012
20110022823INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING METHOD THEREOF - An information processing system includes an execution unit and a decoder. The execution unit includes a plurality of arithmetic units each having a first operation circuit that performs a first operation on a first input value and a second input value, a second operation circuit that performs a second operation on the first input value and the second input value, and a selector that selects and outputs either a first output value output from the first operation circuit or a second output value output from the second operation circuit based on a selection signal. The decoder decodes an operation instruction and determines each value of the selection signal of each arithmetic unit. The decoder determines the value of the selection signal corresponding to the operation instruction with respect to each program.01-27-2011
20090235055SCHEDULING APPARATUS AND SCHEDULING METHOD - A scheduling apparatus includes a state information acquisition unit that acquires, from a decoding unit, state information that serves as an indicator for computing a processing time actually required for a decoding unit to perform a decoding process on first processing target data, and a schedule generating unit that determines a second scheduled processing time for which the decoding unit is able to perform the decoding process on second processing target data, based on the processing time computed based on the state information. The schedule generating unit further generates a second schedule for the decoding unit to perform the decoding process on the second processing target data, by allocating the second scheduled processing time to the decoding unit.09-17-2009
20090235054DISASSEMBLING AN EXECUTABLE BINARY - A method for disassembling an executable binary (binary). In one implementation, a plurality of potential address references may be identified based on the binary and a plurality of storage addresses containing the binary. A plurality of assembler source code instructions (instructions) may be generated by disassembling the binary. The binary may be disassembled at one or more sequential addresses starting at each of the plurality of potential address references.09-17-2009
20090006815SYSTEM AND METHOD FOR INTERFACING DEVICES - A system in one embodiment includes (i) an interface module to control and monitor the system; a plurality of power cells which act as a point of power delivery and monitor environmental variables that affect function and reliability, (ii) a radio frequency transmitter and receiver to manage nodes distributed across the plurality of power cells, (iii) a maintenance module presenting information requests to be forwarded to the plurality of power cells, and (iv) a communication bus for distribution of data throughout the system.01-01-2009
20090006814Immediate and Displacement Extraction and Decode Mechanism - An extraction and decode mechanism for acquiring and processing instructions and the corresponding constant(s) embedded within the instructions. The extraction and decode mechanism may be included within a processing unit, and may comprise an instruction decode unit and at least one constant steer network. During operation, the instruction decode unit may obtain and decode instructions which are to be executed by the processing unit. For each instruction, the instruction decode unit may also determine the location of one or more constants embedded within the instruction. The constant steer network may receive the location information from the instruction decode unit. While the instruction decode unit decodes the instruction, the constant steer network may obtain the constant(s) embedded within the instruction based on the location information and store the constant(s). The constant(s) embedded within the instruction may be immediate or displacement (imm/disp) constant(s).01-01-2009
20130166883METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.06-27-2013
20130166884METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS - A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.06-27-2013
20100161944Processor and instruction control method - An original first instruction word (I06-24-2010
20110283090Instruction Addressing Using Register Address Sequence Detection - A circuit arrangement and method support efficient indexing into large register files by utilizing register address sequence detection, wherein register addresses to be used by an instruction are produced by concatenating a portion of the address that is contained in the instruction with another portion that is speculatively produced by sequence detection logic. The portion of the correct full address that is not contained in the instruction is stored in a software accessible special purpose register. If the end of a particular sequence of addresses is detected by the sequence detection logic, the invention speculatively assumes that the next address in the sequence will be used. Since only a portion of the full addresses are stored in the instruction, they occupy less instruction space than the full address widths. An instruction may include at least one address portion that identifies a register address.11-17-2011
20130024664METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates, responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.01-24-2013
20130024663Table Call Instruction for Frequently Called Functions - An apparatus includes a memory that stores an instruction including an opcode and an operand. The operand specifies an immediate value or a register indicator of a register storing the immediate value. The immediate value is usable to identify a function call address. The function call address is selectable from a plurality of function call addresses.01-24-2013
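A minimal C sketch of how such a table-call instruction might behave, with the call table, its contents, and the immediate width all invented for illustration:

```c
#include <stdio.h>

static void flush_cache(void) { printf("flush_cache called\n"); }
static void poll_io(void)     { printf("poll_io called\n");     }
static void yield(void)       { printf("yield called\n");       }

/* Call-address table referenced by the table-call instruction. */
static void (*const call_table[])(void) = { flush_cache, poll_io, yield };

/* "TBLCALL imm": the immediate (or a register holding it) selects the target. */
static void tblcall(unsigned imm) {
    if (imm < sizeof call_table / sizeof call_table[0])
        call_table[imm]();      /* a full call address from a short immediate */
}

int main(void) {
    tblcall(1);   /* reaches poll_io without encoding its full address */
    return 0;
}
```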
20130024662RELAXATION OF SYNCHRONIZATION FOR ITERATIVE CONVERGENT COMPUTATIONS - Systems and methods are disclosed that allow atomic updates to global data to be at least partially eliminated to reduce synchronization overhead in parallel computing. A compiler analyzes the data to be processed to selectively permit unsynchronized data transfer for at least one type of data. A programmer may provide a hint to expressly identify the type of data that are candidates for unsynchronized data transfer. In one embodiment, the synchronization overhead is reducible by generating an application program that selectively substitutes codes for unsynchronized data transfer for a subset of codes for synchronized data transfer. In another embodiment, the synchronization overhead is reducible by employing a combination of software and hardware by using relaxation data registers and decoders that collectively convert a subset of commands for synchronized data transfer into commands for unsynchronized data transfer.01-24-2013
20100268918ASIP ARCHITECTURE FOR EXECUTING AT LEAST TWO DECODING METHODS - A system for execution of a decoding method is disclosed. The system is capable of executing at least two data decoding methods which are different in underlying coding principle, wherein at least one of the data decoding methods requires data shuffling operations on the data. In one aspect, the system includes at least one application specific processor having an instruction set having arithmetic operators excluding multiplication, division and power. The processor is selected for execution of approximations of each of the at least two data decoding methods. The system also includes at least a first memory unit, e.g. background memory, for storing data. The system also includes a transfer unit for transferring data from the first memory unit towards the at least one programmable processor. The transfer unit includes a data shuffler. The system may also include a controller for controlling the data shuffler independent from the processor.10-21-2010
20090049278EFFICIENT MEMORY UPDATE PROCESS FOR ON-THE-FLY INSTRUCTION TRANSLATION FOR WELL BEHAVED APPLICATIONS EXECUTING ON A WEAKLY-ORDERED PROCESSOR - A multiprocessor data processing system (MDPS) with a weakly-ordered architecture providing processing logic for substantially eliminating issuing sync instructions after every store instruction of a well-behaved application. Instructions of a well-behaved application are translated and executed by a weakly-ordered processor. The processing logic includes a lock address tracking utility (LATU), which provides an algorithm and a table of lock addresses, within which each lock address is stored when the lock is acquired by the weakly-ordered processor. When a store instruction is encountered in the instruction stream, the LATU compares the target address of the store instruction against the table of lock addresses. If the target address matches one of the lock addresses, indicating that the store instruction is the corresponding unlock instruction (or lock release instruction), a sync instruction is issued ahead of the store operation. The sync causes all values updated by the intermediate store operations to be flushed out to the point of coherency and be visible to all processors.02-19-2009
20120159127SECURITY SANDBOX - Different instruction sets are provided for different units of execution such as threads, processes, and execution contexts. Execution units may be associated with instruction sets. The instruction sets may have mutually exclusive opcodes, meaning an opcode in one instruction set is not included in any other instruction set. When executing a given execution unit, the processor only allows execution of instructions in the instruction set that corresponds to the current execution unit. A failure occurs if the execution unit attempts to directly execute an instruction in another instruction set.06-21-2012
20090089549H.264 Video Decoder CABAC Core Optimization Techniques - A device employing techniques to optimize Context-based Adaptive Binary Arithmetic Coding (CABAC) for the H.264 video decoding is provided. The device includes a processing circuit operative to implement a set of instructions to decode multiple bins simultaneously and renormalize an offset register and a range register after the multiple bins are decoded. The range register and offset registers may be 32 or 64 bits. The use of a larger range register allows renormalization to be skipped when enough bits are still in the range register.04-02-2009
20110264895METHOD AND APPARATUS FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.10-27-2011
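The multiply-add behavior described above can be shown with a short worked example in C (a PMADDWD-style pairing of 16-bit elements into 32-bit sums; the element widths and counts are assumptions):

```c
#include <stdint.h>
#include <stdio.h>

/* a, b: four signed 16-bit elements each; out: two signed 32-bit elements. */
static void packed_multiply_add(const int16_t a[4], const int16_t b[4], int32_t out[2]) {
    out[0] = (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];   /* adjacent products summed */
    out[1] = (int32_t)a[2] * b[2] + (int32_t)a[3] * b[3];
}

int main(void) {
    int16_t a[4] = { 1, 2, 3, 4 };
    int16_t b[4] = { 10, 20, 30, 40 };
    int32_t r[2];
    packed_multiply_add(a, b, r);
    printf("%d %d\n", r[0], r[1]);   /* 50 and 250 */
    return 0;
}
```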
20110219213INSTRUCTION CRACKING BASED ON MACHINE STATE - A method, information processing system, and computer program product manage instruction execution based on machine state. At least one instruction is received. The at least one instruction is decoded. A current machine state is determined in response to the decoding. The at least one instruction is organized into a set of unit of operations based on the current machine state that has been determined. The set of unit of operations is executed.09-08-2011
20110219212System and Method of Processing Hierarchical Very Long Instruction Packets - A system and method of processing a hierarchical very long instruction word (VLIW) packet is disclosed. In a particular embodiment, a method of processing instructions is disclosed. The method includes receiving a hierarchical VLIW packet of instructions and decoding an instruction from the packet to determine whether the instruction is a single instruction or whether the instruction includes a subpacket that includes a plurality of sub-instructions. The method also includes, in response to determining that the instruction includes the subpacket, executing each of the sub-instructions.09-08-2011
20090217005METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR SELECTIVELY ACCELERATING EARLY INSTRUCTION PROCESSING - A method for selectively accelerating early instruction processing including receiving an instruction data that is normally processed in an execution stage of a processor pipeline, wherein a configuration of the instruction data allows a processing of the instruction data to be accelerated from the execution stage to an address generation stage that occurs earlier in the processor pipeline than the execution stage, determining whether the instruction data can be dispatched to the address generation stage to be processed without being delayed due to an unavailability of a processing resource needed for the processing of the instruction data in the address generation stage, dispatching the instruction data to be processed in the address generation stage if it can be dispatched without being delayed due to the unavailability of the processing resource, and dispatching the instruction data to be processed in the execution stage if it can not be dispatched without being delayed due to the unavailability of the processing resource, wherein the processing of the instruction data is selectively accelerated using an address generation interlock scheme. A corresponding system and computer program product.08-27-2009
20100169613TRANSLATING INSTRUCTIONS IN A SPECULATIVE PROCESSOR - A method for use by a host microprocessor which translates sequences of instructions from a target instruction set for a target processor to sequences of instructions for the host microprocessor including the steps of beginning execution of a speculative sequence of target instructions by committing state of the target processor and storing memory stores previously generated by execution at a point in the execution of instructions at which state of the target processor is known, executing the speculative sequence of host instructions until another point in the execution of target instructions at which state of the target processor is known, rolling back to last committed state of the target processor and discarding the memory stores generated by the speculative sequence of host instructions if execution fails, and beginning execution of a next sequence of target instructions if execution succeeds.07-01-2010
20100268917Systems and Methods for Ramped Power State Control in a Semiconductor Device - Various embodiments of the present invention provide systems and methods for ramping current usage in a semiconductor device. For example, various embodiments of the present invention provide semiconductor devices that include at least a first function circuit and a second function circuit, and a power state change control circuit. The power state change control circuit is operable to transition the power state of the first function circuit from a reduced power state to an operative power state, and to transition the second function circuit from a reduced power state to an operative power state. Transition of the power state of at least one of the first function circuit and the second function circuit is done in at least a first stage at a first time and a second stage at a second time, with the second time being after the first time.10-21-2010
20090063820Application Specific Instruction Set Processor for Digital Radio Processor Receiving Chain Signal Processing - This invention is an application specific integrated processor to implement the complete fixed-rate DRX signal processing paths (FDRX) for a reconfigurable processor-based multi-mode 3G wireless application. This architecture is based on the baseline 16-bit RISC architecture with additional functional blocks (ADUs) tightly coupled with the base processor's data path. Each ADU accelerates a computation-intensive task in the FDRX signal path, such as multi-tap FIRs, IIRs, complex domain and vectored data processing. The ADUs are controlled through custom instructions based on the load/store architecture. The whole FDRX data path can be easily implemented by the software employing these custom instructions.03-05-2009
20080288752DESIGN STRUCTURE FOR FORWARDING STORE DATA TO LOADS IN A PIPELINED PROCESSOR - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for forwarding store data to loads in a pipelined processor is provided. In one implementation, a processor is provided that includes a decoder operable to decode an instruction, and a plurality of execution units operable to respectively execute a decoded instruction from the decoder. The plurality of execution units include a load/store execution unit operable to execute decoded load instructions and decoded store instructions and generate corresponding load memory operations and store memory operations. The store queue is operable to buffer one or more store memory operations prior to the one or more memory operations being completed, and the store queue is operable to forward store data of the one or more store memory operations buffered in the store queue to a load memory operation on a byte-by-byte basis.11-20-2008
20100122068MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.05-13-2010
20100100711DATA PROCESSOR DEVICE AND METHODS THEREOF - A microsequencer is disclosed that controls the order in which microcode instructions are fetched from a microcode ROM. Each microcode instruction includes an execution command for execution by one or more execution units. Each microcode instruction also includes a microsequencer command to indicate the location of another microcode instruction at the microcode ROM. The microcode instruction can also include a delay field, indicating a selectable time when the associated microcode instruction is to be decoded. The delay field thereby provides more flexible control of the sequencing of microcode instructions.04-22-2010
20090089550JEK DYNAMIC INSTRUMENTATION - A method and system for performing dynamic instrumentation. At least some of the illustrative embodiments are methods comprising setting at least one monitor value (wherein the at least one monitor value is associated with a software monitoring handler), detecting a value within a register equal to the at least one monitor value, and executing the software monitoring handler based on the detecting.04-02-2009
20080209174Processor And Its Instruction Issue Method - An instruction issue method for use in a pipelined processor, comprising the steps of: decoding an instruction to be processed to get a type of the instruction; computing the number of cycles to be occupied at execution stage for the instruction, according to the type of the instruction; marking a target operand of the instruction as acquirable in a predefined cycle before the instruction enters write-back stage, according to the number of cycles, so that subsequent instructions taking the target operand as their source operands perform subsequent operations according to the case that the target operand is acquirable.08-28-2008
20080209175Computer system and method of adapting a computer system to support a register window architecture - A target computing system 08-28-2008
20110271083MICROPROCESSOR ARCHITECTURE AND METHOD OF INSTRUCTION DECODING - A microprocessor architecture comprises an instruction decoding network for decoding in a first mode partially suppressed opcodes of a sequence of instructions, the opcodes comprising a first part containing parameters being invariant for each opcode of the sequence and a second part comprising a flag indicating an end of the sequence, the first part being suppressed for all opcodes of the sequence except a first opcode of the sequence. Further, a method of instruction decoding in a microprocessor architecture comprising an instruction decoding network for decoding in a first mode partially suppressed opcodes of a sequence of instructions, and in a second mode uncompressed instructions comprises decoding an opcode of an instruction in the second mode when the instruction is not compressible; and decoding an opcode of an instruction in the first mode when the instruction is compressible.11-03-2011
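A compact C sketch of decoding such a partially suppressed sequence; the bit layout (8-bit invariant part, 7-bit variable part, one end-of-sequence flag) is an assumption made for the example:

```c
#include <stdint.h>
#include <stdio.h>

#define END_FLAG 0x80u   /* bit 7 of the variable part marks end of sequence */

static void decode_sequence(const uint16_t *stream) {
    uint8_t invariant = (uint8_t)(stream[0] >> 8);   /* present only in the first opcode */
    for (unsigned i = 0; ; i++) {
        uint8_t variable = (uint8_t)(stream[i] & 0xff);
        printf("decoded opcode: invariant=0x%02x variable=0x%02x\n",
               (unsigned)invariant, (unsigned)(variable & 0x7f));
        if (variable & END_FLAG)
            break;                                   /* flag set: sequence done */
    }
}

int main(void) {
    /* First word carries the invariant part; later words suppress it. */
    uint16_t compressed[] = { 0x3a01, 0x0002, 0x0083 };
    decode_sequence(compressed);
    return 0;
}
```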
20110271082PERFORMING ACTIONS ON FRAME ENTRIES IN RESPONSE TO RECEIVING BULK INSTRUCTION - Various example embodiments are disclosed. According to an example embodiment, a switch may comprise an instruction decode stage and a lookup stage. The instruction decode stage may be configured to receive a bulk instruction identifying an action to perform on frame entries of the lookup stage, and in response to receiving the bulk instruction, send, to the lookup stage, at least first and second frame entry instructions, each of the first and second frame entry instructions identifying the action and identifying a unique frame entry in the lookup stage upon which to perform the action. The lookup stage may be configured to receive the first and second frame entry instructions from the instruction decode stage, and in response to receiving each of the first and second frame entry instructions, perform the identified action on the frame entry identified by the respective frame entry instruction.11-03-2011
20130219151PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data.08-22-2013
20080282065Image forming apparatus and control method for the same - An image forming apparatus includes a job-history managing unit that creates and manages a job history, registers a job history, as a job history instance, containing an operation code that instructs a series of job operations to be executed and also containing an operating condition of the job operation, and that registers a macro instance which can be interpreted by other image forming apparatuses and is encoded to an executable shared code. The image forming apparatus also includes an interpreter unit that reads the job history instance, encodes the operation code to the shared code, and registers the shared code as the macro instance.11-13-2008
20080270760DEBUG SUPPORT METHOD AND APPARATUS - According to the present invention, there is provided a debug support apparatus having a decoder configured to receive an instruction output from a compiler which receives a source code, decode the instruction, and output a decoding result; and a display unit configured to receive debug information output from the compiler and the decoding result output from the decoder and display at least a correspondence between each decoded instruction and a position in the source code.10-30-2008
20080270759Computer Having Dynamically-Changeable Instruction Set in Real Time - A computer allows dynamic change of an instruction set during a real-time execution. The computer includes a CPU (Central Processing Unit) having an instruction fetch unit for fetching an instruction from a memory, an instruction decoding unit for generating a predetermined control code corresponding to the instruction fetched by the instruction fetch unit, and an arithmetic logic unit operated by the control code. The instruction decoding unit includes a basic instruction decoding unit for generating a control code for a basic instruction set; and a dynamic instruction decoding unit for generating another control code different from the control code corresponding to an instruction of the basic instruction set, or generating a control code corresponding to an instruction not existing in the basic instruction set. An instruction stored in the dynamic instruction decoding unit or a corresponding control code is configured to be changeable during execution in real time.10-30-2008
20110125987Dedicated Arithmetic Decoding Instruction - A dedicated arithmetic decoding instruction is disclosed. In a particular embodiment, an apparatus includes a memory and a processor coupled to the memory. The processor is configured to execute general purpose instructions and to execute a dedicated arithmetic decoding instruction retrieved from the memory.05-26-2011
20090031113Processor Array, Processor Element Complex, Microinstruction Control Appraratus, and Microinstruction Control Method - A processor array including area-saving microprogram memories is provided. In the processor array, microprogram memories of a plurality of adjacent processor arrays are shared. Effective data and position information 01-29-2009
20090063822MICROPROCESSOR - A microprocessor includes: a processor core that performs pipeline processing; an instruction analyzing section that analyzes an instruction to be processed by the processor core and outputs analysis information indicating whether the instruction matches with a specific instruction; and a memory that temporarily stores the instruction with the analysis information, wherein the processor core includes: an instruction fetch unit that fetches the instruction stored in the memory; an instruction decode unit that decodes the instruction fetched by the instruction fetch unit; an instruction execute unit that executes the instruction decoded by the instruction decode unit; and a specific instruction execute controller that reads out the analysis information stored in the memory and controls operation of at least one of the instruction fetch unit and the instruction decode unit when the analysis information indicates that the instruction matches with the specific instruction.03-05-2009
20090063821PROCESSOR APPARATUS INCLUDING OPERATION CONTROLLER PROVIDED BETWEEN DECODE STAGE AND EXECUTE STAGE - A processor apparatus includes a sequence controller that decodes the instruction code stored in an instruction memory, an operation array that executes operation of the decoded instruction code, and an asynchronous FIFO. The asynchronous FIFO is provided between a decode stage for decoding the instruction code into at least one instruction by the sequence controller and an execute stage for executing the decoded instruction by the operation array. The asynchronous FIFO executes control, so that the read timing and the execute timing of the decoded instruction are different from each other, and the decoded instruction is continuously executed by the operation array.03-05-2009
20090198966Multi-Addressable Register File - A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file according to the illustrative embodiments, are addressable with different instruction forms, e.g., scalar instructions, SIMD instructions, etc., while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The operation set that may be performed on the entire set of registers using the VSX instruction form is substantially similar to that of the operation sets of the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file while new instructions, i.e. the VSX instructions, may access the entire range of registers within the multi-addressable register file.08-06-2009
20110145549PIPELINED DECODING APPARATUS AND METHOD BASED ON PARALLEL PROCESSING - An apparatus and method for decoding moving images based on parallel processing are provided. The apparatus for decoding images based on parallel processing can improve operational performance by pipelining massive-data transmission between processors while performing context-adaptive variable length decoding (CAVLD), inverse quantization (IQ), inverse transformation (IT), motion compensation (MC), intra prediction (IP) and deblocking filter (DF) operations in parallel in units of pluralities of macroblocks (MBs).06-16-2011
20120198210Microprocessor Having Novel Operations - A processor. The processor includes a first register for storing a first packed data, a decoder, and a functional unit. The decoder has a control signal input. The control signal input is for receiving a first control signal and a second control signal. The first control signal is for indicating a pack operation. The second control signal is for indicating an unpack operation. The functional unit is coupled to the decoder and the register. The functional unit is for performing the pack operation and the unpack operation using the first packed data. The processor also supports a move operation.08-02-2012
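A short C sketch of pack and unpack in the sense described above; the element widths, saturation behavior, and interleave order are illustrative assumptions:

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t saturate_u8(int16_t v) {
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Pack: two 4x16-bit sources -> one 8x8-bit result (with saturation). */
static void pack(const int16_t a[4], const int16_t b[4], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i]     = saturate_u8(a[i]);
        out[i + 4] = saturate_u8(b[i]);
    }
}

/* Unpack (low): interleave the low halves of two 8x8-bit sources. */
static void unpack_low(const uint8_t a[8], const uint8_t b[8], uint8_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[2 * i]     = a[i];
        out[2 * i + 1] = b[i];
    }
}

int main(void) {
    int16_t a[4] = { 1, 300, -5, 42 }, b[4] = { 9, 8, 7, 6 };
    uint8_t packed[8], inter[8];
    pack(a, b, packed);
    unpack_low(packed, packed, inter);
    for (int i = 0; i < 8; i++) printf("%u ", packed[i]);
    printf("\n");
    for (int i = 0; i < 8; i++) printf("%u ", inter[i]);
    printf("\n");
    return 0;
}
```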
20120079246APPARATUS, METHOD, AND SYSTEM FOR PROVIDING A DECISION MECHANISM FOR CONDITIONAL COMMITS IN AN ATOMIC REGION - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.03-29-2012
20120079245DYNAMIC OPTIMIZATION FOR CONDITIONAL COMMIT - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions.03-29-2012
20120079244METHOD AND APPARATUS FOR UNIVERSAL LOGICAL OPERATIONS - An apparatus and method are described for performing arbitrary logical operations specified by a table. For example, one embodiment of a method for performing a logical operation on a computer processor comprises: reading data from each of two or more source operands; combining the data read from the source operands to generate an index value, the index value identifying a subset of bits within an immediate value transmitted with an instruction; reading the bits from the immediate value; and storing the bits read from the immediate value within a destination register to generate a result of the instruction.03-29-2012
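The index-into-immediate mechanism above amounts to using the immediate as a truth table, in the general spirit of VPTERNLOG-style instructions. A worked C sketch follows; the operand count, widths, and the majority-function immediate are assumptions for the example:

```c
#include <stdint.h>
#include <stdio.h>

/* For each bit position, the source bits form an index that selects one bit
   of the 8-bit immediate, so imm acts as an arbitrary 3-input truth table. */
static uint32_t universal_logic(uint32_t a, uint32_t b, uint32_t c, uint8_t imm) {
    uint32_t result = 0;
    for (int i = 0; i < 32; i++) {
        unsigned index = (((a >> i) & 1u) << 2) |     /* combine source bits */
                         (((b >> i) & 1u) << 1) |
                         (( c >> i) & 1u);
        result |= (uint32_t)((imm >> index) & 1u) << i;  /* bit read from imm */
    }
    return result;
}

int main(void) {
    uint32_t a = 0xf0f0f0f0, b = 0xcccccccc, c = 0xaaaaaaaa;
    /* imm = 0xE8 encodes the majority function maj(a, b, c). */
    printf("0x%08x\n", universal_logic(a, b, c, 0xE8));
    printf("0x%08x\n", (a & b) | (a & c) | (b & c));   /* same value */
    return 0;
}
```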
20110231633Operand size control - A data processing system 09-22-2011
20090138680VECTOR ATOMIC MEMORY OPERATIONS - A processor is operable to execute one or more vector atomic memory operations. A further embodiment provides support for atomic memory operations in a memory manager, which is operable to process atomic memory operations and to return a completion notification or a result.05-28-2009
20090254735MERGE MICROINSTRUCTION FOR MINIMIZING SOURCE DEPENDENCIES IN OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH VARIABLE DATA SIZE MACROARCHITECTURE - A microprocessor processes a macroinstruction that instructs the microprocessor to write an 8-bit result into only a lower 8 bits of an N-bit architected general purpose register. An instruction translator translates the macroinstruction into a merge microinstruction that specifies an N-bit first source register, an 8-bit second source register, and an N-bit destination register to receive an N-bit result. The N-bit first source register and the N-bit destination register are the N-bit architected general purpose register. An execution unit receives the merge microinstruction and responsively generates the N-bit result to be subsequently written to the N-bit architected general purpose register even though the macroinstruction only instructs the microprocessor to write the 8-bit result into the lower 8 bits of the N-bit architected general purpose register. Specifically, the execution unit directs the 8-bit result into the lower 8 bits of the N-bit result and directs the upper N-8 bits of the N-bit first source register into corresponding upper N-8 bits of the N-bit result.10-08-2009
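A one-function C sketch of the merge described above, with N = 32 chosen purely for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* merge(old value, 8-bit result) -> full-width value the execution unit writes
   back: the low byte comes from the 8-bit result, the upper N-8 bits from the
   old register value, so no partial register write is needed. */
static uint32_t merge_low8(uint32_t old_value, uint8_t result8) {
    return (old_value & 0xffffff00u) | result8;
}

int main(void) {
    uint32_t eax = 0x12345678;          /* architected register before          */
    uint8_t  al  = 0xAB;                /* 8-bit result of the macroinstruction */
    eax = merge_low8(eax, al);          /* one full-width write                 */
    printf("0x%08x\n", eax);            /* 0x123456ab */
    return 0;
}
```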
20100191938INFORMATION PROCESSING DEVICE, ARITHMETIC PROCESSING METHOD, ELECTRONIC APPARATUS AND PROJECTOR - An information processing device including: a first arithmetic processing unit performing first arithmetic processing; a second arithmetic processing unit performing second arithmetic processing; input registers adapted to include a first input register allocated to the first arithmetic processing unit, and a second input register allocated to the second arithmetic processing unit; and output registers storing a processing result of the first arithmetic processing unit and a processing result of the second arithmetic processing unit. In each of the given execution cycles, the first arithmetic processing unit performs the first arithmetic processing using data stored in the first input register and stores its result in the output registers, and the second arithmetic processing unit performs the second arithmetic processing using data stored in the second input register and stores its result in the output registers.07-29-2010
20100185835Data Processing Circuit With A Plurality Of Instruction Modes, Method Of Operating Such A Data Circuit And Scheduling Method For Such A Data Circuit - A data processing circuit has an execution circuit (07-22-2010
20100191937Implied Storage Operation Decode Using Redundant Target Address Detection - A logic arrangement and method to support implied storage operation decode uses redundant target address detection, whereby target addresses of previous instructions are compared with the target address of the current instruction and, if they are equal and the target addresses of the previous instructions are not used as sources, the current instruction is decoded as a store instruction. This allows a redundant operation in an instruction set architecture to be redefined as a store instruction, freeing up opcodes normally used for store instructions to be used for other instructions.07-29-2010
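The decode rule can be stated compactly in software. The sketch below is a hedged illustration only: the history window, two-source instruction format, and register numbering are assumptions made for readability, not the patent's design.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative decode rule: if an earlier instruction wrote the same target
 * register and that target was never consumed as a source afterwards, the
 * current instruction is treated as an implied store. */
typedef struct {
    uint8_t target;       /* destination register number */
    uint8_t src[2];       /* source register numbers     */
    bool    has_target;
} decoded_insn_t;

static bool decode_as_store(const decoded_insn_t *history, int n,
                            const decoded_insn_t *cur)
{
    for (int i = 0; i < n; i++) {
        if (!history[i].has_target || history[i].target != cur->target)
            continue;
        bool used_as_source = false;
        for (int j = i + 1; j < n; j++)
            if (history[j].src[0] == history[i].target ||
                history[j].src[1] == history[i].target)
                used_as_source = true;
        if (!used_as_source)
            return true;   /* redundant target -> decode as store */
    }
    return false;
}
```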
20100217958Address calculation and select-and-insert instructions within data processing systems - A data processing system 08-26-2010
20100250900DEPENDENCY TRACKING FOR ENABLING SUCCESSIVE PROCESSOR INSTRUCTIONS TO ISSUE - An information handling system includes a processor with an issue unit (IU) that may perform instruction dependency tracking for successive instruction issue operations. The IU maintains non-shifting issue queue (NSIQ) and shifting issue queue (SIQ) instructions along with relative instruction-to-instruction dependency information. A mapper maps queue position data for instructions that dispatch to issue queue locations within the IU. The IU may test an issuing producer instruction against consumer instructions in the IU for queue position (QPOS) and register tag (RTAG) matches. A matching consumer instruction may issue in a successive manner in the case of a queue position match or in a next processor cycle in the case of a register tag match.09-30-2010
20120036337Processor on an Electronic Microchip Comprising a Hardware Real-Time Monitor - A processor on an electronic microchip is capable of executing mathematical processes, each of said processes being associated with a priority, and includes hardware means for the management of the processes; this management comprises the activation and suspension of the processes and the management of their execution according to their priorities.02-09-2012
20120144166CONTROL SIGNAL MEMOIZATION IN A MULTIPLE INSTRUCTION ISSUE MICROPROCESSOR - A dynamic predictive and/or exact caching mechanism is provided in various stages of a microprocessor pipeline so that various control signals can be stored and memoized in the course of program execution. Exact control signal vector caching may be done. Whenever an issue group is formed following instruction decode, register renaming, and dependency checking, an encoded copy of the issue group information can be cached under the tag of the leading instruction. The resulting dependency cache or control vector cache can be accessed right at the beginning of the instruction issue logic stage of the microprocessor pipeline the next time the corresponding group of instructions comes up for re-execution. Since the encoded issue group bit pattern may be accessed in a single cycle out of the cache, the resulting microprocessor pipeline with this embodiment can be seen as two parallel pipes, where the shorter pipe is followed if there is a dependency cache or control vector cache hit.06-07-2012
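A toy model of the control-vector cache may clarify the idea: the encoded issue-group information is tagged by the leading instruction's address and looked up at the start of the issue stage on re-execution. The direct-mapped organization, table size, and 32-bit control-vector encoding below are placeholder assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CVC_ENTRIES 256

/* One entry of a toy direct-mapped control-vector cache. */
typedef struct {
    bool     valid;
    uint64_t tag;            /* address of the issue group's leading instruction */
    uint32_t control_vector; /* encoded issue-group information                   */
} cvc_entry_t;

static cvc_entry_t cvc[CVC_ENTRIES];

/* Hit: return the memoized control vector and skip the long decode path.
 * Miss: fall back to full decode / rename / dependency check and refill. */
static bool cvc_lookup(uint64_t lead_pc, uint32_t *out)
{
    cvc_entry_t *e = &cvc[(lead_pc >> 2) % CVC_ENTRIES];
    if (e->valid && e->tag == lead_pc) {
        *out = e->control_vector;
        return true;
    }
    return false;
}

static void cvc_fill(uint64_t lead_pc, uint32_t control_vector)
{
    cvc_entry_t *e = &cvc[(lead_pc >> 2) % CVC_ENTRIES];
    e->valid = true;
    e->tag = lead_pc;
    e->control_vector = control_vector;
}
```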
20100064119DATA PROCESSOR - The present invention is directed to realizing efficient superscalar instruction issue in an instruction set that includes prefixed instructions. A circuit is employed which retrieves an instruction of each instruction code type other than a prefix on the basis of the determination result of decoders that determine the instruction code type, attaches the immediately preceding instruction to the retrieved instruction, and outputs the result to the instruction executing means. When an instruction of a target instruction code type is detected within the plurality of instruction units being searched, the circuit outputs the detected instruction code, together with the immediately preceding instruction of a type other than the target type, as prefix code candidates. When an instruction of the target instruction code type cannot be detected at the rear end of the instruction units being searched, the circuit outputs the instruction at the rear end as a prefix code candidate. When an instruction of the target instruction code type is detected at the head of the instruction code search, the circuit outputs the instruction code at the head.03-11-2010
20100199072Register file - A register file is provided comprising a plurality of register entries for storing data values for use in the execution of data processing instructions. It comprises at least one write port and at least one read port, and circuitry responsive to a write request received at the at least one write port to update one of the plurality of register entries, identified by an address specified by the write request, with a data value specified by the write request. The register file also comprises further circuitry responsive to a received control signal to set at least a portion of a predetermined register entry to a predetermined value. In this way, certain register file updating instructions can be executed in parallel with other instructions without the additional full write ports that typical dual-issue would require; because the proposed further circuitry has a lower gate count than an additional write port, area, routing complexity, and cost are reduced.08-05-2010
20090327659TRAP-BASED MECHANISM FOR TRACKING MEMORY ACCESSES - In general, the invention relates to a method. The method includes receiving notification of a trap, the notification including context information. The method further includes: accessing, based at least partially upon the context information, a particular instruction that caused the trap; determining, based at least partially upon the context information, a particular address that is to be accessed by the particular instruction; updating a set of log information to indicate accessing of the particular address; causing subsequent accesses of the particular address to not give rise to a trap; after doing so, accessing the particular address; after accessing the particular address, causing subsequent accesses of the particular address to again give rise to a trap; and causing the particular instruction to not be executed.12-31-2009
20090172357USING A PROCESSOR IDENTIFICATION INSTRUCTION TO PROVIDE MULTI-LEVEL PROCESSOR TOPOLOGY INFORMATION - Embodiments of an invention for using a processor identification instruction to provide multi-level processor topology information are disclosed. In one embodiment, a processor includes decode logic and control logic. The decode logic is to receive an identification instruction having an associated topological level value. The control logic is to provide, in response to the decode logic receiving the identification instruction, processor identification information corresponding to the associated topological level value.07-02-2009
20100306504Controlling issue and execution of instructions having multiple outcomes - At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry.12-02-2010
20130145125Bitstream Buffer Manipulation With A SIMD Merge Instruction - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.06-06-2013
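The shift-merge step above amounts to concatenating the leftover bits of one block with the leading bits of the next so that a straddling symbol can be extracted in one pass. A scalar C sketch of that data movement follows (MSB-first bitstream and leftover_bits < 64 are assumptions of the sketch); a real implementation would use a SIMD align/merge instruction.

```c
#include <stdint.h>

/* Merge the unprocessed (least-significant) bits of the previous 64-bit
 * block with the leading bits of the next block.  The leftover bits become
 * the most-significant bits of the merged window, followed by the first
 * (64 - leftover_bits) bits of next_block.  Assumes leftover_bits < 64. */
static uint64_t shift_merge(uint64_t prev_block, unsigned leftover_bits,
                            uint64_t next_block)
{
    if (leftover_bits == 0)
        return next_block;
    uint64_t kept = prev_block << (64 - leftover_bits); /* left-justify leftovers */
    return kept | (next_block >> leftover_bits);
}
```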
20100325394System and Method for Balancing Instruction Loads Between Multiple Execution Units Using Assignment History - A system and method for balancing instruction loads between multiple execution units are disclosed. One or more execution units may be represented by a slot configured to accept instructions on behalf of the execution unit(s). A decode unit may assign instructions to a particular slot for subsequent scheduling for execution. Slot assignments may be made based on an instruction's type and/or on a history of previous slot assignments. A cumulative slot assignment history may be maintained in a bias counter, the value of which reflects the bias of previous slot assignments. Slot assignments may be determined based on the value of the bias counter, in order to balance the instruction load across all slots, and all execution units. The bias counter may reflect slot assignments made only within a desired historical window. A separate data structure may store data reflecting the actual slot assignments made during the desired historical window.12-23-2010
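A minimal sketch of the bias-counter policy, assuming two slots and a fixed assignment-history window (both arbitrary choices for illustration): the signed counter accumulates recent assignments, and type-neutral instructions are steered to the slot that has received fewer instructions within the window.

```c
#include <stdint.h>

#define HISTORY_WINDOW 32

static int bias;                    /* > 0: slot 1 favored recently      */
static int history[HISTORY_WINDOW]; /* +1 (slot 1) or -1 (slot 0) per assignment */
static int head;

/* Pick the less-used slot, then update the bias over the bounded window. */
static int assign_slot(void)
{
    int slot = (bias > 0) ? 0 : 1;
    int delta = slot ? +1 : -1;

    bias -= history[head];          /* retire the oldest assignment      */
    history[head] = delta;
    bias += delta;
    head = (head + 1) % HISTORY_WINDOW;
    return slot;
}
```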
20100332801Adaptively Handling Remote Atomic Execution - In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core; if contention is predicted, data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.12-30-2010
20110029759METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set, and a zero is placed into the associated resultant data element position if its flush to zero field is set.02-03-2011
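In byte-granularity terms the described shuffle behaves much like a PSHUFB-style operation: each control byte either selects a source byte or, when its flush-to-zero bit is set, forces the corresponding result byte to zero. The sketch below assumes L = 16 byte elements and uses the control byte's MSB as the flush-to-zero field; both are illustrative choices.

```c
#include <stdint.h>

/* For each result position, copy the source byte selected by the low bits
 * of the control byte, or write zero when the flush-to-zero bit is set. */
static void shuffle_bytes16(const uint8_t src[16], const uint8_t ctl[16],
                            uint8_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0F];
}
```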
20110035572Computing device, information processing apparatus, and method of controlling computing device - Multiple data processing instructions instruct a computing device to process multiple data including first data and second data. When a multiple data processing instruction is decoded, two allocatable registers are selected. One is used to store the result of a processing operation performed on first data by one processing unit, and the other is used to store the result of a processing operation performed on second data by another processing unit. Those stored processing results are then transferred to result registers. Normal data processing instructions, on the other hand, instruct a processing operation on third data. When a normal data processing instruction is decoded, one allocatable register is selected and used to store the result of processing that a processing unit performs on the third data. The stored processing result is then transferred to a result register.02-10-2011
20110040953MICROPROCESSOR WITH MICROTRANSLATOR AND TAIL MICROCODE INSTRUCTION FOR FAST EXECUTION OF COMPLEX MACROINSTRUCTIONS HAVING BOTH MEMORY AND REGISTER FORMS - A microprocessor includes a first instruction translator that translates an instruction of an instruction set architecture of a microprocessor. The instruction may specify a first form that writes its result to a destination register or a second form that writes its result to memory. The first instruction translator generates, in response to encountering an instance of the instruction, an indication of whether the instance is of the first form or the second form. A microcode memory stores a tail instruction as part of a microcode routine invoked in response to encountering the instance of the instruction. A second instruction translator receives the tail instruction from the microcode memory and the indication and responsively generates a first micro-operation that writes the result to the destination register if the indication specifies the first form or a second micro-operation that completes a write of the result to memory if the indication specifies the second form.02-17-2011
20120173851MECHANISM FOR MAINTAINING DYNAMIC REGISTER-LEVEL MEMORY-MODE FLAGS IN A VIRTUAL MACHINE SYSTEM - A method for maintaining dynamic register-level memory-mode flags in a virtual machine includes parsing a machine instruction of a live memory analysis command in a virtual machine (VM). The machine instruction can include an instruction opcode, a source address referring to a first type of memory and a destination address referring to a second type of memory. A register bitmap can be stored as a register-level memory-mode flag array. Thereafter, it can be determined whether or not the instruction opcode maps to an inheritance class. Finally, in response to a bit in the register-level memory mode flag array referencing virtual memory and the instruction opcode being mapped to an inheritance class, the register bitmap can be replaced with new bit values that represent redefined memory types for each register represented in the register bitmap. Subsequently, the new register bitmap can be used in simulation of a next machine instruction of a live memory analysis command executing in the virtual machine.07-05-2012
20110078415Efficient Predicated Execution For Parallel Processors - The invention set forth herein describes a mechanism for predicated execution of instructions within a parallel processor executing multiple threads or data lanes. Each thread or data lane executing within the parallel processor is associated with a predicate register that stores a set of 1-bit predicates. Each of these predicates can be set using different types of predicate-setting instructions, where each predicate setting instruction specifies one or more source operands, at least one operation to be performed on the source operands, and one or more destination predicates for storing the result of the operation. An instruction can be guarded by a predicate that may influence whether the instruction is executed for a particular thread or data lane or how the instruction is executed for a particular thread or data lane.03-31-2011
20100180104APPARATUS AND METHOD FOR PATCHING MICROCODE IN A MICROPROCESSOR USING PRIVATE RAM OF THE MICROPROCESSOR - A microprocessor has a microcode memory for storing original microcode instructions to implement user program instructions, and an interface to an external memory for storing a microcode patch. The microcode patch includes substitute microcode instructions and validation information. The microprocessor includes a private random access memory (PRAM), addressable by the original and substitute microcode instructions but not addressable by user program instructions. The microprocessor also includes patch hardware, which conditionally receives the substitute microcode instructions. The microprocessor executes the substitute microcode instructions when applied to the patch hardware instead of corresponding original microcode instructions. The microprocessor is configured to load the microcode patch from external memory into PRAM, determine whether the microcode patch is valid, apply substitute microcode instructions from PRAM to the patch hardware if the microcode patch is valid, and refrain from applying the substitute microcode instructions to the patch hardware, if the microcode patch is invalid.07-15-2010
20080256336MICROPROCESSOR WITH PRIVATE MICROCODE RAM - A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture. Examples of PRAM uses include: computational temporary storage area; storage of x86 VMX VMCS in response to VMREAD and VMWRITE macroinstructions; instantiation of non-user-accessible storage, such as the x86 SMBASE register; and instantiation of x86 MSRs that tolerate the additional access latency of the PRAM, such as the IA32_SYSENTER_CS MSR.10-16-2008
20080256337Method of Decoding A Bit Sequence, Network Element Apparatus And PDU Specification Tool Kit - In the field of data communications, it is desirable to track bits of a bit sequence remaining to be decoded by a decoder. A method of decoding the bit sequence that corresponds to a PDU comprises reading-in a bit sequence and processing the bit sequence. In order to maintain a record of the bits remaining to be processed, a data stack is used during decoding of the bit sequence.10-16-2008
20120303936DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION - In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to corresponding execution units of different types, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when the one of the duplicate dependent instruction indicators is reset.11-29-2012
20120303935MICROPROCESSOR SYSTEMS AND METHODS FOR HANDLING INSTRUCTIONS WITH MULTIPLE DEPENDENCIES - A processor includes an instruction unit which provides instructions for execution by the processor, a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions, and a plurality of execution queues coupled to the decode/issue unit. Each issued instruction from the decode/issue unit is stored into an entry of at least one queue of the plurality of execution queues, wherein each entry of the plurality of execution queues is configured to store an issued instruction and a duplicate indicator corresponding to the issued instruction which indicates whether or not a duplicate instruction of the issued instruction is also stored in an entry of another queue of the plurality of execution queues.11-29-2012
20130159673PROVIDING CAPACITY GUARANTEES FOR HARDWARE TRANSACTIONAL MEMORY SYSTEMS USING FENCES - A method is provided that includes determining a number of outstanding out-of-order instructions in an instruction stream. The method includes determining a number of available hardware resources for executing out-of-order instructions and inserting fencing instructions into the instruction stream if the number of outstanding out-of-order instructions exceeds the determined number of available hardware resources. A second method is provided for compiling source code that includes determining a speculative region. The second method includes generating machine-level instructions and inserting fencing instructions into the machine-level instructions in response to determining the speculative region. A processing device is provided that includes cache memory and a processing unit to execute processing device instructions in an instruction stream. The processing device includes an out-of-order speculation supervisor unit to determine hardware resource availability and generate an indication to insert fencing instructions in response to the availability. Computer readable storage media are also provided.06-20-2013
20130159674INSTRUCTION PREDICATION USING INSTRUCTION FILTERING - A method and circuit arrangement for selectively predicating instructions in an instruction stream based upon a predication filter criteria defined by a predication filter, which describes types or patterns of instructions that should be predicated. Predication logic compares a respective instruction of an instruction stream to predication filter criteria to determine whether the respective instruction matches the predication filter criteria, and the respective instruction is selectively predicated based on whether the respective instruction matches the predication filter criteria.06-20-2013
20120204007Controlling the execution of adjacent instructions that are dependent upon a same data condition - A data processing apparatus is disclosed, having an instruction decoder configured to decode a stream of instructions and a data processor configured to process the decoded stream of instructions. In response to a plurality of adjacent instructions within the stream whose execution is dependent upon a data condition being met, and whose execution does not change a state of the processing apparatus when that condition is not met, the processor is configured to: commence determining whether the data condition is met; commence processing the plurality of adjacent instructions; and, in response to determining that the data condition is not met, skip to the next instruction to be executed after the plurality of adjacent instructions, without executing any of the plurality not yet executed, and continue execution at that next instruction.08-09-2012
20120204006Embedded opcode within an intermediate value passed between instructions - A data processing system 08-09-2012
20110153990SYSTEM, APPARATUS, AND METHOD FOR SUPPORTING CONDITION CODES - An apparatus is described having decode circuitry to decode a first instruction, wherein the first instruction indicates that a plurality of condition code bits is to be copied from a first register to a second register. The apparatus also has first execution circuitry to copy a plurality of condition code bits from a first register to a second register.06-23-2011
20110153989SYNCHRONIZING SIMD VECTORS - A vector compare-and-exchange operation is performed by: decoding by a decoder in a processing device, a single instruction specifying a vector compare-and-exchange operation for a plurality of data elements between a first storage location, a second storage location, and a third storage location; issuing the single instruction for execution by an execution unit in the processing device; and responsive to the execution of the single instruction, comparing data elements from the first storage location to corresponding data elements in the second storage location; and responsive to determining a match exists, replacing the data elements from the first storage location with corresponding data elements from the third storage location.06-23-2011
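Element-wise, the single instruction described above performs a compare-and-conditional-replace across a vector. The following C loop models only that data-level behaviour; atomicity, masking, and the actual nature of the three storage locations are outside this sketch, and the 32-bit element width is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Wherever an element of 'first' matches the corresponding element of
 * 'second', replace it with the corresponding element of 'third'. */
static void vec_cmpxchg32(uint32_t *first, const uint32_t *second,
                          const uint32_t *third, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (first[i] == second[i])
            first[i] = third[i];
}
```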
20080229074Design Structure for Localized Control Caching Resulting in Power Efficient Control Logic - A design structure for an integrated circuit (IC) including a decoder decoding instructions, shadow latches storing instructions as a localized loop, and a state machine controlling the decoder and the plurality of shadow latches. When the state machine identifies instructions that are the same as those stored in the localized loop, it deactivates the decoder and activates the plurality of shadow latches to retrieve and execute the localized loop in place of the instructions provided by the decoder. Additionally, a method of providing localized control caching operations in an IC to reduce power dissipation is provided. The method includes initializing a state machine to control the IC, providing a plurality of shadow latches, decoding a set of instructions, detecting a loop of decoded instructions, caching the loop of decoded instructions in the shadow latches as a localized loop, detecting a loop end signal for the loop and stopping the caching of the localized loop.09-18-2008
20080229073Address calculation and select-and insert instructions within data processing systems - A data processing system 09-18-2008
20120204008Processor with a Hybrid Instruction Queue with Instruction Elaboration Between Sections - Methods and apparatus for processing instructions by elaboration of instructions prior to issuing the instructions for execution are described. An instruction is received at a hybrid instruction queue comprised of a first queue and a second queue. When the second queue has available space, the instruction is elaborated to expand one or more bit fields to reduce decoding complexity when the elaborated instruction is issued, wherein the elaborated instruction is stored in the second queue. When the second queue does not have available space, the instruction is stored in an unelaborated form in a first queue. The first queue is configured as an exemplary in-order queue and the second queue is configured as an exemplary out-of-order queue.08-09-2012
20110161633Systems and Methods for Monitoring Out of Order Data Decoding - Various embodiments of the present invention provide systems and methods for monitoring out of order data decoding. For example, a method for monitoring out of order data processing is provided that includes receiving a plurality of data sets that are associated with a plurality of identifiers, each of the plurality of identifiers indicating a respective one of the plurality of data sets; storing each of the plurality of identifiers in a FIFO memory in the order in which the corresponding data sets of the plurality of data sets were received; processing the plurality of data sets such that at least one of the plurality of data sets is provided as an output data set; accessing the next available identifier from the FIFO memory; and asserting an out of order signal when the next available identifier is not the same as the identifier associated with the output data set.06-30-2011
20110161632COMPILER ASSISTED LOW POWER AND HIGH PERFORMANCE LOAD HANDLING - A method and apparatus for handling low power and high performance loads is herein described. Software, such as a compiler, is utilized to identify producer loads, consumer reuse loads, and consumer forwarded loads. Based on the identification by software, hardware is able to direct performance of the load directly to a load value buffer, a store buffer, or a data cache. As a result, accesses to cache are reduced, through direct loading from load and store buffers, without sacrificing load performance.06-30-2011
20100287358Branch Prediction Path Instruction - A method for branch prediction, the method comprising: receiving a branch wrong guess instruction having a branch wrong guess instruction address and data including an opcode and a branch target address; determining whether the branch wrong guess instruction was predicted by a branch prediction mechanism; sending the branch wrong guess instruction to an execution unit responsive to determining that the branch wrong guess instruction was predicted by the branch prediction mechanism; and receiving and decoding instructions at the branch target address.11-11-2010
20110040954DATA PROCESSING SYSTEM - The present invention provides a data processor or a data processing system which can be used in compatible modes in which the number of bits of an address specifying a logical address space varies, when referring to a branch address table through extension of the displacement of a branch instruction. At the time of generating a branch address for a first branch instruction, the data processor or the data processing system optimizes the multiple with which the displacement is multiplied in accordance with the number of bits of the address specifying the logical address space, adds the extended address information to the value of a register, and refers to a branch address table with the address information obtained by the addition. The referenced information is used as the branch address. To adapt to a compatible mode using a different number of address bits for the logical address space, it is sufficient to change the multiple with which the displacement is multiplied in accordance with the mode.02-17-2011
20110078414MULTIPORTED REGISTER FILE FOR MULTITHREADED PROCESSORS AND PROCESSORS EMPLOYING REGISTER WINDOWS - A processor includes an instruction fetch unit configured to issue instructions for execution, where the instructions are selected from a number of threads, where each given instruction has a corresponding thread identifier, and where at least some of the instructions specify operand(s) via register identifiers. A register file stores operands usable by the instructions, and may include several banks, each corresponding to a register identifier and including several entries corresponding to the several threads, wherein the entries are configured to store data values. In response to receiving a request to read a particular register identifier for a given thread identifier, the register file may be configured to decode the given thread identifier to retrieve entries from the banks that correspond to the given thread identifier. The register file may further select, from among the retrieved entries, a data value corresponding to the particular register identifier to be output.03-31-2011
20110307687IN-LANE VECTOR SHUFFLE INSTRUCTIONS - In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.12-15-2011
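A scalar model of the single-source form may help: the source and destination are split into 128-bit lanes of four 32-bit elements, and the same per-lane control bits pick elements from within each lane only, so no data crosses a lane boundary. The widths and the 2-bits-per-element control encoding below are illustrative assumptions, not the claimed instruction format.

```c
#include <stdint.h>

/* Apply the same 8-bit control to every 128-bit lane of four 32-bit
 * elements; each 2-bit field selects one element from within that lane. */
static void inlane_shuffle(const uint32_t *src, uint32_t *dst,
                           int lanes, uint8_t ctl)
{
    for (int lane = 0; lane < lanes; lane++) {
        const uint32_t *s = src + 4 * lane;
        uint32_t *d = dst + 4 * lane;
        for (int i = 0; i < 4; i++)
            d[i] = s[(ctl >> (2 * i)) & 3];   /* select within the lane only */
    }
}
```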
20090172358IN-LANE VECTOR SHUFFLE INSTRUCTIONS - In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.07-02-2009
20090172356COMPRESSED INSTRUCTION FORMAT - A technique for decoding an instruction in a variable-length instruction set. In one embodiment, an instruction encoding is described, in which legacy, present, and future instruction set extensions are supported, and increased functionality is provided, without expanding the code size and, in some cases, reducing the code size.07-02-2009
20120047351DATA PROCESSING SYSTEM HAVING SELECTIVE REDUNDANCY AND METHOD THEREFOR - A method includes: decoding an instruction a first time to obtain a first decoded instruction; decoding the instruction a second time to obtain a second decoded instruction; comparing at least a portion of the first decoded instruction to at least a portion of the second decoded instruction; and when the at least a portion of the first decoded instruction matches the at least a portion of the second decoded instruction, executing the instruction.02-23-2012
20120233443PROCESSOR TO EXECUTE SHIFT RIGHT MERGE INSTRUCTIONS - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.09-13-2012
20110093683Program flow control - A data processing apparatus includes a data engine 04-21-2011
20120317400SYSTEM AND APPARATUS FOR GROUP FLOATING-POINT INFLATE AND DEFLATE OPERATIONS - Systems and apparatuses are presented relating to a programmable processor comprising an execution unit that is operable to decode and execute instructions received from an instruction path and partition data stored in registers in the register file into multiple data elements, the execution unit capable of executing group data handling operations that re-arrange data elements in different ways in response to data handling instructions, the execution unit further capable of executing a plurality of different group floating-point and group integer arithmetic operations that each arithmetically operates on the multiple data elements stored in registers in the register file to produce a catenated result that is returned to a register in the register file, wherein the catenated result comprises a plurality of individual results.12-13-2012
20100095095Instruction processing apparatus - An instruction processing apparatus includes a thread execution processing section executing threads each including plural instructions; a register file including a register window having plural registers; a current window pointer indicating the position of the register at which the register window can be input and output; a current register reading and holding data held by the register window designated by the current window pointer, and a replacement buffer holding data transferred from the register file to the current register; a first data transfer path transferring data in the register file to the replacement buffer; a second data transfer path transferring data in the replacement buffer to one of the current registers; a calculation section executing a switching instruction of the register window; and a control section controlling, when the calculation section executes the switching instruction, the first data transfer path and the second data transfer path.04-15-2010
20100095094METHOD FOR PROCESSING DATA - A method and device for translating a program to a system including at least one first processor and a reconfigurable unit. Code portions of the program which are suitable for the reconfigurable unit are determined. The remaining code of the program is extracted and/or separated for processing by the first processor.04-15-2010
20100095093INFORMATION PROCESSING APPARATUS AND METHOD OF CONTROLLING REGISTER - An information processing apparatus, and a method of controlling the same, that employ a register window system and Simultaneous Multithreading, reduce circuit area by sharing between threads the data transfer bus connecting the master register and the work register provided for each thread, and avoid interference in instruction execution between threads caused by conflicting register accesses. In the register window system used for register reading, a master register and a work register are held for each thread, and the bus for transferring data from the master register to the work register is shared by the threads in order to realize Simultaneous Multithreading.04-15-2010
20100095092Instruction execution control device and instruction execution control method - An instruction execution control device operates a plurality of threads in a simultaneous multi-thread system. And the instruction execution control device has a thread selection circuit (04-15-2010
20100095091Processor, Method and Computer Program - To accelerate the processing speed of a processor while keeping the increase in the complexity of the processor's circuitry to a minimum, a processor is offered comprising: a decoder which sequentially acquires and decodes instructions from a program, including instructions of a first type and a second type, classified according to a property of the data upon which each instruction is to operate; a first operation unit which sequentially receives from the decoder, and executes, the instructions of the first type; an operand processing circuit which substitutes a constant for a variable value that is set into a register associated with the first operation unit and that is included within an operand of an instruction of the second type; a buffer which queues the instructions of the second type that have been decoded by the decoder and whose operands have been substituted by the operand processing circuit; and a second operation unit which sequentially receives from the buffer, and executes, the instructions of the second type. Methods and a computer program for implementing the methods are also disclosed.04-15-2010
20120124337Size mis-match hazard detection - An out-of-order processor 05-17-2012
20120221835MICROPROCESSOR SYSTEMS AND METHODS FOR LATENCY TOLERANCE EXECUTION - An instruction unit provides instructions for execution by a processor. A decode unit decodes instructions received from the instruction unit. Queues are coupled to receive instructions from the decode unit. Each instruction in the same queue is executed in order by a corresponding execution unit. An arbiter is coupled to each queue and to the execution unit that executes instructions of a first instruction type. The arbiter selects a next instruction of the first instruction type from a bottom entry of the queue for execution by the first execution unit.08-30-2012
20120131312Data processing apparatus and method - A data processing apparatus 05-24-2012
20110185159PROCESSOR INCLUDING AGE TRACKING OF ISSUE QUEUE INSTRUCTIONS - An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store IDS. The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update. The issue queue may selectively clock a row and a column of cells of the age matrix that correspond to a dispatched instruction's queue position while leaving other cells unclocked to conserve power.07-28-2011
20100174889GUEST-SPECIFIC MICROCODE - Embodiments of apparatuses, methods, and systems for modifying the behavior of a guest installed to run within a VM are disclosed. In one embodiment, an apparatus includes virtualization logic, first storage, second storage, decode logic, and multiplexing logic. The virtualization logic is to provide a mode in which to operate a virtual machine. The first storage is to store a first plurality of micro-instructions to control the apparatus. The second storage is to store a second plurality of micro-instructions to control the apparatus. The decode logic is to decode a macro-instruction into one of a first plurality and a second plurality of micro-instructions. The multiplexing logic is to cause the macro-instruction to be decoded into the second plurality of micro-instructions instead of the first plurality of micro-instructions only when issued from the virtual machine.07-08-2010
20100299502BAD BRANCH PREDICTION DETECTION, MARKING, AND ACCUMULATION FOR FASTER INSTRUCTION STREAM PROCESSING - An apparatus for extracting instructions from a stream of undifferentiated instruction bytes in a microprocessor having an instruction set architecture in which the instructions are variable length. Decode logic decodes the instruction bytes of the stream to generate for each a corresponding opcode byte indicator and end byte indicator and receives a corresponding taken indicator for each of the instruction bytes. The taken indicator is true if a branch predictor predicted the instruction byte is the opcode byte of a taken branch instruction. The decode logic generates a corresponding bad prediction indicator for each of the instruction bytes. The bad prediction indicator is true if the corresponding taken indicator is true and the corresponding opcode byte indicator is false. The decode logic sets to true the bad prediction indicator for each remaining byte of an instruction whose opcode byte has a true bad prediction indicator. Control logic extracts instructions from the stream and sends the extracted instructions for further processing by the microprocessor. The control logic foregoes sending an instruction having both a true end byte indicator and a true bad prediction indicator.11-25-2010
20100299501INSTRUCTION EXTRACTION THROUGH PREFIX ACCUMULATION - An apparatus has a queue in which each entry stores a different line of a stream of instruction bytes and accumulated prefix information associated with each instruction byte. Control logic: (a) detects a condition where an initial portion of an instruction partially within a first line stored in the bottom entry (BE) of the queue remains unextracted from the queue, wherein the instruction bytes of the initial portion are all prefix bytes; (b) in response to detecting the condition, saves the length of the initial portion, shifts the first line in the BE out of the queue, and shifts a second line into the BE; (c) extracts instruction bytes of the unextracted instruction from the second line in the BE and extracts accumulated prefix information from the second line of the BE in place of the already shifted-out initial prefix bytes; (d) calculates the length of the unextracted instruction using the saved length; and (e) extracts an instruction other than the unextracted instruction from the second line in the BE using the calculated length.11-25-2010
20100299500PREFIX ACCUMULATION FOR EFFICIENT PROCESSING OF INSTRUCTIONS WITH MULTIPLE PREFIX BYTES - In a microprocessor that has an instruction set architecture in which the instructions may include a variable number of prefix bytes, an apparatus for efficiently extracting instructions from a stream of undifferentiated instruction bytes. Decode logic determines which byte is an opcode byte for each instruction of a plurality of instructions within the stream of undifferentiated instruction bytes. The opcode byte is the first non-prefix byte of the instruction. The decode logic accumulates prefix information onto the opcode byte of the instruction for each instruction of the plurality of instructions. A queue holds the stream of undifferentiated instruction bytes and the accumulated prefix information. Extraction logic extracts the plurality of instructions from the queue in one clock cycle independent of the number of prefix bytes included in each of the plurality of instructions.11-25-2010
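The accumulation step can be pictured as a single pass over the byte line: each prefix byte ORs its flag into an accumulator, and the accumulator is attached to the first non-prefix (opcode) byte so later extraction never has to re-scan the prefixes. The toy model below assumes a tiny example prefix set and ignores ModRM/immediate bytes, so it sketches only the accumulation, not a full x86 length decoder.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool is_opcode; uint8_t prefix_flags; } byte_info_t;

/* Example prefix set only: operand/address-size overrides, LOCK, REP/REPNE. */
static bool is_prefix(uint8_t b, uint8_t *flag)
{
    switch (b) {
    case 0x66: *flag = 1u << 0; return true;            /* operand-size */
    case 0x67: *flag = 1u << 1; return true;            /* address-size */
    case 0xF0: *flag = 1u << 2; return true;            /* LOCK         */
    case 0xF2: case 0xF3: *flag = 1u << 3; return true; /* REP/REPNE    */
    default:   *flag = 0;       return false;
    }
}

/* Walk the line, accumulating prefix flags onto each opcode byte.
 * (This toy model treats every non-prefix byte as an opcode byte and
 * therefore ignores ModRM, displacement, and immediate bytes.) */
static void accumulate_prefixes(const uint8_t *bytes, int n, byte_info_t *info)
{
    uint8_t acc = 0;
    for (int i = 0; i < n; i++) {
        uint8_t flag;
        info[i].is_opcode = false;
        info[i].prefix_flags = 0;
        if (is_prefix(bytes[i], &flag)) {
            acc |= flag;               /* keep accumulating onto the opcode */
        } else {
            info[i].is_opcode = true;  /* first non-prefix byte = opcode    */
            info[i].prefix_flags = acc;
            acc = 0;                   /* next instruction starts fresh     */
        }
    }
}
```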
20100049948Serial flash semiconductor memory - A serial flash memory is provided with multiple configurable pins, at least one of which is selectively configurable for use in either single-bit serial data transfers or multiple-bit serial data transfers. In single-bit serial mode, data transfer is bit-by-bit through a pin. In multiple-bit serial mode, a number of sequential bits are transferred at a time through respective pins. The serial flash memory may have 16 or fewer pins, and even 8 or fewer pins, so that low pin count packaging such as the 8-pin or 16-pin SOIC package and the 8-contact MLP/QFN/SON package may be used. The availability of the single-bit serial type protocol enables compatibility with a number of existing systems, while the availability of the multiple-bit serial type protocol enables the serial flash memory to provide data transfer rates, in systems that can support them, that are significantly faster than available with standard serial flash memories.02-25-2010
20100011190DECODING MULTITHREADED INSTRUCTIONS - A microprocessor capable of decoding a plurality of instructions associated with a plurality of threads is disclosed. The microprocessor may comprise a first array comprising a first plurality of microcode operations associated with an instruction from within the plurality, the first array capable of delivering a first predetermined number of microcode operations from the first plurality of microcode operations. The microprocessor may further comprise a second array comprising a second plurality of microcode operations, the second array capable of providing one or more of the second plurality of microcode operations in the event that the instruction decodes into more than the first predetermined number of microcode operations. The microprocessor may further comprise an arbiter coupled between the first and second arrays, where the arbiter may determine which thread from the plurality of threads accesses the second array.01-14-2010
20120260069Processor Including Age Tracking of Issue Queue Instructions - An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store (IDS). The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update.10-11-2012
20120084536PRIMITIVES TO ENHANCE THREAD-LEVEL SPECULATION - A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed.04-05-2012
20120084535Opcode Space Minimizing Architecture Utilizing Instruction Address to Indicate Upper Address Bits - Due to the ever expanding number of registers and new instructions in modern microprocessor cores, the address widths present in the instruction encoding continue to widen, and fewer instruction opcodes are available, making it more difficult to add new instructions to existing architectures without resorting to inelegant tricks that have drawbacks such as source destructive operations. The disclosed invention utilizes specialized decode and address calculation hardware that concatenates a fixed number of least significant bits of the instruction address onto the upper address bits of each register address portion contained in the instruction, yielding the full register address, instead of providing the full register address widths for every register used in the instruction. This frees up valuable opcode space for other instructions and avoids compiler complexity. This aligns nicely with how most loops are unrolled in assembly language, where independent operations are near each other in memory.04-05-2012
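The address-widening trick reduces to a concatenation: the instruction encodes only the low bits of each register number, and the missing upper bits come from the least significant bits of the instruction's own word address. The field widths below are arbitrary illustrative choices, not the patent's encoding.

```c
#include <stdint.h>

#define REG_FIELD_BITS 4   /* register bits encoded in the instruction      */
#define PC_BITS        2   /* low bits of the word address supplying the rest */

/* Concatenate the low PC_BITS of the instruction's word address (byte
 * address / 4, assuming 4-byte instructions) above the encoded field to
 * form the full register number. */
static unsigned full_reg_number(uint64_t insn_addr, unsigned reg_field)
{
    unsigned upper = (unsigned)((insn_addr >> 2) & ((1u << PC_BITS) - 1));
    return (upper << REG_FIELD_BITS) |
           (reg_field & ((1u << REG_FIELD_BITS) - 1));
}
```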
20090019262PROCESSOR MICRO-ARCHITECTURE FOR COMPUTE, SAVE OR RESTORE MULTIPLE REGISTERS, DEVICES, SYSTEMS, METHODS AND PROCESSES OF MANUFACTURE - An electronic circuit (01-15-2009
20080301410Processor Configured for Operation with Multiple Operation Codes Per Instruction - A processor configured to operate with multiple operation codes for each of a plurality of instructions comprises memory circuitry and processing circuitry coupled to the memory circuitry. The processing circuitry is configured to decode a first operation code to produce a given one of the instructions and to decode a second operation code different than the first operation code to also produce the given instruction. Thus, the same instruction is produced for execution by the processing circuitry regardless of whether the first operation code or the second operation code is decoded. The assignment of multiple operation codes to a given instruction may occur in conjunction with the design of the processor, and dynamic selection of a particular one of those operation codes may be performed in conjunction with assembly of code for execution by the processor.12-04-2008
20120265966PROCESSOR WITH INCREASED EFFICIENCY VIA EARLY INSTRUCTION COMPLETION - Methods and apparatuses are provided for increased efficiency in a processor via early instruction completion. An apparatus is provided for increased efficiency in a processor via early instruction completion. The apparatus comprises an execution unit for processing instructions and determining whether a later issued instruction is ready for completion or an earlier issued instruction is ready for completion, and a retire unit for retiring the later issued instruction when the later issued instruction is ready for completion, or for retiring the earlier issued instruction when the later issued instruction is not ready for completion and the earlier issued instruction has a known good completion status. A method is provided for increased efficiency in a processor via early instruction completion. The method comprises completing an earlier issued instruction having a known good completion status ahead of a later issued instruction when the later issued instruction is not ready for completion.10-18-2012
20110225397Mapping between registers used by multiple instruction sets - A processor 09-15-2011
20120278592MICROPROCESSOR SYSTEMS AND METHODS FOR REGISTER FILE CHECKPOINTING - In a processor, a decode unit identifies instructions needing a checkpoint and enables selected checkpoints, and a register file unit includes a plurality of architectural registers; a first set of checkpoint registers corresponding to a first checkpoint, wherein each checkpoint register of the first set corresponds to a corresponding architectural register of the plurality of architectural registers; a first set of indicators corresponding to the first set of checkpoint registers which, for each checkpoint register in the first set of checkpoint registers, indicates whether the corresponding architectural register has been modified or is intended to be modified prior to enabling of the first checkpoint; and a second set of indicators corresponding to the first set of checkpoint registers which, for each checkpoint register in the first set of checkpoint registers, indicates whether the corresponding architectural register has been modified or is intended to be modified after enabling of the first checkpoint.11-01-2012
20120278591CROSSBAR SWITCH MODULE HAVING DATA MOVEMENT INSTRUCTION PROCESSOR MODULE AND METHODS FOR IMPLEMENTING THE SAME - A microprocessor is provided that has a datapath that is split into upper and lower portions. The microprocessor includes a centralized crossbar switch module having a single data movement module. The data movement module is capable of processing instructions that require operands to be exchanged between upper and lower 64-bit halves of the split architecture. The data movement module can access and process all instructions that require simultaneous access to the entire register contents of the upper and lower portions. The data movement module is configured to execute any one of a number of different instructions to perform data manipulation with respect to one or more "split-operands" (also referred to simply as "operands" herein). The data movement module can exchange data (bytes and/or bits) of operands for the upper and lower 64-bit halves so that bytes and/or bits of operands can be moved or rearranged to other positions during execution of a particular instruction. The data movement module can allow for the various types of operand data movement and manipulation required to implement instructions such as permute, pack, shuffle, vectored conditional move, extract, shift, and rotate instructions, or any other instruction in which operand data is manipulated, shifted, moved, re-ordered, shuffled or scrambled.11-01-2012
20120089817Conditional selection of data elements - A data processing apparatus, method and computer program are provided that perform an operation on one data element, such as a register, and then conditionally select either that register or a further register on which no operation has been performed. The apparatus comprises an instruction decoder configured to decode at least one conditional select instruction, said at least one conditional select instruction specifying a primary source register, a secondary source register, a destination register, a condition, and an operation to be performed on a data element from the secondary source register; a data processor configured to perform data processing operations controlled by the instruction decoder wherein: the data processor is responsive to the decoded at least one conditional select instruction and the condition having a predetermined outcome to perform the operation on the data element from the secondary source register to form a resultant data element and to store the resultant data element in the destination register; and the data processor is responsive to the decoded at least one conditional select instruction and the condition not having the predetermined outcome to form the resultant data element from the data element from the primary register and to store the resultant data element in the destination register.04-12-2012
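The conditional-select behaviour is easy to model in software; the sketch below uses an increment as the example operation on the secondary source, in the spirit of AArch64's CSINC (cited only as a familiar analogue, not as the claimed design). If the condition holds, the destination receives the operated-on secondary value; otherwise it receives the primary value unchanged.

```c
#include <stdint.h>

/* dst = cond ? op(secondary) : primary, with op chosen here as +1. */
static uint64_t csel_inc(uint64_t primary, uint64_t secondary, int cond)
{
    return cond ? (secondary + 1) : primary;
}
```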
20120096243MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.04-19-2012
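The issuance sequence can be pictured as a token rotating over the hardware thread units, with only the token holder issuing in a given cycle; the sketch below is a schematic model under that assumption, not the claimed control logic.

```python
from itertools import cycle

def issue_trace(thread_queues, cycles):
    """On each clock cycle only the designated thread may issue; the
    designation rotates across the hardware threads over successive cycles."""
    designated = cycle(range(len(thread_queues)))
    trace = []
    for _ in range(cycles):
        t = next(designated)
        if thread_queues[t]:
            trace.append((t, thread_queues[t].pop(0)))   # thread t issues one instruction
        else:
            trace.append((t, None))                      # designated thread has nothing ready
    return trace

# Three threads, four cycles: issue order is t0, t1, t2, t0, ...
print(issue_trace([["a0", "a1"], ["b0"], ["c0"]], cycles=4))
```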
20120096242Method and Apparatus for Performing Control of Flow in a Graphics Processor Architecture - Methods and systems for performing control of flow in a graphics processor architecture are provided. For example, in at least one embodiment, a computing system includes a memory storing a plurality of instructions and a graphics processing unit. The graphics processing unit is configured to process the instructions according to a multi-stage scalar pipeline and store condition code values in the branch control stack. The graphics processing unit is further configured to process branch instructions using condition code values stored in the condition register at the top of the branch control stack.04-19-2012
20100205406OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SPECULATIVELY EXECUTES DEPENDENT MEMORY ACCESS INSTRUCTIONS BY PREDICTING NO VALUE CHANGE BY OLDER INSTRUCTIONS THAT LOAD A SEGMENT REGISTER - An out-of-order execution microprocessor executes an architectural segment register-loading instruction that instructs the microprocessor to load a new value into an architectural segment register of the microprocessor. A comparator compares the new value specified by the architectural segment register-loading instruction with a current contents of the architectural segment register. A control unit causes to be re-executed using the new value all instructions in the microprocessor that used the current architectural segment register contents as a source operand and that are newer in program order than the architectural segment register-loading instruction whenever the comparator indicates the new value does not equal the current contents. An instruction scheduler retrieves the current contents and issues for execution instructions that use the retrieved current contents, even though the instructions are newer in program order than the register-loading instruction and the register-loading instruction has not yet written the new value to the architectural segment register.08-12-2010
20120144165SIDEBAND PAYLOADS IN PSEUDO NO-OPERATION INSTRUCTIONS - A pseudo no-op instruction in an instruction stream is detected, and the pseudo no-op instruction is decoded as being an opcode, wherein a parameter of the pseudo no-op instruction uniquely identifies the opcode. The method makes use of a pseudo no-op instruction and provides the pseudo no-op instruction with additional semantics outside of the instruction stream execution. New or enhanced functionality can be implemented in application software in a fashion that fully preserves backward compatibility to software and processors that do not support the new or enhanced functionality. If these functionalities are not supported, then the legacy software or processor will merely see and execute the pseudo no-op instruction, which will effectively do nothing at all.06-07-2012
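A minimal decode sketch of the idea, assuming a hypothetical encoding in which opcode 0x1F is an architectural no-op whose immediate parameter selects the sideband operation; the opcode value and the operation table are invented for illustration.

```python
# Hypothetical: 0x1F is an architecturally defined no-op; its immediate field
# selects a sideband operation on processors that recognise the payload.
SIDEBAND_OPS = {0x01: "begin_region", 0x02: "end_region", 0x03: "prefetch_hint"}

def decode(opcode, immediate):
    if opcode == 0x1F:                            # pseudo no-op
        op = SIDEBAND_OPS.get(immediate)
        if op is not None:
            return ("sideband", op)               # enhanced decoder: extra semantics
        return ("nop", None)                      # unknown payload: still just a no-op
    return ("normal", opcode)

# A legacy decoder that ignores the payload simply executes 0x1F as a no-op,
# which is the backward-compatibility property described above.
```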
20110161634Processor, co-processor, information processing system, and method for controlling processor, co-processor, and information processing system - A processor includes a buffer that separates a sequence of instructions having no operand into segments and stores the segments, a data holder that holds data to be processed, a decoder that references the data and sequentially decodes at least one of the instructions from the top of the sequence, an instruction execution unit that executes the instruction, and an instruction sequence control unit that controls updating of the instruction sequence in accordance with the decoding result. When the decoded top instruction is a branch instruction and if a branch is taken, the instruction sequence control unit updates the sequence so that the top instruction of one of the segments is located at the top of the sequence. If a branch is not taken, the instruction sequence control unit updates the sequence so that an instruction immediately next to the branch instruction is located at the top of the sequence.06-30-2011
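The sequence-update rule on branches can be pictured as below; the data layout (lists of instructions per segment) and the function name are assumptions for illustration only.

```python
def update_sequence(segments, current, branch_index, taken, target_segment):
    """If the decoded top instruction is a taken branch, restart the sequence
    at the top of the target segment; otherwise continue with the instruction
    immediately after the branch."""
    if taken:
        return list(segments[target_segment])
    return current[branch_index + 1:]
```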
20130024665METHOD, APPARATUS AND INSTRUCTIONS FOR PARALLEL DATA CONVERSIONS - Method, apparatus, and program means for performing a conversion. In one embodiment, a disclosed apparatus includes a destination storage location corresponding to a first architectural register. A functional unit operates, responsive to a control signal, to convert a first packed first format value selected from a set of packed first format values into a plurality of second format values. Each of the first format values has a plurality of sub-elements having a first number of bits. The second format values have a greater number of bits. The functional unit stores the plurality of second format values into an architectural register.01-24-2013
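One concrete instance of such a conversion is widening four packed 8-bit sub-elements into four 16-bit elements; the sketch below shows that case only, with field widths chosen for illustration.

```python
def widen_packed(value, src_bits=8, dst_bits=16, count=4, signed=False):
    """Convert `count` packed src_bits-wide sub-elements of `value` into
    packed dst_bits-wide elements (an illustrative widening conversion)."""
    src_mask = (1 << src_bits) - 1
    out = 0
    for i in range(count):
        elem = (value >> (i * src_bits)) & src_mask
        if signed and elem & (1 << (src_bits - 1)):
            elem -= 1 << src_bits                      # sign-extend
        out |= (elem & ((1 << dst_bits) - 1)) << (i * dst_bits)
    return out

# Four packed bytes 0x04030201 widen to four 16-bit values 0x0004_0003_0002_0001.
assert widen_packed(0x04030201) == 0x0004000300020001
```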
20080222394Systems and Methods for TDM Multithreading - Systems and methods for distributing thread instructions in the pipeline of a multi-threading digital processor are disclosed. More particularly, hardware and software are disclosed for successively selecting threads in an ordered sequence for execution in the processor pipeline. If a thread to be selected cannot execute, then a complementary thread is selected for execution.09-11-2008
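A schematic view of the selection rule, assuming a simple "next ready thread" policy for the complementary thread (the application does not pin the policy down):

```python
def select_thread(sequence, slot, ready):
    """Pick the thread for this pipeline slot: the thread scheduled for the
    slot if it can execute, otherwise a complementary (next ready) thread."""
    primary = sequence[slot % len(sequence)]
    if ready[primary]:
        return primary
    for offset in range(1, len(sequence)):
        candidate = sequence[(slot + offset) % len(sequence)]
        if ready[candidate]:
            return candidate
    return None                                   # nothing ready: bubble

# Round-robin order 0,1,2,3; thread 1 is stalled, so its slot goes to thread 2.
assert select_thread([0, 1, 2, 3], slot=1, ready=[True, False, True, True]) == 2
```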
20120254595PROCESSOR, INFORMATION PROCESSING APPARATUS AND CONTROL METHOD THEREOF - Instructions decoded by the instruction decoding part are issued to an arithmetic and logic part. A consumption current value, for the current consumed by the arithmetic and logic part for instructions issued during a first predetermined period, and a consumption current estimated value, for the current that would be consumed by the arithmetic and logic part for the decoded instructions issuable during a second predetermined period, are calculated. Issuance of some of the decoded instructions is inhibited in the second predetermined period when the change in the consumption current estimated value with respect to the consumption current value exceeds a predetermined limit value.10-04-2012
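The issue-inhibit decision can be caricatured as below; the greedy cutoff and the unit-less current numbers are assumptions for illustration, not the claimed estimation hardware.

```python
def throttle_issue(consumed_last_period, decoded, limit_delta):
    """`decoded` is a list of (instruction, estimated_current) pairs.  Stop
    issuing once the running estimate would exceed the previous period's
    consumption by more than `limit_delta`; the rest are inhibited."""
    issued, estimate = [], 0
    for insn, cost in decoded:
        if (estimate + cost) - consumed_last_period > limit_delta:
            break                                  # inhibit this and later instructions
        estimate += cost
        issued.append(insn)
    return issued

# Previous period drew 10 units; allow the estimate to rise by at most 4 units.
assert throttle_issue(10, [("i0", 6), ("i1", 6), ("i2", 6)], limit_delta=4) == ["i0", "i1"]
```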
20130097408CONDITIONAL COMPARE INSTRUCTION - An instruction decoder (04-18-2013
20130124830Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
20130124831Method and Apparatus for Packing Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
20130124833Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
20130124835Method and Apparatus for Packing Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
20130124834Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
20130124832Method and Apparatus for Unpacking Packed Data - An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to unpack the packed data responsive to an unpack instruction received by the decoder. A first packed data element and a third packed data element are received from the first source register. A second packed data element and a fourth packed data element are received from the second source register. The circuit copies the packed data elements into a destination register resulting with the second packed data element adjacent to the first packed data element, the third packed data element adjacent to the second packed data element, and the fourth packed data element adjacent to the third packed data element.05-16-2013
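The six related applications above share one abstract describing an interleave of the leading elements of two source registers; a minimal list-based sketch of that element ordering (similar in spirit to an x86 PUNPCKL-type unpack, though the applications are not limited to any particular ISA) follows.

```python
def unpack_interleave(src1, src2, elements=2):
    """Interleave the first `elements` data elements of two source registers:
    result = [src1[0], src2[0], src1[1], src2[1], ...], so each element from
    the second source lands adjacent to its partner from the first source."""
    out = []
    for a, b in zip(src1[:elements], src2[:elements]):
        out.extend([a, b])
    return out

# First/third result elements come from src1, second/fourth from src2.
assert unpack_interleave([1, 3, 5, 7], [2, 4, 6, 8]) == [1, 2, 3, 4]
```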
20130145124SYSTEM AND METHOD FOR PERFORMING SHAPED MEMORY ACCESS OPERATIONS - One embodiment of the present invention sets forth a technique that provides an efficient way to retrieve operands from a register file. Specifically, the instruction dispatch unit receives one or more instructions, each of which includes one or more operands. Collectively, the operands are organized into one or more operand groups from which a shaped access may be formed. The operands are retrieved from the register file and stored in a collector. Once all operands are read and collected in the collector, the instruction dispatch unit transmits the instructions and corresponding operands to functional units within the streaming multiprocessor for execution. One advantage of the present invention is that multiple operands are retrieved from the register file in a single register access operation without resource conflict. Performance in retrieving operands from the register file is improved by forming shaped accesses that efficiently retrieve operands exhibiting recognized memory access patterns.06-06-2013
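Forming a shaped access amounts to grouping operand reads so that one register-file access serves several of them without a bank conflict; the sketch below assumes a simple `register % 4` bank mapping and a greedy first-fit grouping purely for illustration.

```python
def form_shaped_accesses(operand_registers, num_banks=4):
    """Split the operand reads into 'shaped' groups in which no two registers
    map to the same (assumed) bank, so each group is one conflict-free access."""
    accesses = []
    for reg in operand_registers:
        for group in accesses:
            if all(reg % num_banks != other % num_banks for other in group):
                group.append(reg)
                break
        else:
            accesses.append([reg])        # no conflict-free group found: start a new access
    return accesses

# Registers 0, 4 and 8 share a bank, so they end up in separate accesses;
# 1, 2 and 3 ride along in the first conflict-free group.
print(form_shaped_accesses([0, 1, 2, 3, 4, 8]))
```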
20130151817METHOD, APPARATUS, AND COMPUTER PROGRAM PRODUCT FOR PARALLEL FUNCTIONAL UNITS IN MULTICORE PROCESSORS - Method, apparatus, and computer program product embodiments of the invention maximize the use of functional processing units in a multicore processor integrated circuit architecture. Example embodiments of the invention determine that instructions to be executed in a functional processor of a local processor core of a multicore processor, are capable of execution in a functional processor of a neighbor processor core of the multicore processor. A compute request is sent to the neighbor processor core to initiate execution of the instructions in the functional processor. A compute response is received from the neighbor processor core, if the functional processor has been able to execute the instructions.06-13-2013
20130151818MICRO ARCHITECTURE FOR INDIRECT ACCESS TO A REGISTER FILE IN A PROCESSOR - A method and system for improving performance and latency of instruction execution within an execution pipeline in a processor. The method includes finding, while decoding an instruction, a pointer register used by the instruction; reading the pointer register; validating a pointer register entry; reading, if the pointer register entry is valid, a register file entry; validating a register file entry; validating, if the register file entry is invalid, a valid register file entry wherein the valid register file entry is in the register file's future file; bypassing, if the valid register file entry is valid, a valid register file value from the register file's future file to the execution pipeline wherein the valid register file value is in the valid register file entry; and executing the instruction using the valid register file value; wherein at least one of the steps is carried out using a computer device.06-13-2013
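The indirect-read path can be pictured as below; the dictionary layout, the `valid` field and the function name are assumptions made only to make the bypass step concrete.

```python
def read_indirect(pointer_regs, ptr_idx, register_file, future_file):
    """The instruction names a pointer register; its value selects the real
    register file entry.  If that entry is not (yet) valid, the value is
    bypassed from the register file's future file instead."""
    target = pointer_regs[ptr_idx]                 # read the pointer register
    entry = register_file.get(target)
    if entry is not None and entry["valid"]:
        return entry["value"]                      # normal register file read
    return future_file[target]["value"]            # bypass from the future file
```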
20130159676INSTRUCTION SET ARCHITECTURE WITH EXTENDED REGISTER ADDRESSING - A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode.06-20-2013
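A sketch of how repurposed opcode bits can widen the register index: the field positions, widths and mode flag below are all invented for illustration, since the application does not fix a particular encoding.

```python
def decode_extended(word, extended_mode):
    """Hypothetical 32-bit encoding: a 6-bit primary opcode in bits 31:26 and
    a 5-bit register field in bits 25:21.  In the predetermined 'extended'
    mode, the low 2 opcode bits are reinterpreted as extra high-order register
    bits, growing the addressable register file from 32 to 128 entries."""
    primary = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    if not extended_mode:
        return primary, ra
    repurposed = primary & 0x3            # bits borrowed from the opcode field
    effective_opcode = primary >> 2       # decoded as a different primary opcode
    return effective_opcode, (repurposed << 5) | ra
```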
20130159675INSTRUCTION PREDICATION USING UNUSED DATAPATH FACILITIES - A method and circuit arrangement for selectively predicating an instruction in an instruction stream based upon a value corresponding to a predication register address indicated by a portion of an operand associated with the instruction. A first compare instruction in an instruction stream stores a compare result in at a register address of a predication register. The register address of the predication register is stored in a portion of an operand associated with a second instruction, and during decoding the second instruction, the predication register is accessed to determine a value stored at the register address of the predication register, and the second instruction is selectively predicated based on the value stored at the register address of the predication register.06-20-2013
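The two-step flow (a compare writes a predication register; a later instruction carries that register's address in part of an operand) can be sketched as follows; the list-based predication register file and helper names are illustrative only.

```python
def execute_predicated(compare_result, pred_addr, predication_regs, second_insn):
    """A compare stores its result at `pred_addr`; the second instruction is
    executed only if the value found there at decode time is true, otherwise
    it is squashed (predicated off)."""
    predication_regs[pred_addr] = compare_result   # first (compare) instruction
    if predication_regs[pred_addr]:                # looked up while decoding
        return second_insn()
    return None                                    # no architectural effect

regs = [False] * 8
assert execute_predicated(True, 3, regs, lambda: "executed") == "executed"
assert execute_predicated(False, 3, regs, lambda: "executed") is None
```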
20130191615INSTRUCTIONS AND LOGIC TO PERFORM MASK LOAD AND STORE OPERATIONS - In one embodiment, logic is provided to receive and execute a mask move instruction to transfer a vector data element including a plurality of packed data elements from a source location to a destination location, subject to mask information for the instruction. Other embodiments are described and claimed.07-25-2013
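Element-wise, a mask move behaves like the sketch below, with lists standing in for packed vector registers; this is the generic operation, not the specific claimed logic.

```python
def mask_move(src, dst, mask):
    """Copy src[i] to the destination only where mask[i] is set; unmasked
    destination elements are left untouched."""
    return [s if m else d for s, d, m in zip(src, dst, mask)]

assert mask_move([1, 2, 3, 4], [9, 9, 9, 9], [1, 0, 1, 0]) == [1, 9, 3, 9]
```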
20130198492MAJOR BRANCH INSTRUCTIONS - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.08-01-2013
20130198491MAJOR BRANCH INSTRUCTIONS WITH TRANSACTIONAL MEMORY - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.08-01-2013
20130205119INSTRUCTION AND LOGIC TO TEST TRANSACTIONAL EXECUTION STATUS - Novel instructions, logic, methods and apparatus are disclosed to test transactional execution status. Embodiments include decoding a first instruction to start a transactional region. Responsive to the first instruction, a checkpoint for a set of architecture state registers is generated and memory accesses from a processing element in the transactional region associated with the first instruction are tracked. A second instruction to detect transactional execution of the transactional region is then decoded. An operation is executed, responsive to decoding the second instruction, to determine if an execution context of the second instruction is within the transactional region. Then responsive to the second instruction, a first flag is updated. In some embodiments, a register may optionally be updated and/or a second flag may optionally be updated responsive to the second instruction.08-08-2013
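A toy model of the two instructions (start a transactional region, then test whether execution is inside one) is given below; the class, the flag name and the checkpointing detail are assumptions for illustration.

```python
class ToyTransactionalCPU:
    """Minimal model: tx_begin() checkpoints state and marks the region
    active; tx_test() updates a flag according to whether the current
    execution context is inside the transactional region."""

    def __init__(self):
        self.regs = {}
        self.checkpoint = None
        self.in_transaction = False
        self.zero_flag = False

    def tx_begin(self):                            # first instruction
        self.checkpoint = dict(self.regs)          # checkpoint architectural state
        self.in_transaction = True

    def tx_test(self):                             # second instruction
        self.zero_flag = not self.in_transaction   # first flag updated
        return self.in_transaction

cpu = ToyTransactionalCPU()
assert cpu.tx_test() is False                      # outside any region
cpu.tx_begin()
assert cpu.tx_test() is True                       # inside the region
```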
20130212357Floating Point Constant Generation Instruction - Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.08-15-2013
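A worked example of assembling such a constant into an IEEE-754 single-precision value, assuming hypothetical 1/3/4-bit sign, exponent and significand fields and a bias constant of 124 (none of these widths come from the application):

```python
import struct

def fp_constant(sign_bit, exp_field, frac_field, frac_bits=4, bias_constant=124):
    """Assemble a 32-bit float from small instruction fields.  The exponent
    field is biased, so a bias constant is added to map it onto the IEEE-754
    single-precision exponent; the fraction field supplies the most
    significant bits of the significand."""
    exponent = exp_field + bias_constant             # compensate for the bias
    fraction = frac_field << (23 - frac_bits)        # MSBs of the significand
    bits = (sign_bit << 31) | (exponent << 23) | fraction
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# sign=0, exp_field=3 -> exponent 127 (2**0), frac_field=8 -> significand 1.5
assert fp_constant(0, 3, 8) == 1.5
```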
20130212358DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION - A data processing system comprises a processor unit that includes an instruction decode/issue unit including a re-order buffer having entries that include an execution queue tag that indicates an execution queue location of an instruction to which a re-order buffer entry is assigned, a result valid indicator to indicate that a corresponding instruction has executed with a status bit valid result, and a forward indicator to indicate that the status bit can be forwarded to an execution queue of an instruction pointed to that is waiting to receive the status bit.08-15-2013
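The fields named in the abstract map naturally onto a plain record; the sketch below just spells that structure out with illustrative types.

```python
from dataclasses import dataclass

@dataclass
class ReorderBufferEntry:
    """One re-order buffer entry as described above (field names paraphrased)."""
    execution_queue_tag: int       # which execution queue location holds the instruction
    result_valid: bool = False     # instruction executed with a valid status bit result
    forward: bool = False          # status bit may be forwarded to a waiting execution queue
    status_bit: int = 0            # the status bit itself (illustrative extra field)

entry = ReorderBufferEntry(execution_queue_tag=5)
entry.result_valid, entry.forward, entry.status_bit = True, True, 1
```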
20130212359Processor to Execute Shift Right Merge Instructions - Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block is made. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.08-15-2013
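At byte granularity, the merge step can be pictured as below; the list representation and the fixed 8-byte block width are assumptions for illustration only.

```python
def shift_right_merge(block1, block2, unprocessed):
    """Merge the last `unprocessed` bytes of block1 (the leftover bytes of a
    partial variable-length symbol) with the leading bytes of block2, forming
    a merged block of the same width from which the symbol can be extracted."""
    width = len(block1)
    return block1[width - unprocessed:] + block2[:width - unprocessed]

# Three unprocessed bytes of block1 are joined with five new bytes of block2.
assert shift_right_merge(list(range(8)), list(range(8, 16)), unprocessed=3) == \
       [5, 6, 7, 8, 9, 10, 11, 12]
```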
