Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application numberDescriptionNumber of patent applications / Date published
712215000 Simultaneous issuance of multiple instructions 34
20120246447Region-Weighted Accounting of Multi-Threaded Processor Core According to Dispatch State - According to one embodiment of the present disclosure, an approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value.09-27-2012
20100077181System and Method for Issuing Load-Dependent Instructions in an Issue Queue in a Processing Unit of a Data Processing System - A system and method for issuing load-dependent instructions in an issue queue in a processing unit. A load miss queue is provided. The load miss queue comprises a physical address field, an issue queue position field, a valid identifier field, a source identifier field, and a data type field. A load instruction that misses a first level cache is dispatched, and both the physical address field and the data type field are set. A load-dependent instruction is identified. In response to indentifying the load-dependent instruction, each of the issue queue position field, valid identifier field, and source identifier field are set. If the issue queue position field refers to a flushed instruction, the valid identifier field is cleared. The load instruction is recycled, and a value of the valid identifier field is determined. The load-dependent instruction is then selected for issue on a next processing cycle independent of an age of the load-dependent instruction.03-25-2010
20090119490PROCESSOR AND INSTRUCTION SCHEDULING METHOD - An instruction scheduling method and a processor using an instruction scheduling method are provided. The instruction scheduling method includes selecting a first instruction that has a highest priority from a plurality of instructions, and allocating the selected first instruction and a first time slot to one of the functional units, allocating a second instruction and a second time slot to one of the functional units, wherein the second instruction is dependent on the first instruction.05-07-2009
20130042090TEMPORAL SIMT EXECUTION OPTIMIZATION - One embodiment of the present invention sets forth a technique for optimizing parallel thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When the threads in a parallel thread group execute temporally on a common processing pipeline rather than spatially on parallel processing pipelines, execution cycles may be reduced when some threads in the parallel thread group are inactive due to divergence. Similarly, an instruction can be dispatched for execution by only one thread in the parallel thread group when the threads in the parallel thread group are executing a scalar instruction. Reducing the number of threads that execute an instruction removes unnecessary or redundant operations for execution by the processing pipelines. Information about scalar operands and operations and divergence of the threads is used in the instruction dispatch logic to eliminate unnecessary or redundant activity in the processing pipelines.02-14-2013
20120191951MFENCE and LFENCE Micro-Architectural Implementation Method and System - A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.07-26-2012
20120191950PREDICTING A RESULT FOR A PREDICATE-GENERATING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true.07-26-2012
20090271592Apparatus For Storing Instructions In A Multithreading Microprocessor - A circuit for selecting one of N requesters in a round-robin fashion is disclosed. The circuit 1-bit left rotatively increments a first addend by a second addend to generate a sum that is ANDed with the inverse of the first addend to generate a 1-hot vector indicating which of the requestors is selected next. The first addend is an N-bit vector where each bit is false if the corresponding requester is requesting access to a shared resource. The second addend is a 1-hot vector indicating the last selected requestor. A multithreading microprocessor dispatch scheduler employs the circuit for N concurrent threads each thread having one of P priorities. The dispatch scheduler generates P N-bit 1-hot round-robin bit vectors, and each thread's priority is used to select the appropriate round-robin bit from P vectors for combination with the thread's priority and an issuable bit to create a dispatch level used to select a thread for instruction dispatching.10-29-2009
20130166885METHOD AND APPARATUS FOR ON-CHIP TEMPERATURE - When an instruction is executed on an integrated circuit (IC), an activity level and temperature are measured. A relationship between the activity level and temperature is determined, to estimate the temperature from the activity level. The activity level is monitored and is input to a scheduler, which estimates the IC temperature based on the activity level. The scheduler distributes work taking into account the temperature of various IC regions and may include distributing work to the IC region that has a lowest estimated temperature or relatively lower estimated temperature (e.g., lower than the average IC or IC region temperature). When the utilization level of one or more IC regions is high, the scheduler is configured to reduce the clock speed or the voltage of the one or more IC regions, or flag the regions as being unavailable for additional workload.06-27-2013
20110296143PIPELINE PROCESSOR AND AN EQUAL MODEL CONSERVATION METHOD - A pipeline processor which meets a latency restriction on an equal model is provided. The pipeline processor includes a pipeline processing unit to process an instruction at a plurality of stages and an equal model compensator to store the results of the processing of some or all of the instructions located in the pipeline processing unit and to write the results of the processing in a register file based on the latency of each instruction.12-01-2011
20120023314PAIRED EXECUTION SCHEDULING OF DEPENDENT MICRO-OPERATIONS - A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.01-26-2012
20120089819ISSUING INSTRUCTIONS WITH UNRESOLVED DATA DEPENDENCIES - The described embodiments include a processor that determines instructions that can be issued based on unresolved data dependencies. In an issue unit in the processor, the processor keeps a record of each instruction that is directly or indirectly dependent on a base instruction. Upon determining that the base instruction has been deferred, the processor monitors instructions that are being issued from an issue queue to an execution unit for execution. Upon determining that an instruction from the record has reached a head of the issue queue, the processor immediately issues the instruction from the issue queue.04-12-2012
20090217006Heuristic backtracer - A heuristic backtracer is described. In one embodiment, a scanner scans a stack of an application for a pointer to a word of a machine code of the application. A preceding byte locator identifies one or more bytes immediately preceding the pointed-to machine code. A parser parses the one or more bytes immediately preceding the pointed-to machine code of the machine code for a call instruction. A return address identifier determines the pointed-to as a return address when the one or more bytes constitute the call instruction.08-27-2009
20090182986Processing Unit Incorporating Issue Rate-Based Predictive Thermal Management - A circuit arrangement and method utilize an issue rate-based predictive thermal management technique in a microprocessor or other integrated circuit that tracks the rate in which instructions are issued to one or more execution units in the processing unit, and selectively delays the issuance of subsequent instructions to the execution unit(s) based upon the tracked issue rate to predictively control the thermal output of the integrated circuit.07-16-2009
20110138152INSTRUCTION CONTROL DEVICE - A processor which executes threads having different characteristics is provided with an instruction control device. In the instruction control device, a first instruction control unit issues an instruction included in a first instruction sequence to an instruction execution unit. In addition, a second instruction control unit issues an instruction included in a second instruction sequence to the instruction execution unit. Here, the delay time of the second instruction control unit is shorter than the delay time of the first instruction control unit.06-09-2011
20090240919PROCESSOR AND METHOD FOR SYNCHRONOUS LOAD MULTIPLE FETCHING SEQUENCE AND PIPELINE STAGE RESULT TRACKING TO FACILITATE EARLY ADDRESS GENERATION INTERLOCK BYPASS - A pipelined processor including an architecture for address generation interlocking, the processor including: an instruction grouping unit to detect a read-after-write dependency and to resolve instruction interdependency; an instruction dispatch unit (IDU) including address generation interlock (AGI) and operand fetching logic for dispatching an instruction to at least one of a load store unit and an execution unit; wherein the load store unit is configured with access to a data cache and to return fetched data to the execution unit; wherein the execution unit is configured to write data into a general purpose register bank; and wherein the architecture provides support for bypassing of results of a load multiple instruction for address generation while such instruction is executing in the execution unit before the general purpose register bank is written. A method and a computer system are also provided.09-24-2009
20090177867Processor architectures for enhanced computational capability - A digital signal processor includes a control block configured to issue instructions based on a stored program, and a compute array including two or more compute engines configured such that each of the issued instructions executes in successive compute engines of at least a subset of the compute engines at successive times. The digital signal processor may be utilized with a control processor or as a stand-alone processor. The compute array may be configured such that each of the issued instructions flows through successive compute engines of at least a subset of the compute engines at successive times.07-09-2009
20080209176Time stamping transactions to validate atomic operations in multiprocessor systems - A multi-core microprocessor has a plurality of processor cores which are coupled to a bridge element. The bridge element sends transactions to and/or receives transactions from the processor cores, where each transaction has one or more packets. The transactions include atomic transactions. The bridge element comprises a buffer unit storing a time stamp for each packet sent or received. Furthermore, a multi-core multi-node processor system is provided that has debug hardware to capture and time stamp intra-node and/or inter-node transaction packets. Atomic operations are, for example, atomic read-modify-write instructions.08-28-2008
20080209178Method and Apparatus for Back to Back Issue of Dependent Instructions in an Out of Order Issue Queue - A method is provided for evaluating two or more instructions in an out of order issue queue during a particular cycle of the queue, to select an instruction for issue during the next following cycle. If an instruction was previously designated to issue during the particular cycle, one or more instructions in the queue are evaluated to determine if any of them are dependent on the designated instruction. For the evaluation, each instruction placed into the queue is accompanied by corresponding logic elements that provide destination to source compares for the instruction. In an embodiment comprising a method, the oldest ready instruction in the queue during a particular cycle is identified. When an instruction was previously designated to issue during the particular cycle, it is determined whether at least a first instruction in the queue complies with each condition in a set of conditions, the set including at least the conditions that the first instruction has a dependency on the designated instruction, and that the first instruction is older than the oldest ready instruction. The first instruction is selected for issue during the next following cycle only if the first instruction complies with each condition in the set.08-28-2008
20080209177Mechanism in a Multi-Threaded Microprocessor to Maintain Best Case Demand Instruction Redispatch - A method and system for maintaining a best-case demand redispatch of an instruction to allow for maximizing the time a rejected thread may execute in lookahead execution mode, while maintaining the smallest L1 cache miss penalty supported by the memory subsystem. In response to a demand miss, a load/store unit sends a fetch request to the next level cache. The cache line of the demand miss is examined to identify the critical sector. Once the critical sector is identified, a best-case data return time is determined based on the fastest time the next level cache is able to return the critical sector of the cache line. The load/store unit then sends a speculative warning to the dispatch unit to coincide with the best-case data return, wherein the speculative warning prepares the dispatch unit to resend the instruction for execution as soon as data is available to the processor core.08-28-2008
20090282221Preferential Dispatching Of Computer Program Instructions - A computer processor that includes a plurality of execution pipelines, each execution pipeline including a configuration of one or more execution units of the processor, each execution pipeline characterized by an execution pipeline type, each execution pipeline type determined according to the types of computer program instructions executed in each execution pipeline; a plurality of hardware threads of execution, each hardware thread including computer program instructions characterized by instruction types, each hardware thread including sequences of instructions of a same instruction type, the sequences interspersed with instructions of other types; and an instruction dispatcher capable of dispatching instructions preferentially during a predefined preference period from a preferred hardware thread to a particular execution pipeline in dependence upon whether the preferred hardware thread presents a sequence of instructions, ready for execution from the preferred hardware thread, of a type for execution in the particular execution pipeline.11-12-2009
20080282067Issue policy control within a multi-threaded in-order superscalar processor - A multi-threaded in-order superscalar processor 11-13-2008
20080215854System and Method for Adaptive Run-Time Reconfiguration for a Reconfigurable Instruction Set Co-Processor Architecture - A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.09-04-2008
20080313432BLOCK ALLOCATION TIMES IN A COMPUTER SYSTEM - A method and apparatus improves the block allocation time in a parallel computer system. A pre-load controller pre-loads blocks of hardware in a supercomputer cluster in anticipation of demand from a user application. In the preferred embodiments the pre-load controller determines when to pre-load the compute nodes and the block size to allocate the nodes based on pre-set parameters and previous use of the computer system. Further, in preferred embodiments each block of compute nodes in the parallel computer system has a stored hardware status to indicate whether the block is being pre-loaded, or already has been pre-loaded. In preferred embodiments, the hardware status is stored in a database connected to the computer's control system. In other embodiments, the compute nodes are remote computers in a distributed computer system.12-18-2008
20080229076MACROSCALAR PROCESSOR ARCHITECTURE - A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.09-18-2008
20090024837System and Method for Language Specification - Described are a system and method for language specification. The device may include (a) a processor running an operating system and (b) an image capturing device scanning an image. The operating system is configured to display a user interface. The image includes data that is transmitted to the operating system to execute a language installation protocol. The protocol indicates a display language for the user interface.01-22-2009
20110225398ADVANCED PROCESSOR SCHEDULING IN A MULTITHREADED SYSTEM - An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.09-15-2011
20090210664System and Method for Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having four or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receiving an issue group of instructions; (2) scheduling the instructions in program order received; and (3) executing the issue group of instructions in the cascaded delayed execution pipeline unit. The present invention can also be viewed as providing methods for providing a group priority issue schema for a cascaded pipeline. The method includes: (1) receiving an issue group of instructions; (2) scheduling the instructions in the program order received; and (3) executing the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090249034PROCESSOR AND SIGNATURE GENERATION METHOD, AND MULTIPLE SYSTEM AND MULTIPLE EXECUTION VERIFICATION METHOD - A processor performs instruction execution regardless of a program order. An execution unit executes an instruction, and transmits end information of the instruction whose execution has ended. A retire unit receives the end information, rearranges a result of the instruction whose execution has ended in a program order to determine the instruction execution, and transmits completed instruction information which reports that the instruction execution has been determined. A signature generation unit receives the completed instruction information from the retire unit, and generates a signature using the completed instruction information.10-01-2009
20090144526SYSTEM AND METHOD OF ACCESSING A DEVICE - A method of accessing a device is provided. A command is received from an agent, over a network, for executing at least one instruction for accessing the device. Information is sent to the agent, over the network, regarding the execution of the at least one instruction.06-04-2009
20100262808MANAGING INSTRUCTIONS FOR MORE EFFICIENT LOAD/STORE UNIT USAGE - The illustrative embodiments described herein provide a computer-implemented method, apparatus, and a system for managing instructions. A load/store unit receives a first instruction at a port. The load/store unit rejects the first instruction in response to determining that the first instruction has a first reject condition. Then, the instruction sequencing unit activates a first bit in response to the load/store unit rejection the first instruction. The instruction sequencing unit blocks the first instruction from reissue while the first bit is activated. The processor unit determines a class of rejection of the first instruction. The instruction sequencing unit starts a timer. The length of the timer is based on the class of rejection of the first instruction. The instruction sequencing unit resets the first bit in response to the timer expiring. The instruction sequencing unit allows the first instruction to become eligible for reissue in response to resetting the first bit.10-14-2010
20100228955METHOD AND APPARATUS FOR IMPROVED POWER MANAGEMENT OF MICROPROCESSORS BY INSTRUCTION GROUPING - A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decode unit; an execution unit coupled to receive and send signals from and to the instruction scheduling unit; and a state machine located within the execution unit, the method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining, subsequent to obtaining the number of instructions per cycle, if the number of instruction per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a signal from the state machine when the instructions are independent of each other without incurring significant loss of performance until an issue queue in the instruction scheduling unit is filled with instruction data.09-09-2010
20100250901Selecting Fixed-Point Instructions to Issue on Load-Store Unit - Issue logic identifies a simple fixed point instruction, included in a unified payload, which is ready to issue. The simple fixed point instruction is a type of instruction that is executable by both a fixed point execution unit and a load-store execution unit. In turn, the issue logic determines that the unified payload does not include a load-store instruction that is ready to issue. As a result, the issue logic issues the simple fixed point instruction to the load-store execution unit in response to determining that the simple fixed point instruction is ready to issue and determining that the unified payload does not include a load-store instruction that is ready to issue.09-30-2010
20100306505Result path sharing between a plurality of execution units within a processor - A processor 12-02-2010
20090037697SYSTEM AND METHOD OF LOAD-STORE FORWARDING - A system and method for data forwarding from a store instruction to a load instruction during out-of-order execution, when the load instruction address matches against multiple older uncommitted store addresses or if the forwarding fails during the first pass due to any other reason. In a first pass, the youngest store instruction in program order of all store instructions older than a load instruction is found and an indication to the store buffer entry holding information of the youngest store instruction is recorded. In a second pass, the recorded indication is used to index the store buffer and the store bypass data is forwarded to the load instruction. Simultaneously, it is verified if no new store, younger than the previously identified store and older than the load has not been issued due to out-of-order execution.02-05-2009
20090070558MULTIPLEXING PER-PROBEPOINT INSTRUCTION SLOTS FOR OUT-OF-LINE EXECUTION - The present invention provides a probe system and method for multithreaded user-space programs. The system includes an instrumentation module that enables single stepping out of line processing for multithreaded programs, an establish probepoint module that divides up an area of the probed program's memory into a plurality of instruction slots, an ensure slot assigned module that ensures that an instruction slot is assigned to a probepoint, a slot acquisition module that acquires the instruction slot for the probepoint, stealing a slot from another probepoint as needed, and a free slot module that relinquishes the instruction slot owned by the probepoint when the probepoint is being unregistered.03-12-2009
20100312992MULTITHREAD EXECUTION DEVICE AND METHOD FOR EXECUTING MULTIPLE THREADS - A multithread execution device includes: a program memory in which a plurality of programs are stored; an instruction issue unit that issues an instruction retrieved from the program memory; an instruction execution unit that executes the instruction; a target execution speed information memory that stores target execution speed information of the instruction; an execution speed monitor that monitors an execution speed of the instruction; a feedback control unit that commands the instruction issue unit to issue the instruction such that the execution speed of the instruction approximately corresponds to the target execution speed information.12-09-2010
20130138925PROCESSING CORE WITH SPECULATIVE REGISTER PREPROCESSING - A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file.05-30-2013
20090024838MECHANISM FOR SUPPRESSING INSTRUCTION REPLAY IN A PROCESSOR - A mechanism for suppressing instruction replay includes a processor having one or more execution units and a scheduler that issue instruction operations for execution by the one or more execution units. The scheduler may also cause instruction operations that are determined to be incorrectly executed to be replayed, or reissued. In addition, a prediction unit within the processor may predict whether a given instruction operation will replay and to provide an indication that the given instruction operation will replay. The processor also includes a decode unit that may decode instructions and in response to detecting the indication, may flag the given instruction operation. The scheduler may further inhibit issue of the flagged instruction operation until a status associated with the flagged instruction is good.01-22-2009
20100100712Multi-Execution Unit Processing Unit with Instruction Blocking Sequencer Logic - A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.04-22-2010
20120246448MEMORY FRAGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of memory fragments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality memory fragments are coupled to the partitionable engines for providing data storage.09-27-2012
20090327660MEMORY THROUGHPUT INCREASE VIA FINE GRANULARITY OF PRECHARGE MANAGEMENT - Methods and apparatus to improve throughput in memory devices are described. In one embodiment, memory throughput is increased via fine granularity of precharge management. In an embodiment, three separate precharge timings may be used, e.g., optimized per memory bank, per memory bank group, and/or per a memory device. Other embodiments are also disclosed and claimed.12-31-2009
20100064120Replay Reduction for Power Saving - In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset.03-11-2010
20120303937COMPUTER SYSTEM AND CONTROL METHOD THEREOF - A computer system used to execute an application includes a motion sensing unit, a processor and an instruction transfer unit. The motion sensing unit senses a gesture of a human body and generates an input instruction based on the gesture. The processor executes the application (or a game). The instruction transfer unit is connected with the motion sensing unit and the processor and serves as a communication interface between the motion sensing unit and the application. The instruction transfer unit transfers the input instruction to a control command, and the processor controls and executes the application in accordance with the control command.11-29-2012
20110078416APPARATUS AND METHOD FOR CONTROL PROCESSING IN DUAL PATH PROCESSOR - A computer processor comprises a decode unit and a processing channel. The decode unit decodes a stream of instruction packets from a memory, each instruction packet comprising a plurality of instructions. The processing channel comprises a plurality of functional units and operable to perform control processing operations. The decode unit is operable to receive and decode instruction packets of a bit length of 64 bits and to detect if the instruction packet defines three control instructions each having a length of 21 bits. The decode unit detects that the instruction packet comprises the three control instructions. The control instructions are supplied to the processing channel for execution in the order in which they appear in the instruction packet. The detection uses an identification bit in the instruction packet.03-31-2011
20110072243Unified Collector Structure for Multi-Bank Register File - One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.03-24-2011
20110153991DUAL ISSUING OF COMPLEX INSTRUCTION SET INSTRUCTIONS - A system and method for issuing a processor instruction to multiple processing sections arranged in an out-of-order processing pipeline architecture. The multiple processing sections include a first execution unit with a pipeline length and a second execution unit operating upon data produced by the first execution unit. An instruction issue unit accepts a complex instruction that is cracked into respective micro-ops for the first execution unit and the second execution unit. The instruction issue unit issues the first micro-op to the first execution unit to produce intermediate data. The instruction issue unit then delays for a time period corresponding to the processing pipeline length of the first execution unit. After the delay, a second micro-op is issued to the second execution unit.06-23-2011
20120204009MULTI-LEVEL REGISTER FILE SUPPORTING MULTIPLE THREADS - A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.08-09-2012
20080320282Method And Systems For Providing Transaction Support For Executable Program Components - Methods and systems are described for providing transaction support for executable program components. In one embodiment, transaction information is associated with an instruction included in an executable addressable entity included in an executable program component generated from source code written in a programming language, wherein the transaction information is independent of the source code and the programming language. Further, an access to the instruction is detected for executing by a processor. A transaction operation to perform in association with the executing of the instruction is determined based on the transaction information associated with the instruction. The transaction operation is performed in association with the executing of the instruction, wherein the transaction operation is performed by a program component other than the executable program component including the executable addressable entity.12-25-2008
20110055523EARLY BRANCH DETERMINATION - A method and apparatus for branch determination. The method includes a first command issuing within a computer processor, wherein execution of the first command by the computer processor includes evaluating one or more conditions to set one or more flags. The method further includes a second command issuing, subsequent to the first command issuing, within the computer processor, wherein execution of the second command by the computer processor includes causing the computer processor to wait until the one or more flags are set. Subsequent to the first and second commands issuing, the method includes a third command issuing within the computer processor, wherein execution of the third command by the computer processor includes performing a jump operation based on a value of at least one of the one or more flags set by the first command.03-03-2011
20100332804UNIFIED HIGH-FREQUENCY OUT-OF-ORDER PICK QUEUE WITH SUPPORT FOR SPECULATIVE INSTRUCTIONS - Systems and methods for efficient picking of instructions for out-of-order issue and execution in a processor. In one embodiment, a processor comprises a unified pick queue that is dynamically allocated. Each entry is configured to store age and dependency information relative to other decoded instructions. Also, each entry stores a picked field, which when asserted indicates the decoded instruction has already been picked for out-of-order issue and execution. When asserted, a trigger field indicates a result of a corresponding decoded instruction will be available a predetermined number of clock cycles afterward. A younger instruction dependent on a result of an older instruction is ready to be picked before the result of the older instruction is available. In this case, the older instruction has asserted picked and trigger fields.12-30-2010
20110072244Credit-Based Streaming Multiprocessor Warp Scheduling - One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.03-24-2011
20110099354Information processing apparatus and instruction decoder for the information processing apparatus - An information processing apparatus includes an instruction supplying section that supplies a plurality of instructions as a single instruction group, an executing section that repetitively executes a plurality of execution processes corresponding to the plurality of instructions in parallel, an issue timing control section that controls an issue timing of each of the instructions to the executing section so that the plurality of execution processes are executed with a timing delayed in accordance with a predetermined latency, and an operand transforming section that transforms an operand register address of each of the instructions in accordance with a predetermined increment value upon every repetition of execution in the executing section.04-28-2011
20120272045CONTROL METHOD AND SYSTEM OF MULTIPROCESSOR - A control method and a system for dispatching the execution sequence of the processes in a multiprocessors system so as to dispatch an operation sequence for executing different operation programs by a monitoring processor and a plurality of target processors. The monitoring processor obtains operation status of other processors from a buffer; the monitoring processor selects at least one target processor according to the operation status; the monitoring processor assigns the target processor to execute a corresponding slave operation program, and modifies the operation status of the target processors in the buffer module; and the monitoring processor repeats the setting the operation status and assigning other target processors to execute corresponding operation programs, till a master operation program is completed.10-25-2012
20100299504MICROPROCESSOR WITH MICROINSTRUCTION-SPECIFIABLE NON-ARCHITECTURAL CONDITION CODE FLAG REGISTER - A microprocessor includes an architectural register and a non-architectural register, each having a plurality of condition code flags. A first instruction of the microarchitectural instruction set of the microprocessor instructs the microprocessor to update the plurality of condition code flags based on a result of the first instruction. The first instruction includes a field for indicating whether to update the plurality of condition code flags of the architectural or non-architectural register. A second instruction of the microarchitectural instruction set instructs the microprocessor to conditionally perform an operation based on one of the plurality of condition code flags. The second instruction includes a field for indicating whether to use the one of the plurality of condition code flags of the architectural or non-architectural register to determine whether to perform the operation.11-25-2010
20080301411INVERTING DATA ON RESULT BUS TO PREPARE FOR INSTRUCTION IN THE NEXT CYCLE FOR HIGH FREQUENCY EXECUTION UNITS - A method of operating an arithmetic logic unit (ALU) by inverting a result of an operation to be executed during a current cycle in response to control signals from instruction decode logic which indicate that a later operation will require a complement of the result, wherein the result is inverted during the current cycle. The later operation may be a subtraction operation that immediately follows the first operation. The later instruction is decoded prior to the current cycle to control the inversion in the ALU. The ALU includes an adder, a rotator, and a data manipulation unit which invert the result during the current cycle in response to an invert control signal. The second operation subtracts the result during a subsequent cycle in which a carry control signal to the adder is enabled, and the rotator and the data manipulation unit are disabled. The ALU may be used in an execution unit of a microprocessor, such as a fixed-point unit.12-04-2008
20110004743Pipe scheduling for pipelines based on destination register number - A data processing apparatus 01-06-2011
20120331273Reduced Instruction Set - A method of reducing a set of instructions for execution on a processor, the method comprising: extracting information from a first instruction of the set of instructions; identifying unencoded space in one or more further instructions of the set of instructions; replacing the unencoded space of the one or more further instructions with the extracted information of the first instruction so as to form one or more amalgamated instructions; and removing the first instruction from the set of instructions.12-27-2012
20130013897METHOD TO DYNAMICALLY DISTRIBUTE A MULTI-DIMENSIONAL WORK SET ACROSS A MULTI-CORE SYSTEM - A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.01-10-2013
20130024666METHOD OF SCHEDULING A PLURALITY OF INSTRUCTIONS FOR A PROCESSOR - A method of scheduling a plurality of instructions for a processor comprises the steps of: establishing a functional unit resource table comprising a plurality of columns, each of which corresponds to one of a plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a functional unit of the processor; establishing a ping-pong resource table comprising a plurality of columns, each of which corresponds to one of the plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a read port or a write port of a register bank of the processor; and allotting the plurality of instructions to the plurality of operation cycles of the processor and registering the functional units and the ports of the register banks corresponding to the allotted instructions on the functional unit resource table and the ping-pong resource table.01-24-2013
20130145127ZERO VALUE PREFIXES FOR OPERANDS OF DIFFERING BIT-WIDTHS - A data processing system is provided in which destination operands to be stored within architectural registers are constrained to have zero values added as prefixes in order that the architectural register value has a fixed bit width irrespective of the bit width of the destination operand being written thereto. Instead of adding these zero values everywhere in the data path, they are instead represented by zero flags in at least the physical registers utilised for register renaming operations and in the result queue prior to results being written to the architectural register file. This saves circuitry resources and reduces energy consumption.06-06-2013
20100318769USING VECTOR ATOMIC MEMORY OPERATION TO HANDLE DATA OF DIFFERENT LENGTHS - A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.12-16-2010
20130185542EXTERNAL AUXILIARY EXECUTION UNIT INTERFACE TO OFF-CHIP AUXILIARY EXECUTION UNIT - An external Auxiliary Execution Unit (AXU) interface is provided between a processing core disposed in a first programmable chip and an off-chip AXU disposed in a second programmable chip to integrate the AXU with an issue unit, a fixed point execution unit, and optionally other functional units in the processing core. The external AXU interface enables the issue unit to issue instructions to the AXU in much the same manner as the issue unit would be able to issue instructions to an AXU that was disposed on the same chip. By doing so, the AXU on the second programmable chip can be designed, tested and verified independent of the processing core on the first programmable chip, thereby enabling a common processing core, which has been designed, tested, and verified, to be used in connection with multiple different AXU designs.07-18-2013
20130191616INSTRUCTION CONTROL CIRCUIT, PROCESSOR, AND INSTRUCTION CONTROL METHOD - In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur.07-25-2013

Patent applications in class INSTRUCTION ISSUING

Patent applications in all subclasses INSTRUCTION ISSUING