DYNAMIC INSTRUCTION DEPENDENCY CHECKING, MONITORING OR CONFLICT RESOLUTION

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)


Deeper subclasses:

Class number | Description | Number of patent applications
712219000 | Reducing an impact of a stall or pipeline bubble | 24
712217000 | Scoreboarding, reservation station, or aliasing | 21
712218000 | Commitment control or register bypass | 12
Entries
Document number | Title and abstract | Date published
20090193233Mechanism for Avoiding Check Stops in Speculative Accesses While Operating in Real Mode - A method and processor for avoiding check stops in speculative accesses. An execution unit, e.g., load/store unit, may be coupled to a queue configured to store instructions. A register, coupled to the execution unit, may be configured to store a value corresponding to an address in physical memory. When the processor is operating in real mode, the execution unit may retrieve the value stored in the register. Upon the execution unit receiving a speculative instruction, e.g., speculative load instruction, from the queue, a determination may be made as to whether the address of the speculative instruction is at or below the retrieved value. If the address of the speculative instruction is at or below this value, then the execution unit may safely speculatively execute this instruction while avoiding a check stop since all the addresses at or below this value are known to exist in physical memory.07-30-2009
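The admission test in this abstract reduces to a single comparison: a speculative load may be performed only if its address is at or below a limit known to be backed by physical memory. A minimal sketch in C of that check, with the limit value and addresses chosen purely for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the check: in real mode a speculative load may
 * proceed only when its address is at or below a limit value held in a
 * register, because every address at or below that limit is known to
 * exist in physical memory and so cannot cause a check stop. */
static bool may_speculate_load(uint64_t load_addr, uint64_t real_mode_limit)
{
    return load_addr <= real_mode_limit;
}

int main(void)
{
    uint64_t limit = 0x0FFFFFFFULL;   /* assumed top of installed memory */
    uint64_t addrs[] = { 0x00001000ULL, 0x10000000ULL };

    for (int i = 0; i < 2; i++) {
        if (may_speculate_load(addrs[i], limit))
            printf("0x%llx: safe to execute speculatively\n",
                   (unsigned long long)addrs[i]);
        else
            printf("0x%llx: hold until no longer speculative\n",
                   (unsigned long long)addrs[i]);
    }
    return 0;
}
```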
20090019264ADAPTIVE EXECUTION CYCLE CONTROL METHOD FOR ENHANCED INSTRUCTION THROUGHPUT - A method, system and processor for increasing the instruction throughput in a processor executing longer latency instructions within the instruction pipeline. Logic associated with specific stages of the execution pipeline, responsible for executing the particular type of instructions, determines when at least a threshold number of the particular-type instructions is scheduled to be executed. The logic then automatically changes an execution cycle frequency of the specific pipeline stages from a first cycle frequency to a second, pre-established higher cycle frequency, which enables more efficient execution and higher execution throughput of the particular-type instructions. The cycle frequency of only the one or more functional stages are switched to the higher cycle frequency independent of the cycle frequency of the other functional stages in the processor pipeline. The logic also automatically switches the execution cycle frequency of the specific pipeline stages back from the second, higher cycle frequency to the first cycle frequency, when the number of scheduled first-type instructions has completed execution.01-15-2009
20120246450REGISTER FILE SEGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.09-27-2012
20120246449METHOD AND APPARATUS FOR EFFICIENT LOOP INSTRUCTION EXECUTION USING BIT VECTOR SCANNING - A method, apparatus and computer program product for performing efficient loop instruction execution using bit vector scanning is presented. A bit vector is scanned, each bit in the bit vector representing at least one of a feature and a conditional status. The presence of a bit of said bit vector set to a first state is detected. The bit is set to a second state. An instruction address for a routine corresponding to said bit set to a first state is looked up using a bit position of said bit that was set to a first state. The routine is executed. The scanning, said detecting, said setting and said using are repeated until there are no remaining bits of said bit vector set to said first state.09-27-2012
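The loop described here scans a bit vector, clears each set bit it finds, dispatches to the routine selected by that bit's position, and repeats until no set bits remain. A short illustrative sketch in C; the routine table and 32-bit vector width are assumptions, not details from the application:

```c
#include <stdint.h>
#include <stdio.h>

typedef void (*routine_fn)(void);

static void feature_a(void) { puts("routine 0"); }
static void feature_b(void) { puts("routine 1"); }
static void feature_c(void) { puts("routine 2"); }

/* One routine per bit position; each bit represents a feature or a
 * conditional status that still needs servicing. */
static routine_fn routines[32] = { feature_a, feature_b, feature_c };

static void scan_and_dispatch(uint32_t bits)
{
    while (bits != 0) {                    /* repeat until no bits remain set  */
        int pos = __builtin_ctz(bits);     /* detect a bit set to the 1 state  */
        bits &= bits - 1;                  /* set that bit back to the 0 state */
        if (routines[pos])                 /* look up the routine by position  */
            routines[pos]();               /* execute it                       */
    }
}

int main(void)
{
    scan_and_dispatch(0x5u);               /* bits 0 and 2 set */
    return 0;
}
```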
20100077183CONDITIONAL DATA-DEPENDENCY RESOLUTION IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating a vector of tracked positions of actual dependencies, where a given tracked position indicates the position of a given actual dependency, and where the given actual dependency occurs when the pair of condition matches one or more criteria. Then, the processor executes the instructions for generating the vector of tracked positions.03-25-2010
20100077182GENERATING STOP INDICATORS BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more stop indicators based on actual dependencies, where a given stop indicator indicates the position of a given actual dependency that can lead to different results when the data elements are processed in parallel than when the data elements are processed sequentially, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more stop indicators.03-25-2010
20130080743ROUND ROBIN PRIORITY SELECTOR - Method and structures for performing round robin priority selection receive an input vector into an input port. The methods and structures group the bits of the input vector into groups of bits and supply the groups of bits to round robin priority selectors. Then, the methods and structures simultaneously identify an individual group priority bit within each group of bits based on the starting bit location, using the round robin priority selectors. The methods and structures also choose, using the group selector, a round robin priority selector based on the starting bit location. The methods and structures then output, from the group selector to a multiplexor, the individual group priority bit of the selected round robin priority selector. Following this the method outputs, from the multiplexor, an output vector having a first value (e.g., 1) only in the individual group priority bit output by the group selector.03-28-2013
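Functionally, a round robin priority selector grants the first request bit found at or after a rotating start position, wrapping around the vector; the grouped circuit above computes that result in parallel rather than by scanning. A behavioural sketch in C of the selection it performs (not of the grouped hardware itself):

```c
#include <stdint.h>
#include <stdio.h>

/* Return a one-hot vector for the first set bit of 'req' found at or after
 * 'start', scanning circularly, or 0 if no bit is set.  This models the
 * selector's output; the grouped design in the application produces the
 * same result by examining fixed-size groups of bits in parallel. */
static uint32_t round_robin_select(uint32_t req, unsigned start)
{
    for (unsigned i = 0; i < 32; i++) {
        unsigned pos = (start + i) % 32;
        if (req & (1u << pos))
            return 1u << pos;
    }
    return 0;
}

int main(void)
{
    uint32_t req = 0x00000109u;                                      /* bits 0, 3, 8 requesting */
    printf("start 1 -> grant 0x%08x\n", round_robin_select(req, 1)); /* grants bit 3 */
    printf("start 9 -> grant 0x%08x\n", round_robin_select(req, 9)); /* wraps, grants bit 0 */
    return 0;
}
```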
20100325395DEPENDENCE PREDICTION IN A MEMORY SYSTEM - Techniques related to dependence prediction for a memory system are generally described. Various implementations may include a predictor storage storing a value corresponding to at least one prediction type associated with at least one load operation, and a state-machine having multiple states. For example, the state-machine may determine whether to execute the load operation based upon a prediction type associated with each of the states and a corresponding precedent to the load operation for the associated prediction type. The state-machine may further determine the prediction type for a subsequent load operation based on a result of the load operation. The states of the state machine may correspond to prediction types, which may be a conservative prediction type, an aggressive prediction type, or one or more N-store prediction types, for example.12-23-2010
20130046961SPECULATIVE MEMORY WRITE IN A PIPELINED PROCESSOR - An apparatus generally having an interface circuit and a processor. The interface circuit may have a queue and a connection to a memory. The processor may have a pipeline. The processor is generally configured to (i) place an address in the queue in response to processing a first instruction in a first stage of the pipeline, (ii) generate a flag by processing a second instruction in a second stage of the pipeline, the second instruction may be processed in the second stage after the first instruction is processed in the first stage, and (iii) generate a signal based on the flag in a third stage of the pipeline. The third stage may be situated in the pipeline after the second stage. The interface circuit is generally configured to cancel the address from the queue without transferring the address to the memory in response to the signal having a disabled value.02-21-2013
20090043994RESOURCE FLOW COMPUTER - A scalable processing system includes a memory device having a plurality of executable program instructions, wherein each of the executable program instructions includes a timetag data field indicative of the nominal sequential order of the associated executable program instructions. The system also includes a plurality of processing elements, which are configured and arranged to receive executable program instructions from the memory device, wherein each of the processing elements executes executable instructions having the highest priority as indicated by the state of the timetag data field.02-12-2009
20090043992Method And System For Data Speculation On Multicore Systems - The method and system for data speculation of multicore systems are disclosed. In one embodiment, a method includes dynamically determining whether a current speculative load instruction and an associated store instruction have the same memory addresses in an application thread in compiled code running on a main core using a dynamic helper thread running on an idle core substantially before encountering the current speculative load instruction. The instruction sequence associated with the current speculative load instruction is then edited by the dynamic helper thread based on the outcome of the determination so that the current speculative load instruction becomes a non-speculative load instruction.02-12-2009
20090043991Scheduling Multithreaded Programming Instructions Based on Dependency Graph - A computer implemented method for scheduling multithreaded programming instructions based on the dependency graph, wherein the dependency graph organizes the programming instructions logically based on blocks, nodes, and super blocks, and wherein programming instructions that could be executed outside of a critical section may be executed outside of the critical section by inserting a dependency relationship in the dependency graph.02-12-2009
20120191953Parallel Execution Unit that Extracts Data Parallelism at Runtime - Mechanisms for extracting data dependencies during runtime are provided. With these mechanisms, a portion of code having a loop is executed. A first parallel execution group is generated for the loop, the group comprising a subset of iterations of the loop less than a total number of iterations of the loop. The first parallel execution group is executed by executing each iteration in parallel. Store data for iterations are stored in corresponding store caches of the processor. Dependency checking logic of the processor determines, for each iteration, whether the iteration has a data dependence. Only the store data for stores where there was no data dependence determined are committed to memory.07-26-2012
20120191952PROCESSOR IMPLEMENTING SCALAR CODE OPTIMIZATION - Methods and apparatuses are provided for increased efficiency and enhanced power saving in a processor via scalar code optimization. The method comprises determining that an instruction comprises a scalar instruction and then processing the instruction using only a lower portion of an XMM register. The apparatus comprises an operational unit capable of determining whether an instruction comprises a scalar instruction and execution units responsive that determining for processing the scalar instruction using only a lower portion of an XMM register of the processor. By not processing the upper portion of the XMM register efficiency is increased and power saving is enhanced.07-26-2012
20090031115METHOD FOR HIGH INTEGRITY AND HIGH AVAILABILITY COMPUTER PROCESSING - A method of providing high integrity checking for an N-lane computer processing module (Module), N being an integer greater than or equal to two. The method comprises the steps of: detecting, by a data Output Management unit (OM), when any of the N processing lanes sends different output data; configuring each Hosted Application as either normal or high integrity; for the Hosted Applications configured as high integrity, running an identical version of the software source code targeted for similar or dissimilar microprocessors on all N processing lanes, and activating a Time Management Unit, Critical Regions Management Unit, data Input Management Unit and data Output Management Unit for each of the N processing lanes; and for the Hosted Applications configured as normal integrity, running a copy of the software on one of the N processing lanes, and not activating the Time Management Unit, Critical Regions Management Unit, Input Management Unit and Output Management Unit for the one activated processing lane while that Hosted Application is running.01-29-2009
20130067201MULTIPROCESSOR SYSTEM, EXECUTION CONTROL METHOD AND EXECUTION CONTROL PROGRAM - The multiprocessor system includes one or a plurality of main processors and a plurality of sub-processors, and an execution control circuit which conducts execution control of each the sub-processors, wherein the execution control circuit includes an execution control processor for execution control processing of each the sub-processors, a control bus output unit for activation of a command to each the sub-processors, a status bus input unit for status notification from each the sub-processors, a determination circuit which determines whether or not the status notification has one-to-one dependency with a processing command to be issued next on an operation sequence and is to be processed at a high speed, a status accelerator which issues a corresponding processing activation command when the status notification is to be processed at a high speed, and a status FIFO control unit which processes the status notification by using the execution control processor.03-14-2013
20090006817Mechanisms for Placing a Processor into a Gradual Slow Mode of Operation - Mechanisms for placing a processor into a gradual slow down mode of operation are provided. The gradual slow down mode of operation comprises a plurality of stages of slow down operation of an issue unit in a processor in which the issuance of instructions is slowed in accordance with a staging scheme. The gradual slow down of the processor allows the processor to break out of livelock conditions. Moreover, since the slow down is gradual, the processor may flexibly avoid various degrees of livelock conditions. The mechanisms of the illustrative embodiments impact the overall processor performance based on the severity of the livelock condition by taking a small performance impact on less severe livelock conditions and only increasing the processor performance impact when the livelock condition is more severe.01-01-2009
20130166886SYSTEMS, APPARATUSES, AND METHODS FOR A HARDWARE AND SOFTWARE SYSTEM TO AUTOMATICALLY DECOMPOSE A PROGRAM TO MULTIPLE PARALLEL THREADS - Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. For example, a method according to one embodiment comprises: analyzing a single-threaded region of executing program code, the analysis including identifying dependencies within the single-threaded region; determining portions of the single-threaded region of executing program code which may be executed in parallel based on the analysis; assigning the portions to two or more parallel execution tracks; and executing the portions in parallel across the assigned execution tracks.06-27-2013
20080294877NUMERICAL CONTROLLER HAVING FUNCTION OF RESUMING LOOK-AHEAD OF BLOCK - A numerical controller which performs look-ahead control by suspending analysis of a read block of a machining program and resuming the analysis of the read block at a suspended stage when execution of a block immediately preceding the read block is completed. The numerical controller successively reads and analyzes blocks of the machining program in advance, stores the analyzed blocks into a buffer, and then executes the stored blocks, and comprises means for determining whether or not a read block contains a look-ahead stop code to suspend analysis of the block, means for suspending the analysis of the block when the look-ahead stop code is determined, means for determining whether or not execution of a block immediately preceding the suspended block is completed, and resuming means for resuming the analysis of the suspended block when the execution of the block immediately preceding the suspended block is completed.11-27-2008
20110035573OUT-OF-ORDER X86 MICROPROCESSOR WITH FAST SHIFT-BY-ZERO HANDLING - An out-of-order execution microprocessor includes a register alias table configured to generate a first indicator that indicates whether an instruction is dependent upon a condition code result of a shift instruction. The microprocessor also includes a first execution unit configured to execute the shift instruction and to generate a second indicator that indicates whether a shift amount of the shift instruction is zero. The microprocessor also includes a second execution unit configured to receive the first and second indicators and to generate a replay signal to cause the instruction to be replayed if the first indicator indicates the instruction is dependent upon the condition code result of the shift instruction and a second indicator indicates the shift amount of the shift instruction is zero.02-10-2011
20100274995PROCESSOR AND METHOD OF CONTROLLING INSTRUCTION ISSUE IN PROCESSOR - One exemplary embodiment includes a processor including a plurality of execution units and an instruction unit. The instruction unit discriminates whether an instruction is a target instruction for which determination about availability of parallel issue based on dependency among instructions is to be made with respect to each instruction contained in an instruction stream. When a first instruction contained in the instruction stream is the target instruction, the instruction unit adjusts the number of instructions to be issued in parallel to the plurality of execution units based on a detection result of dependency among the first instruction and at least one subsequent instruction. Further, when the first instruction is not the target instruction, the instruction unit issues a group of a predetermined fixed number of instructions including the first instruction in parallel to the plurality of execution units unconditionally regardless of a detection result of dependency among the instruction group.10-28-2010
20110167244EARLY INSTRUCTION TEXT BASED OPERAND STORE COMPARE REJECT AVOIDANCE - A method and system for early instruction text based operand store compare avoidance in a processor are provided. The system includes a processor pipeline for processing instruction text in an instruction stream, where the instruction text includes operand address information. The system also includes delay logic to monitor the instruction stream. The delay logic performs a method that includes detecting a load instruction following a store instruction in the instruction stream, comparing the operand address information of the store instruction with the load instruction. The method also includes delaying the load instruction in the processor pipeline in response to detecting a common field value between the operand address information of the store instruction and the load instruction.07-07-2011
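The delay decision rests on comparing operand address information carried in the instruction text of an older store and a younger load, before the full addresses are even computed. A hedged sketch in C of such a comparison; the particular fields compared (base register, index register, displacement) are assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative operand-address fields taken from instruction text; which
 * fields are compared is an assumption, not the application's exact choice. */
struct operand_addr_info {
    uint8_t  base_reg;       /* base register number  */
    uint8_t  index_reg;      /* index register number */
    uint16_t displacement;   /* displacement bits     */
};

/* A younger load is delayed when its operand address information shares
 * common field values with an older store still in the pipeline, since
 * executing it early would likely end in an operand-store-compare reject. */
static bool should_delay_load(const struct operand_addr_info *st,
                              const struct operand_addr_info *ld)
{
    return st->base_reg == ld->base_reg &&
           st->index_reg == ld->index_reg &&
           st->displacement == ld->displacement;
}

int main(void)
{
    struct operand_addr_info store_op = { 5, 0, 0x40 };
    struct operand_addr_info load_op  = { 5, 0, 0x40 };

    printf("delay load: %s\n",
           should_delay_load(&store_op, &load_op) ? "yes" : "no");
    return 0;
}
```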
20110296144REDUCING DATA HAZARDS IN PIPELINED PROCESSORS TO PROVIDE HIGH PROCESSOR UTILIZATION - A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.12-01-2011
20080270762Method, System and Computer Program Product for Register Management in a Simulation Environment - A method for register management in a simulation environment including receiving an instruction from an instruction unit decode pipeline. An address generation interlock (AGI) function is executed in the simulation environment if the instruction is an AGI instruction. The executing an AGI function is responsive to a pool of registers controlled by a register manager and to the instruction. An early AGI function is executed in the simulation environment if the instruction is an early AGI instruction. The executing an early AGI function is responsive to the pool of registers and to the instruction.10-30-2008
20110219215ATOMICITY: A MULTI-PRONGED APPROACH - In a multiprocessor system with speculative execution, atomicity can be approached in several fashions. One approach is to have atomic instructions that achieve multiple functions and are guaranteed to complete. Another approach is to have blocks of code that are grouped to succeed or fail together. A system can incorporate more than one such approach. In implementing more than one approach, the system may prioritize one over another. When conflict detection is done through a directory lookup in cache memory, atomic instructions and atomicity related operations may be implemented in a cache data array access pipeline in that cache memory. This implementation may include feedback to the pipeline for implementing multiple functions within an atomic instruction and also for cascading atomic instructions.09-08-2011
20100169617POWER EFFICIENT SYSTEM FOR RECOVERING AN ARCHITECTURE REGISTER MAPPING TABLE - A system for recovering an architecture register mapping table (ARMT). The system includes a first number of collection circuits and decode circuits, a second number of selection circuits, and an enable circuit. Information related to the mapping between each physical register and an appropriate architecture register is obtained from a physical register mapping table (PRMT) by one and only one collection circuit during only one of a fourth number of instruction cycles. Each decode circuit has its input coupled to the output of one different collection circuit and is capable of converting its input into a third number bit wide binary string selection code at its output. Each selection circuit is configured to receive from each selection code a bit from a bit position associated with that selection circuit. The enable circuit is configured to appropriately enable mapping of information from the PRMT to the ARMT.07-01-2010
20100169616REDUCING INSTRUCTION COLLISIONS IN A PROCESSOR - An embodiment of a technique for selecting instructions for execution from an issue queue at multiple function units while reducing the chances of instruction collisions. Each function unit in a processor may include a selection logic circuit that selects a specific instruction from the issue queue for execution. In order to avoid instruction collision, a function unit may have a selection logic circuit that may select two instructions from an instruction queue: one according to a first selection technique and one according to a second selection technique. Then, by comparing the instruction selected by the first selection technique to the instruction selected by the selection logic circuit of another function unit, the instruction selected by the second technique may be used instead if there will be an instruction collision because the instruction selected by the first selection technique is the same as the instruction selected at a different function unit.07-01-2010
20090210674System and Method for Prioritizing Branch Instructions - The present invention provides a system and method for prioritizing branch instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one branch instruction is in the issue group, if so scheduling the at least one branch instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one branch instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20100122069Macroscalar Processor Architecture - A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.05-13-2010
20090210671System and Method for Prioritizing Store Instructions - The present invention provides a system and method for prioritizing store instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one store instruction is in the issue group, if so scheduling the at least one store instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one store instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090276608MICRO PROCESSOR, METHOD FOR ENCODING BIT VECTOR, AND METHOD FOR GENERATING BIT VECTOR - In a microprocessor for pipeline processing instruction execution, dependency relationship information representing a dependency relationship of each of a plurality of instructions with all the preceding instructions is stored, and whether or not the instructions in stages after instruction issue depend on the instruction of a miss speculation is judged based on the dependency relationship information if the miss speculation occurs during the execution of the plurality of instructions in accordance with a set schedule. Thus, this microprocessor can perform a recovery processing for invalidating only the instructions in a dependency relationship at once in the case of a miss speculation in speculative scheduling.11-05-2009
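The dependency relationship information can be pictured as one bit vector per in-flight instruction, with bit j set when the instruction depends, directly or through a chain, on preceding instruction j; recovery from a mis-speculation is then a single pass over those vectors. A small sketch in C assuming a 64-entry instruction window:

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW 64

/* dep[i] has bit j set when in-flight instruction i depends on preceding
 * in-flight instruction j, directly or through a chain of instructions. */
static uint64_t dep[WINDOW];

/* When instruction 'bad' turns out to be a mis-speculation, every
 * instruction whose vector has bit 'bad' set is invalidated at once. */
static uint64_t squash_mask(unsigned bad)
{
    uint64_t mask = 1ull << bad;     /* the mis-speculated instruction itself */
    for (unsigned i = bad + 1; i < WINDOW; i++)
        if (dep[i] & (1ull << bad))
            mask |= 1ull << i;
    return mask;
}

int main(void)
{
    /* Toy schedule: instruction 2 depends on 0, instruction 3 depends on 2
     * (and therefore transitively on 0, which its vector also records). */
    dep[2] = 1ull << 0;
    dep[3] = (1ull << 2) | (1ull << 0);

    printf("squash after mis-speculating 0: 0x%016llx\n",
           (unsigned long long)squash_mask(0));   /* instructions 0, 2, 3 */
    return 0;
}
```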
20080276075SIMPLE LOAD AND STORE DISAMBIGUATION AND SCHEDULING AT PREDECODE - Embodiments of the invention provide a method and processor for executing instructions. In one embodiment, the method includes receiving a load instruction and a store instruction to be executed in the processor and detecting a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The method further includes scheduling execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.11-06-2008
20080276074SIMPLE LOAD AND STORE DISAMBIGUATION AND SCHEDULING AT PREDECODE - Embodiments of the invention provide a processor for executing instructions. In one embodiment, the processor includes circuitry to receive a load instruction and a store instruction to be executed in the processor and detect a conflict between the load instruction and the store instruction. Detecting the conflict includes determining if load-store conflict information indicates that the load instruction previously conflicted with the store instruction. The load-store conflict information is stored for both the load instruction and the store instruction. The processor further includes circuitry to schedule execution of the load instruction and the store instruction so that execution of the load instruction and the store instruction do not result in a conflict.11-06-2008
20090055631Method And Apparatus For Register Renaming Using Multiple Physical Register Files And Avoiding Associative Search - A method for implementing a register renaming scheme for a digital data processor using a plurality of physical register files for supporting out-of-order execution of a plurality of instructions from one or more threads, the method comprising: using a DEF table to store the instruction dependencies between the plurality of instructions using the instruction tags, the DEF table being indexed by a logical register name and including one entry per logical register; using a rename USE table indexed by the instruction tags to store logical-to-physical register mapping information shared by multiple sets of different types of non-architected copies of logical registers used by multiple threads; using a last USE table to transfer data of the multiple sets of different types of non-architected copies of logical registers into the first set of architected registered files, the last USE table being indexed by a physical register name in the second set of rename registered files; and performing the register renaming scheme at the instruction dispatch or wake-up/issue time.02-26-2009
20090265527Multiport Execution Target Delay Queue Fifo Array - One embodiment provides a method of forwarding data in a processor. The method generally includes providing at least one cascaded delayed execution pipeline unit having at least a first pipeline and a second pipeline for executing first and second instructions in a common issue group, wherein the second pipeline executes the second instruction in a delayed manner relative to the execution of the first instruction in the first pipeline, storing results generated by an execution unit of the first pipeline in a first-in first-out (FIFO) storage target delay queue, determining if the target delay queue contains source data for executing the second instruction, and if the target delay queue contains source data for the second instruction, forwarding the source data for the second instruction from the target delay queue to an execution unit of the second pipeline.10-22-2009
20080313434Rendering Processing Apparatus, Parallel Processing Apparatus, and Exclusive Control Method12-18-2008
20090043993Monitoring Values of Signals within an Integrated Circuit - An integrated circuit, and method of reviewing values of one or more signals occurring within that integrated circuit, are provided. The integrated circuit comprises processing logic for executing a program, and monitoring logic for reviewing values of one or more signals occurring within the integrated circuit as a result of execution of the program. The monitoring logic stores configuration data, which can be software programmed in relation to the signals to be monitored. Further, the monitoring logic makes use of a Bloom filter which, for a value to be reviewed, performs a hash operation on that value in order to reference the configuration data to determine whether that value is either definitely not a value within the range or is potentially a value within the range of values. If the value is determined to be within the set of values, then a trigger signal is generated which can be used to trigger a further monitoring process.02-12-2009
20090089552Dependency Graph Parameter Scoping - A number of tasks are defined according to a dependency graph. Multiple parameter contexts are maintained, each associated with a different scope of the tasks. A parameter used in a first of the tasks is bound to a value. This binding includes identifying a first of the contexts according to the dependency graph and retrieving the value for the parameter from the identified context.04-02-2009
20090182988Compare Relative Long Facility and Instructions Therefore - A method, system and program product for comparing two operands wherein one operand is obtained from memory wherein the address of the memory operand is based an offset of the program counter rather than an explicitly defined address location. The offset is defined by an immediate field of the instruction which is sign extended and is aligned as a halfword address when added to the value of the program counter.07-16-2009
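The operand address computation amounts to program-counter-relative addressing with a halfword-scaled offset: the immediate field is sign extended, doubled so that it selects a halfword boundary, and added to the program counter. A tiny sketch in C; the 32-bit immediate width is an assumption:

```c
#include <stdint.h>
#include <stdio.h>

/* Memory operand address = program counter + sign-extended immediate,
 * where the immediate counts halfwords (2-byte units). */
static uint64_t relative_long_address(uint64_t pc, int32_t imm_halfwords)
{
    return pc + 2 * (int64_t)imm_halfwords;
}

int main(void)
{
    uint64_t pc = 0x0000000000401000ULL;
    printf("forward:  0x%llx\n",
           (unsigned long long)relative_long_address(pc, 0x100));  /* pc + 0x200 */
    printf("backward: 0x%llx\n",
           (unsigned long long)relative_long_address(pc, -0x80));  /* pc - 0x100 */
    return 0;
}
```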
20090138681SYNCHRONIZATION OF PARALLEL PROCESSES - A speculative execution capability of a processor is exposed to program control through at least one machine instruction. The at least one machine instruction may be two instructions designed to facilitate synchronization between parallel processes. According to an aspect, an instruction set architecture includes circuitry that handles a speculative execution instruction and a speculation termination instruction. The speculative execution instruction may be an instruction that takes first and second operands, causes the processor to speculatively execute additional instructions if a memory location contains a value, and causes the processor to start executing instructions from an address indicated by the second operand if a mis-speculation occurs, and the speculation termination instruction may be an instruction that causes the processor to begin retiring the additional instructions.05-28-2009
20090198969Microprocessor systems - A microprocessor pipeline arrangement 08-06-2009
20110145550NON-QUIESCING KEY SETTING FACILITY - A non-quiescing key setting facility is provided that enables manipulation of storage keys to be performed without quiescing operations of other processors of a multiprocessor system. With this facility, a storage key, which is accessible by a plurality of processors of the multiprocessor system, is updated absent a quiesce of operations of the plurality of processors. Since the storage key is updated absent quiescing of other operations, the storage key may be observed by a processor as having one value at the start of an operation performed by the processor and a second value at the end of the operation. A mechanism is provided to enable the operation to continue, avoiding a fatal exception.06-16-2011
20120144167APPARATUS FOR EXECUTING PROGRAMS FOR A FIRST COMPUTER ARCHITECTURE ON A COMPUTER OF A SECOND ARCHITECTURE - A multi-instruction set architecture (ISA) computer system includes a computer program, a first processor, a second processor, a profiler, and a translator. The computer program includes instructions of a first ISA, the first ISA having a first complexity. The first processor is configured to execute instructions of the first ISA. The second processor is configured to execute instructions of a second ISA, the second ISA being different than the first ISA and having a second complexity, wherein the second complexity is less than the first complexity. The profiler is configured to select a block of the computer program for translation to instructions of the second ISA, wherein the block includes one or more instructions of the first ISA. The translator is configured to translate the block of the first ISA into instructions of the second ISA for execution by the second processor.06-07-2012
20090063823Method and System for Tracking Instruction Dependency in an Out-of-Order Processor - A method of tracking instruction dependency in a processor issuing instructions speculatively includes recording in an instruction dependency array (IDA) an entry for each instruction that indicates data dependencies, if any, upon other active instructions. An output vector read out from the IDA indicates data readiness based upon which instructions have previously been selected for issue. The output vector is used to select and read out issue-ready instructions from an instruction buffer.03-05-2009
20090198968Method, Apparatus and Software for Processing Software for Use in a Multithreaded Processing Environment - A method, apparatus and software for processing software code for use in a multithreaded processing environment in which lock verification mechanisms are automatically inserted in the software code and arranged to determine whether a respective shared storage element is locked prior to the use of the respective shared storage element by a given processing thread in a multithreaded processing environment.08-06-2009
20090198967METHOD AND STRUCTURE FOR LOW LATENCY LOAD-TAGGED POINTER INSTRUCTION FOR COMPUTER MICROARCHITECHTURE - A methodology and implementation of a load-tagged pointer instruction for RISC based microarchitecture is presented. A first lower latency, speculative implementation reduces overall throughput latency for a microprocessor system by estimating the results of a particular instruction and confirming the integrity of the estimate a little slower than the normal instruction execution latency. A second higher latency, non-speculative implementation that always produces correct results is invoked by the first when the first guesses incorrectly. The methodologies and structures disclosed herein are intended to be combined with predictive techniques for instruction processing to ultimately improve processing throughput.08-06-2009
20090210668System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if a plurality of load instructions are in the issue group, if so, schedule the plurality of load instructions in descending order of longest dependency chain depth to shortest dependency chain depth, from the shortest to the longest available execution pipelines; and (3) execute the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
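Read as a scheduling rule, this is a sort: order the load instructions of an issue group by decreasing dependency chain depth and assign them to execution pipelines ordered from least delayed to most delayed. A compact sketch in C under that reading, with all structure names invented for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

struct load_slot {
    int id;           /* position of the load within the issue group        */
    int chain_depth;  /* depth of the dependency chain rooted at this load  */
};

/* Sort loads so the deepest dependency chain comes first. */
static int by_depth_desc(const void *a, const void *b)
{
    const struct load_slot *x = a, *y = b;
    return y->chain_depth - x->chain_depth;
}

int main(void)
{
    /* Pipelines indexed 0..2, where pipeline 0 executes with the least
     * delay in the cascaded arrangement and pipeline 2 with the most. */
    struct load_slot loads[] = { {0, 1}, {1, 4}, {2, 2} };
    size_t n = sizeof loads / sizeof loads[0];

    qsort(loads, n, sizeof loads[0], by_depth_desc);

    for (size_t p = 0; p < n; p++)
        printf("load %d (chain depth %d) -> pipeline %zu\n",
               loads[p].id, loads[p].chain_depth, p);
    return 0;
}
```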
20090210673System and Method for Prioritizing Compare Instructions - The present invention provides a system and method for prioritizing compare instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one compare instruction is in the issue group, if so scheduling the at least one compare instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one compare instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090210670System and Method for Prioritizing Arithmetic Instructions - The present invention provides a system and method for prioritizing arithmetic instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one arithmetic instruction is in the issue group, if so scheduling the at least one arithmetic instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; (3) determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by scheduling the at least one arithmetic instruction in a different execution pipeline; (4) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090210669System and Method for Prioritizing Floating-Point Instructions - The present invention provides a system and method for prioritizing floating-point instructions in a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (08-20-2009
20090210672System and Method for Resolving Issue Conflicts of Load Instructions - The present invention provides a system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: receive an issue group of instructions; determine if at least one load instruction is in the issue group, if so scheduling the at least one load instruction in one of the plurality of execution pipelines based upon a first prioritization scheme; determine if there is an issue conflict for one of the plurality of execution pipelines and resolving the issue conflict by determining which load involved in the issue conflict has the closest dependency, and schedule any load involved in the issue conflict not having the closest dependency in a delayed execution pipeline.08-20-2009
20090259827SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR CREATING DEPENDENCIES AMONGST INSTRUCTIONS USING TAGS - A system, method, and computer program product are provided for creating dependencies amongst instructions using tags. In operation, tags are associated with a first instruction and a second instruction. Additionally, a dependency is created between the first instruction and the second instruction, utilizing the tags. Furthermore, the first instruction and the second instruction are executed in accordance with the dependency.10-15-2009
20100217959INFORMATION PROCESSING APPARATUS, VIRTUAL STORAGE MANAGEMENT METHOD, AND STORAGE MEDIUM - A virtual storage management method that can increase the overall processing speed while preventing a processor from being overloaded. A request for acquisition of a memory area in a primary storage device is received from a process executed by a processor. It is determined whether or not the process that has made the acquisition request is a utility process executable in cooperation with another process. Control is provided so as to restrict swap-out of the utility process when it is determined that the process that has made the received acquisition request is a utility process executable in cooperation with another process, and a process cooperating with the utility process executed by the processor is a preferred process of which swap-out is restricted, and a processor utilization of the preferred process is greater than a predetermined value.08-26-2010
20100250902Tracking Deallocated Load Instructions Using a Dependence Matrix - A mechanism is provided for tracking deallocated load instructions. A processor detects whether a load instruction in a set of instructions in an issue queue has missed. Responsive to a miss of the load instruction, an instruction scheduler allocates the load instruction to a load miss queue and deallocates the load instruction from the issue queue. The instruction scheduler determines whether there is a dependence entry for the load instruction in an issue queue portion of a dependence matrix. Responsive to the existence of the dependence entry for the load instruction in the issue queue portion of the dependence matrix, the instruction scheduler reads data from the dependence entry of the issue queue portion of the dependence matrix that specifies a set of dependent instructions that are dependent on the load instruction and writes the data into a new entry in a load miss queue portion of the dependence matrix.09-30-2010
20090113182System and Method for Issuing Load-Dependent Instructions from an Issue Queue in a Processing Unit - A system and method for issuing load-dependent instructions from an issue queue in a processing unit in a data processing system. In response to a LSU determining that a load request from a load instruction missed a first level in a memory hierarchy, a LMQ allocates a load-miss queue entry corresponding to the load instruction. The LMQ associates at least one instruction dependent on the load request with the load-miss queue entry. Once data associated with the load request is retrieved, the LMQ selects at least one instruction dependent on the load request for execution on the next cycle. At least one instruction dependent on the load request is executed and a result is outputted.04-30-2009
20110119468MECHANISM OF SUPPORTING SUB-COMMUNICATOR COLLECTIVES WITH O(64) COUNTERS AS OPPOSED TO ONE COUNTER FOR EACH SUB-COMMUNICATOR - A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.05-19-2011
20100306508OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing load instruction replay likelihood due to store collisions. A register alias table (RAT) is coupled to first and second queues of entries and generates dependencies used to determine when instructions may execute out of order. The RAT allocates an entry of the first queue and populates the allocated entry with an instruction pointer of a load instruction, when it determines that the load instruction must be replayed. The RAT allocates an entry of the second queue when it encounters a store instruction and populates the allocated entry with a dependency that identifies an instruction upon which the store instruction depends for its data. The RAT causes a subsequent instance of the load instruction to share the dependency when it encounters the subsequent instance of the load instruction and determines that its instruction pointer matches the instruction pointer of an entry of the first queue.12-02-2010
20100306507OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold information that identifies sources of a store instruction used to compute its store address and to hold a dependency that identifies an instruction upon which the store instruction depends for its data. A register alias table (RAT), coupled to the queue of entries, is configured to encounter instructions in program order and to generate dependencies used to determine when the instructions may execute out of program order. In response to encountering a load instruction the RAT determines whether sources of the load instruction used to compute its load address match the sources of the store instruction in an entry of the queue, and if so, causes the load instruction to share the dependency of the matching store instruction.12-02-2010
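In software terms, the queue holds one record per recent store, naming the registers used to form its address and the instruction that produces its data; a later load whose address registers match inherits a dependency on that producer, so it is unlikely to issue before the colliding store's data is ready. A sketch in C of that lookup, with the structure and field names being assumptions:

```c
#include <stdio.h>

#define QUEUE_LEN 4
#define NO_DEP   -1

struct store_entry {
    int base_reg;    /* registers the store uses to compute its address */
    int index_reg;
    int data_dep;    /* instruction that produces the store's data      */
    int valid;
};

static struct store_entry store_q[QUEUE_LEN];

/* For a newly renamed load, return an extra dependency: the data producer
 * of a queued store whose address sources match the load's, or NO_DEP. */
static int extra_load_dependency(int base_reg, int index_reg)
{
    for (int i = 0; i < QUEUE_LEN; i++)
        if (store_q[i].valid &&
            store_q[i].base_reg == base_reg &&
            store_q[i].index_reg == index_reg)
            return store_q[i].data_dep;
    return NO_DEP;
}

int main(void)
{
    /* A store to [r5 + r0] whose data comes from instruction 17. */
    store_q[0] = (struct store_entry){ .base_reg = 5, .index_reg = 0,
                                       .data_dep = 17, .valid = 1 };

    /* A later load from [r5 + r0] inherits a dependency on instruction 17,
     * so it is unlikely to execute ahead of the colliding store's data. */
    printf("extra dependency for load: %d\n", extra_load_dependency(5, 0));
    return 0;
}
```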
20090070560Method and Apparatus for Accelerating the Access of a Multi-Core System to Critical Resources - A method accelerates access of a multi-core system to its critical resources, which includes preparing to delete a critical node in a critical resource, separating the critical node from the critical resource, and deleting the critical node if the conditions for deleting the critical node are satisfied. An apparatus includes a confirmation module for the node to be deleted and a deletion module to accelerate access of a multi-core system to its critical resources.03-12-2009
20100332806DEPENDENCY MATRIX FOR THE DETERMINATION OF LOAD DEPENDENCIES - Systems and methods for identification of dependent instructions on speculative load operations in a processor. A processor allocates entries of a unified pick queue for decoded and renamed instructions. Each entry of a corresponding dependency matrix is configured to store a dependency bit for each other instruction in the pick queue. The processor speculates that loads will hit in the data cache, hit in the TLB and not have a read after write (RAW) hazard. For each unresolved load, the pick queue tracks dependent instructions via dependency vectors based upon the dependency matrix. If a load speculation is found to be incorrect, dependent instructions in the pick queue are reset to allow for subsequent picking, and dependent instructions in flight are canceled. On completion of a load miss, dependent operations are re-issued. On resolution of a TLB miss or RAW hazard, the original load is replayed and dependent operations are issued again from the pick queue.12-30-2010
20100332805Remapping source Registers to aid instruction scheduling within a processor - An out-of-order renaming processor is provided with a register file within which aliasing between registers of different sizes may occur. In this way a program instruction having a source register of a double precision size may alias with two single precision registers being used as destinations of one or more preceding program instructions. In order to track this data dependency the double precision register may be remapped into a micro-operation specifying two single precision registers as its source register. In this way, scheduling circuitry may use its existing hazard detection and management mechanisms to handle potential data hazards and dependencies. Not all program instructions having such data hazards between registers of different sizes are handled by this source register remapping. For these other program instructions a slower mechanism for dealing with the data dependency hazard is provided. This slower mechanism may, for example, be to drain all the preceding micro-operations from the execution pipelines before issuing the micro-operation having the data hazard.12-30-2010
20090210675METHOD AND SYSTEM FOR EARLY INSTRUCTION TEXT BASED OPERAND STORE COMPARE REJECT AVOIDANCE - A method and system for early instruction text based operand store compare avoidance in a processor are provided. The system includes a processor pipeline for processing instruction text in an instruction stream, where the instruction text includes operand address information. The system also includes delay logic to monitor the instruction stream. The delay logic performs a method that includes detecting a load instruction following a store instruction in the instruction stream, comparing the operand address information of the store instruction with the load instruction. The method also includes delaying the load instruction in the processor pipeline in response to detecting a common field value between the operand address information of the store instruction and the load instruction.08-20-2009
20100058033System and Method for Double-Issue Instructions Using a Dependency Matrix and a Side Issue Queue - A method receives a complex instruction comprising a first portion and a second portion. The method sets a single issue queue slot and allocates an execution unit for the complex instruction, and identifies dependencies in the first and second portions. The method sets a dependency matrix slot and a consumers table slot for the first and section portion. In the event the first portion dependencies have been satisfied, the method issues the first portion and then issues the second portion from the single issue queue slot. In the event the second portion dependencies have not been satisfied, the method places the second portion into a side issue queue. The method issues the second portion when the side issue queue indicates that the second portion is eligible for issue.03-04-2010
20100058034CREATING REGISTER DEPENDENCIES TO MODEL HAZARDOUS MEMORY DEPENDENCIES - A method of transforming low-level programming language code written for execution by a target processor includes receiving data comprising a plurality of low-level programming language instructions ordered for sequential execution by the target processor; detecting a pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween; and inserting one or more instructions between the detected pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween. The one or more instructions inserted between the detected pair of instructions create a true data dependency on a value stored in an architectural register of the target processor between the detected pair of instructions.03-04-2010
20100262809System and Method for Conflict Resolution During the Consolidation of Information Relating to a Data Service - A context aware mechanism including a system configured to receive presence data relating to at least one of a presentity and a watcher, and to resolve a conflict that arises during processing of the presence data according to a criteria. A method is also provided.10-14-2010
20080229077COMPUTER PROCESSING SYSTEM EMPLOYING AN INSTRUCTION REORDER BUFFER - A method and a system for operating a plurality of processors that each includes an execution pipeline for processing dependence chains, the method comprising: configuring the plurality of processors to execute the dependence chains on execution pipelines; implementing a Super Re-Order Buffer (SuperROB) in which received instructions are re-ordered after out-of-order execution when at least one of the plurality of processors is in an Instruction Level Parallelism (ILP) mode and at least one of the plurality of processors has a Thread Level Parallelism (TLP) core; detecting an imbalance in a dispatch of instructions of a first dependence chain compared to a dispatch of instructions of a second dependence chain with respect to dependence chain priority; determining a source of the imbalance; and activating the ILP mode when the source of the imbalance has been determined.09-18-2008
20110258415APPARATUS AND METHOD FOR HANDLING DEPENDENCY CONDITIONS - Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency. Embodiments including a compiler that inserts instructions in a generated instruction stream to eliminate dependency conditions are also contemplated.10-20-2011
20120204010NON-QUIESCING KEY SETTING FACILITY - A non-quiescing key setting facility is provided that enables manipulation of storage keys to be performed without quiescing operations of other processors of a multiprocessor system. With this facility, a storage key, which is accessible by a plurality of processors of the multiprocessor system, is updated absent a quiesce of operations of the plurality of processors. Since the storage key is updated absent quiescing of other operations, the storage key may be observed by a processor as having one value at the start of an operation performed by the processor and a second value at the end of the operation. A mechanism is provided to enable the operation to continue, avoiding a fatal exception.08-09-2012
20080201556PROGRAM INSTRUCTION REARRANGEMENT METHODS IN COMPUTER - A program instruction rearrangement method calculates the dependency depth of each instruction of a program based on the dependency between instructions, as determined by register access order, and rearranges the instructions based on the dependency depth. Additionally, the dependency between instructions can be utilized to locate and remove redundant instructions.08-21-2008
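A minimal sketch of the depth idea, under the simplifying assumption that registers are not reused (so only read-after-write dependencies from register access order matter); the (dest, srcs) instruction format is invented for the example.

```python
def dependency_depths(insns):
    """insns: list of (dest_reg, [src_regs]) in program order.
    Depth = length of the longest read-after-write chain ending at the instruction."""
    last_writer = {}                  # register -> index of the producing instruction
    depth = [0] * len(insns)
    for i, (dest, srcs) in enumerate(insns):
        deps = [last_writer[r] for r in srcs if r in last_writer]
        depth[i] = max((depth[j] for j in deps), default=-1) + 1
        last_writer[dest] = i
    return depth

def rearrange_by_depth(insns):
    d = dependency_depths(insns)
    order = sorted(range(len(insns)), key=lambda i: d[i])   # stable sort
    return [insns[i] for i in order]

prog = [("r1", []), ("r2", ["r1"]), ("r3", []), ("r4", ["r2", "r3"])]
print(dependency_depths(prog))   # [0, 1, 0, 2]
print(rearrange_by_depth(prog))  # independent depth-0 instructions float to the front
```

Because depth strictly increases along each read-after-write edge, a stable sort by depth never moves a consumer ahead of its producer; handling write-after-read and write-after-write hazards would need extra bookkeeping omitted here.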
20110119469BALANCING WORKLOAD IN A MULTIPROCESSOR SYSTEM RESPONSIVE TO PROGRAMMABLE ADJUSTMENTS IN A SYNCHRONIZATION INSTRUCTION - In a multiprocessor system with threads running in parallel, workload balancing is facilitated by recognizing a plurality of levels of sub-tasks of a memory synchronization instruction and selectively choosing, for at least one thread, to do less than all of the levels of these sub-tasks in response to the memory synchronization instruction. Which thread waits to synchronize can be impacted by this choice. The programmer can cause a thread expected to be a bottleneck to wait less than other threads. Where one thread is a producer and another thread is a consumer, types of memory synchronization can be adapted to these roles.05-19-2011
20110078417COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS - One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction, the thread contributes to a scan or reduction result and waits to execute any further instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.03-31-2011
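A software emulation of the barrier-plus-reduction idea, assuming an invented worker layout; real hardware fuses the aggregation into the barrier instruction itself, whereas here a Python barrier action stands in for it.

```python
# Sketch: each thread supplies a value, waits at the barrier, and only after
# every thread has arrived does it read the combined (summed) result.
import threading

N = 4
contributions = [0] * N
result = []
barrier = threading.Barrier(N, action=lambda: result.append(sum(contributions)))

def worker(tid):
    contributions[tid] = (tid + 1) * 10   # value supplied by this thread
    barrier.wait()                        # no thread proceeds until all arrive
    print(f"thread {tid} sees reduction result {result[0]}")   # 100 for all threads

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
```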
20110093684TRANSPARENT CONCURRENT ATOMIC EXECUTION - Executing a block of code is disclosed. Executing includes receiving an indication that the block of code is to be executed using a synchronization mechanism and speculatively executing the block of code on a virtual machine. The block of code may include application code. The block of code does not necessarily indicate that the block of code should be speculatively executed.04-21-2011
20100023731Generation of parallelized program based on program dependence graph - A method of generating a parallelized program includes calculating an execution order of vertices of a degenerate program dependence graph, generating basic blocks by consolidating vertices including neither branching nor merging, generating procedures each corresponding to a respective one of the vertices, and generating a procedure control program by arranging an instruction to execute a first procedure after an instruction to wait for output data transfer from a second procedure for a dependence relation crossing a border between the basic blocks, generating an instruction to register a dependence relation that a third procedure has on output data transfer from a fourth procedure for a dependence relation within one of the basic blocks, and generating an instruction to perform a given data transfer directly from procedure to procedure for each of a data transfer within one of the basic blocks and a data transfer crossing a border between the basic blocks.01-28-2010
20100017581LOW OVERHEAD ATOMIC MEMORY OPERATIONS - Embodiments that provide low-overhead restricted memory transactions are disclosed. In accordance with one embodiment, the method includes providing one or more references to processor-specific data that corresponds to a first processor. The method further includes detecting an interrupt to the first processor when the interrupt indicates modification of the one or more references to the processor-specific data during the execution of one or more instructions. The method also includes taking remedial action on the one or more instructions when the interrupt is detected.01-21-2010
20090172360INFORMATION PROCESSING APPARATUS EQUIPPED WITH BRANCH PREDICTION MISS RECOVERY MECHANISM - The information processing apparatus comprises a cache miss detection unit that detects a cache miss of a load instruction, and an instruction issuance stop unit that stops the issuance of an instruction subsequent to a conditional branch instruction if the branch direction of the conditional branch instruction subsequent to the load instruction for which a cache miss has been detected is not established at the time of issuance. As a result, the period of time that would otherwise be spent canceling an issued instruction because of a branch prediction miss is eliminated, and the penalty for the branch prediction miss is concealed under the wait time due to the cache miss.07-02-2009
20090063824Compound instructions in a multi-threaded processor - A multi-threaded processor determines which threads to execute and switches between execution of threads based on that determination. Each thread is coupled to a respective register that stores the state of the thread and is used in executing instructions on the thread, and the processor includes a further register shared by all the threads. The executing threads use the further register to improve execution performance, and the processor prevents switching execution to another thread while the shared register is in use.03-05-2009
20120042152ELIMINATION OF READ-AFTER-WRITE RESOURCE CONFLICTS IN A PIPELINE OF A PROCESSOR - An apparatus having a processor and a circuit is disclosed. The processor generally has a pipeline. The circuit may be configured to (i) detect a first write instruction in the pipeline that writes to a resource, (ii) stall a read instruction in the pipeline where (a) a first read-after-write conflict exists between the first write instruction and the read instruction and (b) no other write instruction to the resource is scheduled between the first write instruction and the read instruction and (iii) not stall the read instruction due to the first read-after-write conflict where a second write instruction to the resource is scheduled between the first write instruction and the read instruction.02-16-2012
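The stall rule above reduces to a simple predicate: a read stalls behind a given write to a resource only if no later write to the same resource sits between them. A hedged Python sketch, with an invented pipeline-window representation:

```python
def stall_due_to(write_idx, read_idx, window, resource):
    """window: list of ('read'|'write', resource) entries in pipeline order.
    The read at read_idx stalls because of the write at write_idx only if no
    other write to the same resource is scheduled between the two."""
    raw_conflict = (window[write_idx] == ("write", resource)
                    and window[read_idx] == ("read", resource))
    between = window[write_idx + 1:read_idx]
    superseded = any(op == "write" and res == resource for op, res in between)
    return raw_conflict and not superseded

window = [("write", "acc"), ("write", "acc"), ("read", "acc")]
print(stall_due_to(0, 2, window, "acc"))   # False: the second write supersedes the first
print(stall_due_to(1, 2, window, "acc"))   # True: the nearest write still conflicts
```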
20120005459PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA MOVE ELIMINATION - Methods and apparatuses are provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The apparatus comprises a first plurality of available physical registers mapped to a second plurality of logical registers, including a source logical register and a destination logical register. A renaming unit remaps the destination logical register to the same physical register mapping as the source logical register in response to a move instruction. In this way, the move instruction is effectively executed without moving data between physical registers. A method is provided for increasing processor performance and energy saving via eliminating physical data movement to accomplish a move instruction. The method comprises determining a mapping of a logical source register and a logical destination register to physical registers of a processor and then remapping the logical destination register to the same physical register mapping as the logical source register to effect an equivalent of the move instruction without actual data movement between physical registers.01-05-2012
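A toy rename-table model of move elimination; the class and register names are invented for the example, and the eager freeing of the old physical register glosses over the sharing/reference tracking a real renaming unit would need.

```python
class RenameTable:
    """Toy logical-to-physical register map used to 'execute' a move by remapping."""
    def __init__(self, logical_regs, free_physical):
        self.map = {r: free_physical.pop(0) for r in logical_regs}
        self.free = free_physical

    def eliminate_move(self, dst_logical, src_logical):
        old_phys = self.map[dst_logical]
        self.map[dst_logical] = self.map[src_logical]   # alias: no data is copied
        self.free.append(old_phys)   # real hardware tracks sharing before reclaiming

rt = RenameTable(["r0", "r1"], free_physical=[10, 11, 12, 13])
rt.eliminate_move("r1", "r0")        # the move "r1 <- r0" completes with no data movement
print(rt.map)                        # {'r0': 10, 'r1': 10}
```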
20120017069OUT-OF-ORDER COMMAND EXECUTION - Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out of order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed.01-19-2012
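A simplified sketch of the reordering policy described above: commands whose dependencies are already satisfied are promoted ahead of commands that are still waiting, with relative order otherwise preserved. The (name, deps) command format and `completed` set are invented for the example.

```python
def reorder(commands, completed):
    """commands: list of (name, deps). completed: set of finished command names."""
    ready   = [c for c in commands if set(c[1]) <= completed]
    waiting = [c for c in commands if set(c[1]) - completed]
    return ready + waiting           # dependency-free commands issue first

stream = [("blit", ["decode"]), ("upload", []), ("decode", [])]
print(reorder(stream, completed=set()))
# [('upload', []), ('decode', []), ('blit', ['decode'])]
```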
20090164760METHOD AND SYSTEM FOR MODULE INITIALIZATION WITH AN ARBITRARY NUMBER OF PHASES - A method for initializing a module that includes identifying a first module for initialization, and performing a plurality of processing phases on the first module and all modules in a dependency graph of the first module. Performing the plurality of processing phases includes, for each module, executing a processing phase of the plurality of processing phases on the module, determining whether the processing phase has been executed on all modules in a dependency graph of the module, and when the processing phase has been executed for all modules in the dependency graph of the module, executing a subsequent processing phase of the plurality of processing phases on the module.06-25-2009
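A small sketch of the phase ordering described above: every module in the first module's dependency graph finishes phase k before any module starts phase k+1. The dependency map and phase callables are invented for the example.

```python
def init_module(root, deps, phases):
    """deps: module -> list of modules it depends on. phases: list of callables."""
    order, seen = [], set()
    def visit(m):                         # collect the dependency graph, deps first
        if m in seen:
            return
        seen.add(m)
        for d in deps.get(m, []):
            visit(d)
        order.append(m)
    visit(root)

    for phase in phases:                  # a later phase starts only after the earlier
        for module in order:              # phase has run on every module in the graph
            phase(module)

deps = {"app": ["lib", "log"], "lib": ["log"]}
phases = [lambda m: print("load", m),
          lambda m: print("link", m),
          lambda m: print("run-init", m)]
init_module("app", deps, phases)
```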
20090164759Execution of Single-Threaded Programs on a Multiprocessor Managed by an Operating System - A method and apparatus for speculatively executing a single threaded program within a multi-core processor which includes identifying an idle core within the multi-core processor, performing a look ahead operation on the single thread instructions to identify speculative instructions within the single thread instructions, and allocating the idle core to execute the speculative instructions.06-25-2009
20120159130MECHANISM FOR CONFLICT DETECTION USING SIMD - A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel.06-21-2012
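One concrete instance of such a conflict check, sketched in scalar Python rather than SIMD: within one vector of gather/scatter indices, flag any lane whose index already appears in an earlier lane, since running those iterations in parallel could violate a read-after-write dependency.

```python
def conflict_mask(indices):
    """For each lane, True if its index duplicates an earlier lane's index."""
    return [any(idx == indices[e] for e in range(lane))
            for lane, idx in enumerate(indices)]

print(conflict_mask([3, 7, 3, 2, 7]))   # [False, False, True, False, True]
```

Lanes flagged True would be handled serially or in a follow-up pass, while the remaining lanes can safely execute in parallel.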
20120124339PROCESSOR CORE SELECTION BASED AT LEAST IN PART UPON AT LEAST ONE INTER-DEPENDENCY - An embodiment may include at least one first process to be executed, at least in part, by circuitry. The at least one first process may select, at least in part, from a plurality of processor cores, one or more processor cores to execute, at least in part, at least one second process. The at least one first process may select, at least in part, the one or more processor cores based at least in part upon whether at least one inter-dependency exists, at least in part, between the at least one second process and at least one third process also to be executed by the one or more processor cores. Many alternatives, variations, and modifications are possible.05-17-2012
20120166770Instruction Execution - A method of executing an instruction set including a first instruction and a second instruction, includes reading the first instruction; determining whether the first instruction is an instruction which is integral with the second instruction; reading the second instruction; if the first instruction is integral with the second instruction, interpreting the operand field of the second instruction to indicate at least one value to be used in conjunction with at least one bit of the first instruction; and if the first instruction is not integral with the second instruction, interpreting the operand field of the second instruction to indicate an entry of a look-up table.06-28-2012
20120166769PROCESSOR HAVING INCREASED PERFORMANCE VIA ELIMINATION OF SERIAL DEPENDENCIES - Methods and apparatuses are provided for achieving increased performance via elimination of serial dependencies in instructions or instruction sequences. The apparatus comprises an operational unit for determining whether an instruction will cause dependencies during completion in an execution unit. Responsive to that determination the instruction is replaced with an alternative instruction for completion in the execution unit. In this way, the alternative instruction is completed without causing dependencies in the execution unit. The method comprises determining that an instruction will cause dependencies during completion in a processor and replacing the instruction with an alternative instruction for completion in the processor.06-28-2012
20120221836Synchronizing Commands and Dependencies in an Asynchronous Command Queue - Provided are techniques for the managing of command queue dependencies and command queue synchronization. Incoming commands are actively tracked through their dependency relationships. Command dependencies may be tracked across multiple lists, including a submission list and a completion list. Each command on the submission list is prepared for processing and ultimately submitted to command processing logic. Command completion processing is performed on each command on the completion list, including but not limited to removing dependencies from pending commands and possibly queuing pending commands for submission to the command processing logic. Also provided as features of a command queue are a standby barrier, an active barrier and a marker. Markers are employed to track commands through the command queue. Standby and active barriers are employed to synchronize and track commands through the command queue.08-30-2012
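A minimal sketch of the dependency bookkeeping described above (barriers and markers omitted): completing a command removes it from the unmet-dependency sets of pending commands, and commands whose sets become empty move onto the submission list. Class and command names are invented for the example.

```python
from collections import deque

class CommandQueue:
    def __init__(self):
        self.pending = {}            # command name -> set of unmet dependencies
        self.submission = deque()    # commands ready for the processing logic

    def enqueue(self, name, deps=()):
        unmet = set(deps) & set(self.pending)   # only still-pending commands block us
        self.pending[name] = unmet
        if not unmet:
            self.submission.append(name)

    def complete(self, name):
        self.pending.pop(name, None)
        for other, unmet in self.pending.items():
            unmet.discard(name)
            if not unmet and other not in self.submission:
                self.submission.append(other)

q = CommandQueue()
q.enqueue("write_a")
q.enqueue("read_a", deps=["write_a"])
print(list(q.submission))          # ['write_a']  ('read_a' is blocked)
q.submission.popleft()             # processing logic picks up 'write_a' ...
q.complete("write_a")              # ... and later reports it complete
print(list(q.submission))          # ['read_a']
```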
20100205408Speculative Region: Hardware Support for Selective Transactional Memory Access Annotation Using Instruction Prefix - A computer system and method is disclosed for executing selectively annotated transactional regions. The system is configured to determine whether an instruction within a plurality of instructions in a transactional region includes a given prefix. The prefix indicates that one or more memory operations performed by the processor to complete the instruction are to be executed as part of an atomic transaction. The atomic transaction can include one or more other memory operations performed by the processor to complete one or more others of the plurality of instructions in the transactional region.08-12-2010
20100205407PIPELINED MICROPROCESSOR WITH FAST NON-SELECTIVE CORRECT CONDITIONAL BRANCH INSTRUCTION RESOLUTION - A microprocessor includes a pipeline of stages for processing instructions and first and second types of conditional branch instruction includable by a program. The microprocessor makes a prediction of conditional branch instructions of the first type and flushes the pipeline of instructions if the prediction is subsequently determined to be incorrect, thereby incurring a branch misprediction penalty related to processing of conditional branch instructions of the first type. The microprocessor always correctly resolves conditional branch instructions of the second type without making a prediction of conditional branch instructions of the second type, thereby avoiding ever incurring a branch misprediction penalty related to processing of conditional branch instructions of the second type.08-12-2010
20120137109METHOD AND APPARATUS FOR PERFORMING STORE-TO-LOAD FORWARDING FROM AN INTERLOCKING STORE USING AN ENHANCED LOAD/STORE UNIT IN A PROCESSOR - A method and a processor load/store unit (LSU) are described for performing store-to-load forwarding (STLF) from an interlocking store. STLF is performed when a starting address of the store and the load do not match, or when a data size of the store is smaller than a data size of the load. The LSU detects a load that interlocks with a store, and determines whether all or only a portion of data bytes needed by the load can be provided by the interlocking store. If it is determined that only a portion of the data bytes needed by the load can be provided by the interlocking store, then that portion of the data bytes is provided by a store data buffer (SDB) and the remaining portion of the data bytes needed by the load is provided by a data cache (DC). Otherwise, the SDB provides all of the data bytes.05-31-2012
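A sketch of the partial-forwarding byte merge described above: bytes covered by the interlocking store come from the store data buffer, and the remaining bytes of the wider load come from the data cache. The address layout and `cache` dictionary are invented for the example.

```python
def forward_load(load_addr, load_size, store_addr, store_bytes, cache):
    """cache: dict of addr -> byte value. Returns the bytes observed by the load."""
    result = []
    for offset in range(load_size):
        addr = load_addr + offset
        if store_addr <= addr < store_addr + len(store_bytes):
            result.append(store_bytes[addr - store_addr])   # forwarded from the SDB
        else:
            result.append(cache[addr])                       # supplied by the data cache
    return bytes(result)

cache = {a: 0xFF for a in range(0x100, 0x108)}
# a 2-byte store to 0x102 interlocks with an 8-byte load from 0x100
print(forward_load(0x100, 8, 0x102, b"\xAB\xCD", cache).hex())
# 'ffffabcdffffffff'
```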
20110185160MULTI-CORE PROCESSOR WITH EXTERNAL INSTRUCTION EXECUTION RATE HEARTBEAT - A method for debugging a multi-core microprocessor includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies an actual execution sequence of the instructions by the plurality of cores relative to one another, commanding a corresponding plurality of instances of a software functional model of the cores to execute the instructions according to the actual execution sequence specified by the heartbeat information to generate simulated results of the execution of the instructions, and comparing the simulated results with actual results of the execution of the instructions to determine whether they match. Each core outputs an instruction execution indicator indicating the number of instructions executed by the core each core clock. A heartbeat generator generates a heartbeat indicator for each core on an external bus that indicates the number of instructions executed by each core during each external bus clock cycle.07-28-2011
20100274994PROCESSOR OPERATING MODE FOR MITIGATING DEPENDENCY CONDITIONS - Various techniques for mitigating dependencies between groups of instructions are disclosed. In one embodiment, such dependencies include “evil twin” conditions, in which a first floating-point instruction has as a destination a first portion of a logical floating-point register (e.g., a single-precision write), and in which a second, subsequent floating-point instruction has as a source the first portion and a second portion of the same logical floating-point register (e.g., a double-precision read). The disclosed techniques may be applicable in a multithreaded processor implementing register renaming. In one embodiment, a processor may enter an operating mode in which detection of evil twin “producers” (e.g., single-precision writes) causes the instruction sequence to be modified to break potential dependencies. Modification of the instruction sequence may continue until one or more exit criteria are reached (e.g., committing a predetermined number of single-precision writes). This operating mode may be employed on a per-thread basis.10-28-2010
20100274993LOGICAL MAP TABLE FOR DETECTING DEPENDENCY CONDITIONS - Techniques and structures are described which allow the detection of certain dependency conditions, including evil twin conditions, during the execution of computer instructions. Information used to detect dependencies may be stored in a logical map table, which may include a content-addressable memory. The logical map table may maintain a logical register to physical register mapping, including entries dedicated to physical registers available as rename registers. In one embodiment, each entry in the logical map table includes a first value usable to indicate whether only a portion of the physical register is valid and whether the physical register includes the most recent update to the logical register being renamed. Use of this first value may allow precise detection of dependency conditions, including evil twin conditions, upon an instruction reading from at least two portions of a logical register having an entry in the logical map table whose first value is set.10-28-2010
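A toy model of the map-table check described above, assuming an invented dictionary layout: each logical register's entry carries a flag saying whether only a single-precision half of it is valid in the mapped physical register, and an evil twin condition is flagged when a later instruction reads both halves of such a register.

```python
map_table = {}   # logical reg -> (physical reg, only_partial_valid)

def rename_single_write(logical, physical):
    map_table[logical] = (physical, True)    # only half the logical register is valid here

def rename_double_write(logical, physical):
    map_table[logical] = (physical, False)

def is_evil_twin_read(logical, reads_both_halves):
    entry = map_table.get(logical)
    return bool(entry and entry[1] and reads_both_halves)

rename_single_write("f2", physical=17)
print(is_evil_twin_read("f2", reads_both_halves=True))   # True -> dependency condition
```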
20100274992APPARATUS AND METHOD FOR HANDLING DEPENDENCY CONDITIONS - Techniques for handling dependency conditions, including evil twin conditions, are disclosed herein. An instruction may designate a source register comprising two portions. The source register may be a double-precision register and its two portions may be single-precision portions, each specified as destinations by two other single-precision instructions. Execution of these two single-precision instructions, especially on a register renaming machine, may result in the appropriate values for the two portions of the source register being stored in different physical locations, which can complicate execution of an instruction stream. In response to detecting a potential dependency, one or more instructions may be inserted in an instruction stream to enable the appropriate values to be stored within one physical double precision register, eliminating an actual or potential evil twin dependency. Embodiments including a compiler that inserts instructions in a generated instruction stream to eliminate dependency conditions are also contemplated.10-28-2010
20120185676Processing apparatus, trace unit and diagnostic apparatus - A processing circuit 07-19-2012
20120185675APPARATUS AND METHOD FOR COMPRESSING TRACE DATA - An apparatus and method for compressing trace data is provided. The apparatus includes a detection unit configured to detect trace data corresponding to one or more function units performing a substantially significant operation in a reconfigurable processor as valid trace data, and a compression unit configured to compress the valid trace data.07-19-2012
20120260071CONDITIONAL ALU INSTRUCTION CONDITION SATISFACTION PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - An architectural instruction instructs a microprocessor to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result, determines whether the architectural condition flags satisfy the condition, and updates a non-architectural indicator to indicate whether the architectural condition flags satisfy the condition. To execute the second microinstruction, if the non-architectural indicator updated by the first microinstruction indicates that the architectural condition flags satisfy the condition, the execution pipeline updates the destination register with the result; otherwise, it updates the destination register with the current value of the destination register.10-11-2012
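A functional sketch of why the split helps: each micro-op needs at most two register reads (two sources for the first, the old destination plus the propagated indicator for the second), rather than three reads for a single conditional ALU op. The Python model below uses an invented register/flag layout and collapses the two micro-ops into one function for brevity.

```python
def exec_conditional_add(regs, flags, dst, src1, src2, condition):
    # micro-op 1: read the two sources, compute, record whether the condition holds
    result = regs[src1] + regs[src2]
    condition_met = condition(flags)            # the non-architectural indicator
    # micro-op 2: read the (old) destination and the indicator, pick the final value
    regs[dst] = result if condition_met else regs[dst]

regs = {"r0": 7, "r1": 5, "r2": 100}
exec_conditional_add(regs, {"Z": False}, "r2", "r0", "r1", lambda f: f["Z"])
print(regs["r2"])    # 100: condition not met, destination keeps its old value
```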
20090055630Program Processing Device, Parallel Processing Program, Program Processing Method, Parallel Processing Compiler, Recording Medium Containing The Parallel Processing Compiler, And Multi-Processor System - In a multi-processor system for performing parallel processing, each of a plurality of processors includes a communication processing unit for performing control between the processors in a data flow machine-type data-driven control method; and a program processing unit for performing control in each processor in a Neumann-type program-driven control method. The communication processing unit performs a communication between the processors in synchronization with the program processing unit, and has a function of detecting a communication data hazard between the processors. The program processing unit performs a processing based on an execution code stored in a local memory, and has a function of executing or suspending the execution code, according to a result of detecting the data hazard.02-26-2009
20090019265ADAPTIVE EXECUTION FREQUENCY CONTROL METHOD FOR ENHANCED INSTRUCTION THROUGHPUT - A method, system and processor for adaptively and selectively controlling the instruction execution frequency of a data processor. Processing logic or a software compiler determines when a number of first-type instructions, requiring longer execution latency, are scheduled to be executed. The logic/compiler then triggers the CPM unit to automatically switch the execution frequency of the instruction processor from a first frequency that is optimal for processing regular-type instructions to a second, pre-established lower frequency that is optimal for processing the first-type instructions, to enable more efficient execution and higher execution throughput of the number of first-type operations within the processor. When the first-type instructions have completed execution, the processor's instruction execution frequency is returned to the first optimal frequency.01-15-2009
20120278594PERFORMANCE BOTTLENECK IDENTIFICATION TOOL - A computer program product for identifying bottlenecks includes a computer readable storage medium with stored computer readable program instructions. The computer readable program instructions, when executed, provide a data collector module, a mapper module, and an analyzer module that are collectively configured to read mapped data and configuration files, and identify, based upon the mapped data and the configuration files, an undesirable bottleneck condition that causes a computer program to run inefficiently. A method includes reading a configuration file that includes data regarding processor components, and collecting data from hardware activity counters based upon the configuration file. The method also includes mapping the collected data to corresponding sections of code of a computer program, reading the mapped data and the configuration file, and identifying, based upon the reading of the mapped data and the configuration file, an undesirable bottleneck condition that causes the processor to run the computer program inefficiently.11-01-2012
20080256339Techniques for Tracing Processes in a Multi-Threaded Processor - A technique for tracing processes executing in a multi-threaded processor includes forming a trace message that includes a virtual core identification (VCID) that identifies an associated thread. The trace message, including the VCID, is then transmitted to a debug tool.10-16-2008
20080244233MACHINE CLUSTER TOPOLOGY REPRESENTATION FOR AUTOMATED TESTING - Software (such as server products) operating in a complex networked environment often runs on multi-machine installations that are known as machine clusters. A server product can be tested on a server machine type. The server product can be tested by tracking the constituent machines of a machine cluster, and configuring and recording the roles that each machine in the machine cluster plays. Scenarios targeting a single server machine-type can be seamlessly mapped from the single machine scenario to a machine cluster of any number of machines, while handling actions such as executing tests and gathering log files from all machines of a machine cluster as a unit.10-02-2008
20130103930DATA PROCESSING DEVICE AND METHOD, AND PROCESSOR UNIT OF SAME - A processor unit 04-25-2013
20130145129REGISTER RENAMING DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING REGISTER RENAMING - A data processing apparatus and method are provided. A processor performs data processing operations in response to data processing instructions which reference logical registers. A set of physical registers stores data values which are subjected to the data processing operations. A tag storage stores for each physical register a tag value indicative of one of the logical registers. The processor references the tag storage to perform the data processing operations. A tag value exchanger performs a tag switch exchanging two tag values in the tag storage when the processor executes a predetermined instruction which references two logical registers and for which a choice of which two physical registers are mapped to which of the two logical registers will have no effect on an outcome of the data processing operations. The tag value exchanger performs the tag switch with respect to the tag values indicative of the two logical registers.06-06-2013
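A minimal sketch of the tag switch, assuming an invented physical-to-logical tag table: when an instruction only exchanges which logical names refer to which values (and the choice of physical register is otherwise irrelevant), swapping the two tags achieves the architectural effect without moving any data.

```python
def tag_switch(tags, phys_a, phys_b):
    """tags: physical register index -> logical register tag."""
    tags[phys_a], tags[phys_b] = tags[phys_b], tags[phys_a]

tags = {0: "r3", 1: "r7"}        # physical 0 currently holds logical r3, etc.
tag_switch(tags, 0, 1)           # e.g. a register-swap instruction handled by renaming alone
print(tags)                      # {0: 'r7', 1: 'r3'}
```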
20100306506MICROPROCESSOR WITH SELECTIVE OUT-OF-ORDER BRANCH EXECUTION - A pipelined out-of-order execution in-order retire microprocessor includes a branch predictor that predicts a target address of a branch instruction, a fetch unit that fetches instructions at the predicted target address, and an execution unit that: resolves a target address of the branch instruction and detects that the predicted and resolved target addresses are different; determines whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction, in response to detecting that the predicted and resolved target addresses are different; executes the branch instruction by flushing instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address, if there is not an unretired instruction that must be corrected and that is older in program order than the branch instruction; and otherwise, refrains from executing the branch instruction.12-02-2010
20130191617COMPUTER SYSTEM, COMPUTER SYSTEM CONTROL METHOD, COMPUTER SYSTEM CONTROL PROGRAM, AND INTEGRATED CIRCUIT - A computer system includes a memory having a secure area and a plurality of processors using the memory. When an access-allowed program unit executed by one of the processors starts an access to the secure area, the program unit subject to execution by the other processors is limited to the access-allowed program unit.07-25-2013
20120290818Split Scheduler - In an embodiment, a scheduler implements a first dependency array that tracks dependencies on instruction operations (ops) within a distance N of a given op and which are short execution latency ops. Other dependencies are tracked in a second dependency array. The first dependency array may evaluate quickly, to support back-to-back issuance of short execution latency ops and their dependent ops. The second array may evaluate more slowly than the first dependency array.11-15-2012
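A sketch of how such a split might classify dependencies, under invented assumptions: dependencies on short-latency producers within a distance N of the consumer go to the small fast array (supporting back-to-back issue), and everything else goes to the slower array.

```python
N = 4  # hypothetical window for the fast dependency array

def split_dependencies(ops):
    """ops: list of (index, producer_index_or_None, producer_is_short_latency)."""
    fast, slow = [], []
    for idx, producer, short in ops:
        if producer is None:
            continue
        if short and idx - producer <= N:
            fast.append((idx, producer))   # evaluated quickly for back-to-back issue
        else:
            slow.append((idx, producer))   # tracked in the larger, slower array
    return fast, slow

ops = [(0, None, False), (1, 0, True), (7, 0, True), (8, 7, False)]
print(split_dependencies(ops))
# ([(1, 0)], [(7, 0), (8, 7)])
```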
