Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Conditional branching

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)


712233000 - Branching (e.g., delayed branch, loop control, branch predict, interrupt)

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application numberDescriptionNumber of patent applications / Date published
712239000 Branch prediction 104
712237000 Prefetching a branch target (i.e., look ahead) 30
712236000 Evaluation of multiple conditions or multiway branching 4
20090249047METHOD AND SYSTEM FOR RELATIVE MULTIPLE-TARGET BRANCH INSTRUCTION EXECUTION IN A PROCESSOR - A method and system for relative multiple-target branch instruction execution in a processor is provided. One implementation involves receiving an instruction for execution; determining a next instruction to execute based on multiple condition bits or outcomes of a comparison by the current instruction; obtaining a specified instruction offset in the current instruction; and using the offset as the basis for multiple instruction targets based on said outcomes, wherein the number of conditional branches is reduced.10-01-2009
20090070567EFFICIENT IMPLEMENTATION OF BRANCH INTENSIVE ALGORITHMS IN VLIW AND SUPERSCALAR PROCESSORS - An apparatus for implementing branch intensive algorithms is disclosed. The apparatus includes a processor containing a plurality of ALUs and a plurality of result registers. Each result register has a guard input which allows the ALU to write a result to the register upon receipt of a selection signal at the guard input. A lookup table is dynamically programmed with logic to implement an upcoming branching portion of program code. Upon evaluation of the branch conditions of the branching portion of code, the lookup table outputs a selection signal for writing the correct results of the branching portion of code based on the evaluation of the branch condition statements and the truth table programmed into the lookup table to the result register.03-12-2009
20120210107PREDICATED ISSUE FOR CONDITIONAL BRANCH INSTRUCTIONS - A method and apparatus for executing branch instructions is provided. In one embodiment, the method includes receiving a branch instruction, wherein a first path of the branch instruction branches to a target instruction, and wherein a second path of the branch instruction branches to one or more interceding instructions between the branch instruction and the target instruction. The method further includes issuing the one or more interceding instructions and the target instruction and determining if the branch instruction follows the first path or the second path. Upon determining that the branch instruction follows the first path, the one or more interceding instructions between the branch instruction and the target instruction are invalidated.08-16-2012
20120331278BRANCH REMOVAL BY DATA SHUFFLING - A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a same instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.12-27-2012
712235000 Simultaneous parallel fetching or executing of both branch and fall-through path 4
20130046964SYSTEM AND METHOD FOR ZERO PENALTY BRANCH MIS-PREDICTIONS - A system and method may execute a branch instruction in a program. The branch instruction may be received defining a plurality of different possible instruction paths. Instructions for an initial predefined one of the paths may be automatically retrieved from a program memory while the correct path is being determined. If the initial path is determined to be correct, the instructions retrieved for the initial path may continue to be processed and if a different path is determined to be correct, instructions from a stored reserve of instructions may be processed for the different path to supply the program with enough correct path instructions to run the program at least until the program retrieves the correct path instructions from the program memory to recover from taking the incorrect path. The system and method may recover from taking the incorrect path with zero computational penalty.02-21-2013
20110219221Dynamic warp subdivision for integrated branch and memory latency divergence tolerance - Dynamic warp subdivision (DWS), which allows a single warp to occupy more than one slot in the scheduler without requiring extra register file space, is described. Independent scheduling entities also allow divergent branch paths to interleave their execution, and allow threads that hit in the cache or otherwise have divergent memory-access latency to run ahead. The result is improved latency hiding and memory level parallelism (MLP).09-08-2011
20080288759Memory-hazard detection and avoidance instructions for vector processing - A processor that is configured to perform parallel operations in a computer system where one or more memory hazards may be present is described. An instruction fetch unit within the processor is configured to fetch instructions for detecting one or more critical memory hazards between memory addresses if memory operations are performed in parallel on multiple addresses corresponding to at least a partial vector of addresses. Note that critical memory hazards include memory hazards that lead to different results when the memory addresses are processed in parallel than when the memory addresses are processed sequentially. Furthermore, an execution unit within the processor is configured to execute the instructions for detecting the one or more critical memory hazards.11-20-2008
20080270773Processing element having dual control stores to minimize branch latency - Embodiments involve an embedded processing element that fetches at least two possible next instructions (control words) in parallel in one cycle, and executes one of them during the following cycle based on the result of a conditional branch test. Embodiments reduce or avoid branch penalties (zero penalty branches).10-30-2008
20090193239COUNTER CONTROL CIRCUIT, DYNAMIC RECONFIGURABLE CIRCUIT, AND LOOP PROCESSING CONTROL METHOD - A counter control circuit that controls the operation of a counter arranged in a dynamic reconfigurable circuit executing an arbitrary instruction by dynamically switching an aggregation of reconfigurable processing elements (hereinafter referred to as “PEs”) according to a context reciting a processing content of the PE and a connection content between the PEs, the counter control circuit including: keeping means for keeping an operation instruction signal when the PE executing a conditional branching computation outputs, in a context being adapted to the dynamic reconfigurable circuit, the operation instruction signal of the counter for a subsequent context; output means for outputting the operation instruction signal kept in the keeping means to the counter; and control means for causing the output means to output the operation instruction signal when the context being adapted to the dynamic reconfigurable circuit is switched to the subsequent context.07-30-2009
20130061027TECHNIQUES FOR HANDLING DIVERGENT THREADS IN A MULTI-THREADED PROCESSING SYSTEM - This disclosure describes techniques for handling divergent thread conditions in a multi-threaded processing system. In some examples, a control flow unit may obtain a control flow instruction identified by a program counter value stored in a program counter register. The control flow instruction may include a target value indicative of a target program counter value for the control flow instruction. The control flow unit may select one of the target program counter value and a minimum resume counter value as a value to load into the program counter register. The minimum resume counter value may be indicative of a smallest resume counter value from a set of one or more resume counter values associated with one or more inactive threads. Each of the one or more resume counter values may be indicative of a program counter value at which a respective inactive thread should be activated.03-07-2013
20090172370EAGER EXECUTION IN A PROCESSING PIPELINE HAVING MULTIPLE INTEGER EXECUTION UNITS - One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.07-02-2009
20080313444MICROCOMPUTER AND DIVIDING CIRCUIT - Herein disclosed is a microcomputer MCU adopting the general purpose register method. The microcomputer is enabled to have a small program capacity or a high program memory using efficiency and a low system cost, while enjoying the advantage of simplification of the instruction decoding as in the RISC machine having a fixed length instruction format of the prior art, by adopting a fixed length instruction format having a power of 2 but a smaller bit number than that of the maximum data word length fed to instruction execution means. And, the control of the coded division is executed by noting the code bits.12-18-2008
20090327672SECURED PROCESSING UNIT - A method for executing by a processing unit a program stored in a memory, includes: detecting a piece of information during the execution of the program by the processing unit, and if the information is detected, triggering the execution of a hidden subprogram by the processing unit. The method may be applied to the securization of an integrated circuit.12-31-2009
20100082952PROCESSOR - When two threads (strands), for example, are executed in parallel in a processor in a simultaneous multi-thread (SMT) system, entries of a branch reservation station of an instruction control device are separately used in a strand 04-01-2010
20110197050Wait Instruction - A microprogrammable electronic device comprises a code memory storing a plurality of instructions. At least one instruction, when executed by the device, causes the device to enter into a wait state associated with a plurality of predefined wait state exit conditions. The device is configured to load into an electronic table each condition together with a corresponding code memory address of an instruction to be executed when the condition occurs; to execute, when is in the wait state, a wait instruction stored in the code memory and which, when executed, is such as to cause the device to check simultaneously the conditions loaded into said electronic table to detect if condition occurs; and, if a condition occurs, to exit from said wait state and to execute the instruction stored in the code memory at the code memory address loaded into the electronic table together with the condition that occurred.08-11-2011
20110197049TWO PASS TEST CASE GENERATION USING SELF-MODIFYING INSTRUCTION REPLACEMENT - A test code generation technique that replaces instructions having a machine state dependent result with special redirection instructions provides generation of test code in which state dependent execution choices are made without a state model. Redirection instructions cause execution of a handler than examines the machine state and replaces the redirection instruction with a replacement instruction having a desired result resolved in accordance with the current machine state. The instructions that are replaced may be conditional branch instructions and the result a desired execution path. The examination of the machine state permits determination of a branch condition for the replacement instruction so that the next pass of the test code executes along the desired path. Alternatively, the handler can execute a jump to the branch instruction, causing immediate execution of the desired branch path. The re-direction instructions may be illegal instructions, which cause execution of an interrupt handler that performs the replacement.08-11-2011
20080209188PROCESSOR AND METHOD OF PERFORMING SPECULATIVE LOAD OPERATIONS OF THE PROCESSOR - Provided is a processor and method of performing speculative load instructions of the processor in which a load instruction is performed only in the case where the load instruction substantially accesses a memory. A load instruction for canceling operations is performed in other cases except the above case, so that problems occurring by accessing an input/output (I/O) mapped memory area and the like at the time of performing speculative load instructions can be prevented using only a software-like method, thereby improving the performance of a processor.08-28-2008
20080276079MECHANISM TO MINIMIZE UNSCHEDULED D-CACHE MISS PIPELINE STALLS - A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit.11-06-2008
20080201562DATA PROCESSING SYSTEM - The present invention provides a data processor or a data processing system which can be used in compatible modes among which the number of bits of an address specifying a logical address space varies at the time of referring to a branch address table by extension of displacement of a branch instruction. At the time of generating a branch address of a first branch instruction, the data processor or the data processing system optimizes a multiple with which a displacement is multiplied in accordance with the number of bits of an address specifying a logical address space, adds extended address information to the value of a register, and refers to a branch address table with address information obtained by the addition. The referred information is used as a branch address. To be adapted to a compatible mode using different number of bits of an address specifying a logical address space, it is sufficient to change a multiple with which the displacement is multiplied in accordance with the mode.08-21-2008
20080313443PROCESSOR APPARATUS - There is disclosed a processing apparatus including, as an instruction set, a complex conditional branch instruction, and a condition setting instruction. The complex conditional branch instruction is an instruction for performing comparison operation for one or each of a plural number of conditions, and for performing branching to a branch target specified, based on comparison operation between the results of the comparison operations performed and the branching condition value specified. The condition setting instruction is an instruction for setting the condition. The processing apparatus includes a complex condition setting storage unit for storing the complex condition specified by the condition setting instruction, a condition comparison unit including a plurality of comparators for comparing the complex condition specified by the complex conditional branch instruction, in the complex condition setting storage unit, at the time of execution of the complex conditional branch instruction, a complex condition branching decision unit for determining on whether or not branching to the branch target is to be performed, using the results of comparisons performed in the comparators of the condition comparison unit and the branching condition value specified by the complex conditional branch instruction.12-18-2008
20080209189Information processing apparatus - The present invention provides an information processing apparatus having a predecoder decoding an operation code in an input instruction, generating conditional branch instruction information indicating that the input instruction is a conditional branch instruction and instruction type information indicating a type of the conditional branch instruction when the input instruction is a conditional branch instruction, and writing the input instruction, from which the operation code is deleted, the conditional branch instruction information and the instruction type information to the instruction cache memory, and a history information writing unit writing history information indicating whether or not the conditional branch instruction was branched, as a result of executing the conditional branch instruction stored in the instruction cache memory, to an area in the instruction cache memory, where the operation code of the conditional branch instruction is deleted.08-28-2008
20090177874Processor apparatus and conditional branch processing method - Disclosed is a processor apparatus including a branch condition storage unit having a plurality of storage regions in each of which a branch condition set by a condition setting instruction is stored, an instruction decoder that decodes an instruction code, an instruction memory that stores therein the instruction code, an operation register used by a processor for operation, a branch condition comparison unit that performs a comparison operation for each of branch conditions, a conditional branch determination unit that makes a determination whether or not to perform program branching in a conditional branch instruction, a selector that makes selection between a branch destination address and a next instruction address, based on an output value of the condition branch determination unit, and a program counter that indicates a processor instruction executing position. The branch condition specified by the condition setting instruction is stored in one of the storage regions in the branch condition storage unit 1 specified by the condition setting instruction. When the conditional branch instruction is executed, individual determinations on a plurality of the branch conditions stored in the branch condition storage unit are made. Among the branch conditions that simultaneously hold, the branch address corresponding to the branch condition stored in a predetermined one of the storage regions in the branch condition storage unit is selected from the branch address storage unit, and branching to the branch address is performed.07-09-2009
20090083526Program conversion apparatus, program conversion method, and comuter product - A linker generates a simulator-use executable format program from a pre-conversion object program and a simulator-use object program. A simulator executes the simulator-use object program and acquires branch trace information. A binary program converting tool, based on the branch trace information and a branch penalty table, generates a post-conversion object program having a rewritten branching prediction bit of the pre-conversion object program. Another linker generates an actual-machine-use executable format program from the post-conversion object program and an actual-machine-use object program.03-26-2009
20100161950SEMI-ABSOLUTE BRANCH INSTRUCTIONS FOR EFFICIENT COMPUTERS - Apparatus and methods are disclosed for a computation processor that can execute a semi-absolute branch instruction, as well as methods of operation and of generating the semi-absolute branch instruction.06-24-2010
20100161949SYSTEM AND METHOD FOR FAST BRANCHING USING A PROGRAMMABLE BRANCH TABLE - Methods and systems consistent with the present invention provide a programmable table which allows software to define a plurality of branching functions, each of which maps a vector of condition codes to a branch offset. This technique allows for a flexible multi-way branching functionality, using a conditional branch outcome table that can be specified by a programmer. Any instruction can specify the evaluation of arbitrary conditional expressions to compute the values for the condition codes, and can choose a particular branching function. When the processor executes the instruction, the processor's arithmetic/logical functional units evaluate the conditional expressions and then the processor performs the branch operation, according to the specified branching function.06-24-2010
20110238964DATA PROCESSOR - The data processor includes CPU operable to execute an instruction included in an instruction set. The instruction set includes a load instruction for reading data on a memory space. The data read according to the load instruction includes data of a format type having a data-read-branching-occurrence bit region. The CPU includes a data-read-branching control register; a data-read-branching address register; and a read-data-analyzing unit. On condition that a bit value showing the occurrence of data read branching has been set on the data-read-branching-occurrence bit region, and a value showing the data-read-branching-occurrence bit remaining valid has been set on the data-read-branching control register, the switching between processes is performed by branching to an address stored in the data-read-branching address register.09-29-2011
20100250906Obfuscation - In an embodiment of a method of making a conditional jump in a computer running a program, an input is provided, conditional on which a substantive conditional branch is to be made. An obfuscatory unpredictable datum is provided. Code is executed that causes an obfuscatory branch conditional on the unpredictable datum. At a point in the computer program determined by the obfuscatory conditional branch, a substantive branch is made that is conditional on the input.09-30-2010
20100281241METHOD AND SYSTEM FOR SYNCHRONIZING INCLUSIVE DECISION BRANCHES - A method and system for reconstructing an original process model that includes an inclusive decision into a reconstructed process model that includes synchronized branches. The inclusive decision is replaced with a fork. Each original path of the inclusive decision is reconstructed into a corresponding reconstructed path. Each reconstructed path includes a decision, an activity included in the corresponding original path, and a merge. The decision includes a first output connected to the activity, whose output is connected to a first input of the merge. The decision further includes a second output connected to a second input of the merge. A join is created to recombine output branches from the merge node and other merge nodes included in reconstructed paths. The output branches are synchronized so that all activated reconstructed paths must finish processing before a modeling element connected to an output of the join is executed.11-04-2010
20100281240Program Code Simulator - A system and method for facilitating simulation of a computer program. A program representation is generated from a computer program. A simulation of the program is performed. Simulation may include applying heuristics to determine program flow for selected instructions, such as a branch instruction or a loop instruction. Simulation may also include creating imaginary objects as surrogates for real objects, when program code to create real objects is restricted, or fields of the objects are unavailable or uncertain, or for other reasons. Data descriptive of the simulation is inserted into the program representation. A visualizer may retrieve the program representation and generate a visualization that shows sequence flows resulting from the simulation.11-04-2010
20130138931MAINTAINING THE INTEGRITY OF AN EXECUTION RETURN ADDRESS STACK - A processor and method for maintaining the integrity of an execution return address stack (RAS). The execution RAS is maintained in an accurate state by storing information regarding branch instructions in a branch information table. The first time a branch instruction is executed, an entry is allocated and populated in the table. If the branch instruction is re-executed, a pointer address is retrieved from the corresponding table entry and the execution RAS pointer is repositioned to the retrieved pointer address. The execution RAS can also be used to restore a speculative RAS due to a mis-speculation.05-30-2013
20110072250Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processing containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing/element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debut interrupts and a dynamic debut monitor mechanism.03-24-2011
20110078424OPTIMIZING PROGRAM CODE USING BRANCH ELIMINATION - A method for optimizing program code is provided. The method comprises detecting a branch instruction comprising a condition expression, wherein the branch instruction, when executed by a processor, causes the processor to execute either a first set of instructions or a second set of instructions according to a value of the condition expression; and replacing the branch instruction with a third set of instructions that are non-branching, wherein the third set of instructions, when executed by a processor, has a collective effect same as if either the first or second set of instructions were executed according to the value of the condition expression. The third set of instructions comprises a negation instruction to normalize the value of the condition expression.03-31-2011
20110072249UNANIMOUS BRANCH INSTRUCTIONS IN A PARALLEL THREAD PROCESSOR - One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.03-24-2011
20090240931Indirect Function Call Instructions in a Synchronous Parallel Thread Processor - An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.09-24-2009
20090138689Partitioning Processor Resources Based on Memory Usage - Processor resources are partitioned based on memory usage. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R05-28-2009
20110161641SPE Software Instruction Cache - An application thread executes a direct branch instruction that is stored in an instruction cache line. Upon execution, the direct branch instruction branches to a branch descriptor that is also stored in the instruction cache line. The branch descriptor includes a trampoline branch instruction and a target instruction space address. Next, the trampoline branch instruction sends a branch descriptor pointer, which points to the branch descriptor, to an instruction cache manager. The instruction cache manager extracts the target instruction space address from the branch descriptor, and executes a target instruction corresponding to the target instruction space address. In one embodiment, the instruction cache manager generates a target local store address by masking off a portion of bits included in the target instruction space address. In turn, the application thread executes the target instruction located at the target local store address accordingly.06-30-2011
20080201563Apparatus for Improving Single Thread Performance through Speculative Processing - An apparatus is provided for using multiple thread contexts to improve processing performance of a single thread. When an exceptional instruction is encountered, the exceptional instruction and any predicted instructions are reloaded into a buffer of a first thread context. A state of the register file at the time of encountering the exceptional instruction is maintained in a register file of the first thread context. The instructions in the pipeline are executed speculatively using a second register file in a second thread context. During speculative execution, cache misses may cause loading of data to the cache may be performed. Results of the speculative execution are written to the second register file. When a stopping condition is met, contents of the first register file are copied to the second register file and the reloaded instructions are released to the execution pipeline.08-21-2008
20120311307Method And Apparatus For Obtaining A Call Stack To An Event Of Interest And Analyzing The Same - In one embodiment, a processor includes a performance monitor including a last branch record (LBR) stack to store a call stack to an event of interest, where the call stack is collected responsive to a trigger for the event. The processor further includes logic to control the LBR stack to operate in a call stack mode such that an entry to a call instruction for a leaf function is cleared on return from the leaf function. Other embodiments are described and claimed.12-06-2012
20100217961PROCESSOR SYSTEM EXECUTING PIPELINE PROCESSING AND PIPELINE PROCESSING METHOD - A processor system includes a plurality of pipeline stages, a controller, and a transfer path. The plurality of pipeline stages is subjected to processing. The controller determines whether or not each of the executable instructions to be processed in the pipeline stages requires processing in a succeeding pipeline stage. The transfer path, if the controller determines the executable instruction does not require the processing in the succeeding pipeline stage, skips the pipeline stage including the unnecessary processing.08-26-2010
20090300339LSI FOR IC CARD - To prevent exposure or tampering of data by an illegal access to a memory of an LSI, a ROM (12-03-2009
20110154002Methods And Apparatuses For Efficient Load Processing Using Buffers - Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads.06-23-2011
20110307689PROCESSOR SUPPORT FOR HARDWARE TRANSACTIONAL MEMORY - A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of the plurality of program instructions and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code.12-15-2011
20100115251METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR OPTIMIZING RUNTIME BRANCH SELECTION IN A FLOW PROCESS - A method, system, and computer program product for optimizing runtime branch selection in a flow process are provided. The method includes gathering performance metrics of flow branch behavior for executed flows in a runtime system over time and using aggregated performance metrics for the behavior to determine an optimal ordering of branches for a currently running flow. The optimal ordering is determined by identifying one or more branch points in the flow, generating ordering permutations for at least a portion of the branches in the branch point for the flow to identify any permutations that have not been executed, gathering metrics for permutation(s) of the branch point in the flow, comparing the metrics to performance metrics of executed flows having substantially similar flow branch behavior, and identifying optimal branch ordering for the permutation(s) based upon the comparison. The method also includes executing the flow according to the optimal branch ordering.05-06-2010
20120072707Scaleable Status Tracking Of Multiple Assist Hardware Threads - A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.03-22-2012
20100095103Instruction execution control device and instruction execution control method - An instruction execution control device operates a plurality of threads in a simultaneous multi-thread system. The device has architecture registers (04-15-2010
20100205415PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC SERIALIZING INSTRUCTION STATE - A microprocessor includes a control register that stores a control value that affects operation of the microprocessor. An instruction set architecture includes a conditional branch instruction that specifies a branch condition based on the control value stored in the control register, and a serializing instruction that updates the control value in the control register. The microprocessor completes all modifications to flags, registers, and memory by instructions previous to the serializing instruction and to drain all buffered writes to memory before it fetches and executes the next instruction after the serializing instruction. Execution units update the control value in the control register in response to the serializing instruction. A fetch unit fetches, decodes, and unconditionally correctly resolves and retires the conditional branch instruction based on the control value stored in the control register rather than dispatching the conditional branch instruction to the execution units to be resolved.08-12-2010
20100299508DYNAMICALLY ALLOCATED STORE QUEUE FOR A MULTITHREADED PROCESSOR - Systems and methods for storage of writes to memory corresponding to multiple threads. A processor comprises a store queue, wherein the queue dynamically allocates a current entry for a committed store instruction in which entries of the array may be allocated out of program order. For a given thread, the store queue conveys store data to a memory in program order. The queue is further configured to identify an entry of the plurality of entries that corresponds to an oldest committed store instruction for a given thread and determine a next entry of the array that corresponds to a next committed store instruction in program order following the oldest committed store instruction of the given thread, wherein said next entry includes data identifying the entry. The queue marks an entry as unfilled upon successful conveying of store data to the memory.11-25-2010
20090019272STORE QUEUE ARCHITECTURE FOR A PROCESSOR THAT SUPPORTS SPECULATIVE EXECUTION - Embodiments of the present invention provide a system that buffers stores on a processor that supports speculative execution. The system starts by buffering a store into an entry in the store queue during a speculative execution mode. If an entry for the store does not already exist in the store queue, the system writes the store into an available entry in the store queue and updates a byte mask for the entry. Otherwise, if an entry for the store already exists in the store queue, the system merges the store into the existing entry in the store queue and updates the byte mask for the entry to include information about the newly merged store. The system then forwards the data from the store queue to subsequent dependent loads.01-15-2009
20120324209BRANCH TARGET BUFFER ADDRESSING IN A DATA PROCESSOR - A data processor includes a branch target buffer (BTB) having a plurality of BTB entries grouped in ways. The BTB entries in one of the ways include a short tag address and the BTB entries in another one of the ways include a full tag address.12-20-2012
20110276792RESOURCE FLOW COMPUTER - A scalable processing system includes a memory device having a plurality of executable program instructions, wherein each of the executable program instructions includes a timetag data field indicative of the nominal sequential order of the associated executable program instructions. The system also includes a plurality of processing elements, which are configured and arranged to receive executable program instructions from the memory device, wherein each of the processing elements executes executable instructions having the highest priority as indicated by the state of the timetag data field.11-10-2011
20110320788METHOD AND APPARATUS FOR BRANCH REDUCTION IN A MULTITHREADED PACKET PROCESSOR - A method and apparatus for branch reduction in a multithreaded packet processor is presented. An instruction is executed which includes testing of a branch flag. The branch flag references a configuration bit vector wherein each bit in the configuration bit vector corresponds to a respective feature. When said branch flag returns a first result processing is continues at an instruction located at a first location relative to a Program Counter (PC) and when the branch flag returns a second result processing is continued at a second location relative to said PC.12-29-2011
20110320787Indirect Branch Hint - A processor implements an apparatus and a method for predicting an indirect branch address. A target address generated by an instruction is automatically identified. A predicted next program address is prepared based on the target address before an indirect branch instruction utilizing the target address is speculatively executed. The apparatus suitably employs a register for holding an instruction memory address that is specified by a program as a predicted indirect address of an indirect branch instruction. The apparatus also employs a next program address selector that selects the predicted indirect address from the register as the next program address for use in speculatively executing the indirect branch instruction.12-29-2011
20120102302PROCESSOR TESTING - Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution.04-26-2012
20080250234Microprocessor Ouput Ports and Control of Instructions Provided Therefrom - A method and apparatus are provided for controlling instructions provided by a microprocessor output port to other execution units. A microprocessor pipeline of instructions is provided for each execution unit. These are scheduled via the microprocessor unit for each execution unit, a determination is made as to whether or not the execution unit can receive further instructions. If it cannot, it's associated pipeline is said to be stalled and instructions are deleted from the microprocessor pipeline. Its thread can then be restarted at a later time with the instruction corresponding to the instruction which was unable to execute.10-09-2008
20080235499APPARATUS AND METHOD FOR INFORMATION PROCESSING ENABLING FAST ACCESS TO PROGRAM - A main memory stores cache blocks obtained by dividing a program. At a position in a cache block where a branch to another cache block is provided, there is embedded an instruction for activating a branch resolution routine for performing processing, such as loading of a cache block of the branch target. A program is loaded into a local memory in units of cache blocks, and the cache blocks are serially stored in first through nth banks, which are sections provided in the storage area. Management of addresses in the local memory or processing for discarding a copy of a cache block is performed with reference to an address translation table, an inter-bank reference table and a generation number table.09-25-2008
20130179668LAST BRANCH RECORD INDICATORS FOR TRANSACTIONAL MEMORY - In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.07-11-2013
20130091343OPERAND FETCHING CONTROL AS A FUNCTION OF BRANCH CONFIDENCE - Data operand fetching control includes calculating a summation weight value for each instruction in a pipeline, the summation weight value calculated as a function of branch uncertainty and a pendency in which the instruction resides in the pipeline relative to other instructions in the pipeline. The data operand fetching control also includes mapping the summation weight value of a selected instruction that is attempting to access system memory to a memory access control. Each memory access control specifies a manner of handling data fetching operations. The data operand fetching control further includes performing a memory access operation for the selected instruction based upon the mapping.04-11-2013
20130124838INSTRUCTION LEVEL EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If, the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.05-16-2013
20130151822Efficient Enqueuing of Values in SIMD Engines with Permute Unit - Mechanisms, in a data processing system having a processor, for generating enqueued data for performing computations of a conditional branch of code are provided. Mask generation logic of the processor operates to generate a mask representing a subset of iterations of a loop of the code that results in a condition of the conditional branch being satisfied. The mask is used to select data elements from an input data element vector register corresponding to the subset of iterations of the loop of the code that result in the condition of the conditional branch being satisfied. Furthermore, the selected data elements are used to perform computations of the conditional branch of code. Iterations of the loop of the code that do not result in the condition of the conditional branch being satisfied are not used as a basis for performing computations of the conditional branch of code.06-13-2013
20100318775Methods and Apparatus for Adapting Pipeline Stage Latency Based on Instruction Type - Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes. Application code can be given new degrees of optimization freedom where instruction class and the mix of instructions can be chosen based on performance and power requirements.12-16-2010
20100318774PROCESSOR INSTRUCTION GRADUATION TIMEOUT - A multiprocessor computer system comprises a plurality of processors distributed across a plurality of node coupled by a processor interconnect network. One or more of the processors is operable to manage hung processor instructions by setting a graduation timeout counter after a first program instruction graduates, resetting the graduation timeout counter if a subsequent program instruction graduates before the graduation timeout counter expires, and resetting the processor if the graduation timeout counter expires before the subsequent program instruction graduates.12-16-2010
20110314265PROCESSORS OPERABLE TO ALLOW FLEXIBLE INSTRUCTION ALIGNMENT - Methods and apparatus are provided for optimizing a processor core. Common processor subcircuitry is used to perform calculations for various types of instructions, including branch and non-branch instructions. Increasing the commonality of calculations across different instruction types allows branch instructions to jump to byte aligned memory address even if supported instructions are multi-byte or word aligned.12-22-2011
20130198498COMPILING METHOD, PROGRAM, AND INFORMATION PROCESSING APPARATUS - A method, program, and apparatus for optimizing compiled code using a dynamic compiler. The method includes the steps of: generating intermediate code from a trace, which is an instruction sequence described in machine language; computing an offset between an address value, which is a base point of an indirect branch instruction, and a start address of a memory page, which includes a virtual address referred to by the information processing apparatus immediately after processing a first instruction; determining whether an indirect branch instruction that is subsequent to the first instruction causes processing to jump to another memory page, by using a value obtained from adding the offset to a displacement made by the indirect branch instruction; and optimizing the intermediate code by using the result of the determining step.08-01-2013
20130212364PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS - One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency.08-15-2013

Patent applications in class Conditional branching

Patent applications in all subclasses Conditional branching