Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Branching (e.g., delayed branch, loop control, branch predict, interrupt)

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

712220000 - PROCESSING CONTROL

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application numberDescriptionNumber of patent applications / Date published
712234000 Conditional branching 198
712244000 Exeception processing (e.g., interrupts and traps) 60
712241000 Loop execution 39
712242000 To macro-instruction routine 1
20110167248EFFICIENT RESUMPTION OF CO-ROUTINES ON A LINEAR STACK - Unsuspended co-routines are handled by the machine call stack mechanism in which the stack grows and shrinks as recursive calls are made and returned from. When a co-routine is suspended, however, additional call stack processing is performed. A suspension message is issued, and the entire resume-able part of the call stack is removed, and is copied to the heap. A frame that returns control to a driver method (a resumer) is copied to the call stack so that resumption of the co-routine does not recursively reactivate the whole call stack. Instead the resumer reactivates only the topmost or most current frame called the leaf frame. When a co-routine is suspended, it does not return to its caller, but instead returns to the resumer that has reactivated it.07-07-2011
Entries
DocumentTitleDate
20090119492Data Processing Apparatus and Method for Handling Procedure Call Instructions - A data processing apparatus and method are provided for handling procedure call instructions. The data processing apparatus has processing logic for performing data processing operations specified by program instructions fetched from a sequence of addresses, at least one of the program instructions being a procedure call instruction specifying a branch operation to be performed. Further, a control value is stored within control storage, and the processing logic is operable in response to a control value modifying instruction to modify that control value. If the control value is clear, the processing logic is operable in response to the procedure call instruction to generate a return address value in addition to performing the branch operation, whereas if the control value is set, the processing logic is operable in response to the procedure call instruction to suppress generation of the return address value and to cause the control value to be clear in addition to performing the branch operation. This provides significant flexibility in how procedure call instructions are used within the data processing apparatus.05-07-2009
20100042819COMPUTER READABLE MEDIUM, OPERATION CONTROLLING METHOD, AND OPERATION CONTROL SYSTEM - A computer readable medium storing a program causing a computer to execute a process for controlling a plurality of operations, the process including: accepting a change request to change an operation result of an operation executed prior to a current execution-permitted operation which an execution is permitted based on an operation procedure for the operations; assuming an operation for which the change request is accepted in the accepting step as a starting point; and identifying an operation permitted to be executed with reference to the starting point based on the operation procedure.02-18-2010
20100106950METHOD AND SYSTEM FOR LOADING STATUS CONTROL OF DLL - Apparatus and methods are provided for controlling the loading status of DLLs. Specifically, a streaming program compiler is provided. The compiler includes operation modules for calling DLLs during streaming program execution; association table generating units for generating association tables according to user-defined rules, where the association table includes entries indicating (i) stream branches of the streaming program and (ii) an operation module corresponding to the stream branches; and a trigger generating unit for generating a trigger based on user-defined rules, where the trigger generating unit (i) determines which conditions for loading and unloading DLLs fit the streaming program, (ii) matches these conditions to a particular stream branch to identify a matched stream branch, and (iii) sends out triggering signals indicating the matched stream branch. This invention also provides a corresponding method and controller.04-29-2010
20100250905System and Method of Routing Instructions - Disclosed are a method and system for reducing complexity of routing of instructions from an instruction issue queue to appropriate execution pipelines in a superscalar processor. In one or more embodiments, an instruction steering unit of the superscalar processor receives ordered instructions. The steering unit determines that a first instruction and a subsequent second instruction of the ordered instructions are non-branching instructions, and the steering unit stores the first and second instructions in two non-branching instruction issue queue entries of a shadow queue. The steering unit determines whether or not a third instruction the ordered instructions is a branch instruction, where the third instruction is subsequent to the second instruction. If the third instruction is a branch instruction, the steering unit stores the third instruction in a branch entry of the shadow queue; otherwise, the steering unit stores a no operation instruction in the branch entry of the shadow queue.09-30-2010
20120066483Computing Device with Asynchronous Auxiliary Execution Unit - A computing device includes: an instruction cache storing primary execution unit instructions and auxiliary execution unit instructions in a sequential order; a primary execution unit configured to receive and execute the primary execution unit instructions from the instruction cache; an auxiliary execution unit configured to receive and execute only the auxiliary execution unit instructions from the instruction cache in a manner independent from and asynchronous to the primary execution unit; and completion circuitry configured to coordinate completion of the primary execution unit instructions by the primary execution unit and the auxiliary execution unit instructions according to the sequential order.03-15-2012
20090094445PROCESS RETEXT FOR DYNAMICALLY LOADED MODULES - A computer implemented method, apparatus, and computer program product for dynamically loading a module into an application address space. In response to receiving a checkpoint signal by a plurality of threads associated with an application running in a software partition, the plurality of threads rendezvous to a point outside an application text associated with the application. Rendezvousing the plurality of threads suspends execution of application text by the plurality of threads. The application text is moved out of an application address space for the application to form an available application address space. The available application address space is an address space that was occupied by the application text. A software module is moved into the available application address space.04-09-2009
20090177873INSTRUCTION GENERATION APPARATUS - A tampering-prevention-process generation apparatus (07-09-2009
20090198980FACILITATING PROCESSING IN A COMPUTING ENVIRONMENT USING AN EXTENDED DRAIN INSTRUCTION - An extended DRAIN instruction is used to stall processing within a computing environment. The instruction includes an indication of the one or more processing stages at which processing is to be stalled. It also includes a control that allows processing to be stalled for additional cycles, as desired.08-06-2009
20090077360Software constructed stands for execution on a multi-core architecture - In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed.03-19-2009
20090254740INFORMATION PROCESSING DEVICE, ENCRYPTION METHOD OF INSTRUCTION CODE, AND DECRYPTION METHOD OF ENCRYPTED INSTRUCTION CODE - It is possible to achieve the protection of software with reduced overhead. For example, a memory for storing an encrypted code prepared in advance and a decryptor module for decrypting the code are provided. The decryptor module includes, for example, a three-stage pipeline and a selector for selecting one output from the outputs of each stage of the pipeline. When a branch instruction is issued and subsequent inputs of the pipeline are in the order of CD′10-08-2009
20110066833CHURCH-TURING THESIS: THE TURING IMMORTALITY PROBLEM SOLVED WITH A DYNAMIC REGISTER MACHINE - A new computing machine and new methods of executing and solving heretofore unknown computational problems are presented here. The computing system demonstrated here can be implemented with a program composed of instructions such that instructions may be added or removed while the instructions are being executed. The computing machine is called a Dynamic Register Machine. The methods demonstrated apply to new hardware and software technology. The new machine and methods enable advances in machine learning, new and more powerful programming languages, and more powerful and flexible compilers and interpreters.03-17-2011
20090319762DYNAMIC RECONFIGURABLE CIRCUIT AND DATA TRANSMISSION CONTROL METHOD - A dynamic reconfigurable circuit includes multiple clusters each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context including a description of processing of the processing elements and of connection between the processing elements. A first cluster among the clusters includes a signal generating circuit that when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when output data to which a report signal generated by the second cluster is added is received, performs a clearing process of clearing the output data received.12-24-2009
20110072248UNANIMOUS BRANCH INSTRUCTIONS IN A PARALLEL THREAD PROCESSOR - One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch.03-24-2011
20110154001SYSTEM AND METHOD FOR PROCESSING INTERRUPTS IN A COMPUTING SYSTEM - A system, processor and method are provided for digital signal processing A processor may initiate processing a sequence of instructions followed by an interrupt Each instruction may be processed in respective sequential pipeline slots A branch detector may detect or determine if an instruction is a branch instruction, for example, in turn, for each sequential instruction In one embodiment, the branch detector may detect if an instruction is a branch instruction until at least a first branch instruction is detected. A processor may annul instructions which are determined to be branch instructions when the interrupt occupies a delay slot associated with the branch instruction. An execution unit may execute at least the sequence of instructions to run a program. The branch detector and/or execution unit may be integral or separate from each other and from the processor06-23-2011
20090138688Multi-Die Processor - Disclosed are a multi-die processor apparatus and system. Processor logic to execute one or more instructions is allocated among two or more face-to-faces stacked dice. The processor includes a conductive interface between the stacked dice to facilitate die-to-die communication.05-28-2009
20120210106METHOD OF PROCESSING INSTRUCTIONS IN PIPELINE-BASED PROCESSOR - The present invention discloses a method of processing instructions in a pipeline-based central processing unit, wherein the pipeline is partitioned into base pipeline stages and enhanced pipeline stages according to functions, the base pipeline stages being activated all the while, and the enhanced pipeline stages being activated or shutdown according to requirements for performance of a workload. The present invention further discloses a method of processing instructions in a pipeline-based central processing unit, wherein the pipeline is partitioned into base pipeline stages and enhanced pipeline stages according to functions, each pipeline stage being partitioned into a base module and at least one enhanced module, the base module being activated all the while, and the enhanced module being activated or shutdown according to requirements for performance of a workload.08-16-2012
20110219220Link Stack Repair of Erroneous Speculative Update - Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, and is restored to the link stack after a link stack push operation is speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction, and pushed a link address onto the link stack. The prior link address is restored to the link stack from the link stack restore buffer.09-08-2011
20110320786Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.12-29-2011
20110167246Systems and Methods for Data Detection Including Dynamic Scaling - Various embodiments of the present invention provide systems and methods for data processing. For example, a data processing system is disclosed that includes a channel detector circuit. The channel detector circuit includes a branch metric calculator circuit that is operable to receive a number of violated checks from a preceding stage, and to scale an intrinsic branch metric using a scalar selected based at least in part on the number of violated checks to yield a scaled intrinsic branch metric.07-07-2011
20130198497MAJOR BRANCH INSTRUCTIONS WITH TRANSACTIONAL MEMORY - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.08-01-2013
20120072706Microcode for Transport Triggered Architecture Central Processing Units - The different advantageous embodiments provide an apparatus comprising a central processing unit, a microcode store, and a number of functional units. The central processing unit utilizes transport triggered architecture and is configured to execute microcoded instructions that allow a single instruction to be executed as multiple instructions. The microcode store includes a number of microcoded instruction implementations. The number of functional units includes a number of useful entry points into the microcode store.03-22-2012
20120072705Obtaining And Releasing Hardware Threads Without Hypervisor Involvement - A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.03-22-2012
20100095102INDIRECT BRANCH PROCESSING PROGRAM AND INDIRECT BRANCH PROCESSING METHOD - A processor reads an interpreter program to start an interpreter. The interpreter makes branch prediction by executing, in place of an indirect branch instruction that is necessary for execution of a source program, storing branch destination addresses in the indirect branch instruction in a link register (the processor internally stacks the branch destination addresses also in a return address stack) in inverse order and reading the addresses from the return address stack in one at a time manner.04-15-2010
20120254597BRANCH-AND-BOUND ON DISTRIBUTED DATA-PARALLEL EXECUTION ENGINES - A distributed data-parallel execution (DDPE) system splits a computational problem into a plurality of sub-problems using a branch-and-bound algorithm, designates a synchronous stop time for a “plurality of processors” (for example, a cluster) for each round of execution, processes the search tree by recursively using a branch-and-bound algorithm in multiple rounds (without inter-processor communications), determines if further processing is required based on the processing round state data, and terminates processing on the processors when processing is completed.10-04-2012
20100011195PROCESSOR - A processor includes a plurality of executing sections configured to simultaneously execute instructions for a plurality of threads, an instruction issuing section configured to issue instructions to the plurality of executing sections, and an instruction sync monitoring section configured to, when an instruction-synchronizing instruction is issued to one or more of the plurality of executing sections from the instruction issuing section, monitor completion of execution of the instruction-synchronizing instruction for each of the executing sections, to which the instruction-synchronizing instruction has been issued, thus detecting completion of execution of preceding instructions for the thread to which the instruction-synchronizing instruction belongs. After issuing the instruction-synchronizing instruction, the instruction issuing section stops issuance of succeeding instructions for the thread to which the instruction-synchronizing instruction belongs, until the completion of execution of the preceding instructions for the thread to which the instruction-synchronizing instruction belongs is detected by the instruction sync monitoring section.01-14-2010
20080301419Process model control flow with multiple synchronizations - Activations of a plurality of incoming branches may be detected at a synchronization point having a plurality of outgoing branches. A first synchronization may be executed after a first number of activations is detected, and at least one of a plurality of outgoing branches from the synchronization point may be activated, based on the first synchronization. A second synchronization may be executed after a second number of activations is detected, and at least a second one of the plurality of outgoing branches from the synchronization point may be activated, based on the second synchronization.12-04-2008
20120096246NONVOLATILE STORAGE USING LOW LATENCY AND HIGH LATENCY MEMORY - Nonvolatile storage includes first and second memory types with different read latencies. FLASH memory and phase change memory are examples. A first portion of a data block is stored in the phase change memory and a second portion of the data block is stored in the FLASH memory. The first portion of the data block is accessed prior to the second portion of the data block during a read operation.04-19-2012
20110320785Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.12-29-2011
20130019085Efficient Recombining for Dual Path ExecutionAANM Cain, III; Harold W.AACI HartsdaleAAST NYAACO USAAGP Cain, III; Harold W. Hartsdale NY USAANM Daly; David M.AACI Croton on HudsonAAST NYAACO USAAGP Daly; David M. Croton on Hudson NY USAANM Huang; Michael C.AACI RochesterAAST NYAACO USAAGP Huang; Michael C. Rochester NY USAANM Moreira; Jose E.AACI IrvingtonAAST NYAACO USAAGP Moreira; Jose E. Irvington NY USAANM Park; ILAACI SeoulAACO KRAAGP Park; IL Seoul KR - A mechanism is provided for reducing a penalty for executing a correct branch of a branch instruction. An execution unit in a processor of a data processing system executes a first branch of the branch instruction from a main thread of a processor and executes a second branch of the branch instruction from an assist thread of the processor. The execution unit determines whether the main thread is a correct branch of the branch instruction or the assist thread is the correct branch of the branch instruction. Responsive to the assist thread being the correct branch of the branch instruction, the execution unit pauses execution of the branch instruction on both the main thread and the assist thread. The execution unit then properly inherits a context of the main thread in order that execution of the second branch may continue.01-17-2013
20130024675RETURN ADDRESS OPTIMISATION FOR A DYNAMIC CODE TRANSLATOR - A dynamic code translator with isoblocking uses a return trampoline having branch instructions conditioned on different isostates to optimize return address translation, by allowing the hardware to predict that the address of a future return will be the address of trampoline. An IP relative call is inserted into translated code to write the trampoline address to a target link register and a target return address stack used by the native machine to predict return addresses. If a computed subject return address matches a subject return address register value, the current isostate of the isoblock is written to an isostate register. The isostate value in the isostate register is then used to select the branch instruction in the trampoline for the true subject return address. Sufficient code area in the trampoline instruction set can be reserved for a number of compare/branch pairs which is equal to the number of available isostates.01-24-2013
20100287361Root Cause Analysis for Complex Event Processing - Root cause analysis for complex event processing is described. In embodiments, root cause analysis at a complex event processor is automatically performed by selecting an output event from an operator and correlating the output event to an input event using event type and lifetime data for the input event and the output event stored in a data store. Embodiments describe how the lifetime data can comprise a start time and an end time for the event, and the correlation can be based on a comparison of the start and end times between the input and output events. Embodiments describe how the correlation algorithm used is selected in dependence on the event type. In embodiments, a complex event processing engine comprises a logging unit arranged to store in the data store an indicator of an event type and lifetime data for each output event from an operator.11-11-2010
20130159684BATCHED REPLAYS OF DIVERGENT OPERATIONS - One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced.06-20-2013
20120290820SUPPRESSION OF CONTROL TRANSFER INSTRUCTIONS ON INCORRECT SPECULATIVE EXECUTION PATHS - Techniques are disclosed relating to a processor that is configured to execute control transfer instructions (CTIs). In some embodiments, the processor includes a mechanism that suppresses results of mispredicted younger CTIs on a speculative execution path. This mechanism permits the branch predictor to maintain its fidelity, and eliminates spurious flushes of the pipeline. In one embodiment, a misprediction bit is be used to indicate that a misprediction has occurred, and younger CTIs than the CTI that was mispredicted are suppressed. In some embodiments, the processor may be configured to execute instruction streams from multiple threads. Each thread may include a misprediction indication. CTIs in each thread may execute in program order with respect to other CTIs of the thread, while instructions other than CTIs may execute out of program order.11-15-2012
20120030453INFORMATION PROCESSING APPARATUS, CACHE APPARATUS, AND DATA PROCESSING METHOD - A more efficient technique is provided in an information processing apparatus which executes processing using pipelines. An information processing apparatus according to this invention includes a first pipeline, second pipeline, processing unit, and reorder unit. The first pipeline has a plurality of first nodes, and shifts first data held in a first node to a first node. The second pipeline has a plurality of second nodes respectively corresponding to the first nodes of the first pipeline, and shifts second data held in a second node to a second node. The processing unit executes data processing using the first data and the second data. The reorder unit holds one of the output second data based on attribute information of the second data output from the second pipeline, and outputs the held second data to the second pipeline.02-02-2012
20130198496MAJOR BRANCH INSTRUCTIONS - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing.08-01-2013
20120036342METHOD AND APPARATUS FOR HANDLING LANE CROSSING INSTRUCTIONS IN AN EXECUTION PIPELINE - The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage.02-09-2012
20120089822INFORMATION PROCESSING DEVICE AND EMULATION PROCESSING PROGRAM AND METHOD - An emulation processing method causing a computer including a first and a second processor to execute emulation processing, the emulation processing method includes: calculate a next instruction address next to a received instruction address, and transmit, to the second processor, the calculated instruction address and instruction information read out on the basis of the calculated instruction address, transmit, to the first processor, a first instruction address that is an instruction address included in an execution result of executed processing, and execute processing based on the instruction information received from the first processor, when a second instruction address that is the instruction address received from the first processor is identical to the first instruction address, and read out instruction information on the basis of the first instruction address and execute processing based on the instruction information read out, when the second instruction address is not identical to the first instruction address.04-12-2012

Patent applications in class Branching (e.g., delayed branch, loop control, branch predict, interrupt)

Patent applications in all subclasses Branching (e.g., delayed branch, loop control, branch predict, interrupt)