Patent application number | Description | Published |
20090070602 | System and Method of Executing Instructions in a Multi-Stage Data Processing Pipeline - A device is disclosed that includes an instruction execution pipeline having multiple stages for executing an instruction. The device also includes a control logic circuit coupled to the instruction execution pipeline. The control logic circuit is adapted to skip at least one stage of the instruction execution pipeline during execution of the instruction. The control logic circuit is also adapted to execute at least one non-skipped stage during execution of the decoded instruction. | 03-12-2009 |
20090132793 | System and Method of Selectively Accessing a Register File - In a particular embodiment, a method is disclosed that includes identifying a first block of bits within a result to be written to a destination register by an execution unit. The result includes a plurality of bits having the first block of bits and a second block of bits. The first block of bits has a value of zero. The method further includes providing an encoded bit value representing the first block of bits to a control register and selectively writing the second block of bits, but not the first block of bits, to the destination register. The destination register is sized to receive the first and second blocks of bits. | 05-21-2009 |
20090216993 | System and Method of Data Forwarding Within An Execution Unit - In an embodiment, a method is disclosed that includes comparing, during a write-back stage at an execution unit, a write identifier associated with a result to be written to a register file from execution of a first instruction to a read identifier associated with a second instruction at an execution pipeline within an interleaved multi-threaded (IMT) processor having multiple execution units. When the write identifier matches the read identifier, the method further includes storing the result at a local memory of the execution unit for use by the execution unit in the subsequent read stage. | 08-27-2009 |
20090327674 | Loop Control System and Method - Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value. | 12-31-2009 |
20100299668 | Associating Data for Events Occurring in Software Threads with Synchronized Clock Cycle Counters - Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method is disclosed that includes collecting data for events occurring in a plurality of software threads being processed by a processor, where the data for each of the events includes a value of an associated clock cycle counter upon occurrence of the event. Data is correlated for the events occurring for each of the plurality of threads by starting each of a plurality of clock cycle counters associated with the software threads at a common time. Alternatively, data is correlated for the events by logging a synchronizing event within each of the plurality of software threads. | 11-25-2010 |
20110138393 | Thread Allocation and Clock Cycle Adjustment in an Interleaved Multi-Threaded Processor - Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method allocates software threads to hardware threads. A number of software threads to be allocated is identified. It is determined when the number of software threads is less than a number of hardware threads. When the number of software threads is less than the number of hardware threads, at least two of the software threads are allocated to non-sequential hardware threads. A clock signal to be applied to the hardware threads is adjusted responsive to the non-sequential hardware threads allocated. | 06-09-2011 |
20110173391 | System and Method to Access a Portion of a Level Two Memory and a Level One Memory - A system and method to access data from a portion of a level two memory or from a level one memory is disclosed. In a particular embodiment, the system includes a level one cache and a level two memory. A first portion of the level two memory is coupled to an input port and is addressable in parallel with the level one cache. | 07-14-2011 |
20110219212 | System and Method of Processing Hierarchical Very Long Instruction Packets - A system and method of processing a hierarchical very long instruction word (VLIW) packet is disclosed. In a particular embodiment, a method of processing instructions is disclosed. The method includes receiving a hierarchical VLIW packet of instructions and decoding an instruction from the packet to determine whether the instruction is a single instruction or whether the instruction includes a subpacket that includes a plurality of sub-instructions. The method also includes, in response to determining that the instruction includes the subpacket, executing each of the sub-instructions. | 09-08-2011 |
20120117327 | Bimodal Branch Predictor Encoded in a Branch Instruction - Each branch instruction having branch prediction support has branch prediction bits in architecture-specified bit positions in the branch instruction. An instruction cache supports modifying the branch instructions with updated branch prediction bits that are dynamically determined when the branch instruction executes. | 05-10-2012 |
20120284488 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 32-bits. | 11-08-2012 |
20120284489 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 32-bits. To provide an extended constant, an instruction packet is formed with constant extender information and a target instruction. The constant extender information, encoded as a constant extender instruction, provides a first set of constant bits, such as 26-bits for example, and the target instruction provides a second set of constant bits, such as 6-bits. The first set of constant bits is combined with the second set of constant bits to generate an extended constant for execution of the target instruction. The extended constant may be used as an extended source operand, an extended address for memory access instructions, an extended address for branch-type instructions, and the like. Multiple constant extender instructions may be used together to provide larger constants than can be provided by a single extension instruction. | 11-08-2012 |
20130024663 | Table Call Instruction for Frequently Called Functions - An apparatus includes a memory that stores an instruction including an opcode and an operand. The operand specifies an immediate value or a register indicator of a register storing the immediate value. The immediate value is usable to identify a function call address. The function call address is selectable from a plurality of function call addresses. | 01-24-2013 |
20130073910 | Interleaved Architecture Tracing and Microarchitecture Tracing - Systems and methods for embedded trace macrocell (ETM) devices configured to dynamically interleave architecture/program tracing with microarchitecture/hardware tracing. An ETM device includes logic to enable interleaved program tracing and hardware state sampling. A core interface is configured to receive program trace and hardware state information of a microprocessor, and a combining module is configured to interleave the program trace and hardware state information. A packet generation module may be configured to packetize the program trace and hardware state information into packets at operational speeds of the microprocessor. | 03-21-2013 |
20130086290 | Low Latency Two-Level Interrupt Controller Interface to Multi-Threaded Processor - Systems and methods for reducing interrupt latency in a multi-threaded processor. A first interrupt controller is coupled to the multi-threaded processor. A second interrupt controller is configured to communicate a first interrupt and a first vector identifier to the first interrupt controller, wherein the first interrupt controller is configured to process the first interrupt and the first vector identifier and send the processed interrupt to a thread in the multi-threaded processor. Logic is configured to determine when the multi-threaded processor is ready to receive a second interrupt. A dedicated line is used to communicate an indication to the second interrupt controller that the multi-threaded processor is ready to receive the second interrupt. | 04-04-2013 |
20130117535 | Selective Writing of Branch Target Buffer - A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold. | 05-09-2013 |
20130185511 | Hybrid Write-Through/Write-Back Cache Policy Managers, and Related Systems and Methods - Embodiments disclosed in the detailed description include hybrid write-through/write-back cache policy managers, and related systems and methods. A cache write policy manager is configured to determine whether at least two caches among a plurality of parallel caches are active. If none of the one or more other caches is active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-back cache policy. In this manner, the cache write policy manager may conserve power and/or increase performance of a singly active processor core. If any of the one or more other caches are active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-through cache policy. In this manner, the cache write policy manager facilitates data coherency among the parallel caches when multiple processor cores are active. | 07-18-2013 |
20130185515 | Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher - Systems and methods for populating a cache using a hardware prefetcher are disclosed. A method for prefetching cache entries includes determining an initial stride value based on at least a first and second demand miss address in the cache, verifying the initial stride value based on a third demand miss address in the cache, prefetching a predetermined number of cache entries based on the verified initial stride value, determining an expected next miss address in the cache based on the verified initial stride value and addresses of the prefetched cache entries, and confirming the verified initial stride value based on comparing the expected next miss address to a next demand miss address in the cache. If the verified initial stride value is confirmed, additional cache entries are prefetched. If the verified initial stride value is not confirmed, further prefetching is stalled and an alternate stride value is determined. | 07-18-2013 |
20130185516 | Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching - Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an auto-increment-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop and a remaining loop count as the difference between the maximum loop count and the number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate the actual number of cache lines to prefetch to be less than or equal to the remaining loop count when the remaining loop count is less than the selected number of cache lines. | 07-18-2013 |
20130205115 | USING THE LEAST SIGNIFICANT BITS OF A CALLED FUNCTION'S ADDRESS TO SWITCH PROCESSOR MODES - Systems and methods for tracking and switching between execution modes in processing systems. A processing system is configured to execute instructions in at least two instruction execution modes, including a first and second execution mode chosen from a classic/aligned mode and a compressed/unaligned mode. Target addresses of selected instructions such as calls and returns are forcibly misaligned in the compressed mode, such that one or more bits, such as the least significant bits (alignment bits), of the target address in the compressed mode are different from the corresponding alignment bits in the classic mode. When the selected instructions are encountered during execution in the first mode, a decision to switch operation to the second mode is based on analyzing the alignment bits of the target address of the selected instruction. | 08-08-2013 |
20130283023 | Bimodal Compare Predictor Encoded In Each Compare Instruction - Systems and methods for branch prediction, including predicting evaluation of a producer instruction such as a compare instruction, by encoding a prediction field in the producer instruction, and predicting evaluation of the producer instruction by using the encoded prediction field. A consumer instruction such as a conditional branch instruction predicated on the producer instruction can be speculatively executed based on the predicted evaluation of the producer instruction. The producer instruction is executed in an execution pipeline to determine an actual evaluation of the producer instruction, and the prediction field is updated, if necessary, based on the actual evaluation and the predicted evaluation. The producer instruction can be updated in memory with the updated prediction field. | 10-24-2013 |
20130304994 | Per Thread Cacheline Allocation Mechanism in Shared Partitioned Caches in Multi-Threaded Processors - Systems and methods for allocation of cache lines in a shared partitioned cache of a multi-threaded processor. A memory management unit is configured to determine attributes associated with an address for a cache entry associated with a processing thread to be allocated in the cache. A configuration register is configured to store cache allocation information based on the determined attributes. A partitioning register is configured to store partitioning information for partitioning the cache into two or more portions. The cache entry is allocated into one of the portions of the cache based on the configuration register and the partitioning register. | 11-14-2013 |
20140181405 | INSTRUCTION CACHE HAVING A MULTI-BIT WAY PREDICTION MASK - In a particular embodiment, an apparatus includes control logic configured to selectively set bits of a multi-bit way prediction mask based on a prediction mask value. The control logic is associated with an instruction cache including a data array. A subset of line drivers of the data array is enabled responsive to the multi-bit way prediction mask. The subset of line drivers includes multiple line drivers. | 06-26-2014 |
20140181459 | SPECULATIVE ADDRESSING USING A VIRTUAL ADDRESS-TO-PHYSICAL ADDRESS PAGE CROSSING BUFFER - A method includes receiving an instruction to be executed by a processor. The method further includes performing a lookup in a page crossing buffer that includes one or more entries to determine if the instruction has an entry in the page crossing buffer. Each of the entries includes a physical address. The method further includes, when the page crossing buffer has the entry, retrieving a particular physical address from the entry in the page crossing buffer. | 06-26-2014 |
20140201449 | DATA CACHE WAY PREDICTION - In a particular embodiment, a method includes identifying one or more way prediction characteristics of an instruction. The method also includes selectively reading, based on identification of the one or more way prediction characteristics, a table to identify an entry of the table associated with the instruction that identifies a way of a data cache. The method further includes making a prediction whether a next access of the data cache, based on the instruction, will access the way. | 07-17-2014 |
20140201494 | OVERLAP CHECKING FOR A TRANSLATION LOOKASIDE BUFFER (TLB) - An apparatus includes a translation lookaside buffer (TLB). The TLB includes at least one entry that includes an entry virtual address and an entry page size indication corresponding to an entry page. The apparatus also includes input logic configured to receive an input page size indication and an input virtual address corresponding to an input page. The apparatus further includes overlap checking logic configured to determine, based at least in part on the entry page size indication and the input page size indication, whether the input page overlaps the entry page. | 07-17-2014 |
20140244986 | SYSTEM AND METHOD TO SELECT A PACKET FORMAT BASED ON A NUMBER OF EXECUTED THREADS - A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is selected from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer). | 08-28-2014 |
20140258680 | PARALLEL DISPATCH OF COPROCESSOR INSTRUCTIONS IN A MULTI-THREAD PROCESSOR - Techniques are addressed for parallel dispatch of coprocessor and thread instructions to a coprocessor coupled to a threaded processor. A first packet of threaded processor instructions is accessed from an instruction fetch queue (IFQ) and a second packet of coprocessor instructions is accessed from the IFQ. The IFQ includes a plurality of thread queues that are each configured to store instructions associated with a specific thread of instructions. A dispatch circuit is configured to select the first packet of thread instructions from the IFQ and the second packet of coprocessor instructions from the IFQ and send the first packet to a threaded processor and the second packet to the coprocessor in parallel. A data port is configured to share data between the coprocessor and a register file in the threaded processor. Data port operations are accomplished without affecting operations on any thread executing on the threaded processor. | 09-11-2014 |
20140281440 | PRECALCULATING THE DIRECT BRANCH PARTIAL TARGET ADDRESS DURING MISSPREDICTION CORRECTION PROCESS - An example method of storing a partial target address in an instruction cache includes receiving a branch instruction. The method also includes predicting a direction of the branch instruction as being not taken. The method further includes calculating a destination address based on executing the branch instruction. The method also includes determining a partial target address using the destination address. The method further includes in response to the predicted direction of the branch instruction changing from not taken to taken, replacing an offset in an instruction cache with the partial target address. | 09-18-2014 |
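
The entries above describe hardware mechanisms only at the level of a patent abstract. The sketches below are informal software models of a few of them, written to make the described data flow concrete; they are not taken from the patents, and every identifier, field width, and threshold in them is an assumption.

For 20090327674 (loop control), a minimal model of the decrement-and-compare flow: each end-of-loop event decrements the loop count and a predicate trigger counter, and the predicate is set once the trigger counter reaches the reference value.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical software model of the loop control logic in 20090327674.
 * All field names and the reference value of 0 are assumptions. */
typedef struct {
    int  loop_count;         /* remaining iterations                   */
    int  predicate_trigger;  /* counts down toward the reference value */
    int  reference;          /* comparison value for the trigger       */
    bool predicate;          /* set when the trigger reaches reference */
} loop_ctrl_t;

/* Called when the detection unit sees the end-of-loop indicator;
 * returns whether another iteration should run. */
static bool end_of_loop_seen(loop_ctrl_t *lc)
{
    lc->loop_count--;
    lc->predicate_trigger--;
    if (lc->predicate_trigger == lc->reference)
        lc->predicate = true;
    return lc->loop_count > 0;
}

int main(void)
{
    loop_ctrl_t lc = { .loop_count = 5, .predicate_trigger = 3, .reference = 0, .predicate = false };
    int iterations = 0;
    do {
        iterations++;                     /* loop body would run here */
    } while (end_of_loop_seen(&lc));
    printf("iterations=%d predicate=%d\n", iterations, lc.predicate);  /* iterations=5 predicate=1 */
    return 0;
}
```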
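
For 20120117327, one way the prediction bits stored inside the branch instruction could be updated after the branch resolves. The 2-bit saturating-counter encoding (0-1 predict not taken, 2-3 predict taken) is an assumption, and writing the updated bits back into the instruction cache is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 2-bit saturating counter for the in-instruction prediction
 * bits described in 20120117327. */
static uint8_t update_prediction_bits(uint8_t bits, bool taken)
{
    if (taken)
        return (uint8_t)((bits < 3) ? bits + 1 : 3);   /* saturate at strongly taken     */
    return (uint8_t)((bits > 0) ? bits - 1 : 0);       /* saturate at strongly not taken */
}

static bool predict_taken(uint8_t bits)
{
    return bits >= 2;
}

int main(void)
{
    uint8_t bits = 1;                               /* weakly not taken      */
    bits = update_prediction_bits(bits, true);      /* branch resolves taken */
    bits = update_prediction_bits(bits, true);      /* taken again           */
    printf("predict taken: %d\n", predict_taken(bits));  /* prints 1 */
    return 0;
}
```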
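
For 20120284488/20120284489, the abstract's example widths (26 bits from a constant extender instruction plus 6 bits from the target instruction) combine into a 32-bit constant as follows; the function name and argument layout are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical combination of extender bits and target-instruction bits
 * into an extended 32-bit constant, per the example widths in the abstract. */
static uint32_t extend_constant(uint32_t extender_bits26, uint32_t target_bits6)
{
    uint32_t high = (extender_bits26 & 0x03FFFFFFu) << 6;   /* 26 bits above the low field */
    uint32_t low  = target_bits6 & 0x3Fu;                   /* 6 bits from the target      */
    return high | low;
}

int main(void)
{
    uint32_t c = 0x12345678u;                                /* constant to reconstruct */
    uint32_t rebuilt = extend_constant(c >> 6, c & 0x3Fu);
    printf("0x%08X\n", (unsigned)rebuilt);                   /* prints 0x12345678       */
    return 0;
}
```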
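
For 20130117535, the selective branch target buffer write reduces to a simple filter: allocate an entry only for a taken branch whose associated instruction count is below a threshold. The threshold value here is invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define BTB_WRITE_THRESHOLD 4   /* hypothetical threshold; the abstract leaves the value open */

/* Hypothetical filter matching the flow in 20130117535. */
static bool should_write_btb(bool branch_taken, unsigned associated_instructions)
{
    return branch_taken && (associated_instructions < BTB_WRITE_THRESHOLD);
}

int main(void)
{
    printf("%d\n", should_write_btb(true, 2));    /* 1: taken and under the threshold   */
    printf("%d\n", should_write_btb(true, 9));    /* 0: taken but too many instructions */
    printf("%d\n", should_write_btb(false, 1));   /* 0: not taken, never written        */
    return 0;
}
```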
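
For 20130185515, a toy model of the negative-feedback training sequence: derive a stride from the first two demand misses, verify it with the third, prefetch a few lines, then confirm or abandon the stride by checking whether the next demand miss lands at the expected address. The prefetch depth and the exact definition of the expected miss address are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define PREFETCH_DEPTH 4   /* hypothetical number of lines fetched per verified stride */

static void issue_prefetch(uint64_t addr)
{
    printf("prefetch 0x%llx\n", (unsigned long long)addr);
}

/* Toy model of the train/verify/confirm flow described in 20130185515. */
static void train(const uint64_t miss[4])
{
    int64_t stride = (int64_t)(miss[1] - miss[0]);             /* initial stride              */
    if ((int64_t)(miss[2] - miss[1]) != stride) {
        printf("stride not verified, retrain\n");
        return;
    }
    uint64_t next = miss[2];
    for (int i = 0; i < PREFETCH_DEPTH; i++) {                  /* prefetch on verified stride */
        next += (uint64_t)stride;
        issue_prefetch(next);
    }
    uint64_t expected_next_miss = next + (uint64_t)stride;      /* first line past the prefetches */
    if (miss[3] == expected_next_miss)
        printf("stride confirmed, continue prefetching\n");
    else
        printf("unexpected miss, stall and derive an alternate stride\n");
}

int main(void)
{
    const uint64_t misses[4] = { 0x1000, 0x1040, 0x1080, 0x11c0 };  /* stride 0x40 */
    train(misses);
    return 0;
}
```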
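
For 20130185516, the loop-aware truncation is a one-line calculation: never prefetch more cache lines than the loop has iterations remaining.

```c
#include <stdio.h>

/* Hypothetical version of the truncation step in 20130185516. */
static int lines_to_prefetch(int max_loop_count, int completed_iterations, int selected_lines)
{
    int remaining = max_loop_count - completed_iterations;
    return (remaining < selected_lines) ? remaining : selected_lines;
}

int main(void)
{
    printf("%d\n", lines_to_prefetch(100, 40, 8));   /* 8: plenty of iterations remain       */
    printf("%d\n", lines_to_prefetch(100, 97, 8));   /* 3: truncated to the remaining count  */
    return 0;
}
```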
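
For 20130205115, a sketch of choosing the execution mode from the alignment bits of a call or return target: compressed-mode targets are deliberately misaligned, so a set alignment bit selects the compressed mode. Using bit 0 as the alignment bit and the mode names below are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mode decision per 20130205115. */
typedef enum { MODE_CLASSIC = 0, MODE_COMPRESSED = 1 } exec_mode_t;

static exec_mode_t mode_for_target(uint32_t target_addr)
{
    return (target_addr & 0x1u) ? MODE_COMPRESSED : MODE_CLASSIC;
}

int main(void)
{
    printf("%d\n", (int)mode_for_target(0x00401000u));   /* 0: aligned target, classic mode       */
    printf("%d\n", (int)mode_for_target(0x00401001u));   /* 1: misaligned target, compressed mode */
    return 0;
}
```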
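
For 20140201494, assuming power-of-two, naturally aligned pages, the overlap check reduces to masking both virtual addresses with the larger page's alignment mask and comparing; the patent encodes sizes as indications, while byte counts are used here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical overlap check for 20140201494 under the natural-alignment assumption. */
static bool pages_overlap(uint64_t entry_va, uint64_t entry_page_size,
                          uint64_t input_va, uint64_t input_page_size)
{
    uint64_t larger = (entry_page_size > input_page_size) ? entry_page_size : input_page_size;
    uint64_t mask = ~(larger - 1);                        /* alignment mask of the larger page */
    return (entry_va & mask) == (input_va & mask);
}

int main(void)
{
    /* A 4 KB page at 0x5000 falls inside a 64 KB page at 0x0: overlap. */
    printf("%d\n", pages_overlap(0x0000, 64 * 1024, 0x5000, 4 * 1024));   /* 1 */
    /* Two distinct 4 KB pages: no overlap. */
    printf("%d\n", pages_overlap(0x1000, 4 * 1024, 0x2000, 4 * 1024));    /* 0 */
    return 0;
}
```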