Patent application number | Description | Published |
20090070602 | System and Method of Executing Instructions in a Multi-Stage Data Processing Pipeline - A device is disclosed that includes an instruction execution pipeline having multiple stages for executing an instruction. The device also includes a control logic circuit coupled to the instruction execution pipeline. The control logic circuit is adapted to skip at least one stage of the instruction execution pipeline during execution of the instruction. The control logic circuit is also adapted to execute at least one non-skipped stage during execution of the decoded instruction. | 03-12-2009 |
20090132793 | System and Method of Selectively Accessing a Register File - In a particular embodiment, a method is disclosed that includes identifying a first block of bits within a result to be written to a destination register by an execution unit. The result includes a plurality of bits having the first block of bits and a second block of bits. The first block of bits has a value of zero. The method further includes providing an encoded bit value representing the first block of bits to a control register and selectively writing the second block of bits, but not the first block of bits, to the destination register. The destination register is sized to receive the first and second blocks of bits. | 05-21-2009 |
20090216993 | System and Method of Data Forwarding Within An Execution Unit - In an embodiment, a method is disclosed that includes comparing, during a write-back stage at an execution unit, a write identifier associated with a result to be written to a register file from execution of a first instruction to a read identifier associated with a second instruction at an execution pipeline within an interleaved multi-threaded (IMT) processor having multiple execution units. When the write identifier matches the read identifier, the method further includes storing the result at a local memory of the execution unit for use by the execution unit in the subsequent read stage. | 08-27-2009 |
20090327674 | Loop Control System and Method - Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value. | 12-31-2009 |
20100299668 | Associating Data for Events Occurring in Software Threads with Synchronized Clock Cycle Counters - Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method is disclosed that includes collecting data for events occurring in a plurality of software threads being processed by a processor, where the data for each of the events includes a value of an associated clock cycle counter upon occurrence of the event. Data is correlated for the events occurring for each of the plurality of threads by starting each of a plurality of clock cycle counters associated with the software threads at a common time. Alternatively, data is correlated for the events by logging a synchronizing event within each of the plurality of software threads. | 11-25-2010 |
20110138393 | Thread Allocation and Clock Cycle Adjustment in an Interleaved Multi-Threaded Processor - Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method allocates software threads to hardware threads. A number of software threads to be allocated is identified. It is determined when the number of software threads is less than a number of hardware threads. When the number of software threads is less than the number of hardware threads, at least two of the software threads are allocated to non-sequential hardware threads. A clock signal to be applied to the hardware threads is adjusted responsive to the non-sequential hardware threads allocated. | 06-09-2011 |
20110173391 | System and Method to Access a Portion of a Level Two Memory and a Level One Memory - A system and method to access data from a portion of a level two memory or from a level one memory is disclosed. In a particular embodiment, the system includes a level one cache and a level two memory. A first portion of the level two memory is coupled to an input port and is addressable in parallel with the level one cache. | 07-14-2011 |
20110219212 | System and Method of Processing Hierarchical Very Long Instruction Packets - A system and method of processing a hierarchical very long instruction word (VLIW) packet is disclosed. In a particular embodiment, a method of processing instructions is disclosed. The method includes receiving a hierarchical VLIW packet of instructions and decoding an instruction from the packet to determine whether the instruction is a single instruction or whether the instruction includes a subpacket that includes a plurality of sub-instructions. The method also includes, in response to determining that the instruction includes the subpacket, executing each of the sub-instructions. | 09-08-2011 |
20120117327 | Bimodal Branch Predictor Encoded in a Branch Instruction - Each branch instruction having branch prediction support has branch prediction bits in architecture-specified bit positions in the branch instruction. An instruction cache supports modifying the branch instructions with updated branch prediction bits that are dynamically determined when the branch instruction executes. | 05-10-2012 |
20120284488 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 32-bits. | 11-08-2012 |
20120284489 | Methods and Apparatus for Constant Extension in a Processor - Programs often require constants that cannot be encoded in a native instruction format, such as 32-bits. To provide an extended constant, an instruction packet is formed with constant extender information and a target instruction. The constant extender information, encoded as a constant extender instruction, provides a first set of constant bits, such as 26-bits for example, and the target instruction provides a second set of constant bits, such as 6-bits. The first set of constant bits is combined with the second set of constant bits to generate an extended constant for execution of the target instruction. The extended constant may be used as an extended source operand, an extended address for memory access instructions, an extended address for branch-type instructions, and the like. Multiple constant extender instructions may be used together to provide larger constants than can be provided by a single extension instruction. | 11-08-2012 |
20130024663 | Table Call Instruction for Frequently Called Functions - An apparatus includes a memory that stores an instruction including an opcode and an operand. The operand specifies an immediate value or a register indicator of a register storing the immediate value. The immediate value is usable to identify a function call address. The function call address is selectable from a plurality of function call addresses. | 01-24-2013 |
20130073910 | Interleaved Architecture Tracing and Microarchitecture Tracing - Systems and methods for embedded trace macrocell (ETM) devices configured to dynamically interleave architecture/program tracing with microarchitecture/hardware tracing. An ETM device includes logic to enable interleaved program tracing and hardware state sampling. A core interface is configured to receive program trace and hardware state information of a microprocessor, and a combining module is configured to interleave the program trace and hardware state information. A packet generation module may be configured to packetize the program trace and hardware state information into packets at operational speeds of the microprocessor. | 03-21-2013 |
20130086290 | Low Latency Two-Level Interrupt Controller Interface to Multi-Threaded Processor - Systems and methods for reducing interrupt latency in a multi-threaded processor. A first interrupt controller is coupled to the multi-threaded processor. A second interrupt controller is configured to communicate a first interrupt and a first vector identifier to the first interrupt controller, wherein the first interrupt controller is configured to process the first interrupt and the first vector identifier and send the processed interrupt to a thread in the multi-threaded processor. Logic is configured to determine when the multi-threaded processor is ready to receive a second interrupt. A dedicated line is used to communicate an indication to the second interrupt controller that the multi-threaded processor is ready to receive the second interrupt. | 04-04-2013 |
20130117535 | Selective Writing of Branch Target Buffer - A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold. | 05-09-2013 |
20130185511 | Hybrid Write-Through/Write-Back Cache Policy Managers, and Related Systems and Methods - Embodiments disclosed in the detailed description include hybrid write-through/write-back cache policy managers, and related systems and methods. A cache write policy manager is configured to determine whether at least two caches among a plurality of parallel caches are active. If none of the one or more other caches is active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-back cache policy. In this manner, the cache write policy manager may conserve power and/or increase performance of a singly active processor core. If any of the one or more other caches are active, the cache write policy manager is configured to instruct an active cache among the parallel caches to apply a write-through cache policy. In this manner, the cache write policy manager facilitates data coherency among the parallel caches when multiple processor cores are active. | 07-18-2013 |
20130185515 | Utilizing Negative Feedback from Unexpected Miss Addresses in a Hardware Prefetcher - Systems and methods for populating a cache using a hardware prefetcher are disclosed. A method for prefetching cache entries includes determining an initial stride value based on at least a first and second demand miss address in the cache, verifying the initial stride value based on a third demand miss address in the cache, prefetching a predetermined number of cache entries based on the verified initial stride value, determining an expected next miss address in the cache based on the verified initial stride value and addresses of the prefetched cache entries, and confirming the verified initial stride value based on comparing the expected next miss address to a next demand miss address in the cache. If the verified initial stride value is confirmed, additional cache entries are prefetched. If the verified initial stride value is not confirmed, further prefetching is stalled and an alternate stride value is determined. | 07-18-2013 |
20130185516 | Use of Loop and Addressing Mode Instruction Set Semantics to Direct Hardware Prefetching - Systems and methods for prefetching cache lines into a cache coupled to a processor. A hardware prefetcher is configured to recognize a memory access instruction as an auto-increment-address (AIA) memory access instruction, infer a stride value from an increment field of the AIA instruction, and prefetch lines into the cache based on the stride value. Additionally or alternatively, the hardware prefetcher is configured to recognize that prefetched cache lines are part of a hardware loop, determine a maximum loop count of the hardware loop and a remaining loop count as the difference between the maximum loop count and the number of loop iterations that have been completed, select a number of cache lines to prefetch, and truncate the actual number of cache lines to prefetch to be less than or equal to the remaining loop count when the remaining loop count is less than the selected number of cache lines. | 07-18-2013 |
20130205115 | USING THE LEAST SIGNIFICANT BITS OF A CALLED FUNCTION'S ADDRESS TO SWITCH PROCESSOR MODES - Systems and methods for tracking and switching between execution modes in processing systems. A processing system is configured to execute instructions in at least two instruction execution modes, including a first and second execution mode chosen from a classic/aligned mode and a compressed/unaligned mode. Target addresses of selected instructions such as calls and returns are forcibly misaligned in the compressed mode, such that one or more bits, such as the least significant bits (alignment bits), of the target address in the compressed mode are different from the corresponding alignment bits in the classic mode. When the selected instructions are encountered during execution in the first mode, a decision to switch operation to the second mode is based on analyzing the alignment bits of the target address of the selected instruction. | 08-08-2013 |
20130283023 | Bimodal Compare Predictor Encoded In Each Compare Instruction - Systems and methods for branch prediction, including predicting evaluation of a producer instruction such as a compare instruction, by encoding a prediction field in the producer instruction, and predicting evaluation of the producer instruction by using the encoded prediction field. A consumer instruction such as a conditional branch instruction predicated on the producer instruction can be speculatively executed based on the predicted evaluation of the producer instruction. The producer instruction is executed in an execution pipeline to determine an actual evaluation of the producer instruction, and the prediction field is updated, if necessary, based on the actual evaluation and the predicted evaluation. The producer instruction can be updated in memory with the updated prediction field. | 10-24-2013 |
20130304994 | Per Thread Cacheline Allocation Mechanism in Shared Partitioned Caches in Multi-Threaded Processors - Systems and methods for allocation of cache lines in a shared partitioned cache of a multi-threaded processor. A memory management unit is configured to determine attributes associated with an address for a cache entry associated with a processing thread to be allocated in the cache. A configuration register is configured to store cache allocation information based on the determined attributes. A partitioning register is configured to store partitioning information for partitioning the cache into two or more portions. The cache entry is allocated into one of the portions of the cache based on the configuration register and the partitioning register. | 11-14-2013 |
20140181405 | INSTRUCTION CACHE HAVING A MULTI-BIT WAY PREDICTION MASK - In a particular embodiment, an apparatus includes control logic configured to selectively set bits of a multi-bit way prediction mask based on a prediction mask value. The control logic is associated with an instruction cache including a data array. A subset of line drivers of the data array is enabled responsive to the multi-bit way prediction mask. The subset of line drivers includes multiple line drivers. | 06-26-2014 |
20140181459 | SPECULATIVE ADDRESSING USING A VIRTUAL ADDRESS-TO-PHYSICAL ADDRESS PAGE CROSSING BUFFER - A method includes receiving an instruction to be executed by a processor. The method further includes performing a lookup in a page crossing buffer that includes one or more entries to determine if the instruction has an entry in the page crossing buffer. Each of the entries includes a physical address. The method further includes, when the page crossing buffer has the entry, retrieving a particular physical address from the entry in the page crossing buffer. | 06-26-2014 |
20140201449 | DATA CACHE WAY PREDICTION - In a particular embodiment, a method includes identifying one or more way prediction characteristics of an instruction. The method also includes selectively reading, based on identification of the one or more way prediction characteristics, a table to identify an entry of the table associated with the instruction that identifies a way of a data cache. The method further includes making a prediction whether a next access of the data cache, based on the instruction, will access the way. | 07-17-2014 |
20140201494 | OVERLAP CHECKING FOR A TRANSLATION LOOKASIDE BUFFER (TLB) - An apparatus includes a translation lookaside buffer (TLB). The TLB includes at least one entry that includes an entry virtual address and an entry page size indication corresponding to an entry page. The apparatus also includes input logic configured to receive an input page size indication and an input virtual address corresponding to an input page. The apparatus further includes overlap checking logic configured to determine, based at least in part on the entry page size indication and the input page size indication, whether the input page overlaps the entry page. | 07-17-2014 |
20140244986 | SYSTEM AND METHOD TO SELECT A PACKET FORMAT BASED ON A NUMBER OF EXECUTED THREADS - A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is selected from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer). | 08-28-2014 |
20140258680 | PARALLEL DISPATCH OF COPROCESSOR INSTRUCTIONS IN A MULTI-THREAD PROCESSOR - Techniques are addressed for parallel dispatch of coprocessor and thread instructions to a coprocessor coupled to a threaded processor. A first packet of threaded processor instructions is accessed from an instruction fetch queue (IFQ) and a second packet of coprocessor instructions is accessed from the IFQ. The IFQ includes a plurality of thread queues that are each configured to store instructions associated with a specific thread of instructions. A dispatch circuit is configured to select the first packet of thread instructions from the IFQ and the second packet of coprocessor instructions from the IFQ and send the first packet to a threaded processor and the second packet to the coprocessor in parallel. A data port is configured to share data between the coprocessor and a register file in the threaded processor. Data port operations are accomplished without affecting operations on any thread executing on the threaded processor. | 09-11-2014 |
20140281440 | PRECALCULATING THE DIRECT BRANCH PARTIAL TARGET ADDRESS DURING MISSPREDICTION CORRECTION PROCESS - An example method of storing a partial target address in an instruction cache includes receiving a branch instruction. The method also includes predicting a direction of the branch instruction as being not taken. The method further includes calculating a destination address based on executing the branch instruction. The method also includes determining a partial target address using the destination address. The method further includes in response to the predicted direction of the branch instruction changing from not taken to taken, replacing an offset in an instruction cache with the partial target address. | 09-18-2014 |
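
The entries above describe hardware mechanisms only at the level of a patent abstract. The sketches below are informal software models of a few of them, written to make the described data flow concrete; they are not taken from the patents, and every identifier, field width, and threshold in them is an assumption.

For 20090327674 (loop control), a minimal model of the decrement-and-compare flow: each end-of-loop event decrements the loop count and a predicate trigger counter, and the predicate is set once the trigger counter reaches the reference value.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical software model of the loop control logic in 20090327674.
 * All field names and the reference value of 0 are assumptions. */
typedef struct {
    int  loop_count;         /* remaining iterations                   */
    int  predicate_trigger;  /* counts down toward the reference value */
    int  reference;          /* comparison value for the trigger       */
    bool predicate;          /* set when the trigger reaches reference */
} loop_ctrl_t;

/* Called when the detection unit sees the end-of-loop indicator;
 * returns whether another iteration should run. */
static bool end_of_loop_seen(loop_ctrl_t *lc)
{
    lc->loop_count--;
    lc->predicate_trigger--;
    if (lc->predicate_trigger == lc->reference)
        lc->predicate = true;
    return lc->loop_count > 0;
}

int main(void)
{
    loop_ctrl_t lc = { .loop_count = 5, .predicate_trigger = 3, .reference = 0, .predicate = false };
    int iterations = 0;
    do {
        iterations++;                     /* loop body would run here */
    } while (end_of_loop_seen(&lc));
    printf("iterations=%d predicate=%d\n", iterations, lc.predicate);  /* iterations=5 predicate=1 */
    return 0;
}
```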
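
For 20120117327, one way the prediction bits stored inside the branch instruction could be updated after the branch resolves. The 2-bit saturating-counter encoding (0-1 predict not taken, 2-3 predict taken) is an assumption, and writing the updated bits back into the instruction cache is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 2-bit saturating counter for the in-instruction prediction
 * bits described in 20120117327. */
static uint8_t update_prediction_bits(uint8_t bits, bool taken)
{
    if (taken)
        return (uint8_t)((bits < 3) ? bits + 1 : 3);   /* saturate at strongly taken     */
    return (uint8_t)((bits > 0) ? bits - 1 : 0);       /* saturate at strongly not taken */
}

static bool predict_taken(uint8_t bits)
{
    return bits >= 2;
}

int main(void)
{
    uint8_t bits = 1;                               /* weakly not taken      */
    bits = update_prediction_bits(bits, true);      /* branch resolves taken */
    bits = update_prediction_bits(bits, true);      /* taken again           */
    printf("predict taken: %d\n", predict_taken(bits));  /* prints 1 */
    return 0;
}
```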
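
For 20120284488/20120284489, the abstract's example widths (26 bits from a constant extender instruction plus 6 bits from the target instruction) combine into a 32-bit constant as follows; the function name and argument layout are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical combination of extender bits and target-instruction bits
 * into an extended 32-bit constant, per the example widths in the abstract. */
static uint32_t extend_constant(uint32_t extender_bits26, uint32_t target_bits6)
{
    uint32_t high = (extender_bits26 & 0x03FFFFFFu) << 6;   /* 26 bits above the low field */
    uint32_t low  = target_bits6 & 0x3Fu;                   /* 6 bits from the target      */
    return high | low;
}

int main(void)
{
    uint32_t c = 0x12345678u;                                /* constant to reconstruct */
    uint32_t rebuilt = extend_constant(c >> 6, c & 0x3Fu);
    printf("0x%08X\n", (unsigned)rebuilt);                   /* prints 0x12345678       */
    return 0;
}
```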
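
For 20130117535, the selective branch target buffer write reduces to a simple filter: allocate an entry only for a taken branch whose associated instruction count is below a threshold. The threshold value here is invented for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

#define BTB_WRITE_THRESHOLD 4   /* hypothetical threshold; the abstract leaves the value open */

/* Hypothetical filter matching the flow in 20130117535. */
static bool should_write_btb(bool branch_taken, unsigned associated_instructions)
{
    return branch_taken && (associated_instructions < BTB_WRITE_THRESHOLD);
}

int main(void)
{
    printf("%d\n", should_write_btb(true, 2));    /* 1: taken and under the threshold   */
    printf("%d\n", should_write_btb(true, 9));    /* 0: taken but too many instructions */
    printf("%d\n", should_write_btb(false, 1));   /* 0: not taken, never written        */
    return 0;
}
```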
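
For 20130185515, a toy model of the negative-feedback training sequence: derive a stride from the first two demand misses, verify it with the third, prefetch a few lines, then confirm or abandon the stride by checking whether the next demand miss lands at the expected address. The prefetch depth and the exact definition of the expected miss address are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define PREFETCH_DEPTH 4   /* hypothetical number of lines fetched per verified stride */

static void issue_prefetch(uint64_t addr)
{
    printf("prefetch 0x%llx\n", (unsigned long long)addr);
}

/* Toy model of the train/verify/confirm flow described in 20130185515. */
static void train(const uint64_t miss[4])
{
    int64_t stride = (int64_t)(miss[1] - miss[0]);             /* initial stride              */
    if ((int64_t)(miss[2] - miss[1]) != stride) {
        printf("stride not verified, retrain\n");
        return;
    }
    uint64_t next = miss[2];
    for (int i = 0; i < PREFETCH_DEPTH; i++) {                  /* prefetch on verified stride */
        next += (uint64_t)stride;
        issue_prefetch(next);
    }
    uint64_t expected_next_miss = next + (uint64_t)stride;      /* first line past the prefetches */
    if (miss[3] == expected_next_miss)
        printf("stride confirmed, continue prefetching\n");
    else
        printf("unexpected miss, stall and derive an alternate stride\n");
}

int main(void)
{
    const uint64_t misses[4] = { 0x1000, 0x1040, 0x1080, 0x11c0 };  /* stride 0x40 */
    train(misses);
    return 0;
}
```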
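
For 20130185516, the loop-aware truncation is a one-line calculation: never prefetch more cache lines than the loop has iterations remaining.

```c
#include <stdio.h>

/* Hypothetical version of the truncation step in 20130185516. */
static int lines_to_prefetch(int max_loop_count, int completed_iterations, int selected_lines)
{
    int remaining = max_loop_count - completed_iterations;
    return (remaining < selected_lines) ? remaining : selected_lines;
}

int main(void)
{
    printf("%d\n", lines_to_prefetch(100, 40, 8));   /* 8: plenty of iterations remain       */
    printf("%d\n", lines_to_prefetch(100, 97, 8));   /* 3: truncated to the remaining count  */
    return 0;
}
```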
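
For 20130205115, a sketch of choosing the execution mode from the alignment bits of a call or return target: compressed-mode targets are deliberately misaligned, so a set alignment bit selects the compressed mode. Using bit 0 as the alignment bit and the mode names below are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mode decision per 20130205115. */
typedef enum { MODE_CLASSIC = 0, MODE_COMPRESSED = 1 } exec_mode_t;

static exec_mode_t mode_for_target(uint32_t target_addr)
{
    return (target_addr & 0x1u) ? MODE_COMPRESSED : MODE_CLASSIC;
}

int main(void)
{
    printf("%d\n", (int)mode_for_target(0x00401000u));   /* 0: aligned target, classic mode       */
    printf("%d\n", (int)mode_for_target(0x00401001u));   /* 1: misaligned target, compressed mode */
    return 0;
}
```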
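
For 20140201494, assuming power-of-two, naturally aligned pages, the overlap check reduces to masking both virtual addresses with the larger page's alignment mask and comparing; the patent encodes sizes as indications, while byte counts are used here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical overlap check for 20140201494 under the natural-alignment assumption. */
static bool pages_overlap(uint64_t entry_va, uint64_t entry_page_size,
                          uint64_t input_va, uint64_t input_page_size)
{
    uint64_t larger = (entry_page_size > input_page_size) ? entry_page_size : input_page_size;
    uint64_t mask = ~(larger - 1);                        /* alignment mask of the larger page */
    return (entry_va & mask) == (input_va & mask);
}

int main(void)
{
    /* A 4 KB page at 0x5000 falls inside a 64 KB page at 0x0: overlap. */
    printf("%d\n", pages_overlap(0x0000, 64 * 1024, 0x5000, 4 * 1024));   /* 1 */
    /* Two distinct 4 KB pages: no overlap. */
    printf("%d\n", pages_overlap(0x1000, 4 * 1024, 0x2000, 4 * 1024));    /* 0 */
    return 0;
}
```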