Patent application number | Description | Published |
20090187727 | INDEX GENERATION FOR CACHE MEMORIES - Embodiments of the present invention provide a system that generates an index for a cache memory. The system starts by receiving a request to access the cache memory, wherein the request includes address information. The system then obtains non-address information associated with the request. Next, the system generates the index using the address information and the non-address information. The system then uses the index to fulfill the request to access the cache memory. | 07-23-2009 |
20090187906 | SEMI-ORDERED TRANSACTIONS - Embodiments of the present invention provide a system that facilitates transactional execution in a processor. The system starts by executing program code for a thread in a processor. Upon detecting a predetermined indicator, the system starts a transaction for a section of the program code for the thread. When starting the transaction, the system executes a checkpoint instruction. If the checkpoint instruction is a WEAK_CHECKPOINT instruction, the system executes a semi-ordered transaction. During the semi-ordered transaction, the system preserves code atomicity but not memory atomicity. Otherwise, the system executes a regular transaction. During the regular transaction, the system preserves both code atomicity and memory atomicity. | 07-23-2009 |
20090204761 | PSEUDO-LRU CACHE LINE REPLACEMENT FOR A HIGH-SPEED CACHE - Embodiments of the present invention provide a system that replaces an entry in a least-recently-used way in a skewed-associative cache. The system starts by receiving a cache line address. The system then generates two or more indices using the cache line address. Next, the system generates two or more intermediate indices using the two or more indices. The system then uses at least one of the two or more indices or the two or more intermediate indices to perform a lookup in one or more lookup tables, wherein the lookup returns a value which identifies a least-recently-used way. Next, the system replaces the entry in the least-recently-used way. | 08-13-2009 |
20090210683 | METHOD AND APPARATUS FOR RECOVERING FROM BRANCH MISPREDICTION - Embodiments of the present invention provide a system that executes a branch instruction. When executing the branch instruction, the system obtains a stored prediction of a resolution of the branch instruction and fetches subsequent instructions for execution based on the predicted resolution of the branch instruction. If an actual resolution of the branch instruction is different from the predicted resolution (i.e., if the branch is mispredicted), the system updates the stored prediction of the resolution of the branch instruction to the actual resolution of the branch instruction. The system then re-executes the branch instruction. When re-executing the branch instruction, the system obtains the stored prediction of the resolution of the branch instruction and fetches subsequent instructions for execution based on the predicted resolution of the branch instruction. | 08-20-2009 |
20090217013 | METHOD AND APPARATUS FOR PROGRAMMATICALLY REWINDING A REGISTER INSIDE A TRANSACTION - Embodiments of the present invention provide a system that allocates registers in a processor. The system starts by commencing a transaction, wherein commencing the transaction involves preserving a pre-transactional state of registers in a first register file. The system then allocates one or more registers for temporary use during the transaction. Upon finishing using each allocated register during the transaction, the system executes an instruction that restores the allocated register to the pre-transactional state. | 08-27-2009 |
20090254905 | FACILITATING TRANSACTIONAL EXECUTION IN A PROCESSOR THAT SUPPORTS SIMULTANEOUS SPECULATIVE THREADING - Embodiments of the present invention provide a system that executes a transaction on a simultaneous speculative threading (SST) processor. In these embodiments, the processor includes a primary strand and a subordinate strand. Upon encountering a transaction with the primary strand while executing instructions non-transactionally, the processor checkpoints the primary strand and executes the transaction with the primary strand while continuing to non-transactionally execute deferred instructions with the subordinate strand. When the subordinate strand non-transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a first strand ID. When the primary strand transactionally accesses a cache line during the transaction, the processor updates a record for the cache line to indicate a second strand ID. | 10-08-2009 |
20090265532 | ANTI-PREFETCH INSTRUCTION - Embodiments of the present invention execute an anti-prefetch instruction. These embodiments start by decoding instructions in a decode unit in a processor to prepare the instructions for execution. Upon decoding an anti-prefetch instruction, these embodiments stall the decode unit to prevent decoding subsequent instructions. These embodiments then execute the anti-prefetch instruction, wherein executing the anti-prefetch instruction involves: (1) sending a prefetch request for a cache line in an L1 cache; (2) determining if the prefetch request hits in the L1 cache; (3) if the prefetch request hits in the L1 cache, determining if the cache line contains a predetermined value; and (4) conditionally performing subsequent operations based on whether the prefetch request hits in the L1 cache and, if it does, on the value of the data in the cache line. | 10-22-2009 |
20090282225 | STORE QUEUE - Embodiments of the present invention provide a system which executes a load instruction or a store instruction. During operation the system receives a load instruction. The system then determines if an unrestricted entry or a restricted entry in a store queue contains data that satisfies the load instruction. If not, the system retrieves data for the load instruction from a cache. If so, the system conditionally forwards data from the unrestricted entry or the restricted entry by: (1) forwarding data from an unrestricted entry that contains the youngest store that satisfies the load instruction when any number of unrestricted or restricted entries contain data that satisfies the load instruction; (2) forwarding data from the restricted entry when only one restricted entry and no unrestricted entries contain data that satisfies the load instruction; and (3) deferring the load instruction by placing the load instruction in a deferred queue when two or more restricted entries and no unrestricted entries contain data that satisfies the load instruction. | 11-12-2009 |
20090300338 | AGGRESSIVE STORE MERGING IN A PROCESSOR THAT SUPPORTS CHECKPOINTING - Embodiments of the present invention provide a processor that merges stores in an N-entry first-in-first-out (FIFO) store queue. In these embodiments, the processor starts by executing instructions before a checkpoint is generated. When executing instructions before the checkpoint is generated, the processor is configured to perform limited or no merging of stores into existing entries in the store queue. Then, upon detecting a predetermined condition, the processor is configured to generate a checkpoint. After generating the checkpoint, the processor is configured to continue to execute instructions. When executing instructions after the checkpoint is generated, the processor is configured to freely merge subsequent stores into post-checkpoint entries in the store queue. | 12-03-2009 |
20100161950 | SEMI-ABSOLUTE BRANCH INSTRUCTIONS FOR EFFICIENT COMPUTERS - Apparatus and methods are disclosed for a computation processor that can execute a semi-absolute branch instruction, as well as methods of operation and of generating the semi-absolute branch instruction. | 06-24-2010 |
20100180103 | MECHANISM FOR INCREASING THE EFFECTIVE CAPACITY OF THE WORKING REGISTER FILE - A computer processor pipeline has both an architectural register file and a working register file. The lifetime of an entry in the working register file is determined by a predetermined number of instructions passing through a specified stage in the pipeline after the location in the working register file is allocated for an instruction. The size of the working register file is selected based upon performance characteristics. A working register file credit indicator is coupled to the front end pipeline portion and to the back end pipeline portion. The working register file credit indicator is monitored to prevent a working register file overflow. When a location in the architectural register file is read early, the location is monitored to determine whether the location is written to prior to issuance of the instruction associated with the early read. | 07-15-2010 |
20100205344 | UNIFIED CACHE STRUCTURE THAT FACILITATES ACCESSING TRANSLATION TABLE ENTRIES - One embodiment provides a system that includes a processor with a unified cache structure that facilitates accessing translation table entries (TTEs). This unified cache structure can simultaneously store program instructions, program data, and TTEs. During a memory access, the system receives a virtual memory address. The system then uses this virtual memory address to identify one or more cache lines in the unified cache structure which are associated with the virtual memory address. Next, the system compares a tag portion of the virtual memory address with the tags for the identified cache line(s) to identify a cache line that matches the virtual memory address. The system then loads a translation table entry that corresponds to the virtual memory address from the identified cache line. | 08-12-2010 |
20100299482 | METHOD AND APPARATUS FOR DETERMINING CACHE STORAGE LOCATIONS BASED ON LATENCY REQUIREMENTS - A method for determining whether to store binary information in a fast way or a slow way of a cache is disclosed. The method includes receiving a block of binary information to be stored in a cache memory having a plurality of ways. The plurality of ways includes a first subset of ways and a second subset of ways, wherein a cache access by a first execution core from one of the first subset of ways has a lower latency time than a cache access from one of the second subset of ways. The method further includes determining, based on a predetermined access latency and one or more parameters associated with the block of binary information, whether to store the block of binary information into one of the first set of ways or one of the second set of ways. | 11-25-2010 |
20110167243 | SPACE-EFFICIENT MECHANISM TO SUPPORT ADDITIONAL SCOUTING IN A PROCESSOR USING CHECKPOINTS - Techniques and structures are disclosed for a processor supporting checkpointing to operate effectively in scouting mode while a maximum number of supported checkpoints are active. Operation in scouting mode may include using bypass logic and a set of register storage locations to store and/or forward in-flight instruction results that were calculated during scouting mode. These forwarded results may be used during scouting mode to calculate memory load addresses for yet other in-flight instructions, and the processor may accordingly cause data to be prefetched from these calculated memory load addresses. The set of register storage locations may comprise a working register file or an active portion of a multiported register file. | 07-07-2011 |
20120166756 | INDEX GENERATION FOR CACHE MEMORIES - Embodiments of the present invention provide a system that generates an index for a cache memory. The system starts by receiving a request to access the cache memory, wherein the request includes address information. The system then obtains non-address information associated with the request. Next, the system generates the index using the address information and the non-address information. The system then uses the index to fulfill the request to access the cache memory. | 06-28-2012 |
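
The C sketches below illustrate a few of the mechanisms summarized in the table; each is a minimal model of what the corresponding abstract describes, not the patented implementation. The first corresponds to the index-generation applications (20090187727 and 20120166756): the cache index is derived from both address bits and a non-address input. Here the non-address input is assumed to be the requester's hardware thread ID and the combining function an XOR fold; the abstracts leave both choices open.

```c
#include <stdint.h>

#define INDEX_BITS  9                          /* 512-set cache, illustrative */
#define INDEX_MASK  ((1u << INDEX_BITS) - 1)
#define LINE_SHIFT  6                          /* 64-byte cache lines */

/* Combine address and non-address information into a cache index. */
uint32_t cache_index(uint64_t vaddr, uint32_t thread_id)
{
    uint64_t line   = vaddr >> LINE_SHIFT;                  /* drop the byte offset */
    uint32_t folded = (uint32_t)(line ^ (line >> INDEX_BITS));
    return (folded ^ thread_id) & INDEX_MASK;               /* address ^ non-address */
}
```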
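For the pseudo-LRU replacement application (20090204761), the sketch below assumes a two-way skewed-associative cache: each way has its own index hash, the XOR of the two per-way indices serves as the intermediate index, and a single lookup table records which way was touched most recently at that intermediate index. The abstract allows more ways, more tables, and other intermediate-index functions.

```c
#include <stdint.h>

#define SETS     256
#define SET_MASK (SETS - 1)

/* Per-way skewing hashes (assumed). */
static uint32_t idx_way0(uint64_t line) { return (uint32_t)(line & SET_MASK); }
static uint32_t idx_way1(uint64_t line) { return (uint32_t)(((line >> 8) ^ line) & SET_MASK); }

static uint8_t last_used[SETS];   /* which way was touched most recently */

/* Identify the least-recently-used way for this cache line address. */
int victim_way(uint64_t line)
{
    uint32_t inter = idx_way0(line) ^ idx_way1(line);   /* intermediate index */
    return last_used[inter] ? 0 : 1;                    /* replace the other way */
}

/* Record an access so the table keeps tracking recency. */
void touch(uint64_t line, int way)
{
    uint32_t inter = idx_way0(line) ^ idx_way1(line);
    last_used[inter] = (uint8_t)way;
}
```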
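The branch-misprediction application (20090210683) re-executes a mispredicted branch after writing the actual outcome back into the stored prediction. Below is a small functional model, assuming a direct-mapped one-bit predictor, 4-byte instructions, and a caller-supplied resolve() that stands in for whatever computes the real outcome.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRED_ENTRIES 1024
static bool predictor[PRED_ENTRIES];     /* stored taken / not-taken predictions */

static bool predict(uint64_t pc)            { return predictor[pc % PRED_ENTRIES]; }
static void update(uint64_t pc, bool taken) { predictor[pc % PRED_ENTRIES] = taken; }

/* Execute one branch and return the PC to keep fetching from. */
uint64_t run_branch(uint64_t pc, uint64_t target, bool (*resolve)(uint64_t))
{
    for (;;) {
        bool predicted = predict(pc);             /* obtain the stored prediction */
        bool actual    = resolve(pc);             /* actual resolution            */
        if (actual == predicted)
            return actual ? target : pc + 4;      /* fetched down the right path  */
        update(pc, actual);                       /* store the actual resolution  */
        /* loop around: re-execute the branch with the corrected prediction */
    }
}
```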
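For the store-queue application (20090282225), the three forwarding rules reduce to a single scan over the queue. The entry layout, the address-equality test for "satisfies the load", and the age field are assumptions; the restricted/unrestricted split and the rules themselves follow the abstract.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative store-queue entry; the forwarding rules below rely only on the
 * restricted/unrestricted flag and the relative age of the stores. */
struct sq_entry {
    uint64_t addr;
    uint64_t data;
    bool     restricted;
    unsigned age;          /* larger = younger store */
};

enum fwd_result { FWD_FROM_QUEUE, READ_FROM_CACHE, DEFER_LOAD };

enum fwd_result forward(const struct sq_entry *q, size_t n,
                        uint64_t load_addr, uint64_t *data_out)
{
    const struct sq_entry *best_unres = NULL;
    const struct sq_entry *only_res   = NULL;
    size_t restricted_hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (q[i].addr != load_addr)
            continue;                              /* does not satisfy the load */
        if (q[i].restricted) {
            restricted_hits++;
            only_res = &q[i];
        } else if (!best_unres || q[i].age > best_unres->age) {
            best_unres = &q[i];                    /* youngest satisfying unrestricted */
        }
    }

    if (best_unres) {                              /* rule 1: forward from unrestricted */
        *data_out = best_unres->data;
        return FWD_FROM_QUEUE;
    }
    if (restricted_hits == 1) {                    /* rule 2: lone restricted entry */
        *data_out = only_res->data;
        return FWD_FROM_QUEUE;
    }
    if (restricted_hits >= 2)                      /* rule 3: defer the load */
        return DEFER_LOAD;
    return READ_FROM_CACHE;                        /* no satisfying entry */
}
```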
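The store-merging application (20090300338) gates merging on whether a checkpoint is active and whether the candidate entry entered the queue after that checkpoint. The sketch models the pre-checkpoint "limited or no merging" policy as no merging at all, which is the simpler of the two options the abstract permits.

```c
#include <stdbool.h>
#include <stdint.h>

struct store {
    uint64_t addr;              /* cache line address of the store */
    uint64_t data;
    bool     post_checkpoint;   /* entered the queue after the checkpoint */
};

/* Decide whether an incoming store may be merged into an existing FIFO entry. */
bool may_merge(const struct store *existing, const struct store *incoming,
               bool checkpoint_active)
{
    if (existing->addr != incoming->addr)
        return false;                   /* different line: never merge */
    if (!checkpoint_active)
        return false;                   /* pre-checkpoint: limited/no merging */
    return existing->post_checkpoint;   /* freely merge into post-checkpoint entries */
}
```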
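Finally, for the latency-based placement application (20100299482), the decision compares a latency requirement carried with the incoming block against the latency of the slow subset of ways. The concrete cycle counts and the single "required latency" parameter are illustrative; the abstract only requires a predetermined access latency and one or more parameters associated with the block.

```c
#include <stdbool.h>

/* Hit latency (in cycles) of each subset of ways as seen by the requesting
 * core; the numbers are assumptions for illustration. */
#define FAST_WAY_LATENCY  3
#define SLOW_WAY_LATENCY  9

/* Decide whether a block being filled should live in the fast subset of ways
 * or can be placed in the slower, farther subset. */
bool store_in_fast_way(unsigned required_latency_cycles)
{
    /* If the slow subset cannot meet the block's latency requirement, the
     * block must go into the fast subset; otherwise the slow subset suffices
     * and the fast ways stay free for more latency-critical data. */
    return required_latency_cycles < SLOW_WAY_LATENCY;
}
```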