Mohammad Abdallah, San Jose US

Mohammad Abdallah, San Jose, CA US

Patent application number	Description	Published
20120198122	GUEST TO NATIVE BLOCK ADDRESS MAPPINGS AND MANAGEMENT OF NATIVE CODE STORAGE - A method for managing mappings of storage on a code cache for a processor. The method includes storing a plurality of guest address to native address mappings as entries in a conversion look aside buffer, wherein the entries indicate guest addresses that have corresponding converted native addresses stored within a code cache memory, and receiving a subsequent request for a guest address at the conversion look aside buffer. The conversion look aside buffer is indexed to determine whether there exists an entry that corresponds to the index, wherein the index comprises a tag and an offset that is used to identify the entry that corresponds to the index. Upon a hit on the tag, the corresponding entry is accessed to retrieve a pointer to the code cache memory corresponding block of converted native instructions. The corresponding block of converted native instructions are fetched from the code cache memory for execution.	08-02-2012
20120198157	GUEST INSTRUCTION TO NATIVE INSTRUCTION RANGE BASED MAPPING USING A CONVERSION LOOK ASIDE BUFFER OF A PROCESSOR - A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions, and assembling the plurality of guest instructions into a guest instruction block. The guest instruction block is converted into a corresponding native conversion block. The native conversion block is stored into a native cache. A mapping of the guest instruction block to corresponding native conversion block is stored in a conversion look aside buffer. Upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates whether the guest instruction has a corresponding converted native instruction in the native cache. The converted native instruction is forwarded for execution in response to the hit.	08-02-2012
20120198168	VARIABLE CACHING STRUCTURE FOR MANAGING PHYSICAL STORAGE - A method for managing a variable caching structure for managing storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with a corresponding size of one of the plurality of different size groups. Upon a hit on the tag, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack a corresponding group of storage blocks of converted native instructions reside. The converted native instructions are then fetched from the storage stack for execution.	08-02-2012
20120198209	GUEST INSTRUCTION BLOCK WITH NEAR BRANCHING AND FAR BRANCHING SEQUENCE CONSTRUCTION TO NATIVE INSTRUCTION BLOCK - A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions comprising at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block, wherein an at least one native far branch that corresponds to the at least one guest far branch and wherein the at least one native far branch includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a missprediction, a correct instruction sequence is obtained by accessing the opposite guest address.	08-02-2012
20120246448	MEMORY FRAGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of memory fragments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality memory fragments are coupled to the partitionable engines for providing data storage.	09-27-2012
20120246450	REGISTER FILE SEGMENTS FOR SUPPORTING CODE BLOCK EXECUTION BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A system for executing instructions using a plurality of register file segments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality register file segments are coupled to the partitionable engines for providing data storage.	09-27-2012
20120246657	EXECUTING INSTRUCTION SEQUENCE CODE BLOCKS BY USING VIRTUAL CORES INSTANTIATED BY PARTITIONABLE ENGINES - A method for executing instructions using a plurality of virtual cores for a processor. The method includes receiving an incoming instruction sequence using a global front end scheduler, and partitioning the incoming instruction sequence into a plurality of code blocks of instructions. The method further includes generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks, and allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines. The code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.	09-27-2012
20120297170	DECENTRALIZED ALLOCATION OF RESOURCES AND INTERCONNNECT STRUCTURES TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES - A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.	11-22-2012
20120297396	INTERCONNECT STRUCTURE TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES - A global interconnect system. The global interconnect system includes a plurality of resources having data for supporting the execution of multiple code sequences and a plurality of engines for implementing the execution of the multiple code sequences. A plurality of resource consumers are within each of the plurality of engines. A global interconnect structure is coupled to the plurality of resource consumers and coupled to the plurality of resources to enable data access and execution of the multiple code sequences, wherein the resource consumers access the resources through a per cycle utilization of the global interconnect structure.	11-22-2012
20130024619	MULTILEVEL CONVERSION TABLE CACHE FOR TRANSLATING GUEST INSTRUCTIONS TO NATIVE INSTRUCTIONS - A method for translating instructions for a processor. The method includes accessing a guest instruction and performing a first level translation of the guest instruction using a first level conversion table. The method further includes outputting a resulting native instruction when the first level translation proceeds to completion. A second level translation of the guest instruction is performed using a second level conversion table when the first level translation does not proceed to completion, wherein the second level translation further processes the guest instruction based upon a partial translation from the first level conversion table. The resulting native instruction is output when the second level translation proceeds to completion.	01-24-2013
20130024661	HARDWARE ACCELERATION COMPONENTS FOR TRANSLATING GUEST INSTRUCTIONS TO NATIVE INSTRUCTIONS - A hardware based translation accelerator. The hardware includes a guest fetch logic component for accessing guest instructions; a guest fetch buffer coupled to the guest fetch logic component and a branch prediction component for assembling guest instructions into a guest instruction block; and conversion tables coupled to the guest fetch buffer for translating the guest instruction block into a corresponding native conversion block. The hardware further includes a native cache coupled to the conversion tables for storing the corresponding native conversion block, and a conversion look aside buffer coupled to the native cache for storing a mapping of the guest instruction block to corresponding native conversion block, wherein upon a subsequent request for a guest instruction, the conversion look aside buffer is indexed to determine whether a hit occurred, wherein the mapping indicates the guest instruction has a corresponding converted native instruction in the native cache.	01-24-2013
20130238874	SYSTEMS AND METHODS FOR ACCESSING A UNIFIED TRANSLATION LOOKASIDE BUFFER - Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L	09-12-2013
20130311759	INSTRUCTION SEQUENCE BUFFER TO ENHANCE BRANCH PREDICTION EFFICIENCY - A method for outputting alternative instruction sequences. The method includes tracking repetitive hits to determine a set of frequently hit instruction sequences for a microprocessor. A frequently miss-predicted branch instruction is identified, wherein the predicted outcome of the branch instruction is frequently wrong. An alternative instruction sequence for the branch instruction target is stored into a buffer. On a subsequent hit to the branch instruction where the predicted outcome of the branch instruction was wrong, the alternative instruction sequence is output from the buffer.	11-21-2013
20140032844	SYSTEMS AND METHODS FOR FLUSHING A CACHE WITH MODIFIED DATA - Systems and methods for flushing a cache with modified data are disclosed. Responsive to a request to flush data from a cache with modified data to a next level cache that does not include the cache with modified data, the cache with modified data is accessed using an index and a way and an address associated with the index and the way is secured. Using the address, the cache with modified data is accessed a second time and an entry that is associated with the address is retrieved from the cache with modified data. The entry is placed into a location of the next level cache.	01-30-2014
20140032845	SYSTEMS AND METHODS FOR SUPPORTING A PLURALITY OF LOAD ACCESSES OF A CACHE IN A SINGLE CYCLE - A method for supporting a plurality of load accesses is disclosed. A plurality of requests to access a data cache is accessed, and in response, a tag memory is accessed that maintains a plurality of copies of tags for each entry in the data cache. Tags are identified that correspond to individual requests. The data cache is accessed based on the tags that correspond to the individual requests. A plurality of requests to access the same block of the plurality of blocks causes an access arbitration that is executed in the same clock cycle as is the access of the tag memory.	01-30-2014
20140032846	SYSTEMS AND METHODS FOR SUPPORTING A PLURALITY OF LOAD AND STORE ACCESSES OF A CACHE - Systems and methods for supporting a plurality of load and store accesses of a cache are disclosed. Responsive to a request of a plurality of requests to access a block of a plurality of blocks of a load cache, the block of the load cache and a logically and physically paired block of a store coalescing cache are accessed in parallel. The data that is accessed from the block of the load cache is overwritten by the data that is accessed from the block of the store coalescing cache by merging on a per byte basis. Access is provided to the merged data.	01-30-2014
20140032856	SYSTEMS AND METHODS FOR MAINTAINING THE COHERENCY OF A STORE COALESCING CACHE AND A LOAD CACHE - A method for maintaining the coherency of a store coalescing cache and a load cache is disclosed. As a part of the method, responsive to a write-back of an entry from a level one store coalescing cache to a level two cache, the entry is written into the level two cache and into the level one load cache. The writing of the entry into the level two cache and into the level one load cache is executed at the speed of access of the level two cache.	01-30-2014
20140075168	INSTRUCTION SEQUENCE BUFFER TO STORE BRANCHES HAVING RELIABLY PREDICTABLE INSTRUCTION SEQUENCES - A method for outputting reliably predictable instruction sequences. The method includes tracking repetitive hits to determine a set of frequently hit instruction sequences for a microprocessor, and out of that set, identifying a branch instruction having a series of subsequent frequently executed branch instructions that form a reliably predictable instruction sequence. The reliably predictable instruction sequence is stored into a buffer. On a subsequent hit to the branch instruction, the reliably predictable instruction sequence is output from the buffer.	03-13-2014
20140108729	SYSTEMS AND METHODS FOR LOAD CANCELING IN A PROCESSOR THAT IS CONNECTED TO AN EXTERNAL INTERCONNECT FABRIC - Systems and methods for load canceling in a processor that is connected to an external interconnect fabric are disclosed. As a part of a method for load canceling in a processor that is connected to an external bus, and responsive to a flush request and a corresponding cancellation of pending speculative loads from a load queue, a type of one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor, is converted from load to prefetch. Data corresponding to one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is accessed and returned to cache as prefetch data. The prefetch data is retired in a cache location of the processor.	04-17-2014
20140108730	SYSTEMS AND METHODS FOR NON-BLOCKING IMPLEMENTATION OF CACHE FLUSH INSTRUCTIONS - Systems and methods for non-blocking implementation of cache flush instructions are disclosed. As a part of a method, data is accessed that is received in a write-back data holding buffer from a cache flushing operation, the data is flagged with a processor identifier and a serialization flag, and responsive to the flagging, the cache is notified that the cache flush is completed. Subsequent to the notifying, access is provided to data then present in the write-back data holding buffer to determine if data then present in the write-back data holding buffer is flagged.	04-17-2014
20140108739	SYSTEMS AND METHODS FOR IMPLEMENTING WEAK STREAM SOFTEARE DATA AND INSTRUCTION PREFETCHING USING A HARDWARE DATA PREFETCHER - A method for weak stream software data and instruction prefetching using a hardware data prefetcher is disclosed. A method includes, determining if software includes software prefetch instructions, using a hardware data prefetcher, and, accessing the software prefetch instructions if the software includes software prefetch instructions. Using the hardware data prefetcher, weak stream software data and instruction prefetching operations are executed based on the software prefetch instructions, free of training operations.	04-17-2014
20140269753	METHOD FOR IMPLEMENTING A LINE SPEED INTERCONNECT STRUCTURE - A method for line speed interconnect processing. The method includes receiving initial inputs from an input communications path, performing a pre-sorting of the initial inputs by using a first stage interconnect parallel processor to create intermediate inputs, and performing the final combining and splitting of the intermediate inputs by using a second stage interconnect parallel processor to create resulting outputs. The method further includes transmitting the resulting outputs out of the second stage at line speed.	09-18-2014
20140281242	METHODS, SYSTEMS AND APPARATUS FOR PREDICTING THE WAY OF A SET ASSOCIATIVE CACHE - A method for predicting a way of a set associative shadow cache is disclosed. As a part of a method, a request to fetch a first far taken branch instruction of a first cache line from an instruction cache is received, and responsive to a hit in the instruction cache, a predicted way is selected from a way array using a way that corresponds to the hit in the instruction cache. A second cache line is selected from a shadow cache using the predicted way and the first cache line and the second cache line are forwarded in the same clock cycle.	09-18-2014
20140281411	METHOD FOR DEPENDENCY BROADCASTING THROUGH A SOURCE ORGANIZED SOURCE VIEW DATA STRUCTURE - A method for dependency broadcasting through a source organized source view data structure. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a source organized source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates; upon dispatch of one block of the instruction blocks, broadcasting a number belonging to the one block to a row of the source view data structure that relates that block and marking the sources of the row accordingly; and updating the dependency information of remaining instruction blocks in accordance with the broadcast.	09-18-2014
20140281412	METHOD FOR POPULATING AND INSTRUCTION VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating an instruction view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating and instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks as recorded by the plurality of register templates; and using the instruction view data structure to feed a plurality of stacked execution units of execution stage in accordance with the readiness of instruction sources of the instruction blocks.	09-18-2014
20140281426	METHOD FOR POPULATING A SOURCE VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating a source view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates; and determining which of the plurality of instruction blocks are ready for dispatch by using the populated source view data structure.	09-18-2014
20140281427	METHOD FOR IMPLEMENTING A REDUCED SIZE REGISTER VIEW DATA STRUCTURE IN A MICROPROCESSOR - A method for implementing a reduced size register view data structure in a microprocessor. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a register view data structure, wherein the register view data structure stores destinations corresponding to the instruction blocks as recorded by the plurality of register templates; and using the register view data structure to track a machine state in accordance with the execution of the plurality of instruction blocks, wherein the register view data structure is a reduced size register view data structure by only storing register template snapshots containing branches or by storing deltas between changing register template snapshots.	09-18-2014
20140281428	METHOD FOR POPULATING REGISTER VIEW DATA STRUCTURE BY USING REGISTER TEMPLATE SNAPSHOTS - A method for populating a register view data structure by using register template snapshots. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a register view data structure, wherein the register view data structure stores destinations corresponding to the instruction blocks as recorded by the plurality of register templates; and using the register view data structure to track a machine state in accordance with the execution of the plurality of instruction blocks.	09-18-2014
20140281436	METHOD FOR EMULATING A GUEST CENTRALIZED FLAG ARCHITECTURE BY USING A NATIVE DISTRIBUTED FLAG ARCHITECTURE - A method for emulating a guest centralized flag architecture by using a native distributed flag architecture. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and using a distributed flag architecture to emulate a centralized flag architecture for the emulation of guest instruction execution.	09-18-2014
20140281438	METHOD FOR A DELAYED BRANCH IMPLEMENTATION BY USING A FRONT END TRACK TABLE - A method for a delayed branch implementation by using a front end track table. The method includes receiving an incoming instruction sequence using a global front end, wherein the instruction sequence includes at least one branch, creating a delayed branch in response to receiving the one branch, and using a front end track table to track both the delayed branch the one branch.	09-18-2014
20140282546	METHODS, SYSTEMS AND APPARATUS FOR SUPPORTING WIDE AND EFFICIENT FRONT-END OPERATION WITH GUEST-ARCHITECTURE EMULATION - Methods for supporting wide and efficient front-end operation with guest architecture emulation are disclosed. As a part of a method for supporting wide and efficient front-end operation, upon receiving a request to fetch a first far taken branch instruction, a cache line that includes the first far taken branch instruction, a next cache line and a cache line located at the target of the first far taken branch instruction is read. Based on information that is accessed from a data table, the cache line and either the next cache line or the cache line located at the target is fetched in a single cycle.	09-18-2014
20140282592	METHOD FOR EXECUTING MULTITHREADED INSTRUCTIONS GROUPED INTO BLOCKS - A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.	09-18-2014
20140282601	METHOD FOR DEPENDENCY BROADCASTING THROUGH A BLOCK ORGANIZED SOURCE VIEW DATA STRUCTURE - A method for dependency broadcasting through a block organized source view data structure. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; populating a block organized source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks as recorded by the plurality of register templates; upon dispatch of one block of the instruction blocks, broadcasting a number belonging to the one block to a column of the source view data structure that relates that block and marking the column accordingly; and updating the dependency information of remaining instruction blocks in accordance with the broadcast.	09-18-2014
20140317387	METHOD FOR PERFORMING DUAL DISPATCH OF BLOCKS AND HALF BLOCKS - A method for executing dual dispatch of blocks and half blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein each of the instruction blocks comprise two half blocks; scheduling the instructions of the instruction block to execute in accordance with a scheduler; and performing a dual dispatch of the two half blocks for execution on an execution unit.	10-23-2014
20140324937	METHOD FOR A STAGE OPTIMIZED HIGH SPEED ADDER - A method for fast parallel adder processing. The method includes receiving parallel inputs from a communications path, wherein each input comprises one bit, adding the inputs using a parallel structure, wherein the parallel structure is optimized to accelerate the addition by utilizing a characteristic that the inputs are one bit each, and transmitting the resulting outputs to a subsequent stage.	10-30-2014
20140344554	MICROPROCESSOR ACCELERATED CODE OPTIMIZER AND DEPENDENCY REORDERING METHOD - A dependency reordering method. The method includes accessing an input sequence of instructions, initializing three registers, and loading instruction numbers into a first register. The method further includes loading destination register numbers into a second register, broadcasting values from the first register to a position in a third register in accordance with a position number in the second register, overwriting positions in the third register in accordance with position numbers in the second register, and using information in the third register to populate a dependency matrix for grouping dependent instructions from the sequence of instructions.	11-20-2014
20150039859	MICROPROCESSOR ACCELERATED CODE OPTIMIZER - A method for accelerating code optimization a microprocessor. The method includes fetching an incoming microinstruction sequence using an instruction fetch component and transferring the fetched macroinstructions to a decoding component for decoding into microinstructions. Optimization processing is performed by reordering the microinstruction sequence into an optimized microinstruction sequence comprising a plurality of dependent code groups. The optimized microinstruction sequence is output to a microprocessor pipeline for execution. A copy of the optimized microinstruction sequence is stored into a sequence cache for subsequent use upon a subsequent hit optimized microinstruction sequence.	02-05-2015

Patent applications by Mohammad Abdallah, San Jose, CA US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Mohammad Abdallah, San Jose US

Mohammad Abdallah, San Jose, CA US