| Patent application number | Description | Published |
| --- | --- | --- |
| 20080235496 | Methods and Apparatus for Dynamic Instruction Controlled Reconfigurable Register File - A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32, supported configurations of 128×32 bits, 64×64 bits and 32×128 bits can be in operation each cycle. Single width, double width, and quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports. | 09-25-2008 |
| 20090019269 | Methods and Apparatus for a Bit Rake Instruction - Techniques for performing a bit rake instruction in a programmable processor. The bit rake instruction extracts an arbitrary pattern of bits from a source register, based on a mask provided in another register, and packs and right justifies the bits into a target register. The bit rake instruction allows any set of bits from the source register to be packed together. | 01-15-2009 |
| 20090063606 | Methods and Apparatus for Single Stage Galois Field Operations - Techniques for single function stage Galois field (GF) computations are described. The new single function stage GF multiplication requires only m-bits per internal logic stage, a savings of m−1 bits per logic stage that do not have to be accounted for as compared with a previous two function stage approach. Also, a common design GF multiplication cell is described that may be suitably used to construct an m-by-m GF multiplication array for the calculation of GF[2 | 03-05-2009 |
| 20090119489 | Methods and Apparatus for Transforming, Loading, and Executing Super-Set Instructions - Techniques are described for loading decoded instructions and super-set instructions in a memory for later access. For loading a decoded instruction, the decoded instruction is a transformed form of an original instruction that was stored in the program memory. The transformation is from an encoded assembly level format to a binary machine level format. In one technique, the transformation mechanism is invoked by a transform and load instruction that causes an instruction retrieved from program memory to be transformed into a new language format and then loaded into a transformed instruction memory. The format of the transformed instruction may be optimized to the implementation requirements, such as improving critical path timing. The transformation of instructions may extend to other needs beyond timing path improvement, for example, requiring super-set instructions for increased functionality and improvements to instruction level parallelism. Techniques for transforming, loading, and executing super-set instructions are described. | 05-07-2009 |
| 20090144502 | Meta-Architecture Defined Programmable Instruction Fetch Functions Supporting Assembled Variable Length Instruction Processors - In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop. | 06-04-2009 |
| 20090265512 | Methods and Apparatus for Efficiently Sharing Memory and Processing in a Multi-Processor - A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations. | 10-22-2009 |
| 20090276576 | Methods and Apparatus storing expanded width instructions in a VLIW memory for deferred execution - Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units. | 11-05-2009 |
| 20100318775 | Methods and Apparatus for Adapting Pipeline Stage Latency Based on Instruction Type - Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes. Application code can be given new degrees of optimization freedom where instruction class and the mix of instructions can be chosen based on performance and power requirements. | 12-16-2010 |
| 20110072237 | Methods and apparatus for efficiently sharing memory and processing in a multi-processor - A shared memory network for communicating between processors using store and load instructions is described. A new processor architecture which may be used with the shared memory network is also described that uses arithmetic/logic instructions that do not specify any source operand addresses or target operand addresses. The source operands and target operands for arithmetic/logic execution units are provided by independent load instruction operations and independent store instruction operations. | 03-24-2011 |
| 20110083001 | METHODS AND APPARATUS FOR AUTOMATED GENERATION OF ABBREVIATED INSTRUCTION SET AND CONFIGURABLE PROCESSOR ARCHITECTURE - A systematic approach to architecture and design of the instruction fetch mechanisms and instruction set architectures in embedded processors is described. This systematic approach allows a relaxing of certain restrictions normally imposed by a fixed-size instruction set architecture (ISA) on design and development of an embedded system. The approach also guarantees highly efficient usage of the available instruction storage which is only bounded by the actual information contents of an application or its entropy. The result of this efficiency increase is a general reduction of the storage requirements, or a compression, of the instruction segment of the original application. An additional feature of this system is the full decoupling of the ISA from the core architecture. This decoupling allows usage of a variable length encoding for any size of the ISA without impacting the physical instruction memory organization or layout and branching mechanism as well as tuning of the execution core to the application. A hardware embodiment described herein allows application of the above-mentioned high-entropy encoding technique in an actual embedded processor using today's technology without posing significant strain on timing requirements. | 04-07-2011 |
| 20110153998 | Methods and Apparatus for Attaching Application Specific Functions Within an Array Processor - A multi-node video signal processor (VSP | 06-23-2011 |
| 20110161625 | Interconnection network connecting operation-configurable nodes according to one or more levels of adjacency in multiple dimensions of communication in a multi-processor and a neural processor - A Wings array system for communicating between nodes using store and load instructions is described. Couplings between nodes are made according to a 1 to N adjacency of connections in each dimension of a G×H matrix of nodes, where G≧N and H≧N and N is a positive odd integer. Also, a 3D Wings neural network processor is described as a 3D G×H×K network of neurons, each neuron with an N×N×N array of synaptic weight values stored in coupled memory nodes, where G≧N, H≧N, K≧N, and N is determined from a 1 to N adjacency of connections used in the G×H×K network. Further, a hexagonal processor array is organized according to an INFORM coordinate system having axes at 60 degree spacing. Nodes communicate on row paths parallel to an FM dimension of communication, column paths parallel to an IO dimension of communication, and diagonal paths parallel to an NR dimension of communication. | 06-30-2011 |
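
The bit rake operation summarized for application 20090019269 in the table above (extract the source bits selected by a mask, then pack and right justify them) can be illustrated with a short software model. This is only a sketch under stated assumptions: the function name `bit_rake`, the `width` parameter, and the choice to preserve the selected bits' relative order starting from the least significant bit are hypothetical and are not taken from the application itself.

```python
def bit_rake(source: int, mask: int, width: int = 32) -> int:
    """Software model of a bit rake (hypothetical name and signature).

    For every bit position where `mask` is 1, the matching bit of
    `source` is copied into the next free low-order position of the
    result, so the selected bits end up packed and right justified.
    Assumption: selected bits keep their relative order, LSB first.
    """
    result = 0
    out_pos = 0  # next free bit position in the packed result
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((source >> i) & 1) << out_pos
            out_pos += 1
    return result


if __name__ == "__main__":
    # Select the upper nibble of an 8-bit value and pack it to the right.
    print(bin(bit_rake(0b10110100, 0b11110000, width=8)))
```

A hardware implementation would perform the same selection in a single instruction rather than a loop; the loop here only models the result.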