Array processor operation

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

712001000 - PROCESSING ARCHITECTURE

712010000 - Array processor

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number	Description	Number of patent applications / Date published
712022000	Single instruction, multiple data (SIMD)	98
712020000	Multimode (e.g., MIMD to SIMD, etc.)	26
712017000	Application specific	12
712018000	Data flow array processor	8
712019000	Systolic array processor	4
20100100704	Integrated circuit incorporating an array of interconnected processors executing a cycle-based program - An integrated circuit	04-22-2010
20100131738	ARRAY PROCESSOR TYPE DATA PROCESSING APPARATUS - In an array processing section, using data strings entered from input ports, a plurality of data processor elements execute predetermined operations while transferring data to each other, and output data strings of results of the operations from a plurality of output ports. A first data string converter converts data strings stored in a plurality of data storages of a data storage group into a placement suitable for the operations in the array processing section, and enters the converted data strings into the input ports of the array processing section. A second data string converter converts the data strings output from output ports of the array processing section into a placement to be stored in the plurality of data storages of the data storage group.	05-27-2010
20100223445	METHOD AND APPARATUS FOR MATRIX DECOMPOSITIONS IN PROGRAMMABLE LOGIC DEVICES - A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.	09-02-2010
20120011344	METHODS AND APPARATUS FOR MATRIX DECOMPOSITIONS IN PROGRAMMABLE LOGIC DEVICES - A processor is adapted for performing a QR-decomposition. The processor has a program memory, a program controller, connected to the program memory to receive program instructions, and at least one processing unit. The processing unit includes a CORDIC calculation block, and has a distributed memory structure, with separate memory blocks for storing respective parameter values.	01-12-2012
712021000	Multiple instruction, Multiple data (MIMD)	2
20120089812	SHARED RESOURCE MULTI-THREAD PROCESSOR ARRAY - A shared resource multi-thread processor array wherein an array of heterogeneous function blocks are interconnected via a self-routing switch fabric, in which the individual function blocks have an associated switch port address. Each switch output port comprises a FIFO style memory that implements a plurality of separate queues. Thread queue empty flags are grouped using programmable circuit means to form self-synchronised threads. Data from different threads are passed to the various addressable function blocks in a predefined sequence in order to implement the desired function. The separate port queues allow data from different threads to share the same hardware resources and the reconfiguration of switch fabric addresses further enables the formation of different data-paths allowing the array to be configured for use in various applications.	04-12-2012
20140195777	VARIABLE DEPTH INSTRUCTION FIFOS TO IMPLEMENT SIMD ARCHITECTURE - In a particular embodiment, a method may include creating a plurality of variable depth instruction FIFOs and a plurality of data caches from a plurality of caches corresponding to a plurality of processors, where the plurality of caches and the plurality of processors correspond to MIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs to implement SIMD architecture. The method may also include configuring the plurality of variable depth instruction FIFOs for at least one of SIMD operation, SIMD operation with staging, or RC-SIMD operation.	07-10-2014
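
The two QR-decomposition entries above (20100223445 and 20120011344) mention a CORDIC calculation block without saying how it is used. As a minimal sketch, assuming the standard CORDIC vectoring mode (an assumption, not something the abstracts state), the block below rotates a pair (x, y) until y is driven to zero, which is the elementary step a Givens-rotation QR scheme needs. The function name `cordic_vectoring` and the floating-point arithmetic are illustrative; hardware would use fixed-point shifts and a small arctangent table.

```python
import math

def cordic_vectoring(x, y, iterations=32):
    """Rotate (x, y) onto the positive x-axis using CORDIC micro-rotations.

    Returns (magnitude, angle_in_radians). Assumes x > 0 so that the angle
    lies inside the CORDIC convergence range.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                  # a right shift by i bits in hardware
        d = -1.0 if y > 0 else 1.0        # always rotate toward y == 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)      # hardware would read this from a table
        gain *= math.sqrt(1.0 + step * step)
    # Each micro-rotation stretches the vector; divide the accumulated gain out.
    return x / gain, angle

magnitude, theta = cordic_vectoring(3.0, 4.0)
print(magnitude, theta)
```

The returned magnitude is the value that would replace the pivot element, and the angle describes the rotation that would be applied to the remaining elements of the two rows.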

Document	Title	Date
Entries
20080229059	Message routing scheme - Each processor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.	09-18-2008
20080282061	Array Type Operation Device - An array calculation device that includes a processor array composed of a plurality of processor elements having been assigned with orders, acquires an instruction in each cycle, generates, in each cycle, operation control information for controlling an operation of a processor element of a first order, and then generates an instruction to the processor element of the first order in accordance with the operation control information and the acquired instruction, and also generates, in each cycle, operation control information for controlling an operation of each processor element of a next order and onwards, in accordance with operation control information generated for controlling an operation of a processor element of an immediately preceding order, and then generates an instruction to each processor element of the next order and onwards, in accordance with the operation control information generated and the acquired instruction.	11-13-2008
20080307196	Integrated Processor Array, Instruction Sequencer And I/O Controller - A computer processor having an integrated instruction sequencer, array of processing engines, and I/O controller. The instruction sequencer sequences instructions from a host, and transfers these instructions to the processing engines, thus directing their operation. The I/O controller controls the transfer of I/O data to and from the processing engines in parallel with the processing controlled by the instruction sequencer. The processing engines themselves are constructed with an integer arithmetic and logic unit (ALU), a 1-bit ALU, a decision unit, and registers. Instructions from the instruction sequencer direct the integer ALU to perform integer operations according to logic states stored in the 1-bit ALU and data stored in the decision unit. The 1-bit ALU and the decision unit can modify their stored information in the same clock cycle as the integer ALU carries out its operation. The processing engines also contain a local memory for storing instructions and data.	12-11-2008
20090031103	MECHANISM FOR IMPLEMENTING A MICROCODE PATCH DURING FABRICATION - A patch apparatus in a microprocessor is provided. The patch apparatus includes a plurality of fuse banks and an array controller. The plurality of fuse banks is configured to store associated patch records that are employed to patch microcode or circuits in the microprocessor. The array controller is coupled to the plurality of fuse banks, and is configured to read the associated patch records, and is configured to provide the associated patch records to a patch loader, where the patch loader provides patches corresponding to the associated patch records, as prescribed, to designated target patch mechanisms in the microprocessor. The patch loader provides the patches to the designated target patch mechanisms following transition of a microprocessor reset signal and prior to execution of instructions stored in a BIOS ROM.	01-29-2009
20090164752	PROCESSOR MEMORY SYSTEM - A data processor comprises a plurality of processing elements (PEs), with memory local to at least one of the processing elements, and a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid, e.g., in a SIMD array, so as to connect the PEs and their local memories to a common controller. Transaction-enabled PEs and nodes set flags, which are maintained until the transaction is completed and signal status to the controller, e.g., over a series of OR-gates. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. External memory may also be connected to the “end” nodes interfacing with the network, e.g., to provide cache. One or more further processors may similarly be connected to the network so that all the PE memories from all the processors share the same memory map or space. The packet-switched network supports multiple concurrent transfers between PEs and memory. Memory accesses include block and/or broadcast read and write operations, in which data can be replicated within the nodes and, according to the operation, written into the shared memory or into the local PE memory.	06-25-2009
20090204788	Programmable pipeline array - Disclosed is an array of programmable data-processing cells configured as a plurality of cross-connected pipelines. An apparatus includes cells capable of performing data-processing functions selectable by a presented instruction. A first set of cells includes an input cell, an output cell, and a series of at least one interior cell providing an acyclic data processing path from the input cell to the output cell. Additional cells are similarly configured. Memory presents configuration instructions to cells in response to a configuration code. Data advances through ranks of the cells. The configuration code advances to memory associated with a rank in tandem with the data.	08-13-2009
20090210652	SIGNAL ROUTING IN PROCESSOR ARRAYS - There is provided a method for routing a plurality of signals in a processor array, the processor array comprising a plurality of processor elements interconnected by a network of switches, each signal having a respective source processor element and at least one destination processor element in the processor array, the method comprising (i) identifying a signal from the plurality of unrouted signals to route; (ii) identifying a candidate route from the source processor element to the destination processor element, the candidate route using a first plurality of switches; (iii) evaluating the candidate route by determining whether there are offset values that allow the signal to be routed through the first plurality of switches; and (iv) attempting to route the signal using one of the offset values identified in step (iii).	08-20-2009
20100070738	FLEXIBLE RESULTS PIPELINE FOR PROCESSING ELEMENT - A flexible results pipeline for a processing element of a parallel processor is described. A plurality of result registers are selectively connected to each other, to processing logic of the processing element and to a neighbourhood connection register configured to receive data from and send data to other processing elements. The connections between the result registers and between the result registers and the neighbourhood connection register are selectively configurable by applied control signals.	03-18-2010
20100131737	Method for Manipulating Data in a Group of Processing Elements To Perform a Reflection of the Data - A method for generating a reflection of data in a plurality of processing elements comprises shifting the data along, for example, each row in the array until each processing element in the row has received all the data held by every other processing element in that row. Each processing element stores and outputs final data as a function of its position in the row. A similar reflection along a horizontal line can be achieved by shifting data along columns instead of rows. Also disclosed is a method for reflecting data in a matrix of processing elements about a vertical line comprising shifting data between processing elements arranged in rows. An initial count is set in each processing element according to the expression (2×Col_Index)MOD(array size). In one embodiment, a counter counts down from the initial count in each processing element as a function of the number of shifts that have been performed. Output is selected as a function of the current count. A similar reflection about a horizontal line can be achieved by shifting data between processing elements arranged in columns and setting the initial count according to the expression (2×Row_Index)MOD(array size). The present invention represents an efficient method for obtaining the reflection of data.	05-27-2010
20100211757	Systolic Data Processing Apparatus and Method - A systolic data processing apparatus includes a processing element (PE) array and control unit. The PE array comprises a plurality of PEs, each PE executing a thread with respect to different data according to an input instruction and pipelining the instruction at each cycle for executing a program. The control unit inputs a new instruction to a first PE of the PE array at each cycle.	08-19-2010
20100332793	METHOD FOR SCHEDULING START-UP AND SHUT-DOWN OF MAINFRAME APPLICATIONS USING TOPOGRAPHICAL RELATIONSHIPS - The illustrative embodiments provide for a computer-implemented method for representing actions in a data processing system. A table is generated. The table comprises a plurality of rows and columns. Ones of the columns represent corresponding ones of computer applications that can start or stop in parallel with each other in a data processing system. Ones of the rows represent corresponding ones of sequences of actions within a corresponding column. Additionally, the table represents a definition of relationships among memory address spaces, wherein the table represents when each particular address space is started or stopped during one of a start-up process, a recovery process, and a shut-down process. The resulting table is stored.	12-30-2010
20110066825	MESSAGE ROUTING SCHEME - Each processor node in an array of nodes has a respective local node address, and each local node address comprises a plurality of components having an order of addressing significance from most to least significant. Each node comprises: mapping means configured to map each component of the local node address onto a respective routing direction, and a switch arranged to receive a message having a destination node address identifying a destination node. The switch comprises: means for comparing the local node address to the destination node address to identify the most significant non-matching component; and means for routing the message to another node, on the condition that the local node address does not match the destination node address, in the direction mapped to the most significant non-matching component.	03-17-2011
20110107058	PROCESSOR MEMORY SYSTEM - A plurality of processing elements (PEs) include memory local to at least one of the processing elements in a data packet-switched network interconnecting the processing elements and the memory to enable any of the PEs to access the memory. The network consists of nodes arranged linearly or in a grid to connect the PEs and their local memories to a common controller. The processor performs memory accesses on data stored in the memory in response to control signals sent by the controller to the memory. The local memories share the same memory map or space. The packet-switched network supports multiple concurrent transfers between PEs and memory. Memory accesses include block and/or broadcast read and write operations, in which data can be replicated within the nodes and, according to the operation, written into the shared memory or into the local PE memory.	05-05-2011
20110131392	Method and apparatus for scalable and super-scalable information processing using binary gate circuits structured by code-selected pass transistors - A processing space comprises an array of transistors empowered by forming connections through circuit pass transistors to power and data input/output means and connections therebetween through signal pass transistors. By structuring the needed circuits at the site(s) of the data the von Neumann bottleneck is eliminated, which increases the computing power of the apparatus substantially, thus to enable non-stop Information Processing on steady streams of data and code, with no repetitive instruction and data transfers required. That code will identify the physical locations of every transistor in the processing space, and will enable only the pass transistors therein needed to structure the circuits of any arithmetical/logical algorithm in a processing space of any size, speed, and level of computer power. By joining one processing space to another the apparatus also exhibits super-scalability.	06-02-2011
20110153982	SYSTEMS AND METHODS FOR COLLECTING DATA FROM MULTIPLE CORE PROCESSORS - Systems and methods are disclosed for collecting data from cores of a multi-core processor using collection packets. A collection packet can traverse through cores of the multi-core processor while accumulating requested data. Upon completing the accumulation of the requested data from all required cores, the collection packet can be transmitted to a system operator for system maintenance and/or monitoring.	06-23-2011
20110167240	Method of rotating data in a plurality of processing elements - A method of rotating data in a plurality of processing elements comprises a plurality of shifting operations and a plurality of storing operations, with the shifting and storing operations coordinated to enable a three shears operation to be performed on the data. The plurality of storing operations is responsive to the processing element's positions.	07-07-2011
20110179251	POWER SAVING ASYNCHRONOUS COMPUTER - A computer array (	07-21-2011
20110213947	System and Method for Power Optimization - A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.	09-01-2011
20110219209	DYNAMIC ATOMIC BITSETS - Embodiments of the present invention provide techniques, including systems, methods, and computer readable medium, for dynamic atomic bitsets. A dynamic atomic bitset is a data structure that provides a bitset that can grow or shrink in size as required. The dynamic atomic bitset is non-blocking, wait-free, and thread-safe.	09-08-2011
20110258413	APPARATUS AND METHOD FOR EXECUTING MEDIA PROCESSING APPLICATIONS - An apparatus and method for executing media processing applications in a heterogeneous multicore system are provided. The media processing application executing apparatus includes a configuration deciding unit to decide a configuration for a combination of computational kernels and cores in which the computation kernels are to be executed. The computation kernels are media processing components included in a media processing application. The media processing application executing apparatus also includes an execution unit including multiple heterogeneous cores, to execute the media processing application based on the determined configuration.	10-20-2011
20110296137	Performing A Deterministic Reduction Operation In A Parallel Computer - A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, a deterministic reduction operation includes: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.	12-01-2011
20110307685	Processor for Large Graph Algorithm Computations and Matrix Operations - A multiprocessor system and method for performing matrix operations includes multiple processors cooperatively performing a sparse matrix operation. Distributed among the processors are non-zero matrix elements of first and second sparse matrices. Mapped across the processors are the matrix elements of a results matrix. Each processor receives, from the other processors, non-zero matrix elements of the first matrix that had been distributed to those other processors and generates partial results based on the received non-zero matrix elements of the first matrix and on the non-zero matrix elements of the second matrix distributed to that processor. Each processor receives those partial results generated by other processors and associated with the matrix elements of the results matrix mapped to that processor. Each processor generates a final value for each matrix element of the results matrix mapped to that processor based on the partial results generated by that processor and on the partial results received from the other processors associated with that matrix element of the results matrix.	12-15-2011
20110314255	MESSAGE BROADCAST WITH ROUTER BYPASSING - A processor and method for broadcasting data among a plurality of processing cores is disclosed. The processor includes a plurality of processing cores connected by point-to-point connections. A first of the processing cores includes a router that includes at least an allocation unit and an output port. The allocation unit is configured to determine that respective input buffers on at least two others of the processing cores are available to receive given data. The output port is usable by the router to send the given data across one of the point-to-point connections. The router is configured to send the given data contingent on determining that the respective input buffers are available. Furthermore, the processor is configured to deliver the data to the at least two other processing cores in response to the first processing core sending the data once across the point-to-point connection.	12-22-2011
20120096238	Circuit and method for parallel perforation in speed rate matching - The present invention discloses a circuit and a method for parallel perforation in rate matching, which can reduce the perforation processing time delay to satisfy the requirements of a Long Term Evolution (LTE). Both the circuit and the method can adopt three selector arrays and three register groups. Specifically, the first selector array is configured to remove null bits in input data and output the remaining data to the first register group; the second selector array is configured to combine the first register group and the third register group and then output the combined data to the second register group; during the combination, the valid data in the third register group are preferentially selected, and then the data in the first register group are selected; when the second register group is full, the data therein are output to the exterior as the results of the perforation processing. Further, the third selector array is configured to output remaining valid data in the first selector group to the third register group if the valid data in the first selector group are not used up while the second selector array combines the first register group and the third register group.	04-19-2012
20120110302	Accelerating Generic Loop Iterators Using Speculative Execution - A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates. The ELI utility performs concurrent iterations on the array by using the queue entries.	05-03-2012
20120311299	NOVEL MASSIVELY PARALLEL SUPERCOMPUTER - A novel massively parallel supercomputer of hundreds of teraOPS-scale includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of processing elements each of which consists of a central processing unit (CPU) and plurality of floating point processors to enable optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node individually or simultaneously work on any combination of computation or communication as required by the particular algorithm being solved. The system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. The multiple networks include three high-speed networks for parallel algorithm message passing including a Torus, Global Tree, and a Global Asynchronous network that provides global barrier and notification functions.	12-06-2012
20130019082	Manifold Array Processor - An array processor includes processing elements arranged in clusters to form a rectangular array. Inter-cluster communication paths are mutually exclusive. Due to the mutual exclusivity of the data paths, communications between the processing elements of each cluster may be combined in a single inter-cluster path, thus eliminating half the wiring required for the path. The length of the longest communication path is not directly determined by the overall dimension of the array, as in conventional torus arrays. Rather, the longest communications path is limited by the inter-cluster spacing. Transpose elements of an N×N torus may be combined in clusters and communicate with one another through intra-cluster communications paths. Transpose operation latency is eliminated in this approach. Each PE may have a single transmit port and a single receive port. Thus, the individual PEs are decoupled from the array topology.	01-17-2013
20130086354	CACHE AND/OR SOCKET SENSITIVE MULTI-PROCESSOR CORES BREADTH-FIRST TRAVERSAL - Methods, apparatuses and storage device associated with cache and/or socket sensitive breadth-first iterative traversal of a graph by parallel threads, are disclosed. In embodiments, a vertices visited array (VIS) may be employed to track graph vertices visited. VIS may be partitioned into VIS sub-arrays, taking into consideration cache sizes of LLC, to reduce likelihood of evictions. In embodiments, potential boundary vertices arrays (PBV) may be employed to store potential boundary vertices for a next iteration, for vertices being visited in a current iteration. The number of PBV generated for each thread may take into consideration a number of sockets, over which the processor cores employed are distributed. In various embodiments, the threads may be load balanced; further data locality awareness to reduce inter-socket communication may be considered, and/or lock-and-atomic free update operations may be employed. Other embodiments may be disclosed or claimed.	04-04-2013
20130166876	METHOD AND APPARATUS FOR USING A PREVIOUS COLUMN POINTER TO READ ENTRIES IN AN ARRAY OF A PROCESSOR - A method and apparatus are described for using a previous column pointer to read a subset of entries of an array in a processor. The array may have a plurality of rows and columns of entries, and each entry in the subset may reside on a different row of the array. A previous column pointer may be generated for each of the rows of the array based on a plurality of bits indicating the number of valid entries in the subset to be read, the previous column pointer indicating whether each entry is in a current column or a previous column. The entries in the subset may be read and re-ordered, and invalid entries in the subset may be replaced with nulls. The valid entries and nulls may then be outputted.	06-27-2013
20130283007	Methods and Apparatus For Attaching Application Specific Functions Within An Array Processor - A multi-node video signal processor (VSP	10-24-2013
20140075154	Task Switching and Inter-task Communications for Multi-core Processors - The invention provides hardware based techniques for switching processing tasks of software programs for execution on a multi-core processor. Invented techniques involve a hardware logic based controller for assigning, adaptive to program processing loads, tasks for processing by cores of a multi-core fabric as well as configuring a set of multiplexers to appropriately interconnect cores of the fabric and program task specific segments at fabric memories, to arrange efficient inter-task communication as well as transferring of activating and de-activating task memory images among the multi-core fabric. The invention thereby provides an efficient, hardware-automated runtime operating system for multi-core processors, minimizing any need to use processing capacity of the cores for traditional operating system software functions. Additionally, such low overhead hardware based operating system for multi-core processors provides significant cost-efficiency and performance advantages, including data processing throughput maximization across all programs dynamically sharing a given multi-core processor, and hardware based security.	03-13-2014
20140164734	CONCURRENT MULTIPLE INSTRUCTION ISSUE OF NON-PIPELINED INSTRUCTIONS USING NON-PIPELINED OPERATION RESOURCES IN ANOTHER PROCESSING CORE - A method and circuit arrangement utilize inactive non-pipelined operation resources in one processing core of a multi-core processing unit to execute non-pipelined instructions on behalf of another processing core in the same processing unit. Adjacent processing cores in a processing unit may be coupled together such that, for example, when one processing core's non-pipelined execution sequencer is busy, that processing core may issue into another processing core's non-pipelined execution sequencer if that other processing core's non-pipelined execution sequencer is idle, thereby providing intermittent concurrent execution of multiple non-pipelined instructions within each individual processing core.	06-12-2014
20140281374	Identifying Logical Planes Formed Of Compute Nodes Of A Subcommunicator In A Parallel Computer - In a parallel computer, a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: for each compute node of the subcommunicator and for a number of dimensions beginning with a first dimension: establishing, by a plane building node, in a positive direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in a positive direction of a second dimension, where the second dimension is orthogonal to the first dimension; and establishing, by the plane building node, in a negative direction of the first dimension, all logical planes that include the plane building node and compute nodes of the subcommunicator in the positive direction of the second dimension.	09-18-2014
20140337601	CONFIGURABLE LOGIC INTEGRATED CIRCUIT HAVING A MULTIDIMENSIONAL STRUCTURE OF CONFIGURABLE ELEMENTS - An array processor composed of processor cells that are programmed by a controlling unit, and that are reprogrammed when a cell has finished a current data processing operation, even while other cells continue to process data with their current programming.	11-13-2014
20150100756	CONFIGURABLE LOGIC INTEGRATED CIRCUIT HAVING A MULTIDIMENSIONAL STRUCTURE OF CONFIGURABLE ELEMENTS - An array processor composed of processor cells that are programmed by a controlling unit, and that are reprogrammed when a cell has finished a current data processing operation, even while other cells continue to process data with their current programming.	04-09-2015
20160378479	DECOUPLED PROCESSOR INSTRUCTION WINDOW AND OPERAND BUFFER - A processor core in an instruction block-based microarchitecture is configured so that an instruction window and operand buffers are decoupled for independent operation in which instructions in the block are not tied to resources such as control bits and operands that are maintained in the operand buffers. Instead, pointers are established among instructions in the block and the resources so that control state can be established for a refreshed instruction block (i.e., an instruction block that is reused without re-fetching it from an instruction cache) by following the pointers. Such decoupling of the instruction window from the operand space can provide greater processor efficiency, particularly in multiple core arrays where refreshing is utilized (for example when executing program code that uses tight loops), because the operands and control bits are pre-validated.	12-29-2016
20160378488	ACCESS TO TARGET ADDRESS - Systems, methods, and computer-readable storage are disclosed for providing early access to target addresses in block-based processor architectures. In one example of the disclosed technology, a method of performing a branch in a block-based architecture can include executing one or more instructions of a first instruction block using a first core of the block-based architecture. The method can include, before the first instruction block is committed, initiating non-speculative execution of instructions of a second instruction block.	12-29-2016
