ARCHITECTURE BASED INSTRUCTION PROCESSING

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
712200000 | ARCHITECTURE BASED INSTRUCTION PROCESSING | 89
20090125704 | DESIGN STRUCTURE FOR DYNAMICALLY SELECTING COMPILED INSTRUCTIONS - A design structure embodied in a machine readable medium used in a design process includes an apparatus for dynamically selecting compiled instructions for execution, the apparatus including an input for receiving static instructions for execution on a first execution unit and receiving dynamic instructions for execution on a second execution unit; and an instruction selection element adapted to evaluate throughput performance of the static instructions and dynamic instructions based on current states of the execution units and select the static instructions or the dynamic instructions for execution at runtime on the first execution unit or the second execution unit, respectively, based on the throughput performance of the instructions. | 05-14-2009
20120166765 | PREDICTING BRANCHES FOR VECTOR PARTITIONING LOOPS WHEN PROCESSING VECTOR INSTRUCTIONS - While fetching the instructions from a loop in program code, a processor calculates a number of times that a backward-branching instruction at the end of the loop will actually be taken when the fetched instructions are executed. Upon determining that the backward-branching instruction has been predicted taken more than the number of times that the branch instruction will actually be taken, the processor immediately commences a mispredict operation for the branch instruction, which comprises: (1) flushing fetched instructions from the loop that will not be executed from the processor, and (2) commencing fetching instructions from an instruction following the branch instruction. | 06-28-2012
20130067199 | CONTROL REGISTER MAPPING IN HETEROGENEOUS INSTRUCTION SET ARCHITECTURE PROCESSOR - A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR. | 03-14-2013
20120117358 | Software Selectable Adjustment of SIMD Parallelism - Selective power control of one or more processing elements matches a degree of parallelism to requirements of a task performed in a highly parallel programmable data processor. For example, when program operations require less than the full width of the data path, a software instruction of the program sets a mode of operation requiring a subset of the parallel processing capacity. At least one parallel processing element, that is not needed, can be shut down to conserve power. At a later time, when the added capacity is needed, execution of another software instruction sets the mode of operation to that of the wider data path, typically the full width, and the mode change reactivates the previously shut-down processing element. | 05-10-2012
20110302391 | DIGITAL SIGNAL PROCESSOR - A digital signal processor comprises an instruction analysis unit, a digital signal processor (DSP) core and a memory unit. The instruction analysis unit receives an instruction and determines the required bit width M for the data process corresponding to the instruction. The DSP core performs the M-bit data process based on the bit width M determined by the instruction analysis unit, and the memory unit stores multiple data and performs the M-bit access based on the bit width M determined by the instruction analysis unit, thereby allowing the DSP core to access it. At least one available space in the memory unit will be adjusted such that only the access space having the bit width M for the operation corresponding to the instruction will be open in each access, thereby effectively achieving power-saving. | 12-08-2011
20100169610 | PROCESSOR - The processor according to the present invention is a processor having a forwarding function and includes an attribute information holding unit that holds attribute information regarding inhibition of writing to a register and a register write inhibition circuit that inhibits, when forwarding is performed, the writing of the data forwarded, according to the attribute information. The attribute information holding unit holds the attribute information by relating the attribute information to at least one register. Alternatively, the attribute information holding unit is a part of plural pipeline buffers and passes the attribute information along with the data to be forwarded, to a pipeline buffer in a subsequent stage. | 07-01-2010
20120110305 | Register Renamer that Handles Multiple Register Sizes Aliased to the Same Storage Locations - A processor may include a physical register file and a register renamer. The register renamer may be organized into even and odd banks of entries, where each entry stores an identifier of a physical register. The register renamer may be indexed by a register number of an architected register, such that the renamer maps a particular architected register to a corresponding physical register. Individual entries of the renamer may correspond to architected register aliases of a given size. Renaming aliases that are larger than the given size may involve accessing multiple entries of the renamer, while renaming aliases that are smaller than the given size may involve accessing a single renamer entry. | 05-03-2012
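
As a rough sketch of the aliasing idea in this abstract (one renamer entry per single-width alias, a pair of entries for a double-width alias), the C model below uses invented names such as even_bank and odd_bank; it is an illustration of the concept, not the patented design.

    #include <stdio.h>

    #define NUM_ARCH 16

    /* Hypothetical renamer: one entry per single-width architected
       register, split into even and odd banks indexed by archreg/2. */
    static int even_bank[NUM_ARCH / 2]; /* physical ids for arch regs 0,2,4,... */
    static int odd_bank[NUM_ARCH / 2];  /* physical ids for arch regs 1,3,5,... */

    /* Single-width alias: one renamer entry is accessed. */
    int rename_single(int arch) {
        return (arch & 1) ? odd_bank[arch / 2] : even_bank[arch / 2];
    }

    /* Double-width alias starting at an even register: both banks
       are accessed, yielding two physical registers. */
    void rename_double(int arch, int out[2]) {
        out[0] = even_bank[arch / 2];
        out[1] = odd_bank[arch / 2];
    }

    int main(void) {
        for (int i = 0; i < NUM_ARCH / 2; i++) {
            even_bank[i] = 100 + 2 * i;
            odd_bank[i]  = 101 + 2 * i;
        }
        int pair[2];
        rename_double(4, pair);
        printf("r4 alone -> p%d; r4:r5 pair -> p%d,p%d\n",
               rename_single(4), pair[0], pair[1]);
        return 0;
    }
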
20130191614 | PERFORMING A CYCLIC REDUNDANCY CHECKSUM OPERATION RESPONSIVE TO A USER-LEVEL INSTRUCTION - In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed. | 07-25-2013
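
x86's SSE4.2 CRC32 instruction is a well-known user-level checksum instruction of this kind; the sketch below is a plain software model of the CRC32C (Castagnoli) computation such an instruction accelerates, not code from the patent.

    #include <stdint.h>
    #include <stdio.h>

    /* Bitwise CRC32C, reflected polynomial 0x82F63B78: the per-byte
       computation that a user-level crc32 instruction performs in
       hardware in a single operation. */
    uint32_t crc32c_byte(uint32_t crc, uint8_t data) {
        crc ^= data;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1));
        return crc;
    }

    int main(void) {
        const uint8_t msg[] = "123456789";
        uint32_t crc = 0xFFFFFFFFu;
        for (int i = 0; i < 9; i++)
            crc = crc32c_byte(crc, msg[i]);
        printf("crc32c = 0x%08X\n", crc ^ 0xFFFFFFFFu); /* 0xE3069283 */
        return 0;
    }
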
20090094439 | Data processing apparatus and method employing multiple register sets - A data processing apparatus and method employing multiple register sets is disclosed. The data processing apparatus has processing logic for performing data processing operations and a register bank for storing data associated with the processing logic. The register bank has at least one register group, each register group having a plurality of register sets. The processing logic has an operating state associated with each register group defining how that register group is used, a first operating state being a state in which each register set in the register group is used to support an independent execution thread of the processing logic, and a second operating state being a state in which the register sets of the register group are collectively used to support a single execution thread of the processing logic. Control logic is provided to control how the register sets of each register group are used dependent on the operating state associated with that register group. This has been found to provide a particularly efficient use of the registers within the data processing apparatus. | 04-09-2009
20100082945 | Multi-thread processor and its hardware thread scheduling method - A multi-thread processor in accordance with an exemplary aspect of the present invention includes a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal TSEL designating a hardware thread to be executed in a next execution cycle, a first selector that outputs an instruction generated by a hardware thread selected according to the thread selection signal, and an execution pipeline that executes an instruction output from the first selector, wherein the thread scheduler specifies execution of at least one hardware thread selected in a fixed manner in a predetermined first execution period, and specifies execution of an arbitrary hardware thread in a second execution period. | 04-01-2010
20120290816 | Optimized Scalar Promotion with Load and Splat SIMD Instructions - Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine. | 11-15-2012
20100299497 | APPARATUS FOR EFFICIENTLY DETERMINING INSTRUCTION LENGTH WITHIN A STREAM OF X86 INSTRUCTION BYTES - An apparatus efficiently determines the length of an instruction within a stream of instruction bytes processed by a microprocessor having a variable instruction length instruction set architecture. The apparatus includes combinatorial logic associated with each instruction byte of the stream, each configured to receive the associated instruction byte and the next instruction byte of the stream and to generate in response thereto a first length, a second length, and a select control. A multiplexor associated with each of the combinatorial logic selects and outputs one of the following inputs based on the select control received from the combinatorial logic: a zero input and the second length received from the combinatorial logic associated with each of the next three instruction bytes of the stream. An adder associated with each of the combinatorial logic and multiplexor adds the first length and the output of the multiplexor to generate the length of the instruction. | 11-25-2010
20090138677 | System for Native Code Execution - A process, apparatus, and system to execute a program in an array of processor nodes that include an agent node and an executor node. A virtual program of tokens of different types represents the program and is provided in a memory. The types include a run type that includes native code instructions of the executor node. A token is loaded from the memory and executed in the agent node based on its type. In particular, if the token is an optional stop type, execution ends, and if the token is a run type, the native code instructions in the token are sent to the executor node. The native code instructions are executed in the executor node as received from the agent node. And such loading and execution continues in this manner indefinitely or until a stop type token is executed. | 05-28-2009
20090144525 | APPARATUS AND METHOD FOR SCHEDULING THREADS IN MULTI-THREADING PROCESSORS - A multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler is coupled to the instruction fetch units and an execution unit. The multi-thread scheduler determines the width of the execution unit and the execution unit executes the threads accordingly. | 06-04-2009
20090204790 | BUFFER MANAGEMENT FOR REAL-TIME STREAMING - Technologies are described herein for buffer management during real-time streaming. A video frame buffer stores video frames generated by a real-time streaming video capture device. New video frames received from the video capture device are stored in the video frame buffer prior to processing by a video processing pipeline that processes frames stored in the video frame buffer. A buffer manager determines whether a new video frame has been received from the video capture device and stored in the video frame buffer. When the buffer manager determines that a new video frame has arrived at the video frame buffer, it then determines whether the video processing pipeline has an unprocessed video frame. If the video processing pipeline has an unprocessed video frame, the buffer manager discards the new video frame stored in the video frame buffer or performs other processing on the new video frame. | 08-13-2009
20120198208 | SHARED FUNCTION MULTI-PORTED ROM APPARATUS AND METHOD - Various embodiments may be disclosed that may share a ROM pull down logic circuit among multiple ports of a processing core. The processing core may include an execution unit (EU) having an array of read only memory (ROM) pull down logic storing math functions. The ROM pull down logic circuit may implement single instruction, multiple data (SIMD) operations. The ROM pull down logic circuit may be operatively coupled with each of the multiple ports in a multi-port function sharing arrangement. Sharing the ROM pull down logic circuit reduces the need to duplicate logic and may result in a savings of chip area as well as a savings of power. | 08-02-2012
20100217957 | Structured Virtual Registers for Embedded Controller Devices - Techniques for using structured virtual registers in embedded systems are described. A virtual register structure definition provides a map of virtual registers within an embedded controller. The virtual registers are externally accessible and correspond to memory locations within the embedded controller. In various embodiments, an embedded controller and/or an external entity may store data in or read data from the virtual registers using the virtual register structure definition. The problems of manual tracking of virtual register addresses and manual transcription of virtual register addresses to program code are ameliorated. When the virtual register map changes, logical references in program code to particular virtual registers need not necessarily be changed. | 08-26-2010
20130219150 | Parsing Data Representative of a Hardware Design into Commands of a Hardware Design Environment - A method for implementing a hardware design that includes using a computer for receiving structured data that includes a representation of a basic hardware structure and a complex hardware structure that includes the basic hardware structure, parsing the structured data and generating, based on a result of the parsing, commands of a hardware design environment. | 08-22-2013
20090113176 | Method of reducing data path width restrictions on instruction sets - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers. | 04-30-2009
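
One way to picture a general purpose register that names an operand wider than the data path: treat its value as an address plus an embedded size field. The C sketch below is purely illustrative; the encoding and the decode_specifier helper are invented for this example, not the patented format.

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative decoding of a "wide operand specifier" held in a
       GPR: the low 4 bits give log2 of the operand size in bytes, the
       remaining bits give the (aligned) memory address. */
    typedef struct { const void *addr; size_t bytes; } wide_operand;

    wide_operand decode_specifier(uint64_t gpr) {
        wide_operand w;
        w.bytes = (size_t)1 << (gpr & 0xF);                /* size field */
        w.addr  = (const void *)(uintptr_t)(gpr & ~0xFULL); /* address   */
        return w;
    }

    int main(void) {
        static _Alignas(64) uint64_t data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        uint64_t gpr = (uint64_t)(uintptr_t)data | 6; /* 2^6 = 64 bytes */
        wide_operand w = decode_specifier(gpr);
        printf("wide operand: %zu bytes at %p\n", w.bytes, w.addr);
        return 0;
    }
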
20100299498 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD - An information processing apparatus includes: a first pipeline having first nodes, and moving data held in each first node to a first node located in a first direction; a second pipeline having second nodes corresponding to the first nodes, and moving data held in each second node to a second node located in a second direction that is opposite to the first direction; a first comparison unit arranged to compare data held in a node of interest with data held in a second node corresponding to the node of interest, where the node of interest is one of the first nodes; and a second comparison unit arranged to compare the data held in the node of interest with data held in a second node located one node on an upstream or downstream side of the second node corresponding to the node of interest. | 11-25-2010
20130145121 | DYNAMICALLY CONFIGURABLE PLACEMENT ENGINE - A stream application may allocate processing elements to one or more compute nodes (or hosts) to achieve a desired optimization goal. Each optimization mode may define processing element selection criteria and/or host selection criteria. When allocating a processing element to a host, a scheduler may place each processing element individually. Accordingly, the scheduler may use the processing element selection criteria for selecting which processing element in the stream application to allocate next. The scheduler may then determine, based on one or more constraints, which host the processing element can be placed on. If the scheduler determines that multiple hosts are suitable candidates for the processing element, it may use the host selection criteria to pick one of the candidate hosts that further optimize the stream application to meet the desired goal. Examples of different optimization goals that may be achieved using processing element and host selection criteria include optimizing performance, decreasing maintenance and operating costs, increasing solvability, sharing limited computer resources with other applications, and the like. | 06-06-2013
20110055521 | MICROPROCESSOR HAVING AT LEAST ONE APPLICATION SPECIFIC FUNCTIONAL UNIT AND METHOD TO DESIGN SAME - Customisable embedded processors that are available on the market make it possible for designers to speed up execution of applications by using Application-specific Functional Units (AFUs), implementing Instruction-Set Extensions (ISEs). Furthermore, techniques for automatic ISE identification have been improving; many algorithms have been proposed for choosing, given the application's source code, the best ISEs under various constraints. Read and write ports between the AFUs and the processor register file are an expensive asset, fixed in the micro-architecture (some processors indeed only allow two read ports and one write port) and yet, on the other hand, a large availability of inputs and outputs to and from the AFUs exposes high speedup. Here we present a solution to the limitation of actual register file ports by serialising register file access and therefore addressing multi-cycle read and write. It does so in an innovative way for two reasons: (1) it exploits and brings forward the progress in ISE identification under constraint, and (2) it combines register file access serialisation with pipelining in order to obtain the best global solution. Our method consists of scheduling graphs, corresponding to ISEs, under input/output constraint | 03-03-2011
20110264891 | MICROPROCESSOR THAT FUSES MOV/ALU/JCC INSTRUCTIONS - A microprocessor receives first, second, and third program-adjacent macroinstructions. The first macroinstruction moves a first operand to a first register from a second register. The second macroinstruction performs an arithmetic/logic operation using the first operand in the second register and a second operand in a third register to generate a result, loads the result back into the first register, and updates condition codes based on the result. The third macroinstruction conditionally jumps to a target address. An instruction translator simultaneously translates the first, second, and third program-adjacent macroinstructions into a single micro-operation for execution by an execution unit. The micro-operation performs the arithmetic/logic operation using the first operand in the second register and the second operand in the third register to generate the result, loads the result back into the first register, updates the condition codes based on the result, and conditionally jumps to the target address. | 10-27-2011
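
A translator that fuses such a triple might pattern-match three adjacent macroinstructions along these lines. The C sketch below is schematic only: the types are invented, ADD stands in for the arithmetic/logic operation, and real fusion conditions are far more restrictive.

    #include <stdio.h>

    typedef enum { OP_MOV, OP_ADD, OP_JCC, OP_OTHER } opcode;
    typedef struct { opcode op; int dst, src; } macro_insn;

    /* Return nonzero when a MOV r1,r2 / ALU r1,r3 / Jcc triple is seen,
       so the translator can emit one fused micro-op that computes
       r1 = r2 OP r3, updates flags, and conditionally jumps. */
    int try_fuse(const macro_insn m[3]) {
        return m[0].op == OP_MOV &&
               m[1].op == OP_ADD && m[1].dst == m[0].dst &&
               m[2].op == OP_JCC;
    }

    int main(void) {
        macro_insn window[3] = {
            { OP_MOV, 1, 2 },   /* mov r1, r2 */
            { OP_ADD, 1, 3 },   /* add r1, r3 */
            { OP_JCC, 0, 0 },   /* jcc target */
        };
        printf(try_fuse(window) ? "fused into one micro-op\n"
                                : "translated separately\n");
        return 0;
    }
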
20120311303 | Processor for Executing Wide Operand Operations Using a Control Register and a Results Register - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers. | 12-06-2012
20120311302 | APPARATUS AND METHOD FOR PROCESSING OPERATIONS IN PARALLEL USING A SINGLE INSTRUCTION MULTIPLE DATA PROCESSOR - A parallel operation processing apparatus and method using a Single Instruction Multiple Data (SIMD) processor are provided. The parallel operation processing apparatus may combine input data of source nodes in a current column with input data of source nodes in a previous column, and may store the combined input data. | 12-06-2012
20110185157 | MULTIFUNCTION HEXADECIMAL INSTRUCTION FORM SYSTEM AND PROGRAM PRODUCT - A new zSeries floating-point unit has a fused multiply-add dataflow capable of supporting two architectures and fused MULTIPLY and ADD and Multiply and SUBTRACT in both RRF and RXF formats for the fused functions. Both binary and hexadecimal floating-point instructions are supported for a total of 6 formats. The floating-point unit is capable of performing a multiply-add instruction for hexadecimal or binary every cycle with a latency of 5 cycles. This supports two architectures with two internal formats with their own biases. This has eliminated format conversion cycles and has optimized the width of the dataflow. The unit is optimized for both hexadecimal and binary floating-point architecture supporting a multiply-add/subtract per cycle. | 07-28-2011
20120210099 | RUNNING UNARY OPERATION INSTRUCTIONS FOR PROCESSING VECTORS - During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector. | 08-16-2012
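
Read as a scalar loop, the described instruction amounts to the following C model. The names, the choice of increment as the unary operation, and the "elements to the left" counting convention are illustrative assumptions, not the patent's definitions.

    #include <stdio.h>

    #define N 8

    /* Scalar model of a running unary operation: each active element
       to the right of the key position receives the base value with
       the unary op (here: increment) applied once per relevant active
       element seen so far. */
    void run_unary(const int in[N], const int active[N], int key, int out[N]) {
        int base = in[key];
        int count = 0;                      /* relevant elements so far */
        for (int i = key + 1; i < N; i++) {
            if (!active[i]) continue;
            int v = base;
            for (int k = 0; k < count; k++)
                v = v + 1;                  /* apply the unary op       */
            out[i] = v;
            count++;                        /* element is now relevant  */
        }
    }

    int main(void) {
        int in[N] = {5, 0, 0, 0, 0, 0, 0, 0}, out[N] = {0};
        int active[N] = {1, 1, 1, 0, 1, 1, 0, 1};
        run_unary(in, active, 0, out);
        for (int i = 1; i < N; i++) printf("%d ", out[i]);
        printf("\n");   /* prints: 5 6 0 7 8 0 9 */
        return 0;
    }
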
20120124334 | SIMD PROCESSOR FOR PERFORMING DATA FILTERING AND/OR INTERPOLATION - Data processing circuit containing an instruction execution circuit having an instruction set comprising a SIMD instruction. The instruction execution circuit comprises arithmetic circuits, arranged to perform N respective identical operations in parallel in response to the SIMD instruction. The SIMD instruction selects a first one and a second one of the registers. The SIMD instruction defines a first and second series of N respective SIMD instruction operands of the SIMD instruction from the addressed registers. Each arithmetic circuit receives a respective first operand and a respective second operand from the first and second series respectively. The instruction execution circuit selects the first and second series so they partially overlap. Positioning the operands is under program control. | 05-17-2012
20130227250 | SIMD ACCELERATOR FOR DATA COMPARISON - Some example embodiments include an apparatus for comparing a first operand to a second operand. The apparatus includes a SIMD accelerator configured to compare first multiple parts (e.g., bytes) of first operand to second multiple parts (e.g., bytes) of the second operand. The SIMD accelerator includes a ones' complement subtraction logic and a twos' complement logic configured to perform logic operations on the multiple parts of the first operand and the multiple parts of the second operand to generate a group of carry out and propagate data across bits of the multiple parts. At least a portion of the group of carry out and propagate data is reused in the group of logic operations. | 08-29-2013
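
A loose software analogy for comparing many byte lanes at once with carry/borrow bookkeeping is the classic SWAR zero-byte test: XOR the operands, then flag the zero bytes with a few ALU operations. The sketch below illustrates that flavor only; it is not the patented accelerator.

    #include <stdint.h>
    #include <stdio.h>

    /* Flag bytes where a and b are equal: a ^ b has a zero byte exactly
       where they match, and the expression below places 0x80 in
       precisely those byte lanes (exact SWAR zero-byte test; no
       cross-byte carries occur). */
    uint64_t byte_eq_mask(uint64_t a, uint64_t b) {
        uint64_t v = a ^ b;
        return ~((((v & 0x7F7F7F7F7F7F7F7FULL) + 0x7F7F7F7F7F7F7F7FULL) | v)
                 | 0x7F7F7F7F7F7F7F7FULL);
    }

    int main(void) {
        uint64_t a = 0x1122334455667788ULL;
        uint64_t b = 0x1199334455AA7788ULL; /* differs in bytes 6 and 2 */
        printf("eq mask = 0x%016llX\n",
               (unsigned long long)byte_eq_mask(a, b));
        return 0;
    }
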
20110066826 | IMAGE DATA PROCESSING APPARATUS - An image data processing apparatus includes: a plurality of operational processing circuits each of which is configured to have a variable circuit configuration and to execute operational processing on image data; and a control section that controls each of the operational processing circuits such that each of the operational processing circuits executes one of a plurality of types of operational processing performed on image data in a predetermined order. The control section controls each of the operational processing circuits so that when image data to be newly given to one of the operational processing circuits is interrupted, said one of the operational processing circuits and another one of the operational processing circuits execute operational processing by taking partial charge of the operational processing. | 03-17-2011
20120137108 | SYSTEMS AND METHODS INTEGRATING BOOLEAN PROCESSING AND MEMORY - The present disclosure relates to placing a Boolean Processor on a chip with memory to eliminate memory latency issues in computing systems. An asynchronous implementation of a Boolean Processor Switched Memory can theoretically operate at terahertz speed and vastly improve the rate at which computationally relevant data is fed to a microprocessor or microcontroller. Boolean Processor Enhanced Memories hold the promise of increasing memory throughput by several orders of magnitude and shifting the burden of “catching up” to microprocessors and microcontrollers. | 05-31-2012
20100082944 | Multi-thread processor - In an exemplary aspect, the present invention provides a multi-thread processor including a plurality of hardware threads each of which generates an independent instruction flow, a thread scheduler that outputs a thread selection signal in accordance with a first or second schedule, the thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector, wherein when the multi-thread processor is in a first state, the thread scheduler selects the first schedule, and when the multi-thread processor is in a second state, the thread scheduler selects the second schedule. | 04-01-2010
20110185156 | EXECUTING WATCHPOINT EVENTS FOR DEBUGGING IN A "BREAK BEFORE MAKE" MANNER - A processor (e.g., a Digital Signal Processor (DSP) core) rewinds a pipeline of instructions upon a watchpoint event in an instruction being processed. The program execution ceases at the instruction in which the watchpoint event occurred, while the instruction and subsequent instructions are cancelled, keeping the hardware components associated with executing the program in their previous states, prior to the watchpoint. The rewind is such that the program is refetched to enable execution to continue from the instruction in which the watchpoint event occurred. The watchpoint event is executed in a “break before make” manner. | 07-28-2011
20120179895 | METHOD AND APPARATUS FOR FAST DECODING AND ENHANCING EXECUTION SPEED OF AN INSTRUCTION - Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components. | 07-12-2012
20120084533 | Efficient Parallel Floating Point Exception Handling In A Processor - Methods and apparatus are disclosed for handling floating point exceptions in a processor that executes single-instruction multiple-data (SIMD) instructions. In one embodiment a numerical exception is identified for a SIMD floating point operation and SIMD micro-operations are initiated to generate two packed partial results of a packed result for the SIMD floating point operation. A SIMD denormalization micro-operation is initiated to combine the two packed partial results and to denormalize one or more elements of the combined packed partial results to generate a packed result for the SIMD floating point operation having one or more denormal elements. Flags are set and stored with packed partial results to identify denormal elements. In one embodiment a SIMD normalization micro-operation is initiated to generate a normalized pseudo internal floating point representation prior to the SIMD floating point operation when it uses multiplication. | 04-05-2012
20110276783 | THREAD FAIRNESS ON A MULTI-THREADED PROCESSOR WITH MULTI-CYCLE CRYPTOGRAPHIC OPERATIONS - Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced. | 11-10-2011
20120102301 | PREDICATE COUNT AND SEGMENT COUNT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments comprise a PredCount instruction and a SegCount instruction. When executed by a processor, the PredCount instruction causes the processor to analyze a predicate vector to determine a number of active elements in the predicate vector that exhibit a predetermined condition (e.g., that are set to a predetermined value) and to return a result indicating that number. When executed by a processor, the SegCount instruction causes the processor to determine a number of times that a GeneratePredicates instruction would be executed to generate a full set of predicates using active elements of an input vector. | 04-26-2012
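
In scalar terms, PredCount reduces to a masked population count. A minimal C model follows; the argument layout and names are invented for illustration.

    #include <stdio.h>

    #define VLEN 8

    /* Scalar model of a PredCount-style instruction: count the active
       elements of a predicate vector that are set to the tested value. */
    int pred_count(const int pred[VLEN], const int active[VLEN], int value) {
        int n = 0;
        for (int i = 0; i < VLEN; i++)
            if (active[i] && pred[i] == value)
                n++;
        return n;
    }

    int main(void) {
        int pred[VLEN]   = {1, 0, 1, 1, 0, 1, 0, 0};
        int active[VLEN] = {1, 1, 1, 1, 1, 0, 1, 1};
        printf("count = %d\n", pred_count(pred, active, 1)); /* prints 3 */
        return 0;
    }
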
20090177866 | SYSTEM AND METHOD FOR FUNCTIONALLY REDUNDANT COMPUTING SYSTEM HAVING A CONFIGURABLE DELAY BETWEEN LOGICALLY SYNCHRONIZED PROCESSORS - A method of operating a computer system. A first processor sends a first unit of binary information to an input/output (I/O) unit. The I/O unit then conveys the first unit of binary information to a functional unit in the computer system. A system response from the functional unit is then received by the I/O unit, which forwards the system response to the first processor. The system response is also stored in a first buffer. After a predetermined delay time has elapsed, the system response is then forwarded to the second processor. | 07-09-2009
20130013894 | DATA PROCESSOR - A RISC data processor in which the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. An instruction for generating flags according to operands' data sizes is defined, and an instruction set handled by the RISC data processor includes an instruction capable of executing an operation on operands in more than one data size. An identical operation process is conducted on the small-size operand and on low-order bits of the large-size operand, and flags are generated capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation. | 01-10-2013
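
The gist (one ALU result, flags derived simultaneously for several data-size views of it) can be modelled as below; the flag set and names are illustrative choices, not the patent's definitions.

    #include <stdint.h>
    #include <stdio.h>

    /* One 32-bit addition with zero/negative flags computed at once for
       the 8-bit, 16-bit, and 32-bit views of the same result, so a later
       instruction can consume whichever size it operated on. */
    typedef struct { int z8, n8, z16, n16, z32, n32; } size_flags;

    uint32_t add_with_flags(uint32_t a, uint32_t b, size_flags *f) {
        uint32_t r = a + b;
        f->z8  = (uint8_t)r  == 0;  f->n8  = (int8_t)r  < 0;
        f->z16 = (uint16_t)r == 0;  f->n16 = (int16_t)r < 0;
        f->z32 = r == 0;            f->n32 = (int32_t)r < 0;
        return r;
    }

    int main(void) {
        size_flags f;
        uint32_t r = add_with_flags(0x0000FF80u, 0x00000080u, &f);
        /* low 8 and low 16 bits are zero, the full result is not */
        printf("r=0x%08X z8=%d z16=%d z32=%d\n", r, f.z8, f.z16, f.z32);
        return 0;
    }
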
20110161629 | Arithmetic processor, information processor, and pipeline control method of arithmetic processor - An arithmetic processor includes a first pipeline unit configured to execute a first instruction that is input; a second pipeline unit configured to execute a second instruction that is input; a registration unit into which an aborted instruction is registered, the aborted instruction being the first instruction when the first pipeline unit is unable to complete the first instruction or the second instruction when the second pipeline unit is unable to complete the second instruction; a determination unit configured to make a determination as to which one of the first pipeline unit and the second pipeline unit is operating under a lower load; and an input unit configured to input, in the first pipeline unit or the second pipeline unit that is determined as operating under the lower load by the determination unit, the aborted instruction that is registered in the registration unit. | 06-30-2011
20080222390 | Low Noise Coding for Digital Data Interface - A digital data interface system comprises a data transmitter configured to transmit a data word across a plurality of data lines. The data word can comprise a plurality of digital data bits having a bit number order from a lowest bit number to a highest bit number with the lowest ordered bit numbers having higher noise content and the highest ordered bit numbers having higher harmonic content. The system also comprises an encoder configured to arrange the plurality of digital data bits as serialized data sets to be transmitted over each of the plurality of data lines by the data transmitter with consecutive data bits of at least one serialized data set being matched such that bits with the higher harmonic content are matched with bits of the higher noise content to substantially mitigate at least one of the noise content and the harmonic content of the data word. | 09-11-2008
20130138922 | REGISTER MANAGEMENT IN AN EXTENDED PROCESSOR ARCHITECTURE - Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack. | 05-30-2013
20120260068 | APPARATUS AND METHOD FOR HANDLING OF MODIFIED IMMEDIATE CONSTANT DURING INSTRUCTION TRANSLATION - An ISA-defined instruction includes an immediate field having first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline. Alternatively, if the immediate field value is not within a predetermined subset of values known by the instruction translator, the instruction translator generates, rather than the constant, a second microinstruction for execution by the execution pipeline to generate the constant. | 10-11-2012
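
This reads like the familiar ARM-style modified immediate, where an 8-bit value is rotated right by twice a 4-bit rotate field; the factor of two is an assumption borrowed from ARM encoding, not stated in the abstract. A sketch of the expansion a translator could perform up front:

    #include <stdint.h>
    #include <stdio.h>

    /* Expand an ARM-style modified immediate: an 8-bit value rotated
       right by twice the 4-bit rotation field. A translator can compute
       this constant once, instead of making the execution pipeline
       perform the rotation. */
    uint32_t expand_imm12(uint32_t imm12) {
        uint32_t imm8 = imm12 & 0xFF;             /* first portion: value    */
        uint32_t rot  = ((imm12 >> 8) & 0xF) * 2; /* second portion: rotation */
        return rot ? (imm8 >> rot) | (imm8 << (32 - rot)) : imm8;
    }

    int main(void) {
        /* 0x3FF: value 0xFF, rotate field 3 -> rotate right by 6 */
        printf("constant = 0x%08X\n", expand_imm12(0x3FF)); /* 0xFC000003 */
        return 0;
    }
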
20120260067 | MICROPROCESSOR THAT PERFORMS X86 ISA AND ARM ISA MACHINE LANGUAGE PROGRAM INSTRUCTIONS BY HARDWARE TRANSLATION INTO MICROINSTRUCTIONS EXECUTED BY COMMON EXECUTION PIPELINE - A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate x86/ARM-defined results. The microinstructions are distinct from the results generated by the execution of the microinstructions by the execution pipeline. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator. | 10-11-2012
20130198489 | PROCESSING ELEMENT MANAGEMENT IN A STREAMING DATA SYSTEM - Stream applications may inefficiently use the hardware resources that execute the processing elements of the data stream. For example, a compute node may host four processing elements and execute each using a CPU. However, other CPUs on the compute node may sit idle. To take advantage of these available hardware resources, a stream programmer may identify one or more processing elements that may be cloned. The cloned processing elements may be used to generate a different execution path that is parallel to the execution path that includes the original processing elements. Because the cloned processing elements contain the same operators as the original processing elements, the data stream that was previously flowing through only the original processing element may be split and sent through both the original and cloned processing elements. In this manner, the parallel execution path may use underutilized hardware resources to increase the throughput of the data stream. | 08-01-2013
20120144162SYSTEMS AND METHODS FOR DETERMINING COMPUTE KERNELS FOR AN APPLICATION IN A PARALLEL-PROCESSING COMPUTER SYSTEM - A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.06-07-2012
712201000 | Data flow based system | 7
20090119484 | Method and Apparatus for Implementing Digital Logic Circuitry - A method of generating digital control parameters for implementing digital logic circuitry comprising functional nodes with at least one input or at least one output and connections indicating interconnections between said functional nodes, wherein said digital logic circuitry comprises a first path streamed by successive tokens, and a second path streamed by said tokens is disclosed. The method comprises determining a necessary relative throughput for data flow to said paths; assigning buffers to one of said paths to balance throughput of said paths; removing assigned buffers until said necessary relative throughput is obtained with minimized number of buffers; and generating digital control parameters for implementing said digital logic circuitry comprising said minimized number of buffers. An apparatus, a computer implemented digital logic circuitry, a Data Flow Machine, methods and computer program products are also disclosed. | 05-07-2009
20100199071 | DATA PROCESSING APPARATUS AND IMAGE PROCESSING APPARATUS - A data processing apparatus in which pipeline processing is performed comprises a control unit that controls a data processing sequence, a first processing unit that begins first data processing by inputting data on the basis of a start signal, outputs data subjected to the first data processing, and outputs a completion signal to the control unit after completing the first data processing, and a second processing unit that begins second data processing by inputting the data subjected to the first data processing on the basis of a start signal, outputs data subjected to the second data processing, and outputs a completion signal to the control unit after completing the second data processing. The control unit outputs a following start signal to the first processing unit and the second processing unit upon reception of the completion signal of the first data processing and the second data processing respectively. | 08-05-2010
20120216019 | METHOD OF, AND APPARATUS FOR, STREAM SCHEDULING IN PARALLEL PIPELINED HARDWARE - There are provided embodiments of methods of generating a hardware design for a pipelined parallel stream processor. An embodiment of the method comprises defining, on a computing device, a processing operation designating processes to be implemented in hardware as part of said pipelined parallel stream processor; defining, on a computing device, a graph representing said processing operation as a parallel structure in the time domain as a function of clock cycles, said graph comprising at least one data path to be implemented as a hardware design for said pipelined parallel stream processor and comprising a plurality of branches configured to enable data values to be streamed therethrough, the branches of the or each data path being represented as comprising at least one input, at least one output, at least one discrete object corresponding directly to a hardware element to be implemented in hardware as part of said pipelined parallel stream processor, the or each discrete object being operable to execute a function for one or more clock cycles and having a predefined latency associated therewith, said predefined latency representing the time required for said hardware element to execute said function; said data values propagating through said data path from the at least one input to the at least one output as a function of increasing clock cycle; defining, on a computing device, the at least one data path and associated latencies of said graph as a set of algebraic linear inequalities; solving, on a computing device, said set of linear inequalities; optimising, on a computing device, the at least one data path in said graph using said solved linear inequalities to produce an optimised graph; and utilising, on a computing device, said optimised graph to define an optimised hardware design for implementation in hardware as said pipelined parallel stream processor. | 08-23-2012
20110320771 | INSTRUCTION UNIT WITH INSTRUCTION BUFFER PIPELINE BYPASS - A circuit arrangement and method selectively bypass an instruction buffer for selected instructions so that bypassed instructions can be dispatched without having to first pass through the instruction buffer. Thus, for example, in the case that an instruction buffer is partially or completely flushed as a result of an instruction redirect (e.g., due to a branch mispredict), instructions can be forwarded to subsequent stages in an instruction unit and/or to one or more execution units without the latency associated with passing through the instruction buffer. | 12-29-2011
20100199070 | PROGRAMMABLE FILTER PROCESSOR - A programmable filter processor which is adaptable to different filtering algorithms, a plurality of different software algorithms being executable, the programmable filter processor including a logic unit which includes a plurality of pipeline stages; a first memory in which the software algorithms are stored; a second memory in which raw data and parameters for the different filter algorithms are stored; and an address generating unit which is controllable via a program counter, the address generating unit being developed to generate control commands for the second memory and the logic unit. | 08-05-2010
20110060890 | STREAM DATA GENERATING METHOD, STREAM DATA GENERATING DEVICE AND A RECORDING MEDIUM STORING STREAM DATA GENERATING PROGRAM - A stream data generating method for a computer system for generating stream data having time information applied thereto in a time series order and processing the generated stream data on the basis of a registered query. The computer system includes a storage for storing therein query information indicative of a plurality of sorts of constituent elements forming stream data corresponding to the query on the basis of the query and a stream definition indicative of the plurality of constituent elements, a data generator for generating and transmitting stream data; and a stream data processor for processing the stream data transmitted from the data generator. The data generator reduces the quantity of stream data to be transmitted to the stream data processor on the basis of the query information. | 03-10-2011
20130246737 | SIMD Compare Instruction Using Permute Logic for Distributed Register Files - Mechanisms, in a data processing system comprising a single instruction multiple data (SIMD) processor, for performing a data dependency check operation on vector element values of at least two input vector registers are provided. Two calls to a simd-check instruction are performed, one with input vector registers having a first order and one with the input vector registers having a different order. The simd-check instruction performs comparisons to determine if any data dependencies are present. Results of the two calls to the simd-check instruction are obtained and used to determine if any data dependencies are present in the at least two input vector registers. Based on the results, the SIMD processor may perform various operations. | 09-19-2013
712202000 | Stack based computer | 15
20110289298 | SEMICONDUCTOR CIRCUIT AND DESIGNING APPARATUS - A semiconductor circuit includes a memory which stores data; a processing device which executes a program, writes argument data of a function of the program into the memory referring to an address stored in a stack pointer, when a value of a program counter, which indicates an address of the program under execution, reaches a hardware accelerator starting address, and outputs the address stored in the stack pointer; and a hardware accelerator which receives the address of the stack pointer from the processing device, when a value of the program counter of the processing device reaches the hardware accelerator starting address, reads the argument data of the function from the memory referring to the address stored in the stack pointer, and executes the function implemented in hardware using the argument data. | 11-24-2011
20080209171 | System and Method For Managing a Register-Based Stack of Operand Tags - A virtual machine in a processing system manages type information for operands. In one embodiment, the virtual machine accomplishes the following results through execution of a single instruction: adding an operand tag to a tag stack, and updating a stack pointer for the tag stack to recognize the addition of the operand tag to the tag stack. The single instruction may be a shift instruction, for example. The tag stack may reside in a tag stack register, and each operand tag may indicate whether a corresponding operand on an operand stack is to be treated as a reference operand or a non-reference operand. Other embodiments are described and claimed. | 08-28-2008
20110271081 | MULTIMEDIA PLATFORM - A multimedia platform is discussed, which includes a first stacking unit including a first substrate and a multimedia processor, wherein the first substrate and the multimedia processor are stacked on the first stacking unit, a pattern and a via hole are formed on the first substrate, and the multimedia processor is mounted on top of the first substrate; a second stacking unit including a second substrate and a plurality of storage devices, wherein the second substrate and the plurality of storage devices are stacked on the second stacking unit, a pattern and a via hole being formed on the second substrate, and the plurality of storage devices are mounted on top of the second substrate; and at least one solder ball arranged on the first stacking unit, the at least one solder ball allowing the first substrate to be coupled to the second substrate. | 11-03-2011
20120297167 | EFFICIENT CALL RETURN STACK TECHNIQUE - A processor, method, and medium for implementing a call return stack within a pipelined processor. A stack head register is used to store a copy of the top entry of the call return stack, and the stack head register is accessed by the instruction fetch unit on each fetch cycle. If a fetched instruction is decoded as a return instruction, the speculatively read address from the stack head register is utilized as a target address to fetch subsequent instructions and the address at the second entry from the top of the call return stack is written to the stack head register. | 11-22-2012
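
A small software model of the scheme (the top of the call/return stack mirrored in a head register that fetch reads every cycle, refilled from the second entry on a pop) might look like this; the names and refill details are invented for illustration.

    #include <stdio.h>

    #define CRS_DEPTH 8

    /* Call/return stack whose top entry is mirrored in a "stack head"
       register: the fetch stage reads only the head register, and the
       head is refilled from the second entry when an entry is popped. */
    static unsigned crs[CRS_DEPTH];
    static int top = -1;          /* index of top entry, -1 when empty  */
    static unsigned stack_head;   /* copy of crs[top], read every fetch */

    void call(unsigned return_addr) {
        crs[++top] = return_addr;
        stack_head = return_addr;            /* keep the mirror current */
    }

    unsigned ret(void) {
        unsigned target = stack_head;        /* fast speculative read   */
        top--;
        if (top >= 0) stack_head = crs[top]; /* refill from 2nd entry   */
        return target;
    }

    int main(void) {
        call(0x1000); call(0x2000);
        printf("ret -> %#x, then %#x\n", ret(), ret());
        return 0;
    }
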
20080288750 | Small barrier with local spinning - A barrier with local spinning. The barrier is described as a barrier object having a bit vector embedded as a pointer. If the vector bit is zero, the object functions as a counter; if the vector bit is one, the object operates as a pointer to a stack. The object includes the total number of threads required to rendezvous at the barrier to trigger release of the threads. The object points to a stack block list that describes each thread that has arrived at the barrier. Arriving at the barrier involves reading the top stack block, pushing onto the list a stack block for the thread that just arrived, decrementing the thread count, and spinning on corresponding local memory locations or timing out and blocking. When the last thread arrives at the barrier, the barrier is reset and all threads at the barrier are awakened for the start of the next process. | 11-20-2008
20090198959 | SCALABLE LINK STACK CONTROL METHOD WITH FULL SUPPORT FOR SPECULATIVE OPERATIONS - A computer implemented method, a processor chip, a computer program product, and a data processing system managing a link stack. The data processing system utilizes speculative pushes onto and pops from the link stack. The link stack comprises a set of entries, and each entry comprises a set of state bits. A speculative push of a first instruction is received onto the data stack, and the first instruction is stored into a first entry of the set of entries. A first bit is set to indicate that the first instruction is a valid instruction. A second bit is set to indicate that the first instruction has been speculatively pushed onto the link stack. The link stack pointer control is updated to indicate that the first entry is a top-of-data stack entry. | 08-06-2009
20100228950 | MICROPROCESSOR WITH FAST EXECUTION OF CALL AND RETURN INSTRUCTIONS - A microprocessor includes an instruction set architecture, comprising a call instruction type, a return instruction type, and other instruction types. Execution units correctly execute program instructions of the other instruction types. A call/return stack has a plurality of entries arranged in a last-in-first-out manner. The call/return stack is architectural state of the microprocessor not modifiable by program instructions of the other instruction types. The call/return stack is architectural state of the microprocessor indirectly modifiable by program instructions of the call and return instruction types. The microprocessor also includes a fetch unit that fetches program instructions and sends the program instructions of the other instruction types to the execution units to be correctly executed. The fetch unit correctly executes program instructions of the call and return instruction types without sending the program instructions of the call and return instruction types to the execution units to be correctly executed. | 09-09-2010
20130145122 | INSTRUCTION PROCESSING METHOD OF NETWORK PROCESSOR AND NETWORK PROCESSOR - The present invention provides an instruction processing method of a network processor and a network processor. The method includes: when executing a pre-added combined function call instruction, adding an address of its next instruction to a stack top of a first stack; judging, according to the combined function call instruction, whether an enable flag of each additional feature is enabled, and if enabled, adding a function entry address corresponding to an additional feature to the stack top of the first stack; and after finishing judging all enable flags, popping a function entry address in the first stack, and executing a function corresponding to a popped function entry address until the address of the next instruction is popped. In the present invention, only one judgment jump instruction needs to be added to a main line procedure to implement function call of enabled additional features, which saves an instruction execution cycle. | 06-06-2013
20110072241 | FAST CONCURRENT ARRAY-BASED STACKS, QUEUES AND DEQUES USING FETCH-AND-INCREMENT-BOUNDED AND A TICKET LOCK PER ELEMENT - Implementation primitives for concurrent array-based stacks, queues, double-ended queues (deques) and wrapped deques are provided. In one aspect, each element of the stack, queue, deque or wrapped deque data structure has its own ticket lock, allowing multiple threads to concurrently use multiple elements of the data structure and thus achieving high performance. In another aspect, new synchronization primitives FetchAndIncrementBounded (Counter, Bound) and FetchAndDecrementBounded (Counter, Bound) are implemented. These primitives can be implemented in hardware and thus promise a very fast throughput for queues, stacks and double-ended queues. | 03-24-2011
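
FetchAndIncrementBounded is straightforward to express as a compare-and-swap loop. The C11 sketch below implements the semantics the abstract names (atomically increment and return the old value unless the bound has been reached); the failure return value of -1 is my assumption, and the hardware version would be a single operation.

    #include <stdatomic.h>
    #include <stdio.h>

    /* FetchAndIncrementBounded(counter, bound): atomically increment
       the counter and return its old value, unless it has reached the
       bound, in which case fail without modifying it. */
    long fetch_and_increment_bounded(atomic_long *counter, long bound) {
        long old = atomic_load(counter);
        do {
            if (old >= bound)
                return -1;                 /* bounded: refuse to bump */
        } while (!atomic_compare_exchange_weak(counter, &old, old + 1));
        return old;                        /* old value on success    */
    }

    int main(void) {
        atomic_long tail = 2;
        printf("%ld\n", fetch_and_increment_bounded(&tail, 4)); /*  2 */
        printf("%ld\n", fetch_and_increment_bounded(&tail, 4)); /*  3 */
        printf("%ld\n", fetch_and_increment_bounded(&tail, 4)); /* -1 */
        return 0;
    }
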
20110087861 | System for High-Efficiency Post-Silicon Verification of a Processor - A post-silicon validation technique is able to craft randomized executable code, with known final outcomes, as a verification test that is executable on a hardware, such as a prototype microprocessor. A verification device is able to generate the test, in the form of programs, in such a way that at the end of the execution, the initial state of the test hardware is restored. Therefore, the final state of such a reversible program is known a priori. The technique may use a program generation algorithm, agnostic to any particular instruction set on the test hardware. In some examples, that algorithm is executed on the test hardware to generate the verification test, which is then executed on that test hardware. In other examples, the verification test is generated on another processor coupled to the test hardware. In either case, the verification test may contain initial and inverse operations determined from the test hardware. | 04-14-2011
20100023730Circular Register Arrays of a Computer - The invention provides a method and apparatus for eliminating stack overflow and underflow in a dual stack computer.01-28-2010
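One way to get that effect is to let the stack pointer wrap modulo the array size, so a push or pop can never step outside the array. A minimal sketch of a circular register array (sizes and names are hypothetical):

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // Pointer arithmetic is modulo the array size, so pushes and pops wrap
    // around instead of trapping on overflow or underflow (oldest entries
    // are silently overwritten).
    class CircularStack {
    public:
        void push(uint64_t v) {
            top_ = (top_ + 1) % regs_.size();                 // wrap on overflow
            regs_[top_] = v;
        }
        uint64_t pop() {
            uint64_t v = regs_[top_];
            top_ = (top_ + regs_.size() - 1) % regs_.size();  // wrap on underflow
            return v;
        }
    private:
        std::array<uint64_t, 8> regs_{};
        std::size_t top_ = 0;
    };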
20120166766ENHANCED MICROCODE ADDRESS STACK POINTER MANIPULATION - Methods and apparatus for enhanced microcode address stack pointer manipulation are described. In one embodiment, the microcode address stacks are invisible to software. In an embodiment, a microcode instruction pointer (UIP) and a next address to be accessed in a microcode storage unit are generated based on an opcode of a microoperation, a marker, and a UIP stack address. The UIP stack address may be generated based on a signal and an immediate field of the microoperation. Other embodiments are also claimed and disclosed.06-28-2012
20120324206COMPUTER OPERATION CONTROL METHOD, PROGRAM, AND SYSTEM - A computer-implemented control method, article of manufacture, and computer-implemented system for determining whether stack allocation is possible. The method includes allocating an object created by a method frame to a stack. The allocation is performed in response to determining that: a first and a second instruction are called in the method frame, where the first instruction causes an escape of the object and the second instruction cancels the escape; the object does not escape to a thread other than the thread to which it has escaped at the point in time when the escape is cancelled; the first instruction has been called before the second instruction is called; and the object does not escape in accordance with any instruction other than the first instruction in the method frame, regardless of whether the object escapes in accordance with the first instruction.12-20-2012
20110271080COMPUTER SYSTEM AND METHOD OF ADAPTING A COMPUTER SYSTEM TO SUPPORT A REGISTER WINDOW ARCHITECTURE - A target computing system 11-03-2011
20110314259OPERATING A STACK OF INFORMATION IN AN INFORMATION HANDLING SYSTEM - A pointer is for pointing to a next-to-read location within a stack of information. For pushing information onto the stack: a value is saved of the pointer, which points to a first location within the stack as being the next-to-read location; the pointer is updated so that it points to a second location within the stack as being the next-to-read location; and the information is written for storage at the second location. For popping the information from the stack: in response to the pointer, the information is read from the second location as the next-to-read location; and the pointer is restored to equal the saved value so that it points to the first location as being the next-to-read location.12-22-2011
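The scheme keeps a single next-to-read pointer: a push saves the pointer, advances it, and writes; a pop reads at the pointer and restores the saved value. A direct transcription as a sketch (storage layout and names are hypothetical, and only one save slot is modeled):

    #include <cstdint>

    // Push saves the old next-to-read pointer, advances it, and writes;
    // pop reads at the pointer and restores the saved value.
    struct Stack {
        uint64_t mem[64] = {};
        uint64_t ptr = 0;      // points to the next-to-read location
        uint64_t saved = 0;    // saved pointer value (one slot modeled here)

        void push(uint64_t info) {
            saved = ptr;       // save the pointer to the first location
            ptr = ptr + 1;     // now points to the second location
            mem[ptr] = info;   // write the information there
        }
        uint64_t pop() {
            uint64_t info = mem[ptr];  // read from the next-to-read location
            ptr = saved;               // restore: first location is next-to-read
            return info;
        }
    };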
712203000 Multiprocessor instruction 21
20100005275MULTIPROCESSING SYSTEM - A multiprocessing system includes a storage part that stores, in a memory, a first operating system (OS) task set constituted by a combination of a first task and a first OS corresponding to the first task, the first task being designated by an execution instruction; and a task executing part that refers to the first OS task set stored in the memory, loads the OS constituting the first OS task set, and executes the first task designated by the execution instruction.01-07-2010
20090235050DYNAMIC SERVICE MANAGEMENT FOR MULTICORE PROCESSORS - A system, apparatus, method and article to perform dynamic service management for multicore processors are described. The apparatus may include, for example, a processing device having multiple types of processors to process packets. A service manager may dynamically assign executable files for multiple services to the multiple types of processors during execution of the executable files based on packets processed for each service. Other embodiments are described and claimed.09-17-2009
20080209172Selective hardware lock disabling - Controlling a reorder buffer (ROB) to selectively perform functional hardware lock disabling (HLD) is described. One apparatus embodiment includes a unit to enable an ROB to selectively disable a lock upon identifying a lock acquire operation (LAO) associated with a critical section (CS) entry point, a unit to selectively retire the LAO, a unit to cause the ROB to selectively disable the lock, and a unit to snoop a buffer. The apparatus may, based on the snooping, selectively abort a transaction associated with the CS.08-28-2008
20100005274VIRTUAL FUNCTIONAL UNITS FOR VLIW PROCESSORS - A virtual functional unit design is presented that is employed in a statically scheduled VLIW processor. "Virtual" views of the functional unit appear to the processor scheduler that exceed the number of physical instantiations of the functional unit. As a result, significant processor performance improvements can be achieved for those types of functional units that are too difficult or too costly to physically duplicate. By providing different virtual views to the different clusters of a VLIW processor, the compiler/scheduler can generate more efficient code than for a processor without virtual views, in which the physical unit is restricted to a subset of the processor's clusters. The compiler/scheduler guarantees that the restrictions on scheduling operations for functional units with multiple virtual views are met. Non-clustered processors also benefit from virtual views: by providing multiple virtual views in multiple issue slots of a physical functional unit, the compiler/scheduler has more freedom to schedule operations for the functional unit.01-07-2010
20110125986Reducing inter-task latency in a multiprocessor system - A method of reducing inter-task latency for software comprising a sequence of instructions including a synchronous remote procedure call to be executed on a multiprocessor system comprising a calling processor and at least one remote engine. The method comprises the steps of: inputting the software; inputting a runtime resource description describing a runtime environment of the multiprocessor system; identifying the synchronous remote procedure call in the sequence of instructions; replacing the synchronous remote procedure call in the sequence of instructions with an initiation instruction and a wait instruction to generate a substitute sequence of instructions; reordering the substitute sequence of instructions with reference to the runtime resource description and the dependencies to generate a reordered sequence of instructions; and outputting the reordered sequence of instructions.05-26-2011
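The transformation is the classic split of a blocking call into an initiation and a wait, with independent instructions hoisted between them to hide the remote engine's latency. A rough host-side analogue using std::async (the patent targets a remote engine and a compile-time reordering pass, not a host thread; all names are hypothetical):

    #include <cstdio>
    #include <future>

    int remoteProcedure(int x) { return x * x; }  // stand-in for the remote engine
    int independentWork()      { return 7; }

    int main() {
        // Synchronous form: int r = remoteProcedure(6); int w = independentWork();
        // After the split: initiate, overlap independent work, then wait.
        std::future<int> h =
            std::async(std::launch::async, remoteProcedure, 6);  // initiation
        int w = independentWork();   // reordered between initiation and wait
        int r = h.get();             // wait
        std::printf("%d %d\n", r, w);
    }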
20090089546Multiple multi-threaded processors having an L1 instruction cache and a shared L2 instruction cache - In general, in one aspect, the disclosure describes a processor that includes an instruction store to store instructions of at least a portion of at least one program and multiple engines coupled to the shared instruction store. The engines provide multiple execution threads and include an instruction cache to cache a subset of the at least the portion of the at least one program from the instruction store, with different respective portions of the engine's instruction cache being allocated to different respective ones of the engine threads.04-02-2009
20080263325SYSTEM AND STRUCTURE FOR SYNCHRONIZED THREAD PRIORITY SELECTION IN A DEEPLY PIPELINED MULTITHREADED MICROPROCESSOR - A microprocessor and system with improved performance and power in a simultaneous multithreading (SMT) microprocessor architecture. The processor has the ability to select instructions from one thread or another in any given processor clock cycle. Instructions from each thread may be dynamically assigned selection priorities at multiple decision points in the processor in a given cycle. The thread priority is based on monitoring performance behavior and activities in the processor. In the exemplary embodiment, the present invention discloses a microprocessor and system for synchronizing thread priorities among multiple decision points throughout the micro-architecture of the microprocessor. This system and method for synchronizing thread priorities allows each thread priority to be in sync and aware of the status of other thread priorities at various decision points within the microprocessor.10-23-2008
20090198960DATA PROCESSING SYSTEM, PROCESSOR AND METHOD THAT SUPPORT PARTIAL CACHE LINE READS - According to a method of data processing in a multiprocessor data processing system, in response to a processor request to read a target granule of a target cache line of data containing multiple granules, a processing unit originates on an interconnect of the multiprocessor data processing system a partial read request that requests permission to read only the target granule of the target cache line. In response to a combined response to the partial read request indicating success, the combined response representing a system-wide response to the partial read request, the processing unit receives the target granule of the target cache line, supplies the target granule to a requesting processor core, and updates a coherency state of the target granule while retaining a coherency state of at least one other granule of the target cache line.08-06-2009
20090198961Cross-Thread Interrupt Controller for a Multi-thread Processor - An interrupt controller for a dual-thread processor has, for a first thread, an interrupt request register accessible to the second thread, an interrupt count accessible to the second thread, and an interrupt acknowledge accessible to the first thread. Additionally, the interrupt controller has, for a second thread, an interrupt request register accessible to the first thread, an interrupt count accessible to the first thread, and an interrupt acknowledge accessible to the second thread. The interrupt controller separately has a counter for each request, which increments upon assertion of the request and decrements upon assertion of an acknowledgement.08-06-2009
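The per-request counter simply goes up on each assertion from the peer thread and down on each acknowledgement by the owner, so multiple pending interrupts are not lost. A minimal sketch with a C++ atomic (register and method names are hypothetical):

    #include <atomic>

    struct CrossThreadIrq {
        std::atomic<unsigned> count{0};   // per-request interrupt count

        void request() { count.fetch_add(1); }   // asserted by the peer thread

        bool acknowledge() {                     // asserted by the owning thread
            unsigned c = count.load();
            while (c > 0)
                if (count.compare_exchange_weak(c, c - 1))
                    return true;                 // one pending interrupt consumed
            return false;                        // nothing pending
        }
    };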
20100228951PARALLEL PROCESSING MANAGEMENT FRAMEWORK - The present disclosure includes a management framework system for processing a parallel task. The framework includes a job package, a job submitter, task trackers, communicators, a plurality of processors, and a node service. The job package has a bundle of user-defined implementations and an input data domain. The job submitter module has a splitter interface and a reducer interface and is configured to split the input data domain into a plurality of sub-data domains, to send the sub-data domains to the plurality of processors, and to receive the results. The processors are configured to execute parallel tasks on the sub-data domains. The management framework separates user-defined applications from parallel execution, such that user implementations are separated from management framework implementations.09-09-2010
20090319757ARCHITECTURE FOR LOCAL PROGRAMMING OF QUANTUM PROCESSOR ELEMENTS USING LATCHING QUBITS - An architecture for a quantum processor may include a set of superconducting flux qubits operated as computation qubits and a set of superconducting flux qubits operated as latching qubits. Latching qubits may include a first closed superconducting loop with serially coupled superconducting inductors, interrupted by a split junction loop with at least two Josephson junctions; and a clock signal input structure configured to couple clock signals to the split junction loop. Flux-based superconducting shift registers may be formed from latching qubits and sets of dummy latching qubits. The devices may include clock lines to clock signals to latch the latching qubits. Thus, latching qubits may be used to program and configure computation qubits in a quantum processor.12-24-2009
20100318768CHANNEL-BASED RUNTIME ENGINE FOR STREAM PROCESSING - An apparatus including a memory device for storing a program and a processor in communication with the memory device. The processor is operative with the program to facilitate design of a stream processing flow that satisfies an objective, wherein the stream processing flow includes at least three processing groups: a first processing group with a data source and an operator, a second processing group with a data source and an operator, and a third processing group with a join operator at its input and another operator. Data inside each group is organized by channels, and each channel is a sequence of data; an operator producing a data channel does not generate new data for the channel until the old data of the channel has been received by all other operators in the same group. Data flows from the first and second groups to the third group asynchronously and is stored in a queue if not ready for processing by an operator of the third group. The processor is further operative to deploy the stream processing flow in a concurrent computing system to produce an output.12-16-2010
20130138923MULTITHREADED DATA MERGING FOR MULTI-CORE PROCESSING UNIT - Described herein are methods, systems, apparatuses and products for multithreaded data merging for multi-core central and graphics processing units. An aspect provides for executing a plurality of threads on at least one central processing unit comprising a plurality of cores, each thread comprising an input data set (IDS) and being executed on one of the plurality of cores; initializing at least one local data set (LDS) having a size and a threshold; inserting IDS data elements into the at least one LDS such that each inserted IDS data element increases the size of the at least one LDS; and merging the at least one LDS into a global data set (GDS) responsive to the size of the at least one LDS being greater than the threshold. Other aspects are disclosed herein.05-30-2013
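The pattern is per-thread buffering with a threshold-triggered merge: inserts stay thread-local, and the shared structure is only touched when the local set grows past the threshold. A minimal sketch with a mutex-guarded global set (the types, names, and locking choice are hypothetical, not from the application):

    #include <cstddef>
    #include <mutex>
    #include <vector>

    std::vector<int> globalSet;   // GDS shared by all threads
    std::mutex globalMu;

    // Each thread runs this over its own input data set (IDS).
    void worker(const std::vector<int>& inputSet, std::size_t threshold) {
        std::vector<int> local;                       // LDS
        for (int v : inputSet) {
            local.push_back(v);                       // insert grows the LDS
            if (local.size() > threshold) {           // threshold crossed: merge
                std::lock_guard<std::mutex> g(globalMu);
                globalSet.insert(globalSet.end(), local.begin(), local.end());
                local.clear();
            }
        }
        std::lock_guard<std::mutex> g(globalMu);      // flush the remainder
        globalSet.insert(globalSet.end(), local.begin(), local.end());
    }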
20100185833MULTIPROCESSOR CONTROL APPARATUS, MULTIPROCESSOR CONTROL METHOD, AND MULTIPROCESSOR CONTROL CIRCUIT - An object of the invention is to reduce the electric power consumption that results from temporarily activating a processor with large power consumption out of a plurality of processors.07-22-2010
20100122065System and Method for Large-Scale Data Processing Using an Application-Independent Framework - A large-scale data processing system and method for processing data in a distributed and parallel processing environment. The system includes an application-independent framework for processing data having a plurality of application-independent map modules and reduce modules. These application-independent modules use application-independent operators to automatically handle parallelization of computations across the distributed and parallel processing environment when performing user-specified data processing operations. The system also includes a plurality of user-specified, application-specific operators, for use with the application-independent framework to perform a user-specified data processing operation on a user-specified set of input files. The application-specific operators include: a map operator and a reduce operator. The map operator is applied by the application-independent map modules to input data in the user-specified set of input files to produce intermediate data values. The reduce operator is applied by the application-independent reduce modules to process the intermediate data values to produce final output data.05-13-2010
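The separation means the user writes only the two operators while the framework owns the grouping of intermediate values and the parallelism. A single-process caricature of that division of labor, as word count (all names are hypothetical; the real framework distributes the computation):

    #include <cstdio>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    // User-specified, application-specific operators: the only code that
    // knows about the application.
    std::vector<std::pair<std::string, int>> mapOp(const std::string& line) {
        std::vector<std::pair<std::string, int>> out;
        std::istringstream ss(line);
        for (std::string w; ss >> w;)
            out.push_back({w, 1});
        return out;
    }

    int reduceOp(const std::vector<int>& counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    int main() {
        // Application-independent part: applies mapOp, groups intermediate
        // values by key, applies reduceOp. In the patent this layer also
        // parallelizes the work across the distributed environment.
        std::vector<std::string> inputs = {"a b a", "b c"};
        std::map<std::string, std::vector<int>> intermediate;
        for (const auto& line : inputs)
            for (const auto& kv : mapOp(line))
                intermediate[kv.first].push_back(kv.second);
        for (const auto& kv : intermediate)
            std::printf("%s %d\n", kv.first.c_str(), reduceOp(kv.second));
    }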
20080229065Configurable Microprocessor - A configurable microprocessor which combines a plurality of corelets into a single microprocessor core to handle highly computing-intensive workloads. The process first selects two or more corelets from the plurality of corelets. The process combines resources of the two or more corelets to form combined resources, wherein each combined resource comprises a larger amount of a resource than is available to each individual corelet. The process then forms a single microprocessor core from the two or more corelets by assigning the combined resources to the single microprocessor core, wherein the combined resources are dedicated to the single microprocessor core, and wherein the single microprocessor core processes instructions with the dedicated combined resources.09-18-2008
20110055522REQUEST CONTROL DEVICE, REQUEST CONTROL METHOD AND ASSOCIATED PROCESSORS - A request control device, request control method, and a multiprocessor cooperation architecture. The request control device is connected to a request storage module and includes a comparing means and an identifier setting means. The comparing means is configured to determine whether an incoming first queue unit corresponds to the same message as a queue unit already present in the request storage module. The identifier setting means is configured to set a save identifier of the existing queue unit to indicate that the state associated with the message need not be saved, if the first queue unit corresponds to the same message as that existing queue unit. According to the technical solution of the invention, the memory accesses caused by saving and loading states are reduced, thereby increasing the processing speed of the processor.03-03-2011
20120131310Methods And Apparatus For Independent Processor Node Operations In A SIMD Array Processor - A control processor is used for fetching and distributing single instruction multiple data (SIMD) instructions to a plurality of processing elements (PEs). One of the SIMD instructions is a thread start (Tstart) instruction, which causes the control processor to pause its instruction fetching. A local PE instruction memory (PE Imem) is associated with each PE and contains local PE instructions for execution on the local PE. Local PE Imem fetch, decode, and execute logic are associated with each PE. Instruction path selection logic in each PE is used to select between control processor distributed instructions and local PE instructions fetched from the local PE Imem. Each PE is also initialized to receive control processor distributed instructions. In addition, local hold generation logic is associated with each PE. A PE receiving a Tstart instruction causes the instruction path selection logic to switch to fetch local PE Imem instructions.05-24-2012
20120179896METHOD AND APPARATUS FOR A HIERARCHICAL SYNCHRONIZATION BARRIER IN A MULTI-NODE SYSTEM - A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include: providing, by each of a plurality of threads on a chip, an input bit signal to a respective bit in a register in response to reaching a barrier; determining whether all of the plurality of threads have reached the barrier by electrically tying the bits of the register together and "AND"ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip have reached the barrier, notifying the plurality of threads on the chip if it is determined that only on-chip synchronization is needed; and, after all of the plurality of threads on the chip have reached the barrier, communicating the synchronization signal off chip if it is determined that inter-node synchronization is needed.07-12-2012
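In software terms this is a two-level barrier: a chip-local count stands in for the AND of register bits, and only the last arrival crosses the chip boundary when inter-node synchronization is required. A minimal sketch (the inter-node step is stubbed out; all names are hypothetical):

    #include <atomic>

    struct ChipBarrier {
        std::atomic<int> arrived{0};
        std::atomic<int> generation{0};
        const int threads;
        const bool interNode;

        ChipBarrier(int n, bool xnode) : threads(n), interNode(xnode) {}

        void wait() {
            int gen = generation.load();
            if (arrived.fetch_add(1) + 1 == threads) {  // last arrival on chip
                if (interNode)
                    signalOtherNodes();                 // off-chip sync first
                arrived.store(0);
                generation.fetch_add(1);                // release on-chip waiters
            } else {
                while (generation.load() == gen) { }    // spin until released
            }
        }
        void signalOtherNodes() { /* inter-node synchronization would go here */ }
    };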
20130019083Redundant Transactional Memory - A mechanism is provided for redundant execution of a set of instructions. A redundant execution begin (rbegin) instruction to be executed by a first hardware thread on a first processor is identified in the set of instructions. The set of instructions immediately after the rbegin instruction is executed on the first hardware thread and on a second hardware thread. Responsive to both the first processor and the second processor ending execution of the set of instructions, responsive to a first set of cache lines in a first speculative store matching a second set of cache lines in a second speculative store, and responsive to a first set of register states in a first status register matching a second set of register states in a second status register, dirty lines in the first speculative store are committed, thereby committing the redundant transaction state to an architectural state.01-17-2013
20120066477ADVANCED PROCESSOR WITH MECHANISM FOR PACKET DISTRIBUTION AT HIGH LINE RATE - An advanced processor comprises a plurality of multithreaded processor cores each having a data cache and instruction cache. A data switch interconnect is coupled to each of the processor cores and configured to pass information among the processor cores. A messaging network is coupled to each of the processor cores and a plurality of communication ports. In one aspect of an embodiment of the invention, the data switch interconnect is coupled to each of the processor cores by its respective data cache, and the messaging network is coupled to each of the processor cores by its respective message station. Advantages of the invention include the ability to provide high bandwidth communications between computer systems and memory in an efficient and cost-effective manner.03-15-2012

