
Operation

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

712001000 - PROCESSING ARCHITECTURE

712028000 - Distributed processing system

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
712031000 | Master/slave | 21
Entries
Document | Title | Date
20130031336EXTERNAL INTRINSIC INTERFACE - An external intrinsic interface. A processor may include a core including a plurality of functional units, an intrinsic module located outside the core, and an interface module to perform relaying between the intrinsic module and a functional unit, among the plurality of functional units.01-31-2013
20130031335USING PREDICTIVE DETERMINISM WITHIN A STREAMING ENVIRONMENT - Techniques are described for transmitting predicted output data on a processing element in a stream computing application instead of processing currently received input data. The stream computing application monitors the output of a processing element and determines whether its output is predictable, for example, if the previously transmitted output values are within a predefined range or if one or more input values correlate with the same one or more output values. The application may then generate a predicted output value to transmit from the processing element instead of transmitting a processed output value based on current input values. The predicted output value may be, for example, an average of the previously transmitted output values or a previously transmitted output value that was transmitted in response to a previously received input value that is similar to a currently received input value. Moreover, the processing element or elements that transmit the predicted output data may be upstream from the processing element with the predictable output.01-31-2013
20130031334Automatically Routing Super-Compute Interconnects - A mechanism is provided for automatically routing network interconnects in a data processing system. A processor in a node of a plurality of nodes receives network topology from neighboring nodes in the plurality of nodes within the data processing system. The processor constructs a system node map that identifies a physical connectivity between the node and the neighboring nodes. The processor programs a switch in the node with a connectivity map that indicates a set of point-to-point connections with the neighboring nodes. The set of point-to-point connections comprises locally-connected connections and pass-through connections.01-31-2013
20100049944Processor integrated circuit and product development method using the processing integrated circuit - A processor integrated circuit according to the present invention comprises low-speed and high-speed computing units (02-25-2010
20110202745METHOD AND APPARATUS FOR COMPUTING MASSIVE SPATIO-TEMPORAL CORRELATIONS USING A HYBRID CPU-GPU APPROACH - A CPU may select a variable from a variable set as a dependent variable. The variable set may be part of the data structure that includes a plurality of vector values, a vector value associated with a variable set of n number of variables, and each variable of the variable set having a variable value. The number of dependent variable steps for the dependent variable may be determined. The number of the vector values in a dependent variable step is determined as being the number of independent variables. A function is mapped to a plurality of thread processors, and each thread processor is assigned for the function to be performed on each one of the independent variables for each of the dependent variable steps.08-18-2011
20100077179METHOD AND APPARATUS FOR COHERENT DEVICE INITIALIZATION AND ACCESS - A method and apparatus for enabling usage of an accelerator device in a processor socket is herein described. A set of inter-processor messages is utilized to initialize a configuration/memory space of the accelerator device. As an example, a first set of inter-processor interrupts (IPIs) is sent to indicate a base address of a memory space and a second set of IPIs is sent to indicate a size of the memory space. Furthermore, similar methods and apparatuses are herein described for dynamic reconfiguration of an accelerator device in a processor socket.03-25-2010
20130086356Distributed Data Scalable Adaptive Map-Reduce Framework - A method for generating a distributed data scalable adaptive map-reduce framework for at least one multi-core cluster. The method includes partitioning a cluster into at least one computational group, determining at least one key-group leader within each computational group, performing a local combine operation at each computational group, performing a global combine operation at each of the at least one key-group leader within each computational group based on a result from the local combine operation, and performing a global map-reduce operation across the at least one key-group leader within each computational group.04-04-2013
20130086355Distributed Data Scalable Adaptive Map-Reduce Framework - A method, an apparatus and an article of manufacture for generating a distributed data scalable adaptive map-reduce framework for at least one multi-core cluster. The method includes partitioning a cluster into at least one computational group, determining at least one key-group leader within each computational group, performing a local combine operation at each computational group, performing a global combine operation at each of the at least one key-group leader within each computational group based on a result from the local combine operation, and performing a global map-reduce operation across the at least one key-group leader within each computational group.04-04-2013
20130042088Collective Operation Protocol Selection In A Parallel Computer - Collective operation protocol selection in a parallel computer that includes compute nodes may be carried out by calling a collective operation with operating parameters; selecting a protocol for executing the operation and executing the operation with the selected protocol. Selecting a protocol includes: iteratively, until a prospective protocol meets predetermined performance criteria: providing, to a protocol performance function for the prospective protocol, the operating parameters; determining whether the prospective protocol meets predefined performance criteria by evaluating a predefined performance fit equation, calculating a measure of performance of the protocol for the operating parameters; determining that the prospective protocol meets predetermined performance criteria and selecting the protocol for executing the operation only if the calculated measure of performance is greater than a predefined minimum performance threshold.02-14-2013
20100100705Distributed Processing System, Distributed Processing Method and Computer Program - A distributed processing system includes at least two processing elements (04-22-2010
20090043987Operation distribution method and system using buffer - Provided is an operation distribution method and system using a buffer. The operation distribution system includes a buffer, a first operation device performing a first operation and storing a result of the first operation performed by the first operation device in the buffer, and a second operation device performing a second operation using the result of the first operation stored in the buffer, thereby reducing the time required to perform operations.02-12-2009
20090307463INTER-PROCESSOR COMMUNICATION SYSTEM, PROCESSOR, INTER-PROCESSOR COMMUNICATION METHOD, AND COMMUNICATION METHOD - An inter-processor communication system includes processors and a transfer device that, upon receiving a multicast packet from any of the processors, transfers the packet to processors designated in the packet as destinations among the processors. Each processor includes: a memory unit; a holding unit which holds position information indicating a reference position in the memory unit; a transmitting unit which transmits to the transfer device a multicast packet representing data and an adjustment value indicating an area for writing data that was set for use by its own processor by using the reference position; and a receiving unit which, upon receiving a multicast packet that has been transmitted by way of the transfer device, determines a write position in the memory unit based on the adjustment value in the packet and the position information and stores the data in the packet in that write position.12-10-2009
20090106530SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR GENERATING A RAY TRACING DATA STRUCTURE UTILIZING A PARALLEL PROCESSOR ARCHITECTURE - A system, method, and computer program product are provided for generating a ray tracing data structure utilizing a parallel processor architecture. In operation, a global set of data is received. Additionally, a data structure is generated utilizing a parallel processor architecture including a plurality of processors. Such data structure is adapted for use in performing ray tracing utilizing the parallel processor architecture, and is generated by allocating the global set of data among the processors such that results of processing of at least one of the processors is processed by another one of the processors.04-23-2009
20130073832PERFORMING A DETERMINISTIC REDUCTION OPERATION IN A PARALLEL COMPUTER - A parallel computer that includes compute nodes having computer processors and a CAU (Collectives Acceleration Unit) that couples processors to one another for data communications. In embodiments of the present invention, a deterministic reduction operation includes: organizing processors of the parallel computer and a CAU into a branched tree topology, where the CAU is a root of the branched tree topology and the processors are children of the root CAU; establishing a receive buffer that includes receive elements associated with processors and configured to store the associated processor's contribution data; receiving, in any order from the processors, each processor's contribution data; tracking receipt of each processor's contribution data; and reducing the contribution data in a predefined order, only after receipt of contribution data from all processors in the branched tree topology.03-21-2013
20130067198COMPRESSING RESULT DATA FOR A COMPUTE NODE IN A PARALLEL COMPUTER - A parallel computer is provided that includes a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.03-14-2013
20090249031INFORMATION PROCESSING APPARATUS AND ERROR PROCESSING - An information processing apparatus includes a first processing unit, a second processing unit, and a common storage unit that is commonly accessed by the first processing unit and the second processing unit. The first processing unit writes a request in the common storage unit for requesting the second processing unit to perform a certain process, and notifies the second processing unit of the request. The second processing unit writes a notification in the common storage unit indicating the process is completed in response to the request.10-01-2009
20090235049METHOD AND APPARATUS FOR QR-FACTORIZING MATRIX ON A MULTIPROCESSOR SYSTEM - The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result. The present invention enables a multiprocessor system having a high computing capability to be applied to the matrix QR factorization having a large amount of computation tasks.09-17-2009
20090006809NON-DISRUPTIVE CODE UPDATE OF A SINGLE PROCESSOR IN A MULTI-PROCESSOR COMPUTING SYSTEM - Updating code of a single processor in a multi-processor system includes halting transactions processed by a first processor in the system while processing of transactions by a second processor in the system is maintained. The first processor then receives new code and an operating system running on the first processor is terminated whereby all processes and threads being executed by the first processor are terminated. Execution of a self-reset of the first processor is commenced and interrupts associated with the first processor are disabled. Only those system resources exclusively associated with the first processor are reset, and memory transactions associated with the first processor are disabled. An image of the new code is copied into memory associated with the first processor, registers associated with the first processor are reset and the new code is booted by the first processor.01-01-2009
20130166879MULTIPROCESSOR SYSTEM AND SYNCHRONOUS ENGINE DEVICE THEREOF - The invention discloses a multiprocessor system and synchronous engine device thereof. The synchronous engine includes: a plurality of storage queues, wherein one of the queues stores all synchronization primitives from one of the processors; a plurality of scheduling modules, selecting the synchronization primitives for execution from the plurality of storage queues and then according to the type of the synchronization primitive transmitting the selected synchronization primitives to corresponding processing modules for processing, scheduling modules corresponding in a one-to-one relationship with the storage queues; a plurality of processing modules, receiving the transmitted synchronization primitives to execute different functions; a virtual synchronous memory structure module, using small memory space and mapping main memory spaces of all processors into a synchronization memory structure to realize the function of all synchronization primitives through a control logic; a main memory port, communicating with virtual synchronous memory structure module to read and write the main memory of all processors, and initiate an interrupt request to processors; a configuration register, storing various configuration information required by processing modules.06-27-2013
20080294875PARALLEL PROCESSOR FOR EFFICIENT PROCESSING OF MOBILE MULTIMEDIA - Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.11-27-2008
20120023309ACHIEVING ULTRA-HIGH AVAILABILITY USING A SINGLE CPU - Techniques for achieving high-availability using a single processor (CPU). In a system comprising a multi-core processor, at least two partitions may be configured with each partition being allocated one or more cores of the multiple cores. The partitions may be configured such that one partition operates in active mode while another partition operates in standby mode. In this manner, a single processor is able to provide active-standby functionality, thereby enhancing the availability of the system comprising the processor.01-26-2012
20100095089MULTIPROCESSOR SYSTEM WITH MULTIPORT MEMORY - A multiprocessor system includes first and second processors independently executing application functions associated with one or more applications using an open operating system (OS), a multiport memory, a first nonvolatile memory coupled to the first processor via a first bus, and a second nonvolatile memory coupled to the second processor via a second bus. The multiport memory includes a memory cell array divided into a plurality of memory banks including a shared memory bank commonly accessed by the first and second processors via respective first and second ports, and an internal register disposed outside the memory cell array and configured to control access authority to the shared memory bank by the first and second processors, wherein different application functions are independently executed in parallel by the first and second processors using the multiport memory as a data transfer mechanism.04-15-2010
20110283089 MODULARIZED MICRO PROCESSOR DESIGN - A method and system of modularized design for a microprocessor are disclosed. Embodiments disclose modularization techniques, whereby the overall design of the execution unit of the processor is split into different functional modules. The modules are configured to function independent of each other. The microprocessor comprises different components such as a cache logic (11-17-2011
20110283088DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD - A data processing apparatus includes a connecting unit that distributes the plurality of processing modules over the stages, and connects the plurality of processing modules such that a plurality of partial data are processed in parallel. The data processing apparatus detects, with respect to at least a part of the stages, a ratio of an amount of data for which processing in the subsequent stage has been executed, as a passage rate, acquires a processing time for a data amount to be processed in each stage, for which the passage rate was detected, based on the passage rate, and determines the number of processing modules distributed to each stage based on the data amount.11-17-2011
20110283087IMAGE FORMING APPARATUS, IMAGE FORMING METHOD, AND COMPUTER READABLE MEDIUM STORING CONTROL PROGRAM THEREFOR - A first processing unit is implemented by executing a first application program by using an internal computer in an environment where a first operating system is operating. The first processing unit performs a first process or an external service call in accordance with instruction information describing a process to be executed. A second processing unit is implemented by executing a second application program by using the internal computer or an additional computer connected to the internal computer in an environment where a second operating system is operating. The second processing unit performs a second process when instructed by an external service call to execute the second process. When the instruction information includes information specifying the second process as the process to be executed, a transfer unit updates the information included in the instruction information, and transfers the updated instruction information to the first processing unit.11-17-2011
20110283086STREAMING PHYSICS COLLISION DETECTION IN MULTITHREADED RENDERING SOFTWARE PIPELINE - A circuit arrangement, program product and method stream level of detail components between hardware threads in a multithreaded circuit arrangement to perform physics collision detection. Typically, a master hardware thread, e.g., a component loader hardware thread, is used to retrieve level of detail data for an object from a memory and stream the data to one or more slave hardware threads, e.g., collision detection hardware threads, to perform the actual collision detection. Because the slave hardware threads receive the level of detail data from the master thread, typically the slave hardware threads are not required to load the data from the memory, thereby reducing memory bandwidth requirements and accelerating performance.11-17-2011
20130091341PARALLEL COMPUTER ARCHITECTURE FOR COMPUTATION OF PARTICLE INTERACTIONS - A computation system for computing interactions in a multiple-body simulation includes an array of processing modules arranged into one or more serially interconnected processing groups of the processing modules. Each of the processing modules includes storage for data elements and includes circuitry for performing pairwise computations between data elements each associated with a spatial location. Each of the pairwise computations makes use of a data element from the storage of the processing module and a data element passing through the serially interconnected processing modules. Each of the processing modules includes circuitry for selecting the pairs of data elements according to separations between spatial locations associated with the data elements.04-11-2013
20100095090BARRIER SYNCHRONIZATION METHOD, DEVICE, AND MULTI-CORE PROCESSOR - A barrier synchronization device for realizing barrier synchronization of at least 2 processor cores belonging to a same synchronization group among a plurality of processor cores is included in a multi-core processor having a plurality of processor cores, and when two or more processor cores in that multi-core processor belong to the same synchronization group, the included barrier synchronization device is used for realizing barrier synchronization.04-15-2010
20090089542SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR PERFORMING A SCAN OPERATION - A system, method, and computer program product are provided for efficiently performing a scan operation. In use, an array of elements is traversed by utilizing a parallel processor architecture. Such parallel processor architecture includes a plurality of processors each capable of physically executing a predetermined number of threads in parallel. For efficiency purposes, the predetermined number of threads of at least one of the processors may be executed to perform a scan operation involving a number of the elements that is a function (e.g. multiple, etc.) of the predetermined number of threads.04-02-2009
20110296139Performing A Deterministic Reduction Operation In A Parallel Computer - Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.12-01-2011
20090193229HIGH-INTEGRITY COMPUTATION ARCHITECTURE WITH MULTIPLE SUPERVISED RESOURCES - The present invention relates to computers, the undetected errors of which have a very low rate of occurrence (approximately 10 07-30-2009
20110191568INFORMATION PROCESSING APPARATUS AND METHOD OF CONTROLLING THE SAME - An information processing apparatus is provided. The apparatus includes a communication unit configured to communicate with another apparatus, a main processing unit capable of controlling communication processing by the communication unit and other processing, a communication processing unit capable of controlling the communication processing by the communication unit and a deciding unit configured to decide, during communication by the communication unit and based on one of a transfer condition of communication by the communication unit and priority of data to be communicated, which one of the main processing unit and the communication processing unit should control the communication processing by the communication unit.08-04-2011
20100169607RECONFIGURABLE CIRCUIT, ITS DESIGN METHOD, AND DESIGN APPARATUS - A reconfigurable circuit design method includes an input step of inputting design data of a default configuration of a reconfigurable circuit including a plurality of processor elements which perform processing and a first generation step of generating design data obtained by modifying at least one of the processor elements in the reconfigurable circuit with the default configuration.07-01-2010
20090319756Duplexed operation processor control system, and duplexed operation processor control method - The present invention provides a duplexed operation processor control system that includes operation processors, an I/O device, and at least one communication path that couples the operation processors to the I/O device, and at least one communication path that couples the operation processors with each other. The duplexed operation processor control system switches over either of the operation processors to be a primary operation processor that executes a control operation for a control target, and the other to be a secondary operation processor that is in a stand-by state, and the secondary operation processor snoops control data synchronously when the primary operation processor acquires the control data from the control target.12-24-2009
20090125703Context Switching on a Network On Chip - Data processing on a network on chip (‘NOC’) that includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers, with each IP block adapted to a router through a memory communications controller and a network interface controller, with each IP block also adapted to the network by a low latency, high bandwidth application messaging interconnect comprising an inbox and an outbox, each IP block also including a stack normally used for context switching, the stack access slower than the outbox access, and each IP block further including a processor supporting a plurality of threads of execution, the processor configured to save, upon a context switch, a context of a current thread of execution in memory locations in a memory array in the outbox instead of the stack and lock the memory locations in which the context was saved.05-14-2009
20090177864MULTIPROCESSOR COMPUTING SYSTEMS WITH HETEROGENEOUS PROCESSORS - Heterogeneous processors can cooperate for distributed processing tasks in a multiprocessor computing system. Each processor is operable in a “compatible” mode, in which all processors within a family accept the same baseline command set and produce identical results upon executing any command in the baseline command set. The processors also have a “native” mode of operation in which the command set and/or results may differ in at least some respects from the baseline command set and results. Heterogeneous processors with a compatible mode defined by reference to the same baseline can be used cooperatively for distributed processing by configuring each processor to operate in the compatible mode.07-09-2009
20100125718PARALLEL ANALYSIS OF TIME SERIES DATA - In one aspect, a method of processing time-ordered multi-element data uses a set of computational nodes. In some examples, hundreds or thousands of nodes are used. A set of portions of the data are accepted, for example, from a MD simulation system. Each portion of the data is associated with a corresponding computational node in the plurality of computational nodes, and each portion representing a distinct range of time. Instructions for processing the data are accepted. These instructions include one or more instructions specifying a set of times, a set of elements, an analysis function, and an aggregation function. The accepted data is redistributed from within the portions at each computational node to multiple computational nodes in the plurality of computational nodes, such that data for any element of the specified set of elements is localized to a particular computational node. At each computational node, the trajectory analysis function is applied to the data for each node localized at that computational node. The results of applying the analysis function are aggregated at each of the computational nodes, including applying the aggregation function to the results.05-20-2010
20100122063INFORMATION PROCESSING APPARATUS AND METHOD - A read-only memory (ROM) includes storage areas used as a processing setting data storage unit, a successful detection rate storage unit, and a processing time storage unit. A central processing unit (CPU) can function as a calculation unit by executing a calculation program stored on the ROM. The successful detection rate storage unit stores a predetermined successful detection rate (the probability of executing subsequent processing based on a result of a current processing). The processing time storage unit stores a predetermined processing time of each processing. The calculation unit calculates a module configuration for executing each processing according to the successful detection rate stored on the successful detection rate storage unit and the processing time stored on the processing time storage unit. The processing setting data storage unit stores setting data of a characteristic amount and a setting data of positional information about image data (the address of the image data).05-13-2010
20110197048DYNAMIC RECONFIGURABLE HETEROGENEOUS PROCESSOR ARCHITECTURE WITH LOAD BALANCING AND DYNAMIC ALLOCATION METHOD THEREOF - A dynamic reconfigurable heterogeneous processor architecture with load balancing and dynamic allocation method thereof is disclosed. The present invention uses a work control logic unit to detect load imbalance between different types of processors, and employs a number of dynamically reconfigurable heterogeneous processors to offload the heavier loaded processors. Hardware utilization of such design can be enhanced, and variation in computation needs among different computation phases can be better handled. To design the dynamic reconfigurable heterogeneous processors, a method of how to choose the basic building blocks and place the routing components is included. With the present invention, performance can be maximized at a minimal hardware cost. Hence the dynamic reconfigurable heterogeneous processor(s) so constructed and the load balancing and dynamic allocation method together will have the best performance at least cost.08-11-2011
20090013154MULTILAYER DISTRIBUTED PROCESSING SYSTEM - The independencies of a plurality of layers executing dividingly a transaction can be easily enhanced. A node (01-08-2009
20100082942VIRTUALIZATION ACROSS PHYSICAL PARTITIONS OF A MULTI-CORE PROCESSOR (MCP) - Among other things, the disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling/main processing elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The apparatus includes an MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements. In accordance with these features, virtualized control threads can traverse the physical boundaries of the MCP to control SPE(s) (e.g., logical partitions having one or more SPEs) in a different physical partition (e.g., different from the physical partition from which the virtualized control threads originated).04-01-2010
20100082941DELEGATED VIRTUALIZATION IN A MULTI-CORE PROCESSOR (MCP) - The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs using program code embodied as a set of virtualized control threads. The arrangement also enables MPEs to delegate functionality to one or more groups of SPEs such that those group(s) of SPEs will act as pseudo MPEs. The pseudo MPEs will utilize pseudo virtualized control threads to control the behavior of other groups of SPEs. In a typical embodiment, the apparatus includes an MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.04-01-2010
20090089543INTEGRATED CIRCUIT PERFORMANCE IMPROVEMENT ACROSS A RANGE OF OPERATING CONDITIONS AND PHYSICAL CONSTRAINTS - Methods and apparatus to improve integrated circuit (IC) performance across a range of operating conditions and/or physical constraints are described. In one embodiment, an operating parameter of one or more of processor cores may be adjusted in response to a change in the activity level of processor cores (e.g., the number of active processor cores) and/or a comparison of one or more operating conditions and one or more corresponding threshold values. Other embodiments are also described.04-02-2009
20090187735MICROCONTROLLER HAVING DUAL-CORE ARCHITECTURE - A microcontroller having dual-core architecture is provided. Using a unique hardware configuration of memories, control registers and reset machines, the invention not only reduces hardware cost, but also improves management efficiency and system stability.07-23-2009
20090089544Infrastructure for parallel programming of clusters of machines - GridBatch provides an infrastructure framework that hides from programmers the complexities and burdens of developing the logic and programming applications that implement detailed parallelized computations. A programmer may use GridBatch to implement parallelized computational operations that minimize network bandwidth requirements, and efficiently partition and coordinate computational processing in a multiprocessor configuration. GridBatch provides an effective and lightweight approach to rapidly build parallelized applications using economically viable multiprocessor configurations that achieve the highest performance results.04-02-2009
20080209168Information Processing Apparatus, Process Control Method, and Computer Program - A method and apparatus for improving data processing efficiency with an improved context storage mechanism are provided. In an arrangement where data processing is performed with a plurality of logical processors allocated to a physical processor in a time-sharing manner, a context table of a logical processor with the physical processor unapplied thereto is mapped to a logical partition address space of a logical partition to which the logical processor is applied. The context table is then stored. When the logical processor is not allocated to the physical processor, the content of the logical processor can be acquired. Processes such as accessing the logical processor and program loading are executed without the need to wait for the logical processor to be allocated to the physical processor. Data processing efficiency is thus improved.08-28-2008
20090282216HARDWARE ENGINE CONTROL APPARATUS - A hardware engine control apparatus includes: a plurality of hardware engines (HWEs) connected by a control bus, each of the hardware engines executing a series of different kinds of processing; a host control device that outputs control commands for controlling operation of the HWEs to a subordinate control device; and the subordinate control device that has a register, in which the control commands from the host control device are sequentially set, and outputs the control commands set in the register to the control bus at timing based on a clock signal. The HWEs operate according to the control commands output from the subordinate control device.11-12-2009
20100146242Data processing apparatus and method of controlling the data processing apparatus - Provided are a data processing apparatus and a method of controlling the data processing apparatus. The data processing apparatus may select a single stream processor from a plurality of stream processors based on stream processor status information, and input data into the selected stream processor. The stream processor status information may include first status information of a processor core and second status information of at least one internal memory.06-10-2010
20100138634DEVICES, SYSTEMS, AND METHODS TO SYNCHRONIZE PARALLEL PROCESSING OF A SINGLE DATA STREAM - Disclosed are methods and devices, among which is a system that includes one or more pattern-recognition processors, such as in a pattern-recognition cluster. The pattern-recognition processors may be activated to perform a search of a data stream individually using a chip select or in parallel using a universal select signal. In this manner, the plurality of pattern-recognition processors may be enabled concurrently for synchronized processing of the data stream.06-03-2010
20080244227DESIGN STRUCTURE FOR ASYMMETRICAL PERFORMANCE MULTI-PROCESSORS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design, for allocating processing functions between a primary processor and a secondary processor is disclosed. A primary processor is provided that performs routine processing duties, including execution of application program code, while the secondary processor is in a sleep state. When the load on the primary processor is deemed to be excessive, the secondary processor is awakened from a sleep state and assigned to perform processing functions that would otherwise need to be performed by the primary processor. If temperatures in the system rise above a threshold, the secondary processor is returned to the sleep state.10-02-2008
20080244226Thread migration control based on prediction of migration overhead - A processing system features a first processing core to operate in a first node, a second processing core to operate in a second node, and random access memory (RAM) responsive to the first and second processing cores. The processing system also features control logic to perform operations such as (a) automatically updating a resident set size (RSS) counter to correspond to the RSS for the thread on the first node in response to allocation of a page frame for a thread in the first node, and (b) using the RSS counter to predict migration overhead when determining whether the thread should be migrated from the first processing core to the second processing core. Other embodiments are described and claimed.10-02-2008
20090063817System and Method for Packet Coalescing in Virtual Channels of a Data Processing System in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for packet coalescing in virtual channels of a data processing system. A first processor bundles original data to be transmitted to a destination processor, the original data provided by a first source processor. The first processor transmits the bundle of data to a second processor along a path to the destination processor. The second processor determines if the second processor has additional data destined for the same destination processor, the additional data being provided by a second source processor that is different from the first source processor. Responsive to the second processor having additional data, the second processor unbundles the original data, adds the additional data to the original data, and rebundles the data along with the additional data. Then the second processor transmits the rebundled data to at least one other processor along the path to the destination processor.03-05-2009
20080270752PROCESS ASSIGNMENT TO PHYSICAL PROCESSORS USING MINIMUM AND MAXIMUM PROCESSOR SHARES - A system and method is provided for assigning a plurality of executable processes to a plurality of physical processors in a multi-processor computer system using a minimum processor share and a maximum processor share defined for each executable process. In an embodiment, the method can include allocating shares of total processor time to each executable process in proportion to the minimum processor shares up to the maximum processor shares to form target share allocations. The target share allocations can be used to map processes to the physical processors.10-30-2008
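The share allocation summarized in entry 20080270752 can be illustrated with a minimal sketch. It assumes a simple "water-filling" interpretation: processor time is split in proportion to each process's minimum share, any process that would exceed its maximum share is capped, and the surplus is redistributed among the rest. The function name `target_shares` and its parameters are hypothetical and do not come from the application itself.

```python
def target_shares(total, min_shares, max_shares):
    """Split `total` processor time in proportion to `min_shares`,
    never giving a process more than its entry in `max_shares`.
    (Hypothetical helper; one possible reading of the abstract.)"""
    alloc = [0.0] * len(min_shares)
    active = set(range(len(min_shares)))   # processes not yet capped
    remaining = float(total)               # processor time still to hand out
    while active and remaining > 0:
        weight = sum(min_shares[i] for i in active)
        if weight == 0:
            break
        # Proportional proposal for every process that is still uncapped.
        proposal = {i: remaining * min_shares[i] / weight for i in active}
        capped = [i for i in active if proposal[i] >= max_shares[i]]
        if not capped:
            # Nobody hits a maximum: the proposal is the final allocation.
            for i in active:
                alloc[i] = proposal[i]
            break
        # Cap the offenders and redistribute what is left in the next round.
        for i in capped:
            alloc[i] = max_shares[i]
            remaining -= max_shares[i]
            active.discard(i)
    return alloc


if __name__ == "__main__":
    # Three processes sharing four processors' worth of time.
    print(target_shares(4.0, [1, 1, 2], [0.5, 2.0, 4.0]))
```

The resulting target allocations could then feed whatever placement step maps processes to physical processors; that mapping is not sketched here.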
20080270753IMAGE PROCESSING APPARATUS AND METHOD THEREOF - In a case that a precedent queue to supply data to be processed does not include data to be processed, a processor D switches its operation mode to an auxiliary mode to perform a part of processing assigned to a processor A, and issues a request for execution of the part of the processing assigned to the processor A, to the processor A. In response to the request, the processor A notifies the processor D of information to cause the processor D to perform the part of the processing assigned to the processor A, and the processor D performs the part of the processing assigned to the processor A in accordance with the notified information.10-30-2008
20090138676DESIGN STRUCTURES INCLUDING CIRCUITS FOR NOISE REDUCTION IN DIGITAL SYSTEMS - A design structure including a digital system. The digital system includes (a) a first logic circuit and a second logic circuit, (b) a first register, (c) a second register, (d) a third register, (e) a clock generator circuit, and (f) a controller circuit. The first logic circuit is capable of obtaining first data and sending second data. The second logic circuit is capable of obtaining the second data and sending third data. The clock generator circuit is capable of asserting (i) a first register clock signal at a first time point, (ii) a second register clock signal at a second time point, and (iii) a third register clock signal at a third time point. The controller circuit is capable of (i) determining a fourth time point, (ii) determining a fifth time point, (iii) controlling the clock generator circuit to assert the second register clock signal.05-28-2009
20090177863HIERARCHICAL MANAGEMENT OF REALTIME EDGE PROCESSOR - A hierarchical network infrastructure includes an interface that allows a user to define a management hierarchy between a plurality of edge processors. Input is received via the interface designating a management node and a first set of relationships between the management node and at least one edge processor. A management hierarchy between the management node and the at least one edge processor is generated based on the first set of relationships. Using the management hierarchy, telemetry information can be relayed, hosts and the software running on them can be managed, and information can be configured to trickle up the chain. Each sub tree of the management hierarchy may have different Access Control for local administrators.07-09-2009
20110145546DEFERRED PAGE CLEARING IN A MULTIPROCESSOR COMPUTER SYSTEM - Processing within a multiprocessor computer system is facilitated by: logically clearing a data page by setting, in association with invalidate page table entry or set storage key processing, a page initialize bit for the data page to a clear data value without physically clearing data from the data page; and subsequent to the setting of the page initialize bit, physically clearing data from the page in central storage responsive to a first access to the page with the page initialize bit set to the clear data value, thereby minimizing overall time required to both clear and subsequently access cleared page data. Setting of the page initialize bit may include setting a line clear bit for each page line to the clear data value, and allocating a state machine to clear each line responsive to the line being first accessed with its line clear bit set.06-16-2011
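The deferred clearing idea in entry 20110145546 can be mimicked in software as a lazy-clear flag. The sketch below is only an analogy under stated assumptions: the `Page` class, the `needs_clear` flag (standing in for the page initialize bit), and the method names are all hypothetical, and the per-line clear bits and state machine from the abstract are not modeled.

```python
class Page:
    """Toy page whose clearing is deferred until the first access.
    (Hypothetical software analogy, not the hardware mechanism.)"""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.needs_clear = False   # stands in for the page initialize bit

    def logical_clear(self):
        # Cheap: only mark the page; the old bytes stay in place for now.
        self.needs_clear = True

    def _touch(self):
        # On first access after a logical clear, do the physical clear.
        if self.needs_clear:
            self.data[:] = bytes(len(self.data))
            self.needs_clear = False

    def read(self, offset):
        self._touch()
        return self.data[offset]

    def write(self, offset, value):
        self._touch()
        self.data[offset] = value


if __name__ == "__main__":
    page = Page(16)
    page.write(3, 7)
    page.logical_clear()   # returns immediately, nothing is zeroed yet
    print(page.read(3))    # the first access triggers the physical clear
```

The design point is that a page that is cleared but never touched again costs nothing beyond setting the flag.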
20120144157Allocation of Mainframe Computing Resources Using Distributed Computing - There is disclosed a system and method for allocation of mainframe computing resources using distributed computing. In particular, the present application is directed to a system whereby a mainframe process intended for execution on a metered processor may be identified as executable on a non-metered processor. Thereafter, the mainframe computer may initiate execution of the remote process on the remote non-metered processor. If necessary, high-speed access to data available to the metered processor is provided to the non-metered processor. The process operates directly on data available to the metered processor. Once completed, the process signals the mainframe computer that the process is complete. Both metered and non-metered processor configuration and management may be accomplished using the administrative interface.06-07-2012
20120079235APPLICATION SCHEDULING IN HETEROGENEOUS MULTIPROCESSOR COMPUTING PLATFORMS - Methods and apparatus to schedule applications in heterogeneous multiprocessor computing platforms are described. In one embodiment, information regarding performance (e.g., execution performance and/or power consumption performance) of a plurality of processor cores of a processor is stored (and tracked) in counters and/or tables. Logic in the processor determines which processor core should execute an application based on the stored information. Other embodiments are also claimed and disclosed.03-29-2012
20090083518Attaching and virtualizing reconfigurable logic units to a processor - In one embodiment, the present invention includes a pipeline to execute instructions out-of-order, where the pipeline has front-end stages, execution units, and back-end stages, and the execution units are coupled between dispatch ports of the front-end stages and writeback ports of the back-end stages. Further, a reconfigurable logic is coupled between one of the dispatch ports and one of the writeback ports. Other embodiments are described and claimed.03-26-2009
20090083517Lockless Processing of Command Operations in Multiprocessor Systems - A beltway mechanism that takes advantage of atomic locking mechanisms supported by certain classes of hardware processors to handle the tasks that require atomic access to data structures while also reducing the overhead associated with these atomic locking mechanisms. The beltway mechanisms described herein can be used to control access to software and hardware facilities in an efficient manner.03-26-2009
20080263320Executing a Scatter Operation on a Parallel Computer - Executing a scatter operation on a parallel computer includes: configuring a send buffer on a logical root, the send buffer having positions, each position corresponding to a ranked node in an operational group of compute nodes and for storing contents scattered to that ranked node; and repeatedly for each position in the send buffer: broadcasting, by the logical root to each of the other compute nodes on a global combining network, the contents of the current position of the send buffer using a bitwise OR operation, determining, by each compute node, whether the current position in the send buffer corresponds with the rank of that compute node, if the current position corresponds with the rank, receiving the contents and storing the contents in a reception buffer of that compute node, and if the current position does not correspond with the rank, discarding the contents.10-23-2008
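The scatter procedure in entry 20080263320 can be simulated in a few lines. This is a single-process sketch, assuming that the bitwise OR broadcast works by having the logical root contribute the real contents while every other node contributes zero, so that the OR across all nodes equals the root's value. The function name `scatter` and the use of plain integers as buffer contents are illustrative assumptions.

```python
from functools import reduce
from operator import or_


def scatter(send_buffer, root=0):
    """Simulate scatter-by-broadcast: position i of the root's send
    buffer ends up in the reception buffer of the node with rank i.
    (Hypothetical simulation; contents are non-negative integers.)"""
    num_nodes = len(send_buffer)
    reception = [None] * num_nodes
    for position, contents in enumerate(send_buffer):
        # The root contributes the contents, all other nodes contribute 0,
        # so OR-combining the contributions reproduces the root's value.
        contributions = [contents if rank == root else 0
                         for rank in range(num_nodes)]
        broadcast_value = reduce(or_, contributions)
        for rank in range(num_nodes):
            if rank == position:
                reception[rank] = broadcast_value   # keep own slot
            # every other rank discards the broadcast value
    return reception


if __name__ == "__main__":
    print(scatter([0b101, 0b011, 0b110]))
```

Each node sees every broadcast but stores only the one whose position matches its rank, which is the filtering step the abstract describes.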
20090063816System and Method for Performing Collective Operations Using Software Setup and Partial Software Execution at Leaf Nodes in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.03-05-2009
20090100248Hierarchical System, and its Management Method and Program - A lower system structure reports performance information to an upper system structure. When detecting performance deterioration of the system structure on the basis of the reported performance information, the upper system structure optimizes resource redistribution of the system structure that the upper system structure manages. If the performance is improved by the optimization in the managed system structure, the optimization result is applied to the resource control of the lower system structure, and the lower system structure redistributes the resources according to the resource control. If the performance is not improved by the optimization, the lower system structure reports the performance information to the upper system structure, which optimizes the resource redistribution.04-16-2009
20110145545COMPUTER-IMPLEMENTED METHOD OF PROCESSING RESOURCE MANAGEMENT - A computer-implemented method for managing processing resources of a computerized system having at least a first processor and a second processor, each of the processors operatively interconnected to a memory storing a set of data to be processed by a processor, the method comprising: monitoring data accessed by the first processor while executing; and if the second processor is at a shorter distance than the first processor from the monitored data, instructing to interrupt execution at the first processor and resume the execution at the second processor.06-16-2011
20090144524Method and System for Handling Transaction Buffer Overflow In A Multiprocessor System - There is disclosed a method and apparatus for handling transaction buffer overflow in a multi-processor system as well as a transaction memory system in a multi-processor system. The method comprises the steps of: when overflow occurs in a transaction buffer of one processor, disabling peer processors from entering transactions, and waiting for any processor having a current transaction to complete its current transaction; re-executing the transaction resulting in the transaction buffer overflow without using the transaction buffer; and when the transaction execution is completed, enabling the peer processors for entering transactions.06-04-2009
20100180101Method for Executing One or More Programs on a Multi-Core Processor and Many-Core Processor - The invention relates to a method for executing computer usable program code or a program made up of program parts on a multi-core processor (07-15-2010
20130219148NETWORK ON CHIP PROCESSOR WITH MULTIPLE CORES AND ROUTING METHOD THEREOF - An exemplary embodiment of the present disclosure illustrates a network on chip processor including multiple cores and a Kautz NoC. Each of the cores is assigned an addressing string of L base-D words, and the addressing string does not have two neighboring identical words, wherein L, representing the addressing string length, is an integer larger than 1, and D, representing the word selection, is an integer larger than 2. Each of the cores is unidirectionally linked to (D−1) other cores through the Kautz NoC, and for any two connected cores, the last (L−1) words of the addressing string of one core are the same as the first (L−1) words of the addressing string of the other core.08-22-2013
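The addressing rule in entry 20130219148 matches the standard construction of a Kautz graph, which can be sketched as follows. Words are represented as integers 0 to D−1 and addressing strings as tuples; the function names `kautz_addresses` and `successors` are hypothetical and are not taken from the application.

```python
from itertools import product


def kautz_addresses(length, symbols):
    """All strings of `length` words over `symbols` distinct words
    in which no two neighboring words are identical."""
    return [word for word in product(range(symbols), repeat=length)
            if all(a != b for a, b in zip(word, word[1:]))]


def successors(address, symbols):
    """Cores reachable over one unidirectional link: drop the first
    word and append any word that differs from the current last word."""
    return [address[1:] + (s,) for s in range(symbols) if s != address[-1]]


if __name__ == "__main__":
    cores = kautz_addresses(3, 3)      # L = 3, D = 3
    print(len(cores))                  # D * (D - 1) ** (L - 1) cores
    print(successors((0, 1, 2), 3))    # D - 1 outgoing links
```

Because a successor keeps the last L−1 words of its predecessor as its own first L−1 words, each shift brings one more word of a destination address into place, which is what makes simple shift-based routing on such a topology possible.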
20100191933APPARATUS FOR PROCESSING DATA AND METHOD FOR GENERATING MANIPULATED AND RE-MANIPULATED CONFIGURATION DATA FOR PROCESSOR - Some embodiments comprise an apparatus for processing data, the apparatus having a second configurable processor configured to process data using second configuration data, and a configuration data re-manipulator configured to retrieve manipulated second configuration data and first data of a first processor, to re-manipulate the manipulated second configuration data depending on the first data, and to feed the re-manipulated second configuration data to the second configurable processor as the second configuration data.07-29-2010
20100250899DISTRIBUTED PROCESSING SYSTEM - A distributed processing system includes a plurality of processing elements each having one or more inputs and one or more outputs, and a control unit to which the plurality of processing elements are connected, wherein based on a service execution request from a client, the control unit creates execution transition information in which the processing elements that are necessary to execute a specific service and an order of execution are specified.09-30-2010
20100241829HARDWARE SWITCH AND DISTRIBUTED PROCESSING SYSTEM - A hardware switch to which a plurality of processing elements are connected, wherein for sending side processing elements and receiving side processing elements different from the sending side processing elements selected from among the plurality of processing elements, the hardware switch interconnects one output selected from outputs that the sending side processing elements have and one input selected from inputs that the receiving side processing elements have, thereby selectively switching paths between the plurality of processing elements, and at least one of the number of outputs of the sending side processing element connected to the hardware switch and the number of inputs of the receiving side processing elements connected to the hardware switch is more than one.09-23-2010
20100235610PROCESSING SYSTEM, PROCESSING APPARATUS AND COMPUTER READABLE MEDIUM - A processing apparatus includes: an operation detection unit that detects an operation; a request unit that requests other processing apparatuses to transmit functions when the operation is detected by the operation detection unit; a receiving unit that receives replies in response to the requests of the request unit from the at least one of the other processing apparatuses; a selection unit that selects at least one of the other processing apparatuses from which the receiving unit has received the replies; and a communication unit that performs communication with the at least one of the other processing apparatuses selected by the selection unit.09-16-2010
20100241827High Level Programming Extensions For Distributed Data Parallel Processing - General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language are provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, thereby enabling developers to write sequential language programs using known constructs while providing the ability to invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.09-23-2010
20100241828General Distributed Reduction For Data Parallel Computing - General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.09-23-2010
20100199069SCHEDULER OF RECONFIGURABLE ARRAY, METHOD OF SCHEDULING COMMANDS, AND COMPUTING APPARATUS - A scheduler of a reconfigurable array, a method of scheduling commands, and a computing apparatus are provided. To perform a loop operation in a reconfigurable array, a recurrence node, a producer node, and a predecessor node are detected from a data flow graph of the loop operation such that resources are assigned to such nodes so as to increase the loop operating speed. Also, a dedicated path having a fixed delay may be added to the assigned resources.08-05-2010
20110113219Computer Architecture for a Mobile Communication Platform - A system includes first and second processors, first and second graphics processing units (GPUs), one or more peripheral devices, a switch matrix, and processor-readable memory. The switch matrix comprises programmable data paths between the processors, the GPUs, and the peripheral devices. Software encoded in the processor-readable memory includes a first operating system (OS) executed by the first processor, a second OS executed by the second processor, a matrix scheduling engine, and a media interface switch (MIS) engine. The first OS boots faster than the second OS. The matrix scheduling engine runs on both OSs and configures the data paths in the switch matrix to couple the processors and the GPUs, and to couple the processors and the peripheral devices. The MIS engine runs on the operating systems, detects presence of the peripheral devices, and configures the data paths in the switch matrix to couple the processors and the peripheral devices.05-12-2011
20100268915Remote Update Programming Idiom Accelerator with Allocated Processor Resources - A data processing system comprises at least one processing unit, a virtualization layer, and a remote update programming idiom accelerator. The remote update programming idiom accelerator is configured to receive a complex remote update programming idiom from a remote node. Responsive to a determination that the sequence of instructions in the complex remote update programming idiom is longer than a dedicated processor threshold, the remote update programming idiom accelerator is configured to request a processing unit from the virtualization layer in the data processing system, and receive an allocation of a processing unit from the virtualization layer. The allocated processing unit is configured to read the data from the storage location local to the data processing system, execute the sequence of instructions to perform the update operation on the data to form result data, and write the result data to the storage location local to the data processing system.10-21-2010
20090210656METHOD AND SYSTEM FOR OVERLAPPING EXECUTION OF INSTRUCTIONS THROUGH NON-UNIFORM EXECUTION PIPELINES IN AN IN-ORDER PROCESSOR - A system and method for overlapping execution (OE) of instructions through non-uniform execution pipelines in an in-order processor are provided. The system includes a first execution unit to perform instruction execution in a first execution pipeline. The system also includes a second execution unit to perform instruction execution in a second execution pipeline, where the second execution pipeline includes a greater number of stages than the first execution pipeline. The system further includes an instruction dispatch unit (IDU), the IDU including OE registers and logic for dispatching an OE-capable instruction to the first execution unit such that the instruction completes execution prior to completing execution of a previously dispatched instruction to the second execution unit. The system additionally includes a latch to hold a result of the execution of the OE-capable instruction until after the second execution unit completes the execution of the previously dispatched instruction.08-20-2009
20080270754USING FIELD PROGRAMMABLE GATE ARRAY (FPGA) TECHNOLOGY WITH A MICROPROCESSOR FOR RECONFIGURABLE, INSTRUCTION LEVEL HARDWARE ACCELERATION - A method for dynamically programming Field Programmable Gate Arrays (FPGAs) in a coprocessor, the coprocessor coupled to a processor, includes: beginning an execution of an application by the processor; receiving an instruction from the processor to the coprocessor to perform a function for the application; determining that the FPGA in the coprocessor is not programmed with logic for the function; fetching a configuration bit stream for the function; and programming the FPGA with the configuration bit stream. In this manner, the FPGAs are programmable “on the fly”, i.e., dynamically during the execution of an application. The hardware acceleration and resource sharing advantages provided by the FPGAs can be utilized more often by the application. Logic flexibility and space savings on the chip comprising the coprocessor and processor are provided as well.10-30-2008
20110238951IMAGE FORMING APPARATUS, IMAGE FORMING SYSTEM, AND INFORMATION GENERATING METHOD - An image forming apparatus includes: 09-29-2011
20090063815System and Method for Providing Full Hardware Support of Collective Operations in a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for performing collective operations. In hardware of a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In hardware of the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.03-05-2009
20110113221Data Sharing in Chip Multi-Processor Systems - System, computer readable medium and method for providing transparent access to shared data (05-12-2011
20100325390IMAGE PROCESSING APPARATUS, PROCESSING UNIT, AND IP ADDRESS MANAGING METHOD - An image processing apparatus includes connectors to each of which position information is allocated, processing units configured to be connected to the connectors, each of the processing units is configured to read position information, and to output an IP address of the processing unit determined based on the position information and identification information which denotes a function of the processing unit via the connector, and a control unit configured to be connected with the connectors in compliance with a standard for a transmission line in an IP (internet protocol) network, and to manage the IP address and the identification information of the processing unit.12-23-2010
20130138920METHOD AND APPARATUS FOR PACKET PROCESSING AND A PREPROCESSOR - An apparatus for packet processing is provided. The apparatus is to be implemented in a server and includes: a preprocessor and at least two processors which are respectively connected with the preprocessor. The preprocessor is to classify packets received externally from the server, and to distribute the classified packets to the respective processors, wherein packets in a same flow are distributed to a same processor. Each of the processors is to receive and process a packet distributed by the preprocessor.05-30-2013
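The requirement that packets in the same flow reach the same processor can be sketched as follows. The abstract does not say how the preprocessor classifies packets; hashing a 5-tuple with CRC32 is an assumption chosen here for illustration, and the packet field names are hypothetical.

```python
import zlib


def flow_key(packet):
    # Assumption: a flow is identified by the usual 5-tuple.
    return (packet["src"], packet["dst"], packet["sport"],
            packet["dport"], packet["proto"])


def pick_processor(packet, num_processors):
    # A deterministic hash means every packet of a flow maps to one processor.
    key = repr(flow_key(packet)).encode("utf-8")
    return zlib.crc32(key) % num_processors


def distribute(packets, num_processors):
    # One queue per processor; the preprocessor appends each packet to
    # the queue selected by its flow hash.
    queues = [[] for _ in range(num_processors)]
    for packet in packets:
        queues[pick_processor(packet, num_processors)].append(packet)
    return queues


flow_a = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80, "proto": "tcp"}
flow_b = {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 5555, "dport": 443, "proto": "tcp"}
packets = [dict(flow_a, seq=0), dict(flow_b, seq=0), dict(flow_a, seq=1)]
queues = distribute(packets, 4)
```

Because the key ignores per-packet fields such as `seq`, both packets of `flow_a` land in the same queue regardless of arrival order.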
20110113220MULTIPROCESSOR - Provided is a multiprocessor capable of executing a plurality of threads without decreasing execution efficiency.05-12-2011
20100332796Method and System for a CPU-Local Storage Mechanism - Described herein are systems and methods for implementing a processor-local (e.g., a CPU-local) storage mechanism. An exemplary system includes a plurality of processors executing an operating system, the operating system including a processor local storage mechanism, wherein each processor accesses data unique to the processor based on the processor local storage mechanism. Each of the plurality of processors of the system may have controlled access to the resource and each of the processors is dedicated to one of a plurality of tasks of an application. The application including the plurality of tasks may be replicated using the processor local storage mechanism, wherein each of the tasks of the replicated application includes an affinity to one of the plurality of processors.12-30-2010
20090327654Method of Handling Duplicate or Invalid Node Controller IDs in a Distributed Service Processor Environment - A method for enabling a Node Controller (NC), which claims a duplicate or invalid service processor Node Controller Identification (NCID) in a distributed service processor system, to be integrated into the system includes reading an NCID by the NC after the NC is booted, saving the NCID into a non-volatile storage and broadcasting an NC Present Message (NPM) to a Service Processor (SC) repeatedly until the SC initiates communication, updating the NCID for the NC in the non-volatile storage when the NC receives an NCID change message from the SC and rating any future NPM as a new NCID, and checking a record of a new NC in the non-volatile storage when the SC receives the NPM from the NC. If the SC has a record of a recorded NC with the same NCID as the new NC, then the SC checks its role as a primary SC. If the SC does not have the record of the recorded NC with the same NCID as the new NC, then the SC checks validity of the NCID.12-31-2009
20100100706MULTIPLE PROCESSOR SYSTEM, SYSTEM STRUCTURING METHOD IN MULTIPLE PROCESSOR SYSTEM AND PROGRAM THEREOF - For flexibly setting up an execution environment according to contents of processing to be executed while taking stability or a security level into consideration, the multiple processor system includes the execution environment main control unit 04-22-2010
20100131740DATA PROCESSING SYSTEM AND DATA PROCESSING METHOD - The workload is heavy in the development of an application program that controls the task distribution in consideration of the variety of the execution environment. In a system where the processing is distributed to SPUs serving as plural processing entities so as to execute the computer program, the data processing is broken into plural units of processing by referring to the script code in which the content of the data processing is written, and the units of processing are assigned to the plural SPUs. Then, the whole computer program is executed when the SPUs execute the assigned process.05-27-2010
20100070740System and Method for Dynamic Dependence-Based Parallel Execution of Computer Software - A method of dynamic parallelization in a multi-processor identifies potentially independent computational operations, such as functions and methods, with a serializer that assigns a computational operation to a serialization set and a processor based on assessment of the data that the computational operation will be accessing upon execution.03-18-2010
20090216997DYNAMICALLY MANAGING THE COMMUNICATION-PARALLELISM TRADE-OFF IN CLUSTERED PROCESSORS - In a processor having multiple clusters which operate in parallel, the number of clusters in use can be varied dynamically. At the start of each program phase, the configuration option for an interval is run to determine the optimal configuration, which is used until the next phase change is detected. The optimum instruction interval is determined by starting with a minimum interval and doubling it until a low stability factor is reached.08-27-2009
20100058029Invoking Multi-Library Applications on a Multiple Processor System - A mechanism is provided for invoking a multi-library application on a multiple processor system, wherein the multiple processor system comprises a Power Processing Element (PPE) and a plurality of Synergistic Processing Elements (SPEs). Applications including multi-libraries run in the memory of the PPE. The mechanism comprises maintaining the status of each SPE in the application running on the PPE, where there are SPE agents for capturing the instructions from the PPE in the SPEs that have been started. In response to a request for invoking a library, the PPE determines whether the number of available SPEs for invoking the library is adequate based on the current status of SPEs. If the number of available SPEs is adequate, the PPE sends a run instruction to selected SPEs. After finishing the invocation of all libraries, the PPE sends termination instructions to all started SPEs.03-04-2010
20100250898PROCESSING ELEMENT AND DISTRIBUTED PROCESSING UNIT - A general-purpose processing element has a program holding portion that can hold a program by which a specific function is implemented in the general-purpose processing element. A distributed processing system according to the invention includes a control unit, a plurality of processing elements connected to the control unit, and a client, wherein the plurality of processing elements include the above-described processing element.09-30-2010
20120303933 TILE-BASED PROCESSOR ARCHITECTURE MODEL FOR HIGH-EFFICIENCY EMBEDDED HOMOGENEOUS MULTICORE PLATFORMS - The present invention relates to a processor which comprises processing elements that execute instructions in parallel and are connected together with point-to-point communication links called data communication links (DCL). The instructions use DCLs to communicate data between them. In order to realize those communications, they specify the DCLs from which they take their operands, and the DCLs to which they write their results. The DCLs allow the instructions to synchronize their executions and to explicitly manage the data they manipulate. Communications are explicit and are used to realize the storage of temporary variables, which is decoupled from the storage of long-living variables.11-29-2012
20120303932RUNTIME RECONFIGURABLE DATAFLOW PROCESSOR - A processor includes a plurality of processing tiles, wherein each tile is configured at runtime to perform a configurable operation. A first subset of tiles are configured to perform in a pipeline a first plurality of configurable operations in parallel. A second subset of tiles are configured to perform a second plurality of configurable operations in parallel with the first plurality of configurable operations. The processor also includes a multi-port memory access module operably connected to the plurality of tiles via a data bus configured to control access to a memory and to provide data to two or more processing tiles simultaneously. The processor also includes a controller operably connected to the plurality of tiles and the multi-port memory access module via a runtime bus. The processor configures the tiles and the multi-port memory access module to execute a computation.11-29-2012
20110078412Processor Core Stacking for Efficient Collaboration - A mechanism is provided for improving the performance and efficiency of multi-core processors. A system controller in a data processing system determines an operational function for each primary processor core in a set of primary processor cores in a primary processor core logic layer and for each secondary processor core in a set of secondary processor cores in a secondary processor core logic layer, thereby forming a set of determined operational functions. The system controller then generates an initial configuration, based on the set of determined operational functions, for initializing the set of primary processor cores and the set of secondary processor cores in the three-dimensional processor core architecture. The initial configuration indicates how at least one primary processor core of the set of primary processor cores collaborates with at least one secondary processor core of the set of secondary processor cores.03-31-2011
20110078411DYNAMICALLY MODIFYING PROGRAM EXECUTION CAPACITY - Techniques are described for managing program execution capacity, such as for a group of computing nodes that are provided for executing one or more programs for a user. In some situations, dynamic program execution capacity modifications for a computing node group that is in use may be performed periodically or otherwise in a recurrent manner, such as to aggregate multiple modifications that are requested or otherwise determined to be made during a period of time, and with the aggregation of multiple determined modifications being able to be performed in various manners. Modifications may be requested or otherwise determined in various manners, including based on dynamic instructions specified by the user, and on satisfaction of triggers that are previously defined by the user. In some situations, the techniques are used in conjunction with a fee-based program execution service that executes multiple programs on behalf of multiple users of the service.03-31-2011
20110078410EFFICIENT PIPELINING OF RDMA FOR COMMUNICATIONS - Disclosed are a method of and system for multiple party communications in a processing system including multiple processing subsystems. Each of the processing subsystems includes a central processing unit and one or more network adapters for connecting said each processing subsystem to the other processing subsystems. A multitude of nodes are established or created, and each of these nodes is associated with one of the processing subsystems. A first aspect of the invention involves pipelined communication using RDMA among three nodes, where the first node breaks up a large communication into multiple parts and sends these parts one after the other to the second node using RDMA, and the second node in turn absorbs and forwards each of these parts to a third node before all parts of the communication arrive from the first node.03-31-2011
20110072240Self-Similar Processing Network - Self-similar processing by unit processing cells may together solve a problem. A unit processing cell may include a processor, a memory and a plurality of Input/Output (IO) channels coupled to the processor. The memory may include a dictionary having one or more instructions that configure the processor to perform at least one function. The plurality of IO channels may be used to communicably couple the unit processing cell with a plurality of other unit processing cells each including their own respective dictionary. The unit processing cell and the plurality of other unit processing cells may be independent of one another and may perform together without a centralized control. The processor may update the dictionary so that the unit processing cell builds a different dictionary from the plurality of other unit processing cells, thereby being self-similar to the plurality of other unit processing cells.03-24-2011
20110252219INFORMATION PROCESSING APPARATUS - According to an aspect of the present invention, there is provided an information processing apparatus including: a first processor; a second processor that has an information processing capability and a power consumption higher than those of the first processor; a temperature monitoring module configured to acquire an operating temperature of the second processor; a throttle number determination module configured to determine whether the throttling control is performed a given number of times or more within a given time interval; and a processor switching control module configured to perform, when the operating temperature of the second processor is equal to or higher than a given temperature: stopping an operation of the second processor; causing the first processor to perform an information process; and prohibiting the operation of the second processor.10-13-2011
20110161627MECHANISMS TO AVOID INEFFICIENT CORE HOPPING AND PROVIDE HARDWARE ASSISTED LOW-POWER STATE SELECTION - An apparatus and method are described herein for avoiding inefficient core hopping and providing hardware assisted power state selection. Future idle-activity of cores is predicted. If the residency of activity patterns for efficient core hop scenarios is predicted to be large enough, a core hop is determined to be efficient and allowed. However, if efficient activity patterns are not predicted to be resident for long enough (inefficient patterns are instead predicted to be resident for longer), then a core hop request is denied. As a result, designers may implement a policy for avoiding core hops that weighs the potential gain of the core hop, such as alleviation of a core hop condition, against a penalty for performing the core hop, such as a temporal penalty for the core hop. Separately, idle durations associated with hardware power states for cores may be predicted in hardware. Furthermore, accuracy of the idle duration prediction is determined. Upon receipt of a request for a core to enter a power state, a power management unit may select either the hardware predicted power state, if the accuracy is high enough, or utilize the requested power state, if the accuracy of the hardware prediction is not high enough.06-30-2011
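The final selection rule in the abstract above (use the hardware prediction only when its accuracy is high enough) can be written as a short Python sketch. The function name, the 0.8 threshold, and the state labels are assumptions for illustration; the abstract does not specify how accuracy is measured or what counts as "high enough".

```python
def select_power_state(requested_state, predicted_state, accuracy, threshold=0.8):
    """Pick the power state a core should enter.

    `accuracy` is assumed to be a fraction in [0, 1] describing how often
    recent hardware idle-duration predictions were correct; `threshold`
    is a hypothetical tuning parameter.
    """
    if accuracy >= threshold:
        return predicted_state   # trust the hardware prediction
    return requested_state       # fall back to the state that was requested


trusted = select_power_state("C1", "C6", accuracy=0.9)
fallback = select_power_state("C1", "C6", accuracy=0.5)
```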
20120204002Providing to a Parser and Processors in a Network Processor Access to an External Coprocessor - A mechanism is provided for sharing a communication used by a parser (parser path) in a network adapter of a network processor for sending requests for a process to be executed by an external coprocessor. The parser path is shared by processors of the network processor (software path) to send requests to the external processor. The mechanism uses for the software path a request mailbox comprising a control address and a data field accessed by MMIO for sending two types of messages, one message type to read or write resources and one message type to trigger an external process in the coprocessor and a response mailbox for receiving response from the external coprocessor comprising a data field and a flag field. The other processors of the network poll the flag until set and get the coprocessor result in the data field.08-09-2012
20090240916Fault Resilient/Fault Tolerant Computing - A fault tolerant/fault resilient computer system includes a first coserver and a second coserver. The first coserver includes a first application environment (AE) processor and a first I/O subsystem processor on a first common motherboard. The second coserver includes a second AE processor and a second I/O subsystem processor on a second common motherboard.09-24-2009
20090063814System and Method for Routing Information Through a Data Processing System Implementing a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for routing information through the data processing system. Data is received at a source processor within a set of processors that is to be transmitted to a destination processor, where the data includes address information. A first determination is performed as to whether the destination processor is within a same processor book as the source processor based on the address information. A second determination is performed as to whether the destination processor is within a same supernode as the source processor based on the address information if the destination processor is not within the same processor book. A routing path is identified for the data based on results of the first determination, the second determination, and one or more routing table data structures. The data is then transmitted from the source processor along the identified routing path toward the destination processor.03-05-2009
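The two-step determination in the abstract above (same processor book first, then same supernode) can be sketched in Python. The `Address` layout, the scope labels, and the shape of the routing tables are assumptions made for illustration; the abstract only says that the decision uses address information and one or more routing table data structures.

```python
from collections import namedtuple

# Assumption: an address names the supernode, the book within it,
# and the processor within the book.
Address = namedtuple("Address", "supernode book processor")


def routing_scope(src, dst):
    # First determination: is the destination in the same processor book?
    if (src.supernode, src.book) == (dst.supernode, dst.book):
        return "intra-book"
    # Second determination (only when not in the same book): same supernode?
    if src.supernode == dst.supernode:
        return "intra-supernode"
    return "inter-supernode"


def identify_path(src, dst, tables):
    # Hypothetical table layout: one table per scope, keyed by the
    # destination unit that matters at that scope.
    scope = routing_scope(src, dst)
    if scope == "intra-book":
        key = dst.processor
    elif scope == "intra-supernode":
        key = dst.book
    else:
        key = dst.supernode
    return tables[scope][key]


tables = {
    "intra-book": {1: "local-bus-1"},
    "intra-supernode": {2: "book-link-2"},
    "inter-supernode": {7: "supernode-link-7"},
}
src = Address(supernode=0, book=0, processor=0)
same_book = identify_path(src, Address(0, 0, 1), tables)
same_supernode = identify_path(src, Address(0, 2, 5), tables)
remote = identify_path(src, Address(7, 3, 4), tables)
```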
20080229063Processor Array with Separate Serial Module - A processor array has processor elements (09-18-2008
20080229062Method of sharing registers in a processor and processor - A method of sharing registers in a processor includes executing a data processing instruction so as to obtain a result of the data processing instruction, which is to be written into a register of the processor. Register sharing information is obtained so as to control writing of the result into the register and/or at least one further register of the processor.09-18-2008
20100325389Microprocessor communications system - A microprocessor communications system utilizes a combination of an activity status monitor register and one or more address select registers to read from a communications port of one processor and write to a communications port of an adjacent processor in a single instruction word loop. This circumvents the requirement to save and retrieve data and/or instructions from memory. A stack register selector contains a plurality of stack registers and a plurality of shift registers, which are interconnected. The stack registers are selected by the shift registers in such a way that the stack registers operate in a circular repeating pattern, which prevents overflow and underflow of stacks.12-23-2010
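The circular stack behaviour described above, where the stack registers repeat in a ring so that pushes and pops never overflow or underflow, can be modelled in a few lines of Python. This is a software analogy only; the `CircularStack` class and the modular index standing in for the shift-register selector are assumptions.

```python
class CircularStack:
    """Fixed set of stack registers addressed in a repeating ring."""

    def __init__(self, size=8):
        self.regs = [0] * size
        self.top = 0  # index of the current top-of-stack register

    def push(self, value):
        # Advance the selector; on wrap-around the oldest entry is overwritten
        # instead of raising an overflow.
        self.top = (self.top + 1) % len(self.regs)
        self.regs[self.top] = value

    def pop(self):
        # Read the top register, then step the selector back; popping more
        # than was pushed revisits older registers instead of underflowing.
        value = self.regs[self.top]
        self.top = (self.top - 1) % len(self.regs)
        return value


stack = CircularStack(size=3)
for v in (1, 2, 3, 4):      # the fourth push overwrites the oldest value (1)
    stack.push(v)
popped = [stack.pop() for _ in range(4)]
```

With three registers and four pushes, the fourth pop wraps around and returns the most recently written register again, which shows the repeating pattern.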
20100281237INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM - In an information processing apparatus in which data processing is performed in a predetermined sequence by processing modules connected to a ring bus, if an amount of data generated by input data in the ring bus is not considered, the data amount exceeds an amount of data that can be held by the processing modules on the ring bus, and a data collision often occurs, so that processing efficiency of the ring bus deteriorates. An amount of data input into the ring bus is controlled so that the total sum of data amounts output to the ring bus from processing units used for processing does not exceed a maximum amount of data that can be held by the processing modules on the ring bus.11-04-2010
20110179252 METHOD AND APPARATUS FOR A GENERAL-PURPOSE, MULTIPLE-CORE SYSTEM FOR IMPLEMENTING STREAM-BASED COMPUTATIONS - A method and system of efficient use and programming of a multi-processing core device. The system includes a programming construct that is based on stream-domain code. A programmable core based computing device is disclosed. The computing device includes a plurality of processing cores coupled to each other. A memory stores stream-domain code including a stream defining a stream destination module and a stream source module. The stream source module places data values in the stream and the stream conveys data values from the stream source module to the stream destination module. A runtime system detects when the data values are available to the stream destination module and schedules the stream destination module for execution on one of the plurality of processing cores.07-21-2011
20110060889METHOD, SYSTEM AND COMPUTER-ACCESSIBLE MEDIUM FOR PROVIDING A DISTRIBUTED PREDICATE PREDICTION - Examples of a system, method and computer accessible medium are provided to generate a predicate prediction for a distributed multi-core architecture. Using such system, method and computer accessible medium, it is possible to intelligently encode approximate predicate path information on branch instructions. Using this statically generated information, distributed predicate predictors can generate dynamic predicate histories that can facilitate an accurate prediction of high-confidence predicates, while minimizing the communication between the cores.03-10-2011
20120311301PIPELINE CONFIGURATION PROTOCOL AND CONFIGURATION UNIT COMMUNICATION - In a method of synchronizing data processing of a processor arrangement, responsive to reaching, during execution of a program, a barrier included in a program sequence, the processor arrangement halts the program execution until it is determined that all instructions preceding the barrier in the program sequence have been successfully scheduled for execution.12-06-2012
20120311300MULTIPROCESSOR SYNCHRONIZATION USING REGION LOCKS - Disclosed is a method of synchronizing accesses by a plurality of processors to at least one shared resource. One of a plurality of processors requests an exclusive region lock for a shared resource using a logical block address (LBA) of a dummy target. The LBA is defined in a region map that associates LBAs to shared resources. The exclusive region lock request is inserted as a node in a region lock tree of the dummy target. Access to the shared resource is granted based on a determination of whether there is an existing region lock in the region lock tree that overlaps with the new exclusive region lock request.12-06-2012
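The overlap test that decides whether a region lock is granted can be sketched in Python. For brevity this sketch keeps locks in a flat list rather than the region lock tree named in the abstract, and the `RegionLockTable` class, the `region_map` contents, and the half-open LBA ranges are assumptions for illustration.

```python
class RegionLockTable:
    """Exclusive locks over LBA ranges of a dummy target (list, not a tree)."""

    def __init__(self):
        self.locks = []  # entries are (start_lba, end_lba_exclusive, owner)

    def try_lock(self, start, length, owner):
        end = start + length
        for held_start, held_end, _ in self.locks:
            # Two half-open ranges overlap when each starts before the other ends.
            if start < held_end and held_start < end:
                return False
        self.locks.append((start, end, owner))
        return True

    def unlock(self, start, length, owner):
        self.locks.remove((start, start + length, owner))


# Hypothetical region map: shared resource name -> (start LBA, length).
region_map = {"config_table": (0, 8), "event_queue": (8, 8)}

table = RegionLockTable()
got_a = table.try_lock(*region_map["config_table"], owner="cpu0")
got_b = table.try_lock(*region_map["config_table"], owner="cpu1")  # overlaps, denied
got_c = table.try_lock(*region_map["event_queue"], owner="cpu1")   # disjoint, granted
table.unlock(*region_map["config_table"], owner="cpu0")
got_d = table.try_lock(*region_map["config_table"], owner="cpu1")  # now free
```

A tree keyed by start LBA would make the overlap search faster for many locks; the grant or deny outcome for each request is the same.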
20100125717Synchronization Controller For Multiple Multi-Threaded Processors - A gated-storage system including multiple control interfaces, each control interface operatively connected externally to respective multithreaded processors. The multithreaded processors each have a thread context running an active thread so that multiple thread contexts are running on the multithreaded processors. A memory is connected to a system-level inter-thread communications unit and shared between the multithreaded processors. The thread contexts request access to the memory by communicating multiple access requests over the control interfaces. The access requests are from any of the thread contexts within any of the multithreaded processors. A single request storage is shared by the multithreaded processors. A controller stores the access requests in the single request storage within a single clock cycle.05-20-2010
20100031003METHOD AND APPARATUS FOR PARTITIONING AND SORTING A DATA SET ON A MULTI-PROCESSOR SYSTEM - The present invention provides a method and apparatus for partitioning and sorting a data set on a multi-processor system. Herein, the multi-processor system has at least one core processor and a plurality of accelerators. The method for partitioning a data set comprises: partitioning iteratively said data set into a plurality of buckets corresponding to different data ranges by using said plurality of accelerators in parallel, wherein each of the plurality of buckets could be stored in local storage of said plurality of accelerators; wherein in each iteration, the method comprises: roughly partitioning said data set into a plurality of large buckets; obtaining parameters of said data set that can indicate the distribution of data values in that data set; determining a plurality of data ranges for said data set based on said parameters; and partitioning said plurality of large buckets into a plurality of small buckets corresponding to the plurality of data ranges respectively by using said plurality of accelerators in parallel, wherein each of said plurality of accelerators, for each element in the large bucket it is partitioning, determines a data range to which that element belongs among the plurality of data ranges by computation.02-04-2010
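One iteration of the range-based partitioning described above can be sketched serially in Python. The choice of minimum and maximum as the distribution parameters, the equal-width ranges, and the function names are assumptions for illustration; the abstract leaves the parameters open and runs the step on several accelerators in parallel.

```python
import bisect


def make_boundaries(data, num_buckets):
    # Assumption: the "parameters" are the minimum and maximum value,
    # and the data ranges are equal-width slices between them.
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_buckets
    return [lo + width * i for i in range(1, num_buckets)]


def partition(data, boundaries):
    # Each element's bucket is found by computation (a binary search over
    # the range boundaries), not by comparing against other elements.
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for x in data:
        buckets[bisect.bisect_right(boundaries, x)].append(x)
    return buckets


data = [5, 1, 9, 3, 7, 2, 8]
buckets = partition(data, make_boundaries(data, 3))
# Sorting each bucket independently and concatenating yields a sorted list.
merged = [x for bucket in buckets for x in sorted(bucket)]
```

Since every value in one bucket is no larger than any value in the next, the buckets can be sorted independently, for example one per accelerator, and then concatenated.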
20100017579Program-Controlled Unit and Method for Operating Same - A method for operating a program-controlled unit has two redundantly operable microprocessor cores and a comparator unit provided downstream from the two microprocessor cores. One working register having a different content is provided in each of the two microprocessor cores for the redundant operation, and the content of these working registers is fed to the downstream comparator unit in order to verify whether the comparator unit signals a difference.01-21-2010
20120042150MULTIPROCESSOR SYSTEM-ON-A-CHIP FOR MACHINE VISION ALGORITHMS - A multiprocessor system includes a main memory and multiple processing cores that are configured to execute software that uses data stored in the main memory. In some embodiments, the multiprocessor system includes a data streaming unit, which is connected between the processing cores and the main memory and is configured to pre-fetch the data from the main memory for use by the multiple processing cores. In some embodiments, the multiprocessor system includes a scratch-pad processing unit, which is connected to the processing cores and is configured to execute, on behalf of the multiple processing cores, a selected part of the software that causes two or more of the processing cores to access concurrently a given item of data.02-16-2012
20090094438OVER-PROVISIONED MULTICORE PROCESSOR - An over-provisioned multicore processor employs more cores than can simultaneously run within the power envelope of the processor, enabling advanced processor control techniques for more efficient workload execution, despite significantly decreasing the duty cycle of the active cores so that on average a full core or more may not be operating.04-09-2009
20090172353SYSTEM AND METHOD FOR ARCHITECTURE-ADAPTABLE AUTOMATIC PARALLELIZATION OF COMPUTING CODE - Systems and methods for architecture-adaptable automatic parallelization of computing code are described herein. In one aspect, embodiments of the present disclosure include a method of generating a plurality of instruction sets from a sequential program for parallel execution in a multi-processor environment, which may be implemented on a system, of, identifying an architecture of the multi-processor environment in which the plurality of instruction sets are to be executed, determining running time of each of a set of functional blocks of the sequential program based on the identified architecture, determining communication delay between a first computing unit and a second computing unit in the multi-processor environment, and/or assigning each of the set of functional blocks to the first computing unit or the second computing unit based on the running times and the communication time.07-02-2009
20120047350CONTROLLING SIMD PARALLEL PROCESSORS - A processing apparatus for processing source code comprising a plurality of single line instructions to implement a desired processing function is described. The processing apparatus comprises:02-23-2012
20090164755Optimizing Execution of Single-Threaded Programs on a Multiprocessor Managed by Compilation - A method for optimizing execution of a single threaded program on a multi-core processor. The method includes dividing the single threaded program into a plurality of discretely executable components while compiling the single threaded program; identifying at least some of the plurality of discretely executable components for execution by an idle core within the multi-core processor; and enabling execution of the at least one of the plurality of discretely executable components on the idle core.06-25-2009
20120159121PARALLEL COMPUTER SYSTEM, SYNCHRONIZATION APPARATUS, AND CONTROL METHOD FOR THE PARALLEL COMPUTER SYSTEM - A synchronization apparatus includes a receiver that receives data from a synchronization apparatus of another node that performs synchronization with its own node from among the plurality of synchronization apparatuses and extracts synchronization information from the received data, a transmitter that transmits the data to the synchronization apparatus of the other node, a receiving state register that stores the extracted synchronization information, a delay unit that delays the received data by a specified period of time, and a controller that stores the extracted synchronization information and synchronization information from its own controller in the reception state register and causes the transmitter to transmit the data to the other node and returns the data to its own node back to its own controller via the delay unit when the extracted synchronization information and the synchronization information from its own controller are stored in the reception state register.06-21-2012
20120124333ADAPTIVE INTEGRATED CIRCUITRY WITH HETEROGENEOUS AND RECONFIGURABLE MATRICES OF DIVERSE AND ADAPTIVE COMPUTATIONAL UNITS HAVING FIXED, APPLICATION SPECIFIC COMPUTATIONAL ELEMENTS - The present invention concerns a new category of integrated circuitry and a new methodology for adaptive or reconfigurable computing. The preferred IC embodiment includes a plurality of heterogeneous computational elements coupled to an interconnection network. The plurality of heterogeneous computational elements include corresponding computational elements having fixed and differing architectures, such as fixed architectures for different functions such as memory, addition, multiplication, complex multiplication, subtraction, configuration, reconfiguration, control, input, output, and field programmability. In response to configuration information, the interconnection network is operative in real-time to configure and reconfigure the plurality of heterogeneous computational elements for a plurality of different functional modes, including linear algorithmic operations, non-linear algorithmic operations, finite state machine operations, memory operations, and bit-level manipulations. The various fixed architectures are selected to comparatively minimize power consumption and increase performance of the adaptive computing integrated circuit, particularly suitable for mobile, hand-held or other battery-powered computing applications.05-17-2012
20090132787Runtime Instruction Decoding Modification in a Multi-Processing Array - A method and system for decoding and modifying processor instructions in runtime according to certain rules in order to separately control processing elements embedded within a multi-processor array using a single instruction. The present invention allows multiple processing elements and/or execution units in a multi-processor array to perform different operations, based upon a variable or variables such as their location in the multi-processor array, while accepting a single instruction as an input.05-21-2009
20120317399Performing A Local Reduction Operation On A Parallel Computer - A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.12-13-2012
20120317398METHOD FOR REDUCING BUFFER CAPACITY IN A PIPELINE PROCESSOR - A method to reduce buffer capacity in a processor includes giving the data packets admittance to the processor through at least one interface, storing the data packets in at least one input buffer, and using a packet rate shaper outside of a processing pipeline to control flow of the data packets to the pipeline before the data packets enter the pipeline. First and second data packets are given admittance to the pipeline in dependence on cost information per packet that is dependent upon an expected time period of residence of the first data packet in the pipeline. Cost information dependent upon an expected time period of residence of the second data packet in the pipeline differs from said cost information dependent upon the expected time period of residence of the first data packet in the pipeline.12-13-2012
20120216017PARALLEL COMPUTING APPARATUS AND PARALLEL COMPUTING METHOD - Computational unit area selecting units, each of which is provided in individual multiple cores, sequentially select uncomputed computational unit areas in a computational area. Computing units, each of which is provided in the individual multiple cores, perform computation for the selected computational unit areas. In addition, the computing units write computational results in a memory device which is accessible from each of the multiple cores. A computational result transmitting unit of the core performs computational result acquisition and transmission processing in a different time period with respect to each of multiple computational result transmission areas. The computational result acquisition processing is for acquiring, from the memory device, computational results related to the computational result transmission areas.08-23-2012
20120216016INSTRUCTION SCHEDULING APPROACH TO IMPROVE PROCESSOR PERFORMANCE - A processor instruction scheduler comprising an optimization engine which uses an optimization model for a processor architecture with: means to generate an optimization model for the optimization engine from a design of a processor and data representing optimization goals and constraints and a code stream, wherein the processor has at least two execution pipes and at least two registers, and wherein the design comprises data for processor instruction latency and execution pipes, and wherein the code stream comprises processor instructions with corresponding register selections; and reordering means to generate an optimized code stream from the code stream with the optimal solution provided by the optimization engine for the optimization model by reordering the code stream, such that optimum values for the optimization goals under the given constraints are achieved without affecting the operation results of the code stream.08-23-2012
20110185154SYNCHRONIZATION OF MULTIPLE PROCESSOR CORES - The invention relates to a spinlock-based multi-core synchronization technique in a real-time environment, wherein multiple processor cores perform spinning attempts to request a lock and the lock is allocated to at most one of the multiple cores for a mutually exclusive operation thereof. A method embodiment of the technique comprises the steps of allocating the lock to the first core requesting it; establishing for each core an indication of a waiting time for receiving the lock; selecting at least one of the spinning cores based on the waiting time indications; and, upon return of the lock, conditionally allocating the lock to the selected core, if the selected core performs a spinning attempt within a predefined time window starting with the return of the lock.07-28-2011
20110185153SIMULTANEOUS EXECUTION RESUMPTION OF MULTIPLE PROCESSOR CORES AFTER CORE STATE INFORMATION DUMP TO FACILITATE DEBUGGING VIA MULTI-CORE PROCESSOR SIMULATOR USING THE STATE INFORMATION - A multi-core microprocessor includes first and second processing cores and a bus coupling the first and second processing cores. The bus conveys messages between the first and second processing cores. The cores are configured such that: the first core stops executing user instructions and interrupts the second core via the bus, in response to detecting a predetermined event; the second core stops executing user instructions, in response to being interrupted by the first core; each core outputs its state after it stops executing user instructions; and each core waits to begin fetching and executing user instructions until it receives a notification from the other core via the bus that the other core is ready to begin fetching and executing user instructions. In one embodiment, the predetermined event comprises detecting that the first core has retired a predetermined number of instructions. In one embodiment, microcode waits for the notification.07-28-2011
20120185672LOCAL-ONLY SYNCHRONIZING OPERATIONS - Performing a series of successive synchronizing operations by a core on data shared by a plurality of cores may include a first core indicating an upcoming synchronizing operation on shared data. A second memory layer stores the shared data and tracks the first core's ownership of the shared data. The second memory layer is shared via coherency operations among the first core and one or more second cores. The first core may perform one or more synchronization operations on the shared data without requiring interaction from the second memory layer.07-19-2012
20120185673RECONFIGURABLE PROCESSOR USING POWER GATING, COMPILER AND COMPILING METHOD THEREOF - Provided is a reconfigurable processor that may process a first type of operation in first mode using a first group of functional units, and process a second type of operation in second mode using a second group of functional units. The reconfigurable processor may selectively supply power to either the first group or the second group, in response to a mode-switch signal or a mode-switch instruction.07-19-2012
20110004740DATA TRANSFER APPARATUS, INFORMATION PROCESSING APPARATUS AND METHOD OF SETTING DATA TRANSFER RATE - A method of setting a transfer rate for an information processing apparatus having a plurality of processing apparatuses including a processor outputting data and connected by one or a plurality of data transfer apparatuses for transferring the data outputted from the processor, the method includes obtaining dividing information indicating a manner of dividing the information processing apparatus into a plurality of partitions including at least one of the plurality of processing apparatuses, and setting a transfer rate of each partition for broadcasting data to all of the processors included in the plurality of processing apparatuses in each partition based on the obtained dividing information.01-06-2011
20110047354Processor Cluster Architecture and Associated Parallel Processing Methods - A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.02-24-2011
20110238950Performing A Scatterv Operation On A Hierarchical Tree Network Optimized For Collective Operations - Performing a scatterv operation on a hierarchical tree network optimized for collective operations including receiving, by the scatterv module installed on the node, from a nearest neighbor parent above the node a chunk of data having at least a portion of data for the node; maintaining, by the scatterv module installed on the node, the portion of the data for the node; determining, by the scatterv module installed on the node, whether any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child; and sending, by the scatterv module installed on the node, those portions of data to the nearest neighbor child if any portions of the data are for a particular nearest neighbor child below the node or one or more other nodes below the particular nearest neighbor child.09-29-2011
20110238949Distributed Administration Of A Lock For An Operational Group Of Compute Nodes In A Hierarchical Tree Structured Network - Distributed administration of a lock for an operational group of compute nodes in a hierarchical tree structured network including assigning the root node of the operational group to send acknowledgments for lock requests, the root lock administration module comprising a module of automated computing machinery; receiving a lock request assigned to a particular node from a child node; determining whether another request from another child is directly ahead in an acknowledgement queue; if a request from another child is directly ahead in the acknowledgement queue, putting the lock request for the particular node in the acknowledgement queue until the lock request directly ahead in the acknowledgement queue is satisfied and when the lock request ahead in the queue is satisfied, sending the particular node for whom the lock request is assigned a message acknowledging the particular node has the lock; and if a request from another child is not directly ahead in a queue, sending to the particular node for whom the lock request is assigned a message acknowledging that the particular node has the lock.09-29-2011
20120278590RECONFIGURABLE PROCESSING SYSTEM AND METHOD - A reconfigurable processor is provided. The reconfigurable processor includes a plurality of functional blocks configured to perform corresponding operations. The reconfigurable processor also includes one or more data inputs coupled to the plurality of functional blocks to provide one or more operands to the plurality of functional blocks, and one or more data outputs to provide at least one result outputted from the plurality of functional blocks. Further, the reconfigurable processor includes a plurality of devices configured to inter-connect the plurality of functional blocks such that the plurality of functional blocks are independently provided with corresponding operands from the data inputs and individual results from the plurality of functional blocks are independently fed back as operands to the plurality of functional blocks to carry out one or more operation sequences.11-01-2012
20120278589STORAGE SYSTEM COMPRISING MULTIPLE MICROPROCESSORS AND METHOD FOR SHARING PROCESSING IN THIS STORAGE SYSTEM - The present invention provides a storage system in which each microprocessor is able to execute synchronous processing and asynchronous processing in accordance with the operating status of the storage system. Any one attribute, from among multiple attributes (operating modes) prepared beforehand, is set in each microprocessor in accordance with the operating status of the storage system. The attribute that is set in each microprocessor is regularly reviewed and changed.11-01-2012
20110264889SYSTEMS AND METHODS FOR PROCESSING DATA - Systems, methods, and an article of manufacture for the reduction in process load experienced by a primary processor when executing an application by dynamically reassigning portions of the application to one or more secondary processors are shown and described. A second processing unit is queried for one or more characteristics. One or more performance characteristics of the second processor are measured. A portion of the application can be reassigned to the second processing unit based on the queried characteristics and performance measurements.10-27-2011
20120089814Inter-Processor Protocol in a Multi-Processor System - In a multiprocessor system, a primary processor may store an executable image for a secondary processor. A communication protocol assists the transfer of an image header and data segment(s) of the executable image from the primary processor to the secondary processor. Messages between the primary processor and secondary processor indicate successful receipt of transferred data, termination of a transfer process, and acknowledgement of same.04-12-2012
20120089813COMPUTING APPARATUS BASED ON RECONFIGURABLE ARCHITECTURE AND MEMORY DEPENDENCE CORRECTION METHOD THEREOF - Provided are a computing apparatus based on a reconfigurable architecture and a memory dependence correction method thereof. In one general aspect, a computing apparatus has a reconfigurable architecture. The computing apparatus may include: a reconfiguration unit having processing elements configured to reconfigure data paths between one or more of the processing elements; a compiler configured to analyze instructions to generate reconfiguration information for reconfiguring one or more of the reconfigurable data paths; a configuration memory configured to store the reconfiguration information; and a processor configured to execute the instructions through the reconfiguration unit, and to correct at least one memory dependency among the processing elements.04-12-2012
20120331270Compressing Result Data For A Compute Node In A Parallel Computer - Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID.12-27-2012
20110320769PARALLEL COMPUTING DEVICE, INFORMATION PROCESSING SYSTEM, PARALLEL COMPUTING METHOD, AND INFORMATION PROCESSING DEVICE - A computing section is provided with a plurality of computing units and correlatively stores entries of configuration information that describes configurations of the plurality of computing units with physical configuration numbers that represent the entries of configuration information and executes a computation in a configuration corresponding to a designated physical configuration number. A status management section designates a physical configuration number corresponding to a status to which the computing section needs to advance the next time for the computing section and outputs the status to which the computing section needs to advance the next time as a logical status number that uniquely identifies the status to which the computing section needs to advance the next time in an object code. A determination section determines whether or not the computing section has stored an entry of configuration information corresponding to the status to which the computing section needs to advance the next time based on the logical status number that is output from the status management section. A rewriting section correlatively stores the entry of the configuration information and a physical configuration number corresponding to the entry of the configuration information in the computing section when the determination section determines that the computing section has not stored the entry of configuration information corresponding to the status to which the computing section needs to advance the next time.12-29-2011
20110320768METHOD OF, AND APPARATUS FOR, MITIGATING MEMORY BANDWIDTH LIMITATIONS WHEN PERFORMING NUMERICAL CALCULATIONS - There is provided a method of, and apparatus for, processing a computation on a computing device comprising at least one processor and a memory, the method comprising: storing, in said memory, plural copies of a set of data, each copy of said set of data having a different compression ratio and/or compression scheme; selecting a copy of said set of data; and performing, on a processor, a computation using said selected copy of said set of data. By providing such a method, different compression ratios and/or compression schemes can be selected as appropriate. For example, if high precision is required in a computation, a copy of the set of data can be chosen which has a low compression ratio at the expense of processing time and memory transfer time. In the alternative, if low precision is acceptable, then the speed benefits of a high compression ratio and/or lossy compression scheme may be utilised.12-29-2011
20110320767Parallelization of Online Learning Algorithms - Methods, systems, and media are provided for a dynamic batch strategy utilized in parallelization of online learning algorithms. The dynamic batch strategy provides a merge function on the basis of a threshold level difference between the original model state and an updated model state, rather than according to a constant or pre-determined batch size. The merging includes reading a batch of incoming streaming data, retrieving any missing model beliefs from partner processors, and training on the batch of incoming streaming data. The steps of reading, retrieving, and training are repeated until the measured difference in states exceeds a set threshold level. The measured differences which exceed the threshold level are merged for each of the plurality of processors according to attributes. The merged differences which exceed the threshold level are combined with the original partial model states to obtain an updated global model state.12-29-2011
20130013891METHOD AND APPARATUS FOR A HIERARCHICAL SYNCHRONIZATION BARRIER IN A MULTI-NODE SYSTEM - A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing by each of a plurality of threads on a chip, input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed.01-10-2013
20130024659Executing An Instruction for Performing a Configuration Virtual Topology Change - In a logically partitioned host computer system comprising host processors (host CPUs) partitioned into a plurality of guest processors (guest CPUs) of a guest configuration, a perform topology function instruction is executed by a guest processor specifying a topology change of the guest configuration. The topology change preferably changes the polarization of guest CPUs, the polarization being related to the amount of a host CPU resource provided to a guest CPU.01-24-2013
20130024658MEMORY CONTROLLER AND SIMD PROCESSOR - Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.01-24-2013
20110246748Managing Sensor and Actuator Data for a Processor and Service Processor Located on a Common Socket - Illustrated is a system and method that includes a processor and service processor co-located on a common socket, the service processor to aggregate data from a distributed network of additional service processors and processors both of which are co-located on an additional common socket. The system and method also includes a first sensor to record the data from the processor. The system and method also includes a second sensor to record the data from a software stack. The system and method further includes a registry to store the data.10-06-2011
20130097407UNIFIED, WORKLOAD-OPTIMIZED, ADAPTIVE RAS FOR HYBRID SYSTEMS - A method, system, and computer program product for maintaining reliability in a computer system. In an example embodiment, the method includes managing workloads on a first processor with a first processor architecture by an agent process executing on a second processor with a second processor architecture. The method proceeds by activating redundant computation on the second processor by the agent process. The method continues by performing a same computation from a workload of the workloads at least twice. Finally, the method includes comparing results of the same computation. In this embodiment, the first processor is coupled to the second processor by a network, and the first processor architecture and second processor architecture are different architectures.04-18-2013
20130097406CLUSTER COMPUTING USING SPECIAL PURPOSE MICROPROCESSORS - In some embodiments, a computer cluster system comprises a plurality of nodes and a software package comprising a user interface and a kernel for interpreting program code instructions. In certain embodiments, a cluster node module is configured to communicate with the kernel and other cluster node modules. The cluster node module can accept instructions from the user interface and can interpret at least some of the instructions such that several cluster node modules in communication with one another and with a kernel can act as a computer cluster.04-18-2013
20130103928Method, Apparatus, And System For Optimizing Frequency And Performance In A Multidie Microprocessor - With the progress toward multi-core processors, each core cannot readily ascertain the status of the other dies with respect to an idle or active status. A proposal for utilizing an interface to transmit core status among multiple cores in a multi-die microprocessor is discussed. Consequently, this facilitates thermal management by allowing an optimal setting for performance and frequency based on the status of each core.04-25-2013
20130103927CHARACTERIZATION AND VALIDATION OF PROCESSOR LINKS - A processor link that couples a first processor and a second processor is selected for validation and a plurality of communication parameter settings associated with the first and the second processors is identified. The first and the second processors are successively configured with each of the communication parameter settings. One or more test data pattern(s) are provided from the first processor to the second processor in accordance with the communication parameter setting. Performance measurements associated with the selected processor link and with the communication parameter setting are determined based, at least in part, on the test data pattern as received at the second processor. One of the communication parameter settings that is associated with the highest performance measurements is selected. The selected communication parameter setting is applied to the first and the second processors for subsequent communication between the first and the second processors via the processor link.04-25-2013
20110219211CPU CORE UNLOCKING DEVICE APPLIED TO COMPUTER SYSTEM - A CPU core unlocking device applied to a computer system is provided. The core unlocking device includes a CPU having a plurality of signal terminals and a core unlocking executing unit having a plurality of GPIO ports connected with the corresponding signal terminals of the CPU. The GPIO ports of the core unlocking executing unit generate and transmit a combination of core unlocking signals to the signal terminals of the CPU to unlock the CPU core.09-08-2011
20110213950System and Method for Power Optimization - A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.09-01-2011
20130151814MULTI-CORE PROCESSOR - A multi-core processor includes a monitored processor core whose process result is to be monitored; a monitoring processor core group including two or more monitoring processors which can perform a process for monitoring the monitored processor core; an evaluating part configured to evaluate a processing load of the monitoring processor core group; and a controlling part configured to make the monitoring processor core group perform the process for monitoring the monitored processor core in a distributed manner if the processing load of the monitoring processor core group evaluated by the evaluating part is low, and make the monitoring processor of the monitoring processor core group perform the process for monitoring the monitored processor core if the processing load of the monitoring processor core group evaluated by the evaluating part is high, the monitoring processor performing a process whose priority is relatively low.06-13-2013
20130191613PROCESSOR CONTROL APPARATUS AND METHOD THEREFOR - Whether each of a plurality of processor cores is in a suspend state or operation state is detected. The processor utilization of a processor core of interest in the operation state is acquired. The number of processes assigned to the processor core of interest is obtained. The stop control or startup control of a processor core is performed based on the suspend state or operation state, the processor utilization, and the number of processes.07-25-2013
20130191612INTERFERENCE-DRIVEN RESOURCE MANAGEMENT FOR GPU-BASED HETEROGENEOUS CLUSTERS - Systems and methods are disclosed that share coprocessor resources between two or more applications in a computing cluster using a job selector to receive jobs from a job queue; a node selector coupled to the job selector; an off line profiler with an interference prediction model; a coprocessor dynamic interference detection module; and a coprocessor interference response module.07-25-2013
20120290815DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD - A data processing apparatus causes multiple processors to carry out a first data process in parallel, and when storing the data processed in parallel in a storage unit, converts the addresses of the data into addresses in the storage unit based on the data cache size of the multiple processors and stores the data. The data stored in the storage unit is then read out, and a second data process is carried out on the read-out data.11-15-2012
20120297164VIRTUALIZATION IN A MULTI-CORE PROCESSOR (MCP) - This invention describes an apparatus, computer architecture, method, operating system, compiler, and application program products for MPEs as well as virtualization in a symmetric MCP. The disclosure is applied to a generic microprocessor architecture with a set (e.g., one or more) of controlling elements (e.g., MPEs) and a set of groups of sub-processing elements (e.g., SPEs). Under this arrangement, MPEs and SPEs are organized in a way that a smaller number of MPEs control the behavior of a group of SPEs. The apparatus enables virtualized control threads within MPEs to be assigned to different groups of SPEs for controlling the same. The apparatus further includes a MCP coupled to a power supply coupled with cores to provide a supply voltage to each core (or core group) and controlling-digital elements and multiple instances of sub-processing elements.11-22-2012
20120079236SCALABLE AND PROGRAMMABLE PROCESSOR COMPRISING MULTIPLE COOPERATING PROCESSOR UNITS - A processor comprises a plurality of processor units arranged to operate concurrently and in cooperation with one another, and control logic configured to direct the operation of the processor units. At least a given one of the processor units comprises a memory, an arithmetic engine and a switch fabric. The switch fabric provides controllable connectivity between the memory, the arithmetic engine and input and output ports of the given processor unit, and has control inputs driven by corresponding outputs of the control logic. In an illustrative embodiment, the processor units may be configured to perform computations associated with a key equation solver in a Reed-Solomon (RS) decoder or other type of forward error correction (FEC) decoder.03-29-2012
