COGNITIVE ELECTRONICS, INC. Patent applications
Patent application number | Title | Published |
20160132541 | EFFICIENT IMPLEMENTATIONS FOR MAPREDUCE SYSTEMS - Techniques for use with a processor configured to function as at least a Mapper in a MapReduce system include generating a set of [key, value] pairs by executing a Map function on input data. The set of [key, value] pairs may be stored in a storage system implemented on at least one data storage medium, the storage system being organized into a plurality of divisions with different divisions of the storage system storing [key, value] pairs corresponding to different keys. A first [key, value] pair corresponding to a first key handled by a first Reducer in the MapReduce system and a second [key, value] pair corresponding to a second key handled by a second Reducer in the MapReduce system may both be stored in a first division of the plurality of divisions. | 05-12-2016 |
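The division-organized storage that application 20160132541 describes can be sketched as follows. The division count, the toy hash, and the word-count Map function are illustrative assumptions, not details from the filing; the point shown is that pairs stored in one division may correspond to keys handled by different Reducers.

```python
# Sketch: [key, value] pairs produced by a Map function are stored into
# divisions of a storage system, with divisions keyed by the pairs' keys.

NUM_DIVISIONS = 4

def map_function(record):
    # Hypothetical Map: emit one [key, value] pair per word (word count).
    for word in record.split():
        yield (word, 1)

def division_for(key):
    # Divisions are chosen per key, independently of Reducer assignment,
    # so one division can hold keys routed to different Reducers.
    return sum(key.encode()) % NUM_DIVISIONS

def store_mapped_pairs(records):
    divisions = {d: [] for d in range(NUM_DIVISIONS)}
    for record in records:
        for key, value in map_function(record):
            divisions[division_for(key)].append((key, value))
    return divisions
```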
20160110173 | PROFILING AND OPTIMIZATION OF PROGRAM CODE/APPLICATION - A system and associated methods are disclosed for profiling the execution of program code by a processor. The processor provides an instruction set with special profiling instructions for efficiently determining the bounds and latency of memory operations for blocks of program code. Information gathered regarding the bounds and latency of memory operations is used to determine code optimizations, such as allocation of memory for data structures in memory that is more local to the processor. | 04-21-2016 |
20160092182 | METHODS AND SYSTEMS FOR OPTIMIZING EXECUTION OF A PROGRAM IN A PARALLEL PROCESSING ENVIRONMENT - An automated method of optimizing execution of a program in a parallel processing environment is described. The program is adapted to execute in data memory and instruction memory. An optimizer receives the program to be optimized. The optimizer instructs the program to be compiled and executed. The optimizer observes execution of the program and identifies a subset of instructions that execute most often. The optimizer also identifies groups of instructions associated with the subset of instructions that execute most often. The identified groups of instructions include the identified subset of instructions that execute most often. The optimizer recompiles the program and stores the identified groups of instructions in instruction memory. The remaining instruction portions of the program are stored in the data memory. The instruction memory has a higher access rate and smaller capacity than the data memory. Once recompiled, subsequent execution of the program occurs using the recompiled program. | 03-31-2016 |
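The placement decision in the abstract above (hot instruction groups into a small, fast instruction memory; the rest into data memory) can be sketched as a greedy, hottest-first pass. The profile format and capacity figure are assumptions for illustration; the filing does not commit to a specific selection algorithm.

```python
# Sketch: given an execution profile of instruction groups, assign the
# most frequently executed groups to the small fast instruction memory
# and leave the remainder in the larger, slower data memory.

def place_groups(profile, instr_mem_capacity):
    """profile: {group_name: (size_in_words, execution_count)}.
    Returns (groups_in_instruction_memory, groups_in_data_memory)."""
    hot_first = sorted(profile, key=lambda g: profile[g][1], reverse=True)
    in_imem, in_dmem, used = [], [], 0
    for group in hot_first:
        size, _count = profile[group]
        if used + size <= instr_mem_capacity:
            in_imem.append(group)   # hot group fits in instruction memory
            used += size
        else:
            in_dmem.append(group)   # remainder stays in data memory
    return in_imem, in_dmem
```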
20150127880 | EFFICIENT IMPLEMENTATIONS FOR MAPREDUCE SYSTEMS - Techniques for use with a processor configured to function as at least a Mapper in a MapReduce system include generating a set of [key, value] pairs by executing a Map function on input data. The set of [key, value] pairs may be stored in a storage system implemented on at least one data storage medium, the storage system being organized into a plurality of divisions with different divisions of the storage system storing [key, value] pairs corresponding to different keys. A first [key, value] pair corresponding to a first key handled by a first Reducer in the MapReduce system and a second [key, value] pair corresponding to a second key handled by a second Reducer in the MapReduce system may both be stored in a first division of the plurality of divisions. | 05-07-2015 |
20150127691 | EFFICIENT IMPLEMENTATIONS FOR MAPREDUCE SYSTEMS - Techniques for use with at least one processor configured to execute one or more MapReduce applications that cause the at least one processor to function as at least a Mapper in a MapReduce system include accessing data stored in a file system implemented on at least one nonvolatile storage medium. In response to input data being written to the file system by an application other than the one or more MapReduce applications, a set of one or more Map functions applicable to the input data may be accessed. At least one Map function of the one or more Map functions may be executed on the input data via the at least one processor functioning as at least the Mapper in the MapReduce system, and at least one set of [key, value] pairs resulting from execution of the at least one Map function on the received input data may be output. | 05-07-2015 |
20150127649 | EFFICIENT IMPLEMENTATIONS FOR MAPREDUCE SYSTEMS - In some embodiments, a processor configured to function as at least a first Reducer in a MapReduce system may receive a set of mapped [key, value] pairs output from a Mapper in the MapReduce system, identify within the set of mapped [key, value] pairs one or more [key, value] pairs for whose keys the first Reducer is not responsible, and transfer those [key, value] pairs to one or more other Reducers in the MapReduce system. In some embodiments, a system including at least one processor may receive a data packet including a set of mapped [key, value] pairs corresponding to a plurality of keys handled by a plurality of Reducers in a MapReduce system. For each mapped [key, value] pair, the system may identify the corresponding key and one of the Reducers responsible for that key, and provide the mapped [key, value] pair to the Reducer for processing. | 05-07-2015 |
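The second embodiment in application 20150127649 (a system that inspects a packet of mapped pairs, identifies the Reducer responsible for each key, and hands the pair over) can be sketched as follows; the modulo key-to-Reducer assignment is an assumption for illustration.

```python
# Sketch: route each mapped [key, value] pair in a received packet to
# the Reducer responsible for its key.

NUM_REDUCERS = 3

def responsible_reducer(key):
    # Hypothetical key-to-Reducer assignment; the filing does not
    # specify how responsibility is determined.
    return sum(key.encode()) % NUM_REDUCERS

def dispatch(packet):
    per_reducer = {r: [] for r in range(NUM_REDUCERS)}
    for key, value in packet:
        per_reducer[responsible_reducer(key)].append((key, value))
    return per_reducer
```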
20140282455 | APPLICATION PROFILING - A system and associated methods are disclosed for profiling the execution of program code by a processor. The processor provides an instruction set with special profiling instructions for efficiently determining the bounds and latency of memory operations for blocks of program code. Information gathered regarding the bounds and latency of memory operations is used to determine code optimizations, such as allocation of memory for data structures in memory that is more local to the processor. | 09-18-2014 |
20140281366 | ADDRESS TRANSLATION IN A SYSTEM USING MEMORY STRIPING - A system and associated methods are disclosed for translating virtual memory addresses to physical memory addresses in a parallel computing system using memory striping. One method comprises: receiving a virtual memory address, comparing a portion of the received virtual memory address to each of a plurality of entries of a virtual memory address matching table, determining a matching row of the virtual memory address matching table for the portion of the received virtual memory address, shifting a contiguous set of bits of the received virtual memory address, wherein the shifting is performed in accordance with information from the matching row, and combining the shifted contiguous set of bits of the received virtual memory address with high-order physical memory address bits associated with the determined matching row of the virtual memory address matching table, and with low-order bits of the received virtual memory address, to produce a physical memory address. | 09-18-2014 |
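The translation steps enumerated in application 20140281366 can be traced with a toy 32-bit example. The field widths, shift amounts, and table contents below are invented for illustration and are not values from the filing.

```python
# Sketch of striped virtual-to-physical address translation.

MATCH_SHIFT = 28   # top nibble of a 32-bit virtual address is matched
LOW_BITS = 12      # low-order bits (page offset) pass through unchanged

# Virtual memory address matching table (contents are illustrative):
# match value -> (shift amount, matched-field width in bits,
#                 high-order physical address bits)
MATCH_TABLE = {
    0x1: (2, 8, 0x40000000),  # striped region: field shifted right by 2
    0x2: (0, 8, 0x80000000),  # unstriped region: field used as-is
}

def translate(va):
    # 1. Compare a portion of the virtual address to the table entries
    #    and determine the matching row.
    shift, field_bits, high_pa = MATCH_TABLE[(va >> MATCH_SHIFT) & 0xF]
    # 2. Shift a contiguous set of bits of the virtual address in
    #    accordance with information from the matching row.
    field = (va >> LOW_BITS) & ((1 << field_bits) - 1)
    shifted = field >> shift
    # 3. Combine the shifted bits with the row's high-order physical
    #    address bits and the virtual address's low-order bits.
    return high_pa | (shifted << LOW_BITS) | (va & ((1 << LOW_BITS) - 1))
```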
20140281362 | MEMORY ALLOCATION IN A SYSTEM USING MEMORY STRIPING - A system and associated methods are disclosed for allocating memory in a system providing translation of virtual memory addresses to physical memory addresses in a parallel computing system using memory striping. One method comprises: receiving a request for memory allocation, identifying an available virtually-contiguous physically-non-contiguous memory region (VCPNCMR) of at least the requested size, where the VCPNCMR is arranged such that physical memory addresses for the VCPNCMR may be derived from corresponding virtual memory addresses by shifting a contiguous set of bits of the virtual memory address in accordance with information in a matching row of a virtual memory address matching table, and combining the shifted bits with high-order physical memory address bits also associated with the determined matching row and with low-order bits of the virtual memory address, and providing to the requesting process a starting address of the identified VCPNCMR. | 09-18-2014 |
20140269765 | Broadcast Network - A system and associated methods are disclosed for routing communications amongst computing units in a distributed computing system. In a preferred embodiment, processors engaged in a distributed computing task transmit results of portions of the computing task via a tree of network switches. Data transmissions comprising computational results from the processors are aggregated and sent to other processors via a broadcast medium. Processors receive information regarding when they should receive data from the broadcast medium and activate receivers accordingly. Results from other processors are then used in computation of further results. | 09-18-2014 |
20140237175 | PARALLEL PROCESSING COMPUTER SYSTEMS WITH REDUCED POWER CONSUMPTION AND METHODS FOR PROVIDING THE SAME - A parallel processing computing system includes an ordered set of m memory banks and a processor core. The ordered set of m memory banks includes a first and a last memory bank, wherein m is an integer greater than 1. The processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function. | 08-21-2014 |
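One way to read the virtual processor selector in this and the related filings is as a round-robin choice over the n virtual processors, with n at least as large as the p pipeline stages so that a virtual processor's previous instruction has left the pipeline before that processor is selected again. The round-robin policy here is an assumption for illustration.

```python
# Sketch: round-robin virtual processor selection over a p-stage pipeline.

def selector_schedule(n_vps, p_stages, cycles):
    # One virtual processor issues into the pipeline per cycle.
    # Requiring n >= p means a VP's prior instruction has cleared all
    # p ordered stages (including the memory operation stage) before
    # that VP is selected again.
    assert n_vps >= p_stages
    return [cycle % n_vps for cycle in range(cycles)]
```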
20130311806 | PARALLEL PROCESSING COMPUTER SYSTEMS WITH REDUCED POWER CONSUMPTION AND METHODS FOR PROVIDING THE SAME - A parallel processing computing system includes an ordered set of m memory banks and a processor core. The ordered set of m memory banks includes a first and a last memory bank, wherein m is an integer greater than 1. The processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function. | 11-21-2013 |
20130226724 | METHODS AND SYSTEMS FOR PRICING COST OF EXECUTION OF A PROGRAM IN A PARALLEL PROCESSING ENVIRONMENT AND FOR AUCTIONING COMPUTING RESOURCES FOR EXECUTION OF PROGRAMS - An automated auction-based method of determining the price to execute one or more candidate programs on a parallel computing system is disclosed. The parallel computing system includes a plurality of computing resources, each having a price per unit of time. For each candidate program, a plurality of executions are performed using different amounts of computing resources. The number of program outputs completed during each execution is measured. A plurality of bids defining a price for completing a desired number of program outputs in a desired amount of time are received. The amount of computing resources required to fulfill each bid is determined. A price per unit of time for the computing resources for each bid is calculated based on the price associated with the bid and the determined amount of computing resources required to fulfill the bid. The bids are fulfilled based on the calculated price per unit of time. | 08-29-2013 |
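The pricing arithmetic in the abstract above can be sketched as follows. The throughput model (program outputs scale linearly with resources) and the bid tuple format are assumptions for illustration.

```python
# Sketch: convert each bid's (price, outputs, deadline) into an implied
# price per resource-unit-time, then rank bids for fulfillment.

def rank_bids(bids, outputs_per_resource_hour):
    """bids: list of (offered price, outputs wanted, hours allowed)."""
    ranked = []
    for price, outputs, hours in bids:
        # Resources needed to hit the required output rate, assuming
        # measured throughput scales linearly with resources.
        resources = outputs / (outputs_per_resource_hour * hours)
        # Price per unit of resource-time implied by the bid.
        per_unit_time = price / (resources * hours)
        ranked.append((per_unit_time, price, outputs, hours))
    # Fulfill the bids paying the most per resource-unit-time first.
    return sorted(ranked, reverse=True)
```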
20130086564 | METHODS AND SYSTEMS FOR OPTIMIZING EXECUTION OF A PROGRAM IN AN ENVIRONMENT HAVING SIMULTANEOUSLY PARALLEL AND SERIAL PROCESSING CAPABILITY - An automated method of optimizing execution of a program in a parallel processing environment is disclosed. The program has a plurality of threads and is executable in parallel and serial hardware. The method includes receiving the program at an optimizer and compiling the program to execute in parallel hardware. The execution of the program is observed by the optimizer to identify a subset of memory operations that execute more efficiently on serial hardware than parallel hardware. A subset of memory operations that execute more efficiently on parallel hardware than serial hardware is also identified. The program is recompiled so that threads that include memory operations that execute more efficiently on serial hardware than parallel hardware are compiled for serial hardware, and threads that include memory operations that execute more efficiently on parallel hardware than serial hardware are compiled for parallel hardware. Subsequent execution of the program occurs using the recompiled program. | 04-04-2013 |
20130061292 | METHODS AND SYSTEMS FOR PROVIDING NETWORK SECURITY IN A PARALLEL PROCESSING ENVIRONMENT - A method of providing network security for executing applications is disclosed. One or more servers including a plurality of microprocessors and a plurality of network processors are provided. A first grouping of microprocessors executes a first application. The first application is executed using the microprocessors in the first grouping. The microprocessors in the first grouping of microprocessors are permitted to communicate with each other via one or more of the network processors. A second grouping of microprocessors executes a second application. At least one server has one or more microprocessors for executing the first application and one or more different microprocessors for executing the second application. The second application is executed using the microprocessors in the second grouping of microprocessors. One or more of the network processors prevent the microprocessors in the first grouping from communicating with the microprocessors in the second grouping during periods of simultaneous execution. | 03-07-2013 |
20130061213 | METHODS AND SYSTEMS FOR OPTIMIZING EXECUTION OF A PROGRAM IN A PARALLEL PROCESSING ENVIRONMENT - An automated method of optimizing execution of a program in a parallel processing environment is described. The program is adapted to execute in data memory and instruction memory. An optimizer receives the program to be optimized. The optimizer instructs the program to be compiled and executed. The optimizer observes execution of the program and identifies a subset of instructions that execute most often. The optimizer also identifies groups of instructions associated with the subset of instructions that execute most often. The identified groups of instructions include the identified subset of instructions that execute most often. The optimizer recompiles the program and stores the identified groups of instructions in instruction memory. The remaining instruction portions of the program are stored in the data memory. The instruction memory has a higher access rate and smaller capacity than the data memory. Once recompiled, subsequent execution of the program occurs using the recompiled program. | 03-07-2013 |
20130054939 | INTEGRATED CIRCUIT HAVING A HARD CORE AND A SOFT CORE - An integrated circuit (IC) is disclosed. The integrated circuit includes a non-reconfigurable multi-threaded processor core that implements a pipeline having n ordered stages, wherein n is an integer greater than 1. The multi-threaded processor core implements a default instruction set. The integrated circuit also includes reconfigurable hardware that implements n discrete pipeline stages of a reconfigurable execution unit. The n discrete pipeline stages of the reconfigurable execution unit are pipeline stages of the pipeline that is implemented by the multi-threaded processor core. | 02-28-2013 |
20130054665 | METHODS AND SYSTEMS FOR PERFORMING EXPONENTIATION IN A PARALLEL PROCESSING ENVIRONMENT - An automated method of performing exponentiation is disclosed. A plurality of tables holding factors for obtaining results of exponentiations are provided. The plurality of tables are loaded into computer memory. Each factor is the result of a second exponentiation of a constant and an exponent. The exponent is related to a memory address corresponding to the factor. A plurality of memory addresses are identified for performing a first exponentiation by breaking up the first exponentiation into equations, the results of which are factors of the first exponentiation. The exponents of the equations are related to the memory addresses corresponding to the factors held in the tables. A plurality of lookups into the computer memory are performed to retrieve the factors held in the tables corresponding to the respective memory addresses. The retrieved factors are multiplied together to obtain the result of the first exponentiation. | 02-28-2013 |
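The table-lookup scheme above reads like fixed-base exponentiation with precomputation: the exponent is broken into digits, each digit indexes a table of precomputed factors, and the retrieved factors are multiplied together. A minimal sketch, assuming base 3, 4-bit digits, and arithmetic modulo a Mersenne prime (all illustrative choices not taken from the filing):

```python
# Sketch: precompute tables of factors BASE**(d * 2**(W*i)) so that
# BASE**e is obtained by digit-indexed lookups and multiplications.

BASE = 3
W = 4                  # bits per exponent digit -> 16 entries per table
MOD = (1 << 61) - 1    # work modulo a prime so factors stay bounded
NUM_TABLES = 4         # supports exponents up to 2**(W * NUM_TABLES) - 1

# TABLES[i][d] holds the factor for digit value d at digit position i;
# the digit value acts as the "memory address" selecting the factor.
TABLES = [
    [pow(BASE, d << (W * i), MOD) for d in range(1 << W)]
    for i in range(NUM_TABLES)
]

def table_exp(e):
    # Break e into W-bit digits, look up each digit's factor, multiply.
    result = 1
    for i in range(NUM_TABLES):
        digit = (e >> (W * i)) & ((1 << W) - 1)
        result = (result * TABLES[i][digit]) % MOD
    return result
```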
20120311353 | PARALLEL PROCESSING COMPUTER SYSTEMS WITH REDUCED POWER CONSUMPTION AND METHODS FOR PROVIDING THE SAME - A computing system is provided that includes a web page search node including a web page collection, a web server, and a search page returner. | 12-06-2012 |
20100241938 | SYSTEM AND METHOD FOR ACHIEVING IMPROVED ACCURACY FROM EFFICIENT COMPUTER ARCHITECTURES - This invention provides a system and method that can employ a low-instruction-per-second (lower-power), highly parallel processor architecture to perform the low-precision computations, which are aggregated at high precision by an aggregator. Either a high-precision processor arrangement or a low-precision processor arrangement employing software-based high-precision program instructions performs the less-frequent, generally slower high-precision computations on the aggregated, more-frequent low-precision computations. One final aggregator totals all low-precision computations and another high-precision aggregator totals all high-precision computations. An equal number of low-precision computations are used to generate the error value that is subtracted from the low-precision average. A plurality of lower-power processors can be arrayed to provide the low-precision computation function. Alternatively, a plurality of SIMD processors can be used to alternately conduct low-precision computations for a predetermined number of operations and high-precision operations on a smaller number of operations. In an embodiment, aggregation can include summing values within predetermined ranges of orders of magnitude, via an adding-tree arrangement, so that significant digits therebetween are preserved. | 09-23-2010 |
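The order-of-magnitude aggregation mentioned in the last sentence of the abstract above can be sketched as follows; bucketing by decimal exponent and combining the smallest-magnitude buckets first are illustrative choices, not details from the filing.

```python
# Sketch: group values by order of magnitude and sum each bucket
# separately before combining, so small values are not swamped by
# large ones mid-sum and their significant digits are preserved.

import math

def bucketed_sum(values):
    buckets = {}
    for v in values:
        # Bucket index is the value's decimal order of magnitude.
        m = 0 if v == 0 else int(math.floor(math.log10(abs(v))))
        buckets[m] = buckets.get(m, 0.0) + v
    # Combine bucket totals smallest magnitude first.
    return sum(buckets[m] for m in sorted(buckets))
```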
20090083263 | PARALLEL PROCESSING COMPUTER SYSTEMS WITH REDUCED POWER CONSUMPTION AND METHODS FOR PROVIDING THE SAME - This invention provides a computer system architecture and method for providing the same which can include a web page search node including a web page collection. The system and method can also include a web server configured to receive, from a given user via a web browser, a search query including keywords. The node is caused to search pages in its own collection that best match the search query. A search page returner may be provided which is configured to return, to the user, high ranked pages. The node may include a power-efficiency-enhanced processing subsystem, which includes M processors. The M processors are configured to emulate N virtual processors, and they are configured to limit a virtual processor memory access rate at which each of the N virtual processors accesses memory. The memory accessed by each of the N virtual processors may be RAM. In select embodiments, the memory accessed by each of the N virtual processors includes DRAM having a high capacity yet lower power consumption than SRAM. | 03-26-2009 |