Patent application number | Description | Published |
20090106717 | MULTITHREADED STATIC TIMING ANALYSIS - A method and apparatus for executing a multithreaded algorithm to provide static timing analysis of a chip design includes analyzing a chip design to identify various components and nodes associated with the components. A node tree is built with a plurality of nodes. The node tree identifies groups of nodes that are available in different levels. A size of node grouping for a current level is determined by looking up the node tree. Testing data for parallel processing of different sizes of node groupings using varied thread counts is compiled. An optimum thread count for the current level based on the size of node grouping in the node tree is identified from the compiled testing data. Dynamic parallel processing of nodes in the current level is performed using the number of threads identified by the optimum thread count. An acceptable design of the chip is determined by the dynamic parallel processing. | 04-23-2009 |
20090300643 | USING HARDWARE SUPPORT TO REDUCE SYNCHRONIZATION COSTS IN MULTITHREADED APPLICATIONS - A processor configured to synchronize threads in multithreaded applications. The processor includes first and second registers. The processor stores a first bitmask in the first register and a second bitmask in the second register. For each bitmask, each bit corresponds with one of multiple threads. A given bit in the first bitmask indicates the corresponding thread has been assigned to execute a portion of a unit of work. A corresponding bit in the second bitmask indicates the corresponding thread has completed execution of its assigned portion of the unit of work. The processor receives updates to the second bitmask in the second register and provides an indication that the unit of work has been completed in response to detecting that for each bit in the first bitmask that corresponds to a thread that is assigned work, a corresponding bit in the second bitmask indicates its corresponding thread has completed its assigned work. | 12-03-2009 |
20130024647 | CACHE BACKED VECTOR REGISTERS - A processor, method, and medium for utilizing a shared cache to store vector registers. Each thread of a multithreaded processor utilizes a plurality of virtual vector registers to perform vector operations. Virtual vector registers are allocated for each thread, and each virtual vector register is mapped into the shared cache on the processor. The cache is shared between multiple threads such that if one thread is not using vector registers, there is more space in the cache for other threads to use vector registers. | 01-24-2013 |
20130024653 | ACCELERATION OF STRING COMPARISONS USING VECTOR INSTRUCTIONS - A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found. | 01-24-2013 |
20130024654 | VECTOR OPERATIONS FOR COMPRESSING SELECTED VECTOR ELEMENTS - A processor, method, and medium for using vector operations to compress selected elements of a vector. An input vector is compared to a criteria vector, and then a subset of the plurality of elements of the input vector is selected based on the comparison. A permutation vector is generated based on the locations of the selected elements, and then the permutation vector is used to permute the selected elements of the input vector to an output vector. The selected elements of the input vector are stored in contiguous locations in the leftmost elements of the output vector. Then, the output vector is stored to memory and a pointer to the memory location is incremented by the number of selected elements. | 01-24-2013 |
20130036276 | INSTRUCTIONS TO SET AND READ MEMORY VERSION INFORMATION - Systems and methods for providing additional instructions for supporting efficient memory corruption detection in a processor. A physical memory may be a DRAM with a spare bank of memory reserved for a hardware failover mechanism. Version numbers associated with data structures allocated in the memory may be generated so that version numbers of adjacent data structures are different. A processor determines that a fetched instruction is a memory access instruction corresponding to a first data structure within the memory. For instructions that are not version update instructions, the processor compares the first version number with a second version number stored in a location in the memory indicated by the generated address, and flags an error if there is a mismatch. For version update instructions, the processor performs a memory access operation on the second version number with no comparison check. | 02-07-2013 |
20130036332 | MAXIMIZING ENCODINGS OF VERSION CONTROL BITS FOR MEMORY CORRUPTION DETECTION - Systems and methods for maximizing a number of available states for a version number used for memory corruption detection. A physical memory may be a DRAM comprising a plurality of regions. Version numbers associated with data structures allocated in the physical memory may be generated so that version numbers of adjacent data structures in a virtual address space are different. A reserved set and an available set of version numbers are associated with each one of the plurality of regions. A version number in a reserved set of a given region may be in an available set of another region. The processor detects no memory corruption error in response to at least determining that a version number stored in a memory location in a first region identified by a memory access operation is also in a reserved set associated with the first region. | 02-07-2013 |
20140115283 | BLOCK MEMORY ENGINE WITH MEMORY CORRUPTION DETECTION - Techniques for handling version information using a copy engine. In one embodiment, an apparatus comprises a copy engine configured to perform one or more operations associated with a block memory operation in response to a command. Examples of block memory operations may include copy, clear, move, and/or compress operations. In one embodiment, the copy engine is configured to handle version information associated with the block memory operation based on the command. The one or more operations may include operating on data in a cache and/or modifying entries in a memory. In one embodiment, the copy engine is configured to compare version information in the command with stored version information. The copy engine may overwrite or preserve version information based on the command. The copy engine may be a coprocessing element. The copy engine may be configured to maintain coherency with other copy engines and/or processing elements. | 04-24-2014 |
20150046650 | Flexible Configuration Hardware Streaming Unit - A processor having a streaming unit is disclosed. In one embodiment, a processor includes a streaming unit configured to load one or more input data streams from a memory coupled to the processor. The streaming unit includes an internal network having a plurality of queues configured to store streams of data. The streaming unit further includes a plurality of operations circuits configured to perform operations on the streams of data. The streaming unit is software programmable to operatively couple two or more of the plurality of operations circuits together via one or more of the plurality of queues. The operations circuits may perform operations on multiple streams of data, resulting in corresponding output streams of data. | 02-12-2015 |
20150046687 | Hardware Streaming Unit - A processor having a streaming unit is disclosed. In one embodiment, a processor includes one or more execution units configured to execute instructions of a processor instruction set. The processor further includes a streaming unit configured to execute a first instruction of the processor instruction set, wherein executing the first instruction comprises the streaming unit loading a first data stream from a memory of a computer system responsive to execution of the first instruction. The first data stream comprises a plurality of data elements. The first instruction includes a first argument indicating a starting address of the first stream, a second argument indicating a stride between the data elements, and a third argument indicative of an ending address of the first stream. The streaming unit is configured to output a second data stream corresponding to the first data stream. | 02-12-2015 |
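The thread-count selection in 20090106717 amounts to a lookup from profiled node-group sizes to the thread count that performed best in the compiled testing data. A minimal sketch, not the patented method; the `profile` table shape and nearest-size matching are assumptions for illustration:

```python
def optimum_threads(profile: dict, group_size: int) -> int:
    """Pick the thread count profiled for the tested group size closest
    to the current level's group size.

    profile maps a tested node-group size to the thread count that gave
    the best parallel-processing result for that size.
    """
    nearest = min(profile, key=lambda tested: abs(tested - group_size))
    return profile[nearest]
```

For example, with `profile = {100: 2, 1000: 4, 10000: 8}`, a level containing 1200 nodes would be processed with 4 threads.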
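The two-register scheme in 20090300643 can be modeled in software with two integers used as bitmasks: one bit per thread, set in the first mask on assignment and in the second on completion. A sketch; the class and method names are illustrative, not from the application:

```python
class WorkTracker:
    """Software model of the two hardware bitmask registers."""

    def __init__(self, num_threads: int):
        self.num_threads = num_threads
        self.assigned = 0   # first bitmask: bit i set => thread i assigned a portion
        self.completed = 0  # second bitmask: bit i set => thread i finished its portion

    def assign(self, thread_id: int) -> None:
        assert 0 <= thread_id < self.num_threads
        self.assigned |= 1 << thread_id

    def complete(self, thread_id: int) -> None:
        assert 0 <= thread_id < self.num_threads
        self.completed |= 1 << thread_id

    def unit_done(self) -> bool:
        # Done when every assigned bit has a matching completed bit.
        # The non-zero guard against the vacuous "no work assigned" case
        # is an assumption of this model.
        return self.assigned != 0 and (self.assigned & ~self.completed) == 0
```

The check in `unit_done` is a single mask operation, which is the point of the hardware support: detecting completion costs no locks or per-thread polling.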
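The comparison loop in 20130024653 can be modeled in scalar code that processes `width` bytes per "vector" step and scans for the NUL terminator in the same pass. This sketch assumes both inputs are NUL-terminated byte strings; the function name and chunk width are illustrative:

```python
def vec_strcmp(a: bytes, b: bytes, width: int = 8) -> bool:
    """Return True iff the NUL-terminated strings a and b are equal.

    Each iteration of the outer loop models one vector comparison of
    `width` elements that simultaneously checks for the NUL character.
    """
    i = 0
    while True:
        va = a[i:i + width]
        vb = b[i:i + width]
        # One "vector" step: compare elements and look for NUL together.
        for ca, cb in zip(va, vb):
            if ca != cb:
                return False  # inequality found: strings differ
            if ca == 0:
                return True   # NUL reached with all elements equal
        i += width            # load the next portions of the strings
```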
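The compare / select / permute pipeline of 20130024654 can be modeled on plain Python lists. The real operation is a hardware vector instruction; passing the criterion as a predicate and zero-filling the unselected tail of the output are assumptions of this sketch:

```python
def compress_selected(input_vec: list, criterion) -> tuple:
    """Pack the elements of input_vec that satisfy criterion into the
    leftmost slots of an equal-length output vector.

    Returns (output_vector, count), where count is the amount by which
    the destination pointer would be incremented.
    """
    mask = [criterion(x) for x in input_vec]           # compare step
    perm = [i for i, keep in enumerate(mask) if keep]  # permutation vector
    out = [0] * len(input_vec)                         # zero fill: an assumption
    for dst, src in enumerate(perm):                   # permute step
        out[dst] = input_vec[src]
    return out, len(perm)
```

For example, compressing `[5, 1, 7, 2]` with the criterion "greater than 3" yields `([5, 7, 0, 0], 2)`, and a streaming store would advance its pointer by 2.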
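Both version-tagging applications (20130036276, 20130036332) hinge on the same check: the version number stored with a memory location must match the version the access expects, and adjacent allocations get different versions so an overrun is caught. A toy word-granularity model; the granularity, version width, and API are assumptions:

```python
class VersionedMemory:
    """Toy model: every word carries a version; checked loads flag mismatches."""

    def __init__(self, size: int, num_versions: int = 16):
        self.data = [0] * size
        self.version = [0] * size
        self.num_versions = num_versions
        self.next_version = 1  # version 0 treated as "unallocated" (an assumption)

    def allocate(self, base: int, length: int) -> int:
        v = self.next_version
        # Cycle through 1..num_versions-1 so adjacent allocations differ.
        self.next_version = self.next_version % (self.num_versions - 1) + 1
        for addr in range(base, base + length):
            self.version[addr] = v
        return v  # the version the pointer is expected to carry

    def load(self, addr: int, expected_version: int):
        if self.version[addr] != expected_version:
            raise MemoryError("version mismatch: possible memory corruption")
        return self.data[addr]
```

Reading one word past an allocation's end lands in the neighbor's version and faults, which is the corruption-detection behavior the abstracts describe; a version update instruction would bypass the comparison entirely.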
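The three arguments of the streaming instruction in 20150046687 (start address, stride, ending address) reduce to a strided gather. A one-line functional model, with memory stood in for by a Python list:

```python
def load_stream(memory: list, start: int, stride: int, end: int) -> list:
    """Model the streaming unit's load: gather the data elements at
    start, start + stride, start + 2*stride, ... below end."""
    return [memory[addr] for addr in range(start, end, stride)]
```

For example, streaming from address 0 to 16 with stride 4 over a memory holding `0, 1, 2, ...` yields the elements `[0, 4, 8, 12]`.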