Patent application number | Description | Published |
20080266151 | Method of CABAC Coefficient Magnitude and Sign Decoding Suitable for Use on VLIW Data Processors - This invention decodes coefficient magnitudes in compressed video data using a selected context and speculatively decodes a coefficient sign. The next context selection depends upon a number of iterations. This invention confirms the speculatively decoded coefficient sign upon completion of the magnitude decode. This invention operates in a loop until reaching the number of significant coefficients within the block. The method exits the loop and decodes an escape code if an iteration count is greater than a predetermined number. An embodiment of this invention collects both a count up and a count down in an escape code decode in one loop. An embodiment of this invention estimates the number of significant coefficients in a block and selects the inventive or a prior art decode. | 10-30-2008 |
20080267513 | Method of CABAC Significance MAP Decoding Suitable for Use on VLIW Data Processors - This invention decodes a next significance symbol using a selected context. The invention operates in a loop for each symbol decode for a whole block until the number of decoded map elements reaches a maximum number of coefficients for the block type or a last significant coefficient marker is decoded, updating loop variables accordingly. This invention counts the number of decoded significance symbols indicating a significant coefficient and stores the locations of such significant coefficients in an array. An embodiment of this invention estimates the number of significant coefficients in a block and selects the inventive method or a prior art decode method. | 10-30-2008 |
20090006037 | Accurate Benchmarking of CODECS With Multiple CPUs - An accurate and simple benchmarking method for multiple processor systems. Instead of a central timer as used in the prior art, a counter is implemented in each processor that counts the processor's clock cycles. The counter may be read after the processor completes a benchmark task. This eliminates the timing skew common in the prior art. | 01-01-2009 |
20090006664 | Linked DMA Transfers in Video CODECS - A new mechanism submits multiple DMA requests that are becoming more common in the newer video codec standards. This feature improves system performance and allows bus accesses to be more efficient. An artificial burst of traffic is created by aggregating multiple requests, which would normally be distributed over time, so that they are more localized in time. | 01-01-2009 |
20090006665 | Modified Memory Architecture for CODECS With Multiple CPUs - The solution proposed in this invention is a nearest neighborhood access protocol, where not every processor is given access to every other memory block. It is shown by analyzing the pipeline that it is adequate to have no more than two masters (CPUs) in particular and 3 CPUs in general. In the case of the 2 CPU approach, one of these CPUs is a producer and the other CPU is a consumer. In the 3 CPU case the third owner may be a DMA channel. | 01-01-2009 |
20110280314 | SLICE ENCODING AND DECODING PROCESSORS, CIRCUITS, DEVICES, SYSTEMS AND PROCESSES - A video decoder includes a memory ( | 11-17-2011 |
20110310966 | SYNTAX ELEMENT DECODING - Techniques for efficient syntax element decoding in a system employing context-based adaptive binary arithmetic decoding are disclosed herein. In some embodiments, a video decoding system includes a context-based adaptive binary arithmetic code (“CABAC”) decoder. The decoder includes a processor and decode logic executed by the processor. The decode logic is configured to decompress a CABAC encoded syntax element. The decode logic includes a table embodying a set of rules that determine whether syntax element decoding is complete based on table addressing derived from a decoded syntax element binary value. | 12-22-2011 |
20110317762 | VIDEO ENCODER AND PACKETIZER WITH IMPROVED BANDWIDTH UTILIZATION - Techniques for managing a video encoding pipeline are disclosed herein. In one embodiment, a video encoder includes a multi-stage encoding pipeline. The pipeline includes an entropy coding engine and a transform engine. The entropy coding engine is configured to, in a first pipeline cycle, entropy encode a transformed first macroblock and determine that a predetermined slice size will be exceeded by adding the entropy encoded macroblock to a slice. The transform engine is configured to provide a transformed macroblock to the entropy coding engine. The transform engine is also configured to determine, in a third pipeline cycle, coding and prediction mode to apply to the first macroblock, based on the entropy coding engine determining, in the first pipeline cycle, that the predetermined slice size will be exceeded by adding the encoded macroblock to a slice. | 12-29-2011 |
20120017067 | ON-DEMAND PREDICATE REGISTERS - In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The register set includes a plurality of legacy predicate registers. Separate from the legacy predicate registers, a plurality of on-demand predicate registers are selectively signaled without changing the opcode space for the DSP. | 01-19-2012 |
20120066415 | METHODS AND SYSTEMS FOR DIRECT MEMORY ACCESS (DMA) IN-FLIGHT STATUS - In accordance with at least some embodiments, a system includes a processing entity configured to run multiple threads. The system also includes a direct memory access (DMA) engine coupled to the processing entity, the DMA engine being configured to track DMA in-flight status information for each of a plurality of DMA channels. The processing entity is configured to manage overlapping DMA requests to a DMA channel of the DMA engine based on said DMA in-flight status information. | 03-15-2012 |
20120117360 | DEDICATED INSTRUCTIONS FOR VARIABLE LENGTH CODE INSERTION BY A DIGITAL SIGNAL PROCESSOR (DSP) - In accordance with at least some embodiments, a digital signal processor (DSP) includes an instruction fetch unit and an instruction decode unit in communication with the instruction fetch unit. The DSP also includes a register set and a plurality of work units in communication with the instruction decode unit. The DSP selectively uses a dedicated insert instruction to insert a variable number of bits into a register. | 05-10-2012 |
20130185538 | PROCESSOR WITH INTER-PROCESSING PATH COMMUNICATION - A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage. The instruction stream includes scalar instructions executable by the scalar processor core and vector instructions executable by the vector coprocessor core. The scalar processor core is configured to pass the vector instructions to the vector coprocessor core. The vector coprocessor core is configured to process a plurality of data values in parallel while executing each vector instruction passed by the scalar processor core. The vector coprocessor core includes a plurality of processing paths arranged in parallel to process the data values. Each of the processing paths includes an execution unit. Each of the execution units is configured to communicate a result of processing to each other of the execution units. | 07-18-2013 |
20130185539 | PROCESSOR WITH TABLE LOOKUP AND HISTOGRAM PROCESSING UNITS - A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop. | 07-18-2013 |
20130185540 | PROCESSOR WITH MULTI-LEVEL LOOPING VECTOR COPROCESSOR - A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core includes a program memory interface through which the scalar processor retrieves instructions from a program memory. The instructions include scalar instructions executable by the scalar processor and vector instructions executable by the vector coprocessor core. The vector coprocessor core includes a plurality of execution units and a vector command buffer. The vector command buffer is configured to decode vector instructions passed by the scalar processor core, to determine whether vector instructions defining an instruction loop have been decoded, and to initiate execution of the instruction loop by one or more of the execution units based on a determination that all of the vector instructions of the instruction loop have been decoded. | 07-18-2013 |
20130185544 | PROCESSOR WITH INSTRUCTION VARIABLE DATA DISTRIBUTION - A vector processor includes a plurality of execution units arranged in parallel, a register file, and a plurality of load units. The register file includes a plurality of registers coupled to the execution units. Each of the load units is configured to load, in a single transaction, a plurality of the registers with data retrieved from memory. The loaded registers correspond to different execution units. Each of the load units is configured to distribute the data to the registers in accordance with an instruction selectable distribution. The instruction selectable distribution specifies one of a plurality of distributions. Each of the distributions specifies a data sequence that differs from the sequence in which the data is stored in memory. | 07-18-2013 |
20140355893 | VECTOR PROCESSOR CALCULATION OF LOCAL BINARY PATTERNS - A method (and system) of determining a local binary pattern in an image includes selecting an orientation. For each pixel in the image, the method further includes determining a binary decision for each such pixel relative to one neighboring pixel of the orientation, selecting a new orientation, and repeating the determination of the binary decision for each pixel in the image relative to one neighboring pixel of the newly selected orientation. | 12-04-2014 |
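
The orientation-by-orientation procedure described in the last entry (20140355893) can be illustrated with a minimal sketch. This is not the patented implementation. The eight-neighbor ordering, the `neighbor >= center` comparison rule, the treatment of out-of-image neighbors as 0, and the names `ORIENTATIONS` and `local_binary_patterns` are all assumptions made for illustration.

```python
# Eight neighbor offsets (dy, dx), one per orientation.
# The ordering, and therefore the bit assignment, is an assumption.
ORIENTATIONS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                (1, 1), (1, 0), (1, -1), (0, -1)]


def local_binary_patterns(image):
    """Return one LBP code per pixel, built one orientation at a time.

    `image` is a list of equal-length rows of integer pixel values.
    """
    height, width = len(image), len(image[0])
    codes = [[0] * width for _ in range(height)]
    # Select an orientation, make the binary decision for every pixel
    # against its single neighbor in that orientation, then move on to
    # the next orientation and repeat.
    for bit, (dy, dx) in enumerate(ORIENTATIONS):
        for y in range(height):
            for x in range(width):
                ny, nx = y + dy, x + dx
                # Assumption: neighbors outside the image count as 0.
                inside = 0 <= ny < height and 0 <= nx < width
                neighbor = image[ny][nx] if inside else 0
                # Assumed decision rule: set the bit when the neighbor
                # is at least as bright as the pixel itself.
                if neighbor >= image[y][x]:
                    codes[y][x] |= 1 << bit
    return codes
```

Because every pixel in a pass is compared against the same relative neighbor, the two inner loops perform identical work per pixel. That uniformity is what makes the per-orientation ordering a plausible fit for a vector processor, whereas a per-pixel ordering would visit eight different neighbors for each pixel in turn.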