Patent application number | Description | Published |
20090010221 | METHOD OF MULTIPLE-INPUT-MULTIPLE-OUTPUT WIRELESS COMMUNICATION - Embodiments of the present invention provide a method for selectively modulating a data frame of a signal using either a frequency-multiplexing modulation method or a spatial-multiplexing modulation method based on a predetermined criterion. | 01-08-2009 |
20090116401 | METHOD AND DEVICE OF ADAPTIVE CONTROL OF DATA RATE, FRAGMENTATION AND REQUEST TO SEND PROTECTION IN WIRELESS NETWORKS - A method and device for adaptive control of transmission parameters such as data rate, fragmentation and request to send protection. Packet error rates for frames transmitted with and without request to send protection are computed and compared to determine whether error rates are attributable to noise or to collision. If error rates are attributable to noise, data rates may be adjusted and fragmentation may be activated. If error rates are attributable to collisions, request to send protection may be activated or adjusted. (An illustrative sketch of this decision logic appears after this table.) | 05-07-2009
20090323570 | Techniques for management of shared resources in wireless multi-communication devices - An embodiment of the present invention provides an apparatus, comprising a network adapter configured for wireless communication using more than one technology, and wherein the network adapter is configured to share a plurality of shared hardware components by limiting access to the air to one comm only at a given time by designating one comm that owns the shared hardware components as a primary comm and all other comms as secondary comms, wherein the primary comm allows the secondary comms to use the shared hardware components when it is in an idle state, but when the primary comm returns from the idle state, it claims ownership of the shared resources and the secondary comms release the shared resources. | 12-31-2009
20090327767 | Techniques for distributed management of wireless devices with shared resources between wireless components - An embodiment of the present invention provides an apparatus, comprising a network adapter configured for wireless communication using more than one technology with distributed management, and wherein the network adapter is configured to share a plurality of shared hardware components by automatically turning all other comms OFF when one comm is turned ON. | 12-31-2009
20100321397 | Shared Virtual Memory Between A Host And Discrete Graphics Device In A Computing System - In one embodiment, the present invention includes a device that has a device processor and a device memory. The device can couple to a host with a host processor and host memory. Both of the memories can have page tables to map virtual addresses to physical addresses of the corresponding memory, and the two memories may appear to a user-level application as a single virtual memory space. Other embodiments are described and claimed. | 12-23-2010 |
20110153707 | MULTIPLYING AND ADDING MATRICES - An apparatus and method are described for multiplying and adding matrices. For example, one embodiment of a method comprises decoding, by a decoder in a processor device, a single instruction specifying an m-by-m matrix operation for a set of vectors, wherein each vector represents an m-by-m matrix of data elements and m is greater than one; issuing the single instruction for execution by an execution unit in the processor device; and responsive to the execution of the single instruction, generating a resultant vector, wherein the resultant vector represents an m-by-m matrix of data elements. (A scalar reference sketch appears after this table.) | 06-23-2011
20120057552 | TECHNIQUES FOR MANAGEMENT OF SHARED RESOURCES IN WIRELESS MULTI-COMMUNICATION DEVICES - An embodiment of the present invention provides an apparatus, comprising a network adapter configured for wireless communication using more than one technology, and wherein the network adapter is configured to share a plurality of shared hardware components by limiting access to the air to one comm only at a given time by designating one comm that owns the shared hardware components as a primary comm and all other comms as secondary comms, wherein the primary comm allows the secondary comms to use the shared hardware components when it is in an idle state, but when the primary comm returns from the idle state, it claims ownership of the shared resources and the secondary comms release the shared resources. | 03-08-2012
20120233439 | Implementing TLB Synchronization for Systems with Shared Virtual Memory Between Processing Devices - Page faults arising in a graphics processing unit may be handled by an operating system running on the central processing unit. In some embodiments, this means that unpinned memory can be used for the graphics processing unit. Using unpinned memory in the graphics processing unit may expand the capabilities of the graphics processing unit in some cases. | 09-13-2012 |
20120236010 | Page Fault Handling Mechanism - Page faults arising in a graphics processing unit may be handled by an operating system running on the central processing unit. In some embodiments, this means that unpinned memory can be used for the graphics processing unit. Using unpinned memory in the graphics processing unit may expand the capabilities of the graphics processing unit in some cases. | 09-20-2012 |
20130007406 | DYNAMIC PINNING OF VIRTUAL PAGES SHARED BETWEEN DIFFERENT TYPE PROCESSORS OF A HETEROGENEOUS COMPUTING PLATFORM - A computer system may support one or more techniques to allow dynamic pinning of the memory pages accessed by a non-CPU device (e.g., a graphics processing unit, GPU). The non-CPU may support virtual to physical address mapping and may thus be aware of the memory pages, which may not be pinned but may be accessed by the non-CPU. The non-CPU may notify or send such information to a run-time component such as a device driver associated with the CPU. In one embodiment, the device driver may dynamically perform pinning of such memory pages, which may be accessed by the non-CPU. The device driver may even unpin the memory pages which may no longer be accessed by the non-CPU. Such an approach may allow the memory pages which are no longer accessed by the non-CPU to be available for allocation to the other CPUs and/or non-CPUs. | 01-03-2013
20130027410 | CPU/GPU Synchronization Mechanism - A thread on one processor may be used to enable another processor to lock or release a mutex. For example, a central processing unit thread may be used by a graphics processing unit to secure a mutex for a shared memory. | 01-31-2013 |
20130054899 | A 2-D GATHER INSTRUCTION AND A 2-D CACHE - A processor may support a two-dimensional (2-D) gather instruction and a 2-D cache. The processor may perform the 2-D gather instruction to access one or more sub-blocks of data from a two-dimensional (2-D) image stored in a memory coupled to the processor. The 2-D cache may store the sub-blocks of data in multiple cache lines. Further, the 2-D cache may support access of more than one cache line while preserving a two-dimensional structure of the 2-D image. (A scalar reference sketch appears after this table.) | 02-28-2013
20130249925 | Shared Virtual Memory Between A Host And Discrete Graphics Device In A Computing System - In one embodiment, the present invention includes a device that has a device processor and a device memory. The device can couple to a host with a host processor and host memory. Both of the memories can have page tables to map virtual addresses to physical addresses of the corresponding memory, and the two memories may appear to a user-level application as a single virtual memory space. Other embodiments are described and claimed. | 09-26-2013 |
20130262816 | TRANSLATION LOOKASIDE BUFFER FOR MULTIPLE CONTEXT COMPUTE ENGINE - Some implementations disclosed herein provide techniques and arrangements for a specialized logic engine that includes a translation lookaside buffer to support multiple threads executing on multiple cores. The translation lookaside buffer enables the specialized logic engine to directly access a virtual address of a thread executing on one of the plurality of processing cores. For example, an acceleration compute engine may receive one or more instructions from a thread executed by a processing core. The acceleration compute engine may retrieve, based on an address space identifier associated with the one or more instructions, a physical address associated with the one or more instructions from the translation lookaside buffer to execute the one or more instructions using the physical address. | 10-03-2013
20130268742 | CORE SWITCHING ACCELERATION IN ASYMMETRIC MULTIPROCESSOR SYSTEM - An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code executing on the ASMP is analyzed by a binary analysis unit to determine what functions are called by the program code and select which of the cores are to execute the program code, or a code segment thereof. Selection may be made to provide for native execution of the program code, to minimize power consumption, and so forth. Control operations based on this selection may then be inserted into the program code, forming instrumented program code. The instrumented program code is then executed by the ASMP. | 10-10-2013 |
20130268804 | SYNCHRONOUS SOFTWARE INTERFACE FOR AN ACCELERATED COMPUTE ENGINE - Some implementations disclosed herein provide techniques and arrangements for a synchronous software interface for a specialized logic engine. The synchronous software interface may receive, from a first core of a plurality of cores, a control block including a transaction for execution by the specialized logic engine. The synchronous software interface may send the control block to the specialized logic engine and wait to receive a confirmation from the specialized logic engine that the transaction was successfully executed. | 10-10-2013 |
20130318323 | APPARATUS AND METHOD FOR ACCELERATING OPERATIONS IN A PROCESSOR WHICH USES SHARED VIRTUAL MEMORY - An apparatus and method are described for coupling a front end core to an accelerator component (e.g., such as a graphics accelerator). For example, an apparatus is described comprising: an accelerator comprising one or more execution units (EUs) to execute a specified set of instructions; and a front end core comprising a translation lookaside buffer (TLB) communicatively coupled to the accelerator and providing memory access services to the accelerator, the memory access services including performing TLB lookup operations to map virtual to physical addresses on behalf of the accelerator and in response to the accelerator requiring access to a system memory. | 11-28-2013 |
20140006758 | Extension of CPU Context-State Management for Micro-Architecture State | 01-02-2014 |
20140013333 | CONTEXT-STATE MANAGEMENT - Extended features such as registers and functions within processors are made available to operating systems (OS) using an extended-state driver and by modifying instruction set extensions, such as XSAVE. A map-table designates a correspondence between memory locations for storing data relating to extended features not supported by the OS and called by an application. As a result, applications may utilize processor resources which are unsupported by the OS. | 01-09-2014 |
20140019723 | BINARY TRANSLATION IN ASYMMETRIC MULTIPROCESSOR SYSTEM - An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof, to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution. | 01-16-2014
20140082630 | PROVIDING AN ASYMMETRIC MULTICORE PROCESSOR SYSTEM TRANSPARENTLY TO AN OPERATING SYSTEM - In one embodiment, the present invention includes a multicore processor with first and second groups of cores. The second group can be of a different instruction set architecture (ISA) than the first group, or of the same ISA but with a different power and performance support level, and is transparent to an operating system (OS). The processor further includes a migration unit that handles migration requests for a number of different scenarios and causes a context switch to dynamically migrate a process from a second core of the second group to a first core of the first group. This dynamic hardware-based context switch can be transparent to the OS. Other embodiments are described and claimed. | 03-20-2014
20140095832 | METHOD AND APPARATUS FOR PERFORMANCE EFFICIENT ISA VIRTUALIZATION USING DYNAMIC PARTIAL BINARY TRANSLATION - Methods, apparatus and systems for virtualization of a native instruction set are disclosed. Embodiments include a processor core executing the native instructions and a second core, or alternatively only the second processor core, which consumes less power while executing a second instruction set that excludes portions of the native instruction set. The second core's decoder detects invalid opcodes of the second instruction set. A microcode layer disassembler determines if opcodes should be translated. A translation runtime environment identifies an executable region containing an invalid opcode, other invalid opcodes and interjacent valid opcodes of the second instruction set. An analysis unit determines an initial machine state prior to execution of the invalid opcode. A partial translation of the executable region that includes encapsulations of the translations of invalid opcodes and state recoveries of the machine states is generated and saved to a translation cache memory. | 04-03-2014
20140129808 | MIGRATING TASKS BETWEEN ASYMMETRIC COMPUTING ELEMENTS OF A MULTI-CORE PROCESSOR - In one embodiment, the present invention includes a multicore processor having first and second cores to independently execute instructions, the first core visible to an operating system (OS) and the second core transparent to the OS and heterogeneous from the first core. A task controller, which may be included in or coupled to the multicore processor, can cause dynamic migration of a first process scheduled by the OS to the first core to the second core transparently to the OS. Other embodiments are described and claimed. | 05-08-2014 |
20140156942 | Shared Virtual Memory Between A Host And Discrete Graphics Device In A Computing System - In one embodiment, the present invention includes a device that has a device processor and a device memory. The device can couple to a host with a host processor and host memory. Both of the memories can have page tables to map virtual addresses to physical addresses of the corresponding memory, and the two memories may appear to a user-level application as a single virtual memory space. Other embodiments are described and claimed. | 06-05-2014 |
20140304559 | AGGREGATED PAGE FAULT SIGNALING AND HANDLING - A processor of an aspect includes an instruction pipeline to process a multiple memory address instruction that indicates multiple memory addresses. The processor also includes multiple page fault aggregation logic coupled with the instruction pipeline. The multiple page fault aggregation logic is to aggregate page fault information for multiple page faults that are each associated with one of the multiple memory addresses of the instruction. The multiple page fault aggregation logic is to provide the aggregated page fault information to a page fault communication interface. Other processors, apparatus, methods, and systems are also disclosed. | 10-09-2014
20140325184 | MECHANISM FOR SAVING AND RETRIEVING MICRO-ARCHITECTURE CONTEXT - A processor saves micro-architectural contexts to increase the efficiency of code execution and power management. Power management hardware during runtime monitors execution of a code block. The code block has been compiled to have a reserved space appended to one end of the code block. The reserved space includes a metadata block associated with the code block or an identifier of the metadata block. The hardware stores a micro-architectural context of the processor in the metadata block. The micro-architectural context includes performance data resulting from a first execution of the code block. The hardware reads the metadata block upon a second execution of the code block and tunes the second execution based on the performance data. | 10-30-2014 |
20140344815 | CONTEXT SWITCHING MECHANISM FOR A PROCESSING CORE HAVING A GENERAL PURPOSE CPU CORE AND A TIGHTLY COUPLED ACCELERATOR - An apparatus is described having multiple cores, each core having: a) an accelerator; and, b) a general purpose CPU coupled to the accelerator. The general purpose CPU has functional unit logic circuitry to execute an instruction that returns an amount of storage space to store context information of the accelerator. | 11-20-2014 |
20140359629 | MECHANISM FOR ISSUING REQUESTS TO AN ACCELERATOR FROM MULTIPLE THREADS - An apparatus is described having multiple cores, each core having: a) a CPU; b) an accelerator; and, c) a controller and a plurality of order buffers coupled between the CPU and the accelerator. Each of the order buffers is dedicated to a different one of the CPU's threads. Each one of the order buffers is to hold one or more requests issued to the accelerator from its corresponding thread. The controller is to control issuance of the order buffers' respective requests to the accelerator. | 12-04-2014 |
20140365747 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING A HORIZONTAL PARTIAL SUM IN RESPONSE TO A SINGLE INSTRUCTION - Embodiments of systems, apparatuses, and methods for performing, in a computer processor, a vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described. (A scalar reference sketch appears after this table.) | 12-11-2014
20150178217 | 2-D Gather Instruction and a 2-D Cache - A processor may support a two-dimensional (2-D) gather instruction and a 2-D cache. The processor may perform the 2-D gather instruction to access one or more sub-blocks of data from a 2-D image stored in a memory coupled to the processor. The 2-D cache may store the sub-blocks of data in multiple cache lines. Further, the 2-D cache may support access of more than one cache line while preserving a 2-D structure of the 2-D image. | 06-25-2015
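Several of the abstracts above are easier to follow with a small reference sketch. The first concerns the adaptive control scheme of 20090116401: packet error rates observed with and without RTS/CTS protection are compared to decide whether losses come from noise or from collisions. The C sketch below is a minimal, hypothetical rendering of that decision logic; the structure and field names (per_with_rts, error_threshold, data_rate_index, and so on) are illustrative assumptions, not taken from the application.

```c
/* Illustrative decision logic for the scheme described in 20090116401.
 * Names such as per_with_rts and error_threshold are hypothetical. */
#include <stdbool.h>

struct link_stats {
    double per_with_rts;     /* packet error rate of frames sent with RTS/CTS    */
    double per_without_rts;  /* packet error rate of frames sent without RTS/CTS */
};

struct link_params {
    int  data_rate_index;    /* index into the supported rate set */
    bool fragmentation_on;
    bool rts_protection_on;
};

/* If errors persist even under RTS protection, they are attributed to noise,
 * so the rate is lowered and fragmentation enabled; otherwise they are
 * attributed to collisions and RTS protection is (re)activated. */
void adapt_link(const struct link_stats *s, struct link_params *p,
                double error_threshold)
{
    bool errors_with_rts    = s->per_with_rts    > error_threshold;
    bool errors_without_rts = s->per_without_rts > error_threshold;

    if (errors_with_rts) {                 /* RTS did not help -> noise */
        if (p->data_rate_index > 0)
            p->data_rate_index--;          /* fall back to a more robust rate */
        p->fragmentation_on = true;        /* shorter fragments survive noise bursts */
    } else if (errors_without_rts) {       /* only unprotected frames fail -> collisions */
        p->rts_protection_on = true;
    }
}
```

The underlying intuition is that RTS/CTS removes most collision losses, so errors that persist under protection are attributed to the channel itself rather than to contention.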
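For 20110153707, the single instruction operates on vectors that each encode an m-by-m matrix. A plain scalar version of what such an instruction could compute is sketched below; the row-major packing and the accumulate into the destination are assumptions for illustration only.

```c
/* Scalar reference semantics for a single-instruction matrix multiply-add,
 * in the spirit of 20110153707. Each of a, b and c packs an m-by-m matrix
 * row-major into a vector of m*m elements (an assumed layout). */
void matrix_multiply_add(const float *a, const float *b, float *c, int m)
{
    /* c[i][j] += sum_k a[i][k] * b[k][j] */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++) {
            float acc = c[i * m + j];
            for (int k = 0; k < m; k++)
                acc += a[i * m + k] * b[k * m + j];
            c[i * m + j] = acc;
        }
}
```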
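For the 2-D gather instruction of 20130054899 and 20150178217, the observable effect can be pictured as copying a rectangular sub-block out of a row-major image. The parameter names and layout below are illustrative assumptions; the applications describe a hardware instruction backed by a 2-D cache, not this software loop.

```c
/* Scalar reference for what a 2-D gather might return: copy a w-by-h
 * sub-block starting at (x, y) out of a row-major 2-D image. */
#include <stdint.h>

void gather_2d(const uint8_t *image, int image_stride,
               int x, int y, int w, int h, uint8_t *out)
{
    for (int row = 0; row < h; row++)
        for (int col = 0; col < w; col++)
            out[row * w + col] = image[(y + row) * image_stride + (x + col)];
}
```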
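For 20140365747, one plausible reading of a packed horizontal partial sum is a prefix sum across the source elements, so that destination element i holds the sum of source elements 0 through i. The scalar sketch below illustrates that reading; the element width, element count, and exact grouping of the partial sums are assumptions.

```c
/* Scalar sketch of a packed horizontal partial (prefix) sum:
 * dst[i] = src[0] + ... + src[i]. Element type is an assumption. */
#include <stdint.h>

void horizontal_partial_sum(const int32_t *src, int32_t *dst, int n)
{
    int32_t running = 0;
    for (int i = 0; i < n; i++) {
        running += src[i];
        dst[i] = running;
    }
}
```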
Patent application number | Description | Published |
20080270763 | Device and Method for Processing Instructions - A method and a device for processing instructions. The device includes a pipelined processor, an instruction memory unit and a register file, wherein the pipelined processor includes a write-back unit and an execution unit. The device is characterized by including a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction, and to determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, a second register group size information and a second register identification information. The second register group size information and the second register identification information define a second group of target registers associated with a second instruction, and the second instruction is provided to the pipelined processor before the first instruction. | 10-30-2008
20080281778 | Hardware Accelerator Based Method and Device for String Searching - A method for searching within a data block for a data chunk having a predefined value, the method includes: fetching, by a processor, a data block search instruction; fetching a data unit that includes multiple data chunks, wherein at least one data chunk within the data unit belongs to the data block; deciding whether to use a mask for data chunk level masking; searching, by a hardware accelerator, for a valid data chunk within the fetched data unit that has the predefined value, wherein the searching comprises applying a mask and wherein a valid data chunk is a non-masked data chunk that belongs to the data block; and determining whether to update the value of the mask and whether to fetch a new data unit that belongs to the data block. | 11-13-2008
20090006822 | Device and Method for Adding and Subtracting Two Variables and a Constant - A device and a method. The method includes: fetching an instruction; decoding the instruction, which includes an instruction type field, a first variable field, a second variable field, a result field and a constant field; selecting an operation out of an addition operation, a subtraction operation and another type of operation, in response to the content of the instruction type field; determining, in response to the value of the constant field, whether the result of the selected operation is responsive to the first and second variables or is responsive to the first variable, the second variable and the constant; and executing the selected operation, during a single instruction execution cycle, to provide the result. | 01-01-2009
20090070555 | DEVICE AND METHOD FOR FINDING EXTREME VALUES IN A DATA BLOCK - A method for locating an extreme value data chunk within a data block, the method includes: fetching, by a processor, an instruction; fetching, in response to a content of the instruction, a data unit that comprises multiple data chunks; selectively masking the fetched data chunks in response to a value of a mask; comparing, by a hardware accelerator, values of valid data chunks to provide an extreme value data chunk, wherein valid data chunks include un-masked data chunks that belong to the data block; and updating the value of the mask and jumping to the stage of fetching a new data unit, until the whole data block is fetched. (A software analogue appears after this table.) | 03-12-2009
20100223444 | METHOD FOR PERFORMING PLURALITY OF BIT OPERATIONS AND A DEVICE HAVING PLURALITY OF BIT OPERATIONS CAPABILITIES - A method and a device having a plurality of bit operations capability, the device includes: a first register, a second register, an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and multiply the position value by a multiplication factor to provide a first result and alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle. (A brief software sketch appears after this table.) | 09-02-2010
20120155570 | DEVICE AND METHOD FOR PERFORMING BITWISE MANIPULATIONS - A device and a method for performing bitwise manipulation are provided. Multiple bitwise logic circuits are coupled to an instruction decoder, a register array and a rotator. Each bitwise logic circuit includes input multiplexers connected to an output multiplexer. The instruction decoder receives a bit manipulation instruction and sends to each corresponding input multiplexer a control signal based on a type of the instruction. Each input multiplexer of each bitwise logic circuit receives a control signal, a constant signal that has a value that is indifferent to the value of the mask, and a mask affected signal that has a value that is responsive to a value of an associated mask bit. Each input multiplexer selects between the constant signal and the mask affected signal based on the control signal, and outputs a selected signal. Each output multiplexer receives selected signals from each of the corresponding input multiplexers, and selects between the selected signals based on a value of an associated manipulated register bit and based on a value of an associated control register bit. | 06-21-2012
20120185730 | DISTRIBUTED DEBUG SYSTEM - A distributed debug system including processing elements connected to perform a plurality of processing functions on a received data unit, a debug trap unit, a debug trace dump logic unit, and a debug initiator unit is provided. At least two of the processing elements include a debug trap unit that has a first debug enable input and output, and a first debug thread. The first debug thread holds at least a first debug trap circuit having a match signal output connected to the first debug enable output. The first debug trap circuit filters a part of the data unit, compares a filtering result with a debug value, and provides a match signal to the match signal output. The debug trace dump logic unit dumps debug trace data to a buffer associated with the data unit on reception of a match event. The debug initiator unit includes a debug initiator output connected to the first debug enable input of the debug trap unit of one processing element, and a debug initiator input connected to the first debug enable output of the debug trap unit of another processing element. | 07-19-2012 |
20150234419 | METHODS AND APPARATUS FOR ADAPTIVE TIME KEEPING FOR MULTIPLE TIMERS - A timer distribution module supports multiple timers and comprises: a command decoder arranged to determine expiration times of a plurality of timers; and a timer link list distribution adapter (LLDA) operably coupled to the command decoder. The LLDA is arranged to: receive a time reference from a master clock; receive timer data from the command decoder, wherein the timer data comprises at least one timer expiration link list; construct a plurality of timer link lists based on at least one of the timer expiration link list and at least one configurable timing barrier; dynamically split the link list timer data into a plurality of granularities based on the timer expiration link list; and output the dynamically split link list timer data. | 08-20-2015
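As a rough software analogue of the masked extreme-value search in 20090070555 (the string-search flow in 20080281778 follows the same masked-chunk pattern), the loop below scans only the un-masked chunks of a fetched data unit and keeps the maximum. The chunk width, the choice of maximum as the extreme, and the bit-set-means-valid mask convention are assumptions for illustration; the applications describe a hardware accelerator, not this loop.

```c
/* Masked extreme-value search over a data unit of up to 32 chunks.
 * A set bit in valid_mask marks a chunk as valid (assumed convention). */
#include <stdint.h>

int32_t find_extreme(const int32_t *chunks, int count, uint32_t valid_mask)
{
    int32_t extreme = INT32_MIN;           /* identity for a maximum search */
    for (int i = 0; i < count; i++)        /* count is assumed to be <= 32  */
        if (valid_mask & (1u << i))        /* skip masked-out chunks        */
            if (chunks[i] > extreme)
                extreme = chunks[i];
    return extreme;
}
```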
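For 20100223444, the single-cycle operation finds the position of a set bit in an information vector, scales that position by a multiplication factor, and clears the bit so that a subsequent pass finds the next one. The C sketch below mimics that behavior in software; the LSB-first search order and the use of GCC's __builtin_ctz are illustrative assumptions.

```c
/* Find the lowest set bit, return its position scaled by a factor, and
 * clear that bit in place. Assumes *vector has at least one set bit
 * (ctz is undefined for zero). */
#include <stdint.h>

uint32_t first_bit_times_factor(uint32_t *vector, uint32_t factor)
{
    uint32_t pos = (uint32_t)__builtin_ctz(*vector);  /* position of lowest set bit */
    *vector &= *vector - 1;                           /* clear that bit             */
    return pos * factor;                              /* scaled position            */
}
```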