Patent application number | Description | Published |
20130275734 | PACKED DATA OPERATION MASK CONCATENATION PROCESSORS, METHODS, SYSTEMS AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask concatenation instruction. The packed data operation mask concatenation instruction indicates a first source having a first packed data operation mask, indicates a second source having a second packed data operation mask, and indicates a destination. A result is stored in the destination in response to the packed data operation mask concatenation instruction. The result includes the first packed data operation mask concatenated with the second packed data operation mask. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130305020 | VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF - A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams. | 11-14-2013 |
20140149724 | VECTOR FRIENDLY INSTRUCTION FORMAT AND EXECUTION THEREOF - A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams. | 05-29-2014 |
20140289503 | PACKED DATA OPERATION MASK COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - Receive packed data operation mask comparison instruction indicating first packed data operation mask having first packed data operation mask bits and second packed data operation mask having second packed data operation mask bits. Each packed data operation mask bit of first mask corresponds to a packed data operation mask bit of second mask in corresponding position. Modify first flag to first value if bitwise AND of each packed data operation mask bit of first mask with each corresponding packed data operation mask bit of second mask is zero. Otherwise modify first flag to second value. Modify second flag to third value if bitwise AND of each packed data operation mask bit of first mask with bitwise NOT of each corresponding packed data operation mask bit of second mask is zero. Otherwise modify second flag to fourth value. | 09-25-2014 |
Patent application number | Description | Published |
20110153983 | Gathering and Scattering Multiple Data Elements - According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception. | 06-23-2011 |
20120166761 | VECTOR CONFLICT INSTRUCTIONS - A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector. | 06-28-2012 |
20130275719 | PACKED DATA OPERATION MASK SHIFT PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask shift instruction. The packed data operation mask shift instruction indicates a source having a packed data operation mask, indicates a shift count number of bits, and indicates a destination. The method further includes storing a result in the destination in response to the packed data operation mask shift instruction. The result includes a sequence of bits of the packed data operation mask that have been shifted by the shift count number of bits. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130275728 | PACKED DATA OPERATION MASK REGISTER ARITHMETIC COMBINATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130275730 | APPARATUS AND METHOD OF IMPROVED EXTRACT INSTRUCTIONS - An apparatus is described that includes instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction select a first group of input vector elements from one of multiple first non overlapping sections of respective first and second input vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups of the first and third instructions at a first granularity, where, respective resultants produced therewith are respective resultants of the first and third instructions. The masking circuitry is also to mask the first and second groups of the second and fourth instructions at a second granularity, where, respective resultants produced therewith are respective resultants of the second and fourth instructions. | 10-17-2013 |
20130283021 | APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS - An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity. | 10-24-2013 |
20130290254 | INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY - An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity. | 10-31-2013 |
20130290687 | APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS - An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths. | 10-31-2013 |
20130339664 | INSTRUCTION EXECUTION UNIT THAT BROADCASTS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY - An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The first data structure is four times as large as the second data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second instruction to create a second replication data structure. | 12-19-2013 |
20140019732 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING MASK BIT COMPRESSION - Embodiments of systems, apparatuses, and methods for performing in a computer processor mask bit compression in response to a single mask bit compression instruction that includes a source writemask register operand, a destination writemask register operand, and an opcode are described. | 01-16-2014 |
20140059322 | APPARATUS AND METHOD FOR BROADCASTING FROM A GENERAL PURPOSE REGISTER TO A VECTOR REGISTER - An apparatus and method are described for broadcasting from a general purpose source register to a destination vector register. For example, a method according to one embodiment includes the following operations: selecting data element position N within the destination vector register to be updated; broadcasting a set of data from the general purpose source register to data element position N within the destination vector register if a mask indicator is set to a first indication; and either copying zeroes to data element position N within the destination vector register or maintaining existing values stored within data element position N within the destination vector register if the mask indicator is set to a second indication. | 02-27-2014 |
20140068227 | SYSTEMS, APPARATUSES, AND METHODS FOR EXTRACTING A WRITEMASK FROM A REGISTER - Embodiments of systems, apparatuses, and methods for performing in a computer processor mask extraction from a general purpose register in response to a single mask extraction from a general purpose register instruction that includes a source general purpose register operand, a destination writemask register operand, an immediate value, and an opcode are described. | 03-06-2014 |
20140095843 | Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register - Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting. | 04-03-2014 |
20140096119 | LOOP VECTORIZATION METHODS AND APPARATUS - Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop as; and setting the dynamic adjustment value based on the identified dependency. | 04-03-2014 |
20140149802 | Apparatus And Method To Obtain Information Regarding Suppressed Faults - A processor includes an execution unit, a fault mask coupled to the execution unit, and a suppress mask coupled to the execution unit. The fault mask is to store a first plurality of bit values to indicate which elements of a multi-element vector have an associated fault generated in response to execution of an instruction on the element in the execution unit. The suppress mask is to store a second plurality of bit values to indicate which of the elements are to have an associated fault suppressed. The processor also includes counter logic to increment a counter in response to an indication of a first fault associated with the first element and received from the fault mask, and an indication of a first suppression associated with the first element and received from the suppress mask. Other embodiments are described as claimed. | 05-29-2014 |
20140189296 | SYSTEM, APPARATUS AND METHOD FOR LOOP REMAINDER MASK INSTRUCTION - A loop remainder mask instruction indicates a current iteration count of a loop as a first operand, an iteration limit of a loop as a second operand, and a destination. The loop contains iterations and each iteration includes a data element of the array. A processor receives the loop remainder mask instruction, decodes the instruction for execution, and stores a result of the execution in the destination. The result indicates a number of data elements of the array past an end of a preceding portion of the array that are to be handled separately from the preceding portion, the end of the preceding portion being where the current iteration count is recorded. | 07-03-2014 |
20140189307 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT RESOLUTION WITH VECTOR POPULATION COUNT FUNCTIONALITY - Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140189308 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT DETECTION FUNCTIONALITY - Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140223138 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING CONVERSION OF A MASK REGISTER INTO A VECTOR REGISTER. - Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a vector register in response to a single vector packed convert a mask register to a vector register instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described. | 08-07-2014 |
20140281389 | METHODS AND APPARATUS FOR FUSING INSTRUCTIONS TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY ON MULTIPLE TEST SOURCES - Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler. Some embodiments also fuse the novel test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set. | 09-18-2014 |
20140281397 | FUSIBLE INSTRUCTIONS AND LOGIC TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY USING MULTIPLE TEST SOURCES - Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set. | 09-18-2014 |
20140281401 | Systems, Apparatuses, and Methods for Determining a Trailing Least Significant Masking Bit of a Writemask Register - The execution of a KZBTZ finds a trailing least significant zero bit position in an first input mask and sets an output mask to have the values of the first input mask, but with all bit positions closer to the most significant bit position than the trailing least significant zero bit position in an first input mask set to zero. In some embodiments, a second input mask is used as a writemask such that bit positions of the first input mask are not considered in the trailing least significant zero bit position calculation depending upon a corresponding bit position in the second input mask. | 09-18-2014 |
20140344553 | Gathering and Scattering Multiple Data Elements - According to a first aspect, efficient data transfer operations can be achieved by: decoding by a processor device, a single instruction specifying a transfer operation for a plurality of data elements between a first storage location and a second storage location; issuing the single instruction for execution by an execution unit in the processor; detecting an occurrence of an exception during execution of the single instruction; and in response to the exception, delivering pending traps or interrupts to an exception handler prior to delivering the exception. | 11-20-2014 |
20150026439 | APPARATUS AND METHOD FOR PERFORMING PERMUTE OPERATIONS - An apparatus and method are described for permuting data elements with masking. For example, a method according to one embodiment includes the following operations: reading values from a mask data structure to determine whether masking is implemented for each data element of a destination operand; if masking not implemented for a particular data element, then selecting data elements from a first source operand and a second source operand based on index values stored in destination operand to be copied to data element positions within the destination operand, wherein any one of the data elements from either the first source operand and the second source operand may be copied to any one of the data element positions within the destination operand; and if masking is implemented for a particular data element of the destination operand, then performing a designated masking operation with respect to that particular data element. | 01-22-2015 |
20150113246 | INSTRUCTION AND LOGIC TO PROVIDE CONVERSIONS BETWEEN A MASK REGISTER AND A GENERAL PURPOSE REGISTER OR MEMORY - Instructions and logic provide conversions between a mask register and a general purpose register or memory. Some embodiments, responsive to an instruction specifying: a destination operand, a mask length corresponding to a number of mask data fields, and a source operand; values are read from data fields in the source operand, corresponding to the specified mask length, and stored to corresponding data fields in the destination operand specified by the instruction, wherein one of the source or the destination operands is a mask register. Values indicative of masked vector elements may be stored to any data fields in the destination operand other than the number of data fields corresponding to the specified mask length. For some embodiments, the other one of the source or the destination operands may be a general purpose register or a memory location. | 04-23-2015 |
20150261534 | PACKED TWO SOURCE INTER-ELEMENT SHIFT MERGE PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A processor includes a decoder to receive an instruction that indicates first and second source packed data operands and at least one shift count. An execution unit is operable, in response to the instruction, to store a result packed data operand. Each result data element includes a first least significant bit (LSB) portion of a first data element of a corresponding pair of data elements in a most significant bit (MSB) portion, and a second MSB portion of a second data element of the corresponding pair in a LSB portion. One of the first LSB portion of the first data element and the second MSB portion of the second data element has a corresponding shift count number of bits. The other has a number of bits equal to a size of a data element of the first source packed data minus the corresponding shift count. | 09-17-2015 |