MMX In-depth

Finally, we get to talk about MMX in depth!

Before we go any further, a few points are worth mentioning. First of all, we need a good understanding of the characteristics of today's relatively computation-intensive applications: video, audio, graphics, and multimedia applications. There are several things they all have in common:

they all have small data types
they all have regular recurring memory-access patterns
they all try to localize the data they are computing
and, most of all, they all use computation-intensive algorithms!

As we talk more about the MMX instruction set and its syntax, the following topics are unavoidable:

DATA TYPES
REGISTERS
INSTRUCTIONS

DATA TYPES

Unlike the ordinary integer data types (bytes, words, doublewords, and quadwords), the principal data type of the MMX instruction set is the packed fixed-point integer. A packed fixed-point integer is formed from multiple integer elements (bytes, words, or doublewords) that are grouped into a single 64-bit quantity. To take full advantage of the MMX data types, each packed value has to be exactly 64 bits wide. This 64-bit quantity is then moved into one of the eight 64-bit MMX registers (we'll talk about those registers later). The decimal point of the fixed-point values is implicit and is left to the programmer to control, for maximum programming flexibility.

The supported data types are signed and unsigned fixed-point integers, stored as bytes, words, doublewords, and quadwords.

The four data types are:

  1. Packed byte -- Eight bytes packed into one 64-bit quantity
  2. Packed word -- Four 16-bit words packed into one 64-bit quantity
  3. Packed doubleword -- Two 32-bit double words packed into one 64-bit quantity
  4. Quadword -- One 64-bit quantity!!
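To make the four layouts concrete, here is a minimal sketch in Python (not real MMX code). It assumes a register can be modelled as a plain 64-bit integer with element 0 in the least-significant bits; the helper names pack_lanes and unpack_lanes are made up for this illustration.

```python
# Illustrative model only: a 64-bit MMX quantity is a Python int.
# Lane 0 is the least-significant element (an assumption of this sketch).

def pack_lanes(values, lane_bits):
    """Pack 64 // lane_bits unsigned values into one 64-bit quantity."""
    count = 64 // lane_bits
    if len(values) != count:
        raise ValueError(f"expected {count} values of {lane_bits} bits each")
    mask = (1 << lane_bits) - 1
    quad = 0
    for i, value in enumerate(values):
        quad |= (value & mask) << (i * lane_bits)
    return quad


def unpack_lanes(quad, lane_bits):
    """Split a 64-bit quantity back into its unsigned lanes."""
    mask = (1 << lane_bits) - 1
    return [(quad >> (i * lane_bits)) & mask for i in range(64 // lane_bits)]


packed_bytes = pack_lanes([1, 2, 3, 4, 5, 6, 7, 8], 8)            # packed byte
packed_words = pack_lanes([0x1111, 0x2222, 0x3333, 0x4444], 16)   # packed word
packed_dwords = pack_lanes([0xAAAAAAAA, 0xBBBBBBBB], 32)          # packed doubleword
quadword = pack_lanes([0x0123456789ABCDEF], 64)                   # quadword
```

The same 64 bits can be viewed as eight, four, two, or one element; only the instruction suffix decides which view is used.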

Why does packed fixed-point code run faster than ordinary 8-bit scalar code? Let's take the example of scalar code and MMX code processing the same image file. Hopefully you will see the difference. An ordinary scalar operation accesses one data element at a time, that is, 8 bits (one pixel) at a time. MMX code processes pixels at a much faster rate: one MMX operation works on 64 bits at a time, which is 8 data elements per instruction (up to 8 times the throughput)!
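The difference can be sketched in Python, under the assumption that a 64-bit integer stands in for an MMX register. The scalar version touches one pixel per step; the packed version adds all eight bytes with a handful of 64-bit operations while keeping carries from leaking between bytes, which is what a wraparound byte add does in a single instruction. The function names are hypothetical, and this models the idea rather than measuring real speed.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F    # low 7 bits of every byte
HIGH1 = 0x8080808080808080   # top bit of every byte


def add_bytes_scalar(a, b):
    """One pixel at a time: eight separate 8-bit additions."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def add_bytes_packed(qa, qb):
    """All eight bytes at once, with wraparound inside each byte.

    The low 7 bits are added normally (a carry can reach bit 7 but never
    the next byte); bit 7 is then fixed up with XOR.
    """
    return ((qa & LOW7) + (qb & LOW7)) ^ ((qa ^ qb) & HIGH1)


def to_quad(byte_list):
    return int.from_bytes(bytes(byte_list), "little")


def from_quad(quad):
    return list(quad.to_bytes(8, "little"))


pixels_a = [10, 200, 255, 0, 128, 127, 1, 90]
pixels_b = [5, 100, 1, 0, 128, 1, 255, 10]
scalar_sum = add_bytes_scalar(pixels_a, pixels_b)
packed_sum = from_quad(add_bytes_packed(to_quad(pixels_a), to_quad(pixels_b)))
```

Both paths give the same eight results; the packed path simply gets there in one pass over a 64-bit value instead of eight passes over bytes.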

REGISTERS

The MMX data types can be stored in any of the eight new MMX registers, and MMX instructions can access them directly by using the names MM0 through MM7.

INSTRUCTIONS

MMX has this syntax:

    instruction [dest, source]

Most MMX instructions have two operands: a source and a destination. MMX instructions typically use both operands as inputs and write the result to the destination.

The MMX instructions cover several functional areas, including:

Basic arithmetic operations such as add, subtract, multiply, arithmetic shift and multiply-add
Comparison operations
Conversion instructions to convert between the new data types - pack data together, and unpack from small to larger data types
Logical operations such as AND, AND NOT, OR, and XOR
Shift operations
Data Transfer (MOV) instructions for MMX register-to-register transfers, or 64-bit and 32-bit load/store to memory

Arithmetic and logical instructions are designed to support the different packed integer data types. These instructions have a different opcode for each data type supported. As a result, the new MMX technology instructions are implemented with 57 opcodes.

(more RISC stuff!) MMX uses general-purpose, basic instructions that are fast and easily assigned to the parallel pipelines in Intel processors. By using this general-purpose approach, MMX technology provides performance that will scale well across current and future generations of Intel processors.

Most instructions have a suffix that indicates operation and data types.

US indicates unsigned saturation. With saturation arithmetic, when a result exceeds the range of its data type, it is clamped to the nearest limit of that range. For example, an unsigned word result greater than FFFFH saturates to FFFFH, and a result less than 0000H saturates to 0000H.
S or SS indicates signed saturation. For example, a signed word result greater than 7FFFH saturates to 7FFFH, and a result less than 8000H saturates to 8000H. If there is no S, SS, or US, then wraparound arithmetic is used. With wraparound arithmetic, when a result exceeds the data-range limit of its data type, it wraps around: the carry or borrow is discarded.

For example, a signed word result that is 2 greater than 7FFFH wraps around to 8001H.
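The three behaviours can be compared on a single 16-bit word with a small Python sketch. The function names are invented for this illustration, and results are returned as 16-bit patterns so they can be read as the hex values a register would hold.

```python
def wrap16(value):
    """Wraparound: keep the low 16 bits, discard the carry or borrow."""
    return value & 0xFFFF


def sat_signed16(value):
    """Signed saturation: clamp to -32768..32767, return the 16-bit pattern."""
    return max(-0x8000, min(0x7FFF, value)) & 0xFFFF


def sat_unsigned16(value):
    """Unsigned saturation: clamp to 0..65535."""
    return max(0, min(0xFFFF, value))


overflow = 0x7FFF + 2                   # 2 greater than the largest signed word
wrapped = wrap16(overflow)              # 0x8001
clamped_signed = sat_signed16(overflow)           # 0x7FFF
clamped_unsigned = sat_unsigned16(0xFFFF + 1)     # 0xFFFF
```

Saturation is usually what you want for pixels and audio samples: an overflowing bright pixel stays white instead of wrapping around to black.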

B, W, D, or Q indicates the data type: byte, word, doubleword, or quadword. If two letters are appended, the source operands are treated as the first data type and the destination operand as the second data type.

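For example, paddusw MM4, mem1 is an MMX instruction: it adds packed words (W) with unsigned saturation (US), using MM4 as both an input and the destination, and mem1 as the 64-bit source operand in memory.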
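What that one instruction does to its four words can be modelled in Python. This is only a sketch of the semantics, assuming 64-bit integers stand in for MM4 and mem1; the sample values are made up.

```python
def paddusw(dest, src):
    """Model of PADDUSW: add four unsigned 16-bit words with saturation."""
    result = 0
    for lane in range(4):
        shift = lane * 16
        a = (dest >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        result |= min(a + b, 0xFFFF) << shift   # clamp instead of wrapping
    return result


mm4 = 0xFFF0_8000_0001_1234    # hypothetical register contents
mem1 = 0x0020_8000_0001_0001   # hypothetical 64-bit memory operand
mm4 = paddusw(mm4, mem1)       # destination is overwritten with the result
```

The two upper words would overflow, so they stick at FFFFH; the two lower words are added normally.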

Most MMX instructions operate on data using wraparound arithmetic. Some MMX instructions operate on data using saturation arithmetic.

    Category             Mnemonic           Number of Different Opcodes  Description
    -------------------  -----------------  ---------------------------  -----------
    Arithmetic           PADD[B,W,D]        3                            Add with wrap-around on [byte, word, doubleword]
                         PADDS[B,W]         2                            Add signed with saturation on [byte, word]
                         PADDUS[B,W]        2                            Add unsigned with saturation on [byte, word]
                         PSUB[B,W,D]        3                            Subtract with wrap-around on [byte, word, doubleword]
                         PSUBS[B,W]         2                            Subtract signed with saturation on [byte, word]
                         PSUBUS[B,W]        2                            Subtract unsigned with saturation on [byte, word]
                         PMULHW             1                            Packed multiply high on words
                         PMULLW             1                            Packed multiply low on words
                         PMADDWD            1                            Packed multiply on words and add resulting pairs
    Comparison           PCMPEQ[B,W,D]      3                            Packed compare for equality [byte, word, doubleword]
                         PCMPGT[B,W,D]      3                            Packed compare greater than [byte, word, doubleword]
    Conversion           PACKUSWB           1                            Pack words into bytes (unsigned with saturation)
                         PACKSS[WB,DW]      2                            Pack [words into bytes, doublewords into words] (signed with saturation)
                         PUNPCKH[BW,WD,DQ]  3                            Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
                         PUNPCKL[BW,WD,DQ]  3                            Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
    Logical              PAND               1                            Bitwise AND
                         PANDN              1                            Bitwise AND NOT
                         POR                1                            Bitwise OR
                         PXOR               1                            Bitwise XOR
    Shift                PSLL[W,D,Q]        6                            Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
                         PSRL[W,D,Q]        6                            Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
                         PSRA[W,D]          4                            Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
    Data Transfer        MOV[D,Q]           4                            Move [doubleword, quadword] to MMX register or from MMX register
    FP & MMX State Mgmt  EMMS               1                            Empty MMX state

(A table copied from Intel.com)

Instruction Recap

EMMS: The EMMS instruction empties the MMX state by setting the floating-point tag word to empty (all 1's). Always use EMMS at the end of your MMX routines, before any floating-point code runs!

Add and Subtract instructions: The add and subtract instructions operate on both signed and unsigned packed data types. (paddb/w/d, psubb/w/d, paddsb/w, psubsb/w, paddusb/w, psubusb/w)

Shift instructions: The shift instructions shift each data element in the destination operand by the count given in the source operand. (psllw/d/q, psraw/d, psrlw/d/q)
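The difference between the logical and arithmetic shifts shows up on negative words. The following Python sketch models the word-sized forms on a 64-bit integer; it assumes the usual rule that an out-of-range count clears the elements for logical shifts and fills them with the sign bit for arithmetic shifts. The Python function names simply mirror the mnemonics.

```python
def _words(quad):
    return [(quad >> (16 * i)) & 0xFFFF for i in range(4)]


def _from_words(words):
    return sum((w & 0xFFFF) << (16 * i) for i, w in enumerate(words))


def psllw(dest, count):
    """Shift each word left, filling with zeros."""
    if count > 15:
        return 0
    return _from_words([w << count for w in _words(dest)])


def psrlw(dest, count):
    """Shift each word right, filling with zeros (logical)."""
    if count > 15:
        return 0
    return _from_words([w >> count for w in _words(dest)])


def psraw(dest, count):
    """Shift each word right, copying the sign bit (arithmetic)."""
    count = min(count, 15)
    signed = [w - 0x10000 if w & 0x8000 else w for w in _words(dest)]
    return _from_words([s >> count for s in signed])


sample = 0x8000_0004_FFFF_7FFF
logical = psrlw(sample, 2)      # 0x2000_0001_3FFF_1FFF
arithmetic = psraw(sample, 2)   # 0xE000_0001_FFFF_1FFF
```

Each word is shifted on its own, so bits never cross from one element into its neighbour.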

Logic instructions: The logic instructions operate bitwise on all 64 bits. (pand, por, pandn, pxor)

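Multiply instructions: The multiply instructions multiply packed signed 16-bit words, producing 32-bit doubleword products. pmulhw keeps the high 16 bits of each product, pmullw keeps the low 16 bits, and pmaddwd adds adjacent pairs of products into doublewords. (pmaddwd, pmulhw, pmullw)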
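A Python sketch of the three multiply instructions, assuming 64-bit integers stand in for the registers. The helper pack_words and the sample values are hypothetical; the three modelled functions borrow the mnemonic names.

```python
def signed_words(quad):
    """Read the four 16-bit lanes of a quadword as signed integers."""
    lanes = [(quad >> (16 * i)) & 0xFFFF for i in range(4)]
    return [w - 0x10000 if w & 0x8000 else w for w in lanes]


def pack_words(words):
    return sum((w & 0xFFFF) << (16 * i) for i, w in enumerate(words))


def pmullw(dest, src):
    """Low 16 bits of each signed 16x16 product."""
    return pack_words([a * b for a, b in zip(signed_words(dest), signed_words(src))])


def pmulhw(dest, src):
    """High 16 bits of each signed 16x16 product."""
    return pack_words([(a * b) >> 16 for a, b in zip(signed_words(dest), signed_words(src))])


def pmaddwd(dest, src):
    """Multiply words, then add adjacent products into two doublewords."""
    a, b = signed_words(dest), signed_words(src)
    low = (a[0] * b[0] + a[1] * b[1]) & 0xFFFFFFFF
    high = (a[2] * b[2] + a[3] * b[3]) & 0xFFFFFFFF
    return (high << 32) | low


x = pack_words([1000, -2, 300, 4])
y = pack_words([1000, 3, 300, -5])
dot_pairs = pmaddwd(x, y)   # low dword: 1000*1000 + (-2)*3, high dword: 300*300 + 4*(-5)
```

pmaddwd is the workhorse for dot products and filters, since it does four multiplies and two adds in one step.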

Compare instructions: The compare instructions compare the data elements in the source operand and the destination operand, and generate a mask of all 1's (true) or all 0's (false) for each element in the destination operand. (pcmpeqb/w/d, pcmpgtb/w/d)
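The masks are handy because they can be combined with the logic instructions to choose between values without any branches. The Python sketch below models pcmpgtw plus pand, pandn, and por on 64-bit integers and uses them for a per-word maximum; packed_max and pack_words are made-up helper names for this illustration.

```python
MASK64 = (1 << 64) - 1


def signed_words(quad):
    lanes = [(quad >> (16 * i)) & 0xFFFF for i in range(4)]
    return [w - 0x10000 if w & 0x8000 else w for w in lanes]


def pack_words(words):
    return sum((w & 0xFFFF) << (16 * i) for i, w in enumerate(words))


def pcmpgtw(dest, src):
    """Each word becomes FFFFH where dest > src (signed), else 0000H."""
    return pack_words([0xFFFF if a > b else 0
                       for a, b in zip(signed_words(dest), signed_words(src))])


def pand(dest, src):
    return dest & src


def pandn(dest, src):
    return (~dest & MASK64) & src   # the destination is the operand that gets inverted


def por(dest, src):
    return dest | src


def packed_max(a, b):
    """Branch-free per-word maximum built from a compare mask."""
    mask = pcmpgtw(a, b)
    return por(pand(mask, a), pandn(mask, b))


a = pack_words([5, -3, 100, -200])
b = pack_words([7, -4, 100, 300])
mask = pcmpgtw(a, b)
largest = packed_max(a, b)
```

Only the second word of a is greater than its partner in b, so only that word of the mask is set.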

Pack and Unpack instructions: (VERY USEFUL!!) The pack instructions pack bigger data types into smaller ones. The unpack instructions work the other way around. (packsswb/dw, packuswb, punpckhbw/wd/dq, punpcklbw/wd/dq)
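A common pattern is to widen pixels from bytes to words, do the arithmetic, and narrow them back with saturation. The Python sketch below models punpcklbw and packuswb on 64-bit integers; the sample values are made up, and unpacking against a zero source is one typical use rather than the only one.

```python
def punpcklbw(dest, src):
    """Interleave the low four bytes of dest and src into four words."""
    out = 0
    for i in range(4):
        d = (dest >> (8 * i)) & 0xFF
        s = (src >> (8 * i)) & 0xFF
        out |= d << (16 * i)        # dest byte becomes the low half of the word
        out |= s << (16 * i + 8)    # src byte becomes the high half of the word
    return out


def packuswb(dest, src):
    """Pack eight signed words (dest first, then src) into unsigned bytes."""
    out = 0
    lane = 0
    for reg in (dest, src):
        for i in range(4):
            word = (reg >> (16 * i)) & 0xFFFF
            if word & 0x8000:
                word -= 0x10000                         # read the word as signed
            out |= max(0, min(255, word)) << (8 * lane)  # clamp to 0..255
            lane += 1
    return out


pixels = 0x01FF8010                       # four 8-bit pixels in the low doubleword
widened = punpcklbw(pixels, 0)            # zero-extended to four 16-bit words
narrowed = packuswb(0x00FF_0080_FFFB_012C, 0)   # words 300, -5, 128, 255
```

With a zero source, the unpack turns each byte into a word with a zero upper half; the pack then clamps 300 down to 255 and -5 up to 0.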

Data transfer instructions: (VERY USEFUL!!) The data transfer instructions move data to and from the MMX registers, the integer registers, and memory. (movd, movq)

Copyright University of Maryland.
For problems or questions regarding this web contact [ProjectEmail].
Last updated: December 20, 1998.