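We are going to look at multiplication in three ways: multiplying two variables, multiplying by a literal that is a power of 2, and multiplying by a literal that is not a power of 2.

For the first case, we will consider this block of code: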
    ic = ia * ib;
    uic = uia * uib;
    cc = ca * cb;
    ucc = uca * ucb;
    lc = la * lb;
    ulc = ula * ulb;
As with addition, we will treat each line as c = a × b.
The equivalent AMD64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ic = ia * ib;` | | |
| | `movl -8(%rbp), %eax` | `mov -0x8(%rbp),%eax` |
| | `imull -12(%rbp), %eax` | `imul -0xc(%rbp),%eax` |
| | `movl %eax, -16(%rbp)` | `mov %eax,-0x10(%rbp)` |
| `uic = uia * uib;` | | |
| | `movl -20(%rbp), %eax` | `mov -0x14(%rbp),%eax` |
| | `imull -24(%rbp), %eax` | `imul -0x18(%rbp),%eax` |
| | `movl %eax, -28(%rbp)` | `mov %eax,-0x1c(%rbp)` |
| `cc = ca * cb;` | | |
| | `movsbl -29(%rbp), %eax` | `movsbl -0x1d(%rbp),%eax` |
| | `movsbl -30(%rbp), %ecx` | `movsbl -0x1e(%rbp),%ecx` |
| | `imull %ecx, %eax` | `imul %ecx,%eax` |
| | `movb %al, -31(%rbp)` | `mov %al,-0x1f(%rbp)` |
| `ucc = uca * ucb;` | | |
| | `movzbl -32(%rbp), %eax` | `movzbl -0x20(%rbp),%eax` |
| | `movzbl -33(%rbp), %ecx` | `movzbl -0x21(%rbp),%ecx` |
| | `imull %ecx, %eax` | `imul %ecx,%eax` |
| | `movb %al, -34(%rbp)` | `mov %al,-0x22(%rbp)` |
| `lc = la * lb;` | | |
| | `movq -48(%rbp), %rax` | `mov -0x30(%rbp),%rax` |
| | `imulq -56(%rbp), %rax` | `imul -0x38(%rbp),%rax` |
| | `movq %rax, -64(%rbp)` | `mov %rax,-0x40(%rbp)` |
| `ulc = ula * ulb;` | | |
| | `movq -72(%rbp), %rax` | `mov -0x48(%rbp),%rax` |
| | `imulq -80(%rbp), %rax` | `imul -0x50(%rbp),%rax` |
| | `movq %rax, -88(%rbp)` | `mov %rax,-0x58(%rbp)` |
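This follows the same general pattern as addition of variables:

1. Load a into `eax`/`rax` (`movl`, `movsbl`, `movzbl`, `movq`).
2. Load b into `ecx`/`rcx` (in the table above, only the single-byte types need this step; the others multiply directly from memory).
3. Multiply using `imull` or `imulq`, storing in `eax`/`rax`.
4. Store the result into c.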
The instructions `imull` and `imulq` (or just `imul` in gdb) are the only differences.
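The table uses the same `imull` instruction for both `int` and `unsigned int`. A likely reason is that the low 32 bits of a product do not depend on whether the operands are read as signed or unsigned, so one instruction serves both. The following is a minimal sketch of that fact; the function names `mul_low_unsigned` and `mul_low_signed` are hypothetical and not part of the original code.

```c
#include <stdint.h>

/* Low 32 bits of the product of two unsigned values (wraps modulo 2^32). */
uint32_t mul_low_unsigned(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Low 32 bits of the product of two signed values.
   The multiply is done in 64 bits so that no signed overflow can occur,
   and the result is then truncated to 32 bits. */
uint32_t mul_low_signed(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * (int64_t)b);
}
```

Calling `mul_low_unsigned(0xFFFFFFFFu, 3u)` and `mul_low_signed(-1, 3)` passes the same bit patterns to both functions, and both return the same 32-bit value.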
Multiplicative assignment `ia *= ib` is identical, except for the final `mov`, which stores the result back into ia rather than into a separate variable. This is the same as we saw with additive assignment.
The equivalent AArch64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ic = ia * ib;` | | |
| | `ldr r3, [fp, #-8]` | `ldr r3, [r11, #-8]` |
| | `ldr r2, [fp, #-28]` | `ldr r2, [r11, #-28]` |
| | `mul r3, r2, r3` | `mul r3, r2, r3` |
| | `str r3, [fp, #-48]` | `str r3, [r11, #-48]` |
| `uic = uia * uib;` | | |
| | `ldr r3, [fp, #-12]` | `ldr r3, [r11, #-12]` |
| | `ldr r2, [fp, #-32]` | `ldr r2, [r11, #-32]` |
| | `mul r3, r2, r3` | `mul r3, r2, r3` |
| | `str r3, [fp, #-52]` | `str r3, [r11, #-52]` |
| `cc = ca * cb;` | | |
| | `ldrb r2, [fp, #-13]` | `ldrb r2, [r11, #-13]` |
| | `ldrb r3, [fp, #-33]` | `ldrb r3, [r11, #-33]` |
| | `smulbb r3, r2, r3` | `smulbb r3, r2, r3` |
| | `strb r3, [fp, #-53]` | `strb r3, [r11, #-53]` |
| `ucc = uca * ucb;` | | |
| | `ldrb r2, [fp, #-14]` | `ldrb r2, [r11, #-14]` |
| | `ldrb r3, [fp, #-34]` | `ldrb r3, [r11, #-34]` |
| | `smulbb r3, r2, r3` | `smulbb r3, r2, r3` |
| | `strb r3, [fp, #-54]` | `strb r3, [r11, #-54]` |
| `lc = la * lb;` | | |
| | `ldr r3, [fp, #-20]` | `ldr r3, [r11, #-20]` |
| | `ldr r2, [fp, #-40]` | `ldr r2, [r11, #-40]` |
| | `mul r3, r2, r3` | `mul r3, r2, r3` |
| | `str r3, [fp, #-60]` | `str r3, [r11, #-60]` |
| `ulc = ula * ulb;` | | |
| | `ldr r3, [fp, #-24]` | `ldr r3, [r11, #-24]` |
| | `ldr r2, [fp, #-44]` | `ldr r2, [r11, #-44]` |
| | `mul r3, r2, r3` | `mul r3, r2, r3` |
| | `str r3, [fp, #-64]` | `str r3, [r11, #-64]` |
The process for AArch64 is similarly almost identical to addition. We replace the `add` instruction with `mul`, except for single-byte variables, for which we use `smulbb`. Multiplicative assignment `ia *= ib` follows the same pattern we have seen elsewhere.
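For the single-byte types, both tables load each operand into a full-width register, multiply there, and then store only one byte (`movb %al, ...` or `strb`). This matches C's integer promotions, where `char` operands are widened to `int` before arithmetic. The sketch below illustrates that widen, multiply, truncate sequence; the function name `mul_bytes` is hypothetical.

```c
/* Multiply two single-byte values the way the generated code does:
   widen both operands, multiply, then keep only the low 8 bits. */
unsigned char mul_bytes(unsigned char a, unsigned char b)
{
    int wide = a * b;            /* a and b are promoted to int before the multiply */
    return (unsigned char)wide;  /* the byte store keeps only the low 8 bits */
}
```

For example, 200 × 3 is 600 as an `int`, but only the low byte (600 mod 256) survives the store.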
Multiplication by 2 is almost identical for all of our variable types, so we will only consider the following statement:

    ic = ia * 2;
The equivalent AMD64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ic = ia * 2;` | | |
| | `movl -8(%rbp), %eax` | `mov -0x8(%rbp),%eax` |
| | `shll %eax` | `shl $1,%eax` |
| | `movl %eax, -16(%rbp)` | `mov %eax,-0x10(%rbp)` |
Here we see that, rather than using `imull`, the compiler uses `shll`, which is a left shift. In the assembly file, we don't provide the amount by which we want to shift; it uses 1 as the default. In gdb, we see the value of 1 explicitly. For long integers, `shll` is replaced by `shlq`.
The equivalent AArch64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ic = ia * 2;` | | |
| | `ldr r3, [fp, #-8]` | `ldr r3, [r11, #-8]` |
| | `lsl r3, r3, #1` | `lsl r3, r3, #1` |
| | `str r3, [fp, #-48]` | `str r3, [r11, #-48]` |
As with AMD64, on AArch64 we replace the multiplication by 2 with a left shift. In this case, the instruction is `lsl`, and is the same for all of our integer types.
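The equivalence between multiplying by 2 and shifting left by one bit can be checked directly in C. This is a small sketch using unsigned values, where wraparound is well defined; the function names `times_two_mul` and `times_two_shift` are hypothetical.

```c
/* Multiply by 2 using the multiplication operator. */
unsigned int times_two_mul(unsigned int x)
{
    return x * 2u;
}

/* Multiply by 2 using a left shift by one bit, as the compiler does. */
unsigned int times_two_shift(unsigned int x)
{
    return x << 1;
}
```

Both functions return the same value for every input, including inputs whose top bit is set and is therefore shifted out.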
Since multiplication by a power of 2 is a simple shift, we will also look at multiplying by a value that is not a power of 2. We will do this as a multiplicative assignment, just to have that as a concrete example. Consider the following statement:

    ia *= 5;
The equivalent AMD64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ia *= 5;` | | |
| | `imull $5, -8(%rbp), %eax` | `imul $0x5,-0x8(%rbp),%eax` |
| | `movl %eax, -8(%rbp)` | `mov %eax,-0x8(%rbp)` |
Here we see a very simple pair of instructions. The version of `imull` here has three arguments, instead of the two we saw previously. We only use one register, `eax`, and directly reference the stack location of a. The first argument is the literal `$5`. We then move `eax` back to a's location on the stack.
The equivalent AArch64 assembly is shown in the following table:
| C | .s file | gdb |
|---|---|---|
| `ia *= 5;` | | |
| | `ldr r2, [fp, #-8]` | `ldr r2, [r11, #-8]` |
| | `mov r3, r2` | `mov r3, r2` |
| | `lsl r3, r3, #2` | `lsl r3, r3, #2` |
| | `add r3, r3, r2` | `add r3, r3, r2` |
| | `str r3, [fp, #-8]` | `str r3, [r11, #-8]` |
This is considerably more complex than the AMD64 case. Looking at the instructions, we note that the literal `#5` never appears. Instead, the compiler decomposed this into a left shift by 2 (multiplication by 4) followed by an addition. Because multiplication is generally slower than addition, and shifting is extremely fast, this can end up being faster than using the `mul` instruction, even though it takes more instructions.
Going through instruction-by-instruction:

1. Load `r2` with the value of a.
2. Copy `r2` to another register, `r3`.
3. Shift `r3` left by 2 bits, storing the result back in `r3`.
4. Add `r3` (4a) and `r2` (a), storing the result in `r3`.
5. Store `r3` back into a on the stack.
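The shift-and-add decomposition walked through above can be written out in C. This is only an illustrative sketch of the arithmetic, not the compiler's own code; the function name `times_five` is hypothetical, and the comments map each statement to the corresponding instruction in the table.

```c
/* Compute 5 * a as (a << 2) + a, mirroring the instruction sequence. */
unsigned int times_five(unsigned int a)
{
    unsigned int r = a;  /* mov r3, r2     : copy a                */
    r = r << 2;          /* lsl r3, r3, #2 : r = 4a                */
    r = r + a;           /* add r3, r3, r2 : r = 4a + a = 5a       */
    return r;
}
```

For unsigned values the result matches `a * 5u` for every input, including values large enough to wrap around.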