Multiplication

We are going to look at multiplication in three ways:

Multiplying two variables, assigning the result to a third;
Multiplying a variable by 2, assigning the result to a second; and
Multiplicative assignment by 5.

Multiplying Two Variables

We will consider this block of code:

  ic = ia * ib;
  uic = uia * uib;
  cc = ca * cb;
  ucc = uca * ucb;
  lc = la * lb;
  ulc = ula * ulb;

As with addition, we will treat each line as c = a × b.
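
The snippet above is only a fragment. To compile it yourself, the variables need declarations. The sketch below assumes the types suggested by the name prefixes: i for int, ui for unsigned int, c for char, uc for unsigned char, l for long and ul for unsigned long. The values are arbitrary, and struct products and multiply_demo are hypothetical names for illustration. Listings like the ones below will most likely only appear if this is compiled without optimization (for example with -O0), because an optimizing compiler would fold these constants away.

```c
/* Hypothetical result holder so the products can be inspected. */
struct products {
    int ic;
    unsigned int uic;
    char cc;
    unsigned char ucc;
    long lc;
    unsigned long ulc;
};

/* Declarations assumed from the name prefixes; the values are arbitrary. */
struct products multiply_demo(void)
{
    int ia = 6, ib = 7, ic;
    unsigned int uia = 6u, uib = 7u, uic;
    char ca = 3, cb = 4, cc;
    unsigned char uca = 10, ucb = 20, ucc;
    long la = 100000L, lb = 3L, lc;
    unsigned long ula = 5UL, ulb = 9UL, ulc;

    /* The six statements being compiled: c = a * b for each type. */
    ic = ia * ib;
    uic = uia * uib;
    cc = ca * cb;
    ucc = uca * ucb;
    lc = la * lb;
    ulc = ula * ulb;

    struct products p = { ic, uic, cc, ucc, lc, ulc };
    return p;
}
```

With these values, multiply_demo() yields 42, 42, 12, 200, 300000 and 45.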

AMD64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ic = ia * ib;`
	`movl -8(%rbp), %eax`	`mov -0x8(%rbp),%eax`
	`imull -12(%rbp), %eax`	`imul -0xc(%rbp),%eax`
	`movl %eax, -16(%rbp)`	`mov %eax,-0x10(%rbp)`
`uic = uia * uib;`
	`movl -20(%rbp), %eax`	`mov -0x14(%rbp),%eax`
	`imull -24(%rbp), %eax`	`imul -0x18(%rbp),%eax`
	`movl %eax, -28(%rbp)`	`mov %eax,-0x1c(%rbp)`
`cc = ca * cb;`
	`movsbl -29(%rbp), %eax`	`movsbl -0x1d(%rbp),%eax`
	`movsbl -30(%rbp), %ecx`	`movsbl -0x1e(%rbp),%ecx`
	`imull %ecx, %eax`	`imul %ecx,%eax`
	`movb %al, -31(%rbp)`	`mov %al,-0x1f(%rbp)`
`ucc = uca * ucb;`
	`movzbl -32(%rbp), %eax`	`movzbl -0x20(%rbp),%eax`
	`movzbl -33(%rbp), %ecx`	`movzbl -0x21(%rbp),%ecx`
	`imull %ecx, %eax`	`imul %ecx,%eax`
	`movb %al, -34(%rbp)`	`mov %al,-0x22(%rbp)`
`lc = la * lb;`
	`movq -48(%rbp), %rax`	`mov -0x30(%rbp),%rax`
	`imulq -56(%rbp), %rax`	`imul -0x38(%rbp),%rax`
	`movq %rax, -64(%rbp)`	`mov %rax,-0x40(%rbp)`
`ulc = ula * ulb;`
	`movq -72(%rbp), %rax`	`mov -0x48(%rbp),%rax`
	`imulq -80(%rbp), %rax`	`imul -0x50(%rbp),%rax`
	`movq %rax, -88(%rbp)`	`mov %rax,-0x58(%rbp)`

This follows the same general pattern as addition of variables:

Copy the local variable a to eax/rax (movl, movsbl, movzbl, movq)
For single-byte variables, do the same for b and ecx
Multiply eax/rax by b (read directly from its stack slot, or from ecx for single-byte variables) using imull or imulq, storing the result in eax/rax
Copy the register value (just al for single-byte variables) back to c on the stack
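
The movsbl/movzbl loads and the movb %al store follow from C's integer promotions. The char operands are widened to int and multiplied at 32 bits, and the result is truncated when it is assigned back to a char. Here is a small sketch with unsigned char; the function names are hypothetical.

```c
/* Both unsigned char operands are promoted to int before the multiply,
   which matches the compiler loading them with movzbl into 32-bit registers. */
int promoted_product(unsigned char a, unsigned char b)
{
    return a * b;               /* e.g. 200 * 2 == 400, computed as int */
}

/* Assigning back to unsigned char keeps only the low 8 bits (the movb %al). */
unsigned char stored_product(unsigned char a, unsigned char b)
{
    unsigned char c = a * b;    /* 400 becomes 400 % 256 == 144 */
    return c;
}
```

For example, 200 * 2 is 400 as an int, but only 400 mod 256 = 144 survives in the unsigned char.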

The only change from addition is the arithmetic instruction itself: imull or imulq (shown simply as imul in gdb) takes the place of the add.
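
You may notice that the unsigned variables also use imull/imulq, the signed multiply. This works because only the low 32 (or 64) bits of the product are kept. In two's complement, those bits are the same whether the operands are read as signed or unsigned. Only the discarded upper half of the full product would differ, which is why x86 also has a separate unsigned mul instruction for widening multiplies. The sketch below checks this for 32-bit values; the function names are hypothetical.

```c
#include <stdint.h>

/* Low 32 bits of the product, treating both operands as signed. */
uint32_t low32_signed(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;  /* exact: always fits in 64 bits */
    return (uint32_t)wide;                    /* reduce modulo 2^32 */
}

/* Low 32 bits of the product of the same bit patterns, read as unsigned. */
uint32_t low32_unsigned(int32_t a, int32_t b)
{
    return (uint32_t)a * (uint32_t)b;         /* unsigned multiply wraps modulo 2^32 */
}
```

For instance, both functions return 1410065408 for 100000 × 100000, which is the true product 10,000,000,000 reduced modulo 2^32.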

Multiplicative assignment ia *= ib is identical, except that the final mov writes the result back to a's stack slot instead of c's. This is the same as we saw with additive assignment.

AArch64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ic = ia * ib;`
	`ldr r3, [fp, #-8]`	`ldr r3, [r11, #-8]`
	`ldr r2, [fp, #-28]`	`ldr r2, [r11, #-28]`
	`mul r3, r2, r3`	`mul r3, r2, r3`
	`str r3, [fp, #-48]`	`str r3, [r11, #-48]`
`uic = uia * uib;`
	`ldr r3, [fp, #-12]`	`ldr r3, [r11, #-12]`
	`ldr r2, [fp, #-32]`	`ldr r2, [r11, #-32]`
	`mul r3, r2, r3`	`mul r3, r2, r3`
	`str r3, [fp, #-52]`	`str r3, [r11, #-52]`
`cc = ca * cb;`
	`ldrb r2, [fp, #-13]`	`ldrb r2, [r11, #-13]`
	`ldrb r3, [fp, #-33]`	`ldrb r3, [r11, #-33]`
	`smulbb r3, r2, r3`	`smulbb r3, r2, r3`
	`strb r3, [fp, #-53]`	`strb r3, [r11, #-53]`
`ucc = uca * ucb;`
	`ldrb r2, [fp, #-14]`	`ldrb r2, [r11, #-14]`
	`ldrb r3, [fp, #-34]`	`ldrb r3, [r11, #-34]`
	`smulbb r3, r2, r3`	`smulbb r3, r2, r3`
	`strb r3, [fp, #-54]`	`strb r3, [r11, #-54]`
`lc = la * lb;`
	`ldr r3, [fp, #-20]`	`ldr r3, [r11, #-20]`
	`ldr r2, [fp, #-40]`	`ldr r2, [r11, #-40]`
	`mul r3, r2, r3`	`mul r3, r2, r3`
	`str r3, [fp, #-60]`	`str r3, [r11, #-60]`
`ulc = ula * ulb;`
	`ldr r3, [fp, #-24]`	`ldr r3, [r11, #-24]`
	`ldr r2, [fp, #-44]`	`ldr r2, [r11, #-44]`
	`mul r3, r2, r3`	`mul r3, r2, r3`
	`str r3, [fp, #-64]`	`str r3, [r11, #-64]`

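On ARM, the process is again almost identical to addition. We replace the add instruction with mul, except for the single-byte variables, which use smulbb (a signed multiply of the bottom 16-bit halves of two registers). That is enough for the byte case: each byte fits easily in a 16-bit half, and strb stores only the low byte of the result. The long variables use exactly the same ldr/mul/str sequence as int. Their stack slots are only 4 bytes apart, because long is 32 bits wide on this target. Multiplicative assignment ia *= ib follows the same pattern we have seen elsewhere, with the result stored back into a's slot.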
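
To see why smulbb suffices for bytes, here is a model of its semantics in C. It is written from the instruction's documented behavior: multiply the sign-extended bottom halfwords of two registers into a 32-bit result. It illustrates the arithmetic, not how the hardware works. smulbb_model, byte_mul_via_smulbb and smulbb_matches_c are hypothetical names. Because ldrb zero-extends, each byte arrives as a non-negative value below 256, well within a signed halfword.

```c
#include <stdint.h>

/* Model of SMULBB: signed 16 x 16 multiply of the bottom halfwords. */
int32_t smulbb_model(uint32_t rn, uint32_t rm)
{
    int32_t bn = (int32_t)(rn & 0xFFFFu);
    int32_t bm = (int32_t)(rm & 0xFFFFu);
    if (bn >= 0x8000) bn -= 0x10000;   /* sign-extend the bottom halfword */
    if (bm >= 0x8000) bm -= 0x10000;
    return bn * bm;                    /* magnitude at most 2^30: no overflow */
}

/* Mirror of ldrb / ldrb / smulbb / strb for the byte case. */
uint8_t byte_mul_via_smulbb(uint8_t a, uint8_t b)
{
    uint32_t r2 = a;                   /* ldrb zero-extends the byte */
    uint32_t r3 = b;
    int32_t product = smulbb_model(r2, r3);
    return (uint8_t)((uint32_t)product & 0xFFu);  /* strb keeps the low byte */
}

/* Compare every pair of byte values with ordinary C multiplication. */
int smulbb_matches_c(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (byte_mul_via_smulbb((uint8_t)a, (uint8_t)b) != (uint8_t)(a * b))
                return 0;
    return 1;
}
```

smulbb_matches_c() checks all 65,536 byte pairs against C multiplication truncated to 8 bits.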

Multiplying by 2

This is almost identical for all of our variable types, so we will only consider the following statement:

ic = ia * 2;

AMD64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ic = ia * 2;`
	`movl -8(%rbp), %eax`	`mov -0x8(%rbp),%eax`
	`shll %eax`	`shl $1,%eax`
	`movl %eax, -16(%rbp)`	`mov %eax,-0x10(%rbp)`

Here we see that, rather than using imull, the compiler emits shll, which is a left shift. In the assembly file the shift amount is omitted, and the assembler uses the shift-by-one form; gdb shows the 1 explicitly. For long integers, shll is replaced by shlq.
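
The equivalence between the shift and the multiply can be checked in C. This sketch uses uint32_t because, in C, left-shifting a negative signed value is undefined behavior. At the machine level, though, shl produces the same bit pattern as imul for signed values too. The function names are hypothetical.

```c
#include <stdint.h>

/* x * 2^k computed with an actual multiply (k must be below 32). */
uint32_t times_pow2_mul(uint32_t x, unsigned k)
{
    return x * ((uint32_t)1 << k);
}

/* The same value computed with a left shift, as shll and lsl do. */
uint32_t times_pow2_shift(uint32_t x, unsigned k)
{
    return x << k;
}
```

Both versions wrap modulo 2^32 in the same way. For example, either one maps 0x80000001 with k = 1 to 2.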

AArch64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ic = ia * 2;`
	`ldr r3, [fp, #-8]`	`ldr r3, [r11, #-8]`
	`lsl r3, r3, #1`	`lsl r3, r3, #1`
	`str r3, [fp, #-48]`	`str r3, [r11, #-48]`

As with AMD64, the multiplication by 2 is replaced with a left shift. Here the instruction is lsl, and it is the same for all of our integer types.

Multiplicative Assignment by 5

Since multiplication by a power of 2 is a simple shift, we will now look at multiplying by a value that is not a power of 2. We will also write it as a multiplicative assignment, to have a concrete example of that form. Consider the following statement:

ia *= 5;

AMD64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ia *= 5;`
	`imull $5, -8(%rbp), %eax`	`imul $0x5,-0x8(%rbp),%eax`
	`movl %eax, -8(%rbp)`	`mov %eax,-0x8(%rbp)`

Here we see a very simple pair of instructions. This form of imull takes three operands instead of the two we saw previously: the literal $5, the memory operand -8(%rbp) holding a, and the destination register eax. It multiplies a by 5 straight from the stack, so only one register is needed. The following movl then stores eax back to a's location on the stack.

AArch64

The equivalent assembly is shown in the following table:

C	.s file	gdb
`ia *= 5;`
	`ldr r2, [fp, #-8]`	`ldr r2, [r11, #-8]`
	`mov r3, r2`	`mov r3, r2`
	`lsl r3, r3, #2`	`lsl r3, r3, #2`
	`add r3, r3, r2`	`add r3, r3, r2`
	`str r3, [fp, #-8]`	`str r3, [r11, #-8]`

This is considerably more complex than the AMD64 case. Looking at the instructions, we note that the literal #5 never appears. Instead, the compiler decomposed the multiplication into a left shift by 2 (multiplication by 4) followed by an addition, since 5a = 4a + a. Multiplication generally has a higher latency than a shift or an add. The compiler therefore judges this sequence to be cheaper than loading 5 into a register and using mul, even though it takes more instructions.

Going through instruction-by-instruction:

We begin by loading the register r2 with the value of a.
Next we copy r2 to another register, r3.
We then shift r3 left by 2 bits, storing the result back in r3.
Now we add r3 (4a) and r2 (a), storing the result (5a) in r3.
Finally, we store the register r3 back into a on the stack.
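
The ARM sequence can be mirrored step by step in C. The sketch also includes a general form of the same trick, which adds a shifted copy of a for each set bit in the constant. It is illustrative only: real compilers choose decompositions with their own cost models and will not necessarily use this bit-by-bit scheme. The function names are hypothetical. Unsigned 32-bit arithmetic is used so that overflow wraps exactly as the registers do.

```c
#include <stdint.h>

/* Step-by-step mirror of the ARM listing for ia *= 5. */
uint32_t times5_shift_add(uint32_t a)
{
    uint32_t r2 = a;        /* ldr r2, [fp, #-8]  : r2 = a  */
    uint32_t r3 = r2;       /* mov r3, r2         : r3 = a  */
    r3 = r3 << 2;           /* lsl r3, r3, #2     : r3 = 4a */
    r3 = r3 + r2;           /* add r3, r3, r2     : r3 = 5a */
    return r3;              /* str r3, [fp, #-8]  : a = 5a  */
}

/* Hypothetical general form: a * k as a sum of shifted copies of a,
   one for each set bit of k (5 is binary 101, giving (a << 2) + a). */
uint32_t times_const_shift_add(uint32_t a, uint32_t k)
{
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (k & ((uint32_t)1 << bit))
            result += a << bit;
    }
    return result;
}
```

For example, times5_shift_add(7) returns 35. times_const_shift_add(a, 5) computes the same 4a + a as the listing, just with the two terms added in the opposite order.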