Word Alignment

Word alignment is not a particularly difficult concept, but it is important because it shows up in some unexpected places.

What you know

We've defined a word to mean 4 bytes. To store a word in byte-addressable memory (i.e., memory where each address holds one byte), you have to break up the 32-bit quantity into 4 bytes. Thus, the word 0x01ab23cd is broken up into 0x01, 0xab, 0x23, and 0xcd.
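As a minimal sketch in C++ (the language of the structure example below), the split can be done with shifts and truncating casts. The function name split_word is hypothetical. It returns the bytes most significant first and, because it works on the value rather than on memory, gives the same result on any machine.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit word into its four bytes,
// most significant byte first. Casting to uint8_t keeps only the
// low 8 bits of each shifted value.
std::array<std::uint8_t, 4> split_word(std::uint32_t word) {
    return {
        static_cast<std::uint8_t>(word >> 24),  // most significant byte
        static_cast<std::uint8_t>(word >> 16),
        static_cast<std::uint8_t>(word >> 8),
        static_cast<std::uint8_t>(word),        // least significant byte
    };
}
```

For example, split_word(0x01ab23cd) yields 0x01, 0xab, 0x23, 0xcd.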

You can store these bytes in two ways. In big endian order, the most significant byte (0x01) is stored at the smallest of four consecutive addresses, and 0xab, 0x23, and 0xcd are stored at the following three addresses. Thus, if you store the first byte at address 1000, the remaining bytes go to addresses 1001, 1002, and 1003.
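The big endian layout can be sketched by modelling byte-addressable memory as a vector of bytes, where element i stands for address i. The Memory alias and the store_big_endian function are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of byte-addressable memory: mem[i] is the byte at address i.
using Memory = std::vector<std::uint8_t>;

// Store a 32-bit word in big endian order starting at address addr:
// the most significant byte goes to the smallest address.
void store_big_endian(Memory& mem, std::size_t addr, std::uint32_t word) {
    mem[addr]     = static_cast<std::uint8_t>(word >> 24);  // 0x01 for 0x01ab23cd
    mem[addr + 1] = static_cast<std::uint8_t>(word >> 16);
    mem[addr + 2] = static_cast<std::uint8_t>(word >> 8);
    mem[addr + 3] = static_cast<std::uint8_t>(word);        // 0xcd for 0x01ab23cd
}
```

For example, after `Memory mem(2000); store_big_endian(mem, 1000, 0x01ab23cd);`, addresses 1000 through 1003 hold 0x01, 0xab, 0x23, and 0xcd.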

For little endian, you store the least significant byte (0xcd) in the smallest address (in our example, this is address 1000), then 0x23, 0xab, and 0x01. Thus, it's stored in reverse order.
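The little endian version of the same sketch loops over the bytes from least to most significant. The names Memory, store_little_endian, and host_is_little_endian are hypothetical. The last helper copies a known word into raw bytes with std::memcpy and inspects the byte at the lowest address, which is one common way to check the byte order of the machine the code runs on.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of byte-addressable memory: mem[i] is the byte at address i.
using Memory = std::vector<std::uint8_t>;

// Little endian: byte i of the word (counting from the least
// significant end) goes to address addr + i.
void store_little_endian(Memory& mem, std::size_t addr, std::uint32_t word) {
    for (std::size_t i = 0; i < 4; ++i) {
        mem[addr + i] = static_cast<std::uint8_t>(word >> (8 * i));
    }
}

// Check the host's byte order: if the lowest-addressed byte of
// 0x01ab23cd is 0xcd, the host stores words little endian.
bool host_is_little_endian() {
    std::uint32_t word = 0x01ab23cd;
    unsigned char bytes[4];
    std::memcpy(bytes, &word, sizeof word);
    return bytes[0] == 0xcd;
}
```

On x86 and most ARM systems, host_is_little_endian() returns true.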

Even though it's somewhat inaccurate, we say that a word is stored at, say, address 1000. That is, we pick the smallest address and call that the location of the data. In general, if the data has N bytes and occupies addresses A through A + N - 1, we say that the data is at address A.

Word alignment

However, there's a second issue. To keep the hardware simpler (and sometimes because the ISA requires it), words are often stored at word-aligned addresses.

A word is word-aligned if it is stored at an address divisible by 4. Written in binary, any address divisible by 4 has 0 as its last two bits.
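Both descriptions of word alignment translate directly into code. In this sketch (function names hypothetical), one test checks divisibility by 4 and the other checks that the two lowest bits are zero; they always agree.

```cpp
#include <cstdint>

// Word-aligned: the address is divisible by 4.
bool is_word_aligned(std::uintptr_t addr) {
    return addr % 4 == 0;
}

// Equivalent bit test: 4 is 0b100, so every multiple of 4 has its two
// lowest bits clear. Masking with 0b11 (== 3) isolates those two bits.
bool is_word_aligned_bits(std::uintptr_t addr) {
    return (addr & 0x3) == 0;
}
```

For example, address 1000 (binary 1111101000) passes both tests, while 1002 fails both.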

Why is this interesting? If word quantities must appear at word-aligned addresses, that constrains how data can be laid out in memory. Consider the following structure (written in C++):

struct Foo {
   char x ;  // 1 byte
   int y ;   // 4 bytes (typically), must be word-aligned
   char z ;  // 1 byte
   int w ;   // 4 bytes (typically), must be word-aligned
} ;

In C and C++, the members of a structure are laid out in memory in the order they are declared. Thus, x, y, z, and w appear in that order.

Adding up the members, Foo should need only 10 bytes (1 byte for each char and 4 bytes for each int).

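However, due to word alignment, it will probably take more than 10 bytes. The structure as a whole is placed at a word-aligned address, so x sits at offset 0. For y to be word-aligned, 3 unused padding bytes must follow x, putting y at offset 4. Then z sits at offset 8, and another 3 padding bytes are needed before w, putting w at offset 12. Thus, the structure is likely to be 16 bytes large, 6 of which are filler bytes used for padding.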
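The padding arithmetic can be checked by hand. This sketch assumes 1-byte chars and 4-byte ints aligned to 4 bytes (common, but not guaranteed by the language). It walks Foo's members in declaration order, using a hypothetical helper align_up to round each int's offset up to the next multiple of 4; the Layout struct and foo_layout function are likewise illustrative names.

```cpp
#include <cstddef>

// Round offset up to the next multiple of align (align must be a power of two).
std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

struct Layout { std::size_t x, y, z, w, size; };

// Assumption: char is 1 byte; int is 4 bytes with 4-byte alignment.
Layout foo_layout() {
    std::size_t off = 0;
    std::size_t x = off;  off = x + 1;              // x at offset 0
    std::size_t y = align_up(off, 4);  off = y + 4; // skip 3 padding bytes, y at 4
    std::size_t z = off;  off = z + 1;              // z at offset 8
    std::size_t w = align_up(off, 4);  off = w + 4; // skip 3 padding bytes, w at 12
    // The total size is rounded up to the structure's alignment (4 here).
    return {x, y, z, w, align_up(off, 4)};
}
```

Under these assumptions the offsets come out as 0, 4, 8, and 12, with a total size of 16 bytes.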

To see this in action, declare a structure or class like the one above, then use the sizeof operator to see how many bytes it occupies.
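A minimal version of that experiment follows. The helper print_foo_layout is hypothetical; it uses sizeof and the standard offsetof macro from <cstddef> to report the size and member offsets the compiler actually chose.

```cpp
#include <cstddef>
#include <cstdio>

struct Foo {
    char x;  // 1 byte
    int y;   // typically 4 bytes, 4-byte aligned
    char z;  // 1 byte
    int w;   // typically 4 bytes, 4-byte aligned
};

// Print the layout the compiler chose for Foo.
void print_foo_layout() {
    std::printf("sizeof(Foo) = %zu\n", sizeof(Foo));
    std::printf("offsets: x = %zu, y = %zu, z = %zu, w = %zu\n",
                offsetof(Foo, x), offsetof(Foo, y),
                offsetof(Foo, z), offsetof(Foo, w));
}
```

On common platforms where int is 4 bytes and 4-byte aligned (for example, x86-64 with GCC or Clang), this reports a size of 16 and offsets 0, 4, 8, and 12. Other ABIs may differ, which is why it is worth checking.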

Byte quantities can be stored at any address in memory. Halfword quantities (16 bits) are often stored at halfword-aligned addresses (addresses divisible by 2), and doubleword quantities (64 bits) at doubleword-aligned addresses (addresses divisible by 8). These restrictions are most common on RISC ISAs.
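In C++11 and later, the alignof operator reports the alignment a type requires on the current platform. This sketch queries the fixed-width integer types that correspond to a byte, halfword, word, and doubleword. The constant names are hypothetical, and the values come from the platform's ABI rather than from the language standard.

```cpp
#include <cstddef>
#include <cstdint>

// Required alignment, in bytes, for each size of integer.
constexpr std::size_t byte_align       = alignof(std::uint8_t);   // usually 1
constexpr std::size_t halfword_align   = alignof(std::uint16_t);  // usually 2
constexpr std::size_t word_align       = alignof(std::uint32_t);  // usually 4
constexpr std::size_t doubleword_align = alignof(std::uint64_t);  // usually 8; may be 4 on some 32-bit ABIs
```

On typical 64-bit platforms these are 1, 2, 4, and 8, matching the rules above.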

CISC ISAs, such as x86, often do not require words and other quantities to be aligned, although misaligned accesses may be slower.

Chart

This chart summarizes the alignment requirements for each size of data.

Quantity               Address divisible by   Address ends in (binary)
Byte (8 bits)          1                      anything
Halfword (16 bits)     2                      0
Word (32 bits)         4                      00
Doubleword (64 bits)   8                      000