Data and Text Segment

Introduction

Memory contains bytes. Some of those bytes are instructions, and some of those bytes are data.

In fact, one of the great ideas in computer science is the idea that programs could be stored just as data was stored. Before that, people envisioned the hardware running a fixed program, and data being stored in memory.

Most assembly languages give some minimal support to placing data in memory. A program is usually divided into a data segment and a text segment. The text segment contains the program. The data segment essentially contains global data.

This is how a typical MIPS program looks:

   .data  # Tells assembler we're in the data segment
val:  .word  10, -14, 30   # Three words placed in memory

   .text  # Tells assembler we're in the text segment
   .global main # Tells assembler main is accessible outside file
main:  addi $sp, $sp, -8

In assembly language code, there are instructions, data, and assembler directives. Instructions and data should be self-explanatory. Assembler directives provide information to the assembler. Unfortunately, the directives vary from one ISA to the next, and sometimes from one assembler to the next.

.data and .text are both directives to the assembler. .data tells the assembler that the upcoming section is considered data. .text tells the assembler that the upcoming section is considered assembly language instructions. In general, you place the data segment first and the text segment second, though it's not strictly necessary to do so.

Notice that shortly after the .text, we have another assembler directive. .global tells the assembler that the label following it (in this case, "main") is accessible outside the file. (Some assemblers, including SPIM and MARS, spell this directive .globl.) This is useful when you want to link several files together. You want to indicate to the assembler which labels can be accessed outside the file, and which ones are private to the file.

We've spent a great deal of time talking about the text segment, but not much about the data segment, so let's do that now.

The data segment consists of declarations. "Declaration" isn't really an official MIPS term, but I use it because these lines resemble declarations in a language like C.

A single declaration consists of:

a label
An identifier followed by a colon. If a label appears, it appears at the beginning of a line (possibly preceded by white space). Each label corresponds to a unique address in memory, which the assembler determines.
a type
MIPS has a weak sense of type, but it's the closest term to describe what's going on.
the data values
One or more values, which must be of the correct type.

Even though a label looks like a variable declaration, it really isn't one. It's merely an address in memory. In particular, the assembler doesn't check whether you use the label in a way that matches its type.
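
As a small sketch of this point, the fragment below declares val with .word but then also reads it with lb, a byte load. The assembler accepts both loads without complaint. The register choices are arbitrary, and the exit sequence at the end assumes the SPIM/MARS system call convention.

```asm
        .data
val:    .word  10, -14, 30      # "declared" as three words

        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   la   $t0, val           # $t0 = address of val
        lw   $t1, 0($t0)        # $t1 = 10, the first word
        lb   $t2, 0($t0)        # also legal: loads a single byte of that word
                                # (which byte you get depends on endianness)
        lw   $t3, 4($t0)        # $t3 = -14, the second word
        li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```

Nothing ties val to the word type after assembly. The label is only the address where the first of the three words was placed.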

Data Types

What kind of types are permitted?

.ascii str
This stores str in memory, but without a null terminator.
.asciiz str
This stores str in memory, but with a null terminator. The "z" refers to zero, which is the ASCII code for the null character. This is how C-style strings are stored.
.byte b1, ..., bn
Store n bytes contiguously in memory (you get to pick n). I'll assume the values b1,...,bn can be written either in base 10 or in hex. I'll also assume commas are needed to separate the values. Finally, I assume that the values can be written on more than one line.
.half h1, ..., hn
Store n 16-bit halfwords contiguously in memory (you get to pick n). SPIM and MARS spell this directive .half; other assemblers may use a different name. I'll assume the values h1,...,hn can be written either in base 10 or in hex. I'll also assume commas are needed to separate the values. I assume that the values can be written on more than one line. Finally, I assume the halfwords are halfword-aligned in memory, i.e., the initial byte is stored at an address divisible by 2.
.word w1, ..., wn
Store n 32-bit words contiguously in memory (you get to pick n). I'll assume the values w1,...,wn can be written either in base 10 or in hex. I'll also assume commas are needed to separate the values. I assume that the values can be written on more than one line. Finally, I assume the words are word-aligned in memory, i.e., the initial byte is stored at an address divisible by 4.
.space numBytes
Reserves numBytes bytes of space in memory, without listing any values for them.

As you can see, the choice of types is quite limited: non-null terminated strings, null-terminated strings, bytes, halfwords, words, and bytes without values.
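
To make the difference between the string types concrete, here is a minimal sketch showing the bytes each declaration would place in memory. The label names are made up for illustration, and the exit sequence assumes the SPIM/MARS system call convention.

```asm
        .data
raw:    .ascii  "abc"           # 3 bytes: 0x61 0x62 0x63, no terminator
cstr:   .asciiz "abc"           # 4 bytes: 0x61 0x62 0x63 0x00
flags:  .byte   1, 0x0F, 20     # 3 bytes, decimal and hex mixed
buf:    .space  16              # 16 bytes reserved, no values listed

        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```

Code that walks a string until it reaches a zero byte would stop correctly on cstr. On raw it would keep going into the bytes of cstr, since nothing marks the end of raw.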

You can have one or more declarations in the data segment.

   .data  # Tells assembler we're in the data segment
val:  .word  10, -14, 30   
str:  .ascii  "Hello, world"
num:  .byte  0x01, 0x03
arr:  .space 100

We have four declarations above. Each starts with a label, which consists of the identifier and a colon, then the "type", then possibly the data.

The assembler tries to store the data in consecutive memory locations, and tries to observe word alignment, if applicable.
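
The sketch below illustrates the alignment behavior. The offsets in the comments are relative to the start of the data segment and assume that the segment itself starts on a word boundary and that the assembler pads automatically before .word, as SPIM and MARS do. The label names are hypothetical.

```asm
        .data
tag:    .asciiz "Hi"            # offsets 0-2: 'H', 'i', 0
                                # offset 3: one padding byte
n:      .word   5               # offsets 4-7 (word-aligned)
c:      .byte   0x41            # offset 8
                                # offsets 9-11: three padding bytes
m:      .word   6               # offsets 12-15 (word-aligned)

        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```

The data is still stored in the order it was declared. The assembler only skips ahead where a word would otherwise start at an address not divisible by 4.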

Using la

Suppose you want to access an address in memory corresponding to some label in the data segment. For example, you may have "declared" an array in memory called arr.

You can use the pseudoinstruction la to load the address. la stands for "load address". This pseudoinstruction takes two operands: a destination register and a label.

la is not a real instruction. The assembler typically expands it into lui followed by ori (or into a single real instruction when the address fits in 16 bits). The real instructions for la should be identical, or nearly so, to those for li (load immediate).
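
As a rough sketch of that expansion, the fragment below loads the address of arr twice: once with la, and once by hand with lui and ori. The hand-written version assumes arr sits at address 0x10010000, which is where MARS places the start of the data segment by default; under another assembler or memory layout the constant would differ. Real assemblers also tend to route the intermediate value through $at rather than the destination register.

```asm
        .data
arr:    .space 100

        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   la   $t0, arr           # pseudoinstruction: $t0 = address of arr

        # Hand-written equivalent, ASSUMING arr is at 0x10010000:
        lui  $t1, 0x1001        # $t1 = 0x10010000 (upper 16 bits set)
        ori  $t1, $t1, 0x0000   # OR in the lower 16 bits of the address

        li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```

If the assumption about the address holds, $t0 and $t1 end up equal. The point of la is that you never have to know or hard-code that constant.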

Real load instructions (e.g. lw, lh, lb) copy data from memory to registers. The la and li pseudoinstructions (and the real lui instruction) instead copy immediate values into registers.

Here's how we'd use it:

   .data
arr: .space 100

   .text
   .global main
main:   la $t0, arr  # Place address of label, arr, in $t0
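
Once the address is in a register, ordinary loads and stores can use it as a base. The following sketch copies one element of val into arr. The choice of registers and of which element to copy is arbitrary, and the exit sequence assumes the SPIM/MARS system call convention.

```asm
        .data
val:    .word  10, -14, 30
arr:    .space 100

        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   la   $t0, val           # $t0 = address of val[0]
        la   $t1, arr           # $t1 = address of arr[0]
        lw   $t2, 8($t0)        # $t2 = val[2] = 30 (offset is 2 * 4 bytes)
        sw   $t2, 0($t1)        # arr[0] = 30
        li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```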

Using the Stack

Another way to declare an array is to use the stack. For example, if you want to declare a 100-element int array, subtract 400 from the stack pointer, and now you have an array. The main problem with doing this is that the array has to be initialized using instructions.
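
Here is a minimal sketch of that approach, assuming 4-byte ints and an array that should start out as all zeros. The loop is exactly the kind of initialization code the data segment would have spared us. The label init and the register choices are made up for illustration.

```asm
        .text
        .globl main             # .globl is the SPIM/MARS spelling of .global
main:   addi $sp, $sp, -400     # reserve 100 words on the stack
        move $t0, $sp           # $t0 = address of element 0
        li   $t1, 100           # $t1 = elements left to initialize
init:   sw   $zero, 0($t0)      # current element = 0
        addi $t0, $t0, 4        # advance to the next element
        addi $t1, $t1, -1       # one fewer element to go
        bne  $t1, $zero, init   # loop until all 100 are set
        addi $sp, $sp, 400      # release the array when done with it
        li   $v0, 10            # exit (assumed SPIM/MARS syscall number)
        syscall
```

The array exists only between the two adjustments of $sp, which is what makes it local rather than global.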

Using the data segment allows you to initialize arrays using data values. Unfortunately, the data is global, and this may not be how C compilers allocate space for arrays, which can be done locally on the stack (at least, for statically allocated arrays).

Summary

For writing simple assembly language programs, it's convenient to use the data segment to declare data. You can use la to access the address from the data segment. The assembler does the work of computing the actual address for you, so you don't have to keep track of it yourself.
