Data and Text Segments

Introduction

Memory contains bytes. Some of those bytes are instructions, and some of those bytes are data.

In fact, one of the great ideas in computer science is that programs can be stored in memory just as data is. Before that idea, computers were envisioned as hardware running a fixed program, with only the data stored in memory.

Most assembly languages give some minimal support to placing data in memory. A program is usually divided into a data segment and a text segment. The text segment contains the program. The data segment essentially contains global data.

This is how a typical MIPS program looks:

   .data  # Tells assembler we're in the data segment
val:  .word  10, -14, 30   # Three words placed in memory

   .text  # Tells assembler we're in the text segment
   .global main # Tells assembler main is accessible outside file
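main:  addi $sp, $sp, -8   # Reserve 8 bytes of stack space
       # ... rest of main ...

In assembly language code, there are instructions, data, and assembler directives. Instructions and data should be self-explanatory. Assembler directives are not translated into machine instructions. Instead, they tell the assembler how to process the source, for example which segment the following lines belong to, or which labels other files may use. Unfortunately, directives vary from one ISA to the next, and sometimes from one assembler to the next.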

.data and .text are both assembler directives. .data tells the assembler that what follows belongs in the data segment. .text tells the assembler that what follows is assembly language instructions for the text segment. You typically place the data segment first and the text segment second, but strictly speaking this isn't necessary: you can switch between .data and .text as often as you like, and the assembler keeps appending to whichever segment is current.
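To see why the order of the segments doesn't matter, here is a minimal Python sketch of one way an assembler can track segments: a separate location counter per segment, with .data and .text merely selecting which counter advances. The function assign_addresses, the choice of .text as the default segment, and the base addresses (MARS-style defaults of 0x00400000 for text and 0x10010000 for data) are assumptions for illustration, not how any particular assembler is implemented.

```python
# Hypothetical model of segment switching: the assembler keeps one location
# counter per segment, and .data / .text only select which counter the next
# item advances. Base addresses are assumed (MARS-style defaults).
SEGMENT_BASE = {".text": 0x00400000, ".data": 0x10010000}


def assign_addresses(items):
    """Map each label to an address.

    `items` holds either a segment directive (".data" or ".text") or a
    (label, size_in_bytes) tuple; label may be None for unlabeled items.
    """
    counters = dict(SEGMENT_BASE)  # an independent counter for each segment
    current = ".text"              # assumption: text is the default segment
    symbols = {}
    for item in items:
        if item in SEGMENT_BASE:
            current = item         # switch segments; the old counter is kept
            continue
        label, size = item
        if label is not None:
            symbols[label] = counters[current]
        counters[current] += size
    return symbols


program = [
    ".data", ("val", 12),   # val: .word 10, -14, 30
    ".text", ("main", 4),   # main: addi $sp, $sp, -8
    ".data", ("more", 4),   # hypothetical second .data section
]
print({label: hex(addr) for label, addr in assign_addresses(program).items()})
# {'val': '0x10010000', 'main': '0x400000', 'more': '0x1001000c'}
```

The second .data section continues where the first left off (right after val's 12 bytes), while main still lands at the start of the text segment.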

Notice that shortly after the .text, we have another assembler directive. .global tells the assembler that the label following it (in this case, "main") is accessible outside the file. This is useful when you want to link several files together: you indicate to the assembler which labels can be accessed from other files, and which ones are private to this file. (SPIM and MARS document this directive as .globl; the GNU assembler accepts both .global and .globl.)
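The effect of .global on linking can be modeled with a small Python sketch, assuming each file's symbol table simply records whether each label is global. The function resolve and the file names main.s and util.s are hypothetical; a real linker works with object files and relocation entries rather than dictionaries.

```python
# Hypothetical model of label visibility across files. Each "object file"
# maps its labels to True if declared with .global (.globl), else False.
def resolve(label, from_file, files):
    """Return the name of the file whose definition of `label` is used."""
    if label in files[from_file]:
        return from_file  # a file always sees its own labels, private or not
    # Other files are searched for global definitions only.
    exporters = [name for name, labels in files.items()
                 if name != from_file and labels.get(label, False)]
    if not exporters:
        raise LookupError(f"undefined reference to '{label}'")
    if len(exporters) > 1:
        raise LookupError(f"multiple definitions of '{label}'")
    return exporters[0]


files = {
    "main.s": {"main": True, "loop": False},
    "util.s": {"helper": True, "loop": False, "tmp": False},
}
print(resolve("helper", "main.s", files))  # util.s
print(resolve("loop", "util.s", files))    # util.s (its own private loop)
```

Both files define a private label called loop without conflict, and resolve("tmp", "main.s", files) raises LookupError because tmp is private to util.s.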

We've spent a great deal of time on the text segment but not much on the data segment, so let's look at it now.

The data segment consists of declarations. "Declaration" isn't an official MIPS term, but I use it because the syntax resembles declarations in a language like C.

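A single declaration consists of a label (an identifier followed by a colon), a type directive such as .word, and, usually, the data to store.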

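Even though a labeled declaration looks like a variable declaration, the label is not a variable. It is merely a name for an address in memory. In particular, the assembler doesn't check whether you use the label in a way that matches its type: nothing stops you from loading a single byte from a label declared with .word, for instance.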
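A declaration has three parts: a label (identifier plus colon), a type directive, and usually the data. As a rough illustration, the hypothetical Python function parse_declaration below splits one declaration line into those parts. It assumes one declaration per line and strips comments naively; a real assembler's parser handles much more. The label comes out as just a name: nothing attaches a type to it, which matches the point that a label is only an address.

```python
import re

# Hypothetical parser for one declaration of the form
#   label:  .directive  operands
# It only splits the line; a real assembler also handles escapes in strings,
# multiple labels per line, and many more directives.
DECL = re.compile(r"^\s*([A-Za-z_.$][\w.$]*):\s*(\.\w+)\s*(.*?)\s*$")


def parse_declaration(line):
    """Return (label, directive, operands) for a declaration line."""
    line = line.split("#", 1)[0]   # drop a comment (naive: breaks on '#' in strings)
    match = DECL.match(line)
    if match is None:
        raise ValueError(f"not a declaration: {line!r}")
    label, directive, rest = match.groups()
    if directive in (".ascii", ".asciiz"):
        operands = [rest]          # keep the quoted string whole
    else:
        operands = [x.strip() for x in rest.split(",") if x.strip()]
    return label, directive, operands


print(parse_declaration("val:  .word  10, -14, 30   # Three words placed in memory"))
# ('val', '.word', ['10', '-14', '30'])
```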

Data Types

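What kinds of types are permitted? The common ones are:

   .ascii   string, not null-terminated
   .asciiz  string, null-terminated
   .byte    one or more 8-bit bytes
   .half    one or more 16-bit halfwords
   .word    one or more 32-bit words
   .space   n bytes of reserved space, with no initial values given

As you can see, the choice of types is quite limited: non-null-terminated strings, null-terminated strings, bytes, halfwords, words, and bytes without values. (SPIM and MARS also provide .float and .double for floating-point values.)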
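To show how much memory each of these types occupies, here is a minimal Python sketch of an encoder for the six directives. The function encode is hypothetical, it assumes a little-endian target (as in MARS, or SPIM running on an x86 host), and it models the bytes reserved by .space as zeros.

```python
import struct

# Hypothetical encoder: the bytes an assembler might emit for one directive.
# "<" selects little-endian order; a big-endian target would use ">".
# Masking lets both signed (-14) and unsigned (0xFF) operands fit.
def encode(directive, operands):
    if directive == ".byte":
        return struct.pack(f"<{len(operands)}B", *(v & 0xFF for v in operands))
    if directive == ".half":
        return struct.pack(f"<{len(operands)}H", *(v & 0xFFFF for v in operands))
    if directive == ".word":
        return struct.pack(f"<{len(operands)}I", *(v & 0xFFFFFFFF for v in operands))
    if directive == ".ascii":
        return operands[0].encode("ascii")            # no terminator
    if directive == ".asciiz":
        return operands[0].encode("ascii") + b"\0"    # adds a NUL byte
    if directive == ".space":
        return bytes(operands[0])                     # n bytes, modeled as zeros
    raise ValueError(f"unsupported directive {directive}")


print(encode(".word", [10, -14, 30]).hex())  # 0a000000f2ffffff1e000000
```

The -14 is stored in two's complement as f2 ff ff ff, and .asciiz produces one more byte than .ascii for the same string.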

You can have more than one declaration in the .data segment.

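   .data  # Tells assembler we're in the data segment
val:  .word  10, -14, 30          # Three 32-bit words
str:  .ascii  "Hello, world"      # 12 characters, no null terminator
num:  .byte  0x01, 0x03           # Two bytes
arr:  .space 100                  # 100 bytes, no initial values given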
We have four declarations above. Each starts with a label, which consists of an identifier and a colon, followed by the "type" directive and then, usually, the data. For .space, the operand is a byte count rather than data.

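The assembler stores the declarations in consecutive memory locations, in the order they appear. Most MIPS assemblers also observe alignment where applicable: .half data is placed on a 2-byte boundary and .word data on a 4-byte boundary, with padding bytes inserted before it if necessary.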
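The placement rule can be sketched in Python. This hypothetical layout function assumes the data segment starts at 0x10010000 (MARS's default; SPIM uses 0x10000000) and that .half and .word are aligned to 2 and 4 bytes while everything else is byte-aligned. It lays out the four declarations from the example above plus a hypothetical extra .word named cnt.

```python
# Hypothetical model of .data layout: declarations are placed one after
# another, and .half/.word data is first padded up to its natural alignment.
ALIGN = {".byte": 1, ".ascii": 1, ".asciiz": 1, ".space": 1, ".half": 2, ".word": 4}


def layout(decls, base=0x10010000):
    """decls: list of (label, directive, size_in_bytes). Returns {label: address}."""
    addr = base
    symbols = {}
    for label, directive, size in decls:
        align = ALIGN[directive]
        addr = (addr + align - 1) // align * align   # round up to the alignment
        symbols[label] = addr
        addr += size                                 # next free byte
    return symbols


decls = [
    ("val", ".word", 12),     # .word 10, -14, 30
    ("str", ".ascii", 12),    # .ascii "Hello, world" (no NUL)
    ("num", ".byte", 2),      # .byte 0x01, 0x03
    ("arr", ".space", 100),   # .space 100
    ("cnt", ".word", 4),      # hypothetical extra .word to show padding
]
for label, addr in layout(decls).items():
    print(f"{label:4} 0x{addr:08x}")
```

Here arr starts at 0x1001001a and its 100 bytes end at 0x1001007d, so the next free address is 0x1001007e; the assembler skips two padding bytes and places cnt at 0x10010080.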

Using la

Suppose you want to access the address in memory of some label in the data segment. For example, you may have "declared" an array in memory called arr.

You can use the pseudoinstruction la to load that address into a register. la stands for "load address". It takes two operands: the destination register and the label whose address you want.

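la is not a real instruction. The assembler translates it into one or two real instructions: typically lui, which sets the upper 16 bits of the address, followed by ori, which fills in the lower 16 bits, or a single instruction when one half is enough. The expansion is essentially the same as for li (load immediate); the difference is that the immediate for la is the address the assembler computed for the label.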
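A minimal Python sketch of the lui/ori expansion, assuming the assembler has already computed the label's address and uses the $at register as scratch (as SPIM and MARS typically do). The functions expand_la and run_lui_ori are hypothetical; real assemblers may emit a single instruction when one half of the address is zero.

```python
# Hypothetical sketch of expanding "la reg, label" once the label's 32-bit
# address is known:
#   lui $at, upper        # $at = upper << 16
#   ori reg, $at, lower   # reg = $at | lower (ori zero-extends its immediate)
def expand_la(reg, address):
    """Return the lui/ori pair that loads a 32-bit address into `reg`."""
    upper = (address >> 16) & 0xFFFF   # high 16 bits, for lui
    lower = address & 0xFFFF           # low 16 bits, for ori
    return [f"lui $at, 0x{upper:04x}", f"ori {reg}, $at, 0x{lower:04x}"]


def run_lui_ori(upper, lower):
    """What the pair computes: lui shifts left 16, ori ORs in the low half."""
    return ((upper << 16) | lower) & 0xFFFFFFFF


for line in expand_la("$t0", 0x1001001A):
    print(line)
# lui $at, 0x1001
# ori $t0, $at, 0x001a
```

Because ori zero-extends its 16-bit immediate, the two halves can be taken straight from the address. If addiu were used for the low half instead, it would sign-extend, and the upper half would need to be incremented whenever the low half is 0x8000 or larger.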

Real load instructions (e.g., lw, lh, lb) copy data from memory to registers. The la and li pseudoinstructions (and the real lui instruction), despite the word "load" in their names, copy immediate values into registers; they never read memory.

Here's how we'd use it:

   .data
arr: .space 100

   .text
   .global main
main:   la $t0, arr  # Place address of label, arr, in $t0
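Once la has put arr's address in $t0, reaching element i of a word array means adding 4*i to it. The Python sketch below models that arithmetic and a byte-addressed data segment; the class DataSegment, its sw/lw methods, the base address 0x10010000, and little-endian byte order are all assumptions for illustration. It also rejects unaligned word accesses, since MIPS lw and sw require 4-byte-aligned addresses.

```python
# Hypothetical model of indexing after "la $t0, arr":
#   sll $t1, $t2, 2      # $t1 = i * 4
#   add $t1, $t0, $t1    # $t1 = address of arr[i]
#   lw  $t3, 0($t1)      # $t3 = arr[i]
DATA_BASE = 0x10010000  # assumed address of arr (start of the data segment)


class DataSegment:
    """Byte-addressed model of the data segment (little-endian assumed)."""

    def __init__(self, size):
        self.mem = bytearray(size)  # .space bytes modeled as zeros

    def _offset(self, address, width):
        if address % width:
            raise ValueError(f"unaligned {width}-byte access at 0x{address:08x}")
        off = address - DATA_BASE
        if not 0 <= off <= len(self.mem) - width:
            raise IndexError(f"address 0x{address:08x} outside the segment")
        return off

    def sw(self, address, value):
        off = self._offset(address, 4)
        self.mem[off:off + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def lw(self, address):
        off = self._offset(address, 4)
        return int.from_bytes(self.mem[off:off + 4], "little", signed=True)


def element_address(base, index, elem_size=4):
    """Address of element `index`: what the sll + add sequence computes."""
    return base + index * elem_size


seg = DataSegment(100)   # arr: .space 100, room for 25 words
t0 = DATA_BASE           # la $t0, arr
seg.sw(element_address(t0, 3), -7)
print(seg.lw(element_address(t0, 3)))  # -7
```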

Using the Stack

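Another way to declare an array is to use the stack. For example, to declare a 100-element int array, subtract 400 (100 elements times 4 bytes each) from the stack pointer; the 400 bytes starting at the new $sp are now your array. The main drawback is that the array has to be initialized using instructions, such as a loop of sw instructions, because the assembler cannot place initial values on the stack.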
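The stack version can be modeled the same way. In this hypothetical Python sketch, allocate_stack_array mirrors "addi $sp, $sp, -400", and initialize counts the stores a program must execute, one per element, because nothing is preloaded on the stack. The starting $sp of 0x7fffeffc (MARS's default) is an assumption, and a real function would also restore $sp with "addi $sp, $sp, 400" before returning.

```python
# Hypothetical model of "int a[100]" on the stack.
def allocate_stack_array(sp, count, elem_size=4):
    """Model "addi $sp, $sp, -(count*elem_size)"; return (new_sp, element addresses)."""
    new_sp = sp - count * elem_size   # the stack grows toward lower addresses
    addresses = [new_sp + i * elem_size for i in range(count)]
    return new_sp, addresses


def initialize(memory, addresses, values):
    """Store each value (one sw per element); return how many stores ran."""
    stores = 0
    for address, value in zip(addresses, values):
        memory[address] = value       # models: sw value, 0(address)
        stores += 1
    return stores


sp = 0x7FFFEFFC                       # assumed initial $sp (MARS default)
new_sp, addrs = allocate_stack_array(sp, 100)
memory = {}
print(hex(new_sp), initialize(memory, addrs, range(100)))  # 0x7fffee6c 100
```

The array occupies the addresses from the new $sp up to, but not including, the old $sp.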

Using the data segment lets you initialize arrays with data values. Unfortunately, such data is global, which is not how C compilers allocate fixed-size arrays declared inside a function: those local arrays are placed on the stack.

Summary

For writing simple assembly language programs, it's convenient to use the data segment to declare data. You can use la to load the address of a data-segment label into a register. The assembler does the work of computing the actual address for you, so you don't have to keep track of it yourself.