CMSC 412 Project #1

Loading Executable Files

Due Thursday February 12th, 2003 (09:00 AM)

Grading Criteria

Submission Instructions

New project files: proj1.tar.gz

Slides used in recitation: proj1.ppt

Introduction

In this project, you will give GeekOS the ability to load executable files from disk into memory. In a future project, you will add the ability to run user programs in a safe way, but for this project we have supplied code that will execute the loaded program as part of a kernel process. Your job will be to write a function to load a program and lay it out correctly in memory. You will know that you have successfully loaded the program when running it produces the specified output.

ELF Files

ELF is a format for storing programs on disk. There are other formats, but ELF is the one you will be working with. An ELF file contains text (or code) and data sections for an executable program, as well as extra information about where these sections are stored in the file and how they should be arranged in memory when the program is loaded. The extra information is stored in sections of the ELF file called Headers. An ELF file is the result of the linking process.

In this project, you will be loading ELF files and using the information in the headers to determine how to lay out the sections of the program in memory.

Executable Image

When a program is linked, the linker assumes the text and data sections of a program will be laid out in a certain pattern in memory. This allows the linker to set specific memory addresses for code and data references. We will call this pattern the Executable Image. Most of the work for this project will be to determine what the executable image should be for the loaded program (by using the ELF headers) and to copy the sections of the program to the correct places in memory so that the program will be laid out as the linker assumed. If the program is not laid out right, it will not run correctly.

In addition to laying out the sections of the program correctly, you will need to allocate some extra memory at the end of the executable image that will be used as the stack space for the program as it runs. For this project, you should make the stack 4096 bytes. The stack should begin immediately after the end of the data section.

After your code has determined how much space is needed for the program and the stack, you should round the total size of the memory allocated for the executable image up to a whole multiple of 4096 bytes (this will become important in later projects). For rounding, you might want to use the function Round_Up_To_Page() defined in mem.h.
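To make the arithmetic concrete, here is a rough sketch of the size calculation. The helper names are made up for illustration (they are not part of GeekOS), the rounding helper is a stand-in for Round_Up_To_Page() rather than a copy of it, and the sketch assumes the data section is the last thing in the image and that its address is measured from the start of the image.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE  4096u  /* rounding unit given in this handout */
#define SKETCH_STACK_SIZE 4096u  /* stack size required for this project */

/* Hypothetical helper: round n up to the next multiple of SKETCH_PAGE_SIZE. */
static unsigned int sketch_round_up_to_page(unsigned int n)
{
    return (n + SKETCH_PAGE_SIZE - 1u) & ~(SKETCH_PAGE_SIZE - 1u);
}

/*
 * Hypothetical helper: total number of bytes to allocate for the image.
 * data_vaddr and data_mem_size describe the data section: where it starts
 * in the image and how much memory it occupies. Assumption: the data
 * section is the last section in the image.
 */
static unsigned int sketch_image_size(unsigned int data_vaddr,
                                      unsigned int data_mem_size)
{
    unsigned int stack_start = data_vaddr + data_mem_size; /* stack follows data */
    unsigned int stack_end = stack_start + SKETCH_STACK_SIZE;

    assert(stack_end > stack_start); /* catch unsigned wrap-around */
    return sketch_round_up_to_page(stack_end);
}
```

For example, a data section at 0x2000 that occupies 0x123 bytes puts the end of the stack at 0x3123, which rounds up to 0x4000.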

ELF File Format

The ELF file format is described in the ELF Specification. The most relevant sections for this project are 1.1 to 1.4 and 2.1 to 2.7.

The steps involved in identifying the sections of the ELF file are:

1) read the ELF Header. The ELF header will always be at the very beginning of an ELF file. The ELF header contains information on how to...

2) find the Program Headers, which specify where in the file to find the text and data sections and where they should end up in the executable image.

There are a few simplifying assumptions you can make about the types and location of program headers. In the files you will be working with, there will always be one text header and one data header. The text header will be the first program header and the data header will be the second program header. This is not generally true of ELF files, but it will be true of the programs you will be responsible for.

The file ELF.h provides data types for structures which match the format of the ELF and program headers. See A trick in C: casting a pointer to a structure below for tips on how to parse the headers.
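As a rough illustration of steps 1 and 2, the sketch below finds a program header inside a file image. The two structures are copied from this handout; the function name, its parameters, and the bounds checks are assumptions made for the example, not part of the supplied code.

```c
#include <assert.h>
#include <stddef.h>

/* Copies of the structures from ELF.h as quoted in this handout. */
typedef struct {
    unsigned char   ident[16];
    unsigned short  type;
    unsigned short  machine;
    unsigned int    version;
    unsigned int    entry;
    unsigned int    phoff;
    unsigned int    sphoff;
    unsigned int    flags;
    unsigned short  ehsize;
    unsigned short  phentsize;
    unsigned short  phnum;
    unsigned short  shentsize;
    unsigned short  shnum;
    unsigned short  shstrndx;
} ELFHeader;

typedef struct {
    unsigned int  type;
    unsigned int  offset;
    unsigned int  vaddr;
    unsigned int  paddr;
    unsigned int  fileSize;
    unsigned int  memSize;
    unsigned int  flags;
    unsigned int  alignment;
} programHeader;

/*
 * Hypothetical helper: return a pointer to program header number `index`
 * inside the file image, or NULL if the file is too short or the index is
 * out of range. Assumes file_image is suitably aligned for these structures.
 */
static programHeader *sketch_get_program_header(char *file_image,
                                                unsigned long file_len,
                                                unsigned int index)
{
    ELFHeader *hdr;
    unsigned long off;

    assert(file_image != NULL);
    if (file_len < sizeof(ELFHeader))
        return NULL;

    /* Step 1: the ELF header is at the very beginning of the file. */
    hdr = (ELFHeader *) file_image;
    if (index >= hdr->phnum)
        return NULL;

    /* Step 2: phoff says where the program headers start in the file,
     * and phentsize says how far apart they are. */
    off = (unsigned long) hdr->phoff + (unsigned long) index * hdr->phentsize;
    if (off + sizeof(programHeader) > file_len)
        return NULL;

    return (programHeader *) (file_image + off);
}
```

With the simplifying assumptions above, index 0 would be the text header and index 1 the data header.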

This diagram shows the relationship between the ELF File Image and the Executable Image:

Project Requirements:

You should start this project with the new GeekOS distribution in proj1.tar.gz. This distribution contains the same code as the proj0 distribution, with the addition of several files that add new functionality and a file that provides much of the structure for the code you will write for this project.

The biggest addition is a simple filesystem for GeekOS called PFAT. PFAT provides the function ReadFileAndAllocate(), which will read a named file off of the disk and into a buffer in memory. The "disks" that bochs reads from are just files in the Linux filesystem. The disks are configured in the .bochsrc file.

There is also a new subdirectory called elfProgs. This directory contains a file called a.c which contains the source code for the ELF program you will need to load. There is also a Makefile in this directory that compiles a.c and builds an ELF file called a.exe. You will need to run gmake in the elfProgs directory before you can load the program. The Makefile will also copy a.exe into hd.img, which is the file for the C: drive on bochs. The path name for a.exe will be "/c/a.exe".

There is also a new subdirectory called buildFat. This contains a program called buildFat which can add files to PFAT drives. It is used by the Makefile in elfProgs.

The order of compilation should be:

gmake in the buildFat directory (this only needs to be done the first time you compile)
gmake in the main project directory (proj1)
gmake in the elfProgs directory (this only needs to be done the first time you compile, unless you change a.c or you remove the hd.img file)

Code has been added to main.c to start a new thread that will run a function called Spawner that calls your code to load the ELF file and then executes the program as you have set it up.

Your code to load the ELF file will go into ELF.c, where you must complete the Read_ELF_Executable() function. You will need to parse the ELF headers, malloc a piece of memory big enough for the executable image (including the stack), copy the text and data sections from the ELF file image to the correct places within this memory, and fill in the Loadable_Program structure that will provide information to our functions (and later to your functions) that will run the code. You should also free the buffer that was read in by ReadFileAndAllocate().
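The copying step might look something like the sketch below. It is only an outline under stated assumptions: sketch_segment and sketch_build_image are hypothetical names, the standard library calloc/memcpy/free stand in for whatever the kernel provides, and filling in the Loadable_Program structure is left out because its fields are not described here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical summary of one program header (text or data). */
typedef struct {
    unsigned int offset;    /* where the section starts in the file image */
    unsigned int vaddr;     /* where it belongs in the executable image */
    unsigned int fileSize;  /* number of bytes stored in the file */
    unsigned int memSize;   /* number of bytes it occupies in memory */
} sketch_segment;

/*
 * Hypothetical helper: allocate a zero-filled executable image of
 * image_size bytes and copy each section to its place. Returns NULL if
 * the allocation fails. The caller owns the returned memory.
 */
static char *sketch_build_image(const char *file_image,
                                const sketch_segment *segs, int nsegs,
                                unsigned int image_size)
{
    char *image = calloc(1, image_size);
    int i;

    if (image == NULL)
        return NULL;

    for (i = 0; i < nsegs; i++) {
        /* Each section must fit inside the image we allocated. */
        assert(segs[i].fileSize <= segs[i].memSize);
        assert(segs[i].vaddr + segs[i].memSize <= image_size);

        /* Bytes past fileSize (up to memSize) stay zero from calloc. */
        memcpy(image + segs[i].vaddr,
               file_image + segs[i].offset,
               segs[i].fileSize);
    }
    return image;
}
```

Zero-filling the whole image first is a design choice for the sketch: it takes care of any part of a section that occupies memory but has no bytes in the file, and it gives the stack area a known starting state.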

You will know you have loaded the program correctly if you see the following output when you run bochs:

Hi ! This is the first string

Hi ! This is the second string

Hi ! This is the last string

If you see this you're happy

Some other debug statements will be printed as well.

A trick in C: casting a pointer to a structure

Part of this project involves parsing the ELF header structures that were read from the file. There is a specification of exactly how the elements of the header will be laid out on disk. There's a simple way in C to access the different fields of the header as the fields of a C structure.

In the file ELF.h, there are structures defined that correspond to the ELF header and the ELF program header:

typedef struct {
    unsigned char   ident[16];
    unsigned short  type;
    unsigned short  machine;
    unsigned int    version;
    unsigned int    entry;
    unsigned int    phoff;
    unsigned int    sphoff;
    unsigned int    flags;
    unsigned short  ehsize;
    unsigned short  phentsize;
    unsigned short  phnum;
    unsigned short  shentsize;
    unsigned short  shnum;
    unsigned short  shstrndx;
} ELFHeader;

typedef struct {
    unsigned int  type;
    unsigned int  offset;
    unsigned int  vaddr;
    unsigned int  paddr;
    unsigned int  fileSize;
    unsigned int  memSize;
    unsigned int  flags;
    unsigned int  alignment;
} programHeader;

The data at the beginning of the ELF file is laid out in exactly the same pattern as the ELFHeader structure: there are 16 characters, followed by 2 short ints, followed by 5 ints, and so on. When you read in the ELF file, there will be a big chunk of memory containing the file contents and you will have a pointer-to-char that points to it.

When you define a structure in C, the compiler lays out the fields in memory in the order you declared them. In general a compiler is allowed to insert padding bytes between fields to keep each one aligned, but every field in ELFHeader and programHeader already falls on its natural alignment boundary, so on the x86 machine that bochs emulates the fields sit back to back with no extra space in between. So the memory image that your char* points to is exactly the same as the memory image that would be created if you created an ELFHeader structure.

So, here's the crazy part. If you create a pointer-to-ELFHeader, and you point it at the memory you read in, the code that knows how to pull fields out of an ELFHeader structure will be able to pull fields out of your memory. You will tell the pointer that the memory it's pointing at is an ELFHeader structure, it will access the memory as if it were an ELFHeader structure, and everything will work because the memory really is exactly the same as an ELFHeader structure.
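The cast only works if the compiler's layout of the structure matches the bytes in the file, so it can be worth checking. The sketch below is one way to do that with offsetof and sizeof. The function names are hypothetical, and the expected numbers assume a 2-byte unsigned short and a 4-byte unsigned int, which holds for the x86 target used in this project.

```c
#include <assert.h>
#include <stddef.h>

/* Copy of the ELFHeader structure from ELF.h as quoted in this handout. */
typedef struct {
    unsigned char   ident[16];
    unsigned short  type;
    unsigned short  machine;
    unsigned int    version;
    unsigned int    entry;
    unsigned int    phoff;
    unsigned int    sphoff;
    unsigned int    flags;
    unsigned short  ehsize;
    unsigned short  phentsize;
    unsigned short  phnum;
    unsigned short  shentsize;
    unsigned short  shnum;
    unsigned short  shstrndx;
} ELFHeader;

/*
 * Hypothetical helper: returns 1 if the compiler placed the fields of
 * ELFHeader back to back (16 chars, 2 shorts, 5 ints, 6 shorts = 52 bytes),
 * and 0 otherwise.
 */
static int sketch_elf_header_is_packed(void)
{
    return offsetof(ELFHeader, type) == 16
        && offsetof(ELFHeader, version) == 20
        && offsetof(ELFHeader, phoff) == 28
        && offsetof(ELFHeader, ehsize) == 40
        && sizeof(ELFHeader) == 52;
}

/* Hypothetical helper: stop early if the layout is not what we expect. */
static void sketch_require_packed_elf_header(void)
{
    assert(sketch_elf_header_is_packed());
}
```

A check like this costs almost nothing and turns a silent misread of the header into an immediate failure.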

Here's an example. Say we have a blah structure defined as:

typedef struct {
    int   number;
    char  name[10];
    int   age;
} blah;

And we have a pointer-to-char that points to data we read in somewhere:

char *data = ReadFileAndAllocate(...

We can create a pointer-to-blah and point it at our data:

blah *myBlah = (blah *) data;

We cast the pointer to make myBlah (well, the compiler, really...) think that data is a pointer-to-blah, rather than a pointer-to-char.

Now we can access the fields of myBlah in the usual fashion:

printf("My blah's name is: %s", myBlah->name);

Ain't C great?