CMSC 412 Project #1
Loading Executable Files
Due Thursday, February 10th at 6pm
Introduction
In this project, you will give GeekOS the ability to load
executable files from disk into memory. In a future project, you will
add the
ability to run user programs in a safe way, but for this project we
have
supplied code that will execute programs as part of a kernel process.
Your job will be to parse an executable file and fill in appropriate
structures so that our code can execute the program. You will know that
you have successfully loaded the program when it produces the specified
output.
ELF Files
ELF is a format for storing programs or fragments of programs
on
disk, created as a result of compiling and linking. There are
other formats, but ELF is the one you will be working with. An ELF file
is divided into sections.
For an executable program, these are the text section for the code, and the data section for global
variables. The ELF file contains headers that describe how these
sections should be stored in memory. In this project, you will
parse ELF executable files to tell the loader about the program's
sections so it can properly lay out segments in memory.
Executable Image
When a program is linked, the linker specifies that the text and
data sections of a program should be laid out in a certain pattern in
memory.
This allows the linker to set specific memory addresses for code and
data
references. We will call this pattern the Executable Image. The
work for this project will be to determine what the executable image
should be
for the loaded program (by using the ELF headers) and to fill in some
structures expected by the loader with this information. If you pass
the loader incorrect information, it cannot
load and run the executable correctly.
ELF File Format
The ELF file format is described in the ELF
Specification. The most relevant sections for this project are 1.1
to 1.4
and 2.1 to 2.7.
The steps involved in identifying the sections of the ELF
file are:
1) Read the ELF Header. The ELF header will always be
at the very beginning of an ELF file. The ELF header contains
information about how the rest of the file is laid out. You are
interested only in the program headers.
2) Find the Program Headers, which specify where in
the file to find the text and data sections and where they should end
up in the
executable image.
There are a few simplifying assumptions you can make about
the types and location of program headers. In the files you will be
working
with, there will always be one text header and one data header. The
text header
will be the first program header and the data header will be the second
program
header. This is not generally true of ELF files, but it will be true of
the
programs you will be responsible for.
The file geekos/include/geekos/elf.h provides data types for structures
that match the format of the ELF header and the program headers. See
"A trick in C: casting a pointer to a structure" below for tips on how to
parse the headers.
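Steps 1 and 2 above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the two structures are copied from the listing later in this handout, Get_Program_Header is a hypothetical helper that does not exist in GeekOS, and the sketch assumes the caller has already verified that the program header table lies inside the file buffer.

```c
#include <stddef.h>

/* Copies of the structures this handout lists from elf.h. */
typedef struct {
    unsigned char  ident[16];
    unsigned short type;
    unsigned short machine;
    unsigned int   version;
    unsigned int   entry;
    unsigned int   phoff;      /* file offset of the program header table */
    unsigned int   sphoff;
    unsigned int   flags;
    unsigned short ehsize;
    unsigned short phentsize;
    unsigned short phnum;      /* number of program headers */
    unsigned short shentsize;
    unsigned short shnum;
    unsigned short shstrndx;
} elfHeader;

typedef struct {
    unsigned int type;
    unsigned int offset;
    unsigned int vaddr;
    unsigned int paddr;
    unsigned int fileSize;
    unsigned int memSize;
    unsigned int flags;
    unsigned int alignment;
} programHeader;

/*
 * Hypothetical helper (not part of GeekOS): return a pointer to program
 * header k, or NULL if k is out of range. Assumes the whole program
 * header table is known to fit inside the buffer.
 */
programHeader *Get_Program_Header(char *exeFileData, unsigned int k)
{
    /* Step 1: the ELF header sits at the very beginning of the file. */
    elfHeader *hdr = (elfHeader *) exeFileData;
    programHeader *table;

    if (k >= hdr->phnum)
        return NULL;

    /* Step 2: the program header table starts phoff bytes into the file. */
    table = (programHeader *) (exeFileData + hdr->phoff);
    return &table[k];
}
```

With the simplifying assumptions above, Get_Program_Header(exeFileData, 0) would be the text header and Get_Program_Header(exeFileData, 1) the data header.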
Project Requirements:
You should start this project with the new GeekOS
distribution. This distribution
contains the same code as the proj0 distribution, with the addition of
several
files that add new functionality and a file that provides much of the
structure
for the code you will write for this project.
We added a simple filesystem for GeekOS
called PFAT. PFAT provides basic routines for reading files from and
writing files to disk. The "disks" that bochs reads from are just
files in the Linux filesystem. The disks are configured in the .bochsrc
file. The .bochsrc file provided in the distribution includes an
extra line that specifies how the disk should be interpreted, so do not
simply overwrite it with the .bochsrc you used in project 0.
If you look in the geekos/src/user directory, you'll see a file called
a.c, which contains the source code for the ELF program you will need to
load. When you gmake the project, a.c will also be compiled, and the
resulting ELF file, called a.exe, will be written to the disk image
hd.img, which is the file for the C: drive on bochs. The path name for
a.exe will be /c/a.exe.
Code has been added to geekos/src/geekos/main.c to start a new thread
that runs a function called Spawner. Spawner loads /c/a.exe into memory,
calls your Parse_ELF_Executable() function, and then executes the program
as you have set it up. If you have not built the disk properly or are not
using the correct .bochsrc file, the Spawner will not be able to load
/c/a.exe.
Your code to load the ELF file will go into geekos/src/geekos/elf.c,
where you must complete the function
Parse_ELF_Executable(char *exeFileData, ulong_t exeFileLength, struct Exe_Format *exeFormat)
The executable file is read into memory and passed to you as the
exeFileData argument, which is exeFileLength bytes long. You will need to
parse the ELF headers and fill out the Exe_Format structure. The body of
this function is the only piece of code that needs to be written for this
project!
This is a rough guideline for what Parse_ELF_Executable()
has to do:
- Check that exeFileData is non-null and exeFileLength
is large enough to accommodate the ELF headers and phnum program headers.
- Check that the file starts with the ELF magic number (4 bytes)
as described in figure 1-4 (and subsequent table) on page 11 in the ELF
specification.
- Check that the ELF file has no more than EXE_MAX_SEGMENTS
program headers (phnum field of the elfHeader).
- Fill in numSegments and entryAddr fields
of the exeFormat output variable.
- For each program header k in turn, fill in the
corresponding segmentList[k] array element of exeFormat
with offsetInFile, lengthInFile, startAddress,
sizeInMemory, protFlags with information from that program header k.
See figure 2-1 on page 33 in the ELF specification.
This diagram shows the relationship between the ELF File Image and the
Executable Image in memory. It is provided only to give a clearer picture
of the differences between the two.
The loader will automatically copy segments from the ELF File
Image to the Executable Image, provided the information it
gets from the Exe_Format structure is correct. So you don't
need to worry about copying segments now, but you'll do it in project
2.
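The checklist for Parse_ELF_Executable() given earlier can be sketched roughly as follows. Treat this as one possible shape, not the reference solution: the Exe_Format and Exe_Segment definitions, the ulong_t typedef, the value of EXE_MAX_SEGMENTS, and the 0 / -1 return convention are all assumptions made so that the sketch compiles on its own (the real definitions live in the GeekOS headers), and the function name carries a _Sketch suffix to mark it as hypothetical. The mapping from program header fields to segment fields is the natural reading of the field names.

```c
#include <stddef.h>
#include <string.h>

#define EXE_MAX_SEGMENTS 3           /* assumed value for this sketch */
typedef unsigned long ulong_t;       /* assumed definition */

/* Stand-ins: field names come from the handout, the types are guesses. */
struct Exe_Segment {
    ulong_t offsetInFile, lengthInFile, startAddress, sizeInMemory;
    int protFlags;
};
struct Exe_Format {
    struct Exe_Segment segmentList[EXE_MAX_SEGMENTS];
    int numSegments;
    ulong_t entryAddr;
};

/* Same field order as the elf.h structures, written compactly. */
typedef struct {
    unsigned char ident[16];
    unsigned short type, machine;
    unsigned int version, entry, phoff, sphoff, flags;
    unsigned short ehsize, phentsize, phnum, shentsize, shnum, shstrndx;
} elfHeader;

typedef struct {
    unsigned int type, offset, vaddr, paddr, fileSize, memSize, flags, alignment;
} programHeader;

/* Returns 0 on success, -1 if any check fails (assumed convention). */
int Parse_ELF_Executable_Sketch(char *exeFileData, ulong_t exeFileLength,
                                struct Exe_Format *exeFormat)
{
    elfHeader *hdr;
    programHeader *ph;
    unsigned int k;

    /* Non-null, and large enough to hold the ELF header at all. */
    if (exeFileData == NULL || exeFormat == NULL ||
        exeFileLength < sizeof(elfHeader))
        return -1;
    hdr = (elfHeader *) exeFileData;

    /* Magic number: 0x7f 'E' 'L' 'F'. The octal escape \177 is 0x7f. */
    if (memcmp(hdr->ident, "\177ELF", 4) != 0)
        return -1;

    /* No more program headers than the loader can record. */
    if (hdr->phnum > EXE_MAX_SEGMENTS)
        return -1;

    /* All phnum program headers must lie inside the file. */
    if (hdr->phoff > exeFileLength ||
        (ulong_t) hdr->phnum * sizeof(programHeader) >
            exeFileLength - hdr->phoff)
        return -1;

    exeFormat->numSegments = hdr->phnum;
    exeFormat->entryAddr = hdr->entry;

    /* One segment per program header, in order. */
    ph = (programHeader *) (exeFileData + hdr->phoff);
    for (k = 0; k < hdr->phnum; k++) {
        struct Exe_Segment *seg = &exeFormat->segmentList[k];
        seg->offsetInFile = ph[k].offset;
        seg->lengthInFile = ph[k].fileSize;
        seg->startAddress = ph[k].vaddr;
        seg->sizeInMemory = ph[k].memSize;
        seg->protFlags    = (int) ph[k].flags;
    }
    return 0;
}
```

The bounds check is written as a subtraction on the right-hand side so that a large phoff cannot make the sum wrap around and slip past the test.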
You will know you have loaded the program correctly if you
see the following output when you run bochs:
Hi ! This is the first string
Hi ! This is the second string
Hi ! This is the third (and last) string
If you see this you're happy
If your program prints these lines, you'll know that you've
done it correctly.
If things go wrong, try setting the lprogdebug flag in
geekos/src/geekos/lprog.c to 1, to print some debug statements
on the glorious way towards loading and running the executable.
A trick in C: casting a pointer to a structure
Part of this project involves parsing the ELF header structures that were
read from the file. There is a specification of exactly how the elements
of the header are laid out on disk, and there is a simple way in C to
access the different fields of the header as the fields of a C structure.
In the file geekos/include/geekos/elf.h, there are structures defined
that correspond to the ELF header (called elfHeader) and the ELF program
header (called programHeader).
typedef struct {
    unsigned char ident[16];
    unsigned short type;
    unsigned short machine;
    unsigned int version;
    unsigned int entry;
    unsigned int phoff;
    unsigned int sphoff;
    unsigned int flags;
    unsigned short ehsize;
    unsigned short phentsize;
    unsigned short phnum;
    unsigned short shentsize;
    unsigned short shnum;
    unsigned short shstrndx;
} elfHeader;
typedef struct {
    unsigned int type;
    unsigned int offset;
    unsigned int vaddr;
    unsigned int paddr;
    unsigned int fileSize;
    unsigned int memSize;
    unsigned int flags;
    unsigned int alignment;
} programHeader;
The data at the beginning of the ELF file is laid out
in
exactly the same pattern as the elfHeader structure: there
are 16
characters, followed by 2 short ints,
followed by 5 ints, and so on. When you read in the ELF file, there
will be a
big chunk of memory containing the file contents and you will have a
pointer-to-char that points to it.
When you define a structure in C, the compiler keeps the fields in the
order you specified them. In general it may insert padding between fields
to satisfy alignment requirements, but on the x86 target used here every
field of elfHeader already sits at a naturally aligned offset, so there
is no extra space in between. The memory image that your char* points to
is therefore exactly the same as the memory image that would be created
if you created an elfHeader structure.
So, here's the important part. If you create a
pointer-to-elfHeader,
and you point it at the memory you read in, the code that knows how to
pull
fields out of an elfHeader structure will be able to pull
fields out
of your memory. You will tell the pointer that the memory it's pointing
at is
an elfHeader
structure, it will access the memory as if it were an elfHeader
structure, and everything will work because the memory really is exactly
the
same as an elfHeader structure.
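The claim that the structure matches the on-disk layout can be checked directly, which is worth doing because C compilers are allowed to place padding between fields. The sketch below is an assumption-laden illustration, not GeekOS code: the two function names are hypothetical, and the expected offsets (52 bytes for the ELF header, 32 bytes for a program header) are those of the 32-bit ELF format, which presumes a 2-byte short and a 4-byte int.

```c
#include <stddef.h>   /* offsetof */

typedef struct {
    unsigned char  ident[16];
    unsigned short type;
    unsigned short machine;
    unsigned int   version;
    unsigned int   entry;
    unsigned int   phoff;
    unsigned int   sphoff;
    unsigned int   flags;
    unsigned short ehsize;
    unsigned short phentsize;
    unsigned short phnum;
    unsigned short shentsize;
    unsigned short shnum;
    unsigned short shstrndx;
} elfHeader;

typedef struct {
    unsigned int type;
    unsigned int offset;
    unsigned int vaddr;
    unsigned int paddr;
    unsigned int fileSize;
    unsigned int memSize;
    unsigned int flags;
    unsigned int alignment;
} programHeader;

/*
 * Hypothetical check: returns 1 if the compiler laid out elfHeader the
 * way the 32-bit ELF header is laid out on disk, 0 otherwise.
 */
int ELF_Header_Layout_Matches(void)
{
    return offsetof(elfHeader, ident)     == 0
        && offsetof(elfHeader, type)      == 16
        && offsetof(elfHeader, machine)   == 18
        && offsetof(elfHeader, version)   == 20
        && offsetof(elfHeader, entry)     == 24
        && offsetof(elfHeader, phoff)     == 28
        && offsetof(elfHeader, sphoff)    == 32
        && offsetof(elfHeader, flags)     == 36
        && offsetof(elfHeader, ehsize)    == 40
        && offsetof(elfHeader, phentsize) == 42
        && offsetof(elfHeader, phnum)     == 44
        && offsetof(elfHeader, shstrndx)  == 50
        && sizeof(elfHeader)              == 52;
}

/* Same idea for the program header: eight 4-byte fields, 32 bytes total. */
int Program_Header_Layout_Matches(void)
{
    return offsetof(programHeader, offset)    == 4
        && offsetof(programHeader, vaddr)     == 8
        && offsetof(programHeader, alignment) == 28
        && sizeof(programHeader)              == 32;
}
```

If either function returned 0 on some compiler, the cast described above would read fields from the wrong bytes.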
Here's an example. Say we have a blah
structure
defined as:
typedef struct {
    int number;
    char name[10];
    int age;
} blah;
and a big chunk of memory pointed to by
char * exeFileData
We can create a pointer-to-blah and point it at our
data:
blah *myBlah = (blah *) exeFileData;
We cast the pointer to make myBlah
(well, the compiler, really...) think that exeFileData
is a pointer-to-blah, rather than a pointer-to-char.
Now we can access the fields of myBlah in the usual
fashion:
printf("My blah's name is: %s", myBlah->name);
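For completeness, here is the blah example as something that could be compiled and run. It is a sketch with made-up pieces: Make_Raw_Blah and Describe_Blah are hypothetical helpers, and Make_Raw_Blah merely stands in for reading a file by producing a chunk of raw bytes that holds one blah record.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int number;
    char name[10];
    int age;
} blah;

/*
 * Hypothetical stand-in for reading a file: returns a malloc'd chunk of
 * raw bytes holding one blah record, or NULL on failure. The caller
 * frees it.
 */
char *Make_Raw_Blah(int number, const char *name, int age)
{
    blah record;
    char *raw = malloc(sizeof(blah));

    if (raw == NULL)
        return NULL;

    memset(&record, 0, sizeof record);
    record.number = number;
    /* Copy at most 9 characters so name keeps its trailing '\0'. */
    strncpy(record.name, name, sizeof record.name - 1);
    record.age = age;

    memcpy(raw, &record, sizeof record);
    return raw;
}

/* Formats the name found in a raw chunk, using the cast from the text. */
int Describe_Blah(char *exeFileData, char *out, size_t outSize)
{
    blah *myBlah = (blah *) exeFileData;
    return snprintf(out, outSize, "My blah's name is: %s", myBlah->name);
}
```

The cast works here because the raw bytes were produced with the same layout the compiler uses for blah. With a 4-byte int, that layout typically includes two padding bytes after name so that age lands on a 4-byte boundary.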