CMSC 412 Project #1

Loading Executable Files

Due Friday, September 17 at 6pm

  • Grading Criteria
  • Submission Instructions
  • New project files: project1-cyclone.tar.gz
  • Slides used in recitation proj1.ppt
  • Introduction

    In this project, you will give GeekOS the ability to load executable files from disk into memory. In a future project, you will add the ability to run user programs in a safe way, but for this project we have supplied code that will execute programs as part of a kernel process. Your job will be to parse an executable file and fill in appropriate structures so that our code can execute the program. You will know that you have successfully loaded the program when it produces the specified output.

    ELF Files

    ELF is a format for storing programs or fragments of programs on disk, created as a result of compiling and linking. There are other formats, but ELF is the one you will be working with. An ELF file is divided into sections.  For an executable program, these are the text section for the code, and the data section for global variables.  The ELF file contains headers that describe how these sections should be stored in memory.  In this project, you will parse ELF executable files to tell the loader about the program's sections so it properly can lay out segments in memory.

    Executable Image

    When a program is linked, the linker specifies that the text and data sections of a program should be laid out in a certain pattern in memory. This allows the linker to set specific memory addresses for code and data references. We will call this pattern the Executable Image. The work for this project will be to determine what the executable image should be for the loaded program (by using the ELF headers) and to fill in some structures expected by the loader with this information. If you pass the loader wrong information, it can not load and run the executable correctly.

    ELF File Format

    The ELF file format is described in the ELF Specification. The most relevant sections for this project are 1.1 to 1.4 and 2.1 to 2.7.

    The steps involved in identifying the sections of the ELF file are:

    1) Read the ELF Header. The ELF header will always be at the very beginning of an ELF file. The ELF header contains information about how the rest of the file is laid out.  You are interested only in the program headers.

    2) Find the Program Headers, which specify where in the file to find the text and data sections and where they should end up in the executable image.

    There are a few simplifying assumptions you can make about the types and location of program headers. In the files you will be working with, there will always be one text header and one data header. The text header will be the first program header and the data header will be the second program header. This is not generally true of ELF files, but it will be true of the programs you will be responsible for.

    The file geekos/include/geekos/elf.h provides data types for structures which match the format of the ELF and program headers. See A trick in C: casting a pointer to a structure below for tips on how to parse the headers.

    Project Requirements:

    You should start this project with the new GeekOS distribution. This distribution contains the same code as the proj0 distribution, with the addition of several files that add new functionality and a file that provides much of the structure for the code you will write for this project.  In addition, it contains newer versions of the Cyclone runtime system, so be sure you are using the most recent version of Cyclone (version 0.8.2a) if you are going to use Cyclone (more on using Cyclone below).  Note that until September 11, this version of Cyclone is installed in /afs/csic/projects/cmsc412/cyclone-0.8.2a/bin, so as not to influence those working on project 0.

    We added a simple filesystem for GeekOS called PFAT. PFAT provides basic routines for reading files from and writing files to disk.  The "disks" that bochs reads from are just files in the LINUX filesystem. The disks are configured in the .bochsrc file.  The .bochsrc file provided in the distribution includes an extra line that specifies how the disk should be interpreted, so do not simply overwrite it with the .bochsrc you used in project 0.

    If you look in the geekos/src/user directory, you'll see a file called a.c which contains the source code for the ELF program you will need to load. When you gmake the project, a.c will also be compiled and the resulting ELF file, called a.exe will be written to the disk image hd.img which is the file for the C: drive on bochs. The path name for a.exe will be /c/a.exe.

    Code has been added to geekos/src/geekos/main.c to start a new thread that will run a function called Spawner that loads /c/a.exe into memory, calls your Parse_ELF_Executable() then executes the program as you have set it up.  If you have not properly built the disk or used the correct .bochsrc file, the Spawner will not be able to load /c/a.exe.

    Your code to load the ELF file will go into geekos/src/geekos/elf.c, where you must complete the Parse_ELF_Executable(char *exeFileData, ulong_t exeFileLength, struct Exe_Format *exeFormat) function. The executable file is read into memory and passed to you as the exeFileData argument, which is of course exeFileLength long. You will need to parse the ELF headers and fill out the Exe_Format structure.

    The body of this function is the only piece of code that needs be written for this project !

    This is a rough guideline for what Parse_ELF_Executable() has to do:

    1. Check that exeFileData is non-null and exeFileLength is large enough to accomodate the ELF headers and phnum program headers.
    2. Check that the file starts with the ELF magic number (4 bytes) as described in figure 1-4 (and subsequent table) on page 11 in the ELF specification.
    3. Check that the ELF file has no more than EXE_MAX_SEGMENTS program headers (phnum field of the elfHeader).
    4. Fill in numSegments and entryAddr fields of the exeFormat output variable.
    5. For each program header k in turn, fill in the corresponding segmentList[k] array element of exeFormat with offsetInFile, lengthInFile, startAddress, sizeInMemory, protFlags with information from that program header k. See figure 2-1 on page 33 in the ELF specification.

    ELF File Image vs Executable Image

    This diagram shows the relationship between the ELF File Image and the Executable Image in memory.

    This image is provided only to have a clearer picture about the differences between the ELF File Image and the Executable Image. The loader will automatically copy segments from the ELF File Image to the Executable Image, provided the information it gets from the Exe_Format structure is correct. So you don't need to worry about copying segments now, but you'll do it in project 2.


    You will know you have loaded the program correctly if you see the following output when you run bochs:

    Hi ! This is the first string
    Hi ! This is the second string
    Hi ! This is the third (and last) string
    If you see this you're happy

    If your program prints these lines, you'll know that you've done it correctly.

    If things go wrong, try setting the lprogdebug flag in geekos/src/geekos/lprog.c to 1, to print some debug statements on the glorious way towards loading and running the executable.

    A trick in C: casting a pointer to a structure

    Part of this project involves parsing the ELF header structures that were read from the file. There is a specification of exactly how the elements of the header will be laid out on disk. There's a simple way in C to access the different fields of the header as the fields of a C structure.

    In the file geekos/include/geekos/elf.h, there are structures defined that correspond to the ELF header (called elfHeader) and the ELF program header(called programHeader).

    typedef struct {

    unsigned char ident[16];
    unsigned short type;
    unsigned short machine;
    unsigned int version;
    unsigned int entry;
    unsigned int phoff;
    unsigned int sphoff;
    unsigned int flags;
    unsigned short ehsize;
    unsigned short phentsize;
    unsigned short phnum;
    unsigned short shentsize;
    unsigned short shnum;
    unsigned short shstrndx;

    } elfHeader;

    typedef struct {

    unsigned int type;
    unsigned int offset;
    unsigned int vaddr;
    unsigned int paddr;
    unsigned int fileSize;
    unsigned int memSize;
    unsigned int flags;
    unsigned int alignment;

    } programHeader;

    The data at the beginning of the ELF file is laid out in exactly the same pattern as the elfHeader structure: there are 16 characters, followed by 2 short ints, followed by 5 ints, and so on. When you read in the ELF file, there will be a big chunk of memory containing the file contents and you will have a pointer-to-char that points to it.

    When you define a structure in C, the compiler will arrange things so that the memory for an instance of that structure will look exactly as you defined the structure. All the fields will be in the order you specified them, with no extra space in between. So the memory image that your char* points to is exactly the same as the memory image would be created if you created an elfHeader structure.

    So, here's the important part. If you create a pointer-to-elfHeader, and you point it at the memory you read in, the code that knows how to pull fields out of an elfHeader structure will be able to pull fields out of your memory. You will tell the pointer that the memory it's pointing at is an elfHeader structure, it will access the memory as if it were an elfHeader structure, and everything will work because the memory really is exactly the same as an elfHeader structure.

    Here's an example. Say we have a blah structure defined as:

    typedef struct {

    int number;
    char name[10];
    int age;

    } blah;
    and a big chunk of memory pointed to by

    char * exeFileData

    We can create a pointer-to-blah and point it at our data:

    blah *myBlah = (blah *) exeFileData;

    We cast the pointer to make myBlah (well, the compiler, really...) think that exeFileData is a pointer-to-blah, rather than a pointer-to-char.

    Now we can access the fields of myBlah in the usual fashion:

    printf("My blah's name is: %s", myBlah->name);

    Using Cyclone

    Unfortunately, this project is one that is not terribly amenable to using Cyclone, because the above trick for casting a pointer is in general unsafe.  For example, you could mistakenly cast a pointer to buffer containing garbage to an elfHeader *, leading to unpredictable results.

    However,  you can write C code to perform the casts you need and perform the safety checks to ensure they are safe.  Then you can Cyclone code to call this C code to "convert" a char * to an elfHeader *, and perform the operations on the elfHeader * in Cyclone.  Here's how you might do this.

    First, you will need to define a Cyclone file for your project.  Call it elf.cyc.  You will need to modify the Makefile to use this Cyclone file; look at the project 0 Makefile for an example of how to do this.  You can either call your Cyclone function from the C version in elf.c, or else remove elf.c entirely and use elf.cyc instead.

    Second, inside elf.cyc, you will use Cyclone's extern "C include" facility to define your C functions.  This facility allows you to declare C code within your Cyclone file, and make its functions available to be called from Cyclone code.  Note that C code can always be called directly from Cyclone code (you do not need to use macros like you did in project 0 for calling Cyclone from C).  You can see some examples in the Cyclone manual.  For this project, you'd do something like:
    extern "C include" {
      static elfHeader *getHeader(char *buf, ulong_t buflen) {
        // your code here
    // perhaps other functions here
    } export { getHeader; }
    The code inside the first set of braces is regular C code, and the export statement indicates that it should be callable from Cyclone.  Note that all typedefs, struct definitions, #defines, etc. are exported by default; you do not need to put them in the export list.

    Within the same file, you will need to use extern "C include" again to make the information in the geekos headers available for Cyclone.  For example, you will need to include geekos/screen.h to be able to call Print, for example.  Thus, you would extend the above block to be something like
    extern "C include" {
    #include <geekos/screen.h>
    #include <geekos/elf.h>
    // perhaps other includes here

      static elfHeader *getHeader(char *buf, ulong_t buflen) {
        // your code here
    // perhaps other functions here
    } export { getHeader, ...; }
    For every variable or function appearing in one of these headers that you wish to use, you will need to add it to the export list.  In general, it may turn out that the C type in the geekos file you are including does not correspond to the Cyclone type that you need.  For example, C does not specify zero-termination or other qualifiers that Cyclone does.  There is a facility for defining Cyclone types to override the C codes, called cyclone_override, that is described in the manual.  You should not need that for this project.  However, beware that you should not #include geekos/string.h or geekos/malloc.h, since these could result in problematic types (but your mileage may vary).

    Finally, you can define your Cyclone function for parsing the file and filling out the Exe_Format structure.  Because the compiler automatically prepends any Cyclone definition with the prefix Cyc_, you will also need to define a C stub for Parse_ELF_Executable(char *exeFileData, ulong_t exeFileLength, struct Exe_Format *exeFormat) that calls the Cyclone one.  Put this in your extern "C include" block.

    As usual, refer to the manual, and using the mailing list or newsgroup if you have questions.  Again, make sure you are using the most recent version of Cyclone, version 0.8.2a (upgraded since the last project). 
    Again, note that until September 11, this version of Cyclone is installed in /afs/csic/projects/cmsc412/cyclone-0.8.2a/bin, so as not to influence those working on project 0.