CMSC 412 Project #5

File System

Due Tuesday, April 27th, 2004 (9:00 AM)

  • Grading Criteria
  • Submission Instructions
  • Slides used in recitation
  • New project files

    Kernel

  • In your project Makefile, change the line "all : fd.img hd.img" to "all : fd.img hd.img diskd.img".
  • In .bochsrc, after the diskc: line, add this line: diskd: file=diskd.img, cyl=1024, heads=4, spt=16

  • Updated (04/14) syscall.c.add
  • user.h.add - add to respective files in the project directory
  • vfs.h,fileio.h,gosfs.h - add to the project directory
  • gosfs.c,vfs.c - add to the project directory; add .c files to the C_SRCS variable in the Makefile in the project directory

    User

  • cp.c, format.c, ls.c, mkdir.c, mount.c, p5test.c - add to the userProgs directory; add the corresponding .exe files to the PROGS variable in the Makefile in the userProgs directory
  • libuser.h.add, libuser.c.add - add to respective files in userProgs directory

    Introduction

    The purpose of this project is to add a new filesystem to GeekOS, as well as the standard operations for file management.

    GOSFS - GeekOS FileSystem

    The main part of this project is to develop a new filesystem for GeekOS. This filesystem will reside on the second IDE disk drive in the Bochs emulator, which allows you to keep using your existing PFAT drive to load user programs while you test your filesystem. The second IDE disk's image is called diskd.img. It is 2 MB by default, but you must make it larger by changing its size/disk geometry in .bochsrc; see the "How to create an arbitrary size big diskd.img" section.

    GOSFS will provide a filesystem that includes multiple directories and long file name support.

    The Mount system call allows you to associate a filesystem with a place in the file name hierarchy. The Mount call is implemented as part of the VFS code we supply. You will need to modify the init code to call Mount to mount the PFAT filesystem on drive 0 onto /c.

    Then you can mount the GOSFS file system on drive 1 onto /d, for instance.

    VFS and file operations

    Since GeekOS will have two types of filesystems (PFAT and GOSFS), it will have a virtual filesystem layer (VFS) that sends each request to the appropriate filesystem (see figure below). We have provided an implementation of the VFS layer in the file vfs.c. The VFS layer calls the appropriate GOSFS routines when a file operation refers to a file in the GOSFS filesystem.

    The System Call layer is already implemented in syscall.c.add, and the PFAT filesystem in pfat.c. Thus the only component you need to take care of is GOSFS.

    Each user space process will have a file descriptor table that keeps track of which files that process can currently read and write. Any user process should be able to have up to 10 files open at once.
    The file descriptors for a user process are kept in the files[MAX_OPEN_FILES] array in struct User_Context (see user.h.add). Not every entry in files is an open file, since a process usually has fewer than 10 files open at once; an entry whose openFile.fsType == FS_TYPE_NONE is a free slot (an unused file descriptor). The good news is that file descriptor management is already implemented for you (see the Open() function in vfs.c).
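    Since vfs.c already manages descriptors, the following is only an illustration of the idea: a minimal sketch, assuming simplified stand-in structs, of how a free slot can be found and how the "fd not within 0-9 / fd is not an open file" checks used by several calls below can be expressed. File_Sketch, User_Context_Sketch, Find_Free_Fd(), Is_Open_Fd() and the FS_TYPE_* values are hypothetical; the real definitions are in user.h.add and vfs.c.

```c
#include <assert.h>

#define MAX_OPEN_FILES 10

/* Hypothetical stand-ins; the real definitions live in user.h.add / vfs.h. */
enum { FS_TYPE_NONE = 0, FS_TYPE_PFAT = 1, FS_TYPE_GOSFS = 2 };

struct File_Sketch {
    int fsType;             /* FS_TYPE_NONE marks an unused slot */
    unsigned int filePos;   /* current read/write position */
};

struct User_Context_Sketch {
    struct File_Sketch files[MAX_OPEN_FILES];
};

/* Return the lowest free descriptor number, or -1 if all 10 slots are in use. */
static int Find_Free_Fd(const struct User_Context_Sketch *ctx)
{
    for (int fd = 0; fd < MAX_OPEN_FILES; ++fd) {
        if (ctx->files[fd].fsType == FS_TYPE_NONE)
            return fd;
    }
    return -1;
}

/* Descriptor check shared by Close, Read, Write, Stat and Seek:
 * fd must be within 0-9 and refer to an open file. */
static int Is_Open_Fd(const struct User_Context_Sketch *ctx, int fd)
{
    return fd >= 0 && fd < MAX_OPEN_FILES
        && ctx->files[fd].fsType != FS_TYPE_NONE;
}
```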

    Your filesystem should support long filenames (at most 64 bytes, including a null at the end). A full path to a file will be no more than 1024 characters.
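    A sketch of a path check enforcing these limits, assuming absolute paths and that the 1024-character limit, like the 64-byte name limit, includes the terminating null. Path_Is_Valid() and the GOSFS_MAX_* constants are hypothetical names, not part of the supplied code.

```c
#include <assert.h>
#include <string.h>

#define GOSFS_MAX_NAME 64    /* bytes per name, including the trailing '\0' */
#define GOSFS_MAX_PATH 1024  /* bytes per full path (assumed to include '\0') */

/* Return 1 if path is absolute, shorter than GOSFS_MAX_PATH, and every
 * '/'-separated component has 1..63 characters; otherwise return 0.
 * Empty components such as "//" are rejected. */
static int Path_Is_Valid(const char *path)
{
    size_t total = strlen(path);
    if (total == 0 || total >= GOSFS_MAX_PATH || path[0] != '/')
        return 0;

    const char *p = path + 1;                  /* skip the leading '/' */
    while (*p != '\0') {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        if (len == 0 || len >= GOSFS_MAX_NAME)
            return 0;                          /* empty or too long */
        p += len;
        if (*p == '/')
            ++p;                               /* step over the separator */
    }
    return 1;
}
```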

    You should keep track of free disk blocks using a bit vector (as described in class). A library called bitset is provided (see bitset.h and bitset.c) that manages a set of bits and provides functions to find bits that are 0 (i.e. correspond to free disk blocks).
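    To illustrate the bit-vector idea, here is a small self-contained sketch; it is not the API of the supplied bitset.h, whose functions you should use instead. Bit i set to 1 means 4KB block i is in use, and Alloc_Block() does a first-fit scan for a 0 bit. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-vector helpers: bit i == 1 means block i is in use. */
static void Bit_Set(uint8_t *bits, unsigned i)   { bits[i / 8] |= (uint8_t)(1u << (i % 8)); }
static void Bit_Clear(uint8_t *bits, unsigned i) { bits[i / 8] &= (uint8_t)~(1u << (i % 8)); }
static int  Bit_Test(const uint8_t *bits, unsigned i) { return (bits[i / 8] >> (i % 8)) & 1u; }

/* Mark all nBlocks blocks free (one bit each, rounded up to whole bytes). */
static void Bitmap_Init(uint8_t *bits, unsigned nBlocks)
{
    memset(bits, 0, (nBlocks + 7) / 8);
}

/* Find the first free block, mark it used and return its number;
 * return -1 if every block is in use. Freeing is just Bit_Clear(). */
static int Alloc_Block(uint8_t *bits, unsigned nBlocks)
{
    for (unsigned i = 0; i < nBlocks; ++i) {
        if (!Bit_Test(bits, i)) {
            Bit_Set(bits, i);
            return (int)i;
        }
    }
    return -1;
}
```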

    All disk allocations will be in units of 4KB (i.e. 8 physical 512-byte disk blocks). Thus one bit in the bitset corresponds to one 4KB block, and a bitset that is 8192 bits (1024 bytes) large keeps track of 8192 * 4KB = 32 MB of data.
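    The unit conversions in this paragraph, written out as a sketch. The helper names are hypothetical; 512-byte physical sectors are assumed, matching the 8 physical blocks per 4KB block stated above.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE       512u
#define GOSFS_BLOCK_SIZE  4096u
#define SECTORS_PER_BLOCK (GOSFS_BLOCK_SIZE / SECTOR_SIZE)   /* 8 */

/* Disk size in bytes -> number of whole 4KB GOSFS blocks. */
static uint32_t Blocks_For_Disk(uint64_t diskBytes)
{
    return (uint32_t)(diskBytes / GOSFS_BLOCK_SIZE);
}

/* Bytes needed for a bitmap with one bit per block (rounded up). */
static uint32_t Bitmap_Bytes(uint32_t nBlocks)
{
    return (nBlocks + 7u) / 8u;
}

/* First physical 512-byte sector of GOSFS block b. */
static uint32_t Block_To_Sector(uint32_t b)
{
    return b * SECTORS_PER_BLOCK;
}
```

    For a 32 MB disk this gives 8192 blocks and a 1024-byte bitmap, the same figures as in the paragraph above.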

    Directory Structure

    See the recitation slides for details on directory structure. Each directory in GOSFS takes up a single disk block. The structure of the directory is defined in gosfs.h. A directory is an array of GOSFSfileNodes (36 elements, since they have to fit in a single 4KB block). Each filenode can represent either a file in the directory or a subdirectory.

    The filenode for a directory is distinguished by the isDirectory bit. The number of the block that holds the directory's data is stored in the first entry of the blocks array of the directory's filenode (blocks[0]); the remaining entries of blocks are unused.
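    A sketch of the directory layout under stated assumptions: the field order, the explicit padding and the struct names below are hypothetical approximations of gosfs.h, chosen so that one node is 112 bytes and 36 nodes (4032 bytes) fit in one 4KB block. Always use the real GOSFSfileNode from gosfs.h. Find_Entry() shows the linear lookup a path walk performs in each directory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GOSFS_BLOCK_SIZE     4096
#define GOSFS_MAX_NAME       64   /* including the trailing '\0' */
#define GOSFS_NUM_BLOCK_PTRS 10   /* 8 direct + 1 single + 1 double indirect */
#define GOSFS_DIR_ENTRIES    36

/* Hypothetical approximation of GOSFSfileNode; use the real one from gosfs.h. */
struct GOSFSfileNode_Sketch {
    uint32_t size;                         /* file size in bytes */
    uint8_t  isUsed;                       /* 0 = free directory slot */
    uint8_t  isDirectory;                  /* 1 = entry is a subdirectory */
    uint8_t  pad[2];
    char     name[GOSFS_MAX_NAME];
    uint32_t blocks[GOSFS_NUM_BLOCK_PTRS]; /* a directory uses only blocks[0] */
};

/* A directory is exactly one 4KB block holding an array of file nodes. */
struct GOSFSdirectory_Sketch {
    struct GOSFSfileNode_Sketch files[GOSFS_DIR_ENTRIES];
};

_Static_assert(sizeof(struct GOSFSdirectory_Sketch) <= GOSFS_BLOCK_SIZE,
               "a directory must fit in one 4KB block");

/* Return the index of the used entry called name, or -1 if there is none. */
static int Find_Entry(const struct GOSFSdirectory_Sketch *dir, const char *name)
{
    for (int i = 0; i < GOSFS_DIR_ENTRIES; ++i)
        if (dir->files[i].isUsed && strcmp(dir->files[i].name, name) == 0)
            return i;
    return -1;
}
```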

    Files

    Unlike directories, which have a fixed size of one block (irrespective of how many files they hold), files can take up an arbitrary number of disk blocks. You will use a version of indexed allocation to represent the data blocks of your filesystem. The blocks field of GOSFSfileNode (gosfs.h) keeps track of the data blocks of a file: the first eight entries point to direct 4KB data blocks, the ninth points to a single indirect block, and the tenth points to a double indirect block. See the textbook, p. 429, and the recitation slides for a detailed layout.
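    The arithmetic behind this layout can be sketched as follows, assuming 32-bit block numbers so that one 4KB indirect block holds 1024 pointers. Locate_Block() and its types are hypothetical; the function only says where the disk address of the n-th 4KB block of a file is stored, not how to read it.

```c
#include <assert.h>
#include <stdint.h>

#define GOSFS_BLOCK_SIZE 4096u
#define NUM_DIRECT       8u
#define PTRS_PER_BLOCK   (GOSFS_BLOCK_SIZE / sizeof(uint32_t))   /* 1024 */

enum Level { DIRECT, SINGLE_INDIRECT, DOUBLE_INDIRECT, TOO_LARGE };

struct Block_Path {
    enum Level level;
    uint32_t index0;   /* blocks[] slot, or slot in the first-level indirect block */
    uint32_t index1;   /* slot in the second-level block (double indirect only) */
};

/* Map the n-th 4KB block of a file to where its disk address is stored. */
static struct Block_Path Locate_Block(uint32_t n)
{
    struct Block_Path p = { TOO_LARGE, 0, 0 };
    if (n < NUM_DIRECT) {                      /* blocks[n] */
        p.level = DIRECT;
        p.index0 = n;
        return p;
    }
    n -= NUM_DIRECT;
    if (n < PTRS_PER_BLOCK) {                  /* blocks[8] -> ptr[n] */
        p.level = SINGLE_INDIRECT;
        p.index0 = n;
        return p;
    }
    n -= PTRS_PER_BLOCK;
    if (n < PTRS_PER_BLOCK * PTRS_PER_BLOCK) { /* blocks[9] -> ptr[i] -> ptr[j] */
        p.level = DOUBLE_INDIRECT;
        p.index0 = (uint32_t)(n / PTRS_PER_BLOCK);
        p.index1 = (uint32_t)(n % PTRS_PER_BLOCK);
    }
    return p;
}
```

    Under these assumptions the direct blocks cover 32 KB and the single indirect block another 4 MB, so a 5 MB file (1280 blocks) necessarily reaches the double indirect block, which is the threshold mentioned in the Requirements section.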

    New System Calls

    You have to implement the semantics of the new system calls as described below. As you will see, the semantics are very similar to those of UNIX.

  • All user-supplied pointers (e.g. strings, buffers) must be checked for validity. The checking functions are called automatically in vfs.c, but you still need to implement Validate_User_Memory().
  • The new syscalls in syscall.c.add assume you use paging (and add 0x80000000 to user pointers to convert them to kernel pointers). If your paging doesn't work, use your Project 3 as a base instead of Project 4. You'll still get full credit, but you need to replace (+ 0x80000000) with (+ User_Context->program), or whatever mechanism you used in P2/P3 to convert user pointers to kernel pointers.
  • Although the Mount call is implemented as part of the VFS code we supply, you'll still have to add code to the Mount() function in vfs.c to check the magic number before mounting a GOSFS disk. See the Disk Layout section for details on the magic number.
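    A minimal sketch of the check Validate_User_Memory() has to perform, assuming a user address space that starts at user address 0 and is ctx->size bytes long. The struct, its size field and both function names are hypothetical stand-ins; the real prototype and user context are in your kernel sources. The comparison is written so that userAddr + bufSize can never overflow. User_To_Kernel() shows the paging conversion used by syscall.c.add; a Project 3 base would add its own base instead, as described above.

```c
#include <assert.h>
#include <stdint.h>

#define USER_BASE 0x80000000u   /* kernel-view base of user space with paging */

/* Hypothetical: the only user-context field this sketch needs. */
struct User_Context_Sketch {
    uint32_t size;   /* bytes of user address space, starting at user address 0 */
};

/* Accept [userAddr, userAddr + bufSize) only if it lies entirely inside the
 * process's address space. No addition is performed, so nothing can wrap. */
static int Validate_User_Memory_Sketch(const struct User_Context_Sketch *ctx,
                                       uint32_t userAddr, uint32_t bufSize)
{
    if (userAddr > ctx->size)
        return 0;
    return bufSize <= ctx->size - userAddr;
}

/* With paging, a validated user pointer becomes a kernel pointer by adding USER_BASE. */
static uint32_t User_To_Kernel(uint32_t userAddr)
{
    return userAddr + USER_BASE;
}
```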

    Each call below lists its user function, its return value on success and on failure, the reasons it can fail, and comments.
    SYS_OPEN: Open(char *name, int permissions)
    Returns the new file descriptor number on success, -1 on failure.
    Fails if:
  • name does not exist (and permissions don't include O_CREATE)
  • the path to name does not exist (and permissions include O_CREATE)
  • name is a directory and permissions include O_WRITE or O_CREATE; use CreateDirectory() to create directories
    Comments:
  • There is no Create syscall, so setting O_CREATE creates the file. If the file already exists, the call succeeds (returns >= 0) and its data contents are not affected.
  • Open() should NOT create directories recursively: Open("/d/d1/d2/d3/xFile", O_CREATE) will NOT create d1 inside d, d2 inside d1, and so on, if they don't already exist. If the leading path /d/d1/d2/d3 does not exist, the call fails and returns -1.
  • The permissions values are flags and may be OR'ed together in a call. For example:
    • O_CREATE|O_READ
    • O_READ|O_WRITE
    • O_CREATE|O_READ|O_WRITE
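    The Open() rules above can be summarized as a small decision function. This is a sketch under assumptions: the flag values are placeholders for the ones in fileio.h, and Decide_Open() is a hypothetical helper that runs after a path lookup has determined whether name and its parent directory exist.

```c
#include <assert.h>

/* Hypothetical flag values; the real ones are defined in fileio.h. */
#define O_CREATE 0x1
#define O_READ   0x2
#define O_WRITE  0x4

enum Open_Action { OPEN_FAIL, OPEN_EXISTING, CREATE_NEW };

/* Decide what Open() should do, given what the path lookup found. */
static enum Open_Action Decide_Open(int exists, int isDirectory,
                                    int parentExists, int permissions)
{
    if (exists) {
        /* O_WRITE and O_CREATE are not allowed on directories. */
        if (isDirectory && (permissions & (O_WRITE | O_CREATE)))
            return OPEN_FAIL;
        return OPEN_EXISTING;   /* O_CREATE on an existing file keeps its data */
    }
    if (!(permissions & O_CREATE))
        return OPEN_FAIL;       /* name does not exist */
    if (!parentExists)
        return OPEN_FAIL;       /* the leading path is never created by Open() */
    return CREATE_NEW;
}
```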
    SYS_CLOSE: Close(int fd)
    Returns 0 on success, -1 on failure.
    Fails if:
  • fd is not within 0-9
  • fd is not an open file
    SYS_DELETE: Delete(char *name)
    Returns 0 on success, -1 on failure.
    Fails if:
  • name does not exist
  • name is a non-empty directory
    Comments:
  • If Delete(file) is called while file is still open, whether in other threads or in the thread that called Delete(), all subsequent operations on that file except Close() should fail.
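    One possible way to make later operations on a deleted file fail is to mark every open descriptor that refers to it. This sketch assumes a single descriptor table and identifies a file by the directory block and index of its file node; Open_File_Sketch, Mark_Deleted() and Usable() are hypothetical. A real kernel would also have to reach the descriptor tables of the other threads.

```c
#include <assert.h>

#define MAX_OPEN_FILES 10

/* Hypothetical per-descriptor state for a GOSFS file. */
struct Open_File_Sketch {
    int inUse;
    int deleted;            /* set once the underlying file is deleted */
    unsigned int dirBlock;  /* directory block holding the file node */
    int nodeIndex;          /* index of the file node in that directory */
};

/* Called by Delete(): mark every open descriptor that refers to the file. */
static void Mark_Deleted(struct Open_File_Sketch *files, int n,
                         unsigned int dirBlock, int nodeIndex)
{
    for (int i = 0; i < n; ++i)
        if (files[i].inUse && files[i].dirBlock == dirBlock
            && files[i].nodeIndex == nodeIndex)
            files[i].deleted = 1;
}

/* Read/Write/Stat/Seek check this first; Close() skips it so the slot can be freed. */
static int Usable(const struct Open_File_Sketch *f)
{
    return f->inUse && !f->deleted;
}
```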
    SYS_READ: Read(int fd, char *buffer, int length)
    Returns the number of bytes read on success, -1 on failure.
    Fails if:
  • fd is not within 0-9
  • fd is not an open file
  • fd was not opened with the O_READ flag
    Comments:
  • A return value less than length is OK, for instance when reading close to the end of the file.
  • On success, filePos is advanced past the bytes read.

    There is special behavior when SYS_READ is called on a directory:

    • The data put into the buffer should be formatted as an array of dirEntry structs.
    • The length argument specifies the number of dirEntries to return.
    • The return value equals the number of dirEntries read.

      dirEntry is defined in fileio.h.
    SYS_WRITE: Write(int fd, char *buffer, int length)
    Returns the number of bytes written on success, -1 on failure.
    Fails if:
  • fd is not within 0-9
  • fd is not an open file
  • fd was not opened with the O_WRITE flag
  • fd is a directory
    Comments:
  • On success, filePos is advanced past the bytes written.
  • "Grow on write": allocate blocks on the fly when writing past the end of the file.
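    "Grow on write" reduces to knowing how many new data blocks a write needs. The hypothetical helpers below compute that count; they ignore the single and double indirect blocks themselves, which must also be allocated the first time a file grows past the direct or single indirect range.

```c
#include <assert.h>
#include <stdint.h>

#define GOSFS_BLOCK_SIZE 4096u

/* Number of 4KB data blocks a file of size bytes occupies. */
static uint32_t Blocks_For_Size(uint32_t size)
{
    return (size + GOSFS_BLOCK_SIZE - 1) / GOSFS_BLOCK_SIZE;
}

/* New data blocks Write() must allocate for len bytes starting at filePos,
 * when the write may extend the file beyond fileSize. */
static uint32_t Blocks_To_Allocate(uint32_t fileSize, uint32_t filePos, uint32_t len)
{
    uint32_t end = filePos + len;
    uint32_t newSize = end > fileSize ? end : fileSize;
    return Blocks_For_Size(newSize) - Blocks_For_Size(fileSize);
}
```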
    SYS_STAT: Stat(int fd, fileStat *stat)
    Returns 0 on success, -1 on failure.
    Fails if:
  • fd is not within 0-9
  • fd is not an open file
    SYS_SEEK: Seek(int fd, int offset)
    Returns 0 on success, -1 on failure.
    Fails if:
  • fd is not within 0-9
  • fd is not an open file
  • offset > fileSize
    Comments:
  • offset is an absolute position. It may equal fileSize, in which case a following Write() appends to the file (see SYS_WRITE above).
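    The Seek() rule as a short sketch. Seek_Sketch() is hypothetical, and its rejection of negative offsets is an assumption that the description above does not state.

```c
#include <assert.h>

/* Absolute offsets from 0 up to and including fileSize are accepted;
 * offset == fileSize positions a following Write() to append.
 * Returns 0 and updates *filePos, or returns -1 and leaves it unchanged. */
static int Seek_Sketch(unsigned int *filePos, unsigned int fileSize, int offset)
{
    if (offset < 0 || (unsigned int)offset > fileSize)
        return -1;
    *filePos = (unsigned int)offset;
    return 0;
}
```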
    SYS_CREATEDIR: CreateDirectory(char *name)
    Returns 0 on success, -1 on failure.
    Fails if:
  • name already exists, as a file or a directory
  • a regular file is encountered on the path to name
    Comments:
  • Directories should be created recursively if needed: CreateDirectory("/d/d1/d2/d3/d4") creates d1 inside d, d2 inside d1, and so on, if they don't already exist. The operation should be atomic, in the sense that either the whole directory chain is created or no directory is created.
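    One way to get the required atomicity is to reserve every block the chain needs before writing anything, and to undo the reservation if the disk runs out of space. The sketch below shows only that allocation step, assuming the caller has already walked the path, counted the missing directories and checked that no regular file lies on the path. Is_Used(), Set_Used() and Reserve_Dir_Blocks() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitmap helpers: bit i == 1 means block i is in use. */
static int Is_Used(const uint8_t *bm, unsigned i) { return (bm[i / 8] >> (i % 8)) & 1u; }
static void Set_Used(uint8_t *bm, unsigned i, int used)
{
    if (used) bm[i / 8] |= (uint8_t)(1u << (i % 8));
    else      bm[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Reserve one block for each of the missing directories in the chain.
 * Either all of them are reserved (block numbers stored in out[]) and 1 is
 * returned, or every partial reservation is undone and 0 is returned, so a
 * full disk never leaves half a directory chain behind. */
static int Reserve_Dir_Blocks(uint8_t *bm, unsigned nBlocks,
                              unsigned missing, unsigned *out)
{
    unsigned got = 0;
    for (unsigned i = 0; i < nBlocks && got < missing; ++i) {
        if (!Is_Used(bm, i)) {
            Set_Used(bm, i, 1);
            out[got++] = i;
        }
    }
    if (got == missing)
        return 1;
    while (got > 0)
        Set_Used(bm, out[--got], 0);   /* roll back */
    return 0;
}
```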
    SYS_FORMAT: Format(int drive)
    Returns 0 on success, -1 on failure.
    Fails if:
  • drive is an illegal value (drive 1 must work; higher drive numbers are optional)
  • drive is in use, i.e. mounted
    Comments:
  • Format() formats the drive with GOSFS. You don't need to support formatting with PFAT, and you don't need to format in the init code, so your data can be kept between sessions.

    Disk Layout

    [Figure: GOSFS disk layout]

    A guideline is provided above. The first block (block 0) is called the SUPERBLOCK and contains filesystem housekeeping data. Blocks >= 1 contain files and directories.

  • The Magic number at the very beginning could be 0xDEADBEEF, "GOSF" or the like. It tells you that the disk has a GOSFS filesystem on it. If you try to mount a drive and don't find the magic signature, return an error.
  • Root Dir Pointer holds the block number of the block containing the root directory.
  • Size is the size of the disk in 4KB blocks (32 MB / 4 KB = 8K blocks for the example above).
  • Free Blocks Bitmap is Size bits, i.e. Size/8 bytes, large (8K / 8 = 1K bytes for the example above). Every block has an associated bit.
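    A sketch of the superblock header and of the magic check Mount() needs, assuming 0xDEADBEEF as the magic value and 32-bit fields in the order listed above. The struct and Has_GOSFS_Magic() are hypothetical, not definitions from gosfs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GOSFS_MAGIC 0xDEADBEEFu   /* any fixed signature works, e.g. "GOSF" */

/* Hypothetical on-disk superblock (block 0). The bitmap has one bit per 4KB
 * block, so a 32 MB disk (8192 blocks) needs 1024 bytes of it. */
struct GOSFS_Superblock_Sketch {
    uint32_t magic;          /* written last by Format() */
    uint32_t rootDirBlock;   /* block number of the root directory */
    uint32_t size;           /* disk size in 4KB blocks */
    uint8_t  freeBitmap[];   /* (size + 7) / 8 bytes follow */
};

/* Mount() check: refuse any disk whose block 0 does not start with the magic. */
static int Has_GOSFS_Magic(const uint8_t *block0)
{
    uint32_t magic;
    memcpy(&magic, block0, sizeof magic);   /* avoids unaligned access */
    return magic == GOSFS_MAGIC;
}
```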

    When you call Format(), you make a raw disk usable with GOSFS. That is:

    1. Get the drive's size and convert it to a number of 4KB blocks. IDE_getNumBlocks() in ide.c tells you the drive's size.
    2. Work out the size of the Free Blocks Bitmap and mark all blocks free.
    3. Create a valid but empty directory. That will be the root directory. Make Root Dir Pointer point to it.
    4. Mark the superblock and the root directory's block as used in the Free Blocks Bitmap.
    5. If everything went OK, write the Magic. Now the disk is ready to be mounted and used.
    Keep in mind that the superblock and root directory have no associated GOSFSfileNode.
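    The five steps above, sketched against an in-memory disk image instead of the IDE driver. Assumptions: the caller passes the size in 4KB blocks (step 1, the IDE_getNumBlocks() query, is left out), the bitmap fits in block 0 right after the header fields, and the root directory goes in block 1. Format_Image() and SB_Header are hypothetical names. On a real disk, the magic should be written only after every other write has succeeded.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GOSFS_BLOCK_SIZE 4096u
#define GOSFS_MAGIC      0xDEADBEEFu

/* Superblock header fields at the start of block 0 (hypothetical layout). */
struct SB_Header { uint32_t magic, rootDirBlock, size; };

/* Format an in-memory image of nBlocks 4KB blocks. Returns 0 on success,
 * -1 if the disk is too small or the bitmap cannot fit in block 0. */
static int Format_Image(uint8_t *disk, uint32_t nBlocks)
{
    struct SB_Header sb;
    uint32_t bitmapBytes = (nBlocks + 7) / 8;                    /* step 2 */
    if (nBlocks < 2 || sizeof sb + bitmapBytes > GOSFS_BLOCK_SIZE)
        return -1;

    uint8_t *block0 = disk;
    uint8_t *bitmap = block0 + sizeof sb;
    memset(block0, 0, GOSFS_BLOCK_SIZE);       /* all blocks free, magic still 0 */

    uint32_t rootBlock = 1;                                      /* step 3 */
    memset(disk + rootBlock * GOSFS_BLOCK_SIZE, 0, GOSFS_BLOCK_SIZE);

    bitmap[0] |= 0x1;                                            /* step 4: superblock */
    bitmap[rootBlock / 8] |= (uint8_t)(1u << (rootBlock % 8));   /* root directory */

    sb.rootDirBlock = rootBlock;
    sb.size = nBlocks;
    sb.magic = GOSFS_MAGIC;                                      /* step 5, last */
    memcpy(block0, &sb, sizeof sb);
    return 0;
}
```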

    How to create an arbitrary size big diskd.img

  • Change the size/disk geometry by changing the diskd line in .bochsrc
  • Change the argument to $(ZEROFILE) in the Makefile. For $(ZEROFILE), one block is 512 bytes, so 4096 blocks = 2 MB, 65536 blocks = 32 MB, and so on.
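    The Bochs geometry and the $(ZEROFILE) argument should describe the same number of bytes. This hypothetical check computes both, assuming 512-byte sectors: the diskd line given at the top (cyl=1024, heads=4, spt=16) is 65536 sectors, the same 32 MB as 65536 $(ZEROFILE) blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in a Bochs disk image with the given geometry (512-byte sectors). */
static uint64_t Geometry_Bytes(uint32_t cyl, uint32_t heads, uint32_t spt)
{
    return (uint64_t)cyl * heads * spt * 512u;
}

/* Bytes produced by $(ZEROFILE) for a count of 512-byte blocks. */
static uint64_t Zerofile_Bytes(uint32_t blocks)
{
    return (uint64_t)blocks * 512u;
}
```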

    Notes

    You do not need to consider situations where two processes have the same file open. You do not need to consider situations where one process opens the same file twice without closing it in between.

    All changes should be committed to disk before the system returns from the syscall that made the changes. If you cache any structures while processing a call, you should write any changes to them to disk before returning from the syscall.

    If a read() is called on a directory, the data returned should be in the form of an array of dirEntry structures. The length argument and the return value will indicate the number of entries to read and the number of entries that were read, rather than the number of bytes.

    Requirements

  • Make sure your Mount() works well, so that we can test your project. If we cannot Mount() a GOSFS, we cannot grade your project.
  • You might also want to mount "/d" (dee) automatically in main() to speed up your testing, but the code you submit should not mount "/d" automatically. "/c" (cee) should be mounted automatically in main() though.
  • You should support disk sizes of at least 32 MB. More than 32 MB is optional. Following the procedure described in the "How to create an arbitrary size big diskd.img" section above, in your submitted project, when someone types gmake, a 32 MB file should be created.
  • You should support file sizes of at least 5 MB (yes, that crosses the double indirect threshold). More than 5 MB is optional.

    Testing

    As listed at the top, the userProgs directory contains programs you can use to test your file management syscalls:
    cp.c, format.c, ls.c, mkdir.c, mount.c, p5test.c.