CMSC 412 Project #5

File System

Due Wednesday, May 9th, 2007 (6:00 PM)

  • Grading Criteria
  • Submission Instructions
  • Slides used in recitation
  • Dave Hovemeyer's Project 5 guidelines (rehashes the information below; could be handy)

    New project files

  • project5.tgz - new project distribution
  • Replace the current cat.c in src/user with this cat.c
  • Save write.c in src/user, and add write.c after shell.c (variable USER_C_SRCS in build/Makefile)
  • GOSFS - GeekOS FileSystem

    The purpose of this project is to add a new filesystem called to GeekOS called GOSFS (for GeekOS FileSystem). GOSFS will support multiple directories and long file names, as well as the standard operations for file management. The new filesystem will reside on the second IDE disk drive in the Bochs emulator. This will allow you to continue to use your existing PFAT drive to load user programs while you test your filesystem. The second IDE disk's image is called diskd.img. It is 10MB by default.

    VFS and file operations

    Since GeekOS will have two types of filesystems (PFAT and GOSFS), it will have a virtual filesystem layer (VFS) to handle sending requests to an appropriate filesystem (see figure below). We have provided an implementation of the VFS layer in the file vfs.c. The VFS layer will call the appropriate GOSFS routines when a file operation refers a file in the GOSFS filesystem.

    The implementation of PFAT is in pfat.c. You will implement GOSFS in gosfs.c and relevant system calls in syscall.c. VFS picks the functions to call based on supplied structures containing function pointers, one for each operation. For example, see Init_PFAT in pfat.c: this initializes the PFAT filesystem with VFS by passing it a pointer to s_pfatFilesystemOps, a structure that contains function pointers to PFAT routines for mounting (and formatting) a filesystem.  Other PFAT functions are stored in different structures (e.g., look at the PFAT_Open routine, which passes the s_pfatFileOps structure to VFS).  You will analogously use s_gosfsFilesystemOps, s_gosfsMountPointOps, s_gosfsDirOps, and s_gosfsFileOps in gosfs.c. You should also add a call to Init_GOSFS provided gosfs.c in main.c to register the GOSFS filesystem.  In general, use the PFAT implementation as your guide to interfacing GOSFS with VFS.

    Open files are tracked by the kernel using a file descriptor, defined as struct File.  Each user space process will have an associated file descriptor table that records which files the process can currently read and write. A user process can have up to USER_MAX_FILES files open at once. The file descriptor table is implemented as a struct File *fileList[USER_MAX_FILES] array that you must add to struct User_Context. Note that not all the entries in the fileList are necessarily open files, since usually a process has less than USER_MAX_FILES files open at once. If fileList[i] is NULL, it represents a free slot (file descriptor is not used). This descriptor will be filled out by the code in VFS; e.g., see Open in vfs.h, whose pFile argument is a pointer to a free slot in the table.

    Your filesystem should support long filenames (at most 128 bytes, which includes a null at the end). A file's full path (which includes directories and subdirectories) will be no more than 1024 characters in total.

    You should keep track of free disk blocks using a bit vector (as described in the textbook, Section 11.5.1, page 429). A library called bitset is provided (see bitset.h and bitset.c) that manages a set of bits and provides functions to find bits that are 0, which correspond to free disk blocks (note that this is the opposite of the book, which uses a 1 to indicate a free block).

    The block size for GOSFS is 4 KB; IDE sectors (a.k.a. disk blocks) are 512 bytes, so a block is 8 sectors.  Thus one bit in a bitset corresponds to a 4KB block. For example, a bitset that is 8192 bits (1024 bytes) will keep track of 8192 * 4KB = 32 MB of data.

    Files, Directories, and Mounting

    Each directory in GOSFS takes up a single disk block. The structure of the directory is defined in gosfs.h. A directory is an array of struct GOSF_Dir_Entry of size GOSFS_DIR_ENTRIES_PER_BLOCK. Each filenode (directory entry) can represent either a file or a subdirectory.  That is, unlike UNIX where directory entries and inodes are separate, here all file information is tracked in its directory entry.

    The filenode for a directory (i.e., a subdirectory) is represented by the GOSFS_DIRENTRY_ISDIRECTORY flag set in the GOSFS_Dir_Entry->flags field. As we shall see later, a filenode for a file stores information about the blocks that the file is composed of. The filenode for a directory on the other hand just stores the location of the one block and contains all the information about that directory. The location of the block that holds the data for the directory will be stored in the first entry in the blocksList array of the directory's filenode (hence entries blocksList[1]..blocksList[GOSFS_NUM_BLOCK_PTRS-1] are unused).

    Unlike directories, which have a fixed size (one block, irrespective of how many files they hold), files can take up an arbitrary number of disk blocks. You will use a version of indexed allocation to represent the data blocks of your filesystem. The blocksList field in the GOSF_Dir_Entry struct in the gosfs.h file keeps track of data blocks for a file. The first eight entries are direct blocks (i.e., direct pointers to the blocks of the file), the ninth points to a single indirect block, the tenth to a double indirect block (there is no triple indirect block). This is similar to the "Combined scheme" as detailed in Section 11.4.3 in the text, pages 427 - 428.

    The mount system call allows you to associate a filesystem with a place in the file name hierarchy. The Mount call is implemented as part of the VFS code we supply (see Mount function in vfs.c); you will implement your Init_GOSFS function so that VFS's mount code will call your function GOSFS_Mount() in gosfs.c. Among other things, it must ensure that the filesystem being mounted is GOSFS.

    See the recitation slides for additional details on directory structure and how files are organized.

    Buffer Cache

    In your filesystem implementation, you will frequently need to read data from the filesystem into memory, and sometimes you will need to write data from memory back to the filesystem. Most operating system kernels use a buffer cache for this purpose. The idea is that the buffer cache contains memory buffers which correspond to particular filesystem blocks. To read a block, the kernel requests a buffer containing that block from the buffer cache. To write a block, the data in the buffer is modified, and the buffer is marked as dirty. At some point in the future, the data in the dirty buffer will be written back to the disk.

    For this project, we have provided you with a buffer cache implementation, which you can use by including the <geekos/bufcache.h> header file. The Buffer_Cache data structure represents a buffer cache for a particular filesystem instance. You can create a new buffer cache by passing the block device to the Create_FS_Buffer_Cache() function. A buffer containing the data for a single filesystem block is represented by the FS_Buffer data type. You can request a buffer for a particular block by calling the Get_FS_Buffer() function. The actual memory containing the data for the block is pointed to by the data field of FS_Buffer. If you modify the data in a buffer, you must call the Modify_FS_Buffer() function to let the buffer cache know that the buffer is now dirty. When your code has finished accessing or modifying a buffer, it should be released using the Release_FS_Buffer function.

    Buffers are accessed in a transactional manner. Only one thread can use a buffer at a time. If one thread is using a buffer and a second requests the same buffer (by calling Get_FS_Buffer()), the second thread will be suspended until the first thread releases the buffer (by calling Release_FS_Buffer()). Note that you need to be careful never to call Get_FS_Buffer() twice in a row, since that will cause a self-deadlock.

    The GeekOS buffer cache writes dirty buffers to disk as needed. If you want to immediately write the contents of a buffer back to the disk, you can call the Sync_FS_Buffer() function. It is generally a good idea to write back modified filesystem buffers when they contain filesystem metadata such as directories or disk allocation bitmaps; by writing these blocks eagerly, the filesystem is less likely to be seriously corrupted if the operating system crashes or if the computer is turned off unexpectedly. You can flush all of the dirty buffers in a buffer cache by calling the Sync_FS_Buffer_Cache(); this is useful for implementing the Sync() operation for your mounted filesystems.

    The Destroy_FS_Buffer_Cache() function destroys a buffer cache, first flushing all dirty buffers to disk. This may be useful in your implementation of GOSFS_Format().

    Disk Layout

    disk layout

    Here are guidelines for how you should set up your filesystem on the disk.

    New System Calls

    You have to implement the semantics of the new system calls as described below. As you can see, the semantics is very similar to the UNIX file system. Some notes:

    Call User Function Return on success Return on failure Reasons for failure Comment
    SYS_MOUNT Mount(char *dev, char *prefix, char *fstype) 0 -1
  • a filesystem already mounted under name
  • illegal value for one of the parameters
    SYS_OPEN Open(char *path, int mode) new file descriptor number -1
  • name does not exist (if permissions don't include O_CREATE )
  • path to name does not exist (if permissions include O_CREATE )
  • attempt to open a directory, (should use OpenDirectory)
  • there's no create syscall, so setting O_CREATE will create the file. If the file exists, the call succeeds (return >= 0) but its data contents is not affected.
  • Should NOT create directories; e.g. if you do Open("/d/d1/d2/d3/xFile", O_CREATE) and the leading path /d/d1/d2/d3 does not exist, the syscall fails, returning -1
  • You should use the Allocate_File in vfs.c to get a File when opening.
  • The permissions values are flags and may be or'ed together in a call. For example:
    • O_READ | O_WRITE
  • SYS_OPENDIRECTORY Open_Directory(char *path) new file descriptor number -1
  • path does not exist
  • attempt to open a file, (should use Open)
  • Should NOT create directories, use CreateDirectory instead.
  • You should use the Allocate_File in vfs.c to get a File when opening.
  • SYS_CLOSE Close(int fd) 0 -1
  • fd not within range 0-(USER_MAX_FILES-1)
  • fd is not an open file
    SYS_DELETE Delete(char *path) 0 -1
  • name does not exist
  • name is a non-empty directory
  • if Delete(file) is called and file is still open in other threads or even in the thread that called Delete(), all the subsequent operations on that file (except Close()) should fail
    SYS_READ Read(int fd, char *buffer, int length) number of bytes read -1
  • fd not in range
  • fd is not an open file
  • fd was not open with O_READ flag
  • fd is a directory
  • it's OK if return value < length, for instance reading close to end of file
  • increase the filePos (the current position in the file) by the number of bytes read (if successful)
  • SYS_READENTRY Read_Entry(int fd, VFS_Dir_Entry *entry)
  • 0 if entry is valid
  • >0 if end of directory reached
  • -1
  • fd not in range
  • fd is not an open directory
  • copy over directory entry info from a GOSF_Dir_Entry into a VFS_Dir_Entry
  • skip over unused entries
  • increase the filePos, if successful
  • VFS_Dir_Entry is defined in fileio.h
  • SYS_WRITE Write(int fd, char *buffer, int length) number of bytes written -1
  • fd not in range
  • fd is not an open file
  • fd was not open with O_WRITE flag
  • fd is a directory
  • increases filePos by the number of bytes written, if successful
  • "Grow on write"- allocate blocks "on the fly" if past end of file
  • "Grow minimum" - allocate only the needed blocks. A seek to a position past the end of the file should be allowed and a write will allocate only a single block rather than all blocks in between.
  • SYS_STAT Stat(char *path, VFS_File_Stat *stat) 0 -1
  • path is not valid
  • VFS_File_Stat is defined in fileio.h
  • SYS_FSTAT FStat(int fd, VFS_File_Stat *stat) 0 -1
  • fd not in range
  • fd is not an open file
  • VFS_File_Stat is defined in fileio.h
  • SYS_SEEK Seek(int fd, int offset) 0 -1
  • fd not in range
  • fd is not an open file or directory
  • offset not in range (see comment)
  • offset is an absolute position; for files: could be equal to fileSize or greater, then Write appends, see above. For directories, it seeks to the absolute directory entry, and is an error if there's no valid entry at the given index.
    SYS_CREATEDIR Create_Directory(char *path) 0 -1
  • name already exists, as file or directory
  • regular file encountered on the path to name
  • Should NOT create directories recursively; e.g. CreateDirectory("/d/d1/d2/d3/d4"), will fail if /d/d1/d2/d3 does not exist already.
    SYS_SYNC Sync(void) 0 -1 None Forces any outstanding operations to be written to disk.
    SYS_FORMAT Format(char *dev, char *fstype) 0 -1
  • illegal value for dev
  • dev is in use, i.e. mounted
  • You only have to implement the format of the GOSFS filesystem.

    Testing and Requirements

    Building on Past Projects

    A working Project 4 is not required for this project. You can implement Project 5 on top of the provided base kernel. Note however that if you do not build this project on top of project 4, you must make the following changes:

  • Remove Init_VM from main.c
  • Remove Init_Paging from main.c
  • Modify the USER_IMP_C := uservm.c line to USER_IMP_C := userseg.c
  • You may also have to make a few small changes if you do not implement on top of project 2. These were listed in the project 3 description. In short, you will have to make Sys_RegDeliver, Sys_Signal, and Check_Pending_Signal return 0, and you will have to provide a minimal implementation of Get_Next_Runnable in kthread.c; you can replace the TODO with

        best = Find_Best(&s_runQueue[0]);
        KASSERT(best != 0);
        Remove_Thread(&s_runQueue[0], best);
    For details on merging your sources, please refer to the "Final Notes" section in Project 3 right at the end.


    Here are some must do's for the project:

    You do not need to consider situations where two processes have the same file open. You do not need to consider situations where one process opens the same file twice without closing it in between.


    Finally, in src/user there are some programs that can be used to test your file management syscalls: cp.c, format.c, ls.c, mkdir.c, mount.c, p5test.c, touch.c, type.c.

    Web Accessibility