CMSC 412 Project 5

GOSFS Filesystem

Part a: due Sunday, December 2, 6:00pm
Part b: due Tuesday, December 11, 6:00pm

1. Introduction

GeekOS currently has a filesystem of type PFAT residing in IDE disk 0 (disk image diskc.img). The goal of this project is to develop a new filesystem type, called GOSFS, and to create an instance of it in IDE disk 1 (disk image diskd.img). You can continue to use the existing PFAT filesystem to load user programs to test your GOSFS filesystem implementation.

Here, we use the term filesystem type to mean a set of rules for laying out directories and files in a disk and for adding, modifying and deleting them. We use the term filesystem to mean an instance of a filesystem type. For example, PFAT is a filesystem type (whose rules are implemented in pfat.c); diskc.img is a PFAT filesystem. GOSFS is another filesystem type (whose rules you will implement in gosfs.c); your diskd.img will be a GOSFS filesystem.


2. Context: VFS and blockdev

GeekOS has a virtual filesystem (VFS) onto which PFAT and GOSFS (and other types of) filesystems are "mounted". VFS acts as a wrapper to these mounted filesystems, allowing them to be accessed by users in a uniform manner. Initially VFS consists of just an empty root directory ("/"). At the end of OS initializing, the PFAT filesystem in IDE 0 is mounted onto VFS at position "/c", after which user programs can access the PFAT filesystem as the VFS subdirectory "/c". Similarly, your GOSFS filesystem can be mounted at another point (say "/d") and accessed.

GeekOS also has a virtual block device into which any block-structured storage device (e.g., IDE, floppy) can be registered. The block device acts as a wrapper to these registered storage devices, allowing them to be accessed in a uniform manner. The users in this case are other parts of the kernel (filesystems, paging, etc.).

The figure below illustrates the context. User programs invoke system calls, which invoke functions in vfs.c, which invoke functions in pfat.c or gosfs.c, which invoke functions in blockdev.c, which in turn invoke functions in ide.c. You have to implement the functions in in gosfs.c. The others are already implemented. You will need to understand how vfs.c and pfat.c work and how to use blockdev.c.

The pfat functions that are called by vfs have names of the form PFAT_<function> (see pfat.c). Similarly, the gosfs functions that are called by vfs have names of the form GOSFS_<function> (see gosfs.c). Note that vfs does not call these functions by name. Rather vfs gets pointers to these functions at run time (when a filesystem type is registered, when a filesystem is mounted, etc.).

There are also pfat and gosfs functions that call vfs functions, for example, to register a filesystem type (Register_Filesystem), to allocate a File struct (Allocate_File).


GOSFS provides a much richer interface than PFAT. The directories of a GOSFS filesystem form a tree. A directory can hold up to 36 entries (files/directories). A directory entry (file or directory) has a name of up to 64 characters (bytes), including the null at the end. A full path to a file is at most 1024 characters. Directories and files can be created and deleted. Existing files can opened for reading and writing. A raw disk can be formatted to hold a GOSFS filesystem. (In contrast, PFAT is read-only, has only one directory, its file names are at most 11 characters, and there is no format capability.) 

Internally, GOSFS uses a block size of 4KB (just like PFAT). Recall that an IDE disk in GeekOS has a block size of 512 bytes. Thus in an IDE disk with a GOSFS filesystem, each gosfs block is stored in 8 successive disk blocks. Gosfs block 0 (stored in disk blocks 0-7) is the so-called superblock; it holds info about this particular filesystem (disk size, free blocks, root directory's location, etc.). Gosfs blocks 1 and higher contain files and directories or are free.

You should keep track of free gosfs blocks using a bitvector. A library called bitset is provided (see bitset.h and bitset.c) that manages a bitvector and find bits that are 0 (i.e. corresponding to free gosfs blocks). Note that one bit in the bitvector corresponds to a gosfs (4KB) block (i.e., 8 disk blocks). So a bitvector of is 8192 bits (1024 bytes) can keep track of a disk of size 8192 * 4KB = 32 MB.

To read or write to a file on disk, the file has to be first opened (see Open or GOSFS_Open below). This creates a File struct containing a cache of (some blocks of) the file, the current read/write position in the file, etc. A library called bufcache is provided that does the cacheing. You can make use of it if you want.

3.1. GOSFS functions

Here is an informal look at some gosfs functions. See gosfs.c for a more complete list of functions. Except for Init_GOSFS, all the functions below are called by vfs. (As mentioned above, these gosfs functions may call vfs functions.)

3.2. User process file descriptors

Each user process will have a file descriptor table that keeps track of the files that the process currently has open. Every user process should be able to have up to 10 files open at once. The file descriptors for a user process are kept in the files[MAX_OPEN_FILES] array in struct User_Context. If a process has less than 10 files open then some of the entries in the table are free (field openFile.fsType equal to FS_TYPE_NONE). The file descriptor management is already implemented (see function Open() in vfs.c).

3.3. Filenodes and Directory structure

The internal structure of a GOSFS directory is defined in gosfs.h. Each directory in GOSFS takes up a single gosfs block. It is an array of 36 GOSFS filenodes (GOSFSfileNode), one for every possible entry (file or subdirectory). (The limit of 36 ensures that the array fits in a single 4KB block.) A filenode has the following fields: name of the entry; size of the entry's data; whether the entry is active; whether the entry is a directory; and an array of pointers to gosfs blocks containing the entry's data. (There are also some fields (isSetUid, acls) not relevant for project 5.)

The array of pointers is called blocks. It is used as follows.

The superblock and root directory have no associated GOSFSfileNode. Every other directory and every file has an associated GOSFSfileNode.

3.4. Disk Layout

A guideline is provided above. Gosfs block 0 is called SUPERBLOCK, and contains filesystem housekeeping data. Gosfs blocks ≥ 1 contain files and directories or are empty. When you do a Format(), you make a raw disk usable with GOSFS. That is:
  1. Get the drive's Size; convert it to number of gosfs blocks. IDE_getNumBlocks() in ide.c tells you the number of disk blocks.
  2. Figure out Free Blocks Bitmap size; mark them all free.
  3. Create a gosfs block containing a valid but empty directory. That will be the root directory. Make Root Dir Pointer point to it.
  4. Mark the superblock and the root directory block as used in the Free Blocks Bitmap.
  5. If everything went OK, write the Magic. The disk is now ready to be mounted and used.
Keep in mind that the superblock and root directory have no associated GOSFSfileNode.

4. VFS System Calls

Here are the system calls concerning the virtual filesystem (VFS). As you see, the semantics of these calls is very similar to that in UNIX. Keep in mind that each call vectors through the VFS layer before invoking a corresponding GOSFS_ or PFAT_ function. You have to implement the GOSFS_ functions. The PFAT_ functions are already implemented; look in pfat.c for a complete implementation of a filesystem.

int Mount(char *dev, char *prefix, char *fstype) :
  • System call: SYS_MOUNT
  • Return 0 on success.
  • Return a negative value on failure:
    • a filesystem is already mounted under prefix.
    • a parameter has an illegal value.
    • filesystem settings (on device) has invalid magic or version fields or block size is not a multiple of 512 bytes. (No need to check for other things, e.g., the number and start location, the total number of blocks, can be arbitrary.)

int Open(char *path, int permissions) :
  • System call: SYS_OPEN
  • Return new file descriptor number on success.
  • Return a negative value on failure:
    • path excluding final name does not exist.
    • path does not exist, path excluding final name exists, permissions does not include O_CREATE.
    • O_WRITE and O_CREATE not allowed for directories; use CreateDirectory instead.
  • Comments
    • There's no create syscall. Setting O_CREATE creates the file if it does not exist. If it already exists, the call succeeds (return ≥ 0) but its data content is not affected.
    • Do NOT create directories recursively if needed, e.g. Open("/d/d1/d2/d3/xFile", O_CREATE) will NOT create d1 inside of d, d2 inside of d1, etc. If the leading path /d/d1/d2/d3 does not exist, the syscall fails.
    • The permissions values are flags and may be or'ed together in a call. For example:

int Open_Directory(char *path) :
  • System call: SYS_OPEN_DIRECTORY
  • Return new file descriptor number on success.
  • Return a negative value on failure:
    • path does not exist
    • does not end in a directory.

int Close(int fd) :
  • System call: SYS_CLOSE
  • Return 0 on success.
  • Return a negative value on failure:
    • fd is not within 0-9.
    • fd is not an open file.

int Delete(char *path) :
  • System call: SYS_DELETE
  • Return 0 on success.
  • Return a negative value on failure:
    • path does not exist.
    • path ends in non-empty directory.
  • Comments
    • If Delete is called and the file is still open in other threads or even in the thread that called Delete, all subsequent operations on that file except Close() should fail.

int Read(int fd, char *buffer, int length) :
  • System call: SYS_READ
  • If success and fd is an open file:
    • length is the number of bytes to read.
    • return number of bytes read. It can be less than length (e.g., reading close to end of file).
    • increase filePos appropriately.
  • If success and fd is an open directory:
      length is the number of dirEntries to read.
    • return number of dirEntries read. It can be less than length.
    • increase filePos appropriately.
    • data put into buffer should be formatted as an array of dirEntry structs. (dirEntry is defined in fileio.h.)
  • Return a negative value on failure:
    • fd not within 0-9.
    • fd is not an open file or directory.
    • fd was not open with O_READ flag.

int Read_Entry(int fd, struct VFS_Dir_Entry *dirEntry) :
  • System call: SYS_READ_ENTRY
  • Return 1 on success.
  • Return a negative value on failure:
    • fd is not a directory
    • file pointer is at end of directory.

int Write(int fd, char *buffer, int length) :
  • System call: SYS_WRITE
  • length is number of bytes to write.
  • Return number of bytes written and increase filePos on success. "Grow on write": allocate blocks "on the fly" if past end of file.
  • Return a negative value on failure:
    • fd not within 0-9.
    • fd is not an open file.
    • fd was not open with O_WRITE flag.
    • fd is a directory.

int Stat(char *path, VFS_File_Stat *stat) :
  • System call: SYS_STAT
  • Return 0 on success.
  • Return a negative value on failure:
    • file is not found or is not readable.

int FStat(int fd, VFS_File_Stat *stat) :
  • System call: SYS_FSTAT
  • Return 0 on success.
  • Return a negative value on failure:
    • fd not within 0-9 or not an open file.

int Seek(int fd, int offset) :
  • System call: SYS_SEEK
  • offset is an absolute position; can be equal to fileSize.
  • Return 0 on success.
  • Return a negative value on failure:
    • fd not within 0-9 or not an open file or offset > fileSize.

int CreateDirectory(char *path) :
  • System call: SYS_CREATEDIR
  • Return 0 on success.
    • Create directories recursively if needed, e.g. CreateDirectory("/d/d1/d2/d3/d4") creates d1 inside of d, d2 inside of d1, etc., if they don't exist already. This operation should be atomic: either the whole directory chain is created or no directory is created.
  • Return a negative value on failure:
    • path already exists (ending in file or directory).
    • regular file encountered on the path to the final name.

int Format(int device, char *filesystem_type) :
  • System call: SYS_FORMAT
  • Return 0 on success.
    • Need only handle GOSFS (i.e., filesystem_type "gosfs").
  • Return a negative value on failure:
    • illegal value for device (it must work with IDE 1, higher is optional)
    • device is in use, i.e. mounted.
  • Comments: Don't need to format in init code; so you can save your data between sessions.

5. Notes

You do not need to consider situations where two processes have the same file open. You do not need to consider situations where one process opens the same file twice without closing it in between.

Too allow you to cache information, the VFS layer includes a Sync function. When the Sync function is called, all changed state needs to be saved to disk (i.e. the machine can be rebooted after it). You may choose to make all operations synchronous, in that case sync will be a no-op.

If a read() is called on a directory, the data returned should be in the form of an array of dirEntry structures. The length argument and the return value will indicate the number of entries to read and the number of entries that were read, rather than the number of bytes.

Make sure your Mount() works well, so that we can test your project. If we cannot Mount() a GOSFS filesystem, we cannot grade your project.

You might also want to mount "/d" automatically in Main() to speed up your testing, but the code you submit should not mount "/d" automatically. However, "/c" should be mounted automatically in Main(), as usual.

You should support disk sizes of at least 32 MB. More than 32 MB is optional. Following the procedure described in the "How to create an arbitrary size big diskd.img" section above, in your submitted project, when someone types gmake, a 32 MB file should be created.

You should support file sizes of at least 5 MB (double-indirect threshold crossed, yes). More than 5 MB is optional.

6. Testing

As you saw at the top, in src/user there are some programs that can be used to test your file management syscalls: cp,c, ls.c, mkdir.c, mount.c, p5test.c.