CMSC 412 Project #5

File System

Parts Due: November 25 (Wednesday), December 3, and December 10, 2009 (11:59 PM)

GFS2 - GeekOS FileSystem #2

The purpose of this project is to add a writable filesystem, GFS2, to GeekOS. Unlike the existing PFAT, GFS2 includes directories, inodes, direntries, etc. The new filesystem will reside on the second IDE disk drive in the emulator. The PFAT drive will continue to hold user programs while you format and test your implementation of GFS2. The second IDE disk's image is called diskd.img. It is 10MB by default.

VFS and file operations

Since GeekOS will have two types of filesystems (PFAT and GFS2), it will have a virtual filesystem layer (VFS) to direct requests to an appropriate filesystem (see figure below). We have provided an implementation of the VFS layer in the file vfs.c. The VFS layer will call the appropriate GFS2 routines when a file operation refers to a file in GFS2.

The implementation of PFAT is in pfat.c. You will implement GFS2 in gfs2.c and relevant system calls in syscall.c.

VFS picks the functions to call based on supplied structures containing function pointers, one for each operation. For example, see Init_PFAT in pfat.c: this initializes the PFAT filesystem with VFS by passing it a pointer to s_pfatFilesystemOps, a structure that contains function pointers to PFAT routines for mounting (and formatting) a filesystem. Other PFAT functions are stored in different structures (e.g., look at the PFAT_Open routine, which passes the s_pfatFileOps structure to VFS). You will analogously use s_gfs2FilesystemOps, s_gfs2MountPointOps, s_gfs2DirOps, and s_gfs2FileOps in gfs2.c. You should also add a call to Init_GFS2 provided gfs2.c in main.c to register the GFS2 filesystem. In general, use the PFAT implementation as your guide to interfacing GFS2 with VFS.

Open files are tracked by the kernel using a struct File. Each user space process will have an associated file descriptor table that records which files the process can currently read and write. A user process can have up to USER_MAX_FILES files open at once. The file descriptor table is implemented as a struct File *fileList[USER_MAX_FILES] array that you must add to struct User_Context. Note that not all the entries in the fileList are necessarily open files, since usually a process has less than USER_MAX_FILES files open at once. If fileList[i] is NULL, it represents a free slot (file descriptor is not used). This descriptor will be filled out by the code in VFS; e.g., see Open in vfs.h, whose pFile argument is a pointer to a free slot in the table.

GFS2 Data Types

Primitive types

Both primitive types are unsigned integers; the separate data type is meant to ease type checking.
gfs2_blocknum
The number of the block (not sector) on disk.
gfs2_inodenum
The number of the inode. (not the block containing the inode.)

Inode

You know the inode. The size of the inode is constrained so that an even number of inodes fit into each block (inodes won't span blocks).

Type can be either:

GFS2_DIRECTORY
It's a directory. It won't be read or written directly by applications.
GFS2_FILE
It's a file.

Dirent

Figure caveat: the lengths aren't pointers; the alignment padding isn't explicit.

Directories are files consisting of appended records of type gfs2_dirent. Note that you will be overflowing the array to write names of any length; sizeof(struct gfs2_dirent) will very likely confuse. To calculate the size for a new dirent, round 4 + 2 + name_length to the nearest multiple of four. A helper function would be a good idea here.

Some code for seeking through the list in a directory is in lecture notes: to find an entry, first look for entries that match the length.

You do have to support deletion; I believe there is an obvious way to remove from the middle of this list without rewriting the directory, so I leave it to you to determine.

Superblock

The superblock contains the file system parameters. It must be stored at offset 1024 (512-byte block #3, or #2 depending on how you count). You can expect it to be there on mount. If it's not there, the file system is broken and would have to be repaired before being mounted.

A replica of the superblock will, when you format, be added at number-of-blocks / 2. That's it. On format, just one. The block index of the replica superblock is in the filesystem-numbered block size, not the 512-byte block space (like block #3). 512-block #3 is assumed. All the other replicas will be at block "0", which means not replicated.

The Empty Directory

When creating a directory, it is not entirely empty. Each directory has both '.' and '..' entries. The root directory will have contents:

01 00 00 00 04 01 2e 00
01 00 00 00 04 02 2e 2e

If you're wondering why inode 1 is represented using 01 00 00 00, smack yourself. LSB.

I encourage you to not write these 16 bytes, but instead to write a dirent constructor function to use to build the directory.

Free Blocks Bitmap

Track free disk blocks using a bit vector. A library called bitset is provided (see bitset.h and bitset.c) that manages a set of bits and provides functions to find bits that are 0, which correspond to free disk blocks (note that this is the opposite of the textbook, which uses a 1 to indicate a free block).

The free disk blocks map is stored within the file system using inode 2.

On format, create this file, filled with ones for every block allocated to inodes, for the block consumed by the root directory, for the block containing the free blocks bitmap, and for the block containing the replica superblock(s).

Mounting

The mount system call allows you to associate a filesystem with a place in the file name hierarchy. The Mount call is implemented as part of the VFS code we supply (see Mount function in vfs.c); you will implement your Init_GFS2 function so that VFS's mount code will call your function GFS2_Mount() in gfs2.c. Among other things, it must ensure that the filesystem being mounted is GFS2.

Buffer Cache

The buffer cache allows you to keep recently-accessed blocks in memory for a while (for example, the inode for a directory), and hold in-progress modifications to blocks until all the writes to that block are done and can be committed back.

Each buffer in the cache has a block number (the backing store block number), the buffer in memory, and two flags:

in use
Don't touch it, it's in use. Some thread is reading from it or writing to it. Or it's going to disk. The flag is set when you Get_FS_Buffer(), and cleared when you Release_FS_Buffer(). (Releasing doesn't remove the buffer from cache, it just marks it not in use.) If the flag has been set when Get_FS_Buffer() is called, you'll block, so pretend it is a lock acquisition.
dirty
It has been modified. Mark a buffer dirty using Modify_FS_Buffer(). The buffer cache will take care of eventually writing the block back to disk, in whatever order it chooses.

Call Create_FS_Buffer_Cache() to make a new buffer cache for the device; it is given a block size at creation time.

Call Sync_FS_Buffer() to push back specific blocks, and Sync_FS_Buffer_Cache() to push them all back. Ideally, one would use Sync_FS_Buffer for flushing the blocks of a file on close or of filesystem metadata eagerly. One would use Sync_FS_Buffer_Cache to sync the whole filesystem. (i.e., you've run the "sync" program before, right?)

There's a Destroy_FS_Buffer_Cache() function too. I can't see a purpose to it in this assignment.

Part 1: Format

You will implement the code for format outside geekos, using src/tools/gfs2f.c. The gfs2f tool will function much like the buildFat.c tool that packages your user code and pagefile into diskc.img.

Your gfs2f must take the following options:

output file name
Obvious. Create if it doesn't exist. Overwrite if it does.
block size
Your format tool must support block sizes of 512, 1024, and 4096 bytes
number of blocks
Your format tool must be able to construct filesystems of at least 100 MB

Your gfs2f may accept the following additional options:

names of files to include
You may appreciate having some files to read in part 2.

Inode balance: The number of inodes should be computed so that there are no more than 10 blocks per inode. Take the number of blocks, divide by ten, round up to determine the minimum number of inodes. Then determine how many blocks will be used by that minimum number of inodes by dividing the number of inodes by the number of inodes that will fit in a block, and round up. Finally, the number of inodes in the system will be the number that fill that many blocks.) (Sorry, this has to be somewhat clearly specified so that we can predict how much free space should be on a disk, or conversely how many files it should support.)

Key steps:

Hints:

Part 2: Read Functions

From the list of system calls below, implement all functionality for reading. Open (without the create option), Open_Directory, Close, Read, Readentry, Stat, FStat, Seek. "All functionality" includes the implementation of these functions for GFS2 in src/geekos/gfs2.c. (The syscalls are nearly trivial, since most of the work is done for you in vfs.c.) Modify Makefile.linux (if not done for you) so that the second ide disk is tied to an image you formatted in part 1. (Add "-hdb gfs-512x64.img" to the qemu line, for example.)

Part 3: Write Functions

Implement the rest: the create option to open, delete, write, create_directory, sync. Deletion should release the blocks to be reused by another file.

New System Calls

You will implement new system calls as described below. As you can see, the semantics are very similar to the UNIX file system. Some notes:

Call User Function Return on success Return on failure Reasons for failure Comment
SYS_MOUNT Mount(char *dev, char *prefix, char *fstype) 0 -1
  • a filesystem already mounted under name
  • illegal value for one of the parameters
  • Your GFS2 Mount function should not "validate" the filesystem settings except for magic and version fields, and that block size is support-able (a multiple of 512, or 512/1024/4096 at least). Other items, e.g., the number and start location of inodes and the total number of blocks, can be arbitrary.
    SYS_OPEN Open(char *path, int mode) new file descriptor number -1
  • name does not exist (if permissions don't include O_CREATE )
  • path to name does not exist (if permissions include O_CREATE )
  • attempt to open a directory, (should use OpenDirectory)
  • there's no create syscall, so setting O_CREATE will create the file. If the file exists, the call succeeds (return >= 0) but its data contents is not affected.
  • Should NOT create directories; e.g. if you do Open("/d/d1/d2/d3/xFile", O_CREATE) and the leading path /d/d1/d2/d3 does not exist, the syscall fails, returning -1
  • You should use the Allocate_File in vfs.c to get a File when opening.
  • The permissions values are flags and may be or'ed together in a call. For example:
    • O_CREATE | O_READ
    • O_READ | O_WRITE
    • O_CREATE | O_READ | O_WRITE
  • SYS_OPENDIRECTORY Open_Directory(char *path) new file descriptor number -1
  • path does not exist
  • attempt to open a file, (should use Open)
  •  
  • Should NOT create directories, use CreateDirectory instead.
  • You should use the Allocate_File in vfs.c to get a File when opening.
  • SYS_CLOSE Close(int fd) 0 -1
  • fd not within range 0-(USER_MAX_FILES-1)
  • fd is not an open file
  •  
    SYS_DELETE Delete(char *path) 0 -1
  • name does not exist
  • name is a non-empty directory
  • if Delete(file) is called and file is still open in other threads or even in the thread that called Delete(), all the subsequent operations on that file (except Close()) should fail
    SYS_READ Read(int fd, char *buffer, int length) number of bytes read -1
  • fd not in range
  • fd is not an open file
  • fd was not open with O_READ flag
  • fd is a directory
  • it's OK if return value < length, for instance reading close to end of file
  • increase the filePos (the current position in the file) by the number of bytes read (if successful)
  • SYS_READENTRY Read_Entry(int fd, VFS_Dir_Entry *entry)
  • 0 if entry is valid
  • >0 if end of directory reached
  • -1
  • fd not in range
  • fd is not an open directory
  • copy over directory entry info from a GOSF_Dir_Entry into a VFS_Dir_Entry
  • skip over unused entries
  • increase the filePos, if successful
  • VFS_Dir_Entry is defined in fileio.h
  • SYS_WRITE Write(int fd, char *buffer, int length) number of bytes written -1
  • fd not in range
  • fd is not an open file
  • fd was not open with O_WRITE flag
  • fd is a directory
  • increases filePos by the number of bytes written, if successful
  • "Grow on write"- allocate blocks "on the fly" if past end of file
  • "Grow minimum" - allocate only the needed blocks. A seek to a position past the end of the file should be allowed and a write will allocate only a single block rather than all blocks in between.
  • SYS_STAT Stat(char *path, VFS_File_Stat *stat) 0 -1
  • path is not valid
  • VFS_File_Stat is defined in fileio.h
  • SYS_FSTAT FStat(int fd, VFS_File_Stat *stat) 0 -1
  • fd not in range
  • fd is not an open file
  • VFS_File_Stat is defined in fileio.h
  • SYS_SEEK Seek(int fd, int offset) 0 -1
  • fd not in range
  • fd is not an open file
  • offset is an absolute position; for files: could be equal to fileSize or greater, then Write appends, see above. The behavior of seek() on gfs2 directories shall be undefined; the behavior of readdir after seek similarly undefined.
    SYS_CREATEDIR Create_Directory(char *path) 0 -1
  • name already exists, as file or directory
  • regular file encountered on the path to name
  • Should NOT create directories recursively; e.g. CreateDirectory("/d/d1/d2/d3/d4"), will fail if /d/d1/d2/d3 does not exist already.
    SYS_SYNC Sync(void) 0 -1 None Forces any outstanding operations to be written to disk.

    Testing and Requirements

    Requirements

    Here are some must do's for the project:

    You do not need to consider situations where two processes have the same file open. You do not need to consider situations where one process opens the same file twice without closing it in between.

    Testing

    Finally, in src/user there are some programs that can be used to test your file management syscalls: cp.c, ls.c, mkdir.c, mount.c, p5test.c, touch.c, type.c.

    Study questions

    Consider the difference between this mini inode and a more realistic inode; what do others maintain, and why does gfs2 not include it?

    Consider the difference between the original unix file system and FFS; in your implementation, how often do you jump between reading inodes and reading data blocks?

    Consider readent (SYS_READENTRY); does it make sense to have this be a system call? "strace ls" on a linux machine. What system call does it use instead? Check the man page and explain why that's a better design.

    Using your experience from this project, what does it mean for a filesystem to have been cleanly unmounted? How does the kernel know if a filesystem is clean when it is being mounted? What would have to be added to this project to allow that decision? How would one check and repair GFS2?