Parts Due: November 25 (Wednesday), December 3, and December 10, 2009 (11:59 PM)
The purpose of this project is to add a writable filesystem, GFS2, to GeekOS. Unlike the existing PFAT, GFS2 includes directories, inodes, direntries, etc. The new filesystem will reside on the second IDE disk drive in the emulator. The PFAT drive will continue to hold user programs while you format and test your implementation of GFS2. The second IDE disk's image is called diskd.img. It is 10MB by default.
Since GeekOS will have two types of filesystems (PFAT and GFS2), it will have a virtual filesystem layer (VFS) to direct requests to an appropriate filesystem (see figure below). We have provided an implementation of the VFS layer in the file vfs.c. The VFS layer will call the appropriate GFS2 routines when a file operation refers to a file in GFS2.

The implementation of PFAT is in pfat.c. You will implement GFS2 in gfs2.c and relevant system calls in syscall.c.
VFS picks the functions
to call based on supplied structures containing function pointers, one
for each operation. For example, see Init_PFAT in pfat.c: this initializes the
PFAT filesystem with VFS by passing it a pointer to s_pfatFilesystemOps, a
structure that contains function pointers to PFAT routines for mounting
(and formatting) a filesystem. Other PFAT functions are stored in
different structures (e.g., look at the PFAT_Open routine, which passes
the s_pfatFileOps
structure to VFS). You will analogously use s_gfs2FilesystemOps,
s_gfs2MountPointOps, s_gfs2DirOps, and s_gfs2FileOps
in gfs2.c.
You should also add a call to Init_GFS2 provided gfs2.c
in main.c to register the GFS2 filesystem. In general,
use the PFAT implementation as your guide to interfacing GFS2 with VFS.
Open files are tracked by the kernel using a struct File. Each user
space process will have an associated file descriptor table that records which files the
process can currently read and write. A user process can have up to
USER_MAX_FILES files open at once. The file descriptor table
is implemented as a struct File *fileList[USER_MAX_FILES]
array that you must add to struct User_Context. Note that
not all the entries in the fileList are necessarily open
files, since usually a process has less than USER_MAX_FILES
files open at once. If fileList[i] is NULL, it represents a free
slot (file descriptor is not used). This descriptor will be filled
out by the code in VFS; e.g., see Open in vfs.h,
whose pFile argument is
a pointer to a free slot in the table.
You know the inode. The size of the inode is constrained so that an even number of inodes fit into each block (inodes won't span blocks).
Type can be either:
Figure caveat: the lengths aren't pointers; the alignment padding isn't explicit.
Directories are files consisting of appended records of type gfs2_dirent. Note that you will be overflowing the array to write names of any length; sizeof(struct gfs2_dirent) will very likely confuse. To calculate the size for a new dirent, round 4 + 2 + name_length to the nearest multiple of four. A helper function would be a good idea here.
Some code for seeking through the list in a directory is in lecture notes: to find an entry, first look for entries that match the length.
You do have to support deletion; I believe there is an obvious way to remove from the middle of this list without rewriting the directory, so I leave it to you to determine.
The superblock contains the file system parameters. It must be stored at offset 1024 (512-byte block #3, or #2 depending on how you count). You can expect it to be there on mount. If it's not there, the file system is broken and would have to be repaired before being mounted.
A replica of the superblock will, when you format, be added at number-of-blocks / 2. That's it. On format, just one. The block index of the replica superblock is in the filesystem-numbered block size, not the 512-byte block space (like block #3). 512-block #3 is assumed. All the other replicas will be at block "0", which means not replicated.
When creating a directory, it is not entirely empty. Each directory has both '.' and '..' entries. The root directory will have contents:
| 01 00 00 00 04 01 2e 00 |
| 01 00 00 00 04 02 2e 2e |
If you're wondering why inode 1 is represented using 01 00 00 00, smack yourself. LSB.
I encourage you to not write these 16 bytes, but instead to write a dirent constructor function to use to build the directory.
Track free disk blocks using a bit vector. A library called bitset is provided (see bitset.h and bitset.c) that manages a set of bits and provides functions to find bits that are 0, which correspond to free disk blocks (note that this is the opposite of the textbook, which uses a 1 to indicate a free block).
The free disk blocks map is stored within the file system using inode 2.
On format, create this file, filled with ones for every block allocated to inodes, for the block consumed by the root directory, for the block containing the free blocks bitmap, and for the block containing the replica superblock(s).The mount system call allows you to associate a filesystem with a place in the file name hierarchy. The Mount call is implemented as part of the VFS code we supply (see Mount function in vfs.c); you will implement your Init_GFS2 function so that VFS's mount code will call your function GFS2_Mount() in gfs2.c. Among other things, it must ensure that the filesystem being mounted is GFS2.
The buffer cache allows you to keep recently-accessed blocks in memory for a while (for example, the inode for a directory), and hold in-progress modifications to blocks until all the writes to that block are done and can be committed back.
Each buffer in the cache has a block number (the backing store block number), the buffer in memory, and two flags:
Call Create_FS_Buffer_Cache() to make a new buffer cache for the device; it is given a block size at creation time.
Call Sync_FS_Buffer() to push back specific blocks, and Sync_FS_Buffer_Cache() to push them all back. Ideally, one would use Sync_FS_Buffer for flushing the blocks of a file on close or of filesystem metadata eagerly. One would use Sync_FS_Buffer_Cache to sync the whole filesystem. (i.e., you've run the "sync" program before, right?)
There's a Destroy_FS_Buffer_Cache() function too. I can't see a purpose to it in this assignment.
You will implement the code for format outside geekos, using src/tools/gfs2f.c. The gfs2f tool will function much like the buildFat.c tool that packages your user code and pagefile into diskc.img.
Your gfs2f must take the following options:
Your gfs2f may accept the following additional options:
Inode balance: The number of inodes should be computed so that there are no more than 10 blocks per inode. Take the number of blocks, divide by ten, round up to determine the minimum number of inodes. Then determine how many blocks will be used by that minimum number of inodes by dividing the number of inodes by the number of inodes that will fit in a block, and round up. Finally, the number of inodes in the system will be the number that fill that many blocks.) (Sorry, this has to be somewhat clearly specified so that we can predict how much free space should be on a disk, or conversely how many files it should support.)
Key steps:
Hints:
From the list of system calls below, implement all functionality for reading. Open (without the create option), Open_Directory, Close, Read, Readentry, Stat, FStat, Seek. "All functionality" includes the implementation of these functions for GFS2 in src/geekos/gfs2.c. (The syscalls are nearly trivial, since most of the work is done for you in vfs.c.) Modify Makefile.linux (if not done for you) so that the second ide disk is tied to an image you formatted in part 1. (Add "-hdb gfs-512x64.img" to the qemu line, for example.)
Implement the rest: the create option to open, delete, write, create_directory, sync. Deletion should release the blocks to be reused by another file.
You will implement new system calls as described below. As you can see, the semantics are very similar to the UNIX file system. Some notes:
| Call | User Function | Return on success | Return on failure | Reasons for failure | Comment |
| SYS_MOUNT | Mount(char *dev, char *prefix, char *fstype) | 0 | -1 | |
Your GFS2 Mount function should not "validate" the filesystem settings except for magic and version fields, and that block size is support-able (a multiple of 512, or 512/1024/4096 at least). Other items, e.g., the number and start location of inodes and the total number of blocks, can be arbitrary. |
| SYS_OPEN | Open(char *path, int mode) | new file descriptor number | -1 | |
|
| SYS_OPENDIRECTORY | Open_Directory(char *path) | new file descriptor number | -1 | |
|
| SYS_CLOSE | Close(int fd) | 0 | -1 | |
|
| SYS_DELETE | Delete(char *path) | 0 | -1 | |
if Delete(file) is called and file is still open in other threads or even in the thread that called Delete(), all the subsequent operations on that file (except Close()) should fail |
| SYS_READ | Read(int fd, char *buffer, int length) | number of bytes read | -1 | |
|
| SYS_READENTRY | Read_Entry(int fd, VFS_Dir_Entry *entry) | -1 | |
||
| SYS_WRITE | Write(int fd, char *buffer, int length) | number of bytes written | -1 | |
|
| SYS_STAT | Stat(char *path, VFS_File_Stat *stat) | 0 | -1 | |
|
| SYS_FSTAT | FStat(int fd, VFS_File_Stat *stat) | 0 | -1 | |
|
| SYS_SEEK | Seek(int fd, int offset) | 0 | -1 | |
offset is an absolute position; for files: could be equal to fileSize or greater, then Write appends, see above. The behavior of seek() on gfs2 directories shall be undefined; the behavior of readdir after seek similarly undefined. |
| SYS_CREATEDIR | Create_Directory(char *path) | 0 | -1 | |
Should NOT create directories recursively; e.g. CreateDirectory("/d/d1/d2/d3/d4"), will fail if /d/d1/d2/d3 does not exist already. |
| SYS_SYNC | Sync(void) | 0 | -1 | None | Forces any outstanding operations to be written to disk. |
Here are some must do's for the project:
Consider the difference between this mini inode and a more realistic inode; what do others maintain, and why does gfs2 not include it?
Consider the difference between the original unix file system and FFS; in your implementation, how often do you jump between reading inodes and reading data blocks?
Consider readent (SYS_READENTRY); does it make sense to have this be a system call? "strace ls" on a linux machine. What system call does it use instead? Check the man page and explain why that's a better design.
Using your experience from this project, what does it mean for a filesystem to have been cleanly unmounted? How does the kernel know if a filesystem is clean when it is being mounted? What would have to be added to this project to allow that decision? How would one check and repair GFS2?