This class is about writing correct distributed programs. A distributed program is a multi-threaded program (aka concurrent, parallel, interactive program), typically with component programs spread over different computers. To write correct a distributed program, say A, three things are needed. First, a definition of the intended external behavior of A; we refer to this as the service of A. Second, a way to establish A implements the service; i.e., that every possible evolution of A, regardless of variations in thread speeds (i.e., race conditions), satisfies the service. Third, a way to use the service of A (instead of A itself) when writing other programs. We will cover a unified method to do these things, and apply the method to problems in distributed computing and systems (e.g., locks, termination detection, global snapshot computation, TCP sockets, wide-area routing, sequentially-consistent distributed shared memory, serializable transaction processing).
There will be programming and non-programming assignments.
Time permitting, we will also look at analyzing security (authentication) properties of distributed programs.
The programming that we will do in any particular topic will correspond to the service-based framework outlined above. We first describe the desired service informally (and, hopefully, unambiguously). Next comes a service program that formally defines the service, and one or more distributed programs that implement the service. Then we obtain model and tester programs from the service. The model can be used instead of an implementation for testing applications that use the service. The tester can test any implementation of the service; it can also check global assertions of the implementation internals.
Examples in Python are here for details. We will be adding to it over the course.