712-F14 Project 1: Oct 1

Click here for updates

1. Introduction

In this project, you implement a fifo channel service with N addresses, integers 1..N, spread across one or more machines. You also implement a testing harness (i.e., the inverse service).

You can do this project individually or in a group of at most 3. Use any procedural language, e.g., C, C++, Java, Python, Ruby. A high-level language (e.g., Python) should make it easier.

The figure below shows the systems and their interactions. Each box denotes a system, i.e., an executing instance of a program (whose name is in the box). You have to supply three programs: "Chan", "User" and "Gsc" (global snapshot collector). You're welcome to change the names.

2. Channel implementation

The channel implementation consists of a Chan system at each address. Each Chan system provides three functions to its user:

The Chan systems use TCP sockets to talk to each other. Each TCP connection provides a data path between two Chan systems. So treating the addresses 1..N as the nodes of a graph, each TCP connection would be a link in this graph (directed if the connection provides only one-way data transfer). Naturally, there must be a path between every two addresses. The configuration of the graph is up to you: fully connected, spanning tree, loop, directed, undirected, mixed, etc. If the path between two addresses has more than one (TCP) link, the intermediate Chan has to do routing.

Here is a very rough outline of Chan for a fully-connected graph. Below, addr_j is the "tcp address" (ip addr, tcp port) of Chan j.

Framing: Because tcp is byte-oriented, Chan needs to do framing (of the byte stream into msgs). Assume a max msg size (e.g., 4GB). Start each message with a 32-bit length (in octets)

Random delays: Introduce random delays in the transit time of a message, e.g., at message reception, at the endpoint and/or at an intermediate point.

Zmq and such: Chan must use TCP sockets, not a higher-level msg-passing service (e.g., zmq, rpc). It's fine to use the latter for User-Gsc communication.

3. Testing harness

The testing harness is essentially the fifo channel service inverse. It consists of a User system at each address and one Gsc system.

Each User system calls its Chan system's tx and rx functions to send and receive "random" messages. At each each msg send and msg receive, it atomically informs the Gsc system of the event. More precisely, it conveys "[tx,j,k,msg]" to Gsc before calling Chan's tx(k,msg), and it conveys "[rx,j,k,msg]" to Gsc after its Chan's rx() call returns with message msg from k (how does it get k?).

The Gsc system maintains the "current" global state of the inverse service, i.e., the transmit and receive histories for every address pair. Whenever it learns of a message reception, bit checks whether the global state still satisfies the desired invariants.

4. Interaction between systems

The interaction between a User system and its Chan system is via standard function calls (rather than syscalls or IPC). Thus the two systems are linked in the same address space (form one OS process with multiple threads). We refer to this as a "User-Chan" system. (In later projects, the User part will change.)

The interaction between a User system and the Gsc system necessarily involves remote communication, eg, a remote procedure call or TCP. (The Gsc system can be part of one User system, but that seems pointless.)

5. Initialization

Have a script, say Init (you're welcome to change the name), that is executed (by a human user) once at each machine of the channel. It starts up an instance of User-Chan at each address on that machine. It also starts an instance of Gsc on one machine, e.g., the one with address 1.

Init should take as input an "address-tcp" text file that gives the mapping between channel addresses and TCP ports. For example:

or The address-tcp file can also be an input to the User-Chan program.

6. Updates

[10/7b]
Init should just start all User systems on the local machine.

[10/7a]
Ensure that the assertion checking in Gsc is atomic with Chan.tx and Chan.rx. Otherwise, Gsc can incorrectly report an error. For example, User_j should call Chan_j.tx(k,msg) only after Gsc has updated txh_j,k, to preclude Gsc from learning of User_k's rx of msg before learning of User_j's tx of msg. So if your Gsc-User communication is over TCP, it's not sufficient for TCP to have completed its transfer.