** begin lecture 2

man 2 socket   ;; the 2 is for system calls:         open, close, read, write
               ;; a 3 would be for library functions: printf, fgets, fopen

Create a socket:

  socket(AF_INET, SOCK_DGRAM, 0);
    AF_INET    -> "internet": IP addresses, specifically IPv4 (AF_INET6 would be IPv6)
    SOCK_DGRAM -> datagram -> UDP (messages that fit in a packet)
                  (as opposed to SOCK_STREAM -> TCP: an ordered, reliable stream of bytes)
    0          -> protocol: switches between implementations of that socket type;
                  for datagram sockets there's only one (UDP), so 0 picks it.

Fragmentation ;; to be covered a bit more later.
  An IP packet larger than the link's MTU (typically 1500 bytes) is split into
  fragments, to be reassembled at the destination.
  For long-distance paths, we use Path MTU discovery instead.
  MTU == Maximum Transmission Unit; for Ethernet, 1500 bytes.
  Path MTU discovery tells the sender the largest packet that won't experience fragmentation.
  Fragmentation does happen; typically for NFS (Network File System) traffic.
    NFS servers tend to be local, because they trust client IP addresses,
    and you want to send 4096 bytes at a time.
  The maximum size of an IP packet is 65,535 bytes
    (deduct 20 for the IP header and 8 for the UDP header: 65,507 bytes of UDP payload).
  Jumbo frames (gigabit Ethernet, ~9000-byte MTUs) are only loosely standardized.

  Small MTU -> multiplexing (other people get a chance to send);
               better ability to detect bit errors (2-bit errors, 3-bit errors).
  Large MTU -> efficiency: fewer headers wasting bandwidth,
               fewer times to look up where a packet goes.

Ports
  1. Ephemeral - allocated to clients. It doesn't matter what they are; they let
     the kernel tell which conversation a packet belongs to.
  2. Bound - allocated to servers, often well known (ours is 41710); they let
     clients contact a specific service.

A conversation is identified by the 5-tuple:
  source port, destination port, source IP address, destination IP address,
  protocol (TCP or UDP).
The implication is that you can send a message to one of our servers
(at destination port 41710) from a socket bound to a different port.

/etc/services lists all the IANA-allocated ports: ssh = 22, http = 80, ntp = 123.
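A minimal sketch of the socket call and the size arithmetic above, assuming IPv4 headers without options. The helper names (`open_udp_socket`, `max_udp_payload`, `unfragmented_udp_payload`) are made up for illustration; only `socket()` and `perror()` are real APIs.

```c
#include <stdio.h>       /* perror */
#include <sys/socket.h>  /* socket, AF_INET, SOCK_DGRAM */
#include <unistd.h>      /* close */

#define IP_MAX_PACKET   65535  /* the IPv4 total-length field is 16 bits */
#define IPV4_HEADER_LEN 20     /* assumes no IP options */
#define UDP_HEADER_LEN  8

/* Largest UDP payload in one IPv4 datagram: 65535 - 20 - 8. */
int max_udp_payload(void) {
    return IP_MAX_PACKET - IPV4_HEADER_LEN - UDP_HEADER_LEN;
}

/* Largest UDP payload that crosses a link with this MTU without fragmentation. */
int unfragmented_udp_payload(int mtu) {
    return mtu - IPV4_HEADER_LEN - UDP_HEADER_LEN;
}

/* AF_INET = IPv4, SOCK_DGRAM = datagrams (UDP), 0 = default protocol for the type. */
int open_udp_socket(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock == -1)
        perror("socket");
    return sock;
}
```

With a 1500-byte Ethernet MTU that leaves 1472 bytes of UDP payload per unfragmented datagram; anything up to 65,507 bytes still goes out in one sendto(), but then IP fragments it.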
assignment: send/broadcast a message, then wait for messages and print them.

man 2 bind   ;; all the include files you need are listed here.
  bind - will give us a port. We can ask for one, or let it come (the kernel picks).
  bind - the addresses and ports are in network byte order.

#include <netinet/in.h>

struct sockaddr_in {            /* internet endpoint address: IP address and port
                                   (simplified; real definitions also carry padding) */
    sa_family_t    sin_family;  /* AF_INET (PF_INET has the same value) */
    in_port_t      sin_port;    /* ...sin_port = htons(41710); */
    struct in_addr sin_addr;    /* ...sin_addr.s_addr = one of:
                                     "224.0.50.112"              WRONG: s_addr is a 32-bit integer, not a string
                                     inet_addr("224.0.50.112")
                                     htonl(0xc000.....)
                                     htonl(INADDR_ANY)           if you were a server
                                     htonl(INADDR_LOOPBACK)      (the constant is not INADDR_LOCALHOST)
                                     inet_addr("127.0.0.1")      same address as the line above
                                     htonl(0x7f000001)           same again */
};
** that was an address, and how to fill in its fields.

setsockopt - allow more than one process to bind the same port.
  --- for most servers, this would be bad.

  int one = 1;
  setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
  // the ability to attach more than one process (socket)
  // to the same address and port.
  // on BSD (and newer Linux) there is also an SO_REUSEPORT option.
  // call it before the bind.

setsockopt also lets us configure the multicast membership bits
  --- a subscription to a multicast address.
  As a sender, you're broadcasting to the group; only interested receivers
  (the ones that joined) will pick the packet off the wire.

** begin lecture 3

  struct ip_mreq mreq;
  memset(&mreq, 0, sizeof(struct ip_mreq));
  /* memset treats &mreq as a pointer to bytes, sets each byte to zero,
     for that many bytes. Zeroing matters: another "optional" field
     might get added to the struct some day. */
  // bzero(&mreq, sizeof(struct ip_mreq));   /* older equivalent */
  mreq.imr_multiaddr.s_addr = inet_addr("224.0.50.111");  // use the right address.
  mreq.imr_interface.s_addr = htonl(INADDR_ANY);
  // setsockopt will return -1 on error; 0 on success.
  if (setsockopt(receiving_socket, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                 &mreq, sizeof(struct ip_mreq)) == -1) {
      perror("setsockopt!");                                        /* either this... */
      fprintf(stderr, "%s: %s\n", "setsockopt!", strerror(errno));  /* ...or this: same message */
      exit(EXIT_FAILURE);
  }

// first parameter is the socket.
sendto(sending_socket, buffer_to_send, length_of_buffer, flags /* usually 0 */,
       (struct sockaddr *)&destination_address,   /* a struct sockaddr *          */
       sizeof(destination_address));               /* and its length, a socklen_t */

  struct sockaddr { sa_family_t sa_family; ... };   <--- the "superclass"; the family
                                                         field says which subclass it is:
    struct sockaddr_in  { family == AF_INET;  ... }   // 16 bytes on common systems
    struct sockaddr_in6 { family == AF_INET6; ... }   // 28 bytes (the IPv6 address alone is 16)

HOW to CONSTRUCT a BUFFER

  struct packet {
      struct header {
          // all those fields
      } hdr;
      char data[0];    // (C99 spells it data[]) the bytes right after the header.
                       // or, instead of zero, 0xffff - 28 - sizeof(struct header) (or so)
      // char *data;   // big mistake.
  };

  char *packet;   /* pointing at some buffer */
  ((struct header *)packet)->version = 1;

  [ header ][ data ]   in some region of memory you can point to.

  Don't call sizeof(struct packet); the size of the packet is in the length field.
    * you keep track of it, not the compiler.
  One call to sendto per packet (no fragmentation).
  sizeof(struct packet *) (any pointer) => 4 on a 32-bit machine, 8 on 64-bit: a pointer
    member holds an address, not your bytes -- why not to use the pointer scheme.

  struct packet *the_packet = malloc(0xffff);
  the_packet->data, if done the right way, == (char *)the_packet + sizeof(struct header);
    if done the other (bad; the pointer) way, it'd be zero or some uninitialized bytes.

  struct header *the_header = malloc(0xffff);
  memcpy(the_header + 1, "hello", 5);   /* the_header + 1 is the first byte after the header */
  printf("%p %p\n", (void *)the_header, (void *)(the_header + 1));
                                        /* the two differ by sizeof(struct header) */

recvfrom(receiving_socket, buffer, maximum_size_of_buffer, flags /* usually 0 */,
         (struct sockaddr *)&address, &length);
  => returns the number of bytes read.
  address: out => from which source: IP address and UDP port. IP is end-to-end, so this
           is the original source (as IP believes it).
  length is the in/out parameter: in => the size of your address buffer;
                                  out => how much of it was filled in.
  recvfrom does not filter on whatever you put in address; if you ever want datagrams
  from only one source, connect() the UDP socket. (Most likely, you want all sources.)
  For us, we *could* send a unicast response right back to that address.
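A sketch of the "[ header ][ data ]" layout using a C99 flexible array member (`char data[]`, the standard spelling of `data[0]`). The field layout copies the PA1 header quoted in lecture 7 of these notes; `build_packet` and `MAX_DATAGRAM` are hypothetical names, and the caller's buffer is assumed to be suitably aligned (memory from `malloc` is).

```c
#include <arpa/inet.h>   /* htons, htonl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header layout from the PA1 header in these notes: 20 bytes, no padding. */
struct header {
    uint8_t  version;
    uint8_t  ttl;
    uint16_t payload_length;      /* bytes following the header */
    uint32_t account_identifier;
    uint32_t source_address;
    uint32_t destination_address;
    uint16_t checksum;
    uint16_t protocol;
};

struct packet {
    struct header hdr;
    char data[];                  /* flexible array member: the bytes right after hdr */
};

#define MAX_DATAGRAM 65507        /* 65535 - 20 (IP header) - 8 (UDP header) */

/* Lay out [ header ][ payload ] in buf. Returns the byte count to hand to
   one sendto() call, or 0 if the packet would not fit. */
size_t build_packet(void *buf, size_t bufsize, uint32_t account,
                    const char *payload, size_t len) {
    size_t total = sizeof(struct header) + len;
    if (total > bufsize || total > MAX_DATAGRAM)
        return 0;
    struct packet *p = buf;
    memset(&p->hdr, 0, sizeof(p->hdr));
    p->hdr.version = 1;
    p->hdr.ttl = 1;
    p->hdr.payload_length = htons((uint16_t)len);  /* multi-byte fields in network byte order */
    p->hdr.account_identifier = htonl(account);
    p->hdr.protocol = htons(1);
    memcpy(p->data, payload, len);                 /* no terminating NUL is sent */
    return total;
}
```

The length you pass to sendto() is the return value, which you tracked yourself; sizeof(struct packet) never enters into it.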
(The received buffer itself won't contain the sender's address; only recvfrom's address argument tells us.)

Debugging:
  printf.
    inet_ntop converts binary IP addresses to strings.
    the __FILE__ and __LINE__ macros can print where you are.
  gdb.  break one.c:50
  strace - what system calls your code invokes, with what parameters.  << anyone can run it.
  ltrace - library calls; dunno if it's useful here... might not be installed.
  tcpdump/wireshark - what bytes are in what packets being sent out.  << you need to own the machine (root).

man 2 sendto   ;; takes a struct sockaddr_in (cast to struct sockaddr *) as the destination

Back to course content, no more programming assignment stuff.

=====

Reliability.
  Build a network from cheap components you can get at Fry's,
  from the overpriced components you can get at ...
    -- cheap oscillators (telling where a 1 and a 0 are).
    -- cheap wires (original Ethernet used CATV trunk wire... twisted pair used copper telephone wiring).
  Events outside our control:
    backhoes dig up wire.
    "Baltimore tunnel fire" -- reasonably large outage on the east coast.
    reboot routers.
    power failures.
  Laws of physics:
    noise from external interference, speed of light, attenuation, fading, multipath.

  Phy - encode the bits with enough redundancy that we recover them all.
  Data Link - retransmissions: if you believe it didn't get received, send it again.
  Transport - retransmissions; TCP uses "cumulative acknowledgement".

Phy - Problem: put bits on a wire.
  Subgoal 1: High bandwidth (throughput) (to have short bits).
  Subgoal 2: Low latency (delay): the first bits should get to the other side quickly.
  NRZ, Baseline Wander, Clock Recovery.

** begin lecture 4

If you're sending "hello":
  [ header, with all integer fields in network byte order ][ hello ]
  not [ lleh\0\0\0o ]
  The only things to convert are the 32-bit integers and 16-bit integers: htonl, htons.
  Small trick when calculating the checksum (but not for this one!).

Weird issues:
  ** some csic machines act lame.
  ** "nauseated" does not seem to be among them.  <- log in there.
  ** send me mail with which ones work and which don't.
  ** if you can explain it correctly.... I will be impressed.

CLASSROOM ETHERNET
  -- we will eventually talk about how to share it.
  -- for now, use it as a motivating example for why we care about clock recovery.
  -- clock recovery: have to find the middle of each 1 or 0, and be able to tell how
     many there are when there are several consecutive 1's or 0's.
     (stadium concert video example)

Encoding schemes:

-- NRZ encoding: non-return to zero
   direct: 1 is high, 0 is low.
   RS-232: -12V => 1, +12V => 0 (pretty resilient)
   internal PC bus (but architects get to cheat: they have a clock)
   downside: no help for the clock.
     clock recovery: repeated 1's or repeated 0's could get lost (miscounted).
     baseline wander: the signal spends too long at one level, and the receiver's
       average (its threshold between high and low) tracks up (or down) into the noise.
   what we need: transitions. 0's and 1's in equal parts, and frequently changing.
   (transitions help with both problems: clock recovery and baseline wander)

-- NRZI
   to encode a 1: change (from low to high, or high to low)
   to encode a 0: no change
     11111111111   (original signal)
     HLHLHLHLHLH   (H = high, L = low)
   solves half the problem (consecutive 1's);
   doesn't solve the other half (consecutive 0's).

-- Manchester Encoding
   to encode a 1: ( high/low )
   to encode a 0: ( low/high )
     1 1 1 1 1 1 1 1 1 1 1      (original signal)
     HLHLHLHLHLHLHLHLHLHLHL
     1 0 1 0 1 0 1 0 1 0 1      (original signal)
     HLLHHLLHHLLHHLLHHLLHHL
   Good: each bit gets a transition.
   Bad: send at half the rate (many transitions encode nothing).
   This is in all Ethernet before 100 Mbit.

-- 4B/5B
   Each 4-bit sequence in your message turns into 5 bits on the wire. Have a table.
   This table will ensure that there are never more than three consecutive zeroes.
   16 possible 4-bit sequences. 32 possible 5-bit sequences.
     not gonna use     00000
     not gonna use     00001
     not gonna use     00010
     not gonna use     00011
     gonna use         10001 (maybe... not in the data list)
     not gonna use     11000
     not gonna use     01000
     could totally use 11111
     could totally use 11001
   The catch is when you put two of them together.
   All valid codewords start with at most one zero, and end with at most two. (verified!)
   So across any codeword boundary there are at most 2 + 1 = 3 zeroes in a row.
   The rule eliminates fewer than 16 of the 32 codes (21 survive);
   the rest can be used for framing / control.
   There is a table in the book; you could double-check my rule, which I generated without notes.

-- 4B/5B + NRZI
   recall 4B/5B: have a table. This table will ensure that there are never more than
     three consecutive zeroes.
   recall NRZI: solves half the problem (consecutive 1's).

     0100  1000  0110  1001     (original signal 0x4869)
     01010 10010 01110 10011    (to 4B/5B; spaces not transmitted)
     01100 11100 01011 00010    (add NRZI: a 1 flips the level, starting from 0)

   The scheme in Fast (100 Mbit) Ethernet.
   Wikipedia pages are fairly good...
   (it's not important to me that you could match the encoding scheme to the IEEE standard.)

-- scrambling
   SONET (optical; serious metropolitan or long-distance networks)
     0100100001101001   (your bits)
     0110010111010010   (*random* signal everyone has agreed on)
     0010110110111011   (xor, and pray)
   advantage: no extra bits.
   disadvantage: could be unlucky. (ideally, you would not get unlucky more than once)
   Devices that implement this are expensive; they squeeze out as much performance as possible.
   Might mean that you don't need as many transitions most of the time?
     not fighting three consecutive bits; maybe fighting 20 consecutive bits.
   Based on not assuming random data.
     * if you believed all traffic to be encrypted (random-looking), you wouldn't need scrambling.
     * that doesn't happen.

FRAMING - mark the beginning and the end of a string of bits (a frame).
  Options:
    (i)   4B/5B codewords (a symbol outside the data vocabulary)
    (ii)  sentinel (like a double-quote in a programming language)
    (iii) fixed-size frames (using timing)

  (i) 4B/5B scheme: 10001 (obeys the zero-run rule, but is not in the data table);
      stick that at the beginning and end of a frame.

  (ii) HDLC, as used in some PPP:
      append 01111110 (0x7e) as the flag at each end of the frame.
      I believe there is both a bitwise version (counting six 1's after a zero)
      and a byte-wise version (seeking 0x7e as a frame delimiter).
      For HDLC, just like a \" in a string, you have to escape:
        add an extra zero after five 1's (regardless of whether a 0 or a 1
        follows the five 1's). [ check with the text ]
      If the flag pattern would appear in the message, the sender adds that zero
      to make sure it can't; the receiver removes the zero.
      What if we want to send 01111100 somewhere deep within the frame
      (well inside the frame markers)? Still have to stuff!
        [011111]{0}[00]    where [original] {stuffed bit}
      Just like in the quoted string... you have to be able to recognize the escape itself
      (as in "c:\\" where you want the escape character).
      Means you have some probability of lengthening the frame (not free).

  (iii) fixed-size frames: not all that interesting.
      An ATM cell is 48 bytes long (plus 5 bytes of header).
      If everything is of this fixed size, you don't need to waste any bytes
      delimiting the beginning and end of a frame.
      There may be some periodic clock-synchronizing signal.

ERROR DETECTION.

Version 1: Parity Bit
  count the 1's: if odd, the parity bit is 1; if even, 0. (add the 1's.)
  chance of detecting a single bit error? 100% (the error might be in the parity bit too)
  chance of detecting a two-bit error? 0%.
  fine if you have only 8 bits to send (a character over a serial connection).

Version 2: Checksum
  instead of adding the 1's, add the 16-bit words.
  The "internet checksum" is the ones' complement of the ones' complement sum of the 16-bit words.
  If you store the checksum as the negative (complemented) sum of the 16-bit words...
  to check, just *add* all the 16-bit words, checksum included, and hope you get
  ones' complement zero (0xffff, which complements to 0).

  An aside about handling the carry when adding 16-bit words for a checksum:
        0xffff
      + 0x0002
      --------
       0x10001   -> wrap the carry around: 0x0001 + 0x0001 = 0x0002
  (end-around carry; 0xffff is ones' complement "negative zero", so the answer is just 2)
  I will bring in some code.
  chance of detecting a single bit error? 100% (the error might be in the checksum too)
  chance of detecting a two-bit error? depends on which two bits.
  I'd guess it's still 80%-like... considered to be rather weak.
  (relatively few bytes to protect; there's another scheme at your disposal.)

** lecture 5

Oscilloscope: delay, noise, collisions, modulation, preamble.
CRC: like a checksum, but with division.
2-D parity: a simple error-correcting code.

Submit server woes:
  "could not run test process" - that's the submit server being lame.
  Don't turn in a compiled version of "zero"; it can cause the build to fail,
  so that your code doesn't execute.

Assignment hints:
  PA1 due Friday.
  Use two sockets; you'll need them for PA2.
  *Don't* listen to only your own packets. Interoperability is the goal.
  *Don't* crash on a bad packet.
  Don't print bad packets. Bad means invalid length, newer version, unknown protocol (maybe more).
  "Robustness principle" <- be conservative in what you send, liberal in what you accept.
    (though in this set of assignments you still don't print invalid packets; it just means
    don't reject packets for having a source address or destination address, or for any
    other reason I'd think is arbitrary)
  There's no need to fork.

Oscilloscope. fun for me.

CRC
  We end up adding to the packet the *remainder* of a division:
  division over a binary field with no carry, by an agreed-upon divisor.
  As an example: 10011 -- x^4 + x + 1
    * important that the first and last bits be 1.
    * different links can require different polynomials.
  Message: 1101011011
  Append four 0's (the degree of the divisor), then divide by 10011 to find the remainder.
  The only other trick is that there is no carry (or borrow): subtraction is XOR.
    1-1=0,  0-1=1 (!),  1-0=1,  0-0=0.

                  1100001010   <- quotient
      10011 ) 11010110110000
              10011
              -----
               10011
               10011
               -----
                000010110   (quotient bits 0 0 0 0: just bring bits down)
                    10011
                    -----
                     010100   (quotient bit 0: bring down)
                      10011
                      -----
                       01110   (quotient bit 0: bring down)
                        1110   <- remainder: the number we want

  Transmit 11010110111110 (the message followed by the remainder 1110).
  The receiver divides what it got by 10011; a remainder of 0 means no error detected.

More fun: 2-D parity.
  16 bits to send: 48 (H) 69 (i)

    0100 | 1
    1000 | 1
    0110 | 0
    1001 | 0
    -----+--
    0011 | 0   <- not the diagonal: the parity of the row parities, which is the
                  same as the parity of the column parities (both are the same)

  25 bits sent to carry 16.

    0100 | 1   v
    10I0 | 1   x     not good.   (I = flipped bit: 1000 arrived as 1010)
    0110 | 0   v
    1001 | 0   v
    -----+--
    0011 | 0   v
    vvxv   v
  If just one bit is broken, we can correct it: the bad row and the bad column cross at it.
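Two small sketches for checking the worked examples above by machine: the CRC remainder computed by mod-2 long division on strings of '0'/'1' characters, and 4x4 2-D even parity with single-bit correction. All function and struct names are hypothetical; the string-based CRC is for illustration (real implementations work on bytes or use tables), the 128-character work buffer is an arbitrary limit, and the parity sketch leaves out the corner bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC: writes the (strlen(gen) - 1)-bit remainder of msg * x^(strlen(gen) - 1)
   divided by gen into rem. Returns 0, or -1 if the input is too long. */
int crc_remainder(const char *msg, const char *gen, char *rem) {
    char work[128];
    size_t m = strlen(msg), g = strlen(gen);
    if (g < 2 || m + g - 1 > sizeof(work)) return -1;
    memcpy(work, msg, m);
    memset(work + m, '0', g - 1);           /* append degree-many zeros */
    for (size_t i = 0; i < m; i++)
        if (work[i] == '1')                 /* quotient bit 1: "subtract" = XOR */
            for (size_t k = 0; k < g; k++)
                work[i + k] = (work[i + k] == gen[k]) ? '0' : '1';
    memcpy(rem, work + m, g - 1);
    rem[g - 1] = '\0';
    return 0;
}

/* 2-D even parity over 4 rows of 4 bits (low nibble, leftmost column = 0x8). */
struct parity2d {
    uint8_t rows[4];
    uint8_t row_par;   /* bit r: parity of rows[r] */
    uint8_t col_par;   /* low nibble: per-column parity, i.e. XOR of the rows */
};

static int parity4(uint8_t x) {
    x &= 0x0f;
    return (x ^ (x >> 1) ^ (x >> 2) ^ (x >> 3)) & 1;
}

static int single_bit(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

void parity2d_encode(struct parity2d *p) {
    p->row_par = 0;
    p->col_par = 0;
    for (int r = 0; r < 4; r++) {
        p->row_par |= (uint8_t)(parity4(p->rows[r]) << r);
        p->col_par ^= p->rows[r] & 0x0f;
    }
}

/* Returns 0 if every parity agrees (nothing detected), 1 if a data bit was
   corrected, 2 if a parity bit was corrected, -1 if it can't be corrected. */
int parity2d_check(struct parity2d *p) {
    unsigned bad_rows = 0, bad_cols = p->col_par;
    for (int r = 0; r < 4; r++) {
        if (parity4(p->rows[r]) != ((p->row_par >> r) & 1))
            bad_rows |= 1u << r;
        bad_cols ^= p->rows[r] & 0x0f;
    }
    if (bad_rows == 0 && bad_cols == 0) return 0;
    if (single_bit(bad_rows) && single_bit(bad_cols)) {
        for (int r = 0; r < 4; r++)
            if (bad_rows == (1u << r))
                p->rows[r] ^= (uint8_t)bad_cols;   /* flip the bit where they cross */
        return 1;
    }
    if (single_bit(bad_rows) && bad_cols == 0) { p->row_par ^= (uint8_t)bad_rows; return 2; }
    if (single_bit(bad_cols) && bad_rows == 0) { p->col_par ^= (uint8_t)bad_cols; return 2; }
    return -1;
}
```

Feeding in the "Hi" block reproduces the row parities 1 1 0 0 and the column parities 0011 from the example; flipping two bits in each of two rows (a 2x2 rectangle) leaves every parity intact, so `parity2d_check` reports nothing.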
    0100 | 1   v
    1000 | 0   x
    0110 | 0   v
    1001 | 0   v
    -----+--
    0011 | 0   v   <- the corner bit can check the parity bits.
    vvvv   x
  Can correct the parity bit.

  A 2x2 rectangle of corruption can go undetected:

    0100 | 1
    1II0 | 1   killed four, specifically chosen bits (I and O mark the flipped bits),
    0OO0 | 0   and can have an undetected error.
    1001 | 0
    -----+--
    0011 | 0

Playing with errors (aside)
  Burst errors - many errored bits, all at once.
    - many potential reasons for burst errors.
  Single-bit, random errors - should be easy: checksum, CRC.
  Convert burst errors into fake single-bit errors (interleaving):
    taking "rows" of bits and turning them into columns.

** lecture 6

Homework 1 return. Not in alphabetical order. My TA is slacking! (four left, after class)
PA1 questions, I'm sure.
ARQ.
Sequence Numbers and ordering.

Debugging PA1:
  I'm sending but not receiving.
    * if you copied and pasted the header structure, you may be missing source_address,
      because it was gobbled by the comment that went off the edge of the page.
  I'm receiving but not printing.
  It works fine for me, but not on the submit server.
    * you didn't put the port back to 41710.
  bind() fails on nauseated. Eventual public shame threatened.
    * you don't have to SO_REUSEADDR the socket you don't bind.
  Remind neil to un-secret-ize the currently secret tests.

Reliability:
  phy layer: ensure that bits are reasonably likely to be recovered.
    clock recovery, baseline wander, redundant bit encodings - 4B/5B.
  data link layer: often CRC. ARQ. (checksums, even correcting codes like the 2-D parity)
  transport, network: often checksum. ARQ.

ARQ: What do we do if the CRC doesn't match, if we don't see the end of frame,
  or if there are two-bit errors in 2-D parity?
  (or any of several other errors we haven't talked about yet)
  "automatic repeat request": acknowledgements and retransmissions.
    "please send that again"
    "I've received 1, 2, and 4"         (implicitly saying "3" is missing)   <- selective
    "I've received everything up to 4"  (which might imply "5" is missing)  <- cumulative
    "I've waited and waited for her to call. I will call her again."       <- timeout
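To make the two kinds of feedback concrete, here is a sketch that computes both from the set of packets a receiver holds. Numbering packets from 1 and the helper names `cumulative_ack` and `missing_packets` are assumptions for illustration.

```c
#include <stdbool.h>

/* got[i] is true if packet i has arrived, for i = 1..n (got[0] is unused). */

/* Cumulative feedback: "I've received everything up to N". */
int cumulative_ack(const bool *got, int n) {
    int upto = 0;
    while (upto < n && got[upto + 1])
        upto++;
    return upto;
}

/* Selective feedback: the holes below the highest packet seen (having 1, 2
   and 4 means 3 is missing). Anything above the highest arrival is unknown:
   it may simply not have been sent yet. Returns how many holes were written. */
int missing_packets(const bool *got, int n, int *missing, int max_missing) {
    int highest = 0, count = 0;
    for (int i = 1; i <= n; i++)
        if (got[i])
            highest = i;
    for (int i = 1; i < highest && count < max_missing; i++)
        if (!got[i])
            missing[count++] = i;
    return count;
}
```

With 1, 2 and 4 in hand, the cumulative answer is 2 and the selective answer names 3; once 3 shows up, the cumulative answer jumps to 4 and packet 5 is unknown rather than missing.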
Assume for the moment that we have several packets to send (more than just 1500 bytes).
  * if we know that the destination can take ten packets at a time, send ten packets at once,
    and wait for the receiver to tell us which ones they're missing, or that they received
    all of them -- if they don't say anything, wait... 5 seconds and then try again.
  [[ push this idea on the stack, so we can describe a simple-as-possible scheme ]]

We can send only one packet "at a time".
  -- if we have only one wire, only one radio medium, there's no reason to send more
     than one packet without getting feedback.
  Label these packets "0" or "1".

  Send packet 0. Get ack for packet 0.
  Send packet 1. Get ack for packet 1.
  Send packet 0. Get ack for packet 0.
  Send packet 1. Get ack for packet 1.

  (( in the layer diagram, the network layer helps us cross different links,
     potentially of different types; the transport layer sits atop that, so the
     constraint about having only one wire would be inappropriate for transport-layer
     (i.e., TCP) stuff. So TCP has a sequence number much larger than one bit. ))

  Send packet 0. Get ack for packet 0.
  Send packet 1. *LOST permanently, dropped* not gonna get an ack!
  Resend packet 1. Get ack for packet 1.
  Send packet 0. Get ack for packet 0.
  Send packet 1. Get ack for packet 1.

Another situation:
  Send packet labeled 0. Ack for packet 0 *LOST permanently*: not getting the ack.
    We don't know whether it was the packet that got lost or the ack that got lost.
    No one suggested retransmitting the ack! Great! Sender responsibility is common.
  Resend packet 0. Get ack for packet 0.   (the receiver sees a duplicate 0, discards it, re-acks)
  Send packet 1. Get ack for packet 1.
  Send packet 0. Get ack for packet 0.
  Send packet 1. Get ack for packet 1.

If packets can go off into some corner of the network, then when a packet is retransmitted
there are two copies of the same packet labeled "0", and the older 0 can sneak in before
the next 0, causing badness:
  Send packet labeled 0. It disappears (into a corner).
  Resend packet 0. Get ack for packet 0.
  Send packet 1.
  Get ack for packet 1.
  Send packet labeled 0. The prior packet 0 is resurrected, and arrives before the new packet 0.
    Get ack for packet 0.   (the stale copy was accepted as new data; the real one looks like a duplicate!)
  Send packet 1. Get ack for packet 1.

We can use a one-bit sequence number IF packets can't hang out and appear at
inconvenient times: on a single wire.
A one-bit sequence number will not work well on a large network, because of the
potential for a delayed packet to sneak in.
And if the round trip time is large, you can't take advantage of the bandwidth
unless you send more than one packet at once.

How big a sequence number space do you need, if not over a single link?
One bit is not enough; how many do you need?
  * enough to fill the pipe: the product of the bottleneck bandwidth and the delay
    (multiply by two... I think).
    -- this is enough to get the performance available.
    ** bytes or packets per second  x  round trip time  x  2.
  * large enough that no previously-transmitted (potentially duplicated) packet can sneak in.
    TTL: time-to-live. Supposed to be decremented by 1 for every second that a packet
    lives in the Internet; in practice, also decremented on every router it passes.
      - a *mechanism* for keeping packets from living a long time.
      - ensures that a packet doesn't consume infinite network resources (doesn't loop forever).
    MSL: "maximum segment lifetime". We assume that all the packets will be cleared out
    in at most this long: 2 minutes.
      (so a TTL that counted seconds would be at most 120; the field itself is 8 bits, up to 255.)
      In all likelihood most implementations set it to 64.
    ** bytes or packets per second  x  MSL   (I'm leaving it without the times 2, but I'm not certain)

TCP numbers bytes.
  * why?
  ** if you get an out-of-order TCP segment, you know exactly where in the buffer its
     bytes will go, based on the sequence number of the first byte (you don't have to
     guess based on the sizes of the missing packets).
  ** [ could imagine, instead of IP fragmentation, allowing TCP-level fragmentation so
     that the TCP pieces would arrive. Similar thing with NATs ]
  ** "telnet": people would actually type. telnet over TCP. What happens when you type?
     Each character gets sent in its own packet (small white lie).
     What happens if a character-in-a-packet gets lost? You'll retransmit the character.
     What if you type fast? You can retransmit all the characters at once
     -- TCP can re-segment previously transmitted data.
        3  n
        4  s   <- lost
        5  p
        6  r   <- maybe also lost?
        retransmission: 4 spr
  ** or for path MTU (MTU == maximum transmission unit) discovery (an alternative to fragmentation):
     send a large packet, get an error, split it into smaller packets, each of which
     will fit without fragmentation.
     MTU: associated with your interface. 1500, aside from tunnels.
     path MTU: the smallest MTU along the path to the packet's destination.
     As link bandwidth increases, there's less reason for a small MTU.
       [ 1500 byte IP | GRE | 1460 byte IP | TCP | something ]

You could calculate the maximum performance of TCP knowing that it has a 32-bit
sequence space: 2^32, about 4 billion bytes.
  (e.g. at 1 Gbit/s the 32-bit space wraps around in about 34 seconds, well under the 2-minute MSL)
  - "actually might be a problem"; physicists stumble upon it.

** lecture 7

PA1 Review.
PA2 Distribution; discussion embargo lifted! Techniques you'll need.
HW3 Distribution.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <stdint.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>

  struct msgFrame {
      uint8_t  version;              /* must be 1 */
      uint8_t  ttl;                  /* must be 1 */
      uint16_t payload_length;       /* bytes following the header */
      uint32_t account_identifier;   /* digits of your account name */
      uint32_t source_address;       /* unused for now */
      uint32_t destination_address;  /* unused for now */
      uint16_t checksum;             /* unused for now */
      uint16_t protocol;             /* must be 1 */
      char*    msgString;            /* stores the string to be transmitted */
                                     /* BUG: this sends a pointer (an address), not the string */
  };

  struct msgFrame *f = malloc(0xffff);
  f->msgString = ?;                  /* points where? */
  memcpy(f->msgString, "hello", 5);  /* would barf! (f->msgString is uninitialized) */

  const int   DEBUG = 0;
  const char* HOST_ADDR = "224.0.50.111";
  const int   HOST_PORT = 12345;

  int main(int argc, char* argv[]) {
      char* str_to_send = NULL;           // stores message from cmd line
      int sd = -1, sd2 = -1;              // socket descriptors
      struct sockaddr_in addr_local;      // to send to the multicast server
      struct sockaddr_in addr_multicast;  // to recv from the multicast server
      struct ip_mreq multicast_request;   // request to join multicast server
      const char* multicast_ip = "224.0.50.11";
      unsigned int multicast_port = 12345;
      ...
  }

  #define PORT 12345
  #define IP "224.0.50.111"
  #define MAX_BUFFER 65535

  struct header { /* the same eight fields as below */ };

  struct packet {
      char * message;       /* BUG: a pointer, not the bytes -- and it comes before the header */
      struct header hdr;
  };

  struct header {
      uint8_t  version;              /* must be 1 */
      uint8_t  ttl;                  /* must be 1 */
      uint16_t payload_length;       /* bytes following the header */
      uint32_t account_identifier;   /* digits of your account name */
      uint32_t source_address;       /* unused for now */
      uint32_t destination_address;  /* unused for now */
      uint16_t checksum;             /* unused for now */
      uint16_t protocol;             /* must be 1 */
  };

  // set up the header
  struct header h;
  h.version = 1;
  h.ttl = 1;
  h.payload_length = sizeof(argv[1]);
  h.payload_length = 4;                /* same as the line above (sizeof a char * on a 32-bit
                                          machine, not the string length); also not htons'd */
  h.account_identifier = 99;           /* not htonl'd */
  h.protocol = 1;                      /* not htons'd */
  h.account_identifier = htonl(036);   /* octal! 036 == 30 */
  h.account_identifier = htonl(30);

-- Programming Assignment 2: maintain a neighbor table.
   Ad hoc network with wireless nodes; they like to track who their friends are.
   Just above the physical layer, but not yet routing.
   We want to learn who we can reach directly; we can later advertise this information
   to our neighbors so that everyone can reach everyone else.
   "hello" message.
   (sort of as in OSPF.) We just send it.
   The receiver will track who he receives it from:
   build a table of nodes from which he's received a hello message recently.
   "soft state": information that is essential, but not persistent.
     other examples include the ARP cache.
   If the neighbor dies, he won't send hellos. If the link dies, we won't see hellos.
     In both cases, we should drop the neighbor.

   Table for this assignment includes:
     IP address and port of the neighbor: out of the recvfrom.
     I would like for the port to not be 41710:
       use a second "sending" socket, so later we can send unicast responses to only your process.
       (the "sending" socket's port should be not-reused; your own; exclusive.)

   unicast_socket = socket(AF_INET, SOCK_DGRAM, 0);
   sendto(unicast_socket, message, length, 0,
          (struct sockaddr *)&dest_addr, sizeof(struct sockaddr_in));
   Now unicast_socket has a port, arbitrarily chosen by the kernel.

   Two examples:

   /* will work well; you will presumably have only one instance in your
      neighbor's neighbor table. you're holding on to the global socket. */
   unicast_socket = socket(AF_INET, SOCK_DGRAM, 0);
   sendto(unicast_socket, message, length, 0, (struct sockaddr *)&dest_addr, sizeof(dest_addr));
   sendto(unicast_socket, message, length, 0, (struct sockaddr *)&dest_addr, sizeof(dest_addr));
   /* both messages have the same source port. */

   /* will lead to suffering; just make the sending socket a global.
      it's okay, because we want to be able to read from it later. */
   void send_hello_message(void) {
       /* don't recreate the socket every time... that is, this fragment is bad. */
       unicast_socket = socket(PF_INET, SOCK_DGRAM, 0);
       sendto(unicast_socket, message, length, 0, (struct sockaddr *)&dest_addr, sizeof(dest_addr));
       close(unicast_socket);   /* don't do this */
   }
   while (1) {
       send_hello_message();
   }
   /* each message will have a different source port. don't do this */

   IF you want to know what port you got, use getsockname() ... after the call to bind or sendto.
   BUT this is not necessary.
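One way to hold this soft state, sketched under assumptions: a fixed-size array, entries keyed by the (IP, port) that recvfrom() reported, and whole-second timestamps from time(). The struct fields and the helpers `note_hello` and `time_remaining` are hypothetical, not a required interface. Only "last heard" is stored; the time remaining is computed when someone asks, so nothing has to expire on a timer.

```c
#include <arpa/inet.h>    /* htonl, htons */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NEIGHBOR_LIFETIME 120          /* seconds: the two-minute expiry */

struct neighbor {
    int      in_use;
    uint32_t ip;            /* sin_addr.s_addr from recvfrom(), network byte order */
    uint16_t port;          /* sin_port from recvfrom(), network byte order */
    uint32_t net_address;   /* source_address from our header */
    uint32_t account_id;    /* account_identifier from our header */
    time_t   last_heard;    /* when the latest hello arrived */
};

/* Seconds this neighbor still counts as alive; 0 means dead. */
long time_remaining(const struct neighbor *e, time_t now) {
    long left = (long)(e->last_heard + NEIGHBOR_LIFETIME - now);
    return left > 0 ? left : 0;
}

/* Record a hello from 'from': refresh the matching (IP, port) entry, or take an
   empty or expired slot. Returns the entry, or NULL if the table is full. */
struct neighbor *note_hello(struct neighbor *table, size_t slots,
                            const struct sockaddr_in *from,
                            uint32_t net_address, uint32_t account_id, time_t now) {
    struct neighbor *spare = NULL;
    for (size_t i = 0; i < slots; i++) {
        struct neighbor *e = &table[i];
        if (e->in_use && e->ip == from->sin_addr.s_addr && e->port == from->sin_port) {
            spare = e;                          /* already known: refresh this one */
            break;
        }
        if (spare == NULL && (!e->in_use || time_remaining(e, now) == 0))
            spare = e;                          /* free or dead slot can be reused */
    }
    if (spare == NULL)
        return NULL;
    spare->in_use = 1;
    spare->ip = from->sin_addr.s_addr;
    spare->port = from->sin_port;
    spare->net_address = net_address;
    spare->account_id = account_id;
    spare->last_heard = now;                    /* soft state: every hello renews it */
    return spare;
}
```

When printing, skip (or show as dead) any entry whose time_remaining() is 0, and convert the stored address and port out of network byte order first.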
   /* this is fake syntax */
   bind(multicast_socket, { AF_INET, 224.0.50.111, 41710 }, number);
   /* unnecessary: */
   bind(sending_socket, { AF_INET, INADDR_ANY (0), 0 }, number);

   /* legal syntax for the struct stuff (C99 designated initializers, so field order doesn't matter): */
   struct sockaddr_in m = {
       .sin_family = AF_INET,
       .sin_port = htons(41710),
       .sin_addr.s_addr = inet_addr("224.0.50.111"),
   };
   bind(multicast_socket, (struct sockaddr *)&m, sizeof(m));

   source address: function of (getpid(), account_id, 12), htonl'd.
   Send "hello" every 25 seconds.

   Table for this assignment includes:
     IP address and port of the neighbor: out of the recvfrom.
     network address from the header (source_address in our special header).
     account_id (from the header).
     is alive? (have we heard from this guy less than 2 minutes ago?)
       you can just omit / discard any entry that is not alive.
       intent: support debugging.
     time remaining (how much longer we'll think him alive for); 0 if nothing left.
   See another message: update the existing entry.
     (ip, port) <- unique.
     (net address from header) <- unique.
     MAY update on either/both, whatever; I won't screw with it.

   When stdin is closed, die.
     ^D will close stdin: fgets returns NULL; read returns zero.
   "print neighbor table" on stdin (without quotes): you must respond. Quickly.
     No sleeping for 25 seconds.
   What is the file descriptor number of stdin? 0 (STDIN_FILENO).
   If you print a value, convert it from network byte order.

   The two-minute expiry:
     when you receive, note the time.
     when you print, do the comparison.

   NO BUSY LOOPS! NO SLEEPING FOR 1 SECOND ALWAYS.
   How, you may ask? "select" or "poll"; you can choose.
     select is traditional, portable, ugly, not very scalable.
     poll is new and exciting, uses an array of structures, and scales better.
   Problem: multicast socket, unicast socket, stdin, and we have a timer
     (we have to wake up every 25s).
   "select" allows us to punt this problem to the kernel.
   select takes:
     first param: the maximum file descriptor number you're using + 1:
       max(unicast_socket, multicast_socket) + 1   (stdin is 0, so it's never the max).
     three bitmasks (fd_sets): readable, writable, exceptional conditions.
       we only care about "readable": that is, won't block if we call read or recv.
       on input, we set the bit if we're interested.
       on output, the kernel sets the bit if there's something interesting.
     last param: how long (a duration) to wait before waking us up: a struct timeval *.
       will it always be 25? NO!!! Right after you sent a hello, probably 25
         (25 minus epsilon), but not if you're interrupted (a packet, stdin).
       will it always be 1? NO!!! (because you know when you have to send a new hello.)
       select may modify the timeout, so recompute it every time around the loop.

   When you send the first hello, call gettimeofday(struct timeval *tv, NULL).

   /* globals */
   struct timeval now;            /* has only two fields: tv_sec and tv_usec */
   struct timeval next_hello;
   struct timeval sleep_interval;

   void hello_transmission(void) {
       sendto(unicast_socket, hello_message, len, 0,
              (struct sockaddr *)&multicast_addr, sizeof(multicast_addr));
       gettimeofday(&now, NULL);
       next_hello.tv_sec = now.tv_sec + 25;
       next_hello.tv_usec = now.tv_usec;
   }

   timersub(&a, &b, &result);   /* result = a - b, borrow handled; BSD/glibc macro in <sys/time.h> */

   while (1) {
       gettimeofday(&now, NULL);
       /* why you need to handle the carry/borrow from the usec: */
       /* what if next_hello.tv_usec < now.tv_usec? */
       sleep_interval.tv_sec = next_hello.tv_sec - now.tv_sec;
       sleep_interval.tv_usec = next_hello.tv_usec - now.tv_usec;   /* can go negative! */
       select(..., ..., ..., ..., &sleep_interval);
   }
   That is: take next_hello and subtract now (assuming the result is positive).
   The macro timersub() handles the borrow.

** lecture 8

Moving from basic reliability to performance, heterogeneity.
  * we mostly have the ideas of reliability.
  * now we just want to make it work faster.
Early ARPANET transport and Sliding Window startup.
  * send more than one packet at a time.
Cumulative and Selective ACK.
Flow control.
Phy and data-link layers tend to have only one packet outstanding at a time.
Most of these (concurrency) issues are TCP-related. < three layers up from the phy.

PA2 Sockets:
  "multicast socket" -> "multicast receiving socket": only receives; only multicast.
  "sending socket" -> "unicast socket" (renamed to "something"):
    sends to the multicast address;
    future: sends and receives unicast.
    present: its address information (IP, port) is what your neighbors remember
    in their neighbor tables.
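The two pieces of that loop, sketched: computing the select() timeout with the microsecond borrow handled (the same arithmetic as timersub(), plus a clamp at zero), and a select() wrapper over a list of descriptors. The names `time_until` and `wait_readable` are hypothetical.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Time left until 'deadline', as seen at 'now'. Borrows a second when the
   microseconds go negative, then clamps at zero: select() rejects a
   negative timeout. */
struct timeval time_until(struct timeval deadline, struct timeval now) {
    struct timeval left;
    left.tv_sec = deadline.tv_sec - now.tv_sec;
    left.tv_usec = deadline.tv_usec - now.tv_usec;
    if (left.tv_usec < 0) {            /* borrow a second */
        left.tv_sec -= 1;
        left.tv_usec += 1000000;
    }
    if (left.tv_sec < 0) {             /* deadline already passed: don't wait */
        left.tv_sec = 0;
        left.tv_usec = 0;
    }
    return left;
}

/* Wait until one of fds[0..nfds-1] is readable or the timeout runs out.
   Sets ready[i] to 1 for each readable descriptor. Returns what select()
   returns: number of ready descriptors, 0 on timeout, -1 on error. */
int wait_readable(const int *fds, int nfds, struct timeval timeout, int *ready) {
    fd_set readable;
    int maxfd = -1;
    FD_ZERO(&readable);
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readable);                          /* input: what we care about */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    /* timeout is our own copy, so select() is free to modify it */
    int n = select(maxfd + 1, &readable, NULL, NULL, &timeout);
    for (int i = 0; i < nfds; i++)
        ready[i] = (n > 0 && FD_ISSET(fds[i], &readable));  /* output: set by the kernel */
    return n;
}
```

In PA2 the descriptor list would hold stdin (0), the multicast receiving socket and the unicast socket, and each pass would call gettimeofday() and then pass time_until(next_hello, now) as the timeout; a return of 0 means it's time to send the next hello.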
Early ARPANET transport.
  Start from stop-and-wait (one packet at a time).
    * well-defined performance: 1 packet per RTT (RTT = round trip time).
    * nothing to do with the bandwidth of the wire.
  Run in parallel! If you run four stop-and-waits concurrently, you get 4x the performance.
    End-to-end connection between sender and receiver.
    If you can have one stop-and-wait session from S to R, very little stops you from having four.
    In a sense, netscape/mozilla do a similar thing: open many connections to the same
    server at the same time.
  Catch (or at least one of the catches):
    * four concurrent streams, from the sender's perspective (send pkt, recv ack):

        time ... ->
        A  pkt  ack;pkt  ... wait for timeout ...  rxpkt
        B  pkt  ack;pkt  ack;pkt  ack;pkt
        C  pkt  ack;pkt  ack;pkt  ack;pkt
        D  pkt  ack;pkt  ack;pkt  ack;pkt

Aside: simplex, half-duplex, full-duplex links.
  full-duplex: both directions at once (cat 5 100 Mbit: one pair for up, one pair for down).
  half-duplex: only one guy can send at a time, but both ends can send
               (old-school Ethernet, 802.11).
  simplex: satellite-like links; one direction only.
  (or: one type of channel to send in one direction, a different type for the other.)
  Speaking about stuff at the transport layer, these issues kind of don't matter:
  we are up toward the level where there are devices (routers or switches) connecting
  many wires; these devices have buffers (they can queue packets if the medium/wire is
  busy); and the connections are long-haul (many milliseconds), making performance
  dominated by latency, not the bitrate of the wire.

Concurrent streams:
  * complicated putting the original stream back together.
  * loss of a packet is detected only by timeout.
  * (don't know) what happens when some streams get way ahead of others?

*Sliding Window Transport*
  Decide on a "window" size.
  Send "window" packets into the network, unacknowledged, at a time.
  Acknowledgements are *cumulative* (not selective: can't ack out-of-order data).

IETF RFC 793 (TCP) - RFCs are the standards documents for the Internet.
If the number of the RFC is less than 1500, it's probably good, well written.
RFC 791 (IP).

Imagine you have an infinite stream of bytes to transmit.
  * In TCP, bytes have sequence numbers, and acknowledgements apply to bytes.
    You could send the same bytes in two different packets.

At the sender, we have the following bytes to transmit, with a window of 5 bytes
and a packet size (MSS) of 1 byte.
--------------------------------------------------------
  bytes already transmitted and acknowledged.         (pre-window)
  bytes transmitted but not acknowledged.             (in the window)
  bytes we're allowed to transmit, but haven't yet.   (in the window)
  bytes we're not yet allowed to transmit.            (beyond the window)
  bytes we (as the kernel) haven't even been given.

At the receiver,
----------------------
  bytes that have been delivered to the app.
  bytes that we've received in order that are ready for the app (and have been acknowledged).
  bytes that we've received, but not in order.
  bytes beyond the window: no space exists to store them.

From RFC 793 (September 1981), Transmission Control Protocol, Functional Specification:

    Send Sequence Variables

      SND.UNA - send unacknowledged
      SND.NXT - send next
      SND.WND - send window
      SND.UP  - send urgent pointer
      SND.WL1 - segment sequence number used for last window update
      SND.WL2 - segment acknowledgment number used for last window
                update
      ISS     - initial send sequence number

    Receive Sequence Variables

      RCV.NXT - receive next
      RCV.WND - receive window
      RCV.UP  - receive urgent pointer
      IRS     - initial receive sequence number

    The following diagrams may help to relate some of these variables to
    the sequence space.
    Send Sequence Space

                   1         2          3          4
              ----------|----------|----------|----------
                     SND.UNA    SND.NXT    SND.UNA
                                          +SND.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers of unacknowledged data
        3 - sequence numbers allowed for new data transmission
        4 - future sequence numbers which are not yet allowed

                          Send Sequence Space

                               Figure 4.

    The send window is the portion of the sequence space labeled 3 in
    figure 4.

    Receive Sequence Space

                       1          2          3
                   ----------|----------|----------
                          RCV.NXT    RCV.NXT
                                    +RCV.WND

        1 - old sequence numbers which have been acknowledged
        2 - sequence numbers allowed for new reception
        3 - future sequence numbers which are not yet allowed

                         Receive Sequence Space

sliding window sender
  numbers represent sequence numbers of the one-byte packets.

  transmissions  :  1 2 3 4 5 6 7 8 9 A
  acknowledgments: 0 1 2 3 4 5
    <- each ack covers everything up to and including* that value.
       * not actually TCP: a TCP ack carries the next byte expected.

  transmissions  :  1 2 3 4 5 67 89
  acknowledgments: 0 (acks 1 & 3 lost) 2 4
    <- "real" TCP uses "delayed acks": intentionally sending only every other ack
       for in-order receipt of packets. ((almost) doesn't hurt anything.)

  transmissions  :  1 2 3 4 5 (2 lost) 6
  acknowledgments: 0 1 1 1 1 1
    <- the scheme for cumulative acks is to send an ack on every received packet.
       an ack says what you have.

  transmissions  :  1 2 3 4 5 (2 delayed)
  acknowledgments: 0 1 1 1 1 5   <- 2 arrived

Retransmission schemes in a sliding window transport protocol:
  * timeout: some (long) time went by; retransmit.
      RTO > RTT, and RTO > 1 second.
  * "fast" retransmission: three duplicate acks.
  * "Janey Hoe-style" retransmission: any duplicate ack, if the ack is overdue
      (overdue = more than an RTT). (won't talk about it much)

  transmissions  :  1 2 3 4 5 (2 lost) 6 2
  acknowledgments: 0 1 1 1 1 1 6

  transmissions  :  1 2 3 4 5 (2 & 3 lost) 6 2
  acknowledgments: 0 1 1 1 1 2
    (packets received, in order: 1 4 5 6 2)
  difficult to recover using only fast retransmission.
  [ 12345 [ 6789A 12345 ] 6789A ]

Two main ways to alter the size of the window:
  FLOW CONTROL - manage the receiver's buffer.
    if the receiving app is slow, we don't want to send packets through the network
    that would be dropped at the destination.
  CONGESTION CONTROL - manage the network's buffers.
    if you're using all the bandwidth, and someone else wants to share.
RTT estimation...

** lecture 9

Sliding window introduced and described in Peterson S2.5.2.
  Cumulative acks (as well as the alternatives, "selective" and "negative").
Sliding window in TCP reintroduced in Peterson S5.2.4.
  Adds the receiver's advertised window.
  S5.2.5 extends it with two means of preventing "tinygrams".
Then! connection setup, S5.2.3.

Aside: Negative ack - the receiver somehow figures out that a packet is missing and
asks for that one to be retransmitted.

PA2 issues:
  select returns  0 -> no file descriptors were ready.
                 -1 -> error.
           positive -> the number of file descriptors that are ready.

  if (FD_ISSET(socket_fd, &readfds)) {
    /* read from the socket. */
  }
  /* style: don't use "else": select might report more than one ready
     descriptor. independent ifs handle many being ready at once. */
  if (FD_ISSET(0, &readfds)) {
    /* read from stdin. */
  }

Flow control time.
  controlling the amount of information going into the *receiver's buffers*.
  (contrast congestion control, which protects the network's buffers.)
  we would like packets not to be dropped at the receiver when the receiver has no
  more space to store out-of-order packets or data for slow applications.
  devices that deal with data slowly (printers, audio thingys) will find some way
  to slow the sender down (protect their buffers).

In TCP, this is handled by a field in the acknowledgement that expresses the size of
the window (the remaining space in the buffer allocated to this transfer): the offset
from the ack sequence number to the right edge of the window.
.<- beginning of this connection's seq space
--------|-------|-----|------
********                        received, given to the application.
        ********                received in order, but the app hasn't read it.
                 *****          not yet received, or out of order.
                       ******   sender better not have sent.
                ^               sequence number we've acked.
                 -----|

ack will include both:
  a) the sequence number of the ack (th_ack)                       (^)
  b) the number of bytes after the ack that the receiver
     is prepared to receive (th_wnd)                               (-----|)

TCP: the ack value is that of the next byte expected (i.e., the first not-yet-received byte).
(I don't really know why this decision was made.)

If the receiver's application hasn't called read(), what happens as new, in-order packets arrive?
  a) advance th_ack, the ack number, to the next byte expected.
  b) keep the right edge from moving: subtract from the advertised window (th_wnd).

  Before: ^-----|   ack 5 wnd 5
  After : ^--|      ack 8 wnd 2

  Key: ^ - represents the ack sequence number.
       - - represents a byte in the advertised window.
       | - represents the right edge of the window.

Sequence space in TCP applies to bytes. TCP is a bytestream-oriented transport protocol.

If the sender sees "ack 8 wnd 2", the sender can send "8" and "9".
  After After: ^|   ack 10 wnd 0   <--- closed window.
How does the sender learn when to send again? [[ push this on the stack ]]

What happens when the receiver finally calls read()?
  Before : ^--|            ack 8 wnd 2
  After  : ^----------|    ack 8 wnd 10
If the application calls read, we get to send an ack to open the window. *(not precisely true)

Let's receive out-of-order data "9", "10", "11" (not 8).
  Before : ^----------|   ack 8 wnd 10   (completely "empty")
  After 9: ^-=--------|   ack 8 wnd 10   (one is used...)
  After10: ^-==-------|   ack 8 wnd 10
  After11: ^-===------|   ack 8 wnd 10
  NOT !!!: ^-===---|      ack 8 wnd 7

  Key: - sequence number available in the buffer.
       = sequence number we've received out of order.

The advertised window is a commitment to provide the memory resources.

Begin description of tinygram pathologies and fixes!
Let's say the application is telnet.
  Before: ^|     ack 10 wnd 0
  the telnet application might getchar:  read(socket, &c, 1);
  After : ^-|    ack 10 wnd 1
  Sender gets the ack. Sends "10" (one byte only).
  At the dest:  ^|    ack 11 wnd 0
  A bit later:  ^-|   ack 11 wnd 1
  and the process repeats.

What happens if the receiver's window-opening ack is lost?
  Before: ^|              ack 10 wnd 0
  Then  : ^----------|    ack 10 wnd 10   *lost*
The answer is *not* that the receiver retransmits the ack, because the sender is the
responsible one: the sender knows if there's more to send, and there's no ack of an ack.

The sender is allowed to send a "window probe".
  After a timeout-sized period (seconds), the sender can send the next byte expected (#10).
  It is identical to all other data packets *except* that its sequence number is at
  (or just over) the right edge of the window.
  It tries to figure out if the window has opened, since there's some reasonable chance
  that the window-opening acknowledgement was lost.

"Silly Window Syndrome"
How might we avoid this?
  (-) could forbid the app from reading one character at a time.
      (not expected to work... those pesky application writers. can't trust them.)
  (a) receiver: send one of these window-advertising acks only when there's a
      substantial growth in the advertised window.
      (i) substantial: a whole segment* (much larger than one byte), or half of the
          buffer (might have made sense for very low-resource devices).
          * maximum segment size (MSS) \approx MTU minus the IP and TCP header sizes
            (1500 - 20 - 20 = 1460 bytes on Ethernet, without options).
  (b) sender: don't send into a very small window (unless all remaining data to send fits).
  (c) receiver: delay the acknowledgements, hoping that the application will have read
      some data by then, so the ack can advertise a bigger window.
      (ack every other packet, waiting up to ~200ms for the second.)

The book might make it appear as if Nagle's algorithm is a technique for silly window
syndrome avoidance. Don't be fooled.

** lecture 10

Midterm: 3/25.
SWS review
Nagle
TCP header fields
Connection setup and the TCP state machine.
Connection trace with some flow control.
PA2:
  socklen_t addrlen = sizeof(struct sockaddr_in);
  bytes_read = recvfrom(s, buf, buflen, 0,
                        (struct sockaddr *)&sender_address, &addrlen);

  sender_address is in network byte order:
    sin_addr.s_addr is in network byte order.
    sin_port is in network byte order. (ntohs)
  ...
  table[i].ip_addr = ntohl(sender_address.sin_addr.s_addr);
  table[i].udp_port = ntohs(sender_address.sin_port);
  ...
  struct in_addr a;
  a.s_addr = htonl(table[i].ip_addr);   /* back to network byte order for inet_ntoa */
  printf("%s, %d, ...", inet_ntoa(a), table[i].udp_port);

  inet_ntoa expects a struct in_addr in network byte order, and returns the string.
  (probably deprecated, and bad since it's not thread safe: it returns a static buffer.)
  inet_addr("127.0.0.1") provides network byte order.

PA2: continue to check for protocol, version. don't compare the payload to anything.

Silly window syndrome review:
  Send many tinygrams because the receiver's window is only opened by one (or very few)
  byte(s) at a time.
  Avoid by:
    (a) don't send into tiny buffers,
    (b) don't advertise tiny buffers,
    (c) delay acks, expecting the app to empty out the buffers.

Another way to get tinygrams, even with an empty receiver buffer:
  the sender produces one character at a time.
  My slow keystrokes!
  Or very lame applications that call
    for (i = 1; i < ...; i++) write(s, ..., 1);

Nagle's algorithm: while sent data is still unacknowledged, hold small segments back
until the ack arrives (or a full segment's worth has accumulated).

  --- "h" --->       (let's say this takes a while. long distance network)
  <----ack-----
  --- "ello" ->      can send this only after the ack of "h"
                     (say, a while later... same network)
  <----ack-----

if the bizarre code above (for(...) write(..., 1)):
  --- byte 1 --->
  --- 2:501 --->
  --- 502:1001 --->
corner case / neat trick: on timeout, you could send bytes 1:499.

Times when you really don't want Nagle's algorithm delaying your transmissions:
  - real time!
  - "sending an emergency self-destruct command".
  - VoIP -- you kinda don't want to use TCP to begin with: it's okay to lose stuff,
    it's not really okay to delay stuff.
  - games -- might also use something non-TCP-ish.
  - the "canonical example" is mouse movements in VNC, remote X.
TCP Header Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            TCP Header Format

(refer to RFC 793 for details.)

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              |  LISTEN |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |  neil sez: close  CLOSE    |     |    rcv FIN
   V  for write       -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

                      TCP Connection State Diagram
                               Figure 6.

7.456042 IP 10.0.1.2.59655 > 10.0.1.1.6000: S 470725384:470725384(0) win 65535
7.457239 IP 10.0.1.1.6000 > 10.0.1.2.59655: S 475384637:475384637(0) ack 470725385 win 8192
7.457292 IP 10.0.1.2.59655 > 10.0.1.1.6000: . ack 1 win 65535
7.710674 IP 10.0.1.2.59655 > 10.0.1.1.6000: P 1:457(456) ack 1 win 65535
7.710763 IP 10.0.1.2.59655 > 10.0.1.1.6000: . 457:1905(1448) ack 1 win 65535
7.710798 IP 10.0.1.2.59655 > 10.0.1.1.6000: . 1905:3353(1448) ack 1 win 65535
7.710826 IP 10.0.1.2.59655 > 10.0.1.1.6000: . 3353:4801(1448) ack 1 win 65535
7.713372 IP 10.0.1.1.6000 > 10.0.1.2.59655: . ack 4801 win 41540

likely to describe some of the rest later.

** Lecture 11

PA3
RTT Estimation, Karn's (or Karn/Partridge) algorithm
RTO calculation
  ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
Hubs, Bridges, and Routers.

PA3 to rate-limit unicast transmission.
  "safety" - after all, we don't want you hosing any campus networks.
  ensure that no more than one packet (1000 bytes max) to a dest goes out in any
  0.1 second.  <- service rate.
  burst queue of 10 packets. if I send 12 all at once, I expect at least one to be dropped.
  much like "print neighbor table" and "quit", the main "new" command is:
    "sendmsg %u %s"  <- our network address (getpid <<..) followed by a string message.
    okay nevermind.... %s will end on spaces, maybe that's not good.
  sscanf(buffer_from_stdin, "sendmsg %u %n", &dst_addr, &msg_begins_at_index);
  // fscanf(stdin, "sendmsg %u %n", &dst_addr, &msg_begins_at_index);
  strcpy(message_data, &buffer_from_stdin[msg_begins_at_index]);  // roughly
  // %n stores how many characters have been consumed so far (it doesn't count
  // toward sscanf's return value).
  // also figuring out the length.... (e.g., strlen, minus any trailing newline)
  use dst_addr to find the neighbor entry -> gives us the ip and port.

copy and paste....
  sendmsg 11331 hello
  sendmsg 11331 how are you
  sendmsg 11331 what's happening

PA2: you send on socket A to the multicast address.
     you receive on socket B the messages sent to the multicast address.
PA3 adds: you receive on socket A the messages sent directly to you.

for the per-neighbor rate limiting,... how?
  add a field to the neighbor structure saying when the last packet will have been sent. (tricky)

  When you're given the first packet for a destination, send it! it's away.
  When you're given the second packet for a destination,
    a) if it's been at least 0.1 seconds since packet #1, send.
    b) else queue. when will it get sent? 0.1 seconds after the last sent unicast
       packet to that address.
  When you're given the third packet for a destination,
    a) if it's been 0.1 seconds since packet #2 was already sent, send.
    b) else queue, until 0.1 seconds after #2 was (or will have been) sent.

  You *can* maintain one queue per neighbor.
  You can also maintain just one big queue of events. :)
    to know when the last packet would have been sent, schedule the next packet for
    max(that last guy + 0.1, now),
    which means you don't have to grub around in every (even idle) neighbor to find
    the next packet to send and when it will get sent.

  Be careful that you do not give select a negative number of seconds to sleep.

Error messages (print to stdout):
  "ERROR:NOBUFF"   that is, don't block.
  "ERROR:NOROUTE"  the guy is not my neighbor.

  select([ ], nil, nil)

Back to the non-PA course content...
RTT Estimation, Karn's (or Karn/Partridge) algorithm
RTO calculation
  ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z

TCP reliability stuff: RTT & RTO
Why track how long it takes to get a reply from the receiver (the RTT)?
  It lets us set the RTO (retransmission timeout).
  If RTO >> RTT -> slow. if we lose a packet, it will take unnecessary time to retransmit it.
  If RTO \approx RTT -> might retransmit packets that don't need to be retransmitted.
  Initial RTO in TCP is 3 seconds. (RFC 2988; RFC 6298 later lowered it to 1 second.)

Measuring RTT.
  Send a packet. Get the ack. Subtract the times!
  RTT can vary, especially if you choose a packet that's gonna be queued in the network
  for a while.
  what's the catch? could lose that packet. can still retransmit.
  if an ack arrives just after you sent the retransmission (really acking the original
  transmission) and you time it from the retransmission, your estimate of RTT will be
  way too low.
  VERY bad: when the real RTT is larger than your estimate (so your estimated RTT makes
  you retransmit early), you decide that the RTT is way small, because the ack of the
  original transmission ends the timer that the retransmission started. A smaller
  estimate means retransmitting even earlier.

Karn/Partridge algorithm - don't take RTT samples from segments that were retransmitted.
  correct answer: "don't do that"

RTO estimation: combine the "mean" RTT and the deviation to find a good RTO
  (good means longer than almost all RTTs, but not by much).
  Calculating the "mean" uses a "stochastic gradient" (an exponentially weighted moving average):
    mean = (1-\alpha) mean + \alpha * sample

  m  := measured value
  a  := average RTT estimate
  v  := mean deviation (the "variance" estimate)
  no divides: integer arithmetic only; shifts are okay.
  sa := "scaled" average RTT estimate (a multiplied by 8)
  sv := "scaled" variance estimate (v multiplied by 4)

    // gets us the average
    m -= (sa >> 3);      // m is now the "error"
    sa += m;             // added 1/8 of the error to a
    // gets us the variance.
    if (m < 0) m = -m;   // error = abs(error)
    m -= (sv >> 2);      // the difference between the error and the expected error
    sv += m;             // added 1/4 of that difference to v.
    // finally
    rto = (sa >> 3) + sv;   // = a + 4v

For next time: read S3.2. We should start bridges next time.

** lecture 12

Ruby fragments for PA3.
TCP states review in system calls.
Addressing.

3 tricks for Ruby in PA3
  1. fool the makefile
  2. fool the compiler warning test
  3. setsockopt for multicast.

  # so that "make" just runs.
  # so that executable bits get set.
  three: three.rb
	cp three.rb three
	chmod +x ./three

  three.c contains:
    #include <stdlib.h>
    int main(int argc, char *argv[]) { exit(0); }

  ===============
  #!/usr/bin/ruby
  # Other notes: remind neil to check version ruby on submit server
  # IPAddr not necessarily present.
  require 'socket'

  addr = '224.0.50.112'
  # ip_mreq: 4 bytes of group address, then 4 bytes of INADDR_ANY (0.0.0.0)
  mreq = (addr.split('.').map { |octet| octet.to_i } + [ 0 ] * 4 ).pack('C*')
  raise "oops" unless mreq.length == 8

  multicast_socket = UDPSocket.new
  # fall back to 12 (the BSD value) if this Ruby lacks the constant
  add_membership = defined?(Socket::IP_ADD_MEMBERSHIP) ? Socket::IP_ADD_MEMBERSHIP : 12
  multicast_socket.setsockopt(Socket::IPPROTO_IP, add_membership, mreq)

  readable, _, _ = select([$stdin, multicast_socket], nil, nil, 0)
  (readable || []).each { |rdfd|   # select returns nil when nothing is ready
    case rdfd
    when $stdin
      puts "read from stdin: %s" % $stdin.gets
    when multicast_socket
      puts "got a packet"
    else
      raise "argh"
    end
  }
  __END__

TCP states and system calls.
to create a TCP socket:
  s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
  could possibly call bind before the connect (though I don't know quite why;
  "getting past stupid firewalls" suggested.)

to create a client (active open):
  connect(s, destination_addr/port, addr_len);
  call send, write,
  call read, recv, select,
  to close the connection: close(s);
  if we are done writing but not reading:
    shutdown(s, SHUT_WR);   // generates the FIN.
                            // it would be nice if web clients used this.
  if we are done reading but not writing:
    shutdown(s, SHUT_RD);

to create a server (passive open):
  bind(s, (the port we want to claim, "80"), addr_len);
  listen(s, 5);   // parameter is the backlog of not-yet-accepted connections.
  can also call select on s (in rdfds) to tell if there's a connection waiting.
  connection_socket = accept(s, NULL, NULL);
    // will block until there's a new connection to be accepted.
    // (pass a struct sockaddr * and a socklen_t * instead of NULLs to learn who connected.)
  after the accept() there are two sockets: one that is the new conversation,
  and one that is still listening.
    read(connection_socket, ....)
    write(connection_socket, ....)
  eventually close(s);
  never read(s, ...); write(s, ...);
  shutdown or close(connection_socket);

Design of Apache:
  create the accepting socket.
  fork 8 times.
  each forked child process calls accept.
  the kernel gives each new connection's socket to one of the children.

***********
Addressing:

Why do packets, datagrams, etc. need addresses?
  A) For routing. If I send a packet with the address of a specific destination, I want
     the network to carry it there (and nowhere else).  <-- IP addresses are like this.
     (wikipedia NickServ for a nice little story.)
  B) For the recipient to decide if it's interested in the packet.
     <-- ISA bus, or "classroom" Ethernet. All the interfaces receive the packet, but only
     those addressed wake up and pass it along.

To send an IP datagram across an Ethernet, what destination Ethernet (MAC) address does
the sender put in the frame?
  a) if the destination is on the same subnet (i.e., is nearby, on your Ethernet segment),
     want: the destination's Ethernet address.
  b) if the IP destination is on a different subnet (i.e., is far away),
     want: the Ethernet address of the gateway.
  That is, IP will tell us what the next hop across the Ethernet is.
  *) For example, destination IP address is 128.8.128.8, ....

  [ Ethernet header:
      src mac address   <- the "hardware" address used by the data-link layer
                           underneath (in this case, underneath IP)
      dst mac address
      protocol field (ip)
    [ IP header:
        ....
        src IP address
        dst IP address
      [ data ]
    ]
  ]

Goal: find the Ethernet address corresponding to an IP address.
Scheme: ARP (address resolution protocol). Two messages: "who-has", "is-at".
How/who do we ask "who has (what is) the Ethernet address corresponding to IP address
128.8.128.8?" ... ask everyone.
  Request:
  [ Ethernet frame:
      src mac: my address.
      dst mac: broadcast. ff:ff:ff:ff:ff:ff
      protocol: ARP (a different EtherType number than IP: 0x0806 for ARP, 0x0800 for IP)
    [ ARP message:
        who-has ip: 128.8.128.8
        ether:
    ]
  ]

  Response:
  [ Ethernet frame:
      src mac: that guy's address (hooray?)
      dst mac: my address.
      protocol: ARP
    [ ARP message:
        is-at ip: 128.8.128.8
        ether: 00:17:c1:00:84:23
    ]
  ]

From these responses, we build an "ARP table" that maps IP addresses to Ethernet addresses.
Run 'arp -na'.

Alternative to setting up IP routing properly: Proxy ARP. Two ways:
  1) have a machine send ARP responses, with its own Ethernet address, for an additional
     IP address that is actually someplace else (modem attached, in vmware).
     machines don't complain when more than one IP maps to the same Ethernet address.
  2) have a router answer ARP for every IP address in the universe (or at least every
     address not on your segment).