first day green quiz. questions I expect you to know the answers to soon. if you know the answers to these, you will likely have an easy time this semester. if you know none of the answers, you may have a bit of struggle. even though it looks like I want you to turn it in, I don't. it's your own private information how completely unprepared you might be for this class. heh. syllabus review and discussion quick overview of networking research. 7 - layer OSI model physical - encode/modulate bits on a wire. physics. ee. software-defined radios. extract information from the superposition of signals. data link - frame those bits, addressing, typically only on a wire. play with the reliability, FEC, acks, optimize for streaming. not too clear what research focuses here. network - routing packets from one end to another, potentially across many wires. routing research, often about incentives (why do stuff), often about security (breaking the internet's routing protocol would be disaster! ) all of the routing protocols from an undergrad class (RIP, OSPF) are pretty much uninteresting for research, it's all BGP. discovering topologies research in preventing / detecting DoS attacks. transport - provide reliable delivery of larger, say files broken up into packets. congestion control, lots of ways of signaling congestion, giving incentive to those sending too fast to send more slowly. at very high bandwidth or very high latency. session - (bogus) synchronization. presentation - (bogus) application - everything on top. writing protocols and message formats. Overlay networks - "fake" networks that run as applications. Content distribution - for example, find the closest machine to satisfy a request. OSNs - BitTorrent - Themes: layering. soft state. antonym: hard state. - if you don't need to store data on disk, you can keep it in memory and just have it refreshed. - routing tables. - arp caches. - dns caches. - multicast state. end to end arguments assert: if you can implement functionality at the endpoints only, implement it there. (not in the middle.) this is where you get the idea that the internet is "dumb" some things can *also* be implemented in the network, but only as an optimization. - reliability. best-effort delivery: service model to IP that allows the network to do four bad things to your packets. - drop them. - corrupt them. bit errors happen. - delay them. (reorder them) packets won't necessarily take the same path. - duplicate. intermediate stations might try to retransmit. first homework assignment end. -- Circuit switching and Packet switching (and virtual circuits) - 1 page in P&D - 4 pages. in Kurose - ?? in Keshav. Circuit switching is the idea that you can setup an end to end connection with reserved bandwidth, typically so that each router along the path has allocated a "cell" or frame (or timeslot) for the transfer. Router forwarding is deterministic, there's a schedule, if you don't use your slot its wasted. Associated with the phone network. T-1 T-3 OC-1.. the little pieces (starting from your 64kbit/s voice line) get aggregated into larger and larger pieces. synchronous operation (the next hop router knows when one of these samples ends based on *time*. becomes expensive to keep everybody in sync. TDM - time division multiplexing Packet switching is the idea that you send your data, self-describing (where it's going), whenever you want, and pray that there's room. Analogy (in Kurose in Ross) Reservation at a restaurant: when you show up, you get your table. (downside that you have to use the phone and talk to people... or in some other way set this up.) If you have a reservation, you might have to wait. And you could figure that you're not going to wait an hour for a table. Virtual-circuit switching (mix) provide the interface of a circuit (a connection to the other side potentially with guaranteed bandwidth) without the expensive gear. includes the setup. maybe put state in the router (a virtual circuit identifier VCI) so that he doesn't have to do any hard work when forwarding. if you have to store some state in the router anyway (for the guarantee that you're trying to provide) the VCI is no big deal. separate signaling (which is the connection setup) from data transfer (the rest) (because you don't want to require using circuits to set up circuits. what a mess that would be!) Clark - 7 goals what is the primary goal? ("fundamental") multiplexed utilization of existing interconnected networks. make it work at all is most important. some networks glued together by translating between pairs of technologies; instead, you'd prefer a single common language. arpanet and arpa packet radio network what are the seven subgoals? 1. Internet communication must continue despite loss of networks or gateways. if there's somehow a failure or destruction of devices in the middle of the network, endpoints that were communicating should be able to continue communicating. if you had a circuit traversing a dead gateway, you'll have to make a new one (this might be disruptive; might "end" your communication) move the state, the logic to the ends (the only ones that have to survive to keep the communication going) downside of having provided this (relative to circuit switching) now that the network isn't in charge of how the endpoints work (or share) it is difficult to police behavior from within the network. also that endpoints have to slow down, only send as much into the network as is acknowledged, which could take time. also that the hosts have to be really smart and hold on to state. 2. Multiple types of communication service applications have different requirements. don't make decisions in the network that tailor to specific applications or preclude others. send whatever you want over the network... goal is to keep the network design out of the way of applications. 3. Variety of networks how do we design a protocol that can run on top of many underlying networks? minimal requirements / assumptions. dropped, corrupted, delayed, duplicated. maximum transmission unit decide to permit only very small frames + keeps the delay (from waiting for others down) - headers, checksums, etc. can occupy more of the packet than data. - "tinygram" - where your packets have nearly no data, are nearly all header. - per-packet overheads at routers + ( might have a greater chance of fitting into a buffer - however 1) just because the network supports large frames doesn't mean you have to use them 2) might not win if you have to send more packets, net larger amount of data to go in buffers. 3) there's a fair chance that you'd be dropped anyway if queues make decisions based on packets and not on bytes ) + the CRC (error detection code) can guarantee that it will find all 2-bit errors in messages of some limited size. + if there's a bit error, the damage is contained. - fragmentation. 4. Distributed management 1. this remains active research. "traffic engineering" hard. "traffic matrix" measurement / estimation is hard. 2. SNMP protocol (don't look) 5. Cost effective. 1. reliance on statistical multiplexing to increase utilization. 2. keeping all the fields small, avoid state at routers. 6. Host attachment with a low level of effort 1. human effort - address assignment. (though bootp/dhcp eventually helps) 2. interfaces are cheap. 7. Accountability. (hah) 1. abject failure. 2. since cisco will do what ATT wants, there exists netflow. Questions on the Blog 1 recognition that despite goal #1 and the military context, no discussion of security. - almost certainly will come back to this on tuesday. - hard to protect the physical layer. - we're still trying to protect BGP. (currently relies on pairwise trust and people to not make mistakes) (but then again, BGP is a more 1990's thing) 2 how does the IP service model relate to the IM service model. (why are my IM's delayed?) 3 wtf is EOL. you'd like to be able to say, when writing an application, this is the end of a message, process the bytes that I've sent you. you'd also like to be able to put more than one message in a packet, and the EOL flag that applied to packets probably missed this case. instead, we track how long a message is, tell the receiver how many bytes to collect before processing a message. "Content-length" header. 4 regulate the number of bytes sent per unit time. (bandwidth) also per-packet overheads. why not regulate the number of packets as well or instead? flow control - if the sender can generate data faster than the receiver can process it, hold it at the sender, don't send it to the network if it's just going to overrun the receiver. congestion control - don't overrun the buffers in the network: if you observe that the packets you sent didn't make it to the other side, it's a signal that you sent too fast, the routers can't take it, and you should send more slowly. (can be wrong) (some more explicit protocols incl. Stoica's CSFQ, Katabi's XCP can help you set the rate quickly) (ECN "I (a router) am being nice, I'm not going to drop your packet, I'm just going to flag it., in return, please be nice and slow down." nagle's algorithm. - "tinygrams" above. potential that in a remote login session, every keystroke is a packet. new rule: can only send one < MSS (less than full size) packet into the network at a time (unacknowledged) (unless you're completely done) ---- Class meeting #3 Blog comment review. Make one point. Layering phy data link network transport session presentation application 100baseT Ethernet IP TCP HTTP Firefox/apache Two views: By machine in the network Client Linksys DSL modem Server --- ---------------- -------------- --- Firefox/HTTP Apache/HTTP TCP TCP IP IP \cdots IP 802.11 || 802.11 Ethernet || Ethernet Bah! Ethernet 5GHz radio || 5Ghz radio 10 Mbit || 10 Mbit Gigabit. \___________/ \__________/ \______ \ldots ________/ By packet or bits. Imagine that we have a tiny web page. (100 bytes) Ethernet "MAC" addresses are 48 bits, basically flat, e.g., 00:0d:93:30:a1:90 IP addresses are 32 bits, hierarchical e.g., 10.105.66.70 signal to wake up . error detection code. . Eth also has src/dest. destination represent the next IP-speaking gateway/destination. . . IP header includes the addresses of: source destination addresses are of the endpoints . . . TCP header has "ports" these ports identify the connection (the application) . . . Header of the HTTP message . . . . Body of an HTTP message preamble| ethernet |ip |tcp| ctype |. ... | ethernet CRC Third scheme, suggesting the generality of layers: big map of protocols on top of other protocols. don't expect that the interfaces are tied to a single user (IP used by more than TCP, Ethernet used by more than IP) End-to-end arguments Point: If you have to implement functionality at the endpoints\footnote (for that functionality to be complete), don't implement it elsewhere (except as an optimization). The "endpoints" can be difficult to identify. Not necessarily end-hosts (could be the person? as unsatisfying as that is.) Multiple services -> if you tailor the network to the requirements of a specific application, might cause harm to others. (e.g., retransmission vs. voice applications... error checking vs. video) Five cases: 1. File transfer. What are the ends? the disks. lots of places where bits can get broken. make sure that none of the bits were broken after the file makes it onto the new disk. 2. Secure transmission. can't do it completely within the network a) have to trust the network with keys? bad idea. b) data getting to the network is in the clear. hole. c) still have to make sure it came from the right guy (authentication). redundant. can provide some things within the network. in an attempt to keep faulty apps from leaking data. Aside from the blog: it is difficult to write secure protocols. software libraries help, but their interfaces aren't quite to the point of being idiot-proof. VPNs; there are devices that bundle your stuff into VPNs (so that you don't have to run the client on the machine). 3. Duplicate suppression. a) could have the network eliminate duplicates, but a buggy (recovering) application might generate them anyway... design your applications to suppress the duplicates (to not deduct from my bank account each time a duplicate message is received.) 4. FIFO. (ordering in general) why applications might have to be involved if you had fifo virtual circuits (i.e., tcp) imagine having three machines, A, B, C A sends a request to C A sends a request to B B receives A's message B sends a request to C (as a result of its message from A) C receives B's message C receives A's message ordering is not "causal", which means it's unlikely to match expectation. (might be slightly off) If you wanted to make sure to process messages in the order they were sent, the application would have to be involved, knowing that a message from A is missing at C before processing B's message. (lots of protocols for doing this in distributed systems.) 5. Transactions. Don't need a low-level ack if a high-level ack is coming. DNS uses UDP. DNS provides the mapping between www.cs.umd.edu and 128.8.127.3 request goes out to one of two or three "recursive" name servers hairshirt:~% cat /etc/resolv.conf domain umd.edu nameserver 128.8.74.2 nameserver 128.8.76.2 a) using UDP, i.e., no reliability to the transmission. b) what happens if I dont get a response? "retransmission" goes to the other guy! In the writing of this paper. Evidence that the examples came from discussion. No chronological statements. No explicit discussion of disagreement, just a few alternate viewpoints. Two-Phase Commit. Transaction processing systems are supposed to provide the ACID properties. Atomicity - happens or not: "commit" or "abort" Consistency - business logic not broken Isolation - lock everything, don't see uncommitted changes. Durability - once done, it's done. allow the transaction processing system to say "no" Problem: have databases on more than one machine. On just one machine, to commit a transaction, we would write all the new values to the log, then write "this transaction committed" to the log. If we have two machines running different parts of the database, how do we know the transaction committed? (everything got to disk, on every machine, and every machine believes that the transaction committed.) "Transaction processing monitor" machine is kind of in charge of running the protocol (could be the same) Phase 1: "Prepare": write your stuff to disk (not the commit record) and get back to me. after we get all of the responses, move to phase 2. Phase 2: "Commit": everybody write your commit record, and get back to me. the transaction is committed as soon as: anyone writes "committed" to the log. not "all of the replies are received"... be careful. if the machine crashes before writing commit, you'll ask the other databases involved in the transaction for the result. ---- Class meeting #4 Blog feedback. (because I can't provide individual comments easily) 1) thank you for using paragraphs: paragraphs.join("


") 2) active voice always. not active voice: 1) mistakes were made. (government one) 2) it was decided that... -- I understand some of you were taught that scientific writing requires passive voice. -- If I don't notice, it's fine. 3) please adhere to a 30-word sentence limit. 4) otoh, one line is unlikely to be enough. I appreciate the effort for brevity. Just finish the argument. 5) don't summarize the paper. (I let some of you get away with this... this time.) (10% summarization is fine, to provide context.) 6) pay not terribly much attention to neil's "example" comment, I dropped into instructor mode and described lots of background, instead of focusing on the odd policies that I wanted to share as "insightful" Tussle Stuff Who are the players? (acknowledge the diversity) ISPs -- want $$ residential types -- sell you a very large "bandwidth" for a reasonably large cost -- and get mad if you actually use it. backbone types -- have access to lots of cross country wires, offer the connection to the rest of the network. ability to connect to other ISPs. -- under some debate whether they make money doing this. Users -- want choice as a means of getting the other bits: get more stuff without restriction anonymity / privacy / security, protection against e.g., nigerian spammers fast, reliability, cheap. Businesses -- (sort of like users and sort of like ISPs) profit. likely prize reliability above other features. fairness -- their stuff isn't discriminated against by ISPs. Skype (ISPs might try to hurt other voice protocols to promote their own service. Vonage. Google ("singled" out for injecting too many bits). Governments -- (not all governments are alike) want the ability to monitor. -- the communications of evildoers. (and probably everyone else in the process) -- find the illegal content want to ability to restrict information flowing in. (and maybe out) -- great firewall of china (avoid the subversive) -- its affect on google (and I think on skype as well) -- pornographic. -- keep people from getting the RSA "munition". (DES until 1998) document retention (ianal) under pressure from users and lobbyists, regulate. -- anti-trust (monopoly prevention) -- spectrum allocation -- consumer protection (?) Rights holders -- profit through 1) licensing 2) sueing people who violate the license. (e.g., grandma) the companies will hire other folks to poke around and find the IP addresses of hosts in the network that might possibly be making available "protected" content. ... followed by the cease-and-desist ... etc. if you're participating in bittorrent (or the like) it's as hard for them to find you as it is for your client to find peers. (easy) they also don't discriminate between participants in a swarm and people actively uploading. (so be careful; being in the right doesn't save you from having to deal with lawyers.) Content providers -- what do they want? profit... give away the content, sell the ads. tussle example: Pandora (internet radio, becomes way un-profitable if licenses are high, rights holders believe Pandora cannibalizes sales) iTunes DRM-free music, pricing argument. Youtube -- rights holders issue takedown notices for clips and such. movie trailers. apple might want to keep control over the trailers, keep them off "competing" sites. They mute the music? (if a cd is ripped into the video) Reputation DNSBL (blacklists of spam-generating mail servers, open relays) Verisign, Thawte (provide certificates to support trust) Summarize tussle (why it's in this class) The points of reading this in the class: * Think about how any technical solution has implications for these people. * Pick a side. * Figure who's against you: gov't, isps. maybe other users. * Sometimes you punt to the lawyers. Not a technical problem, you give up, and expect that people misbehaving in some way will be caught and prosecuted or sued. * To express that Internet design is still "open". there's (NSF) effort to design the future internet, considering all the mistakes, it evolves. There's still stuff to do. End-to-end addressing. IP addresses were intended to be "global" Over time, the IP addresses evolved. First to become "classless" 1) class A address prefixes (1-127.*.*.*) - first half of the address space represented networks of 262,000 machines. 2) eventually redesigned so that the prefix length was made explicit, allowing the allocation to be tailored (within a power of two) to the right number of addresses, eliminating internal fragmentation, extending the lifespan of 32 bit addresses. Because they're precious, when you get your service, you get only one. In order to have more than one machine, you have to find a way to "fake" being just one, leads to NATs (network address translators) that pretend to be fronted by a single IP. IP addresses no longer represent individual hosts. Architecture busted. NATs can cause unexpected behavior... can block packets they shouldn't, preventing future innovation. Blog comments on tussle. What does it mean for users to choose their routes. in the context of source routing. - might be super detailed (strict source routing) enumerating all routers on the path. - might be less detailed (loose source routing) enumerating some of the routers - might be somehow abstract (list of ISPs to traverse) Source routing in this paper is the mechanism to provide choice to users (at least about providers deep in the network) "Users" doesn't mean actual users (necessarily) but can be just the extreme example, where reality is something smart acting on the user's behalf. TCP background for understanding next week's papers. Class meeting #5 two papers to discuss: Jacobson (congestion control) and Floyd (RED) background material in research paper form. - if you're unfamiliar with tcp congestion control, red, ecn (thursday) you'll understand after reading. - if you were, there might be more in here that you're not familiar with. what design decisions were made, what the alternatives were, how it was evaluated. blog posts: - there is a grading mark "redundant" -- if your post could have been written by summarizing other blog comments. write it first, or, disagree with someone. - packets, not packages. - affect vs effect. congestion control paper. whether the ends can take of "sharing" the network. before: oblivious higher (transport) layer would just vomit a window of packets into the network, not really learning anything about the network carrying these packets. this led to "congestion collapse" congestion collapse: get nothing done, because you're sending gobs of retransmissions. sliding window transport. up to w (window size) packets/bytes can be sent into the network at a time. Figure 3. First, the network can't accept a big w (window size) all at once. But if the w is too small -- miss out on performance. Goal: find a good w quickly. slow start. on each ack of new data, increase cwnd by one (segment/packet). when in slow start, cwnd < ssthresh; ssthresh initialized to \infty then set to cwnd/2 on loss. why does increasing by one on each ack result in exponential growth of cwnd? each ack leads to two packets, leads to two acks. create the *ack clock.* figure 1: if we're in steady state, the acknowledgements will come back to the sender spaced out at the rate the network could support. figure 1 also leads to packet-pair style bandwidth measurement (sketchy) once we have the ack clock for a cwnd that's somewhat close to right, want to keep it. congestion avoidance phase: on each ack of new data, increase cwnd by one/cwnd (segment/packet). when in congestion avoidance, cwnd > ssthresh; to keep the clock (beyond the paper) 1. retransmit on 3rd duplicate ack. (fast retransmission) 2. send packets off the top (right edge) for additional duplicates (fast recovery) if we believe in conservation of packets, we know something left, we know we can inject. if you lose enough packets, you'll lose the clock, fall back to timeout, and fast recovery won't have helped. fast recovery is an optimization, supporting timeout is always required. FAIRNESS. is the scheme fair? what's your definition of fair? everybody gets the same amount? could be evil / greedy / consume the queue, don't back off... but even if everyone is playing the game by the rules... short rtt's get more throughput. cwnd climbs more quickly. window doesn't have to be as large to get the throughput. is that fair? these guys use fewer resources from the network. fewer links, less cross country stuff.... it encourages you to use a mirror. (but there are gobs of papers about enforcing max/min fairness) is this scheme fair to interactive applications? the scheme of ss/ca is to saturate the network until it cries uncle, then back off. leaves queue size as a kind of important parameter: too small -> lose throughput (lose utilization). too big -> interactive stuff might wait for seconds. low bandwidth dsl/modem links. interactive stuff gets shafted. (high delay) paraphrasing: "we treat all loss as congestive loss, yes maybe some packets can be corrupted, but that pretty much never happens." -> when finally run on wireless networks (where bit errors are more likely), lots of people chased how to separate the two. -> in practice 802.11 just retransmits corrupted packets, perhaps to support TCP's assumption. Class meeting #6 TCP bits. crazy behaviors 1-5. Routing Process of setting up the paths for packets to traverse. Distinct from *forwarding* which is the process of taking a packet and putting it on the next link. Links fail. Routers fail. Four options for routing. By hand. (easy if small scale, few choices.) Centralized thing. (everyone reports, and a central box decides) Link state approaches. (every router floods a fragment of the topology (their neighborhood) to everyone, and everyone runs the same algorithm (dijkstra's) to figure out their paths. Flooding: sort of like a broadcast, each node propagates new information. Canonical: OSPF, IS-IS; used in modern big ISPs. Distance vector. aka Bellman-Ford, aka Ford-Fulkerson. each router sends to its neighbors the "cost" to reach every destination. decision process for each destination is: accept if route is of lower cost accept if from the designated next hop. (even if higher cost). maintain a routing table: I am E destination | next hop | cost A | A | 1 C | A | 3 E | that's me | 0 when we get a routing update from B B at cost 0 C at cost 1 "cost" or "metric" is intended to represent how long the path is, our goal is to choose the shortest path. For this example, the cost of each link is always 1. if you wanted to do something else, the cost of the link has to be known to the router. 1. modify our routing table destination | next hop | cost (our cost to get there) A | A | 1 B | B | 1 C | B | 2->4 2. send a triggered update. hey, A, I can now reach B at cost 1, C at cost 2. If B advertises a route to A at cost 3, do we take it? no. If B advertises a route to C at cost 3, do we take it? yes. Can hold onto backup routes, Definitely BGP (a variant of this) keeps two backup routes. Path-vector approach. Decision criteria might not be shortest path. -> have to worry about loops. -> in order to avoid loops, each advertisement includes the path. B, the path is B C, the path is B C D, the path is B E A D if we are E, we know not to accepd that route to D. In BGP, we operate over 16-bit identified AS numbers (which represent ISPs) (so it's not as bad as listing every router.) Business rules govern the paths that are chosen in BGP: RED what's congestion? why is it bad? too many packets, limiting performance. avoid defining congestion as the loss of packets. include delays. Context: 1. Sally's dissertation work was on synchronization effects in networks, how flows and routers could self-synchronize and by self-synchronizing break assumptions. -> adding random delay solves lots of problems. 2. RED is one of many active queue management schemes. (pretty simple for them). alternatives like fair queueing. Takes advantage of having more buffer space than you really want to use. Key bit in RED is to drop based on AVERAGE queue occupancy. Class meeting #7 Sketch of project ideas at http://scriptroute.cs.umd.edu/711s09/turnin/project_ideas 200 ok. might edit that a bit over the next week, but that's been a losing battle for me in the past. RED parameters min_th, max_th, max_p "gentle" red ECN and the bits why mark instead of drop? - don't have to retransmit - doesn't waste the effort of prior routers. (especially if close to the destination) - inherent badness of sending "more" when the network is oversubscribed. - stall that's required - receiver can't deliver / ack data, - sender is stuck (unable to send new data off the top) - crummy bandwidth - crummy load time for your web page. - possible that you might not retransmit (audio) but still learn to slow down without loss of quality. what might not work with marking? - a bit of extra work in the router (to set the bit, what else has to happen?) - make sure the packet supports it. - checksum - might have cheaters (which we'll talk about). Bits IP header CE, ECT - Congestion Experienced (the mark); ECN capable transport (tells the routers it's okay to mark instead of drop.) TCP header ECE, CWR - ECN Echo, (recipient of the mark sets this bit in every ack until it gets CWR) Congestion Window Reduced. (indication (ack) that the sender got the ECE and slowed down) Mess: using reserved "must be zero" bits - for people implementing tcp. people implementing firewalls enforced that they must be zero. attempting to use, or negotiate, ECN could yield no connection. - there's at least one "survey" of places you can't reach. source quench bad except: you could, if close to the sender, tell the sender more quickly without waiting the whole rtt can't assume symmetry in the path. the path from source to destination is not typically the path from destination to source. (it might be that bottlenecks are more likely to be on both paths) (open question to me) if you had route symmetry, you could maybe do the equivalent of setting ECE on acks instead of CE on packets. ecn vs sack sack reduces reliance on RTO, because it tells the sender which packets to retransmit early, don't burn acks trying to figure it out. ecn reduces reliance RTO because you don't have to retransmit. hooray. sack uses more bits. ha! BGP relationships What do they mean - it's attractive to believe that the topology of the internet (who connects to whom) is the important piece, that we can use the interdomain toplogy in a simulation - except there are lots of forbidden paths. - shortest path doesn't matter. - about once a year, big players fight over who's big and who is not. - assumption that the high-degree nodes were at the top - who is at the top can be up for debate even among those at the top. - if you are tier-1, you don't have to pay other ISPs to carry your traffic. - iirc, cogent gets de-peered occasionally. - understanding what goes into a no-cost peering matters for stability of the network. - BGP is a ridiculously complicated (warty) protocol. - 12 (13) step process for choosing routes. - what you want (weight) - what you want (localpref) - how long the path is - med. (where the recipient wants you to drop it). - if what it's doing is actually very simple, maybe we wouldn't need all this? - maybe something less error-prone - maybe something that would necessarily converge - can construct "bad widget" where can arrange four isps (a destination and three others) where each of the three wants to route through its clockwise neighbor but can't route through two clockwise neighbors. Graph of choices: How to make this behavior happen 1. prefer our customers in "localpref" - if we see a route advertised by customers, then peers, we'll take it. 2. we do not advertise routes we don't want our neighbors to use - see #1, they'll use it¸. Ability to modify paths taken to reach us: AS prepending: arbitrarily inflate the length of the path by adding ourselves more than once. example in routeviews of 14135 MEDs: multi-exit discriminator: ask a single upstream ISP to deposit traffic in one place or another. - an attempt to include a "metric"-like thing in BGP - recipient of the MED likely gets paid to respect it could imagine the alternative of only advertise a prefix in one place. tell the upstream about a prefix only at one place, so that traffic for it enters only there. typically FORBIDDEN though because it's cheating. if you do it: advantage (relative to MEDs) is that it needs little cooperation in the protocol disadvatage is that it might break reachability when it doesn't have to break. Selective advertisements, de-aggregation. Patricia trie... reasonably efficient data structure for longest-prefix matching. when you have a set of routes, choose the most-specific match. if you knew one path for 128.8/16 and another for 128.8.126/24, that really means you should pick the one for 128.8.126/24. To adjust the balance of traffic, how might you use this? Let's say we're still umd at 128.8/16 and we want slightly more traffic to come in through AS 10886. Add a route to: 128.8.126.0/25 27 and send it only to 10886. want to know that there's some traffic for that prefix (and the scheme will work) - and not not too much traffic (that's not the campus dorm prefix.) only expecting traffic for that carved out prefix to choose this path exclusively. if 10886 fails... the peers of 10886 will withdraw the route.... I don't think it changes much. 128.8/16 is still advertised to 10886. Disadvantage is: now every bgp speaking router has to track this extra prefix... potentially useless prefix. that means cost borne by everyone else for a little feature we want... kind of unfair. -> in practice, ISPs will discard too-specific routes. (longer prefix than 27-ish) AS7007 incident. (details may be wrong) A customer of sprint let routes learned through BGP into it's IGP (internal routing) and then right back out. - external routes learned through BGP are supposed to be segregated from the internal routes. - and the internal routes aren't supposed to be automatically exported. Provided all of the prefixes that 7007 knew about back to sprint. If you follow prefer-customer rule... traffic is black-holed. -> (a) providers are much more careful about the prefixes they accept from their customers. -> (b) everyone else is interested in making sure other people don't originate their prefixes. Traceroute / AS traces IP address assignments Traceroute operation What does traceroute do? tells you the "router" path that packets take. How? sends repeated packets with increasing time-to-live in response to a packet with ttl=0, route sends ICMP error message "time exceeded" to the source. (might also see other ICMP errors) the source address of the ICMP message is an address of the router. How many packets does it take? as many as there are routers to the destination times the number of probes per hop (default 3) or until the user gives up (the host doesn't respond, or a network filters) or the program gives up (max ttl possible is 255) Why? debugging. can't get there... why not? comcast is slow. bgp routes suck... via chicago. missing routes somewhere. can also research hosting services. To filter: a) many firewalls just block icmp, so you miss the responses. b) could recognize the udp ports used 33000 range. c) could also recognize short-remaining ttl packets and discard. (short ttl packets might be confusing to intrusion detection systems anyway.) (bwahaha) Can expand the TTL-limited probing in a few ways. send packets of varying size in order to estimate the capacity at each hop. - catch is that you might need many many probes to get one that experiences only the serialization delay (and no queueing) - min filter NIRA. addressing scheme a) tells the sender who his and the recipients providers are b) by choosing addresses for the packet, specifies the route ~20 addresses for a node. - opportunity for attack? (I argue its not new, but maybe.) (and you could counter.) Class meeting #8 (?) Announcements, NIRA continued, RON, slicing. Project stuff: on blog, (a) updated "ideas" handout... hth. feb 22 in the date field. (b) paper (that you've read and think is good enough to build upon or well motivated but flawed) and paragraph due march 5. for pair turnins (for now) each turnin the same thing. (reference each other) pairs optional for this exercise. NIRA does this solidify the set of providers that belong to the "core"? can a small ISP ever grow to become part of the core? how much work does a provider's change cause for all of its customers? no reason a medium-sized ISP couldn't get a number as a core ISP, (aside from that address being not-too-useful unless that ISP is actually in the core.) analogy to the telephone was flawed. - internet works differently... don't need more wire. - billing for telephone way different. I tried to find a reasonable history of the "divestiture"; the breakup of Bell by regulators into the baby bells. RBOCs. - claimed economies of scale -> breakup would lead to increased cost. - claimed benefit toward national security - claimed that telephone service was a "natural monopoly" (because of the circuit switch) the history of connecting your own devices to the network predates (by a bit) the breakup. 1956 allowing people to connect things. Carterphone (connected to radio) in 1968. the breakup stuff was somewhat precipitated by MCI (microwave communications inc.) who wanted to connect chicago and st.louis but couldn't break the monopoly. - conventional wisdom: the providers of long-haul internet service don't make money doing it. (they make money in the other parts of their business to subsidize this internet thing.) in addition to being able to choose a route, you could choose many routes at once, to get path diversity; could lead to the information slicing paper, might not need as many relays in the middle, and could more easily tell when all your slices belong to at&t. NIRA's source route format for traversing peer links. precedent in at least TCP options where different TCP segments will potentially bear different options. (SACK will depend on receiving out-of-order stuff, timestamps usually on or off). NIRA's different header size is less dynamic because it is fixed for a connection. In IP, the second nibble (4 bits, half a byte) represents the header length in 32-bit words. Typically, the first byte of an IP datagram is 0x45. version 4. header 20 bytes long. however, this could go up to 0x4f, which means to look for the TCP header after 60 bytes. Role of users in NIRA. If any paper author ever says that users are put in charge of something, that's not really what they mean. * users can make arbitrary decisions. we do that in the supermarket. * users can make the same decisions repeatedly. (commuting.) * users can delegate their decisions to "smart" people. e.g., mutual funds., congress. If you read "user", unless you see a GUI, think smart agent thing. * restriction that eventually people have to be able to express some policy under some interface. my analogies hold up (sort of)... we can pick the completely volatile mutual funds through the interface of morningstar, we can pick the allegedly ideology-matching representatives with party information. Who / how charges for not-default paths? * imagine at the outset that default path was chosen because it is cheapest. * and that allowing someone else to choose the not-default path leads to cost, which should be passed along. * how? a main challenge is that per-packet accounting is widely considered impossible. - I would expect that the ability to use these alternate routes would itself be sold. maybe nobody wants it. - being assigned an address through some of the (better) providers is itself a privledge. Applications - * if you were netflix, might benefit by sending through different networks. (imagine the "user" as a company.) * if you were Cohen, writing your little bittorrent client, and you knew the addresses of potential peers, you could use nira addresses to pick peers close to you in network-land... potentially saving your provider money. (important?) Expect that every provider of consequence has more than one provider (or tier-1 peer). - reasonably easy to tell whether this is true over routeviews data. - potentially biased because you only need an AS number if you have more than one provider. NIRA embeds policies into addressing. can't choose arbitrary paths (making customers provide transit, for example.) RON: Tries to solve the problem: reliability BGP takes way too long to recover from faults. - routeviews showed "all" backup paths. - real ones store maybe three. - when a route is "withdrawn", router will choose its backup, and then advertise the backup route as the one being chosen, (so that its neighbors don't use a looping route). - except there's no evidence that the backup will work. - route flap damping (hystesis to slow the updates to a manageable rate) - leads to a long time to converge. the IGP routing protocols would be perfect at solving this reliability problem, too bad they can't do inter-domain. What doesn't work: "scalability" 50 nodes... constantly ping each other in the mesh to estimate loss rate, latency. n^2 stuff, and it has to be frequent enough to failover quickly. - my view: you wouldn't ever want a single global RON. you wouldn't be able to construct a single global RON, (in part) because you are not friends with everyone. - my view (2): you could get almost all the benefit from having a second RON. additional RONs are just different collections of nodes running the same software with a different peer list, they don't have to know about each other. (you could imagine that just about every overlay network piece of software has RON-like attributes. ) my view: RON is an optimization on the network. you could still go direct to the other endpoint of communication. (by having more than one RON, you're not partitioning the network... scalability also in volume of traffic to handle. (if you view it as an optimization, overloading a relay has an obvious solution) what about catastrophic failure? - suddenly everyone needs to go through the RON paths because something really bad happened to the network. - not immediately obvious how to provision for failure of a certain magnitude. "Baltimore tunnel fire" physical disruption to common rights of way damages connectivity in many places at once. Class meeting #9 Midterm stuff framing: sentinel scheme (beginning of frame, end of frame marked) 4b/5b scheme -> 4bit sequence, map that to a 5-bit sequence s.t. there are no more than three consecutive zeroes. to promote clock recovery and avoid baseline wander. clock recovery being that the receiver has to know when there's a bit, infers when there's a bit by watching transitions between bits. won't use 00001 won't use 00011 turns out there are more than 16 legal 5-bit sequences can use those as frame markers clock-based: SONET (ATM) length-based (here's the length, then the frame) bit stuffing - should be in the text in framing, look for HDLC to avoid the end-of-frame marker appearing in the bitstream, have to escape the marker somehow. expected marker is 01111110 -> 011111*0*10 also stuffs: 011111010 -> 011111*0*010 if you have bytes (say, PPP, so you have bytes) it might use the same frame marker, but escape it with a different character. RON continued / cleanup testbeds: emulab: run fake "emulated" networks on controlled machines without exposing the internet to your applications. either you want really controlled environments (two machines separated by a 100ms link), reproduceability, or dangerous code. ("deter" testbed that's intended for viruses and worms and stuff.) planet-lab: experiments can run as lightweight virtual machines on linux boxes at various universities and research labs. so you can run lots of pings, build an overlay network, the double round trip time measurement of using the response to the response to get the other side an estimate. node A would send a probe to B. B responds -> A a sample of RTT (and I guess loss) A responds to B -> B has a sample too. 25% fewer packets! SOSR RON's probing doesn't scale, but wait, RON's probing is useless. Anything you can route around, you can route around without much knowledge. the probing won't help you (that much). Is the alternate path any good? low latency? RON would end up choosing the shortest (lowest latency, highest performance based on some model)... SOSR asserts that the performance doesn't matter getting there is infinitely better than not... so why bother shaving milliseconds? 20% of failures could be fixed. the server breaks (nothing you can do) 56% of network failures could be fixed. the broadband link breaks (nothing you can do) Do you believe them at all? given the unclassifiable figure 2? The offline hosts... broadband failures overestimated? Send 10 acks, if they're not rate limited, maybe it doesn't... If you want to do any better at improving the reliability of HTTP sessions to these "popular" servers: 1) fix the server software so that it doesn't have to go off line. 2) fix broadband access links (last hop for clients) comcast is bad rant. latency goes bad... Also perhaps bad peering, those residential providers are greedy... Also: Gummadi's "Monarch" paper was an attempt to measure the performance of residential links without any cooperation from the owners of those links... selection of "servers" in the experiment was potentiall biased toward US servers, making the experiment biased toward US networks and US servers... with this technique, how do you evaluate it well? there are (very likely) regional servers providing content pretty much just to one country. the ability to reach one of these servers from outside its region might not matter much to anyone. because it doesn't matter, maybe none of the ISPs are really working on checking that path and making it work. the study makes an analysis over "all" the paths that they measured, with no understanding / guess / knowledge about which paths might be used and which ones not. maybe all the failures people experience are with less-popular servers? or when traversing less-likely-to-be-traversed paths? moral of the story: measure the problem first. eliminate the strawman. did you understand section 4? implementation. nat, the iptables, Would people use this? Have their HTTP requests proxied by "randomly chosen" intermediaries? only tries to do it on failure, maybe not so much to worry about. Would you run a relay? it would be abused. legal "stuff". might have to rely on someone to provide the "service" of a relay. TOR is the onion routing overlay for anonymity, but could easily tcpdump all the traffic out of one of these things, to see all the traffic people wanted to keep anonymous. (and someone has). (use an encrypted connection, if you care about the privacy.) Slicing Class meeting #10 Daytona BitTorrent Slicing paper (though stale). Misbehaving TCP receiver paper... Three attacks: - duplicate ack - optimistic ack - ack division Why should a receiver "attack"? gets more bandwidth... more quickly long term effects: not described - most flows are thought to be short. I'm not convinced you couldn't use the techniques to recover quickly from losses in the middle of a long transfer. - division could probably be used in CA phase. - duplicate ack scheme fixes the rate at sender's buffer / (1+epsilon rtt) (maybe large value of epsilon) - optimistic ack - I think you could use for the duration of the transfer. * but I don't think I've seen lots of experiments on this. these tricks have the danger of backfiring. why are we worried about the receiver (not the sender) doing bad things? - sender controls how fast stuff goes out... - 1 sender has no incentive to send faster than expected/standard/TCP - 2 sender has responsibility, but makes weak assumptions Remember the Van Jacobson paper... The best-case model is that by keeping your connection's cwnd somewhat higher than it is supposed to be, you can get a higher rate relative to others sharing the link. In what scenario would a sender *want* to send faster than by the rules... * would want to be "guaranteed" that sending faster wouldn't result in more rxmissions. - if your link is pegged, there's no (obvious) reason to send to one guy faster if you have to take away from someone else. * streaming a/v at a rate the customer is willing to pay for * streaming video chat in which the other. * potential for peer-to-peer (higher sending rate -> higher receiving rate) - especially when there are too few peers * uploading your oversized homework file or research paper before the submission deadline. * testing curiosity. I think there are billions of scenarios where one might want to *download* faster... * movies, news, mail. * because otherwise you don't have the data/information that you're downloading. - and can't perform whatever task you wanted. Would a content provider want to send faster (than standardized) * to provide their customers with "better" service. * competitive? * by sending faster, expect to bump the loss rate. (function unclear) Aggressive sender not a problem if the receivers can't take the high rate. "Good": * cooperative Bad: * selfish - if it helps them, do it. (may hurt others, don't care.) * malicious - if it hurts others, do it. (probably doesn't help.) If there are bad senders, what do you do about it? (but good receivers, good receiver could maybe use the flow control fields to suggest that the sender slow down...) Relation to the end-to-end argument - can we completely and correctly implement the sharing of network resources at the end stations? * no answer -> have to police inside the network. * yes answer -> believe that there's an equilibrium out there where everyone can still play selfish and still have a good result. - if the model moves from one of cooperative implementations to selfish implementations, does it still apply? Comcast - part of the reason comcast matters here is that the medium is shared with your neighborhood. rate-limit long (download) flows. solves our problem? * question how they define "flow": ip-address to ip-address (ip-address,port to ip-address,port),protocol star to ip-address (if they're smart...) * if we believe that the lower rate is "fair" not even there because of misbehavior.... just oversubscription. why not trust TCP: * slow start is pretty mean / aggressive / bursty.... and making lots of web connections results in a not-very tcp friendly traffic mix. * a reaction to loss is to increase queue sizes, increasing queue sizes (without RED) causes well-established connections to hold bandwidth longer. - dunno queue sizes... maybe we should read Monarch paper. not all ISPs do this... Is this a real problem? Is anyone out there misbehaving / selfish? * why does comcast do this rate-limiting thing if TCP is there to be "fair". * make you upgrade? * keep you from downloading television shows without a $70/mo cable subscription.... * because people are unlikely to notice... * or unlikely to complain because of the not-entirely-legal content they're downloading. * as opposed to selling a smaller pipe to users or buying a smaller pipe upstream BT a client relative to another Interested or not (have pieces I don't have) Choked or not (unchoke a peer -> I'm willing to send pieces in response to requests) I will send to each unchoked peer at the same rate. (as a fraction of my configured upload rate) -- Interested -> <- Unchoke -- -- request piece 7 -> "optimistic" unchoke - means of performing "research" into who can send us stuff. A (optimistically) unchokes B, B asks for stuff, A sends stuff, B realizes he's getting lots from A, B (normally) unchokes A, A asks for stuff, B sends stuff... BitThief takes advantage of the opt unchoke by just waiting for it, then not unchoking in return. A might give a short leash. Tracker doesn't give you too many peers at a time to leech from. Class meeting 3/24 -- a) project stuff b) planetlab stuff c) arrange your groups. project stuff timeline; embedded in the turnin page. some sense of where you are april 3 repeat of the paragraph exercise (minimum bar) set of potential questions to look at. a sketch of the graph that would validate. a graph, with data, validating your hypothesis (maximum bar) a draft of your "introduction" april 17 the draft (for before the presentation) may 1 presentations may 5, 7, hopefully not 12. the proposal (done) may 12 (then there's a take home final due may 14) the composition of the class no one else in the class has seniority or sufficient research experience to have a claim on being smarter than you. goal develop your voice. (avoid neil's mistakes as a youthful graduate studnet) - use your skills - do something fun (while you still can) partners, after class, please stick around; advertise what you're trying to do, solicit help. or offer labor/expertise. feel free to break up. PlanetLab. what's the motivation? why build it? - researchers want to test on the "real" Internet. many testbeds have existed in the past: Internet2 - more a network (doesn't run code) Emulab - more collection of machines that run code (not the real Internet) ABONE - idea of "active networks" where packets contain (references to) code, intermediate nodes would then execute this code on the packet to effect some outcome (multicast, anycast, different routing, etc.) XBONE - almost no distinction (similar goals) The Grid. - allgedly focused on computation. Ad hoc testbeds - researchers would call up their friends and exchange accounts to run experiments on geographically disparate machines. why the planetlab version instead of calling your friends? - allegedly "easier" to develop stuff because of the "PlanetLab API" (meaning of "the planetlab api" remains unclear to me.) - bigger... "PlanetLab currently consists of 985 nodes at 484 sites." - might be hard to otherwise get accounts at that many sites - solves a social problem (can band together to provide ourselves with a testbed) Authors assert a set of principles 1) the slice - everyone gets a virtual piece (share of the resources) of a set of machines. * virtualization enables this form of testbed. * - an alternative would have been to not timeshare the machines, i.e., to schedule experiments to run one at a time (batch-like); - batch-like is like emulab (you can get a group of machines to run on for a set time...) the catch with the slice is that isolation (the effect of running all slices in a virtual machine) can make some tasks difficult: - one thing you might like: have an "experiment" that instruments another. content distribution network experiments (codeen (proxy you configure in the browser) and coral (proxy the server configures, used as a slashdot-effect mitigator (.nyud.net addresses)) -> lots of traffic, to lots of places. -> lots of potential to instrument that traffic (understand bandwidth, loss, geography, browser share, performance, routing) -> isolation (since I can't see their packets) makes that a little difficult. - because we're timeshared, one experiment might wait for the other... ... the timing of events is not easily predicted. (and may not be small) ... difficult to diagnose (because you can't see who you're contending with.) - hard to cooperate with others (building services for others to use is a bit challenging) - potentially less efficient because vm's are more heavyweight than processes. - virtual machines they use are kind of faked "vserver" linux machines everybody runs the same kernel ... - provides the appearance that you have "root" - can write files owned by root, can run setuid things, etc. 2) distributed control you'd like to give hosting sites some ability to enforce policy (we don't like certain experiments). - JANET is the british equivalent of abilene (research network provider). - hard line against sending packets to anyone who didn't ask for them. - many planetlab "experiments" do send packets to IP addresses that don't ask for them. - leads to JANET types making them stop. * flow of policy starts at network operators who observe something they don't think agrees with their policies, goes to the institution hosting planetlab machines, goes to planetlab, then to the experimenter, and then the experimenter "voluntarily" stops running on the objecting machine. 3) "unbundled management" idea that PL central doesn't have to do stuff. maybe the open sourcing can solve our management problems. - partial success in that there are services to help with things that are difficult. - partial success in that people found some tools to help (parallel ssh) - kind of a failure in that it took years to accomplish these things. Anderson Complaints 1. Centralizing trust - should everyone be compelled to trust PLC (Princeton) 2. Centralizing resource control - who decides what resources go to what experiments? PLC. 3. Decentralizing management failed - by not providing their vision of the starting point, the API was incomplete. 4. Treating bandwidth as free - no incentive not to send wastefully, some of the more "popular" projects consume lots of bandwidth at the planetlab sites (proxies) (with some potential to reduce overall network) - somewhat unlike akamai, which can act as a cache for downloading to your organization, but not upload out to others via that cache. 5. Providing only best effort: virtualization and time-sharing means that experiments that would need predictability, guarantees, minimum resources, might fail. 6. Choosing Linux. Some comments encouraged CMS. Anderson argues that PlanetLab provides Linux, without many of the features of linux that make things useful/easy. 7. Missing distributed services. (Tension between alleged API for building big things and reality.... imagine the absense of mapreduce (gag) equivalent in the PlanetLab API. 8. Focus on the old machine room rather than the new mobile device environment. The fight over whether the planetlab design is good might be thought to be moot (second-guessing), if we have planetlab and aren't going to build another one. However, there's expectation that PlanetLab ideas (and some software) will be part of GENI Global Environment for Network Innovations, which is the hardware/testbed component for an initiative to redesign the internet. == Lecture 3/26 | Department of Computer Science | | Leana Golubchik, Associate Professor from University of Southern | California will be giving a talk on Monday, April 6, 2009 - CSIC | Building - Room 1115 at 4:00 p.m. Refreshments will be served before | the talk at 3:30 p.m. - CSIC Lobby. Title and Abstract are below: | | | Title: BitTorrent: how to plan your seeds. | | Abstract: | Peer-to-peer (P2P) systems in general, and BitTorrent (BT) specifically, | have been of significant interest to researchers and Internet users | alike. | | Measurement studies have shown that real world BT systems exhibit high | seed capacity. Thus, appropriate use of seed capacity can have a | significant effect on BT performance. Moreover, seed capacity is also | easily exploitable by free-riders, and such free-riding clients exist | out in the wild today. | | In this talk, we discuss a simple and scalable approach that makes more | intelligent use of seed capacity by hurting free-riders, without their | explicit identification, while improving the performance of contributing | nodes. We also present a simple yet accurate and easily extensible | model of BT, which is able to incorporating approaches to protocol | changes in BT (e.g., such as the one mentioned above). | -- Exam review stuff final has roughly similar format to the midterm. the effort in all of the m.c. questions is to only allow those who know the answer to get it right. Material things. the set of socket calls... (my context is typically tcp sockets) which would you call first: bind() or accept()? what does shutdown() do? permits a half-close what does accept() return? a new socket number. routing... three classes of routing protocol distance vector - periodic update - kind of like the timeout in tcp. reliable exchange of information. - triggered update - kind of like fast retransmission (or maybe more like sack). an optimization to make information propagate quickly. - split horizon. link state path vector - why keep the path? lets you avoid loops. - eventually distance vector will eliminate loops by choosing lowest-cost paths. - BGP can't be relied upon to eventually eliminate loops - because it doesn't choose shortest paths.... it chooses greatest-profit paths. - what are the BGP rules? (from the Gao paper) - no-valley, valley-free, customers and peers don't provide transit. - prefer-customer, make money don't spend money how is the root chosen in the Ethernet bridge spanning tree protocool? - every node claims to be the root, relinquishes if it hears about a *lower-addressed* candidate. - then enable all the ports that lead to the root. - if two bridges would "serve" the same ethernet segment, disable the port to that segment on the bridge with the highest address. ARP: address resolution protocol - converts IP addresses to mac addresses - when the IP address is of a machine .... on the same subnet. (reachable by ethernet.) - how is the request sent? what does it look like? - broadcast. - who has X.Y.Z.W? - how is the response sent - unicast back to the mac address that asked. - X.Y.Z.W is-at 00:00:00:11:22:33 Congestion control. Two phases of TCP's scheme: SS, CA. What's the slow start rule? on an ack of new data, increase cwnd by 1 (segment) What's the congestion avoidance rule? on ack of new data, increase cwnd by 1/cwnd. When? cwnd > ssthresh. first loss tells you where to set sshthresh < \infinity. IP fragmentation: Don't fragment bit: tells routers not to fragment... send ICMP fragmentation-needed error message instead. More fragments bit: more fragments follow this fragment. you (the receiver) 're not done reassembling... IP identifier: 16-bit number in the header. uniquely identify the original packet. all fragments of this packet share the identifier. Offset: this is where this fragment starts. 13 bits long. max size of an ip datagram is 65535-ish. how is this magic possible? can you not fragment packets larger than 8192 bytes? fragments are 8-byte aligned. (take the offset, shift it left 3.) Can you fragment a fragment? yes. (understand how) TCP things: state machine. rfc 793. delayed ack rule: send an ack of every other in-order packet, unless 200ms have elapsed. nagle's algorithm: don't send more than one smaller-than mss packet into the network at a time (as evidenced by acks) karn/partridge algorithm: don't measure RTT when retransmitting. if you've decided to retransmit, maybe you're early. if you're early, you're maybe about to measure a "round trip time" that is absurdly small. which leads you to think that the RTT is very small. which leads you to set the RTO smaller. which leads you to send retransmissions early again. TODO (for neil): figure out exactly what happens if delay spikes to 3s without warning, how does the jacobson/karels RTT estimator learn about it. sequence and ack plots... how to infer information. Bit encodings: Manchester (ethernet) 4b/5b NRZ (obvious, 1=high, 0=low) NRZI (transition on 1. don't transition on zero). CSMA/CD: carrier sense multiple access with collision detection. old school ethernet. carrier sense: listen before speaking to make sure the medium is idle multiple access: not point-to-point, other people might be there. collision detection: you can tell if someone else is talking when you are. must have a minimum frame length... why? (somewhere around 512 bits) the transmitter will still be sending when it receives any colliding bits. effectively it owns the entire "wire" where the wire can include up to four repeaters (that have some delay) and ~2km* of cable. *neil forgets the value. CSMA/CA: collision avoidance technology: 802.11 why? can't tell if you collide. collisions happen at the receiver. (you might be out of range of the other guy sending.) (among other reasons) technique involves somewhat more elaborate backoff/waiting schemes because it's cheaper to wait a bit to avoid a collision than recover from one. adds the acks to ensure that packets were delivered without collision. Exponential backoff: (in the context of ethernet, 802.11) after collision, before retransmit find some way to take turns, so that we don't both send at the same time. in ethernet, we'll each flip a coin (0,1), then wait 0 or 1 x 51.2 microseconds. 50% chance, if there are just two stations, that this will solve our problem. if that doesn't work. roll 1d4. four sided die. (and subtract one.) part of the goal is to estimate the number of stations that want to transmit concurrently. If we run TCP, why might see a collision on "every" packet? TCP -> sliding window -> send more than one packet "at a time" receiver, acks upon receipt. both sender and receiver always have something to send. RED: min_{th} - average queue length at which we start increasing the probability of drops. what problem does RED solve? Intended to avoid persistent queuing (and associated delay) without hurting bursty senders. (tolerates bursts that do not increase the average queue occupancy) DNS: provides a distributed database of name to address (and other things) mappings what is the A (address) record for this name? "www.cs.umd.edu" each name is made up of labels (which are the things separated by dots) there are various record types (A for address, MX for mail server, CN for canonical name, PTR for reverse (ip address to name)) your machine wants to know the ip address associated with www.google.com what happens? construct a DNS request, asking for the A record... send it to "your" DNS server. nearby. run by the department, your ISP... configured (possibly by DHCP). assume your DNS server knows NOTHING. what does it do? NOTHING actually means knows only the 13 root name servers. Ask one of those guys .... they tell you who serves .com.... into cache. Ask one of those guys .... they tell you who serves .google.com.... into cache. Ask one of those guys .... they tell you who is www.google.com.... into cache. then it gets back to you. each entry has a time to live. - long ttls -> better caching. - short ttls -> can move a service to a different machine more easily. - two-day default. the query your machine makes is "recursive" the query your dns server makes is "iterative" April 2 Exam grades grades.cs.umd.edu range: 40-76. 12 and 24 don't exist. neil will fix the last few questions to be worth 2 like they're supposed to be. Chord, consistent hashing, distributed mumble. distributed hash table scheme. what operations? put(key, value) get(key) exists before this paper the idea of *consistent hashing* take the keys of objects, hash those to some value take the machines, (assign them identifiers), hash those to some value. keys belong to the machines that have values just greater (or just below) their value. don't assign in such a way that the number of machines alters which machines store what stuff. (minimize the disruption of adding a machine or removing a machine.) how do you balance load? (i.e., how much stuff any machine has to store (max)) a) you believe that the hash is uniform and that you're not unlucky (that the objects and the hosts are distributed evenly) b) can add machines at more than one place. what does chord add? consistent hashing was used in clusters, requiring knowledge of all the machines that were up.... that knowledge doesn't scale (because machines can fail and join) chord is a way of maintaining this global lookup structure. (finding the machine associated with the right identifier for a data item) how do we know that a node doesn't lie to us? a) provide false data - the key is generated as a (one-way) function of the data. - would need to find a collision - don't trust it (implement authentication externally) b) say something that does exist doesn't possible that you could find a replica dht node that would give you what you asked for. how does insertion (of a machine) work? assume we know someone who is participating. step 1: where do you insert yourself? hash your own ip, and look yourself up (possibly with that guy's help), gives you your soon-to-be predecessor step 2: get predecessor and successor step 3: migrate some data step 4: build your finger table. - have an idea of N. - have an idea of log N. - so you know the number of entries in the finger table. - you know your hash value, and you know to look up other hash values, - lookup the guys who hold the hash values at each of the intervals. (with some scheme for getting a reasonably complete list of the nearby set). 4/9 Intro Exercise, Rocketfuel, and Discarte. comments out in general, great work. progress seems a little behind. (I wanted a graph. or a sketch of a graph.) intro exercise review grading: spelling (has been fine) grammar (has been fine, aside from some spurious et.al's) et means and al stands for alii which means others. topic sentences (less likely good) the point of the paragraph should not surprise the reader at the end. covers : context (big accessible problem) conventional research approaches (classified) definition of the key problem (to be solved) your insight in solving it (what you bring) (expected) contributions, (results, successes.) lacks unnecessary detail (list of related work without classification, description of the algorithm involved) under the page limit (one letter page, two column 10 point, 1" margins, title space counts) my report from the nsf meeting: I learned a few things. 1. 3g networks have really intermittent unavailability. short-duration outages. (not just loss) - useful question to whether applies to wimax 2. mobility stuff at ucla is pretty cool; addresses some of the privacy issues of tracking your own location and using that trace of location without sacrificing too much privacy. 3. solving security problems with an architectural redesign of the internet is hard to publish, because it's hard to evaluate as either a network design (performance) or a security protocol (provable) or a system (realism of assumptions) - mismatch between what funding agencies realize is a potentially high-impact direction for research and what conferences and reviewers realize is good research for publication. - can be a problem for people like me. (junior faculty.) 4. two (maybe three) of the projects used old "rocketfuel" topologies. that this keeps happening seven years later compels me to describe it in here. (otherwise I'd feel bad plugging my own work...) - potentially using them to the exclusion of other potential topologies, and that seems wrong (to me). topologies for: simulation diagnosis, traffic engineering; analysis, guess what happens if a link were to fail. new routing protocols -- ways of exploiting the common design patterns in networks to converge the routing protocols much more quickly. - convergence is that everyone knows the same info (and the paths are set) - likely that the redundancy in the network is totally obvious (because people have to look at it.) - might be able to build a really simple routing protocol that would quickly recognize a fault and use the preconfigured, obvious backup. - well, what are those patterns in the design of networks, and are they universal enough to pursue this sort of protocol? significant challenges: "completeness" in covering all the links using really just the traceroute tool "accuracy" in alias resolution pansiot and grad traceroute to places that sent them mail. faloutsos et al. said there's a power law in that data. burch and cheswick generated a visual map of the network. multicolored mess. look closely, and it's a tree. mercator: govindan and tangmunarunkit source routed tracroutes. lots of people (enough) had source routing enabled. took a long time to collect. expect source-routing to be disabled. - flaw in the design of source routing that alters the source address, makes it harder to prevent some misbehavior. also (I believe) source routed their alias resolution probes (common source address in the response) rocketfuel isps might be good units to understand IGP routing, look for features of design, tractable size functional unit. traceroute servers mostly individual cgi script. also looking glass servers that allow running commands on an ISP's routers. can't hammer them. focus on isps: use of BGP information to determine which traces would be valuable. try to measure ingress/egress pairs. what went wrong: - false aliases, - positives: counters - negatives: need responses to declare aliases. - false locations (bad names, bad inferences from names, e.g., springfield), - alternative schemes using latency - sometimes have to guess which state/country a city is in. - why bother? - a unit of design is point-of-presence; might believe that links with a pop are free and links across pops are expensive - simulation requires more than just the connectivity, expects bandwidth and latency metrics - tricky separation of ISPs, - difficult to draw a line around an ISP. how do you guess where one ISP ends? - dns name is a start... dns names are associated with ip addresses, these addresses belong to links, and the link that connects one ISP to another is "owned" by one of those ISPs. i.e., at least one of those routers has an IP address that belongs to the other ISP. means that your router's "name" might be assigned by me. reverse dns allows "att-gw.sprintlink.net" (approximate example) which is ATT's attempt to say this ip address belongs to a sprint router, even though this isn't sprint's naming convention. - pops are a bit easier when they're all administered by the same organization. abilene naming convention: sttl-chic.abilene.net (getting it wrong) complementary chic-sttl.abilene.net name/ip address binding for the other side. - anonymous routers. - they don't respond. bloggy: - who's interested, what's the business case for keeping this. - there's a usefulness to sharing some of this info for debugging. - my take is that as long as we stick to these operationally useful tools, we're safe. - stability over the measurement interval. - huge potential error. want a snapshot, but really get ridiculously-long exposure. - is weibull interesting? - no. eventually you get enough parameters and you can fit a line. whopee. - it's science if you go into the analysis with a hypothesis, otherwise it's just curve fitting. - contribution was making the maps available. lakhina et al. power law? bah. measurement artifact. - sketchy assertion that the degree distribution of nodes close to vantage points and far from vantage points should be the same - there are traceroute at home plans. neti... ga tech and erm, technion? work on these ideas. netdimes. discarte techniques are rotting, missing useful stuff. can deal well with anon routers (because they aren't truly) can deal with many unreachable routers blog comments: - you could frustrate (techniques), but you'd be shooting yourself in the foot. - why is it harder? I assert security, not a ploy to make my life harder. - there have been these "attacks" that involve sending RST packets to try to terminate BGP sessions between routers. - why stoplist? - a significant practical challenge is the IWF. - dlp, have to determine the costs. how do you know the costs are right? can you use the frequency of implementations as input to make tie-breaking decisions? - potential sensitivity. - mpls issue: at the end of the tunnel, icmp generators can provide extra information 4/14: Day of Multicast. Friday the intro is due. Multicast background (DVMRP) Which IP addresses are used for multicast? Class D. 1110* (224-239.* (class A 0*, class B 10*, class C 110* On Ethernet, what does a multicast packet look like? designated prefix in the first three octets of the destination address (01:00:5e or something) suffix (last 24 bits) represents the IP destination address (28 bits of multicast address, this 24 bits is kind of an optimization) ethernet device will try to filter out multicast intended for others. How does a client "subscribe" to a multicast group? Internet Group Management Protocol (IGMP), send a message to your local router to subscribe. (or to cancel the subscription) MBone - multicast backbone. overlay of unicast tunnels that will carry multicast traffic to your network. The point of multicast is scalability (a point kind of missed by the papers) many groups, many participants in each group in order to separate the groups, we have 28 bits; the design mostly assumes that picking an address won't yield a collision... or if there is a collision the app can deal with it... or you can pick an address and see if someone else is using it, then move to a different one. This implementation of multicast is tied to a specific model of the users and the network. (that publishers don't have money to ensure quality transmission or pay for their use of the network; that network operators have the incentive to replicate packets (costing them more) without passing the cost back to the sender of the replicating packet; that the mbone nodes (mrouted instances) want to (and are able to) carry multicast traffic on behalf of other people. Yet, the idea of multicast, the interface of send-once receive-everywhere remains and is implemented in lots of places. It's early content distribution. If you were to just unicast everything around, you'd be paying for lots of bandwidth without exchanging lots of information across those links. -> content distribution schemes (akamai) ESM's five reasons multicast doesn't work practically: no authentication, which means that additional senders can flood. "amplification attack" liked the idea that the network was stateless. complexity of maintaining multicast state. probably not fast path behavior. (and if not fast path, may not work) globally unique address; no good scheme for assigning global addresses dynamically. unreliability / best effort service model; makes everything else a mess - getting reliable delivery in multicast is particularly challenging (retransmission from the source is incredibly wasteful). "requires infrastructure" -- if you wanted to use it, need everyone to deploy stuff... lots of cooperation from operators. amplification attack -- an attacker can send a small message or a single message and cause a large (or many) message(s); means an underprovisioned guy can cause lots of damage. Smurfing - spoofed broadcast ping. Recursive DNS - you send the query, the response can be large; DNS servers now (typically) answer requests only from the local network. "fast path" -- anything that's common gets implemented on asics anything that's not common is punted to the router's host processor. any feature that requires the host processor is often disabled because that can interfere with the routing protocol (that's run on the host processor). RLM Receiver-driven layered multicast Problem that RLM tries to address? different receivers have different speeds, should be able to provide high quality to those who can get it, something okay to those who cannot. Assumption: can split a video stream into core layers and detail, in a cumulative fashion, and in real time. fair? hdtv broadcast 720p or 1080i and hardware will rescale to fit on the display. there exist (image) frames that reset the image, and then differences that enhance it or change it over time; could imagine something smart enough to skip or select difference frames or change the frame rate. alternate scheme - layering by N x 24 kbits/s choose how many 24 kbit channels we want. * would take an incredible number of channels - layering of base rate * 2^n; my understanding is that this assumption is not so good. - fast ethernet (100 Mbits/s) gig ether (1000 Mbits/s) ordinary (10Mbits/s) - cable modem types (2-20 Mbits/s?) - question 1: is what the bitrates of the layers should be. how fast should the base layer be: dial-up-ish. 24 kbits/s. (or at least capped at something like this) how fast should the high-quality feed be? 512 kbits/s (okay...) - question 2: with these two rates does cumulative-model layering make any sense? potentially better off just sending the independent streams. (practical issue based on the video tech.) open question (from this paper, many attempts to figure it out) how does bad network behavior manifest as bad user experience? what can people tolerate. assumption about the users: they're selfish, and not evil, (and not stupid) choose the rate that is appropriate to avoid loss. requesting any faster rate is likely to hurt yourself. (and other people too... but that only matters if your're evil.) what's wrong with priority drop? strawman - tell the routers to preserve the base layers at the expense of the detail layers... why not? - provides a poor set of incentives to the client.... why not ask for more? never feel pain, except that it might cause persistent congestion. - priority drop is kind of euphemism for qos. what do they like about RED? potential benefit is the early signal: the sooner RLM realises that a join experiment is failing, the sooner it can leave the extra layer, normalcy can return. "application layer framing" euphemism for udp; it's up to the app to include information about what the packet is there for. ESM End system multicast most important part is the indictment of IP multicast, coupled with a not-very-scalable attempt to clone it at smart end systems. the synth topologies... RLM topologies carefully thought out extreme cases.... ESM topologies as randomly generated arbitrary graphs that don't stress the system in any useful way. overlay scheme easier to deploy, uglier in implementation. why should you trust another end system to participate? at least in IP multicast, you're mostly just trusting the same routers you have to trust otherwise. in ESM, some machine has access to the bits you're downloading... maybe that machine could alter the stream. - authentication? - assumption that the other members of your overlay are interested in (distributing) the same stuff? what happens if a node decides to stop forwarding. - bad nodes can send control traffic, yet still not forward data traffic. 4/16: BondBreaker/XSD and Digital Fountain Randy describes online social network experiment can experiment here because the social network exists. The XSD; Eve the eavesdropper. Bob asks, Alice answers, goals : eve can't impersonate alice. neither alice nor bob reveal... eve can't get any information out of alice... (neil tries to add) absence of dictionary attack potential. small answer space. by not permitting more than one guess. (sacrificing liveness) draft protocol: B->A: Q, X=H(Ans) xor nonce A->B: PK_alice, H=Hash(PK_alice || signature of H(Ans) using PK_alice || nonce) B->A: Hash ( H || H(Ans) ) A->B: signature of H(Ans) using PK_alice practicality through web of trust. with the potential to strengthen the existing web via addtional challenges. BondBreaker as simulation. points. Friends as breakers. - good since they're the people who can break. - perhaps unrealistic if friends aren't the likely threats. Directional bonds. Initial results: very difficult to ask these questions. Also, compound questions. q: country issue. q: do you know what you would do? the can't ask a question. q: do you know how many people abort at the consent form? 1/3-ish. q: the couples problem. q: is the draft protocol coupled to the off-line-ness ? q: are there similar commit and then reveal protocols? augmented encrypted key exchange? q: how to do cliques / groups at the same time? q: if you built it, would you have secure communication for dummies? q: if you don't want to actually do it, why bother with the study? --- Digital Fountain What's the goal? big file distribution with extremely limited client to sender feedback. no acks, no nacks. efficiency - no wasted bandwith, as close as possible to multicast on demand - clients join when they want, leave when they are done. tolerant of diverse receivers - with different loss rates and bandwidths. Packets are of roughly equal importance. lose em and it's not so bad, get em and they're likely to contribute. Strawmen (metaphor for a the simple system to compare against, caricature of your opponents argument that you can easily knock down.) Reed Solomon why not use RS? took too much time to encode/decode. Data Carousel Why is it a tornado? "whirlwind" of activity as the result of a single packet that allows decoding (nearly?) everything else. (1+eps)k Why bail on real-time? now that they use tornado and it takes 2 seconds to decode, why not real-time? - aspects of the file distribution problem (anybody joins when they want until the file is done) -> design decisions that avoid real time. - would have to break stuff up into bins like the RS example, probably be unhappy... maybe don't even have to solve this problem when video codecs can just deal with a missing packet. Treatment of the layered multicast. Using two features that weren't theirs: 1. getting sender cooperation/involvement. sender will ramp up, and if you receive fine during the ramp, you can subscribe to a higher rate. 2. everybody joins at roughly the same time. (so there aren't these experiments going on all the time. Blog stuff: what happens when nobody's listening? want to find a way for the sender to not send. conceptually in IP multicast, no one will be subscribed and eventually even if the process is transmitting, the packets won't leave the machine. in a real system, you might have session start and end messages. (that the server could probably handle okay) what about congestion? does d.f. control it? receivers should drop extra channels if there's loss. does d.f. increase it? sender initiated congestion events that are part of the experiments. now you need 1+e times the original number of packets. I think there's a relationship between d.f. and the priority drop graph in the RLM paper. - diminished incentive for receivers to unsubscribe from those extra channels. april 21 intro grading: spelling (has been fine) grammar (has been fine, aside from some spurious et.al's) et means and al stands for alii which means others. topic sentences (less likely good) the point of the paragraph should not surprise the reader at the end. covers : context (big accessible problem) conventional research approaches (classified) definition of the key problem (to be solved) your insight in solving it (what you bring) (expected) contributions, (results, successes.) lacks unnecessary detail (list of related work without classification, description of the algorithm involved) under the page limit (one letter page, two column 10 point, 1" margins, title space counts) DNS: primary task: resolve human-readable hostnames to ip addresses how to perform the task (the main points of the design) distributed administration - scalability of entering and maintaining the name-to-address bindings. replication - authoritative answers for each zone are provided by more than one server. caching - don't need to ask again. (faster) (also negative caching) soft state - much state of dns doesn't have to go to disk. if a recursive name server forgets (is rebooted), doesn't hurt anything. Why UDP? - fast/less overhead/no connection setup before sending the query. retrasmission by sending to an "other" dns server. the hosts.txt file: the one big text file of name to address mappings (even large sites may distribute an /etc/hosts file to speed name lookup.) four problems with having one bit hosts file: 1 - traffic / load on the server or file distribution mechanism 2 - each additional host increases the size of the file, have to download many many mappings you don't care about. 3 - updates may not be propagated to everyone at the same time (potential inconsistency) 4 - potential for name conflicts in a flat namespace. 5 - "the guy" managing the list has a lot of work to do. the hierarchy: what is the root? "." the unnamed root (empty string... but the full name of the root is ".") "umd.edu." is the name. "umd.edu" technically, you could get: "umd.edu.cs.umd.edu." "ssh junkfood" does that mean that "www.cs.umd.edu." is a valid name? yes.... what are children of the root? top-level domains generic (gTLD): com, edu country code (ccTLD): uk au us special one for reverse mappings: arpa your resolver will ask a server (listed in resolv.conf) to answer the question for you. that server will then ask (iteratively, walking down the hierarchy) to get the answer you seek, and only then return to you. if our recursive name server (will permit us to ask a recursive query; does not issue recursive queries of its own) knows nothing, who does it ask to help resolve a name? one of the "13" "root" name servers. A to M.ROOT-SERVERS.NET and their addresses are known. D is UMD (go umd.) C, F, I, and J are anycast accessible servers. Naming in general: whether DNS names matter any more. (in what way do they continue to matter) we have google. what do we need dns names for? (can you return to the same place easily?) (google could be evil) what are names in facebook? ultimately there's an id; doesn't really represent a name (human readable, level of indirection).... relative naming 4/23: Syllabus amendment. Project draft criteria (due next friday). King. Syllabus: ~10% of presentation of a background paper. (in class/class material) comments: 15%->20% final: 15%->20% Project draft. Sections -- expect most sections to be ~1pg. Introduction precisely as before; I will likely be more picky, especially if I see the same issues in the same intros. comments in grades.cs; Related work goal: teach the reader about the research context, where you fit, and how you are distinguished. a taxonomy of related approaches: group related stuff into classes, describe the classes, using related papers as examples or evidence. -> don't start a paragraph with a paper name then spend the rest of the paragraph describing that paragraph. -> instead, start the paragraph by describing the broad approach taken by [1] and [2] and [3].... you are unconstrained for space... that means that you don't have to compress... that means, don't use cites as nouns. Avoid: In~\cite{gummadi-king}, the authors send dns queries. Prefer: Gummadi et al.~\cite{gummadi-king} send dns queries.... References list: should be of length ten or greater. should include IEEE or ACM conferences (or both, and maybe others) should include work from the last three years. Motivation (*) Why your study is worth pursuing Why the artifact would be useful (and to whom) Why the hypothesis would be generally applicable, influential if validated Why your insight is new and different and important. (why all prior work is flawed) Suggest: instill doubt. Approach (*) walkthrough, figure, whatever describes in some detail how your scheme would work. enough to convince that it should. high enough level that it's not boring. Implementation (*) Typically enough to convince the reviewer that you've actually done what you describe. Enough interesting details that would be hard to produce if you hadn't built the thing. example: library interposition for socket. gethostbyname calls an internal close (you don't see it.) reviewers can be smart/experienced; task is to show that you know the stuff as well. Results (*) I don't know how much you're likely to have; I would like for you to have at least something. For the draft, you may have placeholders.... preferably not all placeholders. Experimental setup (here are the datasets, the platform, the dates) Each result: (tiny) problem hypothesis (what we should expect or want to see) experiment *then* the figure/table, interpretation. Conclusion super easy: copy the contributions from the introduction. Groups of two will have at least three of the four starred bits. Groups of three will have four of the four starred bits. draft due a week from friday; final version due last day of class. expect project presentations to occupy 5/5, 5/7, 5/12; 4 per meeting. I'll generate a random schedule; you can trade if you like. King --- Why compare to GNP? (and IDMaps) GNP has landmarks, they compute their coordinates, you guess what the latency is between two hosts. IDMaps has landmarks, add distance to closest landmarks and between landmarks to estimate the latency. King also works when hosts aren't cooperative / aren't running the protocol. If you had come up with King, what else is there to compare against? The game network optimization problem. - fps (latency sensitive)... playing the game is unpleasant if someone is far away. - want people to play with who aren't too far away. - I expected this to be done by choosing nearby servers. - clients are smart enough to ping the servers and choose the ones with small latency. - might have more constraints ... choose the closest server that supports your skill level? - maybe you could build a multiplayer game that didn't require a server. - still need matchmaking service... even though the server isn't involved in the individual moves, the server starts people off. - maybe that matchmaking service would need to find clusters of users - you could maybe use king... or since they're cooperative, make em ping candidate players. doesn't seem to be a case for using king to build networks of game players... Is it useful? Vivaldi used the data that came from King. a few others... Vivaldi data was distributed without any anonymization, so gets used more than other data. For any real purpose: - in a distributed (something) protocol, if it was in a node's interest to lie about its latency to another node, perhaps King could be used to check. * imagine we're doing end system multicast. * if you wanted to be a free-loader, you might say that your latency to just one other node was small, and your latency to everyone else was extra large. - the nodes, if cooperative, are easily measured using other schemes. Is this an abuse of DNS? - seems like it. Is it that bad? - well, you run a dns server... - traceroute is a hack. time exceeded messages were not intended for traceroute. almost all network measurement code has this hack behavior 98% of dns queries that reach (one of the) roots are useless. The blocking of things. - the blocking of ping. icmp is useless. (ping is harmless).. aside from the use of bandwidth. - you can block everything if you want. - block just ping, and you're not going to hide. - hiding is not really sufficient. king client -> DNS A -> DNS B R I - the blocking of king. you have an authoritative dns server, you can't drop all requests. which requests can you drop? those with the recursive flag set... and from someplace other than your network. would you make this change? there exist dns servers configured this way already, which means they weren't trying to block king, what were they trying to do? prevent DOS attacks... could launder an attack of an authoritative server by forcing lots of queries for different, non-existent names. could spoof the query (as the victim), small queries yield larger responses; those larger responses hammer the victim. I expect that King largely doesn't work anymore. (though I base that on almost nothing) The assumption that name servers are close to the hosts they represent. - aol already has just one (set of) centralized authoritative servers - is this widespread? statical spot check suggests that 100% (with very low confidence) of dns servers aren't where the clients are. - you could probably tell when it's busted: ping both and observe the difference? "self-diagnostic" scheme: compare the addresses and the names... the more that's in common, the more likely the estimate is correct. 4/28 summary cache: belief that web proxy caching is a good idea; further belief that on a local miss, your cache should ask its friends for the content. unfortunately, asking your friend cache every time you get a miss.... fair chance he doesn't have it, so there was no point, you're just delaying the inevitable. there's extra traffic; not only does the cache have to answer its clients, it has to answer everyone else's clients too. ICP basic scheme: tell everyone what you have. Except, that traffic is somewhat large. Replace with the Bloom filter! md5(the url), md5(the url + the url) (if you need more bits) split into 4 or 5 indices, if it's yours: set the bits to one at those locations increment a counter at each location. can expire an entry from the cache by decrementing the counter. (won't set bits to zero when other stuff would have them set to one) if it's a page you're looking for: see if all 4-5 bits are set to one in your neighbor's Bloom filters. 1. false positives: we believe that our neighbor has something that he doesn't because the bits in the filter happen to be set. with 10x bits as objects -> 1% false positives. - the doc has been expired from the cache, but we don't know that yet. 2. false miss: not updated. newly-requested page is in the cache, but not yet in the bloom filter. 3. - the doc shouldn't be cached anymore because it is too old and would need to be refreshed from the server. no false negatives in the Bloom filter. Blog - How would using Chord work? In Chord, specific machines are responsible for specific documents. Chord-based proxy node: option 1: our guy stores stuff for his clients and what he's responsible for. - less storage is available. - dumb (have to send the retrieved document both to the client and to the responsible node(s). option 1b: our guy stores stuff for his clients and indexes what he's responsible for. (knows which nodes store which urls, for the urls in the list) - option 2: our guy stores only what he's responsible for, has to fetch from friends on nearly every access. - chord lookup on every hit or miss. If a client's proxy misses: - (1 or 1b) not in cache, do lookup, if lookup fails, ask origin server. - (2) not in cache, do lookup, that guy does the request (like a recursive dns query) - store the document (or a forwarding pointer to us) at the correct node in the ring. Makes sense only if the Chord lookup is really fast: either we're small scale and know everybody (so don't need the log n) or we're small and every rtt betwen caches is negligible Here the time to lookup in chord has to beat the time to just fetch the doc from the origin server. Another small catch: this scheme might shuffle related documents on the same page. can maybe patch for documents from the same server, but that might not be enough. Why does none of this matter (really): do we go to the single origin server to satisfy our web requests? - no.- big services distribute their servers; build the distribution network so that your requests are served by nearby machines, bringing down the latency and the (global, sort of) cost of serving your requests. it's possible that akamai (and similar) do a great job for their customers, but not a sufficiently good job over client requests. (does akamai obviate client side caching?) What do we do about streaming media? typical: use special infrastructure to serve it, not expect client-site caches to do much. is there enough duplication of video stream requests to warrant a cache? popularity distribution: there will be a few videos very popular for a very short time. e.g., Susan Boyle. mini-slashdot effect. Does caching violate copyright mumble? IANAL. How does caching interact with ad view tracking? Typical to check with the server for modifications; Typical to credit ads for clicks more than for views. Scale and performance of cooperative caching. "unpopular websites are universally unpopular" Conclusions: 1. don't bother with scalable cooperative caching; all the benefit can be had at sizes that can easily be accommodated by a single cache. 2. largest benefit for small populations 3. performance is limited by cacheability 4. mutual interest doesn't offer much above random or organization-based groupings. Comments: 1. this data set is completely unrepresentative. data sets are from smart people (university and microsoft) how does not including arbitrary internet users affect the study? "more diverse" than general population. "the masses look only at email and news." "less diverse" than general population. university population predominately young. the only thing to worry about is that these might be more diverse than reality. "less diverse" would mean that the cooperative hit rates should be higher than reality. 2. this was 1999. documents clearly won't be so cacheable any more. 4/30 - Redundancy Presentations: expect 6 slides: background; problem; insight and solution; how it work; that it works; conclusion. though you can mess with this organization or even number of slides if it works for you. Redundancy suppression. Box that we control on either end of a point to point link. Scheme for indexing uses this Rabin fingerprint. compute a hash of every fixed-length substring of the packet by sliding a window across the packet (not computing each hash individually) compressor, decompressor as boxes on either side of a constrained link look for redundancy across flows for different users have to deal with the potential loss of packets don't want to stall if a packet is dropped. why not just gzip? * operating over individual packets pays gzip startup cost (has no initial dictionary) * operating over streams of packets assumes that the stream is intact (there are little problems to solve) * can be slow. Blog stuff: How fast? in processing all the bits in 2000, 10 Mbits was the floor (any sane parameter setting ran faster) 44 Mbits (T3 speed) was possible (with some settings). End-to-end violation 1. how private is your data when these boxes store it. - unclear if it's much worse than normal... - IF something breaks, you might see data intended for someone else. 2. if you can get the benefit by running this at the ends, you really should. - if you were to run it at the ends, it *might* be harder to recognize sharing across node pairs. (or maybe not) 3. if you want to encrypt, you'd have to run it at the ends. - within the same transmission it's common to compress before encrypting, so it might already capture a lot of redundancy. How much cache do you need (and the cost) looks like 200MB is enough? seems sort of cheap if DRAM... Where is the redundancy coming from? broadcast-ish email (chatty smtp) might be an issue. spam? source code repository stuff curious? diff commands? What's the cost model (who pays?) this looks a lot like multicast - there's a constrained link, and a customer pays for the link *and* the ability to transit that much traffic: here we're getting more transit traffic than the link constrains us to. the end: Summary of papers and relationships. Architectures Tussles, end-to-end argument, NIRA, Inferring BGP (Gao) Started with the first two: the lenses with which to look at other protocols and developments. If you could do things end-to-end, without help from the network, you should. If you can design a protocol that makes government powerless, perhaps you shouldn't. BGP and NIRA describe different schemes for deciding how people should be permitted to decide how to communicate. BGP gives power to ISPs, by allowing them to make arbitrarily bad-for-performance choices. Coordinates Vivaldi and GNP The differences. The goals. How do we compare them? What do they tell us about the network? What do they do for us? Overlays RON, SOSR, PeerWise, PlanetLab, Slicing What does SOSR add over RON? What do we need source routing for? What are overlays for? How can we make better networks when Cisco and ATT control lots? Congestion and lots of TCP stuff ECN, RED (slow start, congestion avoidance, how to understand its performance) How do we share, without clobbering stuff. Incentives BitTorrent/BitThief/BitTyrant/PropShare, Bad congestion control How do we build protocols that are mechanisms to design s.t. everyone plays fair and the system benefit is apparent. Distributed operations Chord base of consistent hashing; the ability to walk the ring with log n accesses. Measurements and tricks King, Discarte, Rocketfuel, What does the network look like? how can we use existing protocols to determine representative behavior across the internet? How are things connected? Caching and performance DNS, Summary Cache, Cooperative proxy caching, Redundancy Suppression, Digital fountain We have the ability to talk to anywhere we want, but talking to nearby nodes is much faster.