"A Brief History of the Net"

Evan Golub
egolub@acm.org

"The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships." - Vannevar Bush, 1945

(A draft of a work in progress....)


An introduction...

The Internet as we know it today had its origins far removed from the e-mail, instant messaging and World Wide Web that have made it a household word. In much the same way, direct person-to-person conversation was not originally foreseen as the telephone's primary use [Berg2000]. The driving force behind the creation of a nationwide network has been attributed to motivations ranging from something as mundane as one man wanting a single computer terminal on his desk rather than three [HL1996] to the desire for a computer network that could survive a war. Whichever theory you believe (I subscribe to the former), it is interesting to note that today's Internet both lets us access countless machines from a single computer on our desk and enables communication at times when "conventional" methods such as the telephone have not sufficed. The goal of this paper is to give the reader some of the flavor of the development path that led from the original ARPAnet to the current Internet, as well as some insight into how the tools we use today evolved.


The origins of the ARPAnet

The Department of Defense's Advanced Research Projects Agency (ARPA) funded a number of research projects that required mainframe computers. Connecting these machines would make communication among the various research groups easier, and could also save money. Research groups would be able to work on a range of (very expensive) computers without needing one of each in their own lab, and smaller groups whose funding could not cover a computer of their own could gain access to computer time remotely. Today there are still many high-end, specialized machines with high price tags that researchers from around the country and the world access remotely via the Internet (which grew out of the ARPAnet).

In the 1960s, there was no standard computer architecture (such as the x86 architecture that most personal computers of the 1980s and 1990s shared), nor a standard operating system with a common set of commands. In order to have these very different machines communicate with each other as well as with simple terminals, a single type of intermediary machine (the Interface Message Processor, or IMP) with its own set of communication rules was designed. The IMPs communicated with each other in a single, standard way, and the individual host computers were connected to them. The connection between an IMP and its host had to be customized for each type of machine at the time, but this approach of a standard protocol shared across the network eventually led to the Internet Protocol (IP), which replaced the ARPAnet's original host protocol in 1983 and which we still use, in augmented form, today.

As the network grew, it went through several names, which reflected in part who was funding it and what its purpose was. It began as the ARPAnet, funded by ARPA. In later years other networks were launched (CSnet, BITnet, UUCP), some of which used the Internet Protocol and others of which exchanged traffic with it through gateways. The National Science Foundation (NSF) created a national backbone (NSFnet) to which other academic networks could connect. This network of networks using the Internet Protocols came to be known as the Internet. As private networks also using the Internet Protocols began to be built, questions arose about whether they could connect to the NSF backbone, since NSFnet was intended to jumpstart academic networking rather than to serve business purposes. The academic/research restriction was never really enforced, and in 1993 it was removed thanks to the Boucher Bill. In 1995 the NSFnet initiative came to an end. Today the line between public and private networks is largely invisible to users.

Today's computers are mostly built on a small set of common hardware platforms and operating systems, and many if not most computers on the Internet communicate directly using IP. Some, however, are still dialed in to machines that are in turn connected to the Internet, in a way similar to the machines of the original ARPAnet. On the horizon is Internet2, which aims to provide even higher bandwidth for the next generation of technologies.


The development of electronic communications

Although users could send messages to other users on the same machine as early as the 1960s, such messages could not leave that machine. The power of electronic communication began to grow as more users could remotely access a common computer or computing network (such as Prodigy or America Online), and as messages could be sent between individual machines and networks on the Internet. Systems such as Hyperties were also developed that allowed electronic documents to be linked to other electronic documents, and the core ideas of this and other work eventually led to the creation of the World Wide Web. These types of communication grew over time (Queen Elizabeth sent her first e-mail message in 1976) but did not become (what I would consider) mainstream until the 1990s. There are essentially two basic types of computer-based electronic communication.

The first type (synchronous) is similar to a phone conversation or conference call: two or more users type (or even speak) messages to each other via their local computers or terminals. Examples include early programs such as write, talk, and Relay chat, as well as more recent programs such as ICQ, IRC, and instant messaging services such as AIM. In the early days of the ARPAnet and NSFnet, it was questionable whether this type of communication could be used for personal conversation or business ventures, but that concern has long since disappeared at the network level. It is, however, a growing concern at the corporate level. If you access Internet resources from work, your use may be monitored, and misuse of resources (according to the company's rules) can lead to sanctions. Also, communication over the Internet is not encrypted by default, so an industrial spy could potentially monitor (for example) an AIM meeting that takes place at a company.

Synchronous communication is discussed in more detail in the next section.

The second type (asynchronous) is similar to letters, postcards or public bulletin boards: one or more users send messages to each other or leave messages for others to come and read. Examples include early bulletin board systems (BBSs), newsgroups (such as USENET), Web forums, e-mail, and resources built on e-mail such as mailing lists (for example, LISTSERV). It is worth noting that BBSs often contained more than just message areas; they could also have game rooms and even file-sharing areas. In many ways, BBSs offered features we now find on the Internet (e.g., Yahoo! Games and Napster).

With the early BBSs, you needed to connect to the BBS directly using your modem and phone line, so it was very common to limit yourself to local BBSs, since calling non-local ones required long-distance calls with per-minute charges that quickly became expensive. Also, since each user made a direct connection to one of the BBS's modems, a BBS could become "full". One advantage of Web forums is that no bank of modems needs to be dedicated to the forum: users connect to the Internet through their own Internet Service Provider (ISP) and then reach the forum via the Internet. Although the Web server for a forum can become overloaded, the way Web pages are generally retrieved (connect to the server, get the page, disconnect) makes this somewhat less likely, since users do not hold a connection open while they read. Additionally, if demand grows, adding servers to a site can greatly increase its ability to handle requests.

With USENET newsgroups, users would connect to their local service and (if that service received USENET feeds) read messages from that local copy. With mailing lists (such as the original LISTSERV), each member of the list received every posted message via e-mail. The advantage of using a mailing list rather than mailing everyone yourself was that a central server maintained an up-to-date list of members, rather than each individual keeping track of such things. One major difference between newsgroups and mailing lists is storage space. For USENET, a local provider received a copy of each posted article, kept it in a central location, and allowed individual users to access it. Each user had a file associated with their account (on UNIX systems, the .newsrc file) that kept track of which messages they had read. In this way, hundreds or thousands of users could effectively receive a message without the system needing to make hundreds or thousands of copies of it. With a mailing list, if several hundred users at the same company were on the list, several hundred copies of the same message would be sent to that company's machines.
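
The storage difference can be illustrated with a toy model in Python. It is a sketch under simplifying assumptions, not how any real news server works: the names `NewsSpool`, `mark_read`, `unread_for` and `mailing_list_copies` are hypothetical, and the per-user set of read articles merely stands in for the per-user file described above.

```python
from collections import defaultdict


class NewsSpool:
    """Toy local news server: each article is stored once for all readers."""

    def __init__(self):
        self.articles = {}               # article id -> text (one shared copy)
        self.read_by = defaultdict(set)  # user -> ids of articles that user has read

    def post(self, article_id, text):
        self.articles[article_id] = text

    def mark_read(self, user, article_id):
        self.read_by[user].add(article_id)

    def unread_for(self, user):
        return sorted(a for a in self.articles if a not in self.read_by[user])


def mailing_list_copies(num_messages, subscribers):
    """A mailing list delivers a separate copy of every message to every subscriber."""
    return num_messages * subscribers


spool = NewsSpool()
spool.post("msg-1", "Hello, USENET")
spool.post("msg-2", "A follow-up article")
spool.mark_read("alice", "msg-1")

print(len(spool.articles))          # 2 stored copies, however many users read them
print(spool.unread_for("alice"))    # ['msg-2']
print(spool.unread_for("bob"))      # ['msg-1', 'msg-2']
print(mailing_list_copies(2, 300))  # 600 copies for 300 subscribers at one site
```

Adding a reader to the spool costs only a small entry of read-tracking data, while adding a subscriber to a list adds one more copy of every future message.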


Let's talk about it...

There are a variety of ways that two individuals can communicate with one another online, and a person's choice was often dictated by the technology available to them. Academic users on the network via a UNIX machine might choose to use the talk program, while academic users at the same institution who accessed the network via a mainframe with a BITnet connection might choose to use Relay chat. Because talk restricted you to one-on-one communication while Relay chat had virtual rooms (some public, some private), the two tools were very different. Moreover, users of BITnet may never have heard of talk, and vice versa. On a personal note, when I was a BITnet user I never heard of talk, and once I began using UNIX machines on the Internet, I did not see users in that community discuss Relay chat. In many ways, current instant messaging systems combine the features of these two services: you can IM an individual or enter a chatroom. One of the major advantages of IM systems is that they keep track of which users are currently online and which machine each is using. With the talk program, you needed to know the hostname of the machine being used by the person you wanted to contact in order to make the connection. Since a person might use one of several machines, you might need to keep track of all of them and check each one. A utility called finger lets you query a remote machine to determine whether someone is currently logged onto it (though typically only UNIX machines run finger servers). Some users (including myself) added instructions to their accounts that, each time they logged onto a machine, wrote that machine's name into the file that finger displays (the .plan file). This was rare, however, and was an awkward hack.
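
The difference between talk and IM comes down to who remembers where each user is. The Python sketch below models the kind of lookup an IM service performs on the user's behalf, assuming a single central registry and one active session per user; the class, its methods and the example hostnames are hypothetical, and real IM services track considerably more (status messages, buddy lists, multiple sessions).

```python
class PresenceRegistry:
    """Toy presence service: remembers which host each online user is on."""

    def __init__(self):
        self._host_of = {}  # username -> hostname of the machine they are using

    def login(self, user, hostname):
        self._host_of[user] = hostname  # a newer login replaces the previous host

    def logout(self, user):
        self._host_of.pop(user, None)   # silently ignore users who are not online

    def locate(self, user):
        """Return the user's current hostname, or None if they are offline."""
        return self._host_of.get(user)


registry = PresenceRegistry()
registry.login("alice", "host-a.example.edu")
registry.login("alice", "host-b.example.edu")  # alice moves to another machine
print(registry.locate("alice"))                # host-b.example.edu
registry.logout("alice")
print(registry.locate("alice"))                # None
```

With talk and finger, this table effectively lived in each user's head (or, for a few, in a .plan file) and had to be checked machine by machine.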

It is worth noting that with instant messaging systems, the information about which user is on which machine is (by necessity) available over the Internet. In the past few years there have been controversies over companies putting code into their IM software that accessed the AIM and other IM registries. This was done to let users interact with users of the various IM services without having to use those services directly. This type of sharing of resources and information was commonplace, and in many ways encouraged, in the early days of the 'net, but in the newly evolving commercial sector of the Internet it was seen as a way of "grabbing" users from (for example) AOL by making it unnecessary to obtain an AOL account in order to IM with AOL users. After a program called Jabber was released as a client that could interact with other IM systems, AOL started blocking messages from Jabber, and Jabber was promptly recoded to get around these blocks. Another IM client, Trillian, is gaining popularity as a client that interacts with AIM, MSN Messenger, Yahoo! Messenger, ICQ and IRC through a single interface. It has the added bonus of allowing IM conversations with other Trillian users to be encrypted as they travel over the Internet. Events and developments such as these raise many legal, ethical and philosophical questions.

A very different type of communication forum is the Web chatroom. Here, a chat server runs and users interact with it through an interface presented on a Web page. The Web site itself might run the server, or it might use an external server and simply provide the page with the interface from its own server. Such chatrooms typically target the community that frequents the Web site, possibly in an attempt to foster a better "local" community. There are also many Web forums, where users post messages to what is essentially a site-specific newsgroup. The advantage of these chatrooms and forums is that they are strongly tied to the Web site that provides them and are therefore often easier for users to discover.


When cultures collide

As the various computer networks (ARPAnet, BITnet, AOL, Prodigy, CompuServe, etc.) developed, each typically served a certain category of users. Although certain behaviors crossed network lines very easily (such as the use of text-based emoticons like the :-) "smiley"), each community often had its own internal set of rules and conventions. As the different communities began to interact, some interesting and some unfortunate events took place. One of the most memorable was America Online's connection of its users to the USENET newsgroups. AOL had its own internal discussion groups, but in September of 1993 it enabled its users to read and post to USENET newsgroups. However, the way AOL integrated the USENET groups made them appear to be just another part of AOL, and users treated them as such. Unfortunately, the casual behavior that was commonly accepted and encouraged on AOL was not well received by some members of the USENET community. Not all AOL users committed "breaches" of netiquette such as replying "Me too!" to messages from users looking for answers to questions, but the America Online community as a whole was somewhat tainted by these initial encounters with the more academic and research-oriented USENET community. An influx of new users unaware of the community's norms had long occurred each fall, when many new college and university students got their first Internet accounts, but this time the flow of newcomers did not taper off, and the period following this event is often referred to as The September That Never Ended. An excellent discussion of many events such as AOL connecting its users to the USENET groups, and of the incidents involving Church of Scientology documents mentioned later in this article, appears in Net.Wars.

Even prior to this event, AOL users were often viewed as "lesser" members of the community of computer users ("just everyday people" rather than "computer-savvy people"). This series of events simply accelerated the already-begun process of identifying different communities and attempting to classify people based on some generalizable characteristic. In the non-virtual world this is often done by skin color, gender, etc., while in the virtual world, the domain of your e-mail address (aol.com, yahoo.com, umd.edu, netzero.net, etc.) started to become a way for others to try to place you into a user type.

The 1980s also saw the "criminal element" begin to move into the 'net neighborhood. An interesting example of this is recounted in The Cuckoo's Egg. Some members of the military community on the 'net in the 1980s were at first reluctant to believe that their computers were being broken into, and later were reluctant to believe that any of the accessible machines held useful information. Many had not yet realized that many different pieces of unclassified information could be gathered and reassembled to reveal classified information. Also, disseminating information became much easier as more people joined the online world. This meant that (for example) someone who discovered the password to a system could spread it to a vast number of people much more quickly, and people looking for such information could more easily find it in "known" locations. One side-effect of this was the genesis of script kiddies: computer crackers who download pre-built tools that let them do things such as break into other computers on the Internet. They are not themselves technically knowledgeable, but they can often do as much damage as knowledgeable crackers simply by causing massive, random chaos through their attacks. Policing the Internet is a difficult challenge since it is an international entity, and questions of jurisdiction arise. This is discussed further later in this article.


TLDs and other TLAs

Every computer on the Internet has an Internet Protocol (IP) address. Under the current version of the protocol (IPv4), this address has the form #.#.#.#, where each # is an integer between 0 and 255. However, we rarely use these addresses directly; instead we use a machine's hostname. A hostname typically consists of a local machine name followed by the domain in which the machine exists. For example, in www.umd.edu, www is the machine's local name and umd.edu is the domain. Many organizations have a single domain for the organization as a whole (such as umd.edu) but also have local subdomains. For example, www.cs.umd.edu is a machine with local name www in the domain cs.umd.edu, which is the cs subdomain of the umd.edu domain. When you enter a name in a Web browser or an e-mail program, a lookup is done behind the scenes (by the Domain Name System, or DNS) to determine the IP address of the machine to which you are referring.
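
The address format and the hostname structure described above can be sketched in a few lines of Python. This is an illustration, not a complete validator: it checks only the #.#.#.# shape and the 0 to 255 range, and it splits a hostname at its first dot, which matches the www.umd.edu and www.cs.umd.edu examples but assumes the first label is always the local machine name. The address 192.0.2.10 comes from a block reserved for documentation examples.

```python
def is_dotted_quad(address):
    """Check the #.#.#.# form: exactly four decimal fields, each from 0 to 255."""
    fields = address.split(".")
    if len(fields) != 4:
        return False
    for field in fields:
        # Reject empty fields, signs, spaces and non-ASCII digits before calling int().
        if not (field.isascii() and field.isdigit()):
            return False
        if not 0 <= int(field) <= 255:
            return False
    return True


def split_hostname(hostname):
    """Split a hostname into (local machine name, enclosing domain)."""
    local_name, _, domain = hostname.partition(".")
    return local_name, domain


print(is_dotted_quad("192.0.2.10"))      # True
print(is_dotted_quad("192.0.2.256"))     # False: 256 is out of range
print(is_dotted_quad("192.0.2"))         # False: only three fields
print(split_hostname("www.umd.edu"))     # ('www', 'umd.edu')
print(split_hostname("www.cs.umd.edu"))  # ('www', 'cs.umd.edu')
```

In practice, Python's standard `ipaddress` module performs this validation (`ipaddress.ip_address` raises `ValueError` for malformed input), and the behind-the-scenes name lookup corresponds to what `socket.gethostbyname` asks the system's resolver to do.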

The right-most part of the domain is referred to as the Top Level Domain (TLD). Some common TLDs are com, org and edu. These domains (as well as several others) are overseen by an organization called InterNIC. A person, company or other organization that wants to register a domain under the .com, .org or .net TLD contacts a domain registrar (such as Network Solutions, Gandi or another accredited company). As the Internet has grown, more TLDs have been added, in three different ways. First, each country has a country code TLD that it can use (such as .uk for the United Kingdom, .ca for Canada, etc.). Second, additional TLDs such as biz and info were added to the group that included com, org and edu. Both of these categories of additions to the Internet's TLD list are fully integrated and deemed "official" by the Internet Corporation for Assigned Names and Numbers (ICANN). Third, some companies such as New.Net created their own top-level domains. Users needed to install software from the company that allowed their Web browser to look up the IP addresses of machines in these unofficial TLDs. This means that machines within these domains can only be reached from computers with this extra software installed, and only via programs that the software works with. In turn, this means that (for example) you might not be able to send e-mail to users in these domains.
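
Picking out the TLD is a matter of taking the right-most label of a hostname. The Python sketch below also sorts TLDs into the categories described above. The `OFFICIAL_SAMPLE` set is a hypothetical, deliberately tiny stand-in for the real list of official TLDs, so "unofficial" here only means "not in this sample"; the two-letter rule for country codes is a heuristic.

```python
# Hypothetical, deliberately tiny sample of TLDs in the official (ICANN) root.
OFFICIAL_SAMPLE = {"com", "org", "net", "edu", "biz", "info", "uk", "ca", "tv"}


def top_level_domain(hostname):
    """Return the right-most label of a hostname, ignoring a trailing root dot."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def classify(hostname):
    tld = top_level_domain(hostname)
    if tld not in OFFICIAL_SAMPLE:
        # Not in our sample: like New.Net's TLDs, such names would need
        # extra client software in order to be resolved.
        return tld, "unofficial"
    # Heuristic: country code TLDs have two letters; generic TLDs are longer.
    return tld, ("country code" if len(tld) == 2 else "generic")


print(classify("www.cs.umd.edu"))      # ('edu', 'generic')
print(classify("www.example.co.uk"))   # ('uk', 'country code')
print(classify("www.example.tv"))      # ('tv', 'country code')
print(classify("fans.example.scifi"))  # ('scifi', 'unofficial')
```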

While the existence of a company such as New.Net might seem strange, ICANN's addition of new TLDs is a slow process, and the number of domains it adds is very small. New.Net offers TLDs that can be more descriptive (such as .travel or .scifi). It has been argued that descriptive domains like these may become less important as the number of sites continues to expand and people rely more and more on search engines such as Google to find sites rather than trying to guess or remember hostnames. The perceived importance of a particular TLD is similar to the desire for a 212 area code: 212 was assigned to New York City because major cities received the codes that were quickest to dial (on a rotary phone, 212 is the fastest valid area code to dial). An interesting business move made by the country of Tuvalu was to take advantage of the fact that its country code, tv, doesn't sound like a country code but rather like a "cool" TLD.


Ports and Protocols

Although many (if not most) of the machines on the Internet that run Web servers are named www.something, a machine's name does not determine what server(s) it runs. A single machine with a generic name can run many servers, including a Web server. In some ways, a computer's connection to the Internet resembles a telephone switchboard. Just as you could dial a central switchboard and ask to be connected to a certain extension, you can connect to a machine on the Internet and request to communicate with a certain port. Programs listen for incoming messages on specific ports, and these programs typically implement some protocol (or set of rules) corresponding to an Internet resource. There is a list of "well known" ports that should only be used for certain services (such as the World Wide Web); this list is coordinated by the Internet Assigned Numbers Authority (IANA). Some examples of "well known" port numbers are port 21 (ftp), port 23 (telnet) and port 80 (www).
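
The switchboard analogy amounts to a lookup from port number to the service conventionally found there. The Python sketch below hard-codes only the ports mentioned in this article (including port 20, which ftp uses for data transfers); it is an excerpt for illustration, and the authoritative, much longer list is IANA's registry. The table records convention only: nothing stops a program from listening on some other port.

```python
# Port numbers mentioned in this article; IANA's registry is far longer.
WELL_KNOWN_PORTS = {
    20: "ftp (data)",
    21: "ftp (control)",
    23: "telnet",
    80: "www (http)",
}


def service_for(port):
    """Return the service conventionally associated with a port, like asking
    a switchboard operator who answers at a given extension."""
    if not 0 <= port <= 65535:
        raise ValueError(f"{port} is not a valid TCP/UDP port number")
    return WELL_KNOWN_PORTS.get(port, "unknown")


print(service_for(21))    # ftp (control)
print(service_for(80))    # www (http)
print(service_for(8080))  # unknown (not in this small table)
```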

Each Internet service is typically defined by a set of protocols that specify how a client and server exchange information and commands. For example, the hypertext transfer protocol (http) defines how Web servers respond to requests from Web browsers. A client program can speak many different protocols, as many Web browsers do. The browser determines which protocol to use from the resource type (scheme) at the start of the Uniform Resource Locator (URL). If the URL begins with http:, the browser communicates using the hypertext transfer protocol; if it begins with ftp:, the browser communicates using the Internet file transfer protocol. A URL could request a protocol that the browser cannot speak. In this situation, the browser might be able to start another program for the user that does know how to communicate using this protocol. For example, if you enter a URL beginning with telnet: in most browsers, the browser will start a telnet program for you to interact with.
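
How a browser might route a URL to the right protocol handler can be sketched with Python's standard `urllib.parse.urlparse`, which splits a URL into its scheme, host, port and path. The tables of natively supported schemes and helper programs are hypothetical stand-ins for a real browser's configuration; the default ports are the well-known ones for http and ftp.

```python
from urllib.parse import urlparse

# Schemes this hypothetical browser speaks itself, with their well-known ports.
NATIVE_SCHEMES = {"http": 80, "ftp": 21}
# Schemes it hands off to an external helper program (hypothetical mapping).
HELPER_PROGRAMS = {"telnet": "telnet"}


def dispatch(url):
    """Decide how to handle a URL based on its scheme."""
    parts = urlparse(url)
    scheme = parts.scheme.lower()
    if scheme in NATIVE_SCHEMES:
        # An explicit :port in the URL overrides the well-known default.
        port = parts.port if parts.port is not None else NATIVE_SCHEMES[scheme]
        return ("native", scheme, parts.hostname, port)
    if scheme in HELPER_PROGRAMS:
        return ("helper", HELPER_PROGRAMS[scheme], parts.hostname, parts.port)
    return ("unsupported", scheme, parts.hostname, parts.port)


print(dispatch("http://www.umd.edu/"))
# ('native', 'http', 'www.umd.edu', 80)
print(dispatch("http://www.example.com:8080/"))
# ('native', 'http', 'www.example.com', 8080)
print(dispatch("telnet://host.example.edu"))
# ('helper', 'telnet', 'host.example.edu', None)
```

Keeping the scheme tables as data rather than branching code means adding support for another protocol (or another helper program) is a one-line change.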

This association between port numbers and resources is mostly internal. However, it can be useful when analyzing certain data. For example, according to records of NSFnet traffic, in January of 1993 about 47% of the data transmitted over the NSFnet was associated with ftp ports 20 and 21, while less than a tenth of a percent was associated with www port 80. By December of that year, about 41% of the data was associated with ftp ports, and www port 80 traffic had grown to over 2%. However, according to the same records, in January of 1993 16% of NSFnet traffic was going to ports that were not associated with well-known services, and in December of 1993 13% was going to these "unknown" ports. This raises the question of whether the jump in Web activity suggested here (from 0.002% to 2.213%) was due to more Web traffic or to Web servers being set to listen on port 80 rather than on other ports (such as 8080, a popular early choice).
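
The size of that jump is easier to judge with a little arithmetic on the quoted figures. The Python below simply restates the numbers from the text. These are shares of total traffic, so the ratio compares shares rather than absolute volumes (total NSFnet traffic also changed during 1993), and by itself it cannot distinguish new Web traffic from servers moving to port 80.

```python
# Shares of NSFnet traffic quoted in the text, in percent of transmitted data.
web_jan, web_dec = 0.002, 2.213  # www port 80, January and December 1993
ftp_jan, ftp_dec = 47.0, 41.0    # ftp ports 20 and 21 (the text's "about" figures)

growth_factor = web_dec / web_jan  # relative change in port 80's share
ftp_change = ftp_dec - ftp_jan     # absolute change, in percentage points

print(f"port 80 share grew by a factor of about {growth_factor:,.1f}")  # 1,106.5
print(f"ftp share changed by {ftp_change:+.1f} percentage points")     # -6.0
```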


Freedom of Speech and other interesting notions

The Internet has proven to be a very interesting arena for issues such as free speech. The Web provides a forum in which an individual has as much ability as a large organization to publish their views. Individuals and groups in more restrictive countries can also obtain Web space on sites in other countries (possibly on free sites such as GeoCities or Tripod). There are also techniques that allow anonymous posting to forums and newsgroups. Even so, certain challenges present themselves.

As an example, if someone writes a piece of software that breaks the copy protection on a DVD, and posting this software is illegal in the United States but not in some other country, what (if anything) can be done to that person if they post their code on a site whose servers reside in that other country? Similar questions arise regarding the export of certain technologies. If it is illegal to export some piece of information, what happens if you place it on a local Web site that is accessed by a user in a different country? At a technological level, a country can restrict contact with certain parts of the Internet by restricting which domains are allowed to pass through filters. However, this is not an easy task, and there are ways around many (if not most or all) of these techniques. It does, however, create an extra hurdle that will block many users. As an example of how a government can enlist others in restricting information, in January of 2001 the Chinese government ordered ISPs to screen private e-mail for political content. It also stated that it would hold ISPs responsible for subversive postings on their Web sites.
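
A name-based filter of the kind described above can be sketched as a check of a hostname against a blocklist of domains. In this Python sketch the blocklist is hypothetical, and matching is done on whole labels so that a blocked domain also covers its subdomains (just as www.cs.umd.edu lies inside umd.edu) without catching unrelated names that merely end in the same letters. It also hints at why such filters are easy to get around: a user who connects by IP address, or through a server elsewhere, never presents the blocked name at all.

```python
def is_blocked(hostname, blocked_domains):
    """True if hostname is, or lies inside, any domain in blocked_domains."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try every whole-label suffix, e.g. www.cs.umd.edu, cs.umd.edu, umd.edu, edu
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked_domains:
            return True
    return False


blocked = {"example.org"}  # hypothetical filter list

print(is_blocked("www.example.org", blocked))  # True: subdomain of a blocked domain
print(is_blocked("example.org.", blocked))     # True: trailing root dot ignored
print(is_blocked("notexample.org", blocked))   # False: not a whole-label match
print(is_blocked("192.0.2.10", blocked))       # False: a bare IP address slips through
```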

There are also questions of how much influence or control one country can have over another. In 2000, a French judge ordered Yahoo! to prevent French citizens from accessing auctions featuring Nazi memorabilia. There is also the potential for conflict between freedom of speech and freedom of religion. In 1995, the operator of an anonymizer service at anon.penet.fi (in Finland) was forced to reveal the name of the person who used a specific account to post information that was claimed to have been stolen from the Church of Scientology. Around a year and a half later the site shut down due to several more legal actions requesting the real names of users of the system. The Electronic Frontier Foundation (EFF) has an entire page about the Church of Scientology and issues of free speech online.


War, revolution and cutting class...

As was mentioned earlier, one of the claims made of the Internet is that it would allow communications to continue in the event of a nuclear war. Though we will hopefully never discover the validity of that claim, the Internet has served as a communications channel in times of war and peril. It is also starting to be considered for other communication needs, such as public alerts and notifications.

During the Gulf War in 1991, many information updates were sent from Relay chat users in the Middle East to other users across the world. During the war in Yugoslavia in 1999, information continued to flow over the Internet, and citizens gathered virtually in chatrooms during attacks. Internet and mobile phone use was credited with helping to mobilize people in the movement to remove then-President Joseph Estrada of the Philippines from office; one page referred to it as "spam democracy". Web pages were used as a powerful communications medium to spread word of the Zapatistas' plight and plans in Mexico. In the aftermath of the terrorist attacks on the World Trade Center, although phone lines (both landline and cellular) were overwhelmed, many people were able to contact friends and family in the area using e-mail, IM and text-messaging systems such as BlackBerry PDAs and Short Message Service (SMS). In 2003, thousands of listeners used text messaging and e-mail to express their opinions on the invasion of Iraq to the BBC World Service.

These communications channels are now being considered for other information distribution and contact purposes. After rumors spread that Hong Kong would be declared an "infected city" (a rumor that ironically started online), the government sent SMS messages to roughly 6 million phones informing their owners of the hoax. A company in England called Truancy Call uses text messaging as one way to alert parents if their child is not present at school that morning. However, these uses of SMS have many potential problems. Just as Internet users are vulnerable to e-mail and Web page hoaxes, so too are SMS users vulnerable to text-message hoaxes. The ability to verify the sender is crucial in situations such as these.


Sure we can do it, but should we?

There are a variety of things we can do with the technology available today. A large question is whether we should be doing them. Archives of newsgroup posts are an interesting case. Many people were unaware that these archives were being kept, and when DejaNews put parts of them online with a search interface, it came as a surprise to many. When Google acquired this archive from DejaNews, it added the ability to request the removal of something you had posted. This has raised a number of questions. At the technical level, many users have changed their Internet accounts since posting messages, so there is a question of how to prove you are the original poster. At the level of newsgroup dynamics, there is a question of how to deal with posts by other users who responded to the post you want removed, some of which quote the text of your original post. At the historical level, there is the question of whether this revision of the historical record should be allowed at all.

Search engines in general raise copyright questions. To be a useful search tool, a search engine gathers information from Web documents to build its database, and then shows partial contents of the pages that match the user's search. Image search engines have been the focus of much attention due to questions of whether their search results infringe on the copyrights of the images' creators. Google shows you both the image that matched your search query and the URL of the page that displays the image. If you click on the returned image, Google takes you to a page that shows the image in the top frame and the page on which it was found in the lower frame. The designer of a Web page or Web site can indicate to search engines that certain documents (or entire parts of the site) should not be included in their databases.
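
One widely used mechanism for this is the robots exclusion convention: a plain-text robots.txt file at the root of a site lists paths that crawlers are asked not to visit, and compliance is voluntary on the crawler's part. Python's standard library includes a parser for these files, `urllib.robotparser.RobotFileParser`. In the sketch below the rules and the crawler name ExampleBot are hypothetical, and the rules are parsed from a string rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to skip the /private/ area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
# Mark the rules as loaded; can_fetch() answers False until a load time is recorded.
parser.modified()

print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("ExampleBot", "http://www.example.com/private/a.html"))  # False
```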

Another area in which fair use and copyright are being called into question is the trading of audio and video files over the Internet. The trading of files online dates back to at least the early BBSs, but it did not attract much notice until bandwidth (the amount of information that can be transferred over the network in a given amount of time) increased. An audio file trading service called Napster became the focus of much legal attention as users from across the Internet began trading MP3 sound files (typically music) through it. MP3 files are compressed versions of sound files; they are about 10% of the size of the same audio as stored on a compact disc, without a proportional drop in quality. Although Napster was forced to shut down and reorganize its system in an attempt to protect artists' copyrights, the debate over whether this type of music trading constitutes fair use continues. Even with the changes to this service, there were (and still are) many resources on the Internet where MP3s can be found. Additionally, network bandwidth has continued to increase, and the collection of files available for download now includes movies and television programs. The coming years will see the questioning, and possibly the rewriting, of many laws regarding copyrighted works. It is important to note that these are not new issues but merely new phrasings of old questions (the questions were raised and dealt with when personal tape decks and VCRs became household items).
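
The "about 10%" figure can be checked with simple arithmetic. The Python below uses the standard parameters of uncompressed CD audio and assumes an MP3 bitrate of 128 kbit/s; that bitrate is an assumption, since MP3s are encoded at many bitrates and the ratio scales accordingly.

```python
# Standard CD audio: 44,100 samples per second, 16 bits per sample, 2 channels.
cd_bits_per_second = 44_100 * 16 * 2  # 1,411,200 bit/s
# Assumed MP3 bitrate; MP3s are encoded at many rates, 128 kbit/s being a common one.
mp3_bits_per_second = 128_000

ratio = mp3_bits_per_second / cd_bits_per_second
print(f"CD audio: {cd_bits_per_second / 1000:.1f} kbit/s")        # 1411.2 kbit/s
print(f"a 128 kbit/s MP3 is {ratio:.1%} of the CD audio's size")  # 9.1%
```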


What's next?

As ubiquitous computing devices become commonplace, new versions of old questions will be raised once again. Because of the smaller size and limited abilities of many of these devices, Web pages will need to be altered in order to be displayed on them. If companies make these alterations, will they be violating the copyrights of the Web page creators?

We are already seeing a great increase in the ability to track individuals using technology. Employers have the technical ability (and in many if not most cases the legal ability) to capture every keystroke their employees type and every message they send. Centralized services such as AIM have the ability to monitor and record communications through their service (there was a brief hoax where it was announced that AOL had been keeping copies of all AIM messages and that Google had obtained this archive and had made it available via their search engine).

The Internet itself provides no real security for information sent across it, and as wireless networks have become more prevalent, this lack of inherent security has become better known. Many companies currently discourage employees from discussing internal corporate matters over public services such as AIM. With wireless networks, however, even internal communications have the potential to be monitored. As people begin to set up wireless networks at home, how many realize that their neighbors could be using their service or even monitoring their activities? Again, it is important to note that this is no different from using a cordless phone that others may be able to monitor; it is simply an existing issue brought back to the forefront by its rediscovery with new technologies.








This page last modified on Saturday, 06-Sep-2003 20:22:55 EDT.