Efficient engineering of the Next Generation Internet demands automated monitoring and diagnosis techniques based on a good understanding of the network's dynamic characteristics, including the intensity and time-variation of traffic and the resulting delays at congested routers, and the extent of packet loss, reordering, and duplication at routers. Any effective network monitoring and diagnosis tool must operate at a fine-grained time scale over long durations; otherwise it will miss important phenomena and reach erroneous conclusions. At the same time, the monitoring must have low overhead; otherwise it will significantly perturb the very phenomena being observed and may not be usable in practice.
B. Motivation for using fine-grained time scale
Detailed traffic and performance measurement and analysis has been essential to identifying and ameliorating network problems. There have been several studies of Internet round-trip delays, but their pinging period is one second or more, which provides only coarse-grained information. Do measurements taken once a second (or less often) give an accurate picture of network dynamics? For example, in the following graph the pinging period is one second:
For the same site, during the same time interval, with a pinging period of 40ms we get the following graph:
As is evident from the two graphs above, medium-grain (seconds/minutes) measurements do not give reasonable estimates of round-trip times or of network performance; we therefore need fine-grain (milliseconds) measurements.
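The effect of the probing period can be illustrated with a small sketch. The delay series below is synthetic (a 30 ms baseline with a 200 ms congestion episode), not measured data; the names and numbers are ours, chosen only to show how a 1 s probing period steps over a burst that 40 ms probing catches:

```python
# Hypothetical illustration: sampling a synthetic delay process at a
# 1 s period versus a 40 ms period. Not NetDyn code or measured data.

def delay_ms(t_ms: int) -> float:
    """Synthetic delay: 30 ms baseline, plus a congestion episode
    lasting 200 ms (t = 2500..2700 ms) that adds 120 ms of queueing."""
    base = 30.0
    if 2500 <= t_ms < 2700:
        return base + 120.0
    return base

# Fine-grained probing (40 ms period) samples inside the burst ...
fine = [delay_ms(t) for t in range(0, 10_000, 40)]
# ... while coarse probing (1 s period) misses it entirely.
coarse = [delay_ms(t) for t in range(0, 10_000, 1000)]

print(max(fine))    # -> 150.0 (the burst is observed)
print(max(coarse))  # -> 30.0  (the burst is invisible)
```

Any congestion episode shorter than the probing period can vanish this way, which is why coarse measurements can suggest a healthy path that is in fact repeatedly congested.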
C. Monitoring and diagnosis tool used
In 1992, we developed a revolutionary approach to extracting network dynamics, based upon a fine-grained, low-overhead probing tool called NetDyn and a deterministic network model for interpreting the probe data.
Through extensive experimentation using NetDyn, we have discovered that the end-to-end dynamic characteristics of the Internet vary enormously at both fine-grained time scales (on the order of milliseconds) and long-duration time scales (on the order of minutes and hours). This is a result of datagram routing, and hence will hold for the Next Generation Internet as well. Hence, effective monitoring must span both time scales.
D. Studies to Date
The following table links to the results of experiments conducted in 1992/1993 and 1997/1998.
E. Current Challenges
The deterministic model used in the analysis of the data presented above assumed in-order processing of data with no loss, reordering, or duplication. In practice, we do encounter these anomalies. The first step is to extend the deterministic model to account for packet losses, reorderings, and duplications. This requires assumptions about the causes of these anomalies, to be verified by controlled experimentation, simulation, or testbed experiments on logs collected from the Internet.
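Before such anomalies can be modeled, they must be detected in the probe logs. The sketch below is our own illustration (not NetDyn's code) of classifying losses, reorderings, and duplications from the sequence numbers observed at the receiver, assuming probes were sent with consecutive sequence numbers 0..n_sent-1:

```python
# A minimal sketch (our own, not the project's implementation) of
# classifying probe anomalies from receiver-side sequence numbers.

def classify(received: list[int], n_sent: int) -> tuple[int, int, int]:
    """Return (losses, reorders, duplicates) for a probe log, where
    `received` lists sequence numbers in arrival order and the sender
    emitted sequence numbers 0..n_sent-1 exactly once each."""
    seen: set[int] = set()
    duplicates = 0
    reorders = 0
    highest = -1                 # highest sequence number seen so far
    for seq in received:
        if seq in seen:
            duplicates += 1      # same probe delivered twice
            continue
        seen.add(seq)
        if seq < highest:
            reorders += 1        # arrived after a later-sent probe
        highest = max(highest, seq)
    losses = n_sent - len(seen)  # sent but never seen
    return losses, reorders, duplicates

# Example: probe 2 lost, probe 3 arrives after 4, probe 5 duplicated.
print(classify([0, 1, 4, 3, 5, 5], n_sent=6))  # -> (1, 1, 1)
```

Counting rules like these are themselves modeling choices (e.g., a very late arrival could be counted as a reorder or as a loss plus a duplicate path), which is exactly why the assumed causes need to be verified experimentally.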
The next step is to develop a monitoring and diagnosis tool that probes an area of the Internet by using multiple concurrent connections between multiple host pairs spanning the area of interest. There are two aspects to this. One is the development of techniques to obtain a common time base for all the involved hosts from the logs of all the probing connections. The other is the development of appropriate deterministic models to interpret the observed data. Basically, each connection corresponds to a sequence of routers in tandem, with some of the routers shared by more than one connection. The resulting extended delay correlations would be extracted from the measurements by post-processing, using techniques similar to those of tomography.
F. Concluding Remark
In an Internet environment where traffic, technology, and topologies change fast, fine-grain (milliseconds) measurements are significantly different from medium-grain (seconds/minutes) and coarse-grain (tens of minutes/hours) measurements. Consequently, fine-resolution measurements are required to understand the true nature of the network dynamics. NetDyn permitted us to observe the time-dependent behavior of a connection at the user level. Simple observations combined with realistic deterministic models give good insight (even though a lot more needs to be done). By observing multiple connections concurrently, we may be able to pinpoint the exact locations of problems. Also, by monitoring resource misuse, we may be able to obtain a first estimate of intrusion problems.