Application Presentation:
Internet Traffic Measurement Visualization
Nada Golmie
Data Set Description
The data set used in this project consists of Round Trip Time (RTT) and Loss measurements on a number of Internet packet paths. It is available from the National Laboratory for Applied Network Research (NLANR), which conducts performance measurements and traffic analysis of several NSF High Performance Connections sites in order to better understand the service models and metrics of the Internet. These include both passive and active measurements: passive measurements are mainly based on the analysis of packet header traces, while active measurements probe information from participating servers. There are about 80 monitors at various sites collecting data (Figure 1). Every minute, each machine on the list is pinged once, and the results are collected and stored. Raw data is available through a query mechanism from the Measurement and Operation Analysis Team (MOAT) Active Measurement Program (AMP) site. The data is tabulated, can be obtained in text form, and is indexed by day and source monitor. The measurements include Round Trip Time (Min, Max, Mean) and Loss to the 79 other sites. To gain additional insight into the data, I added three fields to the tabular display: (1) the route, (2) the source location, and (3) the destination location, as shown in Figure 2.
Figure 1

Figure 2
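The per-day Min, Max, Mean and Loss figures can be read as a simple reduction over the once-a-minute ping results, so one pair contributes up to 24 × 60 = 1,440 samples per day. The Python sketch below makes two assumptions: each ping is recorded as an RTT in milliseconds, and a lost ping is recorded as `None`. The function name and input encoding are hypothetical. This is not AMP's actual processing code.

```python
from typing import Optional, Sequence


def summarize_pings(samples: Sequence[Optional[float]]) -> dict:
    """Reduce one day of per-minute pings for a single source/destination pair.

    Each sample is an RTT in milliseconds, or None if no reply came back
    (an assumed encoding, for illustration only).
    """
    if not samples:
        raise ValueError("no samples")
    replies = [s for s in samples if s is not None]
    # Loss is the share of pings that got no reply.
    loss_pct = 100.0 * (len(samples) - len(replies)) / len(samples)
    if not replies:
        # Every ping was lost, so RTT statistics are undefined.
        return {"rtt_min": None, "rtt_max": None, "rtt_mean": None, "loss_pct": loss_pct}
    return {
        "rtt_min": min(replies),
        "rtt_max": max(replies),
        "rtt_mean": sum(replies) / len(replies),
        "loss_pct": loss_pct,
    }
```

For example, `summarize_pings([80.0, 90.0, None, 100.0])` gives a Min of 80 ms, a Max of 100 ms, a Mean of 90 ms and 25% loss. Note that the mean is taken over answered pings only.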
Typical Visualization: Gnuplot/Excel
This type of data set is usually visualized as a 2-D graph produced with gnuplot (or Excel), such as the plot in Figure 3. It shows the RTT in milliseconds (y-axis) as a function of the day of the year (x-axis) for a single source (here the University of Alaska) and a single destination (Boston University).
For the chosen data set there are 80 × 80 = 6,400 source and destination pairs (6,320 if a site is not paired with itself). To gain any insight into RTT, we would need to look at thousands of such plots, which makes this representation not very effective.
Figure 3
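Producing one such plot means extracting a single time series from the tables. The minimal sketch below assumes the daily records have already been parsed into `(day_of_year, source, destination, rtt_mean_ms)` tuples; that layout and the function name are hypothetical. It emits the two whitespace-separated columns that gnuplot reads.

```python
def gnuplot_series(records, source, destination):
    """Return 'day rtt' lines for one source/destination pair, sorted by day.

    records: iterable of (day_of_year, source, destination, rtt_mean_ms)
    tuples (an assumed layout, not the AMP table format).
    """
    rows = sorted(
        (day, rtt)
        for day, src, dst, rtt in records
        if src == source and dst == destination
    )
    # One line per day: column 1 is the day, column 2 the Mean RTT in ms.
    return "".join(f"{day} {rtt:.2f}\n" for day, rtt in rows)
```

After writing the result to a file, a command such as `plot "pair.dat" using 1:2 with lines` in gnuplot draws the RTT curve. Repeating this for every pair is the scaling problem noted above.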
A More Interesting Visualization: Cichlid
A more interesting visualization of this data set is obtained with Cichlid, an experimental 3D visualization tool developed and maintained by MOAT. It was created in September 1998 to visualize the IP address space utilization of the Super Computing '98 conference network in real time. Cichlid's main features include real-time 3D display, animation, and point-and-click user interaction, which let the user visualize and interact with real-time data sets in 3D. It was designed with remote data generation and machine independence in mind. Data is transmitted via TCP from any number of sources (data servers) to the visualization code (the client), which displays them concurrently. It is written in C using the OpenGL and GLUT graphics libraries and is publicly available. Although Cichlid can be a rather powerful and flexible visualization tool, it has very little in the way of a user interface (UI). To make it run with my data set, I had to write my own data server based on the example servers provided with the distribution. That turned out to be a rather challenging experience.
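Cichlid's actual wire protocol is defined by the example servers in its distribution and is not described here. The sketch below only illustrates the general pattern of a data server streaming records over TCP to a visualization client. It uses Python's standard `socketserver` module and an invented one-record-per-line text format. None of the names or the format come from Cichlid.

```python
import socket
import socketserver
import threading

# Hypothetical record format: "source destination rtt_mean_ms" per line.
# This is NOT Cichlid's protocol; it only illustrates the server -> client idea.


def make_handler(records):
    class RecordHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # Stream every record as one newline-terminated line, then return,
            # which lets the server close the connection.
            for src, dst, rtt in records:
                self.wfile.write(f"{src} {dst} {rtt:.1f}\n".encode("ascii"))

    return RecordHandler


def start_data_server(records, host="127.0.0.1", port=0):
    """Start a threaded TCP data server; port 0 lets the OS pick a free port."""
    server = socketserver.ThreadingTCPServer((host, port), make_handler(records))
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_records(host, port):
    """Client side: connect, read lines until EOF, and parse them into tuples."""
    out = []
    with socket.create_connection((host, port)) as sock, sock.makefile("r", encoding="ascii") as stream:
        for line in stream:
            src, dst, rtt = line.split()
            out.append((src, dst, float(rtt)))
    return out
```

A client would call `fetch_records(*server.server_address)` and hand the tuples to its renderer. Keeping generation and display in separate processes is what allows several data servers to feed one display.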
The result is a 3D image (Figure 4) that associates an RTT with each source and destination pair. This type of display can be useful for troubleshooting, for example to spot the largest delay between two sites. However, it suffers from the occlusion and clutter usually associated with visualizing large data sets. In addition, this representation loses the geography associated with the sites, so you still have to look at a map to locate the sites and draw conclusions. It is also extremely difficult to correlate the different statistics collected, such as RTT and loss, with each other or with the source and destination locations.
Figure 4
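The display amounts to a height field over a source × destination grid. The sketch below builds that grid and performs the "largest delay" lookup that the 3D view makes easy to eyeball. It assumes records already reduced to `(source, destination, rtt_mean_ms)` tuples, a hypothetical layout. It is not Cichlid's code, and like the display itself, the grid carries no geographic information.

```python
def rtt_matrix(records):
    """Build a source x destination matrix of mean RTT (ms).

    records: iterable of (source, destination, rtt_mean_ms) tuples
    (hypothetical layout). Pairs with no measurement stay None.
    """
    records = list(records)
    sites = sorted({s for s, _, _ in records} | {d for _, d, _ in records})
    index = {site: i for i, site in enumerate(sites)}
    matrix = [[None] * len(sites) for _ in sites]
    for src, dst, rtt in records:
        matrix[index[src]][index[dst]] = rtt
    return sites, matrix


def largest_delay(sites, matrix):
    """Return (source, destination, rtt) for the largest RTT in the matrix."""
    return max(
        (
            (sites[i], sites[j], rtt)
            for i, row in enumerate(matrix)
            for j, rtt in enumerate(row)
            if rtt is not None
        ),
        key=lambda cell: cell[2],
    )
```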
An Unconventional Visualization: Spotfire
For this project, my main objective (other than looking at a "cool" 3D visualization) was to find correlations and trends in the data collected. Some of the questions I had in mind were:
- Does RTT depend on the distance traveled? On the geographic area?
- Where are the bottlenecks in the network?
- What is the smallest RTT from a particular source? To a particular destination?
- Is there any correlation between packet loss and RTT?
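The last question can also be checked numerically alongside the visual inspection. The minimal sketch below computes the Pearson correlation coefficient between two equal-length sequences, such as per-route Mean RTT and loss percentage; the function name is illustrative. Pearson's coefficient only measures linear association, so a value near zero does not rule out other kinds of relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

Applied to per-route `(rtt_mean, loss)` pairs, a value close to +1 would suggest that high-delay routes also tend to be lossy.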
I expected that using Spotfire to visualize a network data set would be rather unusual, since Spotfire has no built-in functionality to recognize the inherent relationships between the different elements of the data. Most existing network visualization tools, such as the commercial package netViz or SeeNet [1][2] developed at Bell Labs, tend to focus on the structure of the data and the relationships between the nodes rather than on the data itself and its associated statistics. Usually, the geographic placement of the nodes, which represents the physical network, is the dominant element of the display. The statistics associated with the network structure are usually handled through dynamic and interactive control mechanisms, such as those in the systems described by Eick and Wills [1] and Becker et al. [2].
Here are a few sample screen shots taken while manipulating
the data with Spotfire.
Loss - RTT Mean Relationship
The scatter plot in Figure 5 shows the loss percentage as a function of Mean RTT. A third and a fourth dimension of the data are shown by using color and size coding for the source location and Min RTT, respectively. From the figure we can make the following general observations. (1) Routes originating in the NE (red) have relatively low loss (below 20%). (2) Routes originating in the NW (blue) generally have the highest RTT (around 200 ms) and the highest loss percentage (above 20%). A more in-depth analysis is possible by zooming in on the details with the interactive control panel. For example, we could isolate the routes originating in the NW from the rest of the data and look at their Min RTT, Max RTT, and Loss.
Figure 5
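Spotfire performs this filtering interactively, but the same query can be written offline to cross-check what the control panel shows. The sketch below assumes each route is a dict with the hypothetical keys `source_region`, `rtt_min`, `rtt_max` and `loss_pct`. These are not Spotfire column names.

```python
def summarize_region(routes, region):
    """Summarize Min RTT, Max RTT and loss for routes whose source is in `region`.

    routes: list of dicts with the hypothetical keys 'source_region',
    'rtt_min', 'rtt_max' and 'loss_pct'. Returns None if nothing matches.
    """
    selected = [r for r in routes if r["source_region"] == region]
    if not selected:
        return None
    return {
        "routes": len(selected),
        "rtt_min": min(r["rtt_min"] for r in selected),
        "rtt_max": max(r["rtt_max"] for r in selected),
        "mean_loss_pct": sum(r["loss_pct"] for r in selected) / len(selected),
    }
```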
Source - Destination - RTT Mean
Figure 6 is a 3D scatter plot that shows the distribution of Mean RTT with respect to the source and destination locations. The color and size coding are the same as in Figure 5. Here we observe that routes originating in the SW (black) have a Mean RTT of around 100 ms to almost all destinations. We also note that the Mean RTT for routes originating in the NE (red) and ending in the SE is smaller than for routes starting in the NE and ending in the SW.
Figure 6
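The comparison between NE-to-SE and NE-to-SW routes amounts to comparing two cells of a table of Mean RTT grouped by source region and destination region. The sketch below assumes routes reduced to `(source_region, destination_region, rtt_mean_ms)` tuples, a hypothetical layout, and averages each group.

```python
from collections import defaultdict


def mean_rtt_by_region(routes):
    """Average Mean RTT (ms) per (source region, destination region) pair.

    routes: iterable of (source_region, destination_region, rtt_mean_ms)
    tuples (a hypothetical layout).
    """
    totals = defaultdict(lambda: [0.0, 0])  # (region pair) -> [sum, count]
    for src, dst, rtt in routes:
        acc = totals[(src, dst)]
        acc[0] += rtt
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```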
Path - Destination - RTT Mean
Figure 7

Figure 8
Figure 7 is a much busier 3D scatter plot representing all routes with respect to destination location and Mean RTT. The size coding is set according to the loss percentage and the color coding according to the source location. We note that the display is dominated by paths originating in the NE (mostly red). Paths originating in the NW are split into two groups: one with relatively low loss but higher delays, and one with higher loss but lower delays. Since this 3D display contains close to 4,300 data points, it is not surprising that it suffers from occlusion and clutter. In Figure 8, only routes from and to Georgetown University are shown. We observe that the Mean RTT from NW sites (blue) is split into two groups: one (from and to California) well below 100 ms, and another (from and to Alaska) around 200 ms.
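The Figure 8 view combines two operations. First, routes are kept only if they start or end at one site. Second, the remaining routes are separated by delay. The sketch below assumes `(source, destination, rtt_mean_ms)` tuples. Site labels such as `"Georgetown"` are placeholders. The threshold is a hypothetical cut-off chosen by eye, not a value derived from the data.

```python
def split_by_rtt(routes, site, threshold_ms):
    """Keep routes that start or end at `site`, then split them by Mean RTT.

    routes: iterable of (source, destination, rtt_mean_ms) tuples.
    threshold_ms: hypothetical cut-off separating low- and high-delay groups.
    """
    touching = [r for r in routes if site in (r[0], r[1])]
    low = [r for r in touching if r[2] < threshold_ms]
    high = [r for r in touching if r[2] >= threshold_ms]
    return low, high
```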
Comments
I found the interactive control panel provided by Spotfire extremely useful, user-friendly, and very intuitive. It provided a "nice" platform for manipulating the data and looking for rather hidden aspects and relationships. Although Spotfire was not developed with network data in mind, it was the right tool given the type of questions I wanted to answer. Spotfire cannot really be compared to Cichlid, where interactive control is limited to rotating the display in 3D and one has to program an interface for each data set.
I had relatively few problems using Spotfire. The main one was that it crashed often, especially when saving the workspace or exporting the display. Also, the Edit/Properties menu button was disabled in the version I used.
In terms of suggested improvements, it would be nice if Spotfire offered more flexibility for manipulating the original data set. Examples include creating new categories from existing ones (i.e., deriving new columns from existing columns) or even creating subcategories (some hierarchical ordering of the data). In Excel, this requires writing macros, which are rather cumbersome. I ended up writing a Perl script to reformat the original data and add new fields.
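The author's reformatting was done in Perl. The sketch below shows the same idea of deriving new columns from existing ones in Python; it is not a reproduction of that script. It assumes a hypothetical whitespace-separated row layout `source destination rtt_min rtt_max rtt_mean loss_pct` and a hypothetical site-to-region lookup table. It also treats the route field as a `source->destination` label, which is only one possible reading of that field.

```python
# Hypothetical site -> region table; a real mapping would come from the
# list of monitor sites and their locations.
SITE_REGION = {"SITE_A": "NE", "SITE_B": "NW", "SITE_C": "SW"}


def add_derived_fields(line, site_region=SITE_REGION):
    """Append route, source-region and destination-region columns to one row.

    Assumes rows of the form 'source destination rtt_min rtt_max rtt_mean loss_pct'
    (hypothetical). Unknown sites are labeled 'UNKNOWN'.
    """
    fields = line.split()
    src, dst = fields[0], fields[1]
    fields += [
        f"{src}->{dst}",                     # route label
        site_region.get(src, "UNKNOWN"),     # source location
        site_region.get(dst, "UNKNOWN"),     # destination location
    ]
    # Tab-separated output imports cleanly into most table-based tools.
    return "\t".join(fields)
```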
References
1. S. G. Eick and G. J. Wills. Navigating large networks with hierarchies. In Proc. IEEE Visualization '93, 1993.
2. R. A. Becker, S. G. Eick, and A. R. Wilks. Visualizing network data. IEEE Transactions on Visualization and Computer Graphics, 1(1):16-28, March 1995.