Visualizing and Analyzing Web Browser History Data using Eureka and SpotFire
CMSC 838B: Information Visualization Application Project
Matthias Mayer
Research Visitor at HCIL
mayer@informatik.uni-hamburg.de
March 6th, 2001
Motivation
About 80% of all accessed web pages have been visited before by the same user
(Cockburn & McKenzie 2001). Revisiting these pages,
however, is no easy task. Today, Back buttons and Favorites are used to revisit
pages in current or previous sessions. These mechanisms could be enhanced with
better-suited interfaces. The patterns with which users access web
pages should help determine what these new interfaces should look like.
This small project uses two commercial data visualization tools to
find patterns in my own web browsing behaviour. For this purpose I use a Netscape history
file that documents about 7 months of my browsing activities. Visualization
is thus used here to reveal characteristics after browsing has already taken place,
not to support the user during a session.
In my PhD project I am developing a tool that visually supports web users during their
sessions by showing their history. More about these 'Browsing Icons' can be
found on the web (Mayer 2001).
Questions
What questions must be answered in order to develop better tools for revisiting
web pages?
Some questions are:
- How often are pages revisited and in what intervals?
- Which pages are visited often?
- Can we characterize sessions? E.g. how often is just the last page important
to the user?
Data
- I used a Netscape Communicator 4.7 history file, produced on my laptop under
Windows98.
- Data cover 7 months: Oct. 16th 1999 - May 15th 2000.
- 5096 web pages were accessed (rows in the file).
I wrote a simple Java application to convert the Netscape file into input for
Eureka and SpotFire. This step includes the calculation of derived data such
as the duration between first and last visit. Each row in the file contains information
about one URL:
| index | simple line count |
| URL | the URL of the page |
| Title | the title of the page |
| Kind | I assigned each entry one of the following kinds: file (on my laptop); server homepage; query; comp. sci. department site; our lab's website; secure accesses; unknown |
| Host | the host part of the URL |
| FirstVisit | date of the first visit |
| FirstDaytime | time of day of the first visit |
| LastVisit | date of the last visit |
| LastDaytime | time of day of the last visit |
| DiffFirstLast 1 | difference between first and last visit in milliseconds |
| DiffFirstLast 2 | difference between first and last visit in days, hours, minutes |
| numOfVisits | number of visits to this page |
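The two derived duration columns can be sketched as follows. This is a minimal Python illustration, not the Java converter used in the project; the function name `diff_first_last` and the text format of the second column are assumptions.

```python
from datetime import datetime, timedelta

def diff_first_last(first_visit: datetime, last_visit: datetime):
    """Return the first-to-last-visit gap as (milliseconds, 'Nd Nh Nm' text)."""
    # Integer division by a one-millisecond timedelta gives whole milliseconds.
    millis = (last_visit - first_visit) // timedelta(milliseconds=1)
    # Break the absolute gap into whole days, hours and minutes for display.
    total_minutes = abs(millis) // 60_000
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    sign = "-" if millis < 0 else ""  # negative gaps hint at clock changes
    return millis, f"{sign}{days}d {hours}h {minutes}m"

# Hypothetical row: first visit Oct 16th 1999 09:00, last visit Oct 18th 1999 10:30.
ms, text = diff_first_last(datetime(1999, 10, 16, 9, 0),
                           datetime(1999, 10, 18, 10, 30))
print(ms, text)
```

Keeping the sign in the text column makes rows with a last visit before the first visit easy to spot later on.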
Visualization Tools
I used InXight Eureka 1.1 (www.inxight.com,
formerly Table Lens) and SpotFire 5.1 (www.spotfire.com).
Restrictions
There are several major restrictions due to the kind of data:
- The data lack certain important information because Netscape doesn't store
it:
the date and duration of each single visit,
the widget by which the page was loaded,
the window in which it was displayed, etc.
- Just one person's data, thus highly biased.
- I used another machine in my office as well, so data are missing.
- I sometimes changed the time on my computer, so there are errors in the
time fields.
- These restrictions show the necessity for a well planned study with modified
browsers and a larger number of users. This small project just tries to find
out to what extent the tools would support the analysis of a better data set.
Selected Visualizations
Fig. 1a Eureka
Fig. 1b SpotFire
Getting rid of errors: An accidentally inserted outlier was obvious in both
tools at first sight (one data point was dated early 2001).

Fig. 2
Fig. 2 shows the data first sorted by kind, then by number of visits. Findings:
- More than half of all web pages were just visited once (see right column).
- Few pages show extremely high numbers of visits (maximum: 456, then 86,
80, 74, ..., decreasing logarithmically).
- All 'kinds' show this pattern.
- The negative differences between the visit dates point to an error in the
data: I sometimes changed the date on my machine in order to circumvent expiration
dates...
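Both kinds of error seen so far (an entry dated outside the collection period and a last visit that precedes the first visit) could also be flagged before loading the data into either tool. A minimal sketch, assuming each row is a dict with `FirstVisit` and `LastVisit` as `datetime` values; the helper name `suspicious` and the example rows are hypothetical.

```python
from datetime import datetime

# Collection period as stated in the Data section.
START = datetime(1999, 10, 16)
END = datetime(2000, 5, 16)  # exclusive bound: the end of May 15th 2000

def suspicious(row):
    """Return a list of reasons why a history row looks erroneous."""
    reasons = []
    if row["LastVisit"] < row["FirstVisit"]:
        reasons.append("last visit precedes first visit")
    for field in ("FirstVisit", "LastVisit"):
        if not (START <= row[field] < END):
            reasons.append(f"{field} outside collection period")
    return reasons

rows = [  # hypothetical rows, not taken from the real history file
    {"URL": "http://a.example/", "FirstVisit": datetime(1999, 11, 1),
     "LastVisit": datetime(2000, 1, 5)},
    {"URL": "http://b.example/", "FirstVisit": datetime(2000, 3, 1),
     "LastVisit": datetime(2000, 2, 1)},
    {"URL": "http://c.example/", "FirstVisit": datetime(2001, 1, 10),
     "LastVisit": datetime(2001, 1, 10)},
]
flagged = {r["URL"]: suspicious(r) for r in rows if suspicious(r)}
for url, reasons in flagged.items():
    print(url, reasons)
```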

Fig. 3
Fig. 3 shows the data sorted by 'number of visits' and then by 'kind'. Findings:
- All kinds contain both rarely and frequently visited pages (compare two rightmost
columns).
- Our institute's site (third 'kind' from bottom) shows the highest percentage
of pages with a large number of visits.
- Just a few of the pages that were visited often were also visited over a
long period of time (compare orange line and right column).

Fig. 4
Fig. 4 shows a SpotFire Scatterplot. Mapping: x - number of visits, y - difference
between visits (in 'artificial' dates beginning at 1.1.1970), size - number
of visits, color - kind, markers are jittered. The homepage would be further
on the right. Findings:
- As in Eureka it was obvious that there were just a few pages which were
visited very often or over a long period of time.
- Counting single objects or looking at content was much easier than in Eureka.
- It was harder to estimate proportions due to the large number of hidden items.
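The 'artificial' dates on the y axis presumably arise because a duration in milliseconds, when read as a timestamp, is counted from 1.1.1970. The following sketch shows that interpretation; that SpotFire handles the column exactly this way is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def duration_as_date(diff_ms: int) -> datetime:
    """Interpret a duration in milliseconds as a date counted from 1.1.1970."""
    return EPOCH + timedelta(milliseconds=diff_ms)

# A page first and last visited 30 days apart shows up as January 31st 1970.
thirty_days_ms = 30 * 24 * 60 * 60 * 1000
print(duration_as_date(thirty_days_ms).date())
```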

Fig. 5
Fig. 5 shows a SpotFire Histogram View. Mapping: x - host, y - sum of entries
with this host. Findings:
- Queries are counted as different pages, thus the high green and black columns.
They should be eliminated to explore the rest.
- A lot of different pages were visited on java.sun.com and our lab's site.
- Good: clustering of items (Eureka just provides grouping, but still keeps
single items).
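The per-host counts behind this histogram, with the query entries removed, can be approximated outside SpotFire. A minimal sketch, assuming each row is a dict with `URL` and `Kind` keys as in the Data section; the function name, the `"(no host)"` label and the example rows are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

def pages_per_host(rows, skip_kinds=("query",)):
    """Count history entries per host, leaving out the given kinds."""
    counts = Counter()
    for row in rows:
        if row["Kind"] in skip_kinds:
            continue
        # Local file URLs have no host part, so give them a placeholder label.
        host = urlsplit(row["URL"]).hostname or "(no host)"
        counts[host] += 1
    return counts

rows = [  # hypothetical entries, not taken from the real history file
    {"URL": "http://java.sun.com/docs/index.html", "Kind": "unknown"},
    {"URL": "http://java.sun.com/products/", "Kind": "unknown"},
    {"URL": "http://search.example/find?q=table+lens", "Kind": "query"},
    {"URL": "http://search.example/find?q=spotfire", "Kind": "query"},
    {"URL": "file:///C:/notes.html", "Kind": "file"},
]
counts = pages_per_host(rows)
print(counts.most_common())
```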

Fig. 6
Fig. 6 shows a SpotFire profile chart (a parallel coordinates visualization
developed by Inselberg). Mapping: Six parallel axes show values of six attributes,
each on its own scale. Entities are connected by lines.
I filtered the view to show just items that were visited more than 20 times
and over a period of at least five days. Findings:
- Again, the broker site shows the most hits. (Well, fall 1999...)
- cybergeography.com was visited over the longest period of time.
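The filter used for this view (more than 20 visits, spread over at least five days) can be written down as a simple predicate. A minimal sketch, assuming rows are dicts with `numOfVisits` and the millisecond difference stored under a hypothetical key `DiffFirstLastMs`; the example rows are invented.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

def frequent_long_lived(rows, min_visits=20, min_days=5):
    """Keep rows visited more than min_visits times over at least min_days days."""
    return [
        row for row in rows
        if row["numOfVisits"] > min_visits
        and row["DiffFirstLastMs"] >= min_days * MS_PER_DAY
    ]

rows = [  # hypothetical rows
    {"Host": "a.example", "numOfVisits": 86, "DiffFirstLastMs": 100 * MS_PER_DAY},
    {"Host": "b.example", "numOfVisits": 20, "DiffFirstLastMs": 30 * MS_PER_DAY},
    {"Host": "c.example", "numOfVisits": 45, "DiffFirstLastMs": 2 * MS_PER_DAY},
]
kept = [row["Host"] for row in frequent_long_lived(rows)]
print(kept)
```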

Fig. 7
Fig. 7 shows a SpotFire 3D ScatterPlot. Mapping: right to left - date of first
visit, bottom to top - number of visits, depth - hosts. Number of visits was
redundantly coded to size, kind to color. Findings:
- I liked the spatial feeling in this visualization.
- My March vacation was visible as a large valley (in fact, I was in 'Valle
Gran Rey').
- Top bubbles again represent the broker pages.
- The cluster of black pages on the left in the valley consists of visits to an online
dictionary. When I came home from vacation I wrote an English paper.
Conclusion
- Both tools provide good overviews.
- They are good to verify or reject expected patterns and hypotheses.
- Intimate knowledge of the domain and dataset was important for interpretation.
- More insights could be found if we had richer data.
- Neither tool made it easy to get the following numbers: I visited 5098
pages and made 12179 visits. This is an average of 24 pages and 58 visits
per day (Cockburn & McKenzie 2001 found the average visits to be 42 per
day).
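Summary numbers of this kind are easy to compute directly from the numOfVisits column. A minimal sketch: the function name and the small example list are hypothetical, and the day count of 210 (seven months of 30 days) is an assumption, since the exact number of days used for the averages is not stated here.

```python
def summarize(visit_counts, days):
    """Totals and per-day averages from a list of per-page visit counts."""
    pages = len(visit_counts)
    visits = sum(visit_counts)
    return {
        "pages": pages,
        "visits": visits,
        "pages_per_day": pages / days,
        "visits_per_day": visits / days,
        # Share of pages that were visited exactly once.
        "single_visit_share": sum(1 for n in visit_counts if n == 1) / pages,
    }

# Hypothetical numOfVisits column for four pages over two days.
example = summarize([1, 1, 3, 5], days=2)
print(example)

# Checking the per-day averages against the totals above, assuming 210 days.
DAYS = 210
print(round(5098 / DAYS), round(12179 / DAYS))  # 24 58
```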
Critique of Tools
Eureka
- + easy to learn.
- + One view shows overview, proportions, distributions, correlations and
even details on demand.
- - Focus is still hard to control.
- - Statistical numbers can't be retrieved.
- - lacks a lot of functionality on the detail level (e.g. sums, scrolling
whole pages)
SpotFire
- + Variety of specific visualizations.
- - Therefore, more difficult to learn.
- + Better access to single data entries.
- - occlusion impedes seeing proportions.
Both tools could be enhanced by
- improving interactivity (selection, adjusting the focus etc.).
- a better concept for grouping, clustering.
- better undo semantics.
- making it easy to extract statistical information.
- enabling modifications of data (e.g. to get rid of outliers).
References
We are currently building a bibliography on history visualizations on the HCIL wiki
web:
http://jazz.cs.umd.edu:8080/hcil/historyviz.wiki?cmd=get&anchor=HistoryViz
(Cockburn & McKenzie 2001)
Cockburn, Andy and Bruce McKenzie (2001). What Do Web Users Do? An Empirical
Analysis of Web Use. International Journal of Human-Computer Studies, In Press.
2001.
http://www.cosc.canterbury.ac.nz/~andy/papers/ijhcsAnalysis.pdf
(Mayer 2001)
Mayer, Matthias (2001). Browsing Icons. Website which accompanies the PhD project.
http://www.hfbk.uni-hamburg.de/lem/mayer/phd/