Coordinating Overviews and
Detail Views of WWW Log Data
Harry Hochheiser, Ben Shneiderman*
Human-Computer Interaction Lab, Department
of Computer Science
*Institute for Systems Research and
Institute for Advanced Computer Studies,
University of Maryland, College Park, MD
20742
{hsh,ben}@cs.umd.edu
ABSTRACT
Web server log analysis tools
provide site operators with useful information regarding the visitors to their
sites. Unfortunately, the utility of
these tools is often limited by the use of aggregate summaries that hide the
information associated with individual requests, and by the absence of
contextual data that might help users interpret those summaries. Building upon
earlier work in the use of starfield visualizations to display web site
requests as individual data points [8], this paper describes the use of
multiple coordinated visualizations that present web log data at varying
granularities, alongside related displays of appropriate contextual
information.
Keywords
World Wide Web, Log File Analysis,
Information Visualization, Snap-Together Visualization
1. INTRODUCTION
Analysis and visualization of
WWW Log data is an area of active research and commercial development, with
numerous products complementing a variety of research efforts [2,4,5]. Many of
these systems use aggregation to handle the large data sets generated by Web
servers. Reports that summarize the number of hits per page, display period, or
requesting domain (for example) provide useful feedback for site operators, at
the expense of hiding the potentially useful information contained in the
individual request points.
In earlier work [8], we have
described the use of interactive starfield visualizations [1] for examination
of WWW Log Data. Using the Spotfire visualization tool [13], we have developed
visualizations of tens of thousands of individual web requests, with each
request represented by a single point. Although this approach avoids the data
loss present in tools that use aggregation, it suffers from the opposite
problem: the lack of appropriate summary data for understanding of higher-level
trends.
A further shortcoming of
existing log analysis tools is the display of data without context: reports of
web site usage are presented without any contextual information such as a site
map or even the content of the individual pages. The absence of this supporting information may complicate the
task of interpreting the usage reports.
Coordinated visualizations that
simultaneously present multiple views of relevant information might be used to
address both of these problems. This paper describes the use of Snap-Together
Visualization (STV) [12] to manage tightly coupled displays of log data at
different granularities, and in the presence of supporting contextual
information.
2. MULTIPLE COORDINATED VISUALIZATIONS & STVs
Coordinated, tightly coupled
displays have been shown useful in a number of domains [3,5,6]. Visualizations that present tightly coupled
displays of web log data could assist in the process of inferring usage patterns.
Possibilities include coordinated visualizations between aggregated and
disaggregated data, and coordination between log displays and supporting
external information such as site maps or web page displays. The use of coordinated visualizations for
database exploration has become an active area of research in recent years
[7,9,10,11].
Snap-Together Visualization
(STV) [12] is an architecture that allows users to connect visualization tools
such that selection, navigation, and querying actions are coordinated. Furthermore, STV supports several
visualization tools (including Spotfire), raising the possibility of
coordinating starfield visualizations with tables, outline views, and web
browsers. STV uses Microsoft's ODBC tools for database connectivity, so data
preparation is straightforward: we import the data into Microsoft Access, and
design appropriate database queries, using SQL or Access' visual query
editor. Previously, STV has been
used to visualize aggregations of highway incident data [6]; this paper
describes the application of similar techniques to web log data.
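The data preparation step described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in this work: it assumes the server writes NCSA Common Log Format lines (the paper does not specify a log format), and it uses Python's built-in sqlite3 module as a stand-in for Microsoft Access and ODBC. The table name `requests` and the helper names `parse_line` and `load_requests` are hypothetical.

```python
import re
import sqlite3
from datetime import datetime

# Assumption: NCSA Common Log Format input (not specified in the paper).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (host, timestamp, url, status), or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return (
        match.group("host"),
        stamp.strftime("%Y-%m-%d %H:%M:%S"),  # server-local time, sortable as text
        match.group("url"),
        int(match.group("status")),
    )


def load_requests(conn, lines):
    """Insert one row per parsed request; return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(host TEXT, ts TEXT, url TEXT, status INTEGER)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical sample lines; the third is malformed and is skipped.
sample = [
    'host1.example.com - - [16/Jun/2000:10:15:32 -0400] "GET /index.html HTTP/1.0" 200 2048',
    'host2.example.org - - [16/Jun/2000:10:17:05 -0400] "GET /papers/stv.html HTTP/1.0" 200 5120',
    'not a log line',
]
conn = sqlite3.connect(":memory:")
loaded = load_requests(conn, sample)
```

Storing one row per request keeps the disaggregated data available, so that both aggregate and detail views can later be defined as queries over the same table.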
Figure 1: Two coordinated visualization
windows: selection of an aggregate in the upper window leads to display of the
appropriate constituent points in the lower window.
Using STV, we can create
visualizations that provide coordinated views of multiple data sets, or
multiple views of the same set. This coordination provides expressive power
that goes beyond single displays of individual or aggregated web requests. This
paper presents two examples; many others are possible.
3. AGGREGATIONS
Visualizations that present
each web request as an individual point lead to densely populated displays that
can be used, in combination with Spotfire's dynamic query tools, to infer
patterns. This approach is fundamentally limited, as it does not account for
aggregate counts that many site operators find useful. Visualizations of the
total number of hits per URL, or of hits counted by time of day, increase the
expressive power of the visualizations.
Ideally, coordinated views
based on aggregations would support moving between different levels of detail.
By snapping a view of aggregations to a second window containing individual
data points, users can move quickly between overview and detail analysis.
Selection of an aggregate in the first visualization will lead to the display
of the component requests in the second display, thus allowing users to
"drill down" to finer levels of detail.
An example of this technique is
shown in Figure 1. The aggregate display shows totals of the number of hits to
a given URL (y-axis) on a given date (x-axis). Size coding displays the number
of hits, so larger circles indicate a higher number of hits for the given URL
on the given day. This visualization might be used to determine which pages are
accessed most frequently, or how usage varies across dates (or days of the
week). The individual data points found
in a given aggregation can be displayed in a second visualization, which might
present time on the x-axis and hostname of the requesting computer on the
y-axis, presenting each request for a given URL on a given day as a single
point. The displays are tightly coupled: selecting an aggregate point in the
first visualization leads to the display of the points found in that
aggregation in the second visualization window.
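The overview and detail displays described above correspond to two queries over the same request table. The following is a minimal sketch under stated assumptions: Python's sqlite3 module stands in for the Access database, the `requests` table and its sample rows are hypothetical, and the functions `aggregate_view` and `detail_view` are illustrative names, not part of STV or Spotfire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (host TEXT, ts TEXT, url TEXT)")
# Hypothetical sample data: one row per individual web request.
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    ("a.example.com", "2000-06-15 09:00:00", "/index.html"),
    ("b.example.com", "2000-06-15 09:30:00", "/index.html"),
    ("a.example.com", "2000-06-15 11:45:00", "/papers.html"),
    ("c.example.com", "2000-06-16 14:10:00", "/index.html"),
])


def aggregate_view(conn):
    """Overview: one row per (URL, date); hits would drive the size coding."""
    return conn.execute(
        "SELECT url, date(ts) AS day, COUNT(*) AS hits "
        "FROM requests GROUP BY url, date(ts) ORDER BY url, day"
    ).fetchall()


def detail_view(conn, url, day):
    """Detail: the individual requests (time, hostname) behind one aggregate."""
    return conn.execute(
        "SELECT ts, host FROM requests "
        "WHERE url = ? AND date(ts) = ? ORDER BY ts",
        (url, day),
    ).fetchall()


overview = aggregate_view(conn)
# Simulate the user selecting the first aggregate point in the overview.
selected_url, selected_day, _ = overview[0]
detail = detail_view(conn, selected_url, selected_day)
```

The grouping columns of the aggregate query (URL and date) double as the selection key for the detail query, which is what lets a selection in one window determine the contents of the other.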
4. INCREASED CONTEXTUAL INFORMATION
The data found in web logs is
heavily context-dependent: the requests that are made, and the relationships
between those requests, are strongly influenced by a variety of internal and
external factors. Perhaps most obviously, the paths that users take will be
largely determined by the links that are provided. Given the crucial role that
site design can play in influencing site usage patterns, it seems clear that
consideration of site topology might be useful for interpretation of web log
data. For example, a tool that
provided site layout information alongside log data might help users build
understandings that tie both data sets together. Unfortunately, log
visualization tools often fail to support the integrated display of this useful
information.
A simple example of the use of
STV to coordinate web log visualization involves coordination between an
outline view of site URLs, a browser window displaying a page from the site,
and a Spotfire window displaying requests to a URL by time (x-axis) and
hostname (y-axis) (Figure 2). When the
user selects a URL in the outline view, data for that page is displayed in the
Spotfire window, while the page itself is loaded into the browser window. This provides the user with additional
context that would not be available in a single visualization. This added context may simplify the process
of understanding patterns in the data.
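The coordination described above, in which one selection updates several views, can be illustrated with a small sketch. It is not STV's actual interface: the `Coordinator` class, the `snap` and `select` methods, the view functions, the sample data, and the base address `http://www.example.com` are all hypothetical, chosen only to show the idea of propagating a selected URL to every coupled view.

```python
class Coordinator:
    """Hypothetical coordination layer: forwards a selection to all snapped views."""

    def __init__(self):
        self._views = []

    def snap(self, view):
        """Register a view callback that accepts the selected URL."""
        self._views.append(view)

    def select(self, url):
        """Propagate a selection made in the outline view to every snapped view."""
        for view in self._views:
            view(url)


# Hypothetical request data: (hostname, time, URL).
requests = [
    ("a.example.com", "09:00", "/index.html"),
    ("b.example.com", "09:30", "/index.html"),
    ("a.example.com", "11:45", "/papers.html"),
]
state = {}


def update_plot(url):
    # Stand-in for the starfield display: time (x-axis) against hostname (y-axis).
    state["plot_points"] = [(ts, host) for host, ts, u in requests if u == url]


def update_browser(url):
    # Stand-in for the browser window loading the selected page.
    state["browser_location"] = "http://www.example.com" + url


coordinator = Coordinator()
coordinator.snap(update_plot)
coordinator.snap(update_browser)
coordinator.select("/index.html")
```

Because each view only receives the selected key, further views (for example, a table of referring pages) could be attached without changing the existing ones.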
Figure 2: Coordinated visualizations for context:
The outline window on the left-hand side provides a hierarchical view of URLs
on the site, while the web browser window in the lower right corner displays a
selected web page and the Spotfire display plots requests for a given URL, with
time on the x-axis and hostname on the y-axis.
5. DISCUSSION & FUTURE WORK
By presenting two or more
tightly coupled views, coordinated visualizations of web log data provide users
with multiple perspectives which can be used to build interpretations and
understandings in the appropriate context.
These scenarios are just two
applications of coordinated visualizations to web log data. As STV provides a
general-purpose, database-driven platform for visualization coordination, log
data might be visualized alongside other relevant organizational data. For
example, operators of e-commerce sites might construct coordinated
visualizations that relate web log access patterns to customer purchase records.
Furthermore, STV's use of a relational database makes it possible to
visualize the results of arbitrary aggregations defined by SQL queries,
side by side with “snapped” visualizations that provide context and
drill-down capabilities.
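As an illustration of how further aggregations reduce to SQL queries, the sketch below counts hits by hour of day and by day of the week. It rests on the same assumptions as before: Python's sqlite3 module stands in for the Access database, and the `requests` table, its sample rows, the `AGGREGATIONS` mapping, and the `hits_by` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (host TEXT, ts TEXT, url TEXT)")
# Hypothetical sample data: one row per individual web request.
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", [
    ("a.example.com", "2000-06-15 09:00:00", "/index.html"),
    ("b.example.com", "2000-06-15 09:30:00", "/index.html"),
    ("a.example.com", "2000-06-15 11:45:00", "/papers.html"),
    ("c.example.com", "2000-06-16 14:10:00", "/index.html"),
])

# Each named aggregation is a SQL expression that assigns a request to a bucket.
AGGREGATIONS = {
    "hour_of_day": "CAST(strftime('%H', ts) AS INTEGER)",
    "day_of_week": "CAST(strftime('%w', ts) AS INTEGER)",  # 0 = Sunday
}


def hits_by(conn, name):
    """Return (bucket, hits) rows for one of the predefined aggregations."""
    expression = AGGREGATIONS[name]  # fixed expressions only, never user input
    return conn.execute(
        f"SELECT {expression} AS bucket, COUNT(*) AS hits "
        f"FROM requests GROUP BY {expression} ORDER BY bucket"
    ).fetchall()


by_hour = hits_by(conn, "hour_of_day")
by_weekday = hits_by(conn, "day_of_week")
```

Adding a new aggregate view then amounts to adding one bucket expression, while the detail view behind each bucket remains a simple filter on the same expression.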
The utility of these
coordinated visualizations might be improved by increasing the ease of
constructing coordinated views and integrating these views with external data
sources. Currently, development of coordinated visualizations using STV involves
manual creation of appropriate SQL queries for aggregating and formatting the
data. Appropriately designed tools could support the process of specifying
queries and selecting the data sets to be coordinated and the tools used to
generate the individual visualizations.
Aggregation tools similar to Visage’s outliner [10] or the Aggregation
Eye [11] might simplify the process of specifying the desired aggregates. Further expressive power might be gained by
increasing the range of visualization tools that can be used. Finally, tools that simplified the process
of integrating the log data with external data sets might provide site
operators with additional contextual information.
ACKNOWLEDGEMENTS
This research was supported by
a grant from IBM's University Partnership Program. Thanks to Chris North for
his assistance with Snap-Together Visualizations.
BIBLIOGRAPHY
[1] Ahlberg,
C., & Shneiderman, B. (1994) Visual information seeking: tight coupling of
dynamic query filters with starfield displays. Proc. ACM CHI ’94
Conference, ACM Press, New York,
313-317.
[2] Chi,
E., Pitkow, J., Mackinlay, J., Pirolli, P., Gossweiler, R., & Card, S.
(1998). Visualizing the evolution of web ecologies. Proc. ACM CHI ’98
Conference, ACM Press, New York, 400-407.
[3] Chimera,
R., & Shneiderman, B. (1994) An exploratory evaluation of three interfaces
for browsing large hierarchical tables of contents. ACM Transactions on
Information Systems 12(4) October 1994, 383-406.
[4] Cooley,
R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world
wide web browsing patterns. Journal of Knowledge and Information Systems 1(1).
[5] Cugini,
J. & Scholtz, J. (1999). VISVIP: 3D Visualization of paths through web
sites. Proceedings of the International
Workshop on Web-Based Information Visualization (WebVis ’99), in conjunction
with DEXA ’99 Tenth International Workshop on Database and Expert Systems
Applications, 259-263.
[6] Fredrikson,
A., North, C., Plaisant, C., & Shneiderman, B. (1999) Temporal, geographical and
categorical aggregations viewed through coordinated displays: a case study with
highway incident data. Proc. of the Workshop on New Paradigms in Information
Visualization and Manipulation, ACM Press, New York.
[7]
Goldstein, J., & Roth, S. F., (1994) Using aggregation and dynamic
queries for exploring large data sets. Proc. ACM CHI ’94 Conference, ACM Press,
New York, 23-29.
[8] Hochheiser,
H. & Shneiderman, B. (in press) Using interactive visualizations of WWW log
data to characterize access patterns and inform site design. Journal of the
American Society for Information Science, forthcoming.
[9] Livny,
M., Ramakrishnan, R., Beyer, K., Chen, G., Donjerkovic, D.,
Lawande, S., Myllymaki, J., &
Wenger, K. (1997) DEVise: integrated querying and visual exploration of large
datasets. Proc. ACM SIGMOD ’97,
ACM Press, New York, 301-312.
[10] Kolojejchick, J. & Roth, S. (1997)
Information appliances in Visage. IEEE Computer Graphics and Applications,
17(4), July/August 1997, 32-41.
[11] Mockus, A. (1998) Navigating
aggregation spaces. Proc. IEEE
Information Visualization Symposium 1998 Late Breaking Hot Topics Proceedings,
IEEE Computer Society Press, 29-32.
[12] North,
C. & Shneiderman, B. (2000). Snap-Together visualization: a user interface
for coordinating visualizations via relational schemata. Conf. Proc. Advanced Visual Interfaces 2000,
ACM Press, New York.
[13] Spotfire.
(1999). Spotfire [Online] Available at http://www.spotfire.com
(Accessed June 16, 2000).