Tunable Viewtips
 

Kartik Parija and Jaime Spacco

{kartik, jspacco}@cs.umd.edu


CMSC 838B - Information Visualization
University of Maryland, College Park, MD

Abstract

There are many tools that perform sophisticated visualization and analysis of data. However, most of these tools require some familiarity with the dataset in order to identify interesting characteristics. We propose Tunable Viewtips, a standalone tool that performs a "first pass" on an unfamiliar dataset. It provides rapid dynamic profiling of multidimensional data using common statistical methods. In our first version, we concentrate on one-dimensional data, which is often ignored or misrepresented. We present Tunable Viewtips 1D, which focuses on the visualization and analysis of one-dimensional data in its true form.

Introduction and Motivation

Spotfire provides a "view tip" tool [1] which returns an ordered ranking of potentially interesting correlations. The user can preview the scatter plots of pairs of axes in a small window, or look at histograms of various axes [See Figure 1]. This is an excellent beginning with much potential! But in its current form, Spotfire's "view tip" tool is insufficient. First, it is static: the user cannot tune the parameters of the algorithms, nor can the user select other algorithms. Second, there is no mechanism for a third-party developer to add new algorithms. Finally, too little screen space is devoted to displaying these "view tips", so it is difficult to distinguish values. We believe that this feature needs an overhaul.

Figure 1: Spotfire View Tip showing Melanoma data [1]


The first issue we address is the limited functionality of the view tip feature. We remedy this by including new algorithms, along with a plug-in architecture for adding additional algorithms; by providing dynamic sliders that tune these algorithms; and by providing more feedback, such as displaying the mean and standard deviations.

We then focus on visualizing true one-dimensional data. To our knowledge, this area has been ignored or (mis)represented by using line or bar graphs [6]. We want to profile individual columns of the dataset to find any interesting relationships or patterns that lie within them.

Finally, we address the problem of occlusion for dense datasets. We've implemented jitter in one dimension that uses the entire screen space allotted for the plot. Since only the Y coordinate of a point encodes its value, jittering purely in the X direction does not significantly compromise the visualization. A 'reflection' technique is used to show as many data values as possible in densely packed data.
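The jitter idea can be pictured with a short Python sketch. It is an illustration only, not the tool's code: the names `reflect` and `jitter_positions`, the uniform random offset, and the reading of 'reflection' as folding an out-of-range X offset back into the plot width are all assumptions made for this example.

```python
import random


def reflect(x, width):
    """Fold x back into [0, width] by mirroring it at the plot edges (assumed behavior)."""
    period = 2.0 * width
    x = x % period  # Python's % yields a non-negative result for a positive period
    return period - x if x > width else x


def jitter_positions(values, width, spread=None, seed=0):
    """Return (x, y) pairs: y is the untouched data value, x is a random offset.

    `spread` is a hypothetical parameter controlling how far a point may
    move from the center line before it is reflected back into the plot.
    """
    rng = random.Random(seed)
    center = width / 2.0
    spread = width if spread is None else spread
    points = []
    for v in values:
        x = reflect(center + rng.uniform(-spread, spread), width)
        points.append((x, v))  # only X is perturbed; Y still encodes the value
    return points
```

Because only the X coordinate is randomized, repeated values such as `[1.0, 1.0, 1.0]` end up side by side instead of on top of each other, while their Y positions stay exact.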

 

Previous and Related work

Past research on 1D data has primarily focused on visualizing textual data. Examples include program listings, documents with many lines, and document search results. Gary Geisler [2] and the On-line Library of Information Visualization Environments (OLIVE) [5] provide an excellent overview of such research. Figure 2 shows SeeSoft, a package from Visual Insights, a Lucent Venture [4], that visualizes lines of code in large software engineering endeavors. Using color and scaling, it provides a remarkably effective means of understanding the various components and inter-relationships between modules.

 

Figure 2: SeeSoft from Lucent Technologies, a system to visualize code in software [4]

 

However, there are very few visualizations of individual columns of numerical data. We found two software packages that concentrate on the visualization and analysis of 1D data.

 

xgraph can be used to view 1D data files with the format
			"Time=0.0
			0.0  0.0
			0.2  0.04
			0.4  0.16
			0.6  0.36
			0.8  0.64
			1.0  1.0

			"Time=1.0
			0.0  0.0
			0.2 -0.04
			0.4 -0.16
			0.6 -0.36
			0.8 -0.64
			1.0 -1.0
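
To make the layout above concrete, here is a minimal Python sketch of a reader for it. It assumes only what the sample shows: a line beginning with a double quote names a new data set, blank lines separate sets, and every other line holds an "x y" pair. The function name `parse_xgraph` is hypothetical, and real xgraph files may contain directives this sketch does not handle.

```python
def parse_xgraph(text):
    """Parse xgraph-style text into {set_name: [(x, y), ...]} (sketch only)."""
    datasets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate data sets
        if line.startswith('"'):
            # A leading double quote introduces the name of a new data set.
            current = line[1:].rstrip('"')
            datasets[current] = []
        else:
            if current is None:
                # Data before any header goes into an assumed default set.
                current = "unnamed"
                datasets[current] = []
            x, y = (float(tok) for tok in line.split()[:2])
            datasets[current].append((x, y))
    return datasets
```

Taking the second element of each pair gives a single column of values, which is the kind of input a 1D profiling tool works on.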

Figure 3: Xgraph [7]

 

 

Figure 4: xPloRe: Teaching Quantlet tw1d, to visualize 1D data. [8]


Implementation

 
Demonstration

In this section we present applications of our tool to various kinds of datasets. As far as possible, we have used 'real-world' data, which allows us to interpret the results in a meaningful manner.

Error Detection: The first example considers a dataset with just one column of data, namely the closing price of the Dow Jones between the years 1900 and 1901. This is part of a very large dataset obtained from CMU's Statistics repository [3]. Simply by plugging this dataset into the tool, it was immediately obvious that the dataset contained errors: there were occurrences of negative closing prices, which are naturally absurd. [See Figure 7]

Figure 7: Dow Jones Closing Price, 1900 - 1901 [3]

Advantage of the Jitter Feature: The above example uses the Jitter feature to show as much of the dataset as possible. This is extremely useful for visualizing datasets which have many repeated values or are very densely packed within certain ranges. We present an example to show the difference between Jitter turned on and off when viewing a dataset consisting of 5 columns of data, each containing the price of a particular stock recorded over the past 29 months. The stocks are Intel (INTC), Cisco (CSCO), Microsoft (MSFT), Human Genome Sciences (HGSI) and General Electric (GE). The figure shows the column containing the Microsoft data [Data Courtesy: MSN Moneycentral, www.moneycentral.com]. This output also shows the result of running the Outlier detection algorithm. [See Figure 8 (a) and (b)]

 

Figure 8 (a) and (b): Microsoft Stock Price over the last 29 months. With and Without Jitter

Use of Outlier and Standard Deviation Slider Bar: The figure below shows the grading sheet of CMSC 434 offered in Spring of 2001. The columns are grades assigned in various homework assignments and projects, in addition to a column showing the total number of points and the overall percentage grade. The column currently being viewed shows the percentage grade. Using the outlier detection algorithm and the standard deviation slider bar, we are able to get a fairly accurate view of the "Letter Grade Spread". For instance, with a normal grading scale in place (A ~ 90 and above, B ~ 80 and above) and taking into account class performance and average, we could assign about 14 - 15 A's, 4 C's and award the rest of the class B's. [See Figure 9]

Figure 9: Overall Percentage Grade of CMSC 434, Spring 2001
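
The effect of a standard deviation slider can be sketched in a few lines of Python. This is an assumed formulation, not the tool's documented algorithm: a value is flagged when it lies more than k standard deviations from the column mean, and the hypothetical parameter `k` plays the role of the slider position.

```python
import statistics


def find_outliers(values, k=2.0):
    """Return the values lying more than k standard deviations from the mean.

    Assumption: the population standard deviation is used; the tool may
    use a different estimator or a different outlier rule altogether.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical, so nothing can stand out
    return [v for v in values if abs(v - mean) > k * sd]
```

Moving the slider toward smaller `k` flags more points and toward larger `k` flags fewer, which is how a grade column can be split interactively into a high group, a low group, and the rest.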

Weaknesses 

As with many visualizations, our tool is not without problematic areas. Here we list some that we have identified:

 
  • There exists a tool-tip feature that allows the value of the data point to be shown when the mouse is hovered over the particular point. In densely packed data sets, this might not be the best way to view individual points.  
  • Bounding Box Overlap: We use bounding boxes to indicate the presence of clusters. Our cluster algorithms assign each data point to exactly one cluster. Figure 10 shows the cluster algorithm being run on a dataset showing grades of a particular CMSC course. The column being examined is the overall grade percentage. The figure gives the impression that some points belong to 2 clusters. This is not true; it is caused by the padding of each bounding box, which happens to overlap data points that do not belong to that cluster. This could often be a problem when examining densely packed column data. However, we are limited by pixel size within the physical space.

Figure 10: Example of a bounding box problem

 
  • The tool currently lacks common visualization techniques such as zooming, panning, selection and filtering. These features are targeted as immediate future work. Their inclusion will greatly enhance the functionality of Tunable Viewtips.

 

Contributions

Our tool makes two contributions. First, we are expanding and improving Spotfire's 'view tips' visualization feature by incorporating dynamically tunable algorithms. Second, we are exploring the profiling of true one dimensional data.

What new visualization features does our code add? Initially, we intended to write a tool that emulated Spotfire's 'view tips' tool, and displayed interesting two-dimensional plots based on different algorithms. However, much work has been done on two-dimensional statistical analysis. Instead, we focus on one dimension at a time. This has several advantages: the algorithms are much faster (two-dimensional comparisons require pair-wise enumeration over all n columns, which grows at about n^2/2), and there has been comparatively little work done in this area. However, because we've tried to separate the display from the data, the core functionality for two (or more) dimensional display already exists.

We are not aware of a tool that performs "jittering" in one dimension. This is especially effective because it solves the problem of occlusion without altering the sanctity of data points along their axis. The use of the second dimension is only a trick in "pixel space"; the data points still line up with their correct location along the Y axis.

We position our tool as a profiling tool used to glean basic statistical information from a new dataset. Tunable Viewtips is a new way of visualizing a dataset. However, we are also visualizing new aspects of a dataset, namely the individual columns.

Imagine that we have many results from time in [0..N]. We would like to know which of these results show interesting properties. We are not really interested in the relationship between the results at different timesteps; we just want to know which of the vectors of results show interesting statistical properties (clusters, gaps, outliers, etc.). Our tool can profile the dataset for such information and show a ranked list of which columns could be examined.
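
A ranked list of columns can be sketched as follows. The scoring rule here is purely an assumption chosen for illustration (the largest gap between neighboring sorted values, as a fraction of the column's range); the tool's actual ranking criteria are not specified in this text, and the names `largest_gap_score` and `rank_columns` are hypothetical.

```python
def largest_gap_score(values):
    """Score a column by its widest gap between sorted neighbors, relative to its range."""
    s = sorted(values)
    if len(s) < 2:
        return 0.0
    span = s[-1] - s[0]
    if span == 0:
        return 0.0  # constant column: no gaps to report
    return max(b - a for a, b in zip(s, s[1:])) / span


def rank_columns(table):
    """Given {column_name: [numbers]}, return names ordered from most to least 'gappy'."""
    scores = {name: largest_gap_score(col) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Each column is scored independently, so the cost grows linearly with the number of columns rather than with the number of column pairs.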

Possible Application Areas

There are a number of areas where one-dimensional data is very useful. Some of these include fluid dynamics [12], image analysis and enhancement [13], information retrieval [14], and motion in 1D such as uniform and non-uniform acceleration or retardation  [15]. We expand briefly on a couple of these application areas.

 

Figure 11: Artificially triggered powder snow avalanche in the avalanche dynamics test region of Vallée de la Sionne, Switzerland, 
Picture Courtesy: Swiss Federal Institute for Snow and Avalanche Research Davos [12].

Both dense and powder snow can produce avalanches. The fluid dynamic calculations involved in simulating such activity include the calculation of one-dimensional flow. These flow calculations help predict the motion (velocities, dynamic pressures) of avalanches, and visualization of such data is a key factor in the analysis of such simulations. [See Figure 11]

 

Another area where 1D data is often used is image analysis and enhancement. The image is examined as a matrix, and each column of the matrix, corresponding to a single column of pixels, is analyzed individually to spot effects like edges and repetitiveness. These columns are grouped as histograms and are run through various mathematical functions. One way to enhance the visualization of the histogram of an image after the application of an edge-detector operator is to use the logarithm of the histogram. Figures 12 (a) and (b) show the application of such a technique.

 

Figure 12 (a) and (b): Application of an edge-detector operator to enhance the image.


In addition, we think our tool can be used to explore individual columns in datasets that have traditionally been part of multi-dimensional exploration. Colleagues have recommended that we examine data showing characteristics of Amino Acids (possibly where reduction to 1D has been performed) and traditional temporal data where the behavior of data could be examined outside the consideration of time.

 

Future Work

There is much room for improvement in our tool. Currently, version one supports the visualization and analysis of 1D data alone. We would like to reach the proposed goal of having a tool that will support multi-dimensional data. Using another open source graphing tool [17], we have begun initial work on adding 2D support to our existing tool [See Figures 13 (a) and (b)]. We have successfully implemented the Pearson Correlation metric to rank pairs of columns in a dataset. This already replicates the functionality of Spotfire's View Tip. Once we've coupled this with the Tunable Viewtips 1D features we have described, we will have greatly enhanced the View Tip mechanism.
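
Ranking column pairs by Pearson correlation can be sketched in Python as below. This is a generic textbook formulation written for illustration, not the code in the tool; the names `pearson` and `rank_pairs` and the choice to sort by absolute correlation are assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return cov / math.sqrt(vx * vy)


def rank_pairs(table):
    """Given {column_name: [numbers]}, return (a, b, r) tuples, strongest |r| first."""
    pairs = [(a, b, pearson(table[a], table[b])) for a, b in combinations(table, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

For n columns, `combinations(table, 2)` yields n(n-1)/2 pairs, which is the pair-wise enumeration cost that makes the 2D ranking slower than profiling one column at a time.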

There are two general directions that future work can take. First, this work could be integrated into an existing visualization tool, such as Spotfire or Stardom [16]. This approach makes sense, as any interesting visualization mined by our standalone tool would need to be imported into a more mature tool anyway for further analysis. Second, we can add more features to our current tool.

Regardless of which direction future development takes, we are scoping out some other improvements. First and foremost, we want to test out new algorithms. We have algorithms for similarity and gaps in two dimensions that we'd like to run once we find or write a decent 2D display tool. The cluster box mechanism in 1D will correctly draw the boxes regardless of how they're computed. We'd like to add a non-greedy algorithm that finds the maximum cluster size in each dimension. Second, we want the ability to zoom in, especially for densely packed datasets. We'd like to zoom into a part of a 1D or 2D plot and run algorithms on that subset of our dataset. Next, we want a dynamic filtration and selection mechanism where the user can specify ranges with the mouse and filter the data. This would be most useful in 2D, where any limit to the ranges, gaps and clusters will help narrow down the search space for the algorithms. Finally, we need to fix some glaring inefficiencies in the intermediate data storage format by eliminating it. The display widget should not store any data; any data that it requires it should read directly from the dataset.

 
Figure 13 (a): Plotting the stock prices of Intel vs. Cisco since Jan '99
Figure 13 (b): 2D plot of UACC383 vs. KA in the Melanoma dataset

 

Acknowledgements

Our sincere gratitude to Larry Leonard [17] of Definitive Solutions, Inc. for allowing us to use his Microsoft VC++ based 2D Graphing Class. As we were novices on this development platform, it provided a great starting point for developing what we believe is a useful tool. We would like to thank Narendar Shankar for his assistance in the GUI development, Dave Hovemeyer for his help in porting our Unix code to the Windows platform, Brian Postow for his suggestions to improve our Jitter feature, Rezarta Islamaj and Omer Horvitz for sitting through multiple demonstrations, and Jinwook Seo and Bongshin Lee for inspiring us to use the Melanoma dataset. We also greatly appreciate Dr. Ben Shneiderman and Dr. Catherine Plaisant's guidance through the various stages of our project.

 

References

  1. Spotfire, "Help on the Viewtip Feature", pp. 131 - 134, Spotfire Manual,  www.spotfire.com
  2. Geisler, G., "Making Information More Accessible: A Survey of Information Visualization Applications and Techniques", http://www.ils.unc.edu/~geisg/info/infovis/paper.html
  3. CMU's statLib repository, http://www.stat.cmu.edu/datasets
  4. SeeSoft, Software visualization tool, Lucent Technologies, Visual Insights, http://www.visualinsights.com/
  5. Olive: Multidimensional Data - 1D "http://otal.umd.edu/Olive/1D.html", University of Maryland
  6. Fortner, B., "The Data Handbook: A Guide to Understand the Organization and Visualization of Technical Data", pp. 91-102, Spyglass, 1992.
  7. XGraph: Animated, Easy Client for 1D Line Plots, http://www.cactuscode.org/VizTools/xgraph.html
  8. XPloRe, Teachware quantlet "tw1d" , http://www.quantlet.de/scripts/xlg/html/xlghtmlnode22.html
  9. Stockburger, D.W., "Introductory Statistics, Concepts, Models and Applications", http://www.psychstat.smsu.edu/introbook/sbk00.htm, Southwest Missouri State University
  10. Neter, et al., "Applied Linear Statistical Models", IRWIN, 4th Edition, 1996
  11. Shneiderman, B., "Dynamic Queries for Visual Information Seeking", IEEE Software, 11(6), 70-77
  12. AVAL-1D, Numerical Calculation of One Dimensional Flow in Avalanches, http://www.slf.ch/aval-1d/welcome-en.html
  13. Use of 1D Data in Image Analysis and Enhancement, http://www.khoral.com/contrib/contrib/dip2001/html-dip/c4/s6/node3.html
  14. Jonsson, H.A. et al., "Retrieval of One Dimensional Data", Proceedings of the 3rd Basque International Workshop on Information Technology '97.
  15. Pausch, R. et al., "One Dimensional Motion Tailoring for the Disabled: A User Study", pp. 405-411, ACM CHI '92.
  16. Cailleteau, L., "Interfaces for Visualizing Multi-valued Attributes: Design and Implementation using Starfield Displays", ftp://ftp.cs.umd.edu/pub/hcil/Reports-Abstracts-Bibliography/99-20html/99.20.html, University of Maryland
  17. Larry Leonard, "2D Graphing Class", http://www.codeguru.com/controls/SimpleGraphControl.html
  18. Paul Barvinko, "2D Visualization Class" http://www.codeguru.com/controls/graph2d.shtml
  19. Tufte, E., "The Visual Display of Quantitative Information", Graphics Press, Cheshire, CT, 1983.