Tunable Viewtips:
User Controlled Specification of Interesting Data Patterns

 

 

Kartik Parija and Jaime Spacco

{kartik, jspacco}@cs.umd.edu

 


CMSC 838B - Information Visualization
University of Maryland, College Park, MD

 

• Tunable Viewtips 1.1 Released!
Click here to download it with sample data files.
                       [May 20th, 2000]

• Click here to see an earlier version of this paper.
                       [May 16th, 2000]

• Click here to download version 1.0 and sample data files.
                       [May 16th, 2000]


Abstract

There are many tools that perform sophisticated visualization and analysis of data. However, most of these tools require some familiarity with the dataset in order to identify interesting characteristics. We propose Tunable Viewtips, a standalone tool that performs a "first pass" over an unfamiliar dataset. It provides rapid, dynamic profiling of multidimensional data using common statistical methods. In our first version, we concentrate on one-dimensional data, which is often ignored or misrepresented. We present Tunable Viewtips 1D, which focuses on the visualization and analysis of one-dimensional data in its true form.

Introduction and Motivation

Spotfire (www.spotfire.com), a commercial information visualization package, provides a "view tip" tool [1] that returns an ordered ranking of potentially interesting correlations. The user can preview scatter plots of pairs of axes in a small window, or look at histograms of individual axes [See Figure 1 (a) and (b)]. This is an excellent beginning with much potential! But in its current form, Spotfire's "view tip" tool is insufficient. It is completely static: the user can neither tune the parameters of the ranking algorithm (Pearson's Product Moment Correlation metric) nor select other algorithms to rank results. Currently there is no mechanism for a third-party developer to add new algorithms. Furthermore, little screen space is devoted to displaying these "view tips", and it is difficult to distinguish values. We believe that users would benefit from user-controlled specification of what they consider interesting patterns.

 
Figure 1 (a): Spotfire View Tip [1] showing ranking of columns containing various statistics relevant to the game of baseball (data included as demonstration example in Spotfire 5.1)
Figure 1 (b): Spotfire View Tip [1] (Histogram View) showing ranking of columns containing data on genes used in a Melanoma detection experiment [19]


The first issue we address is the limited functionality of the view tip feature. We remedy this by including new algorithms, along with a plug-in architecture for adding further algorithms; by providing dynamic sliders that tune these algorithms; and by providing more feedback, such as displaying the mean and standard deviation.
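The plug-in idea can be illustrated with a small sketch. It is not the authors' code (Tunable Viewtips was built with Microsoft VC++); it is a minimal Python sketch, assuming a hypothetical plug-in signature in which each algorithm receives one column plus a dictionary of slider-controlled parameters and returns a single "interestingness" score. The registry, register, outlier_count, rank_columns and the parameter name k are all illustrative.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical plug-in signature: one column of numbers plus a dictionary of
# slider-controlled parameters in, a single "interestingness" score out.
Scorer = Callable[[Sequence[float], Dict[str, float]], float]

REGISTRY: Dict[str, Scorer] = {}


def register(name: str) -> Callable[[Scorer], Scorer]:
    """Decorator that adds a scoring algorithm to the plug-in registry."""
    def wrap(fn: Scorer) -> Scorer:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("outlier_count")
def outlier_count(column: Sequence[float], params: Dict[str, float]) -> float:
    """Number of values more than k standard deviations from the mean."""
    mu, sd = mean(column), pstdev(column)
    k = params.get("k", 1.0)  # the value a slider would control
    return float(sum(1 for v in column if abs(v - mu) > k * sd))


def rank_columns(data: Dict[str, Sequence[float]], algorithm: str,
                 params: Dict[str, float]) -> List[Tuple[str, float]]:
    """Score every column with one plug-in; most interesting column first."""
    scorer = REGISTRY[algorithm]
    scores = [(name, scorer(col, params)) for name, col in data.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

In this sketch, moving a slider amounts to calling rank_columns again with a new value of k, and a third-party developer adds an algorithm by decorating one more function with register.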

We then focus on visualizing one-dimensional data. To our knowledge, this area has either been ignored or (mis)represented using line or bar graphs [6]. We want to profile individual columns of a dataset to find any interesting relationships or patterns within them.

Finally, we address the problem of occlusion in dense datasets. We have implemented one-dimensional jitter that uses the entire screen space allotted to the plot. Since only the Y coordinate of a point encodes its value, jittering purely in the X direction does not significantly compromise the visualization. A 'reflection' technique is used to show as many data values as possible in densely packed data.
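A minimal sketch of one-dimensional jitter, assuming a plot of width × height pixels and a linear value-to-pixel mapping: the Y coordinate is computed from the value alone, so it stays exact, and only the X coordinate is random. The function name and parameters are hypothetical, and the 'reflection' technique is not modeled here.

```python
import random
from typing import List, Sequence, Tuple


def jitter_1d(values: Sequence[float], width: int, height: int,
              seed: int = 0) -> List[Tuple[float, float]]:
    """Map each value to a pixel position: Y encodes the value, X is random."""
    rng = random.Random(seed)       # fixed seed so redraws do not reshuffle points
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0         # avoid dividing by zero for constant data
    points = []
    for v in values:
        # Linear value-to-pixel mapping; screen Y grows downward, so the
        # largest value lands at the top row (y = 0).
        y = (hi - v) / span * (height - 1)
        # Spread points across the whole plot width to reduce occlusion.
        x = rng.uniform(0, width - 1)
        points.append((x, y))
    return points
```

Because only X is randomized, repeated values such as the three 10s in jitter_1d([10, 10, 10, 20], 200, 101) share the same Y (100.0) but are drawn side by side instead of on top of each other.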

 

Previous and Related work

    In the 1D case, past research has primarily focused on visualizing textual data. Examples include program listings, documents with many lines, and document search results. Gary Geisler [2] and the On-line Library of Information Visualization Environments (OLIVE) [5] provide an excellent overview of such research.

However, there are very few visualizations of individual columns of numerical data. We found two software packages that claim to visualize and analyze 1D data.

 

Figure 2: XploRe: Teachware quantlet tw1d, used to visualize 1D data [8]

 

xgraph can be used to view 1D data files in the following format:
	"Time=0.0"		"Time=1.0"
	0.0  0.0		0.0  0.0
	0.2  0.04		0.2 -0.04
	0.4  0.16		0.4 -0.16
	0.6  0.36		0.6 -0.36
	0.8  0.64		0.8 -0.64
	1.0  1.0		1.0 -1.0
Figure 3: Xgraph [7]


Implementation

 
Demonstration

    In this section we present applications of our tool to various kinds of datasets. As far as possible, we have used 'real-world' data, which allows us to interpret the results meaningfully.

Error Detection: The first example considers a dataset with just one column of data, namely the closing price of the Dow Jones between 1900 and 1901. This is part of a very large dataset obtained from CMU's StatLib repository [3]. Simply loading this dataset into the tool made it immediately obvious that the dataset contained errors: there were negative closing prices, which are impossible. [See Figure 6 below]

Figure 6: Dow Jones Closing Price, 1900 - 1901 [3]

Advantage of the Jitter Feature: The above example uses the Jitter feature to show as much of the dataset as possible. This is extremely useful for visualizing datasets that have many repeated values or are very densely packed within certain ranges. To show the difference between Jitter turned on and off, we use a dataset of 5 columns, each containing the price of a particular stock recorded over the past 29 months: Intel (INTC), Cisco (CSCO), Microsoft (MSFT), Human Genome Sciences (HGSI) and General Electric (GE). The figure shows the column containing the Microsoft data [Data Courtesy: MSN Moneycentral, www.moneycentral.com]. This output also shows the result of running the Outlier detection algorithm. [See Figure 7 (a) and (b)]

 

Figure 7 (a) and (b): Microsoft stock price over the last 29 months, with and without Jitter

Simple Decision Making: The figure below shows the grading sheet of CMSC 434, offered in Spring 2001. The columns are grades assigned for various homework assignments and projects, in addition to a column showing the total number of points and the overall percentage grade. The column currently being viewed shows the percentage grade. Using the mean, the standard deviation, the tool-tip feature and the standard deviation slider bar, we are able to get a fairly accurate view of the "Letter Grade Spread". For instance, with a grading scale in place (A ~ 89 and above, B ~ 79 and above) and taking into account class performance and average, we could assign 19 A's, 2 C's and award the rest of the class B's. [See Figure 8 (a) and (b)]

 
Figure 8 (a): Overall Percentage Grade of CMSC 434, Spring 2001. Using the tool-tip to identify the number of students (19) who received A's (89 and above)
Figure 8 (b): Overall Percentage Grade of CMSC 434, Spring 2001. Using the tool-tip to identify the number of students (2) who received C's (78 and below)
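The letter-grade reasoning can be expressed in a few lines of code. This is a hypothetical sketch, not the tool's code: the cutoffs mirror the scale quoted above (A at 89 and above, B at 79 and above, everything below that a C), sd_band stands in for the standard deviation slider, and the grades in the usage note are invented for illustration, not the CMSC 434 data.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def letter_spread(grades: Sequence[float],
                  cutoffs: List[Tuple[str, float]]) -> Dict[str, int]:
    """Count grades per letter. `cutoffs` lists (letter, minimum) pairs from
    the highest letter to the lowest; each grade gets the first letter whose
    minimum it meets."""
    counts = {letter: 0 for letter, _ in cutoffs}
    for g in grades:
        for letter, minimum in cutoffs:
            if g >= minimum:
                counts[letter] += 1
                break
    return counts


def sd_band(grades: Sequence[float], k: float) -> Tuple[float, float]:
    """Range mean ± k standard deviations, as a slider set to k would show."""
    mu, sd = mean(grades), pstdev(grades)
    return (mu - k * sd, mu + k * sd)


# Scale from the text: A at 89 and above, B at 79 and above, C below that.
SCALE = [("A", 89.0), ("B", 79.0), ("C", float("-inf"))]
```

With the made-up grades [95, 91, 89, 85, 80, 79, 75, 60], letter_spread(grades, SCALE) yields 3 A's, 3 B's and 2 C's.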

 

Finding Outliers: As most data follows patterns and relationships, it is always interesting to find outliers that deviate from such patterns and relationships. We ran both of our Outlier detection algorithms on the Cereal dataset obtained from CMU's StatLib repository [3]. This dataset describes seventy-seven breakfast cereals using the information on the mandated FDA nutrition facts label. For example, the column [potass] has 77 entries giving the amount of potassium in each cereal. Figure 9 (a) and (b) show an important example where the results of the two algorithms differ.

 
Figure 9 (a): The columns [Sodium] and [Carbo] show the greatest number of outliers (15), calculated with 1 standard deviation.
Figure 9 (b): However, when the "Outlyingness" algorithm is run with the same standard deviation, it shows that the column [Calories] has outliers that are farther away.
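The paper does not define its two outlier algorithms precisely, so the following Python sketch is only one plausible reading. It assumes that the first algorithm counts values lying more than k standard deviations from the mean, and it gives "Outlyingness" a hypothetical definition: the average distance, in standard deviations, of those outliers. It shows why the two can rank columns differently: a column with many mild outliers wins on count, while a column with a single extreme value wins on distance.

```python
from statistics import mean, pstdev
from typing import Sequence


def outlier_count(column: Sequence[float], k: float = 1.0) -> int:
    """How many values lie more than k standard deviations from the mean."""
    mu, sd = mean(column), pstdev(column)
    return sum(1 for v in column if abs(v - mu) > k * sd)


def outlyingness(column: Sequence[float], k: float = 1.0) -> float:
    """Hypothetical score: average distance, in standard deviations, of the
    values beyond the k-SD threshold (0.0 if there are none)."""
    mu, sd = mean(column), pstdev(column)
    if sd == 0:
        return 0.0  # constant column: no value can be an outlier
    z = [abs(v - mu) / sd for v in column]
    far = [d for d in z if d > k]
    return sum(far) / len(far) if far else 0.0
```

For example, the column [-2, -2, 0, 0, 0, 0, 2, 2] has 4 outliers with an outlyingness of about 1.41, while [0]*9 + [10] has only 1 outlier but an outlyingness of 3.0.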

 

Finding Clusters: It is often useful to find clusters in data. Either of our cluster-finding algorithms can be run to identify clusters in 1D data. Figure 10 (a) shows the results of the first cluster finder run on the same five-stock dataset used in Figure 7. Figure 10 (b) shows the results of the second cluster finder run on the Melanoma dataset described in Figure 1 (b).

 
Figure 10 (a): The first cluster finder algorithm found 7 clusters in the data showing the stock price performance of Intel [INTC] over the last 29 months.
Figure 10 (b): The second cluster finder algorithm found 4 clusters in Gene M93-007 in the Melanoma dataset [19] described in Figure 1 (b).
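The paper does not describe its two cluster finders in detail. As an assumption, here is a greedy, gap-based clusterer for 1D data: sort the values and start a new cluster whenever the distance to the previous value exceeds a tunable max_gap. The future-work section's mention of a non-greedy alternative suggests, but does not confirm, that the existing finders are greedy. The function name and parameter are hypothetical.

```python
from typing import List, Sequence


def gap_clusters(values: Sequence[float], max_gap: float) -> List[List[float]]:
    """Greedy 1D clustering: sort, then start a new cluster whenever the gap
    to the previous value exceeds max_gap (a slider-tunable parameter)."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            clusters.append([cur])   # gap too wide: open a new cluster
        else:
            clusters[-1].append(cur)  # close enough: extend the current one
    return clusters
```

For instance, gap_clusters([1, 2, 10, 11, 12, 30], 3) returns [[1, 2], [10, 11, 12], [30]]; widening max_gap merges clusters, which is the kind of change a slider would make visible immediately.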


Weaknesses 

As with many visualizations, our tool is not without problematic areas. Here we list some that we have identified:

 
  • A tool-tip feature shows the value of a data point when the mouse hovers over it. In densely packed datasets, this might not be the best way to view individual points.
  • The tool currently lacks common visualization techniques such as zooming, panning, selection and filtering. These features are targeted as immediate future work; including them will greatly enhance the functionality of Tunable Viewtips.

 

Contributions

Our tool makes two contributions. First, we are expanding and improving Spotfire's 'view tips' visualization feature by incorporating dynamically tunable algorithms. Second, we are exploring the profiling of one dimensional data.

What new visualization features does our code add? Initially, we intended to write a tool that emulated Spotfire's 'view tips' tool and displayed interesting two-dimensional plots based on different algorithms. However, much work has been done on two-dimensional statistical analysis. Instead, we focus on one dimension at a time. This has several advantages: the algorithms are much faster (two-dimensional comparisons require pairwise enumeration over all columns, which for n columns grows as n(n-1)/2, roughly n²/2), and there has been comparatively little work done in this area. However, because we have tried to separate the display from the data, the core functionality for two- (or more) dimensional display already exists.
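To make the growth rate concrete: a one-dimensional pass scores each of the n columns once, while a pairwise pass must score every unordered pair of columns. The five-column stock dataset used earlier therefore yields 10 pairs, and a hypothetical 100-column dataset would yield 4,950.

```latex
P(n) = \binom{n}{2} = \frac{n(n-1)}{2} \approx \frac{n^{2}}{2},
\qquad P(5) = \frac{5 \cdot 4}{2} = 10,
\qquad P(100) = \frac{100 \cdot 99}{2} = 4950
```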

We are not aware of any other tool that performs "jittering" in one dimension. This is especially effective because it solves the problem of occlusion without altering the positions of data points along their axis. The second dimension is used only as a trick in "pixel space"; the data points still line up with their correct location along the Y axis.

We position our tool as a profiling tool for gleaning basic statistical information from a new dataset. Tunable Viewtips is a new way of visualizing a dataset, but it also visualizes new aspects of a dataset, namely its individual columns.

Imagine that we have many results measured over time. We would like to know which of these results show interesting properties. We are not really interested in the relationship between the results at different timesteps; we just want to know which of the result vectors show interesting statistical properties (clusters, gaps, outliers, etc.). Our tool can profile the dataset for such information and show a ranked list of columns worth examining. The user can dynamically tune the parameters of the algorithms, and the resulting changes are displayed instantaneously.

Possible Application Areas

There are a number of areas where one-dimensional data is very useful. Some of these include fluid dynamics [12], image analysis and enhancement [13], information retrieval [14], and motion in 1D such as uniform and non-uniform acceleration or retardation  [15]. We expand briefly on a couple of these application areas.


An area where 1D data is used often is image analysis and enhancement. Images are treated as a matrix, and each column of the matrix, corresponding to a single column of pixels, is analyzed individually to detect effects such as edges and repetition. These columns are grouped into histograms and run through various mathematical functions. One way to enhance the visualization of an image's histogram after applying an edge-detector operator is to take the logarithm of the histogram. Figures 11 (a) and (b) show the application of such a technique.

 

Figure 11 (a) and (b): Application of an edge-detector operator to enhance the image.
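A small sketch of the log-histogram idea, assuming a simple first-difference edge detector applied to a single column of pixel intensities (the source [13] may use a different operator). Replacing each count with log(1 + count) compresses the very large bins produced by flat regions so that the rare, strong edge responses remain visible. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def edge_response(column: Sequence[int]) -> List[int]:
    """First-difference edge detector: |p[i+1] - p[i]| along one pixel column."""
    return [abs(b - a) for a, b in zip(column, column[1:])]


def log_histogram(values: Sequence[int]) -> Dict[int, float]:
    """Histogram of the responses with each count replaced by log(1 + count),
    so that the many near-zero responses of flat regions do not swamp the
    few large responses at edges."""
    counts = Counter(values)
    return {v: math.log1p(c) for v, c in sorted(counts.items())}
```

For the column [10, 10, 10, 10, 200, 200, 200, 10], the responses are [0, 0, 0, 190, 0, 0, 190]; the raw counts 5 and 2 become log 6 and log 3, so the edge bin is no longer dwarfed by the flat-region bin.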

 

Fluid dynamics is a field where computationally intensive algorithms are used to model complicated flows. While such modelling exercises usually concentrate on 2D and 3D flows, there are important problems in which 1D flow needs to be represented and visualized. One such flow occurs in avalanches, where the total time involved is very short.

Both dense and powder snow can produce avalanches. Simulating such events involves computing one-dimensional flow. These flow calculations help predict the motion (velocities, dynamic pressures) of avalanches, and visualization of such data is a key factor in analyzing these simulations. [See Figure 12]

 

Figure 12: Artificially triggered powder snow avalanche in the avalanche dynamics test region of Vallée de la Sionne, Switzerland, 
Picture Courtesy: Swiss Federal Institute for Snow and Avalanche Research Davos [12].

 

In addition, we think our tool can be used to explore individual columns in datasets that have traditionally been the subject of multidimensional exploration. Colleagues have suggested that we examine data describing characteristics of amino acids (possibly where reduction to 1D has been performed) and traditional temporal data, where the behavior of the data could be examined independently of time.

 

Future Work

There is much room for improvement in our tool. Currently, version one supports only the visualization and analysis of 1D data. We would like to reach our original goal of a tool that supports multidimensional data. Using another open-source graphing tool [17], we have begun initial work on adding 2D support to our existing tool [See Figures 13 (a) and (b)]. We have successfully implemented the Pearson correlation metric to rank pairs of columns in a dataset, which already replicates the functionality of Spotfire's View Tip. Once we couple this with the Tunable Viewtips 1D features we have described, we will have greatly enhanced the View Tip mechanism.
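A sketch of ranking column pairs by the Pearson product-moment correlation, the metric Spotfire's View Tip uses. It is written in Python for brevity (the authors built on a C++ graphing class) and is not their implementation. It enumerates the n(n-1)/2 pairs with itertools.combinations and, as an assumption, ranks by the absolute value of r so that strong negative correlations are not buried. Function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # r is undefined for a constant column; treat it as 0
    return sxy / math.sqrt(sxx * syy)


def rank_pairs(data: Dict[str, Sequence[float]]) -> List[Tuple[str, str, float]]:
    """Score all n(n-1)/2 column pairs and sort by |r|, strongest first."""
    scored = [(a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)]
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)
```

With three toy columns u = [1, 2, 3, 4], v = [2, 4, 6, 8] and w = [4, 1, 3, 2], rank_pairs puts (u, v) first with r = 1.0, followed by the two pairs involving w, each with r = -0.4.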

There are two general directions that future work can take. First, this work could be integrated into an existing visualization tool, such as Spotfire or Stardom [16]. This approach makes sense, as any interesting visualization mined by our standalone tool would need to be imported into a more mature tool anyway for further analysis. Second, we can add more features to our current tool.

Regardless of which direction future development takes, we are scoping out some other improvements. First and foremost, we want to test new algorithms. We have algorithms for similarity and gaps in two dimensions that we would like to run once we find or write a decent 2D display tool. The cluster box mechanism in 1D will correctly draw the boxes regardless of how they are computed. We would like to add a non-greedy algorithm that finds the maximum cluster size in each dimension. Second, we want the ability to zoom in, especially for densely packed datasets. We would like to zoom into part of a 1D or 2D plot and run algorithms on that subset of our dataset. Next, we want a dynamic filtering and selection mechanism where the user can specify ranges with the mouse and filter the data. This would be most useful in 2D, where any limit on the ranges, gaps and clusters will help narrow down the search space for the algorithms. Finally, we need to fix some glaring inefficiencies in the intermediate data storage format by eliminating it. The display widget should not store any data; any data it requires should be read from the dataset.
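Zooming and range filtering amount to re-running the same profiling algorithms on a subset of a column. A minimal sketch, assuming the selected range arrives as a pair lo, hi (for example from a mouse drag) and that each algorithm is a function of a column. In keeping with the goal of a widget that stores no data, the function reads from the column passed in and keeps nothing between calls. All names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Sequence

Profiler = Callable[[Sequence[float]], float]


def zoom_and_profile(column: Sequence[float], lo: float, hi: float,
                     profilers: Dict[str, Profiler]) -> Dict[str, float]:
    """Re-run each profiling function on the values of `column` inside [lo, hi].
    The subset is built on the fly from the column passed in; nothing is cached."""
    subset = [v for v in column if lo <= v <= hi]
    if not subset:
        return {}  # empty selection: nothing to profile
    return {name: fn(subset) for name, fn in profilers.items()}


PROFILERS: Dict[str, Profiler] = {"count": len, "mean": mean, "sd": pstdev}
```

Selecting the range 0 to 10 in the column [1, 2, 3, 50, 60] profiles only the first three values (count 3, mean 2), so the two large values no longer dominate the statistics.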

 
Figure 13 (a): Plotting the stock prices of Intel vs. Cisco since Jan '99
Figure 13 (b): 2D plot of UACC383 vs. KA in the Melanoma dataset

 

Acknowledgements

Our sincere gratitude goes to Larry Leonard [17] of Definitive Solutions, Inc. for allowing us to use his Microsoft VC++-based 2D Graphing Class. As novices on this development platform, we found it a great starting point for developing what we believe is a useful tool. We would like to thank Narendar Shankar for his assistance in the GUI development, Dave Hovemeyer for his help in porting our Unix code to the Windows platform, Brian Postow for his suggestions to improve our Jitter feature, Rezarta Islamaj and Omer Horvitz for sitting through multiple demonstrations, and Jinwook Seo and Bongshin Lee for inspiring us to use the Melanoma dataset. We also greatly appreciate Dr. Ben Shneiderman and Dr. Catherine Plaisant's guidance through the various stages of our project.

 

References

  1. Spotfire, "Help on the Viewtip Feature", Spotfire Manual, pp. 131-134, www.spotfire.com
    [Accessed Feb 28th 2001 onwards]
  2. Geisler, G., "Making Information More Accessible: A Survey of Information Visualization Applications and Techniques", http://www.ils.unc.edu/~geisg/info/infovis/paper.html
    [Accessed April 2nd 2001]
  3. CMU StatLib repository, http://www.stat.cmu.edu/datasets
    [Accessed April 15th 2001]
  4. SeeSoft, Software visualization tool, Lucent Technologies, Visual Insights, http://www.visualinsights.com/
    [Accessed April 2nd 2001]
  5. OLIVE: Multidimensional Data - 1D, University of Maryland, http://otal.umd.edu/Olive/1D.html
    [Accessed March 1st 2001 onwards]
  6. Fortner, B., "The Data Handbook: A Guide to Understanding the Organization and Visualization of Technical Data", pp. 91-102, Spyglass, 1992.
  7. XGraph: Animated, Easy Client for 1D Line Plots, http://www.cactuscode.org/VizTools/xgraph.html
    [Accessed April 2nd 2001]
  8. XploRe, Teachware quantlet "tw1d", http://www.quantlet.de/scripts/xlg/html/xlghtmlnode22.html
    [Accessed April 2nd 2001]
  9. Stockburger, D.W., "Introductory Statistics: Concepts, Models and Applications", Southwest Missouri State University, http://www.psychstat.smsu.edu/introbook/sbk00.htm
    [Accessed March 1st 2001]
  10. Neter, J., et al., "Applied Linear Statistical Models", 4th Edition, Irwin, 1996.
  11. Shneiderman, B., "Dynamic Queries for Visual Information Seeking", IEEE Software, 11(6), pp. 70-77, 1994.
  12. AVAL-1D: Numerical Calculation of One Dimensional Flow in Avalanches, http://www.slf.ch/aval-1d/welcome-en.html
    [Accessed May 4th 2001]
  13. Use of 1D Data in Image Analysis and Enhancement, http://www.khoral.com/contrib/contrib/dip2001/html-dip/c4/s6/node3.html
    [Accessed May 4th 2001]
  14. Jonsson, H.A., et al., "Retrieval of One Dimensional Data", Proceedings of the 3rd Basque International Workshop on Information Technology, 1997.
  15. Pausch, R., et al., "One Dimensional Motion Tailoring for the Disabled: A User Study", Proceedings of ACM CHI '92, pp. 405-411, 1992.
  16. Cailleteau, L., "Interfaces for Visualizing Multi-valued Attributes: Design and Implementation using Starfield Displays", University of Maryland, ftp://ftp.cs.umd.edu/pub/hcil/Reports-Abstracts-Bibliography/99-20html/99.20.html
    [Accessed March 1st 2001]
  17. Leonard, L., "2D Graphing Class", http://www.codeguru.com/controls/SimpleGraphControl.html
    [Accessed April 2nd 2001 onwards]
  18. Barvinko, P., "2D Visualization Class", http://www.codeguru.com/controls/graph2d.shtml
    [Accessed April 19th 2001 onwards]
  19. Bittner, M., Meltzer, P., Chen, Y., et al., "Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling", Nature, vol. 406, pp. 536-540, 2000, http://www.nhgri.nih.gov/DIR/Microarray/selected_publications.html
    [Accessed May 14th 2001 onwards]
  20. Tufte, E., "The Visual Display of Quantitative Information", Graphics Press, Cheshire, CT, 1983.