Tunable Viewtips 1D - K.Parija & J.Spacco

Tunable Viewtips:
User Controlled Specification of Interesting Data Patterns

Kartik Parija and Jaime Spacco

{kartik, jspacco}@cs.umd.edu

CMSC 838B - Information Visualization
University of Maryland, College Park, MD

- Tunable Viewtips 1.1 Released!
  Click here to download it with sample data files.
  [May 20th, 2000]

- Click here to see earlier version of this paper.
  [May 16th, 2000]

- Click here to download version 1.0 and sample data files
  [May 16th, 2000]


Abstract

There are many tools that perform sophisticated visualization and analysis of data. However, most of these tools require some familiarity with the dataset in order to identify interesting characteristics. We propose Tunable Viewtips, a standalone tool that performs a "first pass" on an unfamiliar data set. It provides rapid dynamic profiling of multidimensional data utilizing common statistical methods. In our first version, we concentrate on single dimensional data which is often ignored or misrepresented. We present Tunable Viewtips 1D, which focuses on the visualization and analysis of one dimensional data in its true form.

Introduction and Motivation

Spotfire (www.spotfire.com), a commercial information visualization package, provides a "view tip" tool [1] which returns an ordered ranking of potentially interesting correlations. The user can preview the scatter plots of pairs of axes in a small window, or look at histograms of various axes [See Figure 1 (a) and (b)]. This is an excellent beginning with much potential, but in its current form Spotfire's "view tip" tool is insufficient. It is completely static: the user can neither tune the parameters of the ranking algorithm (Pearson's Product Moment Correlation metric) nor select other algorithms to rank results. Currently there is no mechanism for a third-party developer to add new algorithms. Furthermore, little screen space is devoted to displaying these "view tips" and it is difficult to distinguish values. We believe that users would benefit from user controlled specification of what they consider interesting patterns.

Figure 1 (a): Spotfire View Tip [1] showing ranking of columns containing various statistics relevant to the game of baseball (data included as demonstration example in Spotfire 5.1)
Figure 1 (b): Spotfire View Tip [1] (Histogram View) showing ranking of columns containing data on genes used in a Melanoma detection experiment [19]

The first issue we address is the limited functionality of the view tip feature. We remedy this by including new algorithms, along with a plug-in architecture for adding additional algorithms; by providing dynamic sliders that tune these algorithms; and by providing more feedback, such as displaying the mean and standard deviations.

We then focus on visualizing one dimensional data. To our knowledge, this area has been ignored or (mis)-represented by using line or bar graphs [6]. We want to profile individual columns of the dataset to find any interesting relationships or patterns that lie within them.

Finally, we address the problem of occlusion for dense datasets. We've implemented jitter in one dimension that uses the entire screen space allotted for the plot. Since only the Y coordinate of a point encodes its value, jittering purely in the X direction does not significantly compromise the visualization. A 'reflection' technique is used to show as many data values as possible in densely packed data.

Previous and Related work

Past research has primarily focused on visualizing textual data in the case of 1D. Examples of these include program listings, documents with many lines, and document search results. Gary Geisler [2] and the On-line Library of Information Visualization Environments (OLIVE) [5] provide an excellent overview of such research.

However, there are very few visualizations of individual columns of numerical data. We found two software packages that claim to visualize and analyze 1D data.

xPloRe: This package contains a "teachware quantlet" called tw1d [8] to visualize 1D data. While containing useful statistical features [See Figure 2], the tool uses bar graphs to display 1D data. Actually this is a 2D visualization technique because the x-axis displays the order of the data which in most cases is time.


Figure 2: xPloRe: Teaching Quantlet tw1d, to visualize 1D data. [8]

xgraph: This tool is billed as a "freely available, lightweight and easy to use visualization client for viewing 1D data files" [7]. Figure 3 shows a screenshot of xgraph. It claims the use of a line plot to show 1D data, which is actually a technique more appropriate to visualize 2D data.

	xgraph can be used to view 1D data files with the format

	"Time=0.0 "	"Time=1.0"
	0.0 0.0		0.0 0.0
	0.2 0.04	0.2 -0.04
	0.4 0.16	0.4 -0.16
	0.6 0.36	0.6 -0.36
	0.8 0.64	0.8 -0.64
	1.0 1.0		1.0 -1.0

Figure 3: Xgraph [7]

Implementation

Architecture:
Our inspiration was the Java Swing model, which separates the underlying data storage from its display on the screen. This allows several graphical components, or different views within a component, to display the same data without shuffling the data around. We have followed this model as closely as possible for efficiency and simplicity, but in the end we employ a (redundant) intermediate layer of storage for graphical objects that is recreated every time we render a graph. [See Figure 4]

Figure 4: Dataset Architecture
We've tried to separate the graphical display from the data storage model. Furthermore, since we will run a variety of algorithms on the dataset, we have tried to minimize the overhead of calculating new sets of results. Thus, each set of results stores references into the Dataset to minimize the need for copying data that's already stored someplace. To make the Tunable Viewtips tool easily extensible, we have a simple and reasonably effective object-oriented framework that treats one-dimensional and two-dimensional results (and more-dimensional results if we chose to add that functionality) virtually identically, allowing the GUI to process and visualize results appropriately. [See Figure 5]

Figure 5: GUI Architecture
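
The separation described above can be illustrated with a small Python sketch. It is not the tool's actual code (which, per the acknowledgements, was built on a VC++ graphing class); the names `Dataset`, `Result` and `rank_columns` are hypothetical and only show how a result might refer to columns of the dataset instead of copying them, with 1D and 2D results sharing one shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Dataset:
    """Column-oriented storage; the single owner of the raw values."""
    columns: Dict[str, List[float]]


@dataclass
class Result:
    """One ranked entry. It names the columns it refers to instead of copying them."""
    column_names: Tuple[str, ...]  # one name for a 1D result, two for a 2D result
    score: float

    def values(self, dataset: Dataset) -> List[List[float]]:
        # Look the data up in the dataset on demand; nothing is duplicated here.
        return [dataset.columns[name] for name in self.column_names]


def rank_columns(dataset: Dataset,
                 metric: Callable[[List[float]], float]) -> List[Result]:
    """Score every column with `metric` and return results, highest score first."""
    results = [Result((name,), metric(col)) for name, col in dataset.columns.items()]
    return sorted(results, key=lambda r: r.score, reverse=True)
```

Because a `Result` only carries column names and a score, a display layer written against it would not need to know which algorithm produced the ranking.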

Having comparatively little experience writing GUI applications, we made a few rookie mistakes that have made our software more complicated than we intended. It was more difficult than we had anticipated to display the data directly out of our underlying Dataset structure, due to the lack of effective tools for displaying simple things like scatter plots and bar charts. Ultimately, the need for results outweighed any purist desires for a clean implementation, and we made a few compromises. Data is stored into a redundant intermediate layer before being displayed by our much-modified version of a third-party tool [17]. This happens every time we render a new display, and is wildly inefficient, but saved lots of time.
Technical Features
File formats: We read tab-delimited text files into a simple internal data structure, and then apply our algorithms to this simple data structure.
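
As a rough illustration of this step, the following Python sketch reads a tab-delimited file into a dictionary of columns. It assumes that the first row holds column names and that every other cell is numeric; neither assumption is stated in the paper, and the function name `read_tab_delimited` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_tab_delimited(stream: TextIO) -> Dict[str, List[float]]:
    """Read tab-delimited text into {column name: list of values}."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)                  # assumed: first row names the columns
    columns: Dict[str, List[float]] = {name: [] for name in header}
    for row in reader:
        if not row:                        # skip blank lines
            continue
        for name, cell in zip(header, row):
            columns[name].append(float(cell))  # assumed: all cells are numeric
    return columns


# Made-up sample values, used only to exercise the function.
sample = io.StringIO("INTC\tCSCO\n30.5\t55.0\n31.0\t54.25\n")
table = read_tab_delimited(sample)
```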
Implemented Algorithms: We currently have implemented five statistical algorithms [9, 10] to examine single dimensional data. A pseudo-code representation of each algorithm is given below each description.
- Number of Outliers: This calculates the number of outliers in a single column of data based on whether a particular element is within the 'span' of a multiple [x] of the standard deviation. The factor [x] can be specified by the user. In most cases, a dataset is composed of multi-column data and this algorithm ranks columns of such datasets in descending order of number of outliers found. Changes made in the factor [x] are immediately reflected on the graphing window as are recalculations in the number of outliers found and associated ranking of columns.
  
  set N stdevs dynamically based on the standard deviation slider bar
  foreach column c
      foreach value v in c
          if ( | v - mean | > N stdevs ) {
              outliers++;
          }
      end foreach
      return outliers for that column
  end foreach
- "Outlyingness": Similar to the above algorithm, this enables the user to rank columns of data not by the total number of outliers found, but by how 'far' an outlier may lie. A column of data may not have the largest number of outliers, but may contain one outlier that is significantly further from the rest of the data points. A point-based system (lambda), based on how far each point is from the standard deviation, is used to rank such columns higher than others.
  
  set N stdevs dynamically based on the standard deviation slider bar
  foreach column
      foreach value v
          if ( |v - mean| > N stdevs ) {
              LAMBDA *= |v / mean|^2
          }
      end foreach
      return LAMBDA for that column
  end foreach
- Uniformity: This algorithm measures how uniformly elements of a column of data are spread out between the minimum and maximum data values. Variance between successive points is measured to arrive at a uniformity metric. A column 'uniformly' distributed between its maximum and minimum value (i.e. the gap between successive points is equal for all points) receives a score equal to 1.00 and is ranked most uniform.
  foreach column c
      deviation = (c.max - c.min) / c.size;
      foreach value v in c
          total_deviation += | v - previous(v) | - deviation
      end foreach
      return 1.0 - ( total_deviation / ( c.max - c.min ) )
  end foreach
- Cluster Finder: Going one step further, we attempt to find clusters within the data. A very simple clustering algorithm measures if data points are within a certain percentage of the range (max - min) of the data once an initial value in the cluster is fixed. This percentage is user defined. Currently, every point can belong to only one cluster, but it is not hard to extend this so that points may belong to different clusters. Bounding boxes are used to indicate clusters. Again, columns of data are ranked by the number of clusters found.
  tightness is a percentage of the range (max - min) set by the cluster slider bar
  foreach column c
      cluster_root = c.first
      range = ( c.max - c.min ) * tightness
      foreach value v in c
          if ( v <= cluster_root + range ) {
              add v to the cluster
          } else {
              start a new cluster
              cluster_root = v
          }
      end foreach
  end foreach
- Cluster Finder II: A slight variation of the above algorithm measures if successive points are within a certain percentage of the range (max - min) of the data. This percentage is again user defined and columns of data are ranked by the number of clusters found.
  tightness is a percentage of the range (max - min) set by the slider bar
  foreach column c
      cluster 1 begins with the first element
      range = ( c.max - c.min ) * tightness
      foreach value v in c
          if ( v <= previous(v) + range ) {
              add v to the cluster
          } else {
              start a new cluster at v
          }
      end foreach
  end foreach
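
The outlier and uniformity pseudo-code above can be made concrete with a short Python sketch. It is an interpretation, not the tool's own code, and it rests on several assumptions that the paper does not state: the population standard deviation is used, LAMBDA starts at 1.0, the column mean is non-zero, the column is sorted before gaps are measured, and the expected gap is the range divided by the number of gaps (n - 1) so that an evenly spaced column scores exactly 1.0 as the description says. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List


def count_outliers(values: List[float], n_stdevs: float) -> int:
    """Number of values further than n_stdevs standard deviations from the mean."""
    m = mean(values)
    threshold = n_stdevs * pstdev(values)   # assumed: population standard deviation
    return sum(1 for v in values if abs(v - m) > threshold)


def outlyingness(values: List[float], n_stdevs: float) -> float:
    """Multiply |v / mean|^2 over all outliers, following the pseudo-code."""
    m = mean(values)
    threshold = n_stdevs * pstdev(values)
    score = 1.0                             # assumed starting value for LAMBDA
    for v in values:
        if abs(v - m) > threshold:
            score *= abs(v / m) ** 2        # assumes the mean is not zero
    return score


def uniformity(values: List[float]) -> float:
    """1.0 for evenly spaced values, lower as the gaps become more uneven."""
    if len(values) < 2:
        return 1.0                          # assumed: degenerate columns count as uniform
    ordered = sorted(values)                # assumed: gaps are measured on sorted data
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 1.0
    expected_gap = span / (len(ordered) - 1)
    total = sum(abs((b - a) - expected_gap) for a, b in zip(ordered, ordered[1:]))
    return 1.0 - total / span


def rank_by(columns: Dict[str, List[float]],
            metric: Callable[[List[float]], float]) -> List[str]:
    """Column names ordered by descending metric value."""
    return sorted(columns, key=lambda name: metric(columns[name]), reverse=True)
```

Calling `rank_by(columns, lambda c: count_outliers(c, n))` again whenever the slider value `n` changes would mirror the dynamic re-ranking described in the text.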

Easily extensible: New algorithms can easily be plugged into our underlying architecture, and can make use of the existing dynamic feedback features. The GUI need know nothing about how the results were computed. We have written the framework to compute two dimensional results and implemented the same Pearson's Product Moment Correlation metric that Spotfire uses. All we require is a two-dimensional scatterplot display tool. We have already begun work on this.
Visual Statistical Algorithmic Debugging: New statistical algorithms can be integrated at a later date, without the overhead of the Spotfire plug-in API or any related proprietary file formats. Furthermore, the results of these algorithms can be quickly visualized to determine whether they are along expected lines.
Dynamic Query Mechanisms: [11]
- Standard deviation slider: Allows dynamic manipulation of the number of standard deviations used to define an outlier for the two outlier algorithms. It would be very easy for any new outlier algorithms to use this feedback mechanism. We have chosen to limit the maximum number of standard deviations to 3 since statisticians consider an outlier of more than 3 standard deviations to be a "hard" outlier [10], and datasets rarely have outliers of more than 3 standard deviations.
- Jitter slider: Controls the amount the user wants to jitter the data. One-dimensional jitter is a surprisingly effective technique. Since the location of points along the Y axis encodes their value, jittering along the X axis reduces occlusion without sacrificing data. [See Figure 7 (a) and (b)]. As we are bound by the physical space of a graphing window and it is not uncommon for one dimensional data to be densely packed, we use arrow-head markers "<" and ">" to indicate that there are more points at that specific value which are not being shown. [See Figure 10 (b)]
- Cluster slider: Controls the "tightness" of the clustering algorithm(s). We define tightness as the percentage of the range (max data value - min data value) that determines the size of a cluster. The mechanism for calculating bounding boxes could easily be used by other clustering algorithms.
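
To make the role of the tightness parameter concrete, here is a minimal Python sketch of the two cluster finders. It assumes the column is sorted in ascending order first and that the cluster width is tightness × (max − min); both are our reading of the description, and the function names are hypothetical.

```python
from typing import List


def cluster_finder(values: List[float], tightness: float) -> List[List[float]]:
    """Cluster Finder: a value joins the cluster if it lies within `width` of the cluster's first value."""
    ordered = sorted(values)
    width = (ordered[-1] - ordered[0]) * tightness
    clusters = [[ordered[0]]]
    root = ordered[0]
    for v in ordered[1:]:
        if v <= root + width:
            clusters[-1].append(v)
        else:
            clusters.append([v])   # start a new cluster rooted at v
            root = v
    return clusters


def cluster_finder_ii(values: List[float], tightness: float) -> List[List[float]]:
    """Cluster Finder II: a value joins the cluster if it lies within `width` of the previous value."""
    ordered = sorted(values)
    width = (ordered[-1] - ordered[0]) * tightness
    clusters = [[ordered[0]]]
    for prev, v in zip(ordered, ordered[1:]):
        if v <= prev + width:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def bounding_boxes(clusters: List[List[float]]) -> List[tuple]:
    """(low, high) extent of each cluster, e.g. for drawing bounding boxes."""
    return [(c[0], c[-1]) for c in clusters]
```

On evenly spaced data the two variants behave differently: the first cuts a new cluster every time the distance from the cluster root exceeds the width, while the second chains neighbours together and can return a single cluster.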
Color scheme: Color redundantly encodes the value. The minimum is always a very dark blue (close to black), and the maximum red. We interpolate by subtracting blue and adding green, until we hit pure green. Then we begin subtracting green and adding red. This yields a nice interpolated color encoding which clearly shows the minimum and maximum. Clusters can also be identified based on their color patterns, though it is important to note that the gradations of color are not always consistent. It is easier to use color to identify a cluster that is close to the mid-point of interpolation ( 0, 255, 0 ) than it is to spot one halfway between the mean and max ( about ( 125, 125, 0 ) ), since a light green is easier to distinguish than a color somewhere between red and green.
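
The interpolation described above can be sketched in Python as a two-segment linear ramp. The exact RGB value of the "very dark blue" is not given in the paper, so `darkest_blue=64` is a placeholder assumption, as is the function name `value_to_rgb`.

```python
from typing import Tuple


def value_to_rgb(value: float, lo: float, hi: float,
                 darkest_blue: int = 64) -> Tuple[int, int, int]:
    """Map a value in [lo, hi] to dark blue -> pure green -> pure red."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)              # clamp values outside the range
    if t <= 0.5:
        s = t / 0.5                        # first half: subtract blue, add green
        return (0, round(255 * s), round(darkest_blue * (1 - s)))
    s = (t - 0.5) / 0.5                    # second half: subtract green, add red
    return (round(255 * s), round(255 * (1 - s)), 0)
```

Under these assumptions the mid-point of the range maps to ( 0, 255, 0 ) and the point three quarters of the way up maps to equal parts red and green, which matches the hard-to-read region mentioned in the text.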

Demonstration


In this section we present application of our tool to various kinds of datasets. As far as possible, we have used 'real-world' data, which allows us to interpret the results in some meaningful manner.

Error Detection: The first example considers a dataset with just one column of data, namely the closing price of the Dow Jones between the years 1900 and 1901. This is part of a very large data set obtained from CMU's Statistics repository [3]. By simply plugging this dataset into the tool, it was immediately obvious that there were some errors in the dataset, as there were occurrences of negative closing prices, which are naturally absurd. [See Figure 6 below]

Figure 6: Dow Jones Closing Price, 1900 - 1901 [3]

Advantage of the Jitter Feature: The above example uses the Jitter feature to show as much of the dataset as possible. This is extremely useful for visualizing datasets which have many repeat values or are very densely packed within certain ranges. We present an example to show the difference when Jitter is turned on and off when viewing a dataset consisting of 5 columns of data each containing the price of a particular stock recorded over the past 29 months. They are Intel (INTC), Cisco (CSCO), Microsoft (MSFT), Human Genome Sciences (HGSI) and General Electric (GE). The figure shows the column containing the Microsoft data [Data Courtesy: MSN Moneycentral, www.moneycentral.com]. This output also shows that the Outlier detection algorithm has been performed. [See Figure 7 (a) and (b)]

Figure 7 (a) and (b): Microsoft Stock Price over the last 29 months. With and Without Jitter

Simple Decision making: The figure below shows the grading sheet of CMSC 434 offered in Spring of 2001. The columns are grades assigned in various homework assignments and projects, in addition to a column showing total number of points and overall percentage grade. The current column being viewed shows the percentage grade. By showing the mean, standard deviation, the tool-tip feature and the standard deviation slider bar, we are able to get a fairly accurate view of the "Letter Grade Spread". For instance, with a grading scale in place (A ~ 89 and above, B ~ 79 and above) and taking into account class performance and average, we could assign 19 A's, 2 C's and award the rest of the class B's. [See Figure 8 (a) and (b)]

Figure 8 (a): Overall Percentage Grade of CMSC 434, Spring 2001. Using the tool-tip to identify the number of students (19) who received A's (89 and above)
Figure 8 (b): Overall Percentage Grade of CMSC 434, Spring 2001. Using the tool-tip to identify the number of students (2) who received C's (78 and below)

Finding Outliers: As most data follows patterns and relationships, it is always interesting to find outliers that deviate from such patterns and relationships. We ran both of our Outlier detection algorithms on the Cereal data set obtained from CMU Statistics Library [3]. This data set describes seventy seven breakfast cereals using the information found on the mandated FDA nutritional facts label. For example, under the column [potass] there are 77 entries with the amount of potassium contained in each cereal. Figure 9 (a) and (b) show an important example where the results of the two algorithms vary.

Figure 9 (a): The columns [Sodium] and [Carbo] show the largest number of outliers (15) calculated with 1 standard deviation.
Figure 9 (b): However, when the algorithm "Outlyingness" is run with the same standard deviation, it shows that the column [Calories] has outliers that are further away.

Finding Clusters: It is often useful to find clusters in data. Either of our cluster algorithms can be run to identify clusters in 1D data. Figure 10 (a) shows the results of the first cluster finder algorithm being run on the same dataset containing the performance of 5 stocks, used in Figure 7. Figure 10 (b) shows the results of the second cluster finder being run on the Melanoma dataset described in Figure 1(b).

Figure 10 (a): The first cluster finder algorithm found 7 clusters in the data showing the stock price performance of Intel [INTC] over the last 29 months.
Figure 10 (b): The second cluster finder algorithm found 4 clusters in Gene M93-007 in the Melanoma dataset [19] described in Figure 1(b)

Weaknesses

As with many visualizations, our tool is not without problematic areas. Here we list some that we have identified:

There exists a tool-tip feature that allows the value of the data point to be shown when the mouse is hovered over the particular point. In densely packed data sets, this might not be the best way to view individual points.

Currently the tool lacks common visualization techniques such as zooming, panning, selection and filtering. However, these features are being targeted as immediate future work. Inclusion of these capabilities will greatly enhance the functionality of Tunable Viewtips.

Contributions

Our tool makes two contributions. First, we are expanding and improving Spotfire's 'view tips' visualization feature by incorporating dynamically tunable algorithms. Second, we are exploring the profiling of one dimensional data.

What new visualization features does our code add? Initially, we intended to write a tool that emulated Spotfire's 'view tips' tool, and displayed interesting two-dimensional plots based on different algorithms. However, much work has been done on two dimensional statistical analysis. Instead, we focus on one dimension at a time. This has several advantages: the algorithms are much faster (two dimensional comparisons require pair-wise enumeration over all columns, which grows at about (n²)/2), and there has been comparatively little work done in this area. However, because we've tried to separate the display from the data, the core functionality for two (or more) dimensional display already exists.

We are not aware of a tool that performs "jittering" in one dimension. This is especially effective because it solves the problem of occlusion without altering the sanctity of data points along their axis. The use of the second dimension is only a trick in "pixel space"; the data points still line up with their correct location along the Y axis.
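
The idea can be sketched in a few lines of Python: each value keeps its Y coordinate and receives only a horizontal offset in pixel space. The sketch assumes random offsets with a fixed seed and a jitter amount between 0 and 1; the paper does not say how offsets are chosen, and the 'reflection' technique and arrow-head markers are not modelled here. The name `jitter_1d` is hypothetical.

```python
import random
from typing import List, Tuple


def jitter_1d(values: List[float], jitter: float, plot_width: float,
              seed: int = 0) -> List[Tuple[float, float]]:
    """Return (x, y) plot positions; y is the untouched data value.

    jitter is assumed to lie in [0, 1]: 0 stacks every point on the centre
    line, 1 lets points spread across the full width of the plot.
    """
    rng = random.Random(seed)              # fixed seed keeps redraws stable
    centre = plot_width / 2.0
    max_offset = jitter * centre
    return [(centre + rng.uniform(-max_offset, max_offset), v) for v in values]
```

Repeated values such as several identical closing prices end up at different X positions, so they no longer occlude one another, while reading a point's value off the Y axis is unaffected.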

We position our tool as a profiling tool used to glean basic statistical information from a new dataset. Tunable Viewtips is a new way of visualizing a dataset. However, we are also visualizing new aspects of a dataset, namely the individual columns.

Imagine that we have many results measured over time. We would like to know which of these results show interesting properties. We are not really interested in the relationship between the results at different timesteps; we just want to know which of the vectors of results show interesting statistical properties (clusters, gaps, outliers, etc.). Our tool can profile the dataset for such information and show a ranked list of columns that could be examined. The user can dynamically tune the parameters of the algorithms and changes in the results due to these adjustments are instantaneously displayed.

Possible Application Areas

There are a number of areas where one-dimensional data is very useful. Some of these include fluid dynamics [12], image analysis and enhancement [13], information retrieval [14], and motion in 1D such as uniform and non-uniform acceleration or retardation [15]. We expand briefly on a couple of these application areas.

An area where 1D data is used often is image analysis and enhancement. The images are examined as a matrix, and each column of the matrix, corresponding to a single column of pixels, is analyzed individually to spot effects like edges and repetitiveness. These columns are grouped as histograms and are run through various mathematical functions. One way to enhance the visualization of the histogram of images after the application of an edge-detector operator is by using the logarithm of the histogram. Figures 11 (a) and (b) show the application of such a technique.

Figure 11 (a) and (b): Application of an edge-detector operator to enhance the image.

Fluid Dynamics is a field where computationally intensive algorithms are used to model complicated flows. While such modelling exercises usually concentrate on 2D and 3D flows, there are instances of important problems where 1D flow needs to be represented and visualized. One such flow occurs in avalanches, where the total time involved is very small.

Both dense and powder snow can produce avalanches. The fluid dynamic calculations involved in simulating such activity involve the calculation of one dimensional flow. These flow calculations help predict the motion (velocities, dynamic pressures) of avalanches and visualization of such data is a key factor in the analysis of such simulations. [See Figure 12]

Figure 12: Artificially triggered powder snow avalanche in the avalanche dynamics test region of Vallée de la Sionne, Switzerland,
Picture Courtesy: Swiss Federal Institute for Snow and Avalanche Research Davos [12].

In addition, we think our tool can be used to explore individual columns in datasets that have traditionally been part of multi-dimensional exploration. Colleagues have recommended that we could examine data showing characteristics of Amino Acids (possibly where reduction to 1D has been performed) and traditional temporal data where the behavior of data could be examined outside the consideration of time.

Future Work

There is much room for improvement in our tool. Currently, version one supports the visualization and analysis of 1D data alone. We would like to reach the proposed goal of having a tool that will support multi-dimensional data. Using another open source graphing tool [17], we have begun initial work on adding 2D support to our existing tool [See Figures 13 (a) and (b)]. We have successfully implemented the Pearson Correlation metric to rank pairs of columns in a dataset. This already replicates the functionality of Spotfire's View Tip. Once we've coupled this with the Tunable Viewtips 1D features we have described, we will have greatly enhanced the View Tip mechanism.
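
For reference, ranking pairs of columns by Pearson's product-moment correlation can be sketched in Python as follows. Sorting by the absolute value of the coefficient and returning 0.0 for constant columns are our assumptions, and the function names are hypothetical. The use of `itertools.combinations` also makes the pair-wise cost visible: n columns yield n(n - 1)/2 pairs.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson's product-moment correlation coefficient of two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0                         # assumed: constant columns rank last
    return cov / (sx * sy)


def rank_pairs(columns: Dict[str, List[float]]) -> List[Tuple[Tuple[str, str], float]]:
    """All column pairs with their coefficient, strongest correlation first."""
    scored = [((a, b), pearson(columns[a], columns[b]))
              for a, b in combinations(columns, 2)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```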

There are two general directions that future work can take. First, this work could be integrated into an existing visualization tool, such as Spotfire or Stardom [16]. This approach makes sense, as any interesting visualization mined by our standalone tool would need to be imported into a more mature tool anyway for further analysis. Second, we can add more features to our current tool.

Regardless of which direction future development takes, we are scoping out some other improvements. First and foremost, we want to test out new algorithms. We have algorithms for similarity and gaps in two dimensions that we'd like to run once we find/write a decent 2D display tool. The cluster box mechanism in 1D will correctly draw the boxes regardless of how they're computed. We'd like to add a non-greedy algorithm that finds the maximum cluster size in each dimension. Second, we want the ability to zoom in, especially for densely packed data sets. We'd like to zoom into a part of a 1D or 2D plot and run algorithms on that subset of our data set. Next, we want a dynamic filtration and selection mechanism where the user can specify ranges with the mouse and filter the data. This would be most useful in 2D where any limit to the ranges, gaps and clusters will help narrow down the search space for the algorithms. Finally, we need to fix some glaring inefficiencies in the intermediate data storage format by eliminating it. The display widget should not store any data, and any data that it requires it should read out of the dataset.

Figure 13 (a): Plotting the stock prices of Intel Vs. Cisco since Jan '99
Figure 13 (b): 2D Plot of UACC383 Vs. KA in the Melanoma dataset

Acknowledgements

Our sincere gratitude to Larry Leonard [17] of Definitive Solutions, Inc for allowing us to use his Microsoft VC++ based 2D Graphing Class. As we were novices on this development platform, it provided a great starting point to develop what we believe is a useful tool. We would like to thank Narendar Shankar for his assistance in the GUI development, Dave Hovemeyer for his help in porting our Unix code to the Windows platform, Brian Postow for his suggestions to improve our Jitter feature, Rezarta Islamaj and Omer Horvitz for sitting through multiple demonstrations, and Jinwook Seo and Bongshin Lee for inspiring us to use the Melanoma dataset. We also greatly appreciate Dr. Ben Shneiderman and Dr. Catherine Plaisant's guidance through the various stages of our project.

References

[1] Spotfire, "Help on the Viewtip Feature", pp. 131 - 134, Spotfire Manual, www.spotfire.com
[Accessed Feb 28th 2001 onwards]
[2] Geisler, G., "Making Information More Accessible: A Survey of Information Visualization Applications and Techniques", http://www.ils.unc.edu/~geisg/info/infovis/paper.html
[Accessed April 2nd 2001]
[3] CMU's statLib repository, http://www.stat.cmu.edu/datasets
[Accessed April 15th 2001]
[4] SeeSoft, Software visualization tool, Lucent Technologies, Visual Insights, http://www.visualinsights.com/
[Accessed April 2nd 2001]
[5] Olive: Multidimensional Data - 1D, http://otal.umd.edu/Olive/1D.html, University of Maryland
[Accessed March 1st 2001 onwards]
[6] Fortner, B., "The Data Handbook: A Guide to Understand the Organization and Visualization of Technical Data", pp. 91-102, Spyglass, 1992.
[7] XGraph: Animated, Easy Client for 1D Line Plots, http://www.cactuscode.org/VizTools/xgraph.html
[Accessed April 2nd 2001]
[8] XPloRe, Teachware quantlet "tw1d", http://www.quantlet.de/scripts/xlg/html/xlghtmlnode22.html
[Accessed April 2nd 2001]
[9] Stockburger, D.W., "Introductory Statistics, Concepts, Models and Applications", http://www.psychstat.smsu.edu/introbook/sbk00.htm, Southwest Missouri State University
[Accessed March 1st 2001]
[10] Neter, et al., "Applied Linear Statistical Models", IRWIN, 4th Edition, 1996
[11] Shneiderman, B., "Dynamic Queries for Visual Information Seeking", IEEE Software, 11(6), 70-77
[12] AVAL-1D, Numerical Calculation of One Dimensional Flow in Avalanches, http://www.slf.ch/aval-1d/welcome-en.html
[Accessed May 4th 2001]
[13] Use of 1D Data in Image Analysis and Enhancement, http://www.khoral.com/contrib/contrib/dip2001/html-dip/c4/s6/node3.html
[Accessed May 4th 2001]
[14] Jonsson, H.A. et al., "Retrieval of One Dimensional Data", Proceeding of the 3rd Basque International Workshop on Information Technology '97.
[15] Pausch, R. et al., "One Dimensional Motion Tailoring for the Disabled: A User Study", pp. 405-411, ACM CHI '92.
[16] Cailleteau, L., "Interfaces for Visualizing Multi-valued Attributes: Design and Implementation using Starfield Displays", ftp://ftp.cs.umd.edu/pub/hcil/Reports-Abstracts-Bibliography/99-20html/99.20.html, University of Maryland
[Accessed March 1st 2001]
[17] Larry Leonard, "2D Graphing Class", http://www.codeguru.com/controls/SimpleGraphControl.html
[Accessed April 2nd 2001 onwards]
[18] Paul Barvinko, "2D Visualization Class", http://www.codeguru.com/controls/graph2d.shtml
[Accessed April 19th 2001 onwards]
[19] Bittner, M., Meltzer, P., Chen, Y., et al., "Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling", Nature, vol. 406, pp. 536-40, 2000, http://www.nhgri.nih.gov/DIR/Microarray/selected_publications.html
[Accessed May 14th 2001 onwards]
[20] Tufte, E., "The Visual Display of Quantitative Information", Graphics Press, Cheshire, CT, 1983.
