Live Demonstration
The goal of this new idea is to find interesting features in multidimensional data.
Finding features such as correlations, clusters, outliers, and gaps is difficult in multidimensional data
because of the cognitive difficulty of understanding more than 3 dimensions.
Therefore, we need to use low-dimensional projections, since the human visual system is very effective in 1D and 2D.
So, the rank-by-feature framework uses 1D and 2D projections to guide the discovery process.
Here is a small multidimensional data set, the periodic table, shown in an Excel spreadsheet.
Can you see any clusters of atoms?
Can you tell which columns are correlated?
Can you see outliers and gaps?
Of course it’s not easy to find such features in a spreadsheet like this.
This is a 2D projection of the same data set. Do you see any interesting features?
<<pause>>
Yes, we can easily see a strong positive correlation.  What else can you see?
<<pause>>
Yes, two strong outliers,  He and Rn (Radon /reidon/).
As you see, 2D scatterplots are very effective at revealing interesting features hidden in multidimensional data. The idea of the rank-by-feature framework is to provide an automated tool that guides discovery using 1D and 2D projections.
Let me show you a demo of the rank-by-feature framework with a small data set.
<<demo>>
(Table View) This is the raw data.  Each row is a breakfast cereal and each column is a nutrition component.
(Scatterplot Ordering)  This is our rank-by-feature framework interface for 2D scatterplots. 
(move from top to bottom in the list view) There are 36 possible 2D scatterplots for this data set.
Each cell of this view represents a 2D scatterplot. For example, this cell is for protein and fat.
These three views are coordinated, so you can easily switch between projections and look at each one.
But even in a small data set like this one, it is not easy to identify the interesting projections.
So what is our solution?  Here is the key.  Users can select a feature from this combo box. 
(combo box) Let's select correlation coefficient here.
(prism) This view, which we call the feature prism, shows an overview of the score distribution, in this case for the correlation coefficient.
If a cell is bright, the corresponding projection gets a high score.
We can easily see that this cell is the brightest. It tells us that the amounts of potassium and dietary fiber are highly correlated.
We can also easily identify scatterplots that show negative correlations, for example carbohydrate and dietary fiber.
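(backup note) The ranking behind the prism can be sketched in a few lines of Python. This is only a minimal illustration, not the tool's actual code: the function names and the toy numbers are made up for this note and are not the cereal data. It scores every pair of columns by the Pearson correlation coefficient and sorts the pairs, which is why 9 variables give 36 scatterplots to rank.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_scatterplots(columns):
    """Score every unordered pair of columns and sort from highest to lowest."""
    scores = [
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy values, invented for illustration only.
toy = {
    "fiber": [1.0, 2.0, 3.0, 4.0, 5.0],
    "potassium": [30.0, 62.0, 88.0, 121.0, 150.0],
    "carbs": [20.0, 18.0, 15.0, 13.0, 10.0],
}

ranking = rank_scatterplots(toy)
for score, x_name, y_name in ranking:
    print(f"{x_name} vs {y_name}: {score:+.3f}")

# Number of distinct 2D scatterplots for 9 variables.
pairs_for_nine = len(list(combinations(range(9), 2)))
```

The top of the sorted list corresponds to the brightest cell, and the bottom of the list holds the strongest negative correlations.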
(browser) In this view, called the scatterplot browser, you can easily change the variable for each axis just by dragging this slider.
Let me get back to the slides.
This diagram shows the three components of the rank-by-feature framework interface for 1D histograms.
Users choose a ranking criterion, and they can see the overall distribution of color-coded scores in the rank-by-feature prism.
Numerical details are shown in the ordered score list, and
the manual projection browser enables users to examine all histograms by dragging this itemized slider to change the dimension in focus.
These three components are coordinated with each other according to the change of the variable in focus.
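(backup note) If someone asks how a 1D ranking criterion could be computed, here is a rough Python sketch for a uniformness score. It assumes uniformness is measured as the normalized entropy of the histogram bin counts; that choice, the bin count, and the function names are assumptions for this note, not a statement of how the tool computes it.

```python
from math import log


def histogram(values, bins):
    """Count values into equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # put the max in the last bin
        counts[index] += 1
    return counts


def uniformness(values, bins=4):
    """Normalized entropy of the histogram: 1.0 means perfectly even bins."""
    counts = histogram(values, bins)
    n = len(values)
    entropy = -sum((c / n) * log(c / n) for c in counts if c)
    return entropy / log(bins)


def rank_histograms(columns, bins=4):
    """Order column names from most to least uniform."""
    return sorted(columns, key=lambda name: uniformness(columns[name], bins),
                  reverse=True)


# Hypothetical toy columns, invented for illustration only.
toy = {
    "even": [0, 1, 2, 3, 4, 5, 6, 7],
    "skewed": [0, 0, 0, 0, 0, 0, 0, 7],
}
order = rank_histograms(toy)
```

Any other 1D criterion would plug into the same pattern: one score per dimension, then a sort.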
Here are the three components of the rank-by-feature framework interface for 2D scatterplots.
The rank-by-feature prism is now a 2D grid, where each cell represents a 2D scatterplot.
There are two item sliders, one for the x axis and one for the y axis, in the manual projection browser.
Again, these three components are coordinated with each other according to the change of the variable in focus.
<<county data set>>
Show ranking by uniformness for 1D
Show ranking by correlation for 2D

Web Accessibility