CateRank - Interactive Exploration of Multivariate Categorical Data

OVERVIEW

Analyzing multivariate data sets is not an easy task: the user has to analyze individual variables as well as the relationships between them. Lower-level projection techniques may help the user find interesting combinations of variables; however, few tools support systematic exploration of those combinations. One way to deal with the "curse of dimensionality" is to rank all such relationships according to some measure of interestingness. Clustering and ranking algorithms are well researched for continuous data, but categorical data analysis has not received equal attention. This paper explores ways to analyze categorical data sets and to visualize and rank the relationships between categorical variables. CateRank uses histograms (bar charts) to visualize the distribution of a single variable and a reorderable matrix to visualize the relationship between two categorical variables. The tool proposes several metrics, based on properties of the matrix, that describe the nature of the relationship between two categorical variables and allow relationships within the data set to be compared.
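
The idea of ranking variable pairs can be illustrated with a minimal Python sketch. It is not CateRank's implementation: it builds the two-way count matrix for each pair of categorical variables and ranks the pairs by Cramér's V, a standard association measure used here only as a stand-in for the tool's own metrics. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def contingency(xs, ys):
    """Count co-occurrences of category pairs as {(x, y): count}."""
    return Counter(zip(xs, ys))


def cramers_v(xs, ys):
    """Cramér's V in [0, 1]: 0 means no association, 1 a perfect one.

    Stand-in metric for illustration; not one of CateRank's own criteria.
    """
    n = len(xs)
    counts = contingency(xs, ys)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0  # a constant variable carries no association
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = counts.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def rank_pairs(columns):
    """Score every pair of variables and return them strongest first."""
    scored = [
        (cramers_v(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)


# Made-up toy data: each key is a categorical variable, each list a column.
toy = {
    "weather": ["rain", "rain", "clear", "clear"],
    "surface": ["wet", "wet", "dry", "dry"],
    "day": ["mon", "tue", "mon", "tue"],
}
ranking = rank_pairs(toy)
for score, a, b in ranking:
    print(f"{a} x {b}: {score:.2f}")
```

In the toy data, "surface" and "weather" determine each other completely, so that pair ranks first; the pairs involving "day" show no association and fall to the bottom of the list.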

PARTICIPANTS

PUBLICATIONS

  • Filippova, D. Interactive exploration of multivariate categorical data with CateRank. 2008. (technical report for Fall 08 independent study) [doc]
  • Filippova, D., Shneiderman, B. Interactive Exploration of Multivariate Categorical Data: Exploiting Ranking Criteria to Reveal Patterns and Outliers. 2009. [pdf]

PRESENTATIONS

  • CateRank presentation for HCIL [ppt]

TRY IT

Install CateRank 0.5
Get data:
  • Drug Use and Health Survey 2008 12-17 y.o. [csv]
    Original source: SAMHDA
  • Fatal crashes in Maryland in 2008 [csv]
    Original source: FARS