


HCIL-2005-12

Seo, J., Shneiderman, B. (May 2005)
Using Categorical Information in Multidimensional Data Sets: Interactive Partition and Cluster Comparison
HCIL-2005-12, CS-TR-4752, UMIACS-TR-2005-55, ISR-TR-2005-102

Multidimensional data sets often include categorical information. When most columns are categorical, clustering the data set by similarity of categorical values can reveal interesting patterns. However, when the data set includes only one or two categorical columns, the categorical information is probably more useful as a way to partition the data set. For example, researchers might be interested in gene expression data for healthy vs. diseased patients, or in stock performance for common, preferred, or convertible shares. For these cases, we present a novel way to use the categorical information together with clustering algorithms. Instead of incorporating categorical information into the clustering process, we partition the data set according to the categorical information. Clustering is then performed on each subset to generate two or more clustering results, each of which is homogeneous (i.e., includes only one categorical value for the categorical column). By comparing the partitioned clustering results, users can gain meaningful insights into the data set: they can identify interesting groups of items that are differentially or similarly expressed in two different homogeneous partitions. The partition can be done in two different directions: (1) by rows if categorical information is available for each column (e.g., some columns are from disease samples and other columns are from healthy samples) or (2) by a column if a column contains categorical information (e.g., a column represents a categorical attribute such as color or sex). We designed and implemented an interface to facilitate this interactive, partition-based comparison of clustering results. Coordination between the clustering result displays and the comparison overview enables users to identify interesting clusters, and a simple grid display clearly reveals the correspondence between two clusters.
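The partition-then-cluster-then-compare idea can be illustrated with a minimal Python sketch. It covers only the case where each column carries a categorical label (healthy vs. disease samples). Everything in it is an assumption made for illustration, not the report's implementation: the toy data, the function names (`split_columns`, `single_linkage`, `overlap_grid`), and the use of a small single-linkage clusterer as a stand-in for whatever clustering algorithm is actually applied.

```python
from itertools import combinations
from math import dist

def split_columns(rows, column_labels):
    """Partition every row's values by the categorical label of each column."""
    parts = {}
    for name, values in rows.items():
        for label, value in zip(column_labels, values):
            parts.setdefault(label, {}).setdefault(name, []).append(value)
    return parts

def single_linkage(items, k):
    """Stand-in clusterer: merge the two closest clusters until k remain."""
    clusters = [{name} for name in items]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist(items[x], items[y])
                for x in clusters[p[0]]
                for y in clusters[p[1]]
            ),
        )
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return clusters

def overlap_grid(clusters_a, clusters_b):
    """Count shared items for every pair of clusters from two partitions."""
    return [[len(ca & cb) for cb in clusters_b] for ca in clusters_a]

# Hypothetical toy data: four items measured on two healthy and two disease samples.
column_labels = ["healthy", "healthy", "disease", "disease"]
rows = {
    "g1": [1.0, 1.1, 5.0, 5.2],
    "g2": [1.2, 0.9, 5.1, 4.9],
    "g3": [4.0, 4.2, 5.0, 5.1],
    "g4": [4.1, 3.9, 0.5, 0.4],
}

parts = split_columns(rows, column_labels)
healthy_clusters = single_linkage(parts["healthy"], k=2)
disease_clusters = single_linkage(parts["disease"], k=2)
grid = overlap_grid(healthy_clusters, disease_clusters)
print(grid)  # [[2, 0], [1, 1]]
```

In this toy grid, rows are healthy-partition clusters and columns are disease-partition clusters. The first healthy cluster maps entirely onto one disease cluster, while the second is split across both, which is the kind of correspondence a grid display would make visible.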




