Hierarchical Clustering Explorer for Interactive Exploration of Multidimensional Data


Project Description:

Multidimensional data sets, such as microarray experiment data sets, are common in many research areas. Genome researchers use cluster analysis to find meaningful groups in microarray data. Some clustering algorithms, such as k-means, require users to specify the number of clusters as an input, but users rarely know the right number beforehand. Other clustering algorithms determine the number of clusters automatically, but users may not trust the result because they have little or no control over the clustering process. To avoid this dilemma, the Hierarchical Clustering Explorer (HCE) applies the hierarchical clustering algorithm without a predetermined number of clusters, and then lets users determine the natural grouping with interactive visual feedback (a dendrogram and a color mosaic) and dynamic query controls. HCE 1.0 implemented four general techniques for interactively exploring clustering results.
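To illustrate how a single hierarchical clustering can yield different groupings without fixing the number of clusters in advance, here is a minimal sketch of agglomerative clustering that stops merging once the closest pair of clusters is farther apart than a user-chosen threshold. It assumes Euclidean distance and average linkage; the distance measure and linkage that HCE actually uses are not stated here, and the function names are hypothetical. Because average-linkage merge heights never decrease, stopping at the first merge above the threshold is equivalent to cutting the dendrogram at that height, which is the kind of control a user adjusts interactively.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, rows):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_with_threshold(rows, threshold):
    """Hypothetical helper: naive agglomerative clustering that stops once
    the closest pair of clusters is farther apart than `threshold`
    (equivalent to cutting the average-linkage dendrogram at that height)."""
    clusters = [[i] for i in range(len(rows))]  # start with singleton clusters
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (O(n^2) pairs per step; fine for a sketch).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], rows)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # every remaining merge lies above the cut
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    # Return clusters in a canonical order for easy inspection.
    return sorted(sorted(c) for c in clusters)


rows = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [10.0, 0.0]]
print(cluster_with_threshold(rows, 1.0))    # [[0, 1], [2, 3], [4]]
print(cluster_with_threshold(rows, 100.0))  # [[0, 1, 2, 3, 4]]
```

Moving the threshold up or down changes the number of clusters without rerunning the clustering from a different k, which is the property that makes interactive exploration of the hierarchy practical.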

However, the high dimensionality of the data sets still makes it hard for users to find interesting patterns, clusters, and outliers. Determining the biological significance of such features remains problematic because biological knowledge is difficult to integrate. In addition, performing a cluster analysis over the whole data set is inefficient when researchers already know the approximate temporal pattern of the gene expression they are seeking. To address these problems, we developed the Hierarchical Clustering Explorer 2.0, which adds three new features to HCE:

As an important part of our continuing effort to give users more control over multidimensional data analysis processes, and to enable more interaction with analysis results through interactive visualization techniques, we present a set of principles, the GRID principles, that can help users better understand distributions in one dimension (1D) or two dimensions (2D), and then discover relationships, clusters, gaps, outliers, and other features in multidimensional data sets. By combining information visualization techniques (overview, coordination, and dynamic query) with summaries and statistical methods, users can systematically examine the most important 1D and 2D axis-parallel projections. Detecting interesting features in low dimensions (1D or 2D) by exploiting powerful human perceptual abilities is crucial to understanding the original multidimensional data set. Familiar graphical displays such as histograms, scatterplots, and other well-known 2D plots are effective at revealing basic summary statistics and even unexpected features in the data set. Many algorithmic and statistical techniques are also especially effective in low-dimensional spaces. While many approaches use such visual displays and low-dimensional techniques, most of them lack a systematic framework that organizes these functions to help analysts in their feature detection tasks.
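One way to make "systematically examine the most important 2D axis-parallel projections" concrete is to score every pair of dimensions with a statistic and list the scatterplots best first. The sketch below assumes the absolute Pearson correlation coefficient as the ranking criterion; it is one plausible criterion, not necessarily the set HCE offers, and the function names are hypothetical.

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; score it as 0
    return sxy / math.sqrt(sxx * syy)


def rank_scatterplots(columns):
    """Hypothetical helper: score every axis-parallel 2D projection
    (pair of dimensions) by |r| and return (score, dim_a, dim_b), best first."""
    scores = [
        (abs(pearson(xs, ys)), a, b)
        for (a, xs), (b, ys) in itertools.combinations(columns.items(), 2)
    ]
    scores.sort(key=lambda t: t[0], reverse=True)  # stable sort keeps ties in pair order
    return scores


columns = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 1, 3, 2]}
for score, a, b in rank_scatterplots(columns):
    print(f"{a} vs {b}: |r| = {score:.2f}")
# g1 vs g2: |r| = 1.00
# g1 vs g3: |r| = 0.40
# g2 vs g3: |r| = 0.40
```

The same pattern applies to 1D projections: score each column with a per-dimension statistic and sort the histograms by that score, so the analyst starts with the most interesting ones.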

We summarize the GRID principles as:

Following the GRID principles, we implemented a systematic framework, the rank-by-feature framework, as two separate tabs in Hierarchical Clustering Explorer 3.0:

If you have any comments or questions, send an email to Jinwook Seo (jinwook@cs.umd.edu).

Participants:

Papers:

Presentations:

Application Examples:

Availability & Download:

HCE is a standalone Windows® application that runs on a standard PC. It is freely downloadable for academic and/or research purposes. Commercial licenses can be negotiated with the UM Office of Technology Commercialization (Gayatri Varma, gayatri@umd.edu).
 

Download HCE 3.5 test version (released on Nov. 11, 2005)

User's Guide for HCE version 3

Register and Download HCE 3.0 (released on Dec. 29, 2004)

Register and Download HCE version 2.0 beta now! (released on May 5, 2003)

Register and Download HCE 1.0

System requirements:

Intel® Pentium® processor
Microsoft® Windows® 2000, Windows® XP

Support:

This research has been partially supported by grant N01 NS-1-2339 from the National Institutes of Health.

Related Sites:

[Bioinformatics Visualization] [Bioinformatics Resources in GenMed]
