University of Maryland, College Park
Spring 2006
Set Visualization  



Hamid Haidarian Shahri     Mudit Agrawal
Computer Science Department
University of Maryland
College Park, MD 20770 USA
{hamid, mudit}

In this project, we explore the area of set visualization. Our focus is on making some of the approaches shown in [Liu05] scalable, and on facilitating the analysis of sets by representing clusters graphically so as to depict both their internal and external links. The main contribution of our work is the application of SOM and K-means clustering to produce better visualizations. Although it may not be apparent at first glance, closer examination of the problem reveals that both algorithms, as documented in the literature, are not directly applicable to set visualization, because they assume a 2D or nD (vector) representation for each data point (i.e., each law case). More specifically, the attributes must form a vector space. This assumption does not hold for our dataset, which has no clear geometric attributes. Nevertheless, our algorithms produce high-quality 2D visual representations of large datasets. We tested the algorithms on about 2800 points, whereas most previous approaches fail to represent datasets larger than hundreds of nodes. We describe in detail how similarity was computed for the clustering algorithms in the absence of geometric distances. Additionally, the system provides various interactive tools that enable users to explore sets and navigate between them.
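To make the vector-space obstacle concrete, the following minimal sketch clusters items that are described only by sets, with no coordinates. It is an illustration under stated assumptions, not the method used in this work: the Jaccard distance is an assumed stand-in for the similarity measure, and a k-medoids style loop is swapped in for K-means because a mean of sets is undefined while a medoid (an actual member of the cluster) always exists. The names `jaccard_distance` and `k_medoids` and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign(items: Sequence[FrozenSet[str]], medoids: List[int]) -> List[int]:
    """Label each item with the index of its nearest medoid (ties go to the lowest index)."""
    return [
        min(range(len(medoids)), key=lambda c: jaccard_distance(x, items[medoids[c]]))
        for x in items
    ]


def k_medoids(
    items: Sequence[FrozenSet[str]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[int]]:
    """Cluster set-valued items using only pairwise distances (assumed, illustrative)."""
    if not 0 < k <= len(items):
        raise ValueError("k must be between 1 and the number of items")
    medoids = list(range(k))  # deterministic initialisation: the first k items
    for _ in range(max_iter):
        labels = assign(items, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The medoid is the member with the smallest total distance to its cluster.
            best = min(
                members,
                key=lambda i: sum(jaccard_distance(items[i], items[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(items, medoids), medoids


# Hypothetical toy data: each item is the set of attributes attached to one case.
cases = [
    frozenset({"a", "b", "c"}),
    frozenset({"x", "y", "z"}),
    frozenset({"a", "b"}),
    frozenset({"x", "y"}),
    frozenset({"a", "b", "c", "d"}),
    frozenset({"x", "y", "z", "w"}),
]
labels, medoids = k_medoids(cases, k=2)
print(labels, medoids)
```

The point of the sketch is that only a pairwise distance is required, so the clustering step never needs the attributes to form a vector space. Producing a 2D layout from such clusters is a separate step that this sketch does not attempt.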