Seo, J., Bakay, M., Zhao, P., Chen, Y., Clarkson, P., Shneiderman, B., Hoffman, E. (April 2003)
Data analysis and visualization are strongly influenced by noise and noise filters. There are multiple sources of “noise” in microarray data analysis, but signal-to-noise ratios are rarely optimized, or even considered. Here, we report a noise analysis of a novel 13 million oligonucleotide dataset: 25 human U133A (~500,000 features each) profiles of patient muscle biopsies. We use our recently described interactive visualization tool, the Hierarchical Clustering Explorer (HCE), to systematically address the effect of different noise filters on the resolution of arrays into “correct” biological groups (unsupervised clustering into three patient groups of known diagnosis). We varied three factors (probe set interpretation method (MAS 5.0, RMA), “present call” filter, and clustering linkage method) and investigated the results in HCE. HCE's interactive features enabled us to quickly see the impact of these three variables. Dendrogram displays showed the clustering results systematically, and color mosaic displays provided visual support for the results. We show that each of these three variables has a strong effect on unsupervised clustering. For this dataset, the strength of the biological variable was maximized, and noise minimized, using MAS 5.0, a 10% present call filter, and Average Group Linkage. We propose a general method of using interactive tools to identify the optimal signal/noise balance or the optimal combination of these three variables to maximize the effect of the desired biological variable on data interpretation.
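
The filter-then-cluster workflow described above can be illustrated with a minimal sketch. It is not the HCE implementation: the function names are hypothetical, the toy matrix is invented for illustration and is not data from the study, the distance is assumed to be one minus the Pearson correlation, and plain average linkage (UPGMA) stands in for HCE's Average Group Linkage, which may be defined differently.

```python
from itertools import combinations
from math import sqrt


def present_call_filter(calls, min_fraction):
    """Return indices of probe sets called 'P' in at least min_fraction of arrays.

    calls: one row per probe set, one detection call ('P' or 'A') per array.
    """
    keep = []
    for i, row in enumerate(calls):
        fraction_present = sum(1 for c in row if c == "P") / len(row)
        if fraction_present >= min_fraction:
            keep.append(i)
    return keep


def pearson_distance(x, y):
    """Distance between two profiles, assumed here to be 1 - Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # treat a flat profile as uncorrelated
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage (UPGMA).

    Repeatedly merges the two clusters with the smallest mean pairwise
    distance until n_clusters remain. Returns sorted lists of array indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def cluster_distance(a, b):
        total = sum(pearson_distance(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Toy data (invented): 5 probe sets x 6 arrays; arrays 0-1, 2-3, 4-5 are three groups.
expression = [
    [9, 8, 1, 2, 1, 1],
    [1, 2, 9, 8, 1, 2],
    [1, 1, 2, 1, 9, 8],
    [8, 9, 2, 1, 2, 1],
    [5, 1, 6, 2, 1, 7],  # noisy probe set, never called present
]
calls = [
    ["P", "P", "A", "A", "A", "A"],
    ["A", "A", "P", "P", "A", "A"],
    ["A", "A", "A", "A", "P", "P"],
    ["P", "P", "A", "A", "A", "A"],
    ["A", "A", "A", "A", "A", "A"],
]

kept = present_call_filter(calls, 0.10)  # 10% present call filter
# Build one profile per array from the probe sets that passed the filter.
array_profiles = [[expression[i][j] for i in kept] for j in range(6)]
groups = average_linkage(array_profiles, 3)
print(kept, groups)
```

Varying `min_fraction` and swapping the linkage rule in a sketch like this mirrors, in a non-interactive way, the kind of comparison the abstract describes; the interactive dendrogram and color mosaic views of HCE are not reproduced here.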