Seo, J., Bakay, M., Chen, Y., Hilmer, S., Shneiderman, B., Hoffman, E. (December 2003)
Motivation: Sources of uncontrolled noise strongly influence data analysis in microarray studies, yet signal/noise ratios are rarely considered in microarray data analyses. We hypothesized that different research projects would have different sources and levels of confounding noise, and built an interactive visual analysis tool to test and define parameters in Affymetrix analyses that optimize the ratio of signal (desired biological variable) versus noise (confounding uncontrolled variables). Results: Five probe set algorithms were studied with and without statistical weighting of probe sets using Microarray Suite (MAS) 5.0 probe set detection p values. The signal/noise optimization method was tested in two large novel microarray datasets with different levels of confounding noise; a 105 sample U133A human muscle biopsy data set (11 groups) (extensive noise), and a 40 sample U74A inbred mouse lung data set (8 groups) (little noise). Success was measured using F-measure value of success of unsupervised clustering into appropriate biological groups (signal). We show that both probe set signal algorithm and probe set detection p-value weighting have a strong effect on signal/noise ratios, and that the different methods performed quite differently in the two data sets. Among the signal algorithms tested, dChip difference model with p-value weighting was the most consistent at maximizing the effect of the target biological variables on data interpretation of the two data sets. Availability: The Hierarchical Clustering Explorer 2.0 is available at http://www.cs.umd.edu/hcil/hce/ , and the improved version of the Hierarchical Clustering Explorer 2.0 with p-value weighting and F-measure is available upon request to the first author. Murine arrays (40 samples) are publicly available at the PEPR resource (http://microarray.cnmcresearch.org/pgadatatable.asp) (Chen et al., 2004).