Abstract:
|
Microarray-based genomics and other high-throughput experimental approaches are becoming increasingly important in biology and chemistry. In this type of "omic" research (Weinstein, Science 282:627, 1998), one often generates and analyzes large databases without knowing with certainty what the most important molecular questions will turn out to be. Consequently, we need to improve our ability to analyze the large volumes of quantitative measurements that these approaches produce. We will discuss how normalization and transformation of the data affect clustering results across different algorithms and measures of similarity, including the problem of assessing the reliability of the resulting clusters. As an application, we will consider microarray gene expression data on the panel of 60 human cancer cell lines (NCI-60) used by the drug discovery program at the National Cancer Institute. We have assessed expression levels in those cells for thousands of genes using cDNA microarrays (Ross et al., Nature Genetics 24:227, 2000; Scherf et al., Nature Genetics 24:236, 2000) and Affymetrix oligonucleotide arrays (Staunton et al., PNAS 98:10787, 2001).
|