Abstract:
|
Genomic data analysis has emerged as a valuable research area in the 21st century. Because gene expression data are usually highly correlated and subject to many sources of variation, they must be normalized before any statistical test is applied. So far, no normalization procedure is entirely satisfactory. Because we never know which genes are truly differentially expressed (DE), all normalization procedures inevitably introduce bias by borrowing information from all genes. There is also always a trade-off between achieving power and controlling the false discovery rate (FDR). In this study, we carry out a comprehensive comparison of different normalization procedures in terms of their impact on a t-test. The microarray data are obtained from the Gene Expression Omnibus (GEO) database and contain probe-set-level expression measurements from breast cancer patients. We propose "super-delta", a local normalization procedure that involves taking differences between genes interactively. We compare it with traditional normalization procedures to demonstrate its high power and good control of the FDR. We also give a theoretical justification for this new approach.
|