Abstract:
|
Data normalization is crucial to gene expression analyses because it removes systematic noise. A main drawback of the variance reduction achieved by normalization is that it borrows information from all genes, including differentially expressed genes (DEGs). This practice inevitably introduces bias, resulting in inflated type I error and reduced power. In this study, we propose a new differential expression analysis pipeline, dubbed super-delta. The procedure uses a robust strategy to exclude genes with large group differences from normalization, followed by a modified t-test based on asymptotic theory. We compared super-delta with three commonly used normalization methods (global, median-IQR, and quantile normalization) by applying all four methods to a microarray dataset from breast cancer patients who received chemotherapy. Super-delta consistently identified more DEGs with biological connections to breast cancer or chemotherapy, as verified by functional enrichment analyses. Simulations showed that super-delta had better statistical power and tighter type I error control than its competitors. In many cases, the performance of super-delta was close to that of an oracle test using noise-free datasets.
|