Abstract:
|
Survival analysis plays an important role in biomedical transcriptomics studies, where it is used to develop reliable predictors of patient prognosis and treatment response. While survival analysis methods exist to address high dimensionality and signal sparsity, research is still lacking on the data artifacts associated with disparate experimental handling, a pervasive feature of transcriptomics data. Published studies often deal with these artifacts by borrowing normalization methods designed for differential expression analysis, without re-evaluating their performance in the survival setting. We first built a benchmarking tool for assessing data normalization in survival prediction, using a unique pair of microarray datasets and resampling-based simulations. Despite the unfounded optimism about such ‘off-label’ uses, we found that existing normalization methods, such as quantile normalization, may distort a marker’s ordering across samples and thereby compromise both the detection of outcome-associated markers and the accuracy of outcome prediction. We then proposed a new, improved method for handling such artifacts when developing survival predictors in transcriptomics studies.
|