Abstract:
|
Data normalization is an important preprocessing step for molecular data that contain unwanted variation due to experimental handling. There has been a critical yet overlooked disconnect between the use of data normalization and the goals of subsequent analysis: on the one hand, normalization methods have mostly been developed for the analysis goal of group comparison; on the other hand, they are frequently used ‘off-label’ for other goals such as sample classification, neglecting potential ‘side effects’ of normalization such as over-compressed data variability. A unique pair of microRNA array datasets on the same set of tumor tissue samples collected at Memorial Sloan Kettering Cancer Center makes it possible to bridge this gap. In this talk, I will share our findings, from empirical analysis and resampling-based simulations using this dataset pair, on how data normalization affects the development of tumor sample classifiers and survival outcome predictors.
|