Abstract Details

Activity Number: 54 - Enabling Reproducibility in Statistical Translations of Genomics Data for Biomedical Research
Type: Invited
Date/Time: Sunday, July 30, 2017 : 4:00 PM to 5:50 PM
Sponsor: ENAR
Abstract #322000
Title: A Cautionary Note on the Validation Method for Molecular Classification Studies
Author(s): Li-Xuan Qin*
Companies: Memorial Sloan Kettering Cancer Center
Keywords: Genomics; Classification; Cross-validation; Study design; Data normalization
Abstract:

In this study, we draw attention to the connection between inflated, over-optimistic findings and the use of cross-validation (CV) for error estimation in molecular classification studies. We demonstrate this important yet overlooked complication of CV using a unique pair of microarray datasets on the same set of tumor samples. Our study showed that (1) CV tended to underestimate the error rate when the data possessed confounding handling effects; (2) depending on the relative amount of handling effects, normalization may further worsen the underestimation of the error rate; and (3) balanced assignment of arrays to comparison groups allowed CV to provide an unbiased error estimate. Our study demonstrates the benefits of balanced array assignment for reproducible molecular classification and calls for caution in the routine use of data normalization and CV in such analyses. In addition, we provide recommendations on study design and data normalization when an independent study is used for external validation.
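
The first and third findings can be illustrated with a toy simulation. The sketch below is not the analysis behind this abstract: it uses synthetic Gaussian noise in place of microarray data, a nearest-centroid classifier, and leave-one-out CV, all of which are assumptions chosen for brevity. The features carry no class signal at all; the only structure is a constant handling shift added to some arrays. The function names (`simulate`, `fit`, `classify`, `loo_cv_error`, `external_error`) and all parameter values are hypothetical.

```python
import random

def simulate(n_per_class, n_features, shift, confounded, rng):
    """Pure-noise features for two classes, plus a handling shift on some arrays."""
    X, y = [], []
    for label in (0, 1):
        for i in range(n_per_class):
            if confounded:
                handled = label == 1   # handling batch coincides with class
            else:
                handled = i % 2 == 0   # handling batch split evenly within each class
            offset = shift if handled else 0.0
            X.append([rng.gauss(0.0, 1.0) + offset for _ in range(n_features)])
            y.append(label)
    return X, y

def fit(X, y):
    """Return the per-class feature means (centroids) for labels 0 and 1."""
    cents = []
    for label in (0, 1):
        rows = [r for r, l in zip(X, y) if l == label]
        cents.append([sum(col) / len(rows) for col in zip(*rows)])
    return cents

def classify(cents, x):
    """Assign x to the class with the nearer centroid (squared Euclidean distance)."""
    d = [sum((u - v) ** 2 for u, v in zip(x, c)) for c in cents]
    return 0 if d[0] <= d[1] else 1

def loo_cv_error(X, y):
    """Leave-one-out CV error rate of the nearest-centroid classifier."""
    wrong = 0
    for i in range(len(X)):
        cents = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += classify(cents, X[i]) != y[i]
    return wrong / len(X)

def external_error(X, y, test_X, test_y):
    """Error rate on independent arrays after training on all of (X, y)."""
    cents = fit(X, y)
    wrong = sum(classify(cents, x) != t for x, t in zip(test_X, test_y))
    return wrong / len(test_X)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Independent validation arrays with no handling effect and no class signal,
# so any classifier should be at about chance level (0.5) on them.
test_X, test_y = simulate(200, 10, 0.0, False, rng)

conf_X, conf_y = simulate(40, 10, 1.5, True, rng)   # handling confounded with class
bal_X, bal_y = simulate(40, 10, 1.5, False, rng)    # handling balanced across classes

cv_conf = loo_cv_error(conf_X, conf_y)
ext_conf = external_error(conf_X, conf_y, test_X, test_y)
cv_bal = loo_cv_error(bal_X, bal_y)
ext_bal = external_error(bal_X, bal_y, test_X, test_y)

print("confounded design: CV error", cv_conf, "| external error", ext_conf)
print("balanced design:   CV error", cv_bal, "| external error", ext_bal)
```

Under these assumptions, the confounded design should yield a CV error far below the external error, because the classifier learns the handling shift rather than any class difference. In the balanced design the shift carries no class information, so CV has nothing spurious to exploit. Leave-one-out estimates are noisy at this sample size, so the balanced CV figure should be read qualitatively rather than as a precise estimate.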


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association