Abstract:
|
In this study, we draw attention to the connection between inflated, over-optimistic findings and the use of cross-validation (CV) for error estimation in molecular classification studies. We demonstrate this important yet overlooked complication of CV using a unique pair of microarray datasets on the same set of tumor samples. Our study showed that (1) CV tended to underestimate the error rate when the data possessed confounding handling effects; (2) depending on the relative amount of handling effects, normalization may further worsen the underestimation of the error rate; and (3) balanced assignment of arrays to comparison groups allowed CV to provide an unbiased error estimate. Our study demonstrates the benefits of balanced array assignment for reproducible molecular classification and calls for caution in the routine use of data normalization and CV in such analyses. In addition, we provide recommendations on study design and data normalization when using an independent study for external validation.
|