Abstract:
Public databases such as the Gene Expression Omnibus now provide an almost overwhelming wealth of free resources for developing and validating models of association between tumor gene expression and patient outcome in cancer. For topics of widespread interest, multiple independent datasets are available, enabling systematic investigation of cross-study reproducibility, or lack thereof, as a phenomenon. This talk will discuss an approach to cross-study validation in which models are trained and validated using all possible pairwise combinations of datasets, and the results are compared with traditional within-study cross-validation to identify the sources of heterogeneity that affect prediction accuracy on new studies. It will also discuss how cross-study validation can be used to develop and select algorithms and models that are more robust to study-specific effects. Results highlight that 1) the most obvious sources of heterogeneity may not be those affecting prediction accuracy, and 2) robust prediction rules can be generated, using surprisingly simple methods, from patient samples that are not fully representative and from experimental data containing batch and platform effects.
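The pairwise scheme described above can be illustrated with a small sketch. It is not the speaker's implementation: the simulated studies, the continuous outcome, the ridge learner, the Pearson-correlation score, and all function names (`fit_ridge`, `within_study_cv`, `cross_study_matrix`, `simulate_study`) are assumptions chosen to keep the example self-contained. Entry `Z[i, j]` holds the score of a model trained on study `i` and validated on study `j`; the diagonal holds within-study k-fold cross-validation for comparison.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Fit ridge regression on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef


def predict(model, X):
    intercept, coef = model
    return intercept + X @ coef


def score(y_true, y_pred):
    """Pearson correlation between observed and predicted outcome (assumed metric)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def within_study_cv(X, y, k=5, seed=0):
    """Mean k-fold cross-validated score inside a single study."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = fit_ridge(X[train_mask], y[train_mask])
        scores.append(score(y[test_idx], predict(model, X[test_idx])))
    return float(np.mean(scores))


def cross_study_matrix(studies, k=5):
    """Z[i, j]: train on study i, validate on study j; diagonal is within-study CV."""
    n = len(studies)
    Z = np.empty((n, n))
    for i, (X_train, y_train) in enumerate(studies):
        Z[i, i] = within_study_cv(X_train, y_train, k=k)
        model = fit_ridge(X_train, y_train)  # trained on all of study i
        for j, (X_val, y_val) in enumerate(studies):
            if i != j:
                Z[i, j] = score(y_val, predict(model, X_val))
    return Z


def simulate_study(rng, n_samples, beta, shift):
    """Hypothetical study: shared signal plus a study-specific feature shift."""
    X = rng.normal(size=(n_samples, beta.size)) + shift
    y = X @ beta + rng.normal(size=n_samples)
    return X, y


rng = np.random.default_rng(42)
beta = rng.normal(size=10)
studies = [simulate_study(rng, 80, beta, shift) for shift in (0.0, 0.5, 1.0)]

Z = cross_study_matrix(studies)
off_diag = Z[~np.eye(len(studies), dtype=bool)]
print(np.round(Z, 2))
print("mean within-study CV:", round(float(np.diag(Z).mean()), 2))
print("mean cross-study:    ", round(float(off_diag.mean()), 2))
```

Comparing the diagonal with the off-diagonal entries gives a rough view of how much accuracy is lost when a model moves to a new study. For survival outcomes, the learner and the score would need to be replaced with survival-appropriate choices.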
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.