Abstract:
|
Prediction models are widely used in clinical and translational research. In cancer patients, they can predict overall survival, recurrence risk, or response to treatment, and they can inform treatment decisions. Ideally, a prediction model is developed on a training data set and then validated on an independent validation data set. However, we often need to assess the predictive power of a model using the same data that were used to develop it. Cross-validation is widely used in this situation because it guards against the overly optimistic performance estimates that over-fitting produces. In survival studies, the number of events is often low, so the outcome classes are imbalanced. How to choose the most appropriate cross-validation strategy is not obvious; for example, should the folds be drawn with stratified sampling or not? In this study, we will evaluate several cross-validation strategies using a TCGA data set. We will also show a straightforward way to visualize the cross-validated predictive power of the model, which is useful for communicating results to clinicians.
|
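
To make the stratification question concrete, the following minimal sketch contrasts plain and stratified fold assignment when events are rare. It is an illustration only, not the method evaluated in the study: the function names (`plain_folds`, `stratified_folds`, `events_per_fold`) and the toy cohort of 100 patients with 10 events are assumptions introduced here, and only the Python standard library is used.

```python
import random
from collections import Counter


def plain_folds(n, k, seed=0):
    """Assign n samples to k folds at random, ignoring event status."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [0] * n
    for pos, i in enumerate(idx):
        folds[i] = pos % k
    return folds


def stratified_folds(events, k, seed=0):
    """Assign samples to k folds so each event status is spread evenly."""
    rng = random.Random(seed)
    folds = [0] * len(events)
    offset = 0  # continue the round-robin across strata to keep fold sizes even
    for status in sorted(set(events)):
        idx = [i for i, e in enumerate(events) if e == status]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = (pos + offset) % k
        offset += len(idx)
    return folds


def events_per_fold(events, folds, k):
    """Count how many events (status == 1) land in each fold."""
    counts = Counter(f for f, e in zip(folds, events) if e == 1)
    return [counts.get(f, 0) for f in range(k)]


# Hypothetical toy cohort: 10 events (1) and 90 censored cases (0).
events = [1] * 10 + [0] * 90
k = 5
print("plain:     ", events_per_fold(events, plain_folds(len(events), k), k))
print("stratified:", events_per_fold(events, stratified_folds(events, k), k))
```

With this toy cohort, the stratified assignment places exactly two events in every fold, whereas the plain assignment depends on the shuffle and can leave some folds with very few events.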