Abstract:
|
The process of splitting data into training and validation sets for prediction model development is subject to variation, particularly with small and imbalanced data. Failure to account appropriately for this variability can reduce the validity and replicability of a prediction model, and can lead to erroneous conclusions when statistically comparing the performance metrics of competing prediction models. This presentation will compare existing cross-validation methods in the context of small and imbalanced datasets. Building on this comparison, we will discuss innovative strategies and statistical methods for comparing several prediction models in terms of performance metrics (e.g., AUC).
|