Abstract:
|
The Area Under the Receiver Operating Characteristic Curve (AUC) is a ubiquitous measure of the performance of risk prediction and two-class classification models. It measures the quality of discrimination and possesses important properties: it is a proper scoring rule, has an intuitive interpretation, and has semi-parametric estimates of its accuracy. We place the properties of AUC in the context of the training-test approach to risk prediction. We show that in a training-test setting, the change in AUC is strictly less than zero under the null, with AUC approaching its lower boundary of 0.5 as the number of added uninformative predictors increases. In the absence of a test set, AUC under the null quickly approaches its upper boundary of 1.0. Therefore, without a set-aside test set, AUC, as a non-decreasing function for nested models, always improves, even when new uninformative predictor variables are added. We discuss these results in the context of developing new risk prediction models, including polygenic risk scores and machine learning risk classification models in general.
|