Abstract:
|
The performance of risk prediction models can be evaluated in terms of discrimination, calibration, and reclassification, using metrics such as the change in the area under the ROC curve (delta AUC), the net reclassification improvement (NRI), the integrated discrimination improvement (IDI), the Brier score, and net benefit. Some of these metrics, such as the AUC, are invariant with respect to the underlying event rate, while others vary with it. We explore how these measures of model performance depend on the event rate, showing which metrics are invariant and which are not. We find that some non-invariant metrics remain relatively flat over a wide range of event rates, while others vary substantially, making such metrics difficult to compare across studies. For example, a lower IDI in one study may reflect a lower event rate rather than weaker model improvement. We discuss the implications of this dependence and provide practical recommendations for making these measures comparable across studies.
|