Abstract:
|
The nested case-control (NCC) design is a sampling scheme intended to reduce data collection costs for time-to-event data. When the event of interest is rare and covariates are difficult or expensive to collect, the NCC design reduces costs with minimal loss of inferential precision. In previous work, we showed that model misspecification under the NCC design leads to an estimand that depends on the number of controls sampled. In this work, we show that a similar dependence on the number of controls arises when the goal is model selection for individual-level prediction of survival times. Specifically, we illustrate that when an NCC sample is used for model selection, the selected model differs depending on how many controls are sampled. We propose an estimator of the out-of-sample prediction error that is robust to the number of controls in the NCC sample. We assess the performance of the proposed estimator in finite samples through simulation studies and apply the proposed methods to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).
|