Abstract:
|
There is currently great interest in developing tools that use patients' electronic health data (EHD) to predict an individual's future risk of experiencing an adverse health event. However, EHD do not guarantee that every subject is followed for the entire timeframe over which we want to make predictions, so it may be unknown whether the event occurred within that timeframe. Given the size and complexity of EHD, machine learning (ML) techniques are an appealing alternative to less flexible time-to-event regression methods, but most ML methods assume that outcomes are fully observed and therefore yield biased predictions when outcomes are censored. We propose a universal and easy-to-implement technique that allows any ML method to handle censored data by averaging predictions across a set of weighted bootstrap samples. The bootstrap sampling weights are computed using inverse probability of censoring weighting (IPCW). We demonstrate this method using EHD from a large Midwestern health insurance company in which over 50% of the observations are censored, employing several ML methods to predict cardiovascular risk in these data.
|
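The procedure summarized above can be illustrated with a minimal sketch. The abstract does not give the exact weight formula or the learners used, so everything below is an assumption made for illustration: the function names (`censoring_survival`, `ipcw_labels_and_weights`, `ipcw_bagged_predict`, `mean_learner`) are hypothetical, the censoring distribution is estimated with a Kaplan-Meier estimator, and subjects censored before the prediction horizon `tau` receive weight zero while all others receive the inverse of the estimated probability of remaining uncensored. Any learner that maps training data to a prediction function could be substituted for the toy `mean_learner`.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]
Learner = Callable[[List[Sequence[float]], List[int]], Predictor]


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring as the event."""
    steps = []
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        censored = sum(1 for u, e in zip(times, events) if u == t and e == 0)
        surv *= 1.0 - censored / at_risk
        steps.append((t, surv))

    def g_left(t):
        # Left limit: use only censoring times strictly before t.
        value = 1.0
        for u, s in steps:
            if u >= t:
                break
            value = s
        return value

    return g_left


def ipcw_labels_and_weights(times, events, tau):
    """Binary label 'event by tau' plus an IPCW weight for each subject."""
    g_left = censoring_survival(times, events)
    labels, weights = [], []
    for t, e in zip(times, events):
        labels.append(1 if (e == 1 and t <= tau) else 0)
        if e == 0 and t < tau:
            # Censored before tau: status at tau is unknown, so never sampled.
            weights.append(0.0)
        else:
            weights.append(1.0 / g_left(min(t, tau)))
    return labels, weights


def ipcw_bagged_predict(learner, X, times, events, tau, x_new, n_boot=50, seed=0):
    """Average predictions of learners fit on IPCW-weighted bootstrap samples."""
    rng = random.Random(seed)
    labels, weights = ipcw_labels_and_weights(times, events, tau)
    indices = list(range(len(X)))
    preds = []
    for _ in range(n_boot):
        # Draw n subjects with replacement, with probability proportional to weight.
        sample = rng.choices(indices, weights=weights, k=len(X))
        model = learner([X[i] for i in sample], [labels[i] for i in sample])
        preds.append(model(x_new))
    return sum(preds) / n_boot


def mean_learner(X, y):
    """Toy stand-in for an ML method: predicts the training event rate."""
    rate = sum(y) / len(y)
    return lambda x: rate
```

Because the weights enter only through the resampling step, the learner itself never needs to accept observation weights, which is what lets this kind of scheme wrap an arbitrary ML method.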