Online Program

Machine learning methods for risk prediction with censored electronic health data

*Julian Wolfson, University of Minnesota 
David M Vock, University of Minnesota 
Sunayan Bandyopadhyay, University of Minnesota 
Gediminas Adomavicius, University of Minnesota 
Paul Johnson, University of Minnesota 
Gabriela Vazquez-Benitez, HealthPartners Institute for Education and Research 
Patrick J O'Connor, HealthPartners Institute for Education and Research 

Keywords: electronic health data, machine learning, risk prediction

Electronic health data (EHD) are an appealing source for building risk prediction models. However, some subjects are not under observation for the entire time frame over which one wants to make predictions, often because they disenroll from the health system, so their outcomes are right-censored. Machine learning (ML) approaches to building risk models are attractive because they can capture complex relationships between individual characteristics and health outcomes, but most ML techniques for handling right-censored data have been relatively ad hoc, for example discarding the censored observations or treating them as non-events. We present a general-purpose approach to adapting machine learning techniques to right-censored time-to-event outcomes using inverse probability of censoring weighting (IPCW). Our techniques are motivated by and illustrated on the problem of predicting the five-year risk of experiencing a cardiovascular event using EHD from a large U.S. Midwestern health care system.
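The following minimal Python sketch illustrates the general IPCW idea for a fixed prediction horizon. It is not the authors' implementation. It assumes that censoring is independent of covariates, so the censoring survival function G(t) = P(C > t) can be estimated by Kaplan-Meier. The function names (censoring_survival, ipcw_labels_and_weights) and the horizon parameter tau are hypothetical. Subjects with an event by tau get label 1 and weight 1/G(T-). Subjects still event-free after tau get label 0 and weight 1/G(tau). Subjects censored before tau get weight 0, so they are neither discarded outright nor counted as non-events; the weights of the remaining subjects are inflated to account for them.

```python
import numpy as np


def censoring_survival(times, events):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censorings as the 'events'.

    Returns the sorted unique censoring times and G evaluated just after each one.
    Ties follow the usual convention: events at time t are assumed to precede
    censorings at t, so censored subjects are still at risk at their own time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    cens_times = np.unique(times[events == 0])
    surv = np.empty(len(cens_times))
    g = 1.0
    for j, t in enumerate(cens_times):
        at_risk = np.sum(times >= t)
        n_cens = np.sum((times == t) & (events == 0))
        g *= 1.0 - n_cens / at_risk
        surv[j] = g
    return cens_times, surv


def _g_at(t, cens_times, surv, left_limit):
    """Evaluate G(t-) if left_limit is True, else G(t), as a step function."""
    side = "left" if left_limit else "right"
    k = np.searchsorted(cens_times, t, side=side)  # number of censoring times before t
    return 1.0 if k == 0 else surv[k - 1]


def ipcw_labels_and_weights(times, events, tau):
    """Binary 'event by tau' labels plus IPCW weights (hypothetical helper).

    times  : observed follow-up time (event or censoring time) per subject
    events : 1 if the observed time is an event, 0 if censored
    tau    : prediction horizon, e.g. five years
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    cens_times, surv = censoring_survival(times, events)
    y = np.zeros(len(times), dtype=int)
    w = np.zeros(len(times), dtype=float)
    for i, (t, d) in enumerate(zip(times, events)):
        if t > tau:
            # Known to be event-free through tau: weight by 1 / G(tau).
            w[i] = 1.0 / _g_at(tau, cens_times, surv, left_limit=False)
        elif d == 1:
            # Event observed by tau: weight by 1 / G(T-).
            y[i] = 1
            w[i] = 1.0 / _g_at(t, cens_times, surv, left_limit=True)
        # Otherwise censored at or before tau: status unknown, weight stays 0.
    return y, w


# Toy example with horizon tau = 5.
y, w = ipcw_labels_and_weights([2, 3, 4, 6, 7, 8], [1, 0, 1, 1, 0, 0], tau=5)
risk = np.sum(w * y) / np.sum(w)  # IPCW-weighted event proportion, ≈ 0.375
```

On this toy data, the weighted event proportion matches one minus the Kaplan-Meier estimate of event-free survival at tau. The weights y, w can then be passed to any learner whose fitting routine accepts per-observation weights, such as the sample_weight argument of many scikit-learn estimators' fit methods. That is the sense in which the approach is general-purpose: the learner itself is unchanged, and only the labels and weights encode the censoring.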