Keywords: predictive modeling, readmission, validation, logistic regression, bootstrap forests, boosted trees, neural networks, penalized regression, LASSO, Elastic Net, Ridge
Diabetes is a chronic disease affecting nearly one in 12 United States residents and costing approximately $250 billion annually in the US alone. In this talk, we use data on diabetes patient and hospital outcomes to identify key factors related to diabetic readmissions and to accurately predict the probability of readmission using modern modeling techniques. The data set, which is publicly available, includes 10 years of data (1999-2008) on clinical care and demographic information for approximately 70,000 patients at 130 hospitals and integrated delivery networks. Using these data, we illustrate tools for data preparation, demonstrate interactive visualization techniques, and discuss the importance of validation to gauge model accuracy. Then, we build several exploratory and predictive models, including logistic regression, bootstrap forests and boosted trees, neural networks, and penalized regression (LASSO, Elastic Net, and Ridge). We interactively compare competing models, evaluate the risks of misclassification, and select the best model(s). Finally, we generate scoring code to deploy the selected model as an interactive web-based application.
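As a rough illustration of the validation idea described above, the following sketch holds out part of a data set, fits an ordinary and a ridge-penalized logistic regression on the remainder, and compares their misclassification rates on the held-out rows. It is not the software or workflow used in the talk, and it does not use the readmission data: the data are synthetic, and every name in it (fit_logistic, misclassification_rate, the penalty value 0.1, the 400/200 split) is a hypothetical choice made for this example only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, ridge=0.0, lr=0.1, epochs=500):
    """Batch gradient descent for logistic regression.

    ridge is an L2 penalty weight on the coefficients (0.0 = no penalty);
    the intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + ridge * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def misclassification_rate(y, probs, threshold=0.5):
    wrong = sum(1 for yi, pi in zip(y, probs) if (pi >= threshold) != (yi == 1))
    return wrong / len(y)


# Synthetic stand-in data (assumption: NOT the diabetes readmission data).
rng = random.Random(0)
true_w = [1.5, -1.0, 0.0]
X, y = [], []
for _ in range(600):
    xi = [rng.gauss(0, 1) for _ in true_w]
    p_event = sigmoid(-0.5 + sum(wj * xj for wj, xj in zip(true_w, xi)))
    X.append(xi)
    y.append(1 if rng.random() < p_event else 0)

# Holdout validation: fit on the first 400 rows, evaluate on the last 200.
X_train, y_train = X[:400], y[:400]
X_valid, y_valid = X[400:], y[400:]

results = {}
for name, ridge in [("logistic", 0.0), ("ridge", 0.1)]:
    w, b = fit_logistic(X_train, y_train, ridge=ridge)
    rate = misclassification_rate(y_valid, predict_proba(X_valid, w, b))
    results[name] = (w, rate)
    print(f"{name}: validation misclassification rate = {rate:.3f}")
```

The point of the sketch is only that competing models are judged on rows they were not fit to. A fuller comparison would also weigh the different costs of false positives and false negatives rather than rely on a single 0.5 threshold.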