Abstract Details

Activity Number: 51 - EHR Data + X: Expanding the Reach of EHR Data Through Data Integration
Type: Invited
Date/Time: Sunday, July 29, 2018 : 4:00 PM to 5:50 PM
Sponsor: Biometrics Section
Abstract #326695 Presentation
Title: Integrating Observational Data with Prior Knowledge: Wikipedia-Informed Priors for Predicting Health Outcomes
Author(s): Martijn Jeroen Schuemie*
Companies: Janssen R&D
Keywords: predictive models; regularized regression; Bayes rule

Observational health data such as electronic health records and insurance claims data provide the opportunity to study many questions, including fitting prediction models for health outcomes of interest, potentially with tens of thousands of covariates. Traditionally, these analyses are completely data-driven, for example using L1-regularized regression with identical priors on all covariates. Here we propose to integrate existing knowledge into our priors. We automatically extract known risk factors for thousands of diseases from Wikipedia by utilizing page-to-page links and page-to-code links (e.g., to ICD-10 codes). Priors for risk factors are still centered on 0, but their variance is driven by a second hyperparameter, separate from the one that governs all other covariates. Cross-validation is used to select both hyperparameters. As a proof of concept we fit predictive models for cardiovascular events in diabetes populations using large claims databases. Results show that the variance selected for risk factors is much larger than that selected for non-risk factors, and that predictive accuracy is comparable between informed and uninformed models, while the informed models are more parsimonious and more likely to include the known risk factors.
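
The two-hyperparameter idea can be illustrated with a minimal sketch. This is not the author's implementation. It assumes simulated data, a hypothetical boolean flag `is_risk` standing in for the Wikipedia-derived risk-factor list, a simple proximal-gradient solver with no intercept, and the standard correspondence between a zero-centered Laplace prior with variance v and an L1 penalty of sqrt(2/v). All function and variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def fit_l1_logistic(X, y, penalties, n_iter=300):
    """Proximal gradient (ISTA) on mean log-loss + sum_j penalties[j] * |beta[j]|."""
    n, p = X.shape
    # Step size 1/L, where L <= ||X||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        # Soft-thresholding with a covariate-specific threshold.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * penalties, 0.0)
    return beta

def penalties_from_variances(is_risk, var_other, var_risk, n):
    # Laplace prior with variance v has scale b = sqrt(v / 2), i.e. penalty 1/b = sqrt(2 / v).
    # Dividing by n matches the mean (rather than summed) log-loss used above.
    variances = np.where(is_risk, var_risk, var_other)
    return np.sqrt(2.0 / variances) / n

def log_loss(X, y, beta):
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def cv_select(X, y, is_risk, grid, k=5, seed=0):
    """Grid search over both prior variances using k-fold held-out log-loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best = None
    for var_other in grid:
        for var_risk in grid:
            losses = []
            for fold in folds:
                train = np.setdiff1d(np.arange(len(y)), fold)
                pen = penalties_from_variances(is_risk, var_other, var_risk, len(train))
                beta = fit_l1_logistic(X[train], y[train], pen)
                losses.append(log_loss(X[fold], y[fold], beta))
            score = float(np.mean(losses))
            if best is None or score < best[0]:
                best = (score, var_other, var_risk)
    return best

# Simulated example: 30 covariates, the first 5 flagged as "known risk factors".
rng = np.random.default_rng(1)
n, p = 400, 30
X = rng.standard_normal((n, p))
is_risk = np.arange(p) < 5
true_beta = np.where(is_risk, 1.0, 0.0)
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

grid = [0.001, 0.01, 0.1, 1.0]
score, var_other, var_risk = cv_select(X, y, is_risk, grid)
beta = fit_l1_logistic(X, y, penalties_from_variances(is_risk, var_other, var_risk, n))
print("selected variances (other, risk):", var_other, var_risk)
print("nonzero coefficients:", int(np.count_nonzero(beta)))
```

Setting `var_risk` equal to `var_other` recovers the uninformed model with identical priors on all covariates, so the grid search includes that baseline as a special case.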

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program