Abstract Details

Activity Number: 482 - Causal Inference and Related Methods
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #329534 Presentation
Title: Can We Train Machine Learning Methods to Outperform the High-Dimensional Propensity Score Algorithm?
Author(s): Mohammad Ehsanul Karim* and Robert W Platt
Companies: University of British Columbia and McGill University
Keywords: Causal inference; Propensity score; High-dimensional; Machine learning; High-dimensional Propensity Score; Observational study

The use of retrospective health care claims datasets is frequently criticized for the lack of complete information on potential confounders. By using patients' health status-related information from claims datasets as surrogates or proxies for mismeasured and unobserved confounders, the high-dimensional propensity score algorithm enables us to reduce bias. Using a previously published cohort study of post-myocardial infarction statin use (1998-2012), we compare the performance of the algorithm with a number of popular machine learning approaches for confounder selection in high-dimensional covariate spaces: random forest, least absolute shrinkage and selection operator (LASSO), and elastic net. Our results suggest that, when the data analysis is done with epidemiologic principles in mind, machine learning methods perform as well as the high-dimensional propensity score algorithm. Using a plasmode framework that mimicked the empirical data, we also show that a hybrid of machine learning and high-dimensional propensity score algorithms generally performs slightly better than either approach alone in terms of mean squared error, when a bias-based analysis is used.
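
As an illustration of what a bias-based ranking of candidate proxy covariates can look like, the sketch below scores binary covariates with the Bross bias formula, which is commonly described as the prioritization step of the high-dimensional propensity score algorithm. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the covariate names, and all prevalences and relative risks are hypothetical values chosen only for demonstration.

```python
import math


def bross_bias_multiplier(p_c1, p_c0, rr_cd):
    """Multiplicative bias from leaving one binary covariate unadjusted.

    p_c1  : prevalence of the covariate among the exposed
    p_c0  : prevalence of the covariate among the unexposed
    rr_cd : relative risk linking the covariate to the outcome
    """
    if rr_cd <= 0:
        raise ValueError("rr_cd must be positive")
    # Assumption: protective associations (RR < 1) are inverted so that
    # harmful and protective covariates are scored symmetrically.
    rr = rr_cd if rr_cd >= 1 else 1.0 / rr_cd
    return (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)


def rank_by_bias(candidates, top_k):
    """Return the top_k (name, score) pairs, ranked by |log(bias multiplier)|."""
    scored = [
        (name, abs(math.log(bross_bias_multiplier(p_c1, p_c0, rr_cd))))
        for name, p_c1, p_c0, rr_cd in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical candidates: (name, prevalence in exposed,
# prevalence in unexposed, covariate-outcome relative risk).
candidates = [
    ("code_A", 0.30, 0.10, 2.0),
    ("code_B", 0.20, 0.20, 3.0),  # balanced across exposure groups
    ("code_C", 0.05, 0.25, 0.5),  # protective and imbalanced
]

for name, score in rank_by_bias(candidates, top_k=2):
    print(f"{name}: {score:.4f}")
```

In this toy input, code_B receives a score of zero despite its strong association with the outcome, because its prevalence is the same in both exposure groups. A hybrid approach of the kind the abstract mentions could, for example, pass a bias-ranked shortlist to a penalized regression such as LASSO or elastic net, although the abstract does not specify how the two were combined.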

Authors who are presenting talks have a * after their name.
