Conference Program

All Times ET

Friday, June 10
Computational Statistics
Cluster and Graphical Analyses
Fri, Jun 10, 11:30 AM - 1:00 PM
Cambria

Conservative Causal Discovery by Use of Supervised Machine Learning (310083)

Presentation

Claus Thorn Ekstrøm, University of Copenhagen 
*Anne Helby Petersen, University of Copenhagen 
Joseph Ramsey, Carnegie Mellon University 
Peter Spirtes, Carnegie Mellon University 

Keywords: Causal discovery, structure learning, observational data, supervised machine learning, neural network

Causal questions are prevalent across many scientific disciplines: How can we prevent the development of depression? Why is the global temperature rising? Statistical causal inference can quantify a given causal effect, but unless data are collected experimentally, the statistical analysis relies on the specification of a causal model. Causal discovery algorithms are empirical methods for constructing such causal models from data. Several methods exist (e.g., GES) that are asymptotically correct, but they generally struggle on smaller samples. Moreover, most methods were developed with an interest in controlling false positive findings and place less emphasis on controlling false negatives. This error tradeoff is not always ideal: a causal model with many missing causal relations entails overly strong causal assumptions, and if it is used as the basis for conducting statistical causal inference, it may lead to biased causal effect estimates.
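To make the cost of false negatives concrete, consider a minimal sketch (ours, not the authors'; the linear Gaussian model and coefficients below are illustrative assumptions). If causal discovery misses the confounder edge Z -> X, the discovered model suggests that no adjustment for Z is needed, and the resulting estimate of the X -> Y effect is biased:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                       # confounder
x = 0.8 * z + rng.normal(size=n)             # Z -> X
y = 1.0 * x + 0.8 * z + rng.normal(size=n)   # X -> Y (true effect 1.0), Z -> Y

# Correct analysis: the discovered model contains Z -> X, so we adjust for Z.
beta_adj, *_ = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)

# False negative: the discovered model misses Z -> X, so Z is not adjusted for.
beta_naive, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

print(f"adjusted estimate of the X -> Y effect:   {beta_adj[0]:.3f}")    # close to 1.0
print(f"unadjusted estimate (confounded, biased): {beta_naive[0]:.3f}")  # close to 1.39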

We propose a new causal discovery method that addresses these issues: Supervised learning discovery (SLdisco). SLdisco uses supervised machine learning to obtain a mapping from observational data to equivalence classes of causal models. We evaluate SLdisco in a large simulation study and in a real epidemiological data application. The simulation study is based on linear Gaussian data-generating mechanisms, and we consider several choices of model size and sample size. Compared with GES, we find that SLdisco is less sensitive to sample size. Moreover, SLdisco provides better control of false negative errors at the cost of only moderately weaker false positive control. In the application, we use random subsampling to investigate real-data performance on small samples. We again find that SLdisco is less sensitive to sample size and hence seems to better utilize the information available in small datasets. All in all, SLdisco is more conservative, only somewhat less informative, and performs comparatively better on small samples.
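The abstract gives no implementation details beyond the description above, but the core idea admits a short sketch. The hypothetical Python example below uses scikit-learn's MLPClassifier as a stand-in for the authors' neural network: it simulates small linear Gaussian datasets from random DAGs and learns a map from each dataset's correlation matrix to edge indicators. For simplicity it predicts only the skeleton (undirected adjacencies) rather than a full equivalence class; every setting shown is an assumption, not the published method.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
p, n_train, n_obs = 4, 2000, 100
iu = np.triu_indices(p, k=1)  # index pairs for the 6 possible edges

def simulate(rng):
    """Draw a random 4-node DAG and n_obs linear Gaussian observations."""
    adj = np.triu(rng.random((p, p)) < 0.4, k=1)       # DAG; nodes in topological order
    weights = adj * rng.uniform(0.5, 1.5, size=(p, p))
    data = np.zeros((n_obs, p))
    for j in range(p):                                 # parents precede children
        data[:, j] = data @ weights[:, j] + rng.normal(size=n_obs)
    corr = np.corrcoef(data, rowvar=False)
    return corr[iu], adj[iu].astype(int)               # features, edge labels

pairs = [simulate(rng) for _ in range(n_train)]
X = np.array([feats for feats, _ in pairs])
y = np.array([labels for _, labels in pairs])

# Multi-label neural network: one output per possible edge.
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=1)
clf.fit(X, y)

feats, truth = simulate(rng)                           # a fresh "observed" dataset
print("true edges:     ", truth)
print("predicted edges:", clf.predict(feats[None, :])[0])

Because the classifier outputs a probability per edge, the decision threshold offers a natural dial for the tradeoff discussed above: raising it yields a more conservative graph with fewer false positives, while lowering it recovers more true edges at the cost of extra false positives.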