Abstract:
|
Many datasets are collected from multiple environments (e.g., different labs or perturbations), and it is often advantageous to learn models and relations that are invariant across environments. Invariance can improve robustness to unknown confounders and generalization to new domains. We develop a novel framework, which we term Kullback-Leibler regression (KL regression), to reliably estimate regression coefficients in a challenging multi-environment setting where latent confounders affect the data in each environment. KL regression is based on a new objective: simultaneously minimizing the sum of Kullback-Leibler divergences between a parametric model and the observed data in each environment, for which we derive an analytic solution at the global optimum. We prove that KL regression recovers the true invariant factors under a flexible confounding setup. Extensive experiments show that KL regression outperforms state-of-the-art causal inference techniques across a variety of settings, even under large model mismatch. Moreover, KL regression achieves the top score on a DREAM5 challenge for inferring causal genes.
|