Abstract:

Researchers are often interested in generalizing the average treatment effect (ATE) estimated in a randomized experiment to nonexperimental target populations. Previous studies have shown that an unbiased estimate of the population ATE can be obtained if selection into the experiment is independent of treatment heterogeneity given a set of variables that researchers adjust for. Although this separating set has a simple mathematical representation, it is often unclear how to select it in practice. In this paper, we propose a data-driven method to estimate the minimum separating set. Our approach has two advantages. First, because we find a separating set of the smallest size, it is easier for researchers to measure it in the target population. Second, our algorithm can incorporate researcher-specific data constraints. When researchers know that certain variables are unmeasurable in the target population, our method can identify a minimal separating set subject to such constraints, if one is feasible. We validate our proposed method using naturalistic simulations.
