Abstract:
|
“Design-based” treatment effect estimates from randomized controlled trials (RCTs) assume no statistical model beyond the experimental design itself. However, when the sample size is small or moderate, these estimates can be imprecise. In contrast, analysis of observational data requires untestable assumptions, chiefly that there is no unmeasured confounding, yet observational datasets are often much larger than comparable RCTs. Sometimes observational and RCT data coexist within the same database: for example, covariate and outcome data from an A/B test (an RCT run within online software) may sit alongside a “remnant” of similar data from users who were not randomized. This paper presents ReLOOP, a novel design-based RCT estimator that uses machine learning to incorporate remnant data in order to improve statistical precision. It retains the same accuracy guarantees as traditional design-based RCT estimators, even in small samples. ReLOOP combines two recent causal methods: rebar, which incorporates remnant data into RCT estimators, and LOOP, which uses machine learning for covariate adjustment in RCTs. We demonstrate ReLOOP in an analysis of A/B tests run within the ASSISTments online mathematics tutor.
|