Abstract:
|
We consider estimating treatment effects when the outcome of interest (e.g., long-term health status) is rarely observed but abundant surrogate observations (e.g., short-term health outcomes) are available. To investigate the role of surrogates in this setting, we derive the semiparametric efficiency lower bounds for the average treatment effect (ATE) both with and without the presence of surrogates, as well as in several intermediary settings. These bounds characterize the best possible precision of ATE estimation in each case, and their difference quantifies the efficiency gains from optimally leveraging the surrogates when only limited outcome data are available. These results apply in two important regimes: when the number of surrogate observations is comparable to the number of primary-outcome observations, and when the former dominates the latter. Importantly, we take a missing-data approach that circumvents the overly strong surrogate conditions commonly assumed in the previous literature. To realize the efficiency gains from surrogate observations, we propose ATE estimators and inferential methods based on flexible machine learning nuisance estimators and show their efficiency and robustness under mild conditions.
|