Abstract:
|
This paper investigates the problem of making inference about the regression of an outcome variable Y on covariates (V,L) under a parametric model when data are fused from two separate sources, one of which contains information only on (V,Y) while the other contains information only on covariates. This data fusion setting may be viewed as an extreme form of missing data in which the probability of observing complete data (V,L,Y) on any given subject is zero. We develop a large class of semiparametric estimators of the regression coefficients in fused data, which includes doubly robust (DR) estimators. The proposed method is DR in that it is consistent and asymptotically normal if, in addition to the model of interest, we correctly specify a model for either the missing data mechanism under a missing at random assumption or the distribution of the unobserved covariates. We carefully lay out the settings in which identifiability is achievable, and we evaluate the performance of the DR method via a simulation study.
|