Abstract:
|
The term 'Data Fusion' describes a particular missing-data pattern in combination with a particular analysis objective. The pattern emerges when two data sources A and B are stacked on top of each other, yielding three sets of variables: a set of variables (X) observed in both sources, a set of variables (Y) available only in source A, and a set of variables (Z) available only in source B. The analysis objective is to draw inferences about the joint distribution of Y and Z, which are never jointly observed. At first glance, this missing-data pattern resembles that of the potential outcomes framework (Rubin, 1974). Propensity Score Matching (PSM) (Rosenbaum & Rubin, 1983) is a popular method for causal inference with observational data, and it is tempting to apply this method to data fusion problems, especially since 'Statistical Matching' is used as a synonym for combining data from different sources. To investigate its suitability for data fusion settings, we compare PSM with parametric and non-parametric imputation methods in a simulation study.
|
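The missing-data pattern described above can be illustrated with a minimal sketch. The sample sizes, the single-column X, Y, and Z, the `source` indicator, and the use of pandas are assumptions made for illustration only; they are not taken from the simulation study.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data (hypothetical sizes): source A observes X and Y, source B observes X and Z.
a = pd.DataFrame({"X": rng.normal(size=5), "Y": rng.normal(size=5)})
b = pd.DataFrame({"X": rng.normal(size=4), "Z": rng.normal(size=4)})

# Stacking the sources: Z is missing for every row from A, Y for every row from B.
stacked = pd.concat(
    [a.assign(source="A"), b.assign(source="B")],
    ignore_index=True,
)

# Count rows in which Y and Z are both observed.
jointly_observed = int((stacked["Y"].notna() & stacked["Z"].notna()).sum())
print(stacked)
print("rows with both Y and Z observed:", jointly_observed)
```

Because no row contains both Y and Z, their joint distribution cannot be estimated directly from the stacked data; any fusion method has to rely on X and on additional assumptions.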