Abstract:
|
Observational data can now easily reach enormous sizes and high dimensionality, given the diverse channels of data collection. As a result, the workload of matching treated and control subjects for causal inference can become so heavy that it exceeds the capacity of an individual analyst or the available computational resources. We propose a practical solution that distributes the task among multiple matchers: each matcher offers a subclassification or matching of the subjects based on a pre-assigned subset of features, and the matchers then collaborate to determine a satisfactory consensus. We achieve the consensus via clustering aggregation methods with covariate imbalance measures integrated into the objectives. The performance of several aggregation methods is explored and compared via simulation studies as well as real-world case studies. Ultimately, our multi-matcher framework facilitates the use of existing matching methods and the design of observational studies in settings where high-dimensional data are distributed because of computational or security considerations.
|