Abstract:
|
Integrating Electronic Health Records (EHRs) from multiple centers gives researchers a larger sample of the population, which improves estimation and prediction. The challenges of sharing patient-level information during data integration have motivated the development of distributed algorithms, which require sharing only aggregated information. However, most existing distributed algorithms rest on the assumption that data across clinical sites are homogeneous. This assumption ignores heterogeneity in patients’ characteristics, environments, and data collection processes. In this paper, we propose a communication-efficient distributed algorithm. We use a pairwise conditioning approach to construct a pseudolikelihood function that accounts for heterogeneous distributions by allowing a site-specific, unknown nuisance baseline probability function. We evaluate our algorithm through a systematic simulation study motivated by real-world scenarios and apply it to multiple datasets from the Children’s Hospital of Philadelphia (CHOP). The results show that the proposed method leads to a sensible data sharing scheme for EHRs across different clinical sites.
|