There has been a proliferation of electronic health record (EHR) data in recent years. However, these data tend to be splintered across institutional silos. Integrating the data across silos at the patient level poses many challenges, including protecting patient privacy and the practical need to minimize the rounds of communication between sites.
Informally, there is a trade-off between efficiency and privacy. Freely sharing patient-level information across silos is statistically efficient but often impractical because of privacy constraints. Restricting information sharing to coarse aggregate summaries (e.g., meta-analytic approaches that essentially average treatment effects across studies) preserves privacy but is inefficient and may lead to bias. There is a need for approaches that lie between these two extremes.
Building on the communication-efficient surrogate likelihood work of Jordan et al. (2018, JASA), we outline approaches for fitting Cox models and their extensions when data are distributed among multiple sites. We show that our approaches are more accurate than naive meta-analytic approaches while preserving privacy and minimizing the frequency of communication required between data sites.
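As a rough illustration of the surrogate-likelihood idea, the NumPy sketch below fits a Cox model across simulated sites. It assumes a site-stratified Cox model (each site has its own baseline hazard), so the global negative log partial likelihood is the sample-size-weighted sum of the site-level losses. This may differ from the formulation actually used in this work. Starting from an initial estimate beta_bar (here an inverse-variance-weighted meta-analysis, an arbitrary choice), each site sends only its p-dimensional gradient at beta_bar in a single round. Site 1 then minimizes the surrogate loss L_1(beta) - <grad L_1(beta_bar) - grad L_N(beta_bar), beta> on its own data. All function names and simulation settings are hypothetical.

```python
import numpy as np


def cox_loss_grad_hess(beta, X, time, event):
    """Average negative log partial likelihood of a Cox model (Breslow form,
    assuming no tied times), with its gradient and Hessian."""
    n = X.shape[0]
    order = np.argsort(-time)  # decreasing time: risk set of row i is rows 0..i
    X, event = X[order], event[order].astype(bool)
    eta = X @ beta
    w = np.exp(eta)
    S0 = np.cumsum(w)
    S1 = np.cumsum(w[:, None] * X, axis=0)
    S2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    xbar = S1[event] / S0[event][:, None]
    loss = -np.sum(eta[event] - np.log(S0[event])) / n
    grad = -np.sum(X[event] - xbar, axis=0) / n
    hess = np.sum(S2[event] / S0[event][:, None, None]
                  - xbar[:, :, None] * xbar[:, None, :], axis=0) / n
    return loss, grad, hess


def newton(grad_hess, beta0, iters=50, tol=1e-10):
    """Plain Newton-Raphson on a convex loss given a (gradient, Hessian) callback."""
    beta = np.array(beta0, dtype=float)
    for _ in range(iters):
        g, H = grad_hess(beta)
        step = np.linalg.solve(H, g)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_site(rng, n, beta_true, baseline_rate):
    """Hypothetical site: exponential event times, site-specific baseline hazard."""
    X = rng.normal(size=(n, len(beta_true)))
    t_event = rng.exponential(1.0 / (baseline_rate * np.exp(X @ beta_true)))
    t_cens = rng.exponential(1.0 / baseline_rate, size=n)
    return X, np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)


def site_gh(site):
    return lambda b: cox_loss_grad_hess(b, *site)[1:]


rng = np.random.default_rng(0)
beta_true = np.array([0.5, -0.5, 0.25])
sites = [simulate_site(rng, 400, beta_true, r) for r in (0.5, 1.0, 1.5, 2.0, 3.0)]
n_k = np.array([len(s[1]) for s in sites])
N = n_k.sum()
p = len(beta_true)

# Meta-analysis: only local estimates and information matrices are shared.
local_fits = [newton(site_gh(s), np.zeros(p)) for s in sites]
info = [n * cox_loss_grad_hess(b, *s)[2] for n, b, s in zip(n_k, local_fits, sites)]
beta_meta = np.linalg.solve(sum(info), sum(I @ b for I, b in zip(info, local_fits)))


def pooled_gh(b):
    """Stratified pooled fit; needs patient-level data, used only as a reference."""
    parts = [cox_loss_grad_hess(b, *s) for s in sites]
    g = sum(n * part[1] for n, part in zip(n_k, parts)) / N
    H = sum(n * part[2] for n, part in zip(n_k, parts)) / N
    return g, H


beta_pooled = newton(pooled_gh, np.zeros(p))

# Surrogate likelihood: one round in which each site sends its gradient at beta_bar.
beta_bar = beta_meta
global_grad = sum(n * cox_loss_grad_hess(beta_bar, *s)[1] for n, s in zip(n_k, sites)) / N
shift = cox_loss_grad_hess(beta_bar, *sites[0])[1] - global_grad


def surrogate_gh(b):
    """Gradient/Hessian of L_1(b) - <shift, b>, computed from site 1's data only."""
    _, g, H = cox_loss_grad_hess(b, *sites[0])
    return g - shift, H


beta_csl = newton(surrogate_gh, beta_bar)
print("meta  :", np.round(beta_meta, 3))
print("CSL   :", np.round(beta_csl, 3))
print("pooled:", np.round(beta_pooled, 3))
```

Comparing `beta_meta` and `beta_csl` against `beta_pooled` (the estimate that would require pooling patient-level data) is one way to explore the accuracy trade-off; the sketch does not by itself establish any particular result.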