Abstract:
|
Electronic health records (EHRs) contain rich information about patients’ diagnoses, lab tests, medications, and more. The wide adoption of EHRs throughout the United States facilitates data integration across institutions, which enlarges the study population in biomedical research and improves statistical power. To support such integration, we propose a one-shot, summary-statistics-based distributed algorithm for fitting penalized generalized linear models in multicenter research networks, using patient-level data from a leading site and summary-level statistics from the other participating sites. The method requires only one round of communication of summary statistics and avoids transferring patient-level data. Taking logistic lasso regression as an example, we evaluate the performance of the proposed method in terms of estimation, prediction, and feature selection using both simulation studies and a real-world application.
|