Online Program

Friday, October 19
Knowledge
Fri, Oct 19, 10:00 AM - 11:30 AM
Salons FG
Current Topics in Big Data

A Bayesian Approach to Modeling Heterogeneity in Electronic Health Records Accounting for Missing Data (304815)

*Rebecca Anthopolos, Columbia University 
Qixuan Chen, Columbia University 
Ying Wei, Columbia University 

Keywords: Bayesian, electronic health records, heterogeneity, missing data

Compared with longitudinal data collected in a prospective study design, electronic health records (EHRs) are a complex data source for clinical research. The data are high dimensional, containing many sparsely populated variables on large subpopulations. The resulting heterogeneity in EHRs can be ascribed to two sources. First, subpopulations often exhibit different health profiles. Second, data collection is largely driven by when and why a patient decides to visit the clinic. Different patterns of missing data arise from the timing and frequency of clinic visits, as well as from the varying subsets of variables that may be documented at a given clinic visit. EHR-based research that fails to account for these underlying sources of heterogeneity may yield misleading statistical inferences. In a Bayesian setting, we use general growth mixture modeling to capture latent heterogeneity in the study population and to incorporate the layered missing data processes. This makes it possible to evaluate the sensitivity of estimated clinical associations to unobserved confounding from the missing data processes. We apply our method in a study of children’s height and weight measurements in EHRs.