Online Program

Friday, February 16
CS14 Working with Health Care Data
Fri, Feb 16, 3:45 PM - 5:15 PM
Salons BC

Assessing Correspondence Between Two Data Sources Across Categorical Covariates with Missing Data: Application to Electronic Health Records (303547)


Yiyi Chen, Oregon Health & Science University 
John Heintzman, Oregon Health & Science University
*Emile Latour, Oregon Health & Science University 
Miguel Marino, Oregon Health & Science University 

Keywords: Missing data; multiple imputation; chained equations; fully conditional specification; electronic health records; EHR; kappa statistics; agreement.

The use of multiple imputation together with correspondence statistics (e.g., agreement, kappa) between two data sources is not well documented. We combined these two statistical methods in a novel way to examine their utility in applied health care research with electronic health records (EHRs). To motivate this work, we performed a validation study comparing the documentation of adult preventive care services in EHRs against a gold standard of Medicaid claims data across many patient characteristics, focusing specifically on race and federal poverty level (FPL). Using data from N=13,101 Medicaid-insured adult patients receiving care in 43 Oregon community health centers, we compared documentation of screening services between the EHR and Medicaid data sources using kappa statistics, before and after imputing missing patient characteristic data. We used multivariate imputation by chained equations (MICE) to impute missing values because of its flexibility in working with large and complex data sets. We provide a practical example and guidance for combining these statistical methods and conclude that MICE is a beneficial tool in this setting.
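
As a rough illustration of the kind of calculation the abstract describes, the sketch below computes Cohen's kappa between two binary documentation indicators (EHR versus claims) within each level of a categorical covariate, then averages the stratum-specific kappas across several imputed versions of that covariate. It is not the authors' code. The function names, the 0/1 coding, and the choice to pool by a simple mean of the point estimates are all assumptions made for illustration, and the imputed covariate vectors are taken as given rather than produced by MICE here.

```python
from collections import defaultdict


def cohen_kappa(ehr, claims):
    """Cohen's kappa for two equal-length sequences of 0/1 indicators."""
    n = len(ehr)
    if n == 0 or n != len(claims):
        raise ValueError("inputs must be non-empty and the same length")
    # Observed agreement: share of patients where both sources match.
    p_obs = sum(1 for e, c in zip(ehr, claims) if e == c) / n
    # Chance agreement from each source's marginal rate of documentation.
    p_ehr = sum(ehr) / n
    p_claims = sum(claims) / n
    p_exp = p_ehr * p_claims + (1 - p_ehr) * (1 - p_claims)
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_obs - p_exp) / (1 - p_exp)


def kappa_by_stratum(ehr, claims, stratum):
    """Kappa computed separately within each level of a covariate."""
    groups = defaultdict(lambda: ([], []))
    for e, c, s in zip(ehr, claims, stratum):
        groups[s][0].append(e)
        groups[s][1].append(c)
    return {s: cohen_kappa(es, cs) for s, (es, cs) in groups.items()}


def pooled_kappa(ehr, claims, imputed_strata):
    """Mean of stratum kappas over imputed covariate vectors (assumed pooling rule)."""
    per_imputation = [kappa_by_stratum(ehr, claims, s) for s in imputed_strata]
    pooled = {}
    for level in set().union(*per_imputation):
        values = [k[level] for k in per_imputation if level in k]
        pooled[level] = sum(values) / len(values)
    return pooled
```

In this sketch only the covariate changes between imputed data sets, while the EHR and claims indicators are fully observed; a complete analysis would also pool the variance of each estimate, which is omitted here.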