Online Program

Saturday, May 19
Data Science
Data Science in Health
Sat, May 19, 1:15 PM - 2:45 PM
Grand Ballroom G

Classifying Health Insurance Type from Survey Responses Using Enrollment Data (304432)


Kathleen Thiede Call, State Health Access Data Assistance Center 
Don Oellerich, US Health and Human Services Department 
Angela Fertig, University of Minnesota 
*Joanne Pascale, US Census Bureau 

Keywords: health insurance, classification, administrative records, survey measurement error

The Census Bureau recently implemented a redesign of the health insurance module of the Current Population Survey Annual Social and Economic Supplement (CPS ASEC). The objective of this research is to inform development of an algorithm for combining answers to questions about health insurance from this module in order to maximize accurate categorization of coverage type. Data come from the CHIME study (Comparing Health Insurance Measurement Error), a reverse record check study in which households with individuals enrolled in a range of public and private health insurance plans (including the marketplace) were administered a telephone survey that included the CPS health module. After the survey and records data were linked, answers to survey questions about the characteristics of coverage (e.g., general source, program name, premium, subsidization) were examined in relation to the coverage type indicated by the record. A machine learning approach was used to develop three alternative algorithms to categorize coverage type – one skewing toward public coverage in ambiguous cases, one skewing toward marketplace and one in-between. Three different accuracy metrics were calculated for each algorithm: sensitivity, predictive power and prevalence. Results varied slightly across algorithms and showed sensitivity for private and public coverage was about 98 percent and 82 percent, respectively. Predictive power was about 97 percent for both private and public coverage. The survey estimate of private coverage was about 8 percentage points higher than the population prevalence, and the survey estimate of public coverage was about 3 percentage points lower than the population prevalence.
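
The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes that "sensitivity" is the share of people with a given record coverage type whom the survey algorithm also assigns to that type, that "predictive power" is the share of people the algorithm assigns to a type whose record confirms it, and that the prevalence comparison is the survey-classified share minus the record share. The function name coverage_metrics and the toy data are hypothetical; they are not taken from the CHIME study and do not reproduce its results.

```python
def coverage_metrics(record_types, survey_types, coverage):
    """Compare survey-classified coverage with record coverage for one type.

    record_types: coverage type from the enrollment record, one per person
    survey_types: coverage type assigned by the survey algorithm, same order
    coverage:     the type to evaluate, e.g. "private" or "public"
    """
    if len(record_types) != len(survey_types):
        raise ValueError("record and survey lists must be the same length")

    pairs = list(zip(record_types, survey_types))
    n = len(pairs)
    if n == 0:
        raise ValueError("no linked cases to evaluate")

    # Cases where record and survey agree on this coverage type.
    agree = sum(1 for r, s in pairs if r == coverage and s == coverage)
    in_records = sum(1 for r, _ in pairs if r == coverage)
    in_survey = sum(1 for _, s in pairs if s == coverage)

    nan = float("nan")
    return {
        # Of those the record puts in this type, share the survey also does.
        "sensitivity": agree / in_records if in_records else nan,
        # Of those the survey puts in this type, share the record confirms.
        "predictive_power": agree / in_survey if in_survey else nan,
        # Survey-classified share minus record share (positive = overestimate).
        "prevalence_gap": (in_survey - in_records) / n,
    }


# Made-up linked cases, for illustration only (not CHIME data).
records = ["private", "private", "private", "public",
           "public", "marketplace", "marketplace", "public"]
survey = ["private", "private", "private", "public",
          "private", "marketplace", "private", "public"]

private_metrics = coverage_metrics(records, survey, "private")
public_metrics = coverage_metrics(records, survey, "public")
print(private_metrics)
print(public_metrics)
```

In this toy data the survey over-assigns private coverage, so private sensitivity is high while its predictive power is lower and its prevalence gap is positive; public coverage shows the reverse pattern.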