Online Program

Wednesday, January 10
Wed, Jan 10, 5:30 PM - 7:00 PM
Crystal Ballroom CD & Prefunction
Welcome Reception & Poster Session I

Comparing Techniques to Address Selection Bias due to Missing Data in Observational Studies (304268)

Katherine E Miller, Gillings School of Global Public Health, University of North Carolina-Chapel Hill 
Megan Shepherd-Banigan, Health Services Research and Development, Durham VA Medical Center 
*Valerie Anne Smith, Duke University; Department of Veterans Affairs 

Keywords: selection model, multiple imputation, selection bias

Heckman selection models control for selection bias using a two-stage estimation procedure. Hot-decking is a Bayesian multiple imputation method. Unlike other methods to address missing data, neither method requires the assumption that data are missing at random or missing completely at random. Using the 2014 Medical Expenditure Panel Survey (MEPS) household survey, we compared a Heckman selection model to an OLS regression model using imputed values and to an OLS regression model with no imputed values. The outcome of interest was log wages among women aged 18-65. We examined the residuals and the root mean squared error of an adjusted OLS regression with non-imputed wage values, an adjusted OLS regression with imputed wage values, and an adjusted Heckman selection model. The adjusted OLS models controlled for socioeconomic factors, industry/occupation, hours worked, and employment in more than one job. The first stage of the selection model controlled for education, work/housework limitations, alimony, and marital status to predict workforce engagement. We found that the selection model produced the smallest root mean squared error (0.0196). These findings may help guide future analyses considering methods to address truncated data in secondary analyses.
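
As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes the standard Heckman correction term (the inverse Mills ratio, the standard normal density divided by the standard normal cumulative distribution, evaluated at a first-stage probit index) and a root mean squared error. It is not the authors' code: the function names, the probit index values, and the log wage values are all hypothetical and chosen only for illustration. In a two-step Heckman fit, the correction term would be added as an extra regressor in the second-stage wage equation.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def inverse_mills(index):
    """Selection-correction term: pdf(index) / cdf(index).

    `index` is a fitted first-stage probit index for one person.
    Suitable for moderate index values only; far in the lower tail
    both numerator and denominator underflow.
    """
    return normal_pdf(index) / normal_cdf(index)


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical first-stage probit indices (not values from the study).
example_indices = [-1.0, 0.0, 1.5]
correction_terms = [inverse_mills(z) for z in example_indices]

# Hypothetical observed and predicted log wages (not values from the study).
observed_log_wages = [2.7, 3.1, 2.9]
predicted_log_wages = [2.8, 3.0, 2.9]
example_rmse = rmse(observed_log_wages, predicted_log_wages)

print(correction_terms)
print(example_rmse)
```

The correction term shrinks as the probit index grows, so observations with a high predicted probability of workforce engagement receive a smaller adjustment in the second stage.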