Race/ethnicity data is often missing from healthcare and other administrative datasets. One solution is to impute race/ethnicity, a task for which surnames have long been used. Although first names are less predictive than surnames, first-name information improves imputations that already include surname information.
We evaluate the marginal contribution of first-name information to the accuracy of racial/ethnic imputations in a sample of Medicare beneficiaries with known race/ethnicity. We analyze two scenarios, one with a sparse set of predictors and one with a rich set, and examine whether gains in accuracy from first names differ by gender and race/ethnicity.
Among non-Hispanic white, Hispanic, and Asian/Pacific Islander beneficiaries, first-name information improves accuracy more for women and narrows the gender gap in accuracy. Gains in accuracy from adding first-name information are similar for black men and women. For all groups, the addition of first-name information improves prediction accuracy more under our sparse predictor scenario.
Thus, first-name information increases the accuracy of racial/ethnic imputations, especially when only a sparse set of predictors is available and especially for women.