Keywords: race/ethnicity, imputation, Medicare, disparities
Race/ethnicity (R/E) data are often unavailable or inaccurate. Medicare files contain a Social Security Administration (SSA) R/E variable with known limitations. The Medicare Bayesian Improved Surname Geocoding (MBISG 1.0) method estimates a vector of six R/E probabilities (White, Black, Hispanic, American Indian/Alaska Native, Asian/Pacific Islander (API), and multiracial) by augmenting SSA R/E with surname-based and address-based R/E information from the Census. Using data from 284,627 Medicare beneficiaries, we improve MBISG 1.0 by (a) allowing the association of SSA data with self-reported R/E to vary by age, (b) disaggregating compound surnames, (c) better accounting for Puerto Rican residence, (d) incorporating additional data elements (e.g., first names), and (e) allowing more flexible multinomial logistic regression modeling, resulting in MBISG 2.0. In cross-validated results, MBISG 2.0 significantly improves accuracy for Hispanic, White, and API beneficiaries, removing about one-third of the remaining error. The R/E group with the lowest MBISG 1.0 performance (Hispanic) improved the most. Thus, MBISG 2.0 performance is both higher and more uniform across R/E groups.
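For readers unfamiliar with surname geocoding, the core idea of combining surname-based and address-based probabilities can be illustrated with a generic Bayesian update. The sketch below is not the MBISG 1.0 or 2.0 model, which additionally uses SSA R/E, first names, and multinomial logistic regression. It assumes surname and location are conditionally independent given R/E, and the function name `bisg_posterior` and all numeric values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Six R/E groups, in the order listed in the abstract.
GROUPS = ["White", "Black", "Hispanic", "AI/AN", "API", "Multiracial"]

def bisg_posterior(p_given_surname, p_given_geo, p_overall):
    """Generic BISG-style update (hypothetical helper, not the MBISG model).

    Assuming surname and location are conditionally independent given R/E:
        P(r | surname, geo)  is proportional to  P(r | surname) * P(r | geo) / P(r)
    """
    s = np.asarray(p_given_surname, dtype=float)
    g = np.asarray(p_given_geo, dtype=float)
    prior = np.asarray(p_overall, dtype=float)
    unnormalized = s * g / prior
    return unnormalized / unnormalized.sum()

# Illustrative (made-up) inputs, not Census or Medicare figures.
surname_probs = [0.05, 0.02, 0.88, 0.01, 0.02, 0.02]  # P(r | surname)
geo_probs = [0.60, 0.10, 0.20, 0.02, 0.05, 0.03]      # P(r | census block group)
overall = [0.62, 0.12, 0.17, 0.01, 0.05, 0.03]        # P(r) in the population

posterior = bisg_posterior(surname_probs, geo_probs, overall)
for group, prob in zip(GROUPS, posterior):
    print(f"{group:12s} {prob:.3f}")
```

The output is a probability vector rather than a single assigned category, which matches the abstract's description of estimating a vector of R/E probabilities for each beneficiary.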