Combining Information from Multiple Complex Surveys
*Qi Dong, Program in Survey Methodology, University of Michigan 
Michael Elliott, Program in Survey Methodology, University of Michigan 
Trivellore Raghunathan, Program in Survey Methodology, University of Michigan 

Keywords: combining rule for multiple surveys, complex sampling design features, synthetic populations, multiple imputation, health insurance coverage rates, BRFSS, NHIS, MEPS

A growing number of substantive research questions require a web of information that no single survey adequately collects. Fortunately, survey organizations often draw repeated samples from the same population for different surveys and collect a considerable number of overlapping variables. Existing methods for combining surveys have produced improved estimates in various settings, but they share two limitations: 1) they mostly focus on combining two surveys and are difficult to generalize to more than two, especially when each survey covers only some of the variables of interest, and 2) they do not fully adjust for the differing complex sampling design features (such as weighting, clustering and stratification) and the various nonsampling errors (such as nonresponse and measurement error) that the surveys may have. To fill this gap, this paper presents a principled method for combining multiple surveys from a missing data perspective. The unobserved portion of the population in each survey is treated as missing data to be multiply imputed; together with the observed data, the imputations produce multiple synthetic populations. The imputation model accounts for the complex sampling design features and nonsampling errors. Variables missing from one survey can be imputed by borrowing information across surveys. The estimate of the population quantity of interest is calculated from each complete synthetic population, and the estimates are combined first within each individual survey and then across surveys. The combined estimator is proved to be more accurate and precise than the estimators from the individual surveys, and the more surveys are combined, the more accurate and precise it becomes.
The proposed method is used to combine the Behavioral Risk Factor Surveillance System (BRFSS) with the National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS) to estimate health insurance coverage rates for the whole population and for some subdomain populations.
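
The two-stage combination described above (first within each survey, then across surveys) can be illustrated with a minimal sketch. The abstract does not state the exact combining rule, so the sketch makes two assumptions: within a survey, the point estimate is the mean over synthetic populations with variance (1 + 1/m) times the between-population variance, and across surveys the estimates are pooled by inverse-variance weighting. The function names, the survey labels, and all numbers are hypothetical and are not results from the paper.

```python
from statistics import mean, variance


def combine_within_survey(estimates):
    """Pool the estimates from the synthetic populations of one survey.

    Assumed rule (not taken from the abstract): the point estimate is the
    mean over synthetic populations, and its variance is (1 + 1/m) times
    the sample variance between the m synthetic-population estimates.
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two synthetic populations")
    q_bar = mean(estimates)
    between = variance(estimates)  # sample variance, divisor m - 1
    return q_bar, (1 + 1 / m) * between


def combine_across_surveys(pooled):
    """Inverse-variance pooling of (estimate, variance) pairs.

    Assumes every variance is strictly positive.
    """
    precisions = [1 / t for _, t in pooled]
    total = sum(precisions)
    estimate = sum(q * p for (q, _), p in zip(pooled, precisions)) / total
    return estimate, 1 / total


# Hypothetical coverage-rate estimates from three synthetic populations
# per survey; the values are made up for illustration only.
synthetic_estimates = {
    "survey_a": [0.84, 0.85, 0.86],
    "survey_b": [0.82, 0.84, 0.86],
}

pooled = [combine_within_survey(v) for v in synthetic_estimates.values()]
combined_estimate, combined_variance = combine_across_surveys(pooled)
print(combined_estimate, combined_variance)
```

Under this assumed rule, the pooled variance is never larger than the smallest single-survey variance, and adding another survey can only add precision, which gives some intuition for the claim that combining more surveys improves the estimator.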