633 – Analyzing Linked Data: Challenges, Solutions, and Potential Opportunities
Data Analysis Using NHIS-EPA Linked Files: Issues with Using Incomplete Linkage
Rong Wei
National Center for Health Statistics
Van Parsons
National Center for Health Statistics
Jennifer Parker
NCHS
Yulei He
NCHS/CDC
The National Health Interview Survey (NHIS) is a large-scale annual national survey that collects individual health outcome data. Because the NHIS is based on a complex survey design, analytical "best practice" involves accounting for the survey design features when analyzing the data. To expand its analytical utility, the NHIS has been linked geographically to selected EPA pollution data for the years 1985 to 2005. The available EPA-linked data are only partially complete with respect to geographical coverage, and some analytical caution is advised because a "missing at random" mechanism cannot be assumed for the linked pollutants. Inferences about associations between population health outcomes and air pollution status may therefore be biased if standard design-based analytical methods are applied. The present study investigates situations in which such biases may occur and considers some possible corrective analytical actions. We suggest model-based alternatives for estimating associations between population health and air quality. The impact on bias and variance of the demographic components in the statistical weights, as well as of clustering effects, is examined.