Validity of Using Census-Based Area Level Socioeconomic Information As a Proxy for Individual Level Socioeconomic Confounders in Instrumental Variables Regression
*Yenchih Hsu, University of Pennsylvania 
Scott A. Lorch, University of Pennsylvania 
Dylan Small, University of Pennsylvania 

Keywords: Aggregation, Causal inference, Instrumental Variables, Proxy Variables

A frequent concern when drawing causal inferences about a policy or treatment from observational studies is the presence of unmeasured confounding variables. The instrumental variable method is an approach to estimating a causal relationship in the presence of unmeasured confounders. A valid instrumental variable must be independent of the unmeasured confounding variables given the measured confounding variables. In health services research, distance to a specialty care center has been used as an instrumental variable for the effect of specialty care vs. general care. Because distance to a specialty care center is often associated with socioeconomic status, distance can be a valid instrumental variable only if socioeconomic status is measured and controlled for. However, health data sets often lack individual socioeconomic information and instead contain area-average socioeconomic information from the US Census, e.g., average income or education level in a zip code. These area averages are computed from the general population. In our motivating study of the effect of specialty perinatal care for premature infants, by contrast, the population of interest is mothers, who are a subset of the general population in a zip code and may differ from it in their socioeconomic characteristics. We study how the bias of the two-stage least squares estimates in instrumental variables regression is affected by (1) using an area-level variable in models for the general population and (2) using an area-level variable from the general population in models for a subset of the general population. We will present simulation results and an application to a study of perinatal care for premature infants.
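As a rough illustration of the kind of comparison described above, the following Python sketch simulates a toy data set and computes two-stage least squares estimates with no socioeconomic control, with an area-average control, and with the individual-level control. It is not the authors' simulation design. The data-generating process, all coefficient values, and the names `two_stage_least_squares` and `simulate` are assumptions made purely for illustration. The sketch covers only setting (1), where the area average and the analysis sample come from the same population, and does not model the subset setting (2).

```python
import numpy as np


def two_stage_least_squares(y, d, z, controls):
    """2SLS estimate of the effect of treatment d on outcome y, using instrument z."""
    n = len(y)
    # Exogenous regressors: an intercept plus any measured controls.
    w = np.hstack([np.ones((n, 1))] + [c.reshape(-1, 1) for c in controls])
    # First stage: regress the treatment on the instrument and the controls.
    x1 = np.hstack([z.reshape(-1, 1), w])
    d_hat = x1 @ np.linalg.lstsq(x1, d, rcond=None)[0]
    # Second stage: regress the outcome on the fitted treatment and the controls.
    x2 = np.hstack([d_hat.reshape(-1, 1), w])
    return np.linalg.lstsq(x2, y, rcond=None)[0][0]


def simulate(n_areas=500, n_per_area=40, beta=1.0, seed=0):
    """Toy comparison of 2SLS estimates under three choices of SES control."""
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(n_areas), n_per_area)
    n = area.size
    # Individual SES = area-level component + individual deviation (assumed).
    ses = rng.normal(size=n_areas)[area] + rng.normal(size=n)
    # Census-style proxy: the average SES of everyone in the same area.
    area_avg = (np.bincount(area, weights=ses) / n_per_area)[area]
    # Assumed: the instrument (distance) is associated with individual SES.
    z = 0.5 * ses + rng.normal(size=n)
    # Unmeasured confounder of treatment and outcome, independent of z.
    u = rng.normal(size=n)
    d = 1.0 * z + 0.5 * ses + u + rng.normal(size=n)
    y = beta * d + 1.0 * ses + u + rng.normal(size=n)
    return {
        "no_control": two_stage_least_squares(y, d, z, []),
        "area_proxy": two_stage_least_squares(y, d, z, [area_avg]),
        "individual": two_stage_least_squares(y, d, z, [ses]),
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}  (true effect: 1.0)")
```

In this toy design the instrument is related to each individual's own socioeconomic status, so the area average can absorb only the between-area part of that association. The qualitative pattern depends entirely on these assumed choices and should not be read as a result of the study.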