Modeling Coverage Error in Address Lists Due to Geocoding Error: The Impact on Survey Operations and Sampling
Lee Fiorio
NORC
Jizhou Fu
NORC
Survey research organizations have been researching the use of extracts of the United States Postal Service Delivery Sequence File (DSF) as a replacement for traditional listing. Due to software limitations, individual housing units (HUs) on the DSF are sometimes geocoded incorrectly, which can affect the coverage properties of selected segments. NORC undertook a national listing effort in 2011 to augment the DSF in areas known to have limited coverage, such as rural areas and areas with new construction. We used an enhanced listing method, in which the lister, using a handheld device, verifies and edits the DSF list geocoded to a designated segment. One benefit of enhanced listing is the ability to capture the geographic coordinates of each HU, which provides data for further exploring the nature of DSF coverage. We focus on a selection of rural and urban segments from the national listing effort. For addresses that are on the DSF but were not found by the lister, we use logistic regression to model the likelihood of address-level geocoding error using DSF flags and 2010 census data. We also build an autologistic model (Besag, 1972), which accounts for spatial dependence in the data by incorporating spatial autocorrelation. Results indicate that geocoding errors are spatially dependent, and that the probability of geocoding error is related to address characteristics, such as drop delivery and address type, as well as to rural block characteristics, including geographic area. Our model also demonstrates that low block-level DSF coverage is associated with geocoding error. Understanding the correlates of geocoding error in the DSF will increase listing efficiency and frame quality by making it possible to identify the areas with the most limited DSF coverage, which will require listing for sampling frame construction.
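
For readers unfamiliar with the two model forms named in the abstract, they can be written in a generic textbook form. This is only a sketch under stated assumptions, not the authors' specification. The notation is introduced here for illustration: y_i is an indicator that address i has a geocoding error, x_i is a vector of covariates (for example DSF flags and block-level census measures), N(i) is the set of spatial neighbors of address i, w_ij are neighbor weights, and gamma is the spatial dependence parameter. The abstract does not state how neighborhoods or weights were defined.

```latex
% Ordinary logistic regression: errors are assumed independent given covariates
\operatorname{logit} \Pr(y_i = 1 \mid \mathbf{x}_i)
  = \mathbf{x}_i^{\top} \boldsymbol{\beta}

% Autologistic model: the conditional log-odds for address i also depend
% on the observed outcomes of its spatial neighbors
\operatorname{logit} \Pr(y_i = 1 \mid \mathbf{x}_i,\, y_j,\ j \in N(i))
  = \mathbf{x}_i^{\top} \boldsymbol{\beta}
  + \gamma \sum_{j \in N(i)} w_{ij}\, y_j
```

In this form, a positive gamma means that an address is more likely to be geocoded in error when its neighbors are, which corresponds to the spatial dependence the abstract reports. Setting gamma to zero recovers the ordinary logistic model.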