Online Program

Thursday, February 21
PS1 Poster Session 1 & Opening Mixer Thu, Feb 21, 6:30 PM - 8:00 PM
Napoleon Ballroom

Can Survey Response Propensities Grow on Trees? Comparing the Validity of Random Forests and Logistic Regression Models Using Population Variables Appended to an ABS Sampling Frame

Anh Thu Burks, The Nielsen Company 
*Trent D Buskirk, The Nielsen Company 

Keywords: Random Forests, Response Propensity Models, Address Based Sampling, Logistic Regression, Principal Components, Internal and External Validity

Address based sampling (ABS) gives survey researchers and statisticians a vast array of ancillary information that can be appended to the sampling frame at the block-group level for virtually every sampling unit. Information such as median household income, or the percentage of renters or of householders over 55, can be used a priori as part of the sampling design, or post-sampling to improve survey recruitment processes. In this presentation we report the results of a study evaluating the use of a series of variables, available at both the block-group and ZIP+4 levels from the U.S. Census and commercial sources, to estimate response propensities for a national media diary survey (MDS). The MDS sample consisted of over 650,000 addresses randomly selected from a national ABS sampling frame. The response propensity models were constructed from the entire catalogue of over 100 ancillary variables using both random forests and logistic regression models based on principal components. The internal validity of these models was evaluated using both 10-fold cross-validation and a 25% hold-out test sample. Finally, external validity was estimated by applying these predictive models to a separate, subsequent media diary survey that was also based on a national random sample.
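
The two internal-validity checks named in the abstract, a 25% hold-out test sample and 10-fold cross-validation, can be illustrated with a minimal partitioning sketch. This is not the authors' procedure. The function names `holdout_and_folds` and `cv_splits`, the seed, and the choice to run cross-validation on the 75% of units left after the hold-out is removed are all assumptions made for illustration; the abstract does not say how the two checks were combined.

```python
import random


def holdout_and_folds(n_units, holdout_frac=0.25, n_folds=10, seed=0):
    """Split unit indices into a hold-out test set and k cross-validation folds.

    Hypothetical helper: assumes the folds are formed from the units that
    remain after the hold-out sample is set aside.
    """
    rng = random.Random(seed)
    indices = list(range(n_units))
    rng.shuffle(indices)

    n_holdout = int(round(n_units * holdout_frac))
    holdout = indices[:n_holdout]    # set aside, never used for fitting
    training = indices[n_holdout:]   # used for model fitting and CV

    # Deal the training indices into n_folds groups of near-equal size.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return holdout, folds


def cv_splits(folds):
    """Yield (fit_indices, validate_indices) pairs, one per fold."""
    for k, validate in enumerate(folds):
        fit = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield fit, validate


# Example with 1,000 hypothetical sampled addresses.
holdout, folds = holdout_and_folds(1000)
splits = list(cv_splits(folds))
```

In each of the ten splits a propensity model would be fit on the `fit` indices and scored on the `validate` indices. The hold-out indices would be scored once, with the final model.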