JSM 2016 Online Program

Abstract Details

Activity Number:	349
Type:	Topic Contributed
Date/Time:	Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM
Sponsor:	Survey Research Methods Section
Abstract #321032
Title:	Matchmaker, Data Scientist, or Both? Using Unsupervised Learning Methods for Matching Nonprobability Samples to Probability Sample
Author(s):	Trent Buskirk* and David Dutwin
Companies:	Marketing Systems Group and SSRS
Keywords:	Random Forests ; Similarity Measures ; Nearest Neighbor ; Non-probability Samples ; Sample Matching ; Probability Samples
Abstract:	Research continues to emerge on methods for reducing the impact of self-selection bias on estimates derived from non-probability samples. Sample matching is one common method: it identifies a subset of the non-probability sample whose units are linked to units in a relevant probability sample, using either a distance function measured on key indicators or, more generally, a propensity score, typically derived from a logistic regression model. Analyses and estimates are then produced from the matched subset of the non-probability sample. To date, there has been relatively little research comparing machine learning algorithms that are better suited to a larger number of predictors for generating matches. In this study we explore the use of random forests (RFs) for generating matched samples. The proximity measure comes from an unsupervised version of random forests that is grown on the probability data set and then applied to the non-probability sample to obtain an overall distance matrix. We compare the resulting matched sample to one obtained using a more common proximity measure based on a simple matching coefficient.
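
To make the comparison baseline concrete, the following is a minimal sketch of sample matching with a simple matching coefficient, not the authors' implementation. The function names, the toy records, and the greedy one-to-one matching without replacement are all assumptions made for illustration; the abstract does not say how ties or reuse of non-probability units are handled.

```python
def simple_matching_distance(a, b):
    """Return 1 minus the share of indicators on which records a and b agree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of indicators")
    agreements = sum(x == y for x, y in zip(a, b))
    return 1.0 - agreements / len(a)


def match_nearest(prob_sample, nonprob_sample, distance):
    """Greedy 1:1 matching without replacement (an assumption of this sketch).

    For each probability-sample unit, in order, pick the closest
    non-probability unit that has not been used yet. Ties go to the
    lowest index. Returns {probability index: non-probability index}.
    """
    available = set(range(len(nonprob_sample)))
    matches = {}
    for i, target in enumerate(prob_sample):
        if not available:
            break  # fewer non-probability units than probability units
        best = min(
            sorted(available),
            key=lambda j: distance(target, nonprob_sample[j]),
        )
        matches[i] = best
        available.remove(best)
    return matches


# Hypothetical categorical indicators: (sex, age group, urbanicity).
prob = [("F", "18-34", "urban"), ("M", "55+", "rural")]
nonprob = [
    ("M", "55+", "urban"),
    ("F", "18-34", "urban"),
    ("F", "35-54", "rural"),
    ("M", "55+", "rural"),
]

matched = match_nearest(prob, nonprob, simple_matching_distance)
print(matched)
```

A random forest proximity is commonly defined as the share of trees in which two units fall in the same terminal node. Under that definition, passing each unit's vector of terminal-node identifiers (which would come from a fitted forest, not shown here) to `simple_matching_distance` would give one minus that proximity, so the same matching routine could serve both measures.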

Authors who are presenting talks have a * after their name.
