Activity Number:
|
349
|
Type:
|
Topic Contributed
|
Date/Time:
|
Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM
|
Sponsor:
|
Survey Research Methods Section
|
Abstract #321032
|
Title:
|
Matchmaker, Data Scientist, or Both? Using Unsupervised Learning Methods for Matching Nonprobability Samples to Probability Sample
|
Author(s):
|
Trent Buskirk* and David Dutwin
|
Companies:
|
Marketing Systems Group and SSRS
|
Keywords:
|
Random Forests ;
Similarity Measures ;
Nearest Neighbor ;
Non-probability Samples ;
Sample Matching ;
Probability Samples
|
Abstract:
|
Research continues to emerge on methods for reducing the impact of self-selection bias on estimates derived from non-probability samples. Sample matching is one common method: it identifies a subset of the non-probability sample whose units are linked to units in a relevant probability sample, using either a distance function measured on key indicators or, more generally, a propensity score, typically derived from a logistic regression model. Analyses and estimates are then produced using the matched subset of the non-probability sample. To date, there has been relatively little research comparing machine learning algorithms for generating matches that are better suited to a larger number of predictors. In this study we explore the use of random forests (RFs) for generating matched samples. We use the proximity measure from an unsupervised version of random forests that is grown on the probability data set and then applied to the non-probability sample to obtain an overall distance matrix. We compare the resulting matched sample to one obtained using a more common proximity measure based on a simple matching coefficient.
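As an illustration of the comparison baseline only, the following minimal sketch matches units on a simple matching coefficient, that is, the share of categorical indicators on which two units agree. The abstract does not state how matches are selected, so the greedy nearest-neighbor rule without replacement, the function names, and the toy records are all assumptions made for this example and are not the authors' procedure. A random forest proximity could in principle be substituted for the similarity function, but that step is not shown here.

```python
from typing import List, Sequence, Tuple


def simple_matching_coefficient(a: Sequence, b: Sequence) -> float:
    """Share of categorical indicators on which two units agree (0 to 1)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("units need the same, nonzero number of indicators")
    agreements = sum(1 for x, y in zip(a, b) if x == y)
    return agreements / len(a)


def match_samples(
    prob_units: Sequence[Sequence],
    nonprob_units: Sequence[Sequence],
) -> List[Tuple[int, int, float]]:
    """Greedy nearest-neighbor matching without replacement (an assumed rule).

    For each probability-sample unit, pick the most similar non-probability
    unit that has not been used yet. Ties go to the lowest index.
    Returns (prob_index, nonprob_index, similarity) triples.
    """
    available = set(range(len(nonprob_units)))
    matches = []
    for i, p in enumerate(prob_units):
        if not available:
            break  # no non-probability units left to match
        best = max(
            sorted(available),
            key=lambda j: simple_matching_coefficient(p, nonprob_units[j]),
        )
        matches.append((i, best, simple_matching_coefficient(p, nonprob_units[best])))
        available.remove(best)
    return matches


# Hypothetical toy records: (sex, age group, education)
prob = [("F", "18-34", "college"), ("M", "55+", "hs")]
nonprob = [
    ("M", "55+", "college"),
    ("F", "18-34", "college"),
    ("F", "35-54", "hs"),
    ("M", "55+", "hs"),
]

matches = match_samples(prob, nonprob)
for prob_idx, nonprob_idx, sim in matches:
    print(prob_idx, nonprob_idx, round(sim, 3))
```

The matched subset of the non-probability sample is then the set of second indices in `matches`. Analyses would be run on those units only.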
|
Authors who are presenting talks have a * after their name.
Back to the full JSM 2016 program
|