105 – Social Media Analysis
Using Social Media Data to Predict Survey Responses: A Comparison to Multiple Imputation
Ashley Richards
RTI International
Joe Murphy
RTI International
Darryl Creel
RTI International
Justin Landwehr
RTI International
Social media represents a potential auxiliary data source about survey respondents that can be used when measures of interest are not obtained or are missing. Often, these omissions result from item nonresponse or from survey constraints that limit the information that can be collected. If survey respondents share information related to key survey outcomes in their social media postings and allow researchers access to these data, the survey outcomes can potentially be inferred from the social media data. While such an approach has possible pitfalls (selection bias, social desirability effects, etc.), social media may serve as a valuable source for these missing data. To investigate the validity of Twitter data compared to more traditional methods of deriving values for missing data, we compare values predicted by multiple imputation with values predicted from respondents' Twitter posts using two methods: human coders and a machine algorithm. By randomly selecting cases with non-missing values and masking those values from the analysis, we are able to use the survey data as a gold standard to evaluate the results of the different methods.
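
The masking design described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (mask_and_score, modal_value), the toy responses, and the use of simple agreement with the true survey value as the score are all assumptions made for illustration. The modal-value predictor is only a placeholder standing in for any prediction method (multiple imputation, human coding, or a machine algorithm).

```python
import random


def mask_and_score(responses, predictors, n_mask, seed=0):
    """Mask known survey values at random, then score each prediction method.

    responses:  dict of case_id -> survey value (None if truly missing).
    predictors: dict of method name -> function(case_id, visible) -> prediction.
    Returns the set of masked case ids and each method's share of masked
    cases where the prediction matches the true survey value.
    """
    observed = sorted(cid for cid, v in responses.items() if v is not None)
    if not 1 <= n_mask <= len(observed):
        raise ValueError("n_mask must be between 1 and the number of observed cases")
    rng = random.Random(seed)
    masked = set(rng.sample(observed, n_mask))
    # What the prediction methods are allowed to see: masked cases look missing.
    visible = {cid: (None if cid in masked else v) for cid, v in responses.items()}
    scores = {}
    for name, predict in predictors.items():
        hits = sum(predict(cid, visible) == responses[cid] for cid in masked)
        scores[name] = hits / len(masked)
    return masked, scores


def modal_value(case_id, visible):
    """Placeholder predictor (hypothetical): the most common visible value."""
    values = [v for v in visible.values() if v is not None]
    return max(sorted(set(values)), key=values.count)


# Toy data (invented for illustration): case 5 is truly missing.
responses = {1: "yes", 2: "yes", 3: "no", 4: "yes", 5: None, 6: "yes"}
masked, scores = mask_and_score(responses, {"modal": modal_value}, n_mask=2)
```

Because only cases with known survey values are masked, every prediction can be checked against the true response, which is what allows the survey data to serve as the gold standard.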