Abstract:
|
Representativeness is one of the major barriers to drawing valid inferences from big data sources such as Twitter, where little is known about individual users. This proposal presents a multi-phase survey strategy for generalizing inferences drawn from non-representative big data sources, and we illustrate its planned implementation with Twitter data. First, the research team will draw a small sample of Twitter users and administer to them a comprehensive survey designed to measure a wide range of characteristics. The team will also administer the same survey to a probability sample representative of all US adults. Next, we will collect and analyze tweets from a much larger group of users (the Twitter universe). The smaller Twitter survey then acts as a bridge between the overall US population, represented by the probability sample, and the Twitter population, represented by the Twitter universe. We propose weighting methods that use this bridge to adjust the Twitter universe so that it better reflects the US adult population. Finally, we will perform sentiment analyses on tweets from users in the Twitter universe, yielding a method for gauging public opinion in real time via Twitter.
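As a rough illustration of how a bridge sample could feed a weighted opinion estimate, here is a minimal Python sketch. It uses simple cell-based post-stratification, which is a stand-in chosen for brevity and not necessarily the weighting method the proposal will use. It assumes each Twitter user can be assigned to a demographic cell (here, a hypothetical age group) that is measured in both surveys. The function names, the toy sentiment lexicon, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon; a real study would use a validated sentiment model.
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}


def cell_shares(cells, weights=None):
    """Return the (optionally weighted) share of units falling in each cell."""
    if weights is None:
        weights = [1.0] * len(cells)
    totals = Counter()
    for cell, w in zip(cells, weights):
        totals[cell] += w
    grand = sum(totals.values())
    return {cell: t / grand for cell, t in totals.items()}


def poststrat_weights(target_cells, target_weights, sample_cells):
    """Weight each sample unit by (population share / sample share) of its cell.

    The population shares come from the weighted probability sample; the
    sample shares come from the Twitter sample being adjusted.
    """
    target = cell_shares(target_cells, target_weights)
    sample = cell_shares(sample_cells)
    return [target.get(c, 0.0) / sample[c] for c in sample_cells]


def tweet_sentiment(text):
    """Average lexicon score of matched tokens; 0.0 if no token matches."""
    scores = [LEXICON[t] for t in text.lower().split() if t in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical probability sample: one age-group cell and design weight per respondent.
prob_cells = ["18-29", "30-49", "50+", "50+"]
prob_weights = [1.0, 1.0, 1.0, 1.0]

# Hypothetical Twitter users (younger than the population) and one tweet each.
twitter_cells = ["18-29", "18-29", "30-49", "50+"]
tweets = ["great day", "bad service", "good news", "awful traffic"]

weights = poststrat_weights(prob_cells, prob_weights, twitter_cells)
sentiments = [tweet_sentiment(t) for t in tweets]
unweighted = sum(sentiments) / len(sentiments)
estimate = weighted_mean(sentiments, weights)

print(weights)      # [0.5, 0.5, 1.0, 2.0]
print(unweighted)   # 0.0
print(estimate)     # -0.25
```

In this toy data, younger users are over-represented on Twitter, so their tweets are down-weighted and the older user's negative tweet is up-weighted, shifting the estimate from 0.0 to -0.25. Cell-based post-stratification was chosen only because it is the simplest weighting scheme to show; with many covariates, approaches such as raking or propensity-score pseudo-weights are common alternatives.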
|