Abstract:
|
This paper evaluates a new method for drawing a probability sample from Twitter to target a hard-to-reach population. Social media platforms such as Twitter offer researchers a wealth of information for identifying and targeting hard-to-reach populations; however, samples drawn from them are generally convenience-based and likely unrepresentative. Our methodology has two steps. First, we used the Twitter application programming interface (API) to draw a random subset of Twitter users. Second, using user keywords and public tweets, we developed an algorithm that stratifies sample members by their likelihood of belonging to the subdomain of interest. We present the results of a pilot test to estimate the prevalence of attempted suicide and suicidal ideation among youth, with an oversample of LGBTQ youth. Our analysis examines (1) the efficiency of the stratification algorithm and whether it yields a meaningful increase in LGBTQ respondents relative to traditional probability-based samples, and (2) whether the data can be weighted accurately enough to minimize bias, assessed through comparisons with the Youth Risk Behavior Surveillance System (YRBSS).
|
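To make the second step concrete, the sketch below shows one generic way to stratify a random set of users by a keyword-based likelihood score, sample each stratum at a different rate (oversampling the high-likelihood stratum), and attach base design weights equal to the inverse of each user's selection probability. It is an illustration under stated assumptions, not the authors' algorithm: the keyword list, the `User` type, the count-based score, the two-stratum threshold, and the sampling rates are all hypothetical.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical keyword list; the study's actual keywords and scoring rules are not specified here.
KEYWORDS = {"lgbtq", "gay", "lesbian", "bisexual", "trans", "queer", "pride"}


@dataclass
class User:
    user_id: int
    text: str  # hypothetical: user keywords and public tweets, concatenated


def keyword_score(user: User) -> int:
    """Count distinct hypothetical keywords in the user's text (letters only, case-insensitive)."""
    tokens = set(re.findall(r"[a-z]+", user.text.lower()))
    return len(tokens & KEYWORDS)


def assign_stratum(user: User, threshold: int = 1) -> str:
    """Place a user in the 'high' likelihood stratum if the score meets the threshold."""
    return "high" if keyword_score(user) >= threshold else "low"


def stratified_sample(users, rates, seed=0):
    """Bernoulli-sample each user at their stratum's rate.

    Returns (user_id, stratum, base_weight) tuples, where base_weight is
    1 / selection probability, so oversampled users count for less.
    """
    rng = random.Random(seed)
    sample = []
    for user in users:
        stratum = assign_stratum(user)
        p = rates[stratum]
        if rng.random() < p:  # random() is in [0, 1), so p = 1.0 always selects
            sample.append((user.user_id, stratum, 1.0 / p))
    return sample
```

For example, `stratified_sample(users, {"high": 1.0, "low": 0.5})` keeps every high-likelihood user with weight 1.0 and about half of the others with weight 2.0. Those base weights would then typically be adjusted further (for nonresponse, or calibrated to external benchmarks) before estimates are compared with a reference survey.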