Abstract:
|
Nowcasting based on social media and other naturally occurring data holds great promise for tracking social and economic phenomena. A nowcasting model that takes social media data as input can potentially track phenomena at higher frequency, at lower cost, and with greater timeliness than traditional methods. The Google Flu Trends experience, however, raises the concern that good initial tracking may be followed by persistent divergence from ground truth. Our experience with a 5-year project using Tweets to track initial unemployment claims was similar: the nowcaster showed initial success, followed by rapid and long-lasting deterioration. The nowcasters' instability reflects the challenge of relating social media data generated over relatively short time spans to social phenomena that play out over much longer periods. It is easy to think that, with social media data, data sparsity is a thing of the past. Yet, given the short time span for which social media data are available, a nowcaster may draw on only a limited portion of the target phenomenon's time span. This paper suggests an approach that is robust to features of ground truth that may be latent in one period but later emerge to create instability.
|