Abstract Details

Activity Number: 533 - SLDS CPapers NEW 2
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329756 Presentation
Title: Big Data, Google, and Infectious Disease Prediction: a Statistical Perspective
Author(s): Shihao Yang* and S. C. Kou and Mauricio Santillana
Companies: and Harvard University and Harvard University
Keywords: digital disease detection; spatial-temporal modeling; lasso

Big data generated from the internet have great potential for tracking and predicting large-scale social activity, in particular the spread of infectious diseases, where accurate real-time prediction could help public health officials make timely decisions to save lives.

We introduce ARGO (AutoRegression with GOogle search data / AutoRegression with General Online data), a model that has successfully used publicly available Google search data, with or without cloud-based electronic health records, to estimate current and near-future activity levels of influenza-like illness and/or dengue fever for the United States and five other countries around the globe.

Our regularized multivariate regression model dynamically selects the most appropriate variables for prediction every week, and significantly outperforms all previous internet-based tracking models, including Google Flu Trends and Google Dengue Trends.
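
As a rough illustration of weekly variable selection with a regularized regression, the sketch below refits an L1-penalized (lasso) autoregression on a rolling window and reports which predictors receive nonzero weight each week. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract lists "lasso" only as a keyword, and the function names, the 4 autoregressive lags, the 52-week training window, and the penalty value lam are all hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t (the lasso proximal step)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant (all-zero) column carries no signal
            # Residual with feature j's current contribution added back.
            r = y - intercept - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
        intercept = (y - X @ beta).mean()
    return intercept, beta


def build_design(activity, search, n_lags):
    """Row for week t: [activity[t-1], ..., activity[t-n_lags], search[t, :]]."""
    rows = []
    for t in range(n_lags, len(activity)):
        lags = activity[t - n_lags:t][::-1]
        rows.append(np.concatenate([lags, search[t]]))
    return np.array(rows), activity[n_lags:]


def rolling_nowcast(activity, search, n_lags=4, window=52, lam=0.05):
    """Refit the lasso every week on the most recent `window` weeks.

    Returns the one-week estimates and, for each week, the indices of
    the predictors that the lasso kept (nonzero coefficients).
    """
    X, y = build_design(activity, search, n_lags)
    preds, selected = [], []
    for i in range(window, len(y)):
        X_tr, y_tr = X[i - window:i], y[i - window:i]
        # Standardize with training-window statistics only.
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0] = 1.0
        b0, beta = lasso_cd((X_tr - mu) / sd, y_tr, lam)
        preds.append(b0 + ((X[i] - mu) / sd) @ beta)
        selected.append(np.flatnonzero(beta))
    return np.array(preds), selected
```

Here `activity` stands for a weekly disease-activity series and `search` for a weeks-by-terms matrix of search-query volumes. Because the model is refit each week, the set of selected predictors can change as search behavior drifts, which is the kind of dynamic selection the paragraph above describes.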

We further extend the model to multiple geographical resolutions, tracking infectious disease not only at the national level but also at the regional level, with spatial-temporal information pooling, making the model flexible, self-correcting, robust, and scalable.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program