Abstract:
|
Since April 2020, the Delphi group has produced COVIDcast: a massive, continuously updated database of COVID-19 indicators across the United States. The data features numerous unique signals, such as behaviors reported by Facebook users in the largest public health survey ever conducted and symptom indicators extracted from aggregated medical claims data. This data is available via a public API and is widely used by forecasters and researchers. In this talk, I'll discuss the statistical computing challenges this work posed: delivering timely, accurate, and detailed data on COVID-19 at a large scale, and performing reliable statistical estimation on a daily basis. Developing fast, accurate, and reliable statistical software requires more than just the right algorithms; the statistician must learn to think like a software engineer and to focus on delivering products, not simply analyses. I'll use examples of our own mistakes to illustrate how designing statistical products requires different skills from designing statistical analyses.
|