All Times EDT

Abstract Details

Activity Number: 341 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Tuesday, August 9, 2022, 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322056
Title: Applying Machine Learning to National Surveillance Data to Predict Excess Growth in Clusters of Tuberculosis Cases
Author(s): Kathryn Winglee*, Sandy Althomsons, Charles M Heilig, Sarah Talarico, Benjamin Silk, Jonathan Wortham, Andrew Hill, and Thomas Navin
Companies: Centers for Disease Control and Prevention (all authors)
Keywords: tuberculosis; machine learning; epidemiology; transmission
Abstract:

Early identification of clusters of persons with tuberculosis (TB) that are likely to grow creates opportunities to prevent further spread of TB. We applied machine learning algorithms to U.S. national surveillance data to predict which clusters of genotype-matched TB cases are likely to have excess growth within a 1-year follow-up period, where excess growth was defined using a negative binomial hurdle model. The algorithms included tree-based ensembles, support vector machines, and regularized regression. We used the Youden index to select the best model, which generalized to a validation dataset. Feature importance and accumulated local effects plots indicated that characteristics of the clusters themselves were more important predictors than the social, demographic, and clinical characteristics of the patients in those clusters. The most important predictor was the time between cases before unexpected cluster growth was identified: shorter intervals increased the predicted score for excess growth. These results add to existing tools that help prioritize clusters for public health intervention. Considering an entire cluster, not just individual patients, may help interrupt ongoing transmission.
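The Youden index used for model selection is J = sensitivity + specificity - 1, which ranges from 0 (no better than chance) to 1 (perfect classification). The abstract does not say how cutoffs were chosen, so the sketch below assumes each model's J is taken at the score cutoff that maximizes it, and the model with the largest maximal J is selected. The function names, model names, and scores are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Optional, Sequence, Tuple


def youden_index(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Youden's J = sensitivity + specificity - 1 for binary labels (1 = excess growth)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Try each observed score as a cutoff (score >= cutoff -> predicted positive).

    Returns (J, cutoff) for the cutoff with the largest J; ties keep the lowest cutoff.
    """
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        y_pred = [1 if s >= cut else 0 for s in scores]
        j = youden_index(y_true, y_pred)
        if j > best_j:
            best_j, best_cut = j, cut
    return best_j, best_cut


def select_model(y_true: Sequence[int], model_scores: Dict[str, Sequence[float]]) -> str:
    """Hypothetical selection rule: the model whose best-cutoff Youden index is highest."""
    return max(model_scores, key=lambda name: best_threshold(y_true, model_scores[name])[0])


if __name__ == "__main__":
    # Hypothetical labels: 1 = cluster had excess growth during follow-up.
    labels = [0, 0, 1, 1, 0, 1]
    scores = {
        "model_a": [0.1, 0.4, 0.35, 0.8, 0.2, 0.9],
        "model_b": [0.1, 0.2, 0.7, 0.8, 0.3, 0.9],
    }
    for name, s in scores.items():
        print(name, best_threshold(labels, s))
    print("selected:", select_model(labels, scores))  # selected: model_b
```

Here model_b separates the two classes perfectly at a cutoff of 0.7 (J = 1), so it is preferred over model_a (maximal J of about 0.67). Because J weights sensitivity and specificity equally, it is one reasonable choice when missing a growing cluster and flagging a stable one are considered comparably costly.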


An asterisk (*) after a name marks the presenting author.

Back to the full JSM 2022 program