Abstract:
Early identification of clusters of persons with tuberculosis (TB) that are likely to grow creates opportunities to prevent further spread of TB. We applied machine learning algorithms to U.S. surveillance data to predict which clusters of genotype-matched TB cases are likely to have excess growth within a 1-year follow-up period, with excess growth defined by a negative binomial hurdle model. These algorithms included tree-based ensembles, support vector machines, and regularized regression. The Youden index was used to select the best model, which generalized to a validation dataset. Feature importance and accumulated local effects plots indicated that cluster characteristics were more important predictors than the social, demographic, and clinical characteristics of the patients in those clusters. The most important predictor was the time between cases before unexpected cluster growth was identified: shorter intervals increased the prediction score for excess growth. These results complement existing tools for prioritizing clusters for public health intervention. Considering an entire cluster, not just individual patients, may help interrupt ongoing transmission.
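The abstract states that the Youden index (J = sensitivity + specificity - 1) was used to pick the best model but does not give the exact procedure. The sketch below is an assumption, not the authors' method: each candidate model's J is taken at its best score threshold on a labeled set, and the model with the highest J is selected. The function names, model names, and toy data are hypothetical and for illustration only.

```python
from typing import Dict, Sequence, Tuple


def youden_index(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Youden's J = sensitivity + specificity - 1 when predicting 1 for score >= threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present to compute J")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_youden(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Maximum J over candidate thresholds (here: every distinct observed score)."""
    best_j, best_t = -1.0, float("nan")
    for t in sorted(set(scores)):
        j = youden_index(y_true, scores, t)
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_j, best_t = j, t
    return best_j, best_t


def select_model(
    y_true: Sequence[int], model_scores: Dict[str, Sequence[float]]
) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Return the name of the model with the highest maximum J, plus per-model (J, threshold)."""
    results = {name: best_youden(y_true, s) for name, s in model_scores.items()}
    best = max(results, key=lambda name: results[name][0])
    return best, results


# Hypothetical toy data: 1 = cluster with excess growth, 0 = no excess growth.
y = [0, 0, 1, 1]
scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],
    "model_b": [0.2, 0.1, 0.7, 0.9],
}
best, results = select_model(y, scores)
print(best, results)
```

On this toy data, `model_b` separates the two classes perfectly at a threshold of 0.7 (J = 1.0), while `model_a` reaches at most J = 0.5, so `model_b` is selected. J weights sensitivity and specificity equally, which is one reasonable choice when missing a growing cluster and flagging a stable one are treated as comparably costly.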