Abstract Details

Activity Number: 68 - Government Health Statistics
Type: Contributed
Date/Time: Sunday, July 30, 2017 : 4:00 PM to 5:50 PM
Sponsor: Government Statistics Section
Abstract #323842
Title: Latent Dirichlet Allocation Topic Models Applied to the Centers for Disease Control and Prevention's Grant Portfolio
Author(s): Matthew Eblen* and Robin Wagner
Companies: Centers for Disease Control and Prevention and Centers for Disease Control and Prevention
Keywords: Public Health ; Natural Language Processing ; Machine Learning ; Topic Models ; Latent Dirichlet Allocation
Abstract:

In fiscal year 2015, the Centers for Disease Control and Prevention (CDC) administered over $5 billion in grants to institutions across the United States and the world. Each grant was administered by one of CDC's 13 Centers, Institutes or Offices (CIOs), each of which is responsible for different areas of public health. The scope and content of those grants varied widely. This paper explores the use of natural language processing and machine learning to uncover common themes, or topics, in the content of those grants. Specifically, a Latent Dirichlet Allocation (LDA) topic model was applied to a corpus of CDC grant abstracts, resulting in topical word clusters that categorized CDC's recent investments in public health. The LDA topic estimates for grants aligned well with the subject areas of their respective CIOs, but also showed significant areas of overlap and mutual interest, as multiple CIOs administered grants with similar topical content. Trends in funding for the different topics were also examined, identifying public health areas that have received increased attention and investment over time.
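
For readers unfamiliar with LDA, the following is a minimal sketch of how a topic model of this kind can be fit to a collection of tokenized abstracts. It is not the authors' implementation: the abstract does not state which software, sampler, number of topics, or priors were used. The collapsed Gibbs sampler, the function names (fit_lda, top_words), the hyperparameters (alpha, beta), and the toy documents below are all assumptions made purely for illustration.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=100, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not the authors' code).

    docs is a list of token lists. Returns (vocab, theta, phi), where
    theta[d][k] is the estimated share of topic k in document d and
    phi[k][v] is the estimated probability of vocab[v] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]      # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
         for v in range(V)]
        for k in range(num_topics)
    ]
    return vocab, theta, phi


def top_words(vocab, phi, k, n=3):
    """Return the n most probable words for topic k."""
    order = sorted(range(len(vocab)), key=lambda v: phi[k][v], reverse=True)
    return [vocab[v] for v in order[:n]]


# Hypothetical toy "abstracts", invented for illustration only.
TOY_DOCS = [
    "vaccine immunization coverage vaccine children".split(),
    "immunization vaccine schedule coverage".split(),
    "injury prevention violence injury community".split(),
    "violence prevention community injury".split(),
]

vocab, theta, phi = fit_lda(TOY_DOCS, num_topics=2)
for k in range(2):
    print("topic", k, top_words(vocab, phi, k))
```

Each row of theta is a document's estimated topic mixture. Averaging those rows within an administering unit, or weighting them by award amount per year, is one plausible way to produce the kind of cross-CIO comparison and funding-trend summaries the abstract describes, though the abstract does not say how those summaries were computed.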


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association