Abstract:
In fiscal year 2015, the Centers for Disease Control and Prevention (CDC) administered over $5 billion in grants to institutions across the United States and around the world. Each grant was administered by one of CDC's 13 Centers, Institutes, or Offices (CIOs), each responsible for a different area of public health. The scope and content of these grants varied widely. This paper explores the use of natural language processing and machine learning to uncover common themes, or topics, in the content of those grants. Specifically, a Latent Dirichlet Allocation (LDA) topic model was applied to a corpus of CDC grant abstracts, producing clusters of topically related words that categorize CDC's recent investments in public health. The LDA topic estimates for individual grants aligned well with the subject areas of their respective CIOs, but they also revealed substantial overlap and shared interest, as multiple CIOs administered grants with similar topical content. Trends in funding for the different topics were also examined, identifying public health areas that have received increasing attention and investment over time.
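The abstract does not say which LDA implementation or inference method was used. As a rough illustration only, the sketch below fits an LDA model with collapsed Gibbs sampling, one standard inference method (the study may well have used another, such as variational Bayes in an off-the-shelf library). It returns per-document topic proportions, analogous to the per-grant topic estimates described above, and per-topic word distributions, from which word clusters can be read off. The function name `lda_gibbs`, the hyperparameters `alpha=0.1` and `beta=0.01`, and the toy token lists are hypothetical and not taken from the study.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical sketch: LDA fitted by collapsed Gibbs sampling.

    docs: list of token lists (e.g. tokenized grant abstracts).
    Returns (doc_topic, topic_word): doc_topic[d][k] is the estimated share of
    topic k in document d; topic_word[k][w] is the estimated probability of
    word w under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]               # tokens in doc d assigned to topic k
    n_kw = [defaultdict(int) for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                                # total tokens assigned to topic k
    z = []                                       # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled assignment.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final counts.
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(K)
    ]
    return doc_topic, topic_word


# Hypothetical toy "abstracts", already tokenized; not real CDC grant text.
docs = [
    "tobacco smoking cessation quitline tobacco".split(),
    "smoking tobacco youth prevention cessation".split(),
    "hiv testing prevention clinic hiv".split(),
    "hiv clinic testing outreach prevention".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Real grant abstracts would first need preprocessing (lowercasing, stop-word removal, and so on), and the number of topics is a modeling choice the abstract does not report.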