Online Program

Saturday, May 19
Applications
Scientific and Financial Modeling
Sat, May 19, 10:30 AM - 12:00 PM
Lake Fairfax A
 

Anomaly Detection in News Articles for Biosurveillance (304617)

Lauren Charles, Pacific Northwest National Laboratory 
*Karl Pazdernik, Pacific Northwest National Laboratory 

Keywords: Anomaly, biosurveillance, cosine distance, natural language processing, semi-supervised learning, term frequency-inverse document frequency

Disease can cripple a population, particularly when it is highly contagious or deadly. Given today’s high rate of international traffic, maintaining a healthy and safe ecosystem requires a global approach to monitoring biological threats. Biosurveillance analysts have long monitored the state of world health through news articles on the internet; however, the vast corpus of available documents makes a thorough review of each costly and often impossible. Recent research has focused on the development of automated systems to retrieve documents and even identify active public health events; however, this work is not designed to detect the anomalous behavior that often exists within an event timeline. Effective biosurveillance also requires identifying anomalous behavior, such as the geographic spread of a disease or a change in disease transmission. Thus, in this work, we present natural language processing algorithms that couple term frequency-inverse document frequency (tf-idf), cosine distance, and a variety of unsupervised and semi-supervised techniques to detect anomalous behavior in online health-related documents in a streaming fashion. We illustrate the effects of structuring document data in varying dimensions and describe how the algorithm can be tuned to balance the rates of false positive and false negative anomaly detections.
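
As a rough illustration of how tf-idf weighting and cosine distance might be combined to score documents arriving in a stream, the sketch below flags a new document when its nearest previously seen document is farther than a threshold. This is not the authors' algorithm: the nearest-neighbor scoring rule, the smoothed idf formula, the function names, the sample sentences, and the threshold value are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency estimated from a baseline corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    unseen_idf = math.log(1 + n) + 1.0  # weight for terms absent from the baseline
    return idf, unseen_idf


def tfidf(doc, idf, unseen_idf):
    # Sparse tf-idf vector stored as {term: weight}.
    tf = Counter(tokenize(doc))
    return {term: count * idf.get(term, unseen_idf) for term, count in tf.items()}


def cosine_distance(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def score_stream(baseline, stream, threshold):
    # Hypothetical rule: a document is anomalous when its cosine distance to
    # the closest earlier document exceeds `threshold`.
    idf, unseen_idf = fit_idf(baseline)
    seen = [tfidf(doc, idf, unseen_idf) for doc in baseline]
    results = []
    for doc in stream:
        vec = tfidf(doc, idf, unseen_idf)
        score = min(cosine_distance(vec, earlier) for earlier in seen)
        results.append((score, score > threshold))
        seen.append(vec)  # the stream grows the reference set
    return results


# Made-up example sentences, not data from the talk.
baseline = [
    "flu cases reported in the city hospital",
    "hospital reports more flu cases in the city",
    "city flu cases remain stable",
]
stream = [
    "flu cases reported in the city",
    "cholera outbreak spreads across neighboring river villages",
]
results = score_stream(baseline, stream, threshold=0.5)
```

In this sketch the threshold is the tuning knob: raising it flags fewer documents (fewer false positives, more false negatives), and lowering it does the opposite.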