Online Program

Thursday, May 17
Applications
CyberLanguage: Applications of Natural Language Processing to CyberSecurity
Thu, May 17, 3:30 PM - 5:00 PM
Lake Fairfax A
 

Time Series Pattern Mining and Visualization Using Statistical Language Processing Techniques (304734)

*Jessica Lin, George Mason University 

Massive amounts of data are generated every day. As a result, the world faces unprecedented challenges and opportunities in managing this ever-growing data, much of which takes the form of time series. Time series data mining has thus attracted an enormous amount of attention over the past two decades. In recent work, we showed that there is tremendous promise in importing ideas, representations, and algorithms from statistical language processing (SLP) into knowledge discovery in time series. Specifically, fast algorithms for learning context-free grammars can expose hierarchical structure in time series, which enables efficient discovery of variable-length patterns and facilitates human understanding of that structure. We proposed several grammar-based algorithms for efficiently discovering co-existing variable-length frequent patterns (motifs) and rare patterns (anomalies) without any prior knowledge of their length, shape, or minimal occurrence frequency. We present GrammarViz, an interactive tool for grammar-driven mining and visualization of variable-length time series patterns.
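
To make the grammar idea in the abstract concrete, the following is a minimal sketch and not the GrammarViz implementation. It assumes a simple equal-width binning step to turn values into letters, and it swaps in a Re-Pair-style pair replacement as the grammar learner, since the abstract does not name the specific algorithms used. The function names discretize, induce_grammar, and expand are hypothetical and chosen only for illustration.

```python
from collections import Counter


def discretize(series, n_bins=3):
    """Map each value to a letter using equal-width bins.

    Assumption: a simplified stand-in for whatever symbolic
    representation the actual tool uses.
    """
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero on flat input
    return [chr(ord("a") + min(int((x - lo) / width), n_bins - 1))
            for x in series]


def induce_grammar(symbols):
    """Re-Pair-style induction (an assumed, illustrative learner).

    Repeatedly replaces the most frequent adjacent pair with a new
    rule until no pair occurs at least twice. Overlapping pairs are
    counted naively, which is good enough for a sketch.
    """
    seq = list(symbols)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a rule name back into its terminal letters."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


# A toy periodic series: the repeating shape should surface as a rule.
series = [0, 1, 2, 1] * 4
symbols = discretize(series)
start, rules = induce_grammar(symbols)

for name in rules:
    # Each rule's expansion is a candidate variable-length pattern.
    print(name, "->", "".join(expand(name, rules)))
```

In this sketch, rules that expand to long, frequently reused substrings play the role of motif candidates, while stretches of the series that no rule covers would be natural anomaly candidates. No pattern length is supplied anywhere, which is the property the abstract emphasizes.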