Online Program

Saturday, May 19
Data Science
Time-based Models
Sat, May 19, 8:30 AM - 10:00 AM
Lake Fairfax B

Detection of Excessive Activities in Time Series of Graphs Using Scan Statistics (304576)

*Suchismita Goswami, Computational Data Science, George Mason University 
Edward J. Wegman, Computational Data Science, George Mason University 

Keywords: Time Series, Dynamic Network, Scan Statistics, Inter-Organizational emails

Considerable emphasis has recently been placed on detecting excessive activities in social networks, particularly in dynamic networks obtained from inter-organizational emails. An anomaly occurs as a result of a sudden increase in email interaction between groups of senders. Here we employ temporal scan statistics, defined as the maximum of local statistics estimated over local regions of the data, to detect anomalies in time series of graphs constructed from organizational emails, based on Poisson process and binomial models. We use the likelihood ratio as the test statistic and a graph invariant, such as betweenness, as the locality statistic. Initially, we apply scan statistics to temporal count data to locate a primary and a secondary cluster. A primary cluster is identified, with a log likelihood ratio of 51.3 estimated under the Poisson model and a p-value of 0.001 obtained from Monte Carlo simulation. Temporal networks are then constructed for the 28 time steps around the observed primary cluster, and the vertex with the maximum betweenness is selected for k-th order neighborhoods, k = 1, 2, and 3. Analysis of the dynamic network using the binomial model indicates significantly excessive communication, which coincides with the primary cluster obtained from the temporal analysis of the count data. The methodologies developed here can be applied to other dynamic networks to predict excessive activities. In this presentation, we will discuss the theory of scan statistics and the procedure in detail.
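The temporal scan on count data described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a uniform baseline rate across time steps, a Kulldorff-style Poisson log likelihood ratio, and a null distribution simulated by redistributing the total count uniformly over time. The function names, the `max_width` parameter, and the toy counts are all hypothetical.

```python
import math
import random


def poisson_llr(c_in, mu_in, total):
    """Poisson log likelihood ratio for one window (assumed Kulldorff form)."""
    if c_in <= mu_in:
        return 0.0  # only an excess over the expected count is of interest
    llr = c_in * math.log(c_in / mu_in)
    c_out, mu_out = total - c_in, total - mu_in
    if c_out > 0:
        llr += c_out * math.log(c_out / mu_out)
    return llr


def temporal_scan(counts, max_width):
    """Return (max LLR, start, end) over windows [start, end) of at most max_width steps."""
    n, total = len(counts), sum(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)  # prefix sums give window counts in O(1)
    best = (0.0, 0, 0)
    for start in range(n):
        for end in range(start + 1, min(start + max_width, n) + 1):
            c_in = prefix[end] - prefix[start]
            mu_in = total * (end - start) / n  # uniform baseline (assumption)
            llr = poisson_llr(c_in, mu_in, total)
            if llr > best[0]:
                best = (llr, start, end)
    return best


def monte_carlo_p_value(counts, max_width, n_sim=999, seed=0):
    """Rank the observed scan statistic among statistics simulated under the null."""
    rng = random.Random(seed)
    n, total = len(counts), sum(counts)
    observed = temporal_scan(counts, max_width)[0]
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * n
        for t in rng.choices(range(n), k=total):  # spread events uniformly in time
            sim[t] += 1
        if temporal_scan(sim, max_width)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical toy data: 28 time steps with a burst at steps 10 to 12.
toy_counts = [5] * 28
toy_counts[10:13] = [20, 20, 20]

llr, start, end = temporal_scan(toy_counts, max_width=7)
p_value = monte_carlo_p_value(toy_counts, max_width=7, n_sim=99)
print(f"primary cluster: steps {start}..{end - 1}, LLR = {llr:.2f}, p = {p_value:.3f}")
```

In this sketch the window with the largest log likelihood ratio plays the role of the primary cluster, and a secondary cluster could be obtained by repeating the scan over windows that do not overlap it. The Monte Carlo p-value cannot fall below 1/(n_sim + 1), so 999 replications give a floor of 0.001.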