Abstract:
|
In change-point detection, a nonparametric framework based on scan statistics over graphs that represent similarities among observations is gaining attention because of its flexibility and good performance on high-dimensional and non-Euclidean data sequences, which are common in the big data era. However, this graph-based framework runs into problems when the sequence contains repeated observations, which often happens for discrete data such as some network data. In this work, we extend the graph-based framework to handle repeated observations. We consider both the single change-point and the changed-interval alternatives, and derive analytic formulas to control the type I error of the extended methods, making them fast to apply to large data sets.
|