Abstract:
|
We present an approach to clustering time series data using a model-based generalization of the K-means algorithm. We start with an AR(p) clustering example and show how the clustering algorithm can be made robust to outliers using a least-absolute-deviations criterion. We then extend the algorithm to cluster ARMA(p,q) models and, further, ARIMA(p,d,q) models. We prove convergence of the algorithm and discuss model appropriateness for the fitted clusters using a generalization of the Ljung-Box test. We perform experiments with simulated data to show how the algorithm can be used for outlier detection and for detecting distributional drift, and we discuss the impact of the initialization method on empty clusters.
|