Abstract:
|
Exploratory data analysis is critical for discovering hidden insights. The rapidly growing size and complexity of data make it challenging to generate in-depth, novel, and interpretable insights. In this paper, we propose a systematic framework for in-depth data exploration that detects novel insights in large-scale time series data. First, the proposed method uses a robust Principal Component Analysis based approach to separate normal variation from novel, anomalous patterns. This approach defines novel patterns through a mathematical formulation and provides provable theoretical guidance. In addition, we perform bi-clustering with sparse regularization on the extracted novel variation components. The number of detected insights is data-driven, with a tuning knob that human analysts can adjust. The sparsity constraints on rows and columns also help to clearly isolate the novel patterns, making it easy to generate contextual information in the interpretation module. Applications to real data demonstrate the effectiveness of our method in distilling insights.
|