Abstract:
|
We propose a new two-stage procedure for detecting multiple outliers when the dimension of the data is much larger than the available sample size. In the first stage, the observations are split into two sets: one containing observations that are surely non-outliers, and the other containing the rest, which are candidate outliers. In the second stage, a series of hypothesis tests is carried out to test the abnormality of each candidate outlier. A nonparametric test based on uniform random rotations in Stiefel manifolds is proposed for the hypothesis testing. The power of the proposed test is studied under a high-dimensional asymptotic framework, and its finite-sample exactness is established under mild conditions. Empirical studies based on simulated examples and face recognition data suggest that the proposed approach is superior to existing methods, especially with respect to false identification of outliers.
|