Abstract:
|
Determining the number of components that explain most of the variation in the data is a fundamental problem in principal component analysis (PCA). In the (generalized) spiked model, the eigenvalues of the population covariance matrix are of two types: a small, fixed number of spikes and the remaining non-spikes. The spikes can be regarded as the eigenvalues corresponding to the directions that explain most of the variation in the data, and the non-spikes as the eigenvalues corresponding to the noise directions. In this paper, we propose a methodology for detecting the number of sample eigenvalues that correspond to the spikes. The methodology is based on the asymptotic behavior of the sample eigenvalues under the generalized spiked model when the dimension and the sample size both grow to infinity with their ratio converging to a positive constant. We allow for more than one source of noise, and we also consider the case where the noise comes from certain special distributions.
|