Abstract:
|
Selecting the number of components in PCA and factor analysis is a key problem facing practitioners. One of the most popular methods is a permutation approach that randomly permutes the entries within each feature and retains the components whose singular values are large compared to those of the permuted data. This method, also known as parallel analysis, is recommended in many textbooks and review papers, and has been used in genomics by leading applied statisticians, including T Hastie, M Stephens, J Storey, R Tibshirani and WH Wong. However, it is poorly understood. In this talk, we develop a theoretical understanding of the method and propose improvements.
|
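For readers unfamiliar with the permutation approach described in the abstract, the following is a minimal sketch of one common variant of parallel analysis. It is an illustration under stated assumptions, not the speaker's implementation or the improved method proposed in the talk. The function name `parallel_analysis`, the parameters `n_perm` and `quantile`, the column centering, and the sequential stopping rule are all choices made here for illustration.

```python
import numpy as np


def parallel_analysis(X, n_perm=20, quantile=0.95, seed=0):
    """Illustrative permutation-based selection of the number of components.

    Assumptions (not taken from the abstract): columns are centered, the
    k-th observed singular value is compared with the chosen quantile of
    the k-th singular values of the permuted data, and selection stops at
    the first component that fails the comparison.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)  # center each feature (column)

    # Singular values of the observed data, in decreasing order.
    observed = np.linalg.svd(X, compute_uv=False)

    permuted = np.empty((n_perm, observed.size))
    for b in range(n_perm):
        # Shuffle the entries of each column independently of the others,
        # which destroys correlations between features but keeps each
        # feature's marginal distribution.
        Xp = rng.permuted(X, axis=0)
        permuted[b] = np.linalg.svd(Xp, compute_uv=False)

    # Per-component threshold taken from the permutation distribution.
    threshold = np.quantile(permuted, quantile, axis=0)

    # Count the leading components that exceed their threshold.
    exceeds = observed > threshold
    k = exceeds.size if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold
```

Calling `parallel_analysis(X)` on an n-by-p data matrix (rows as samples, columns as features) returns the selected number of components together with the observed singular values and the permutation thresholds. Other variants compare against the mean or median of the permuted singular values, or work with the correlation matrix instead of the centered data.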