Abstract:
|
Spectral embedding has been widely used in statistics and machine learning. To provide a fundamental understanding of its statistical properties, we develop an $\ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces, which provably improves upon vanilla PCA in the presence of heteroscedastic noise. Through a novel $\ell_p$ analysis of eigenvectors, we investigate the entrywise behavior of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $\ell_p$ norm. For sub-Gaussian mixture models, the choice of $p$ in the theoretical analysis depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $\ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. This provides optimal recovery results for the stochastic block model and the Gaussian mixture model as special cases.
|