Abstract:
|
We study the problem of high-dimensional Principal Component Analysis (PCA) with missing observations. Our main contribution is a new method, which we call primePCA, that is designed to cope with situations where observations may be missing in a heterogeneous manner. Given a good initialiser, primePCA iteratively projects the observed entries of the data matrix onto the column space of our current estimate to impute the missing entries, and then updates our estimate by computing the leading right singular space of the imputed data matrix. When the true principal components satisfy an incoherence condition and the signal is not too small, the error of primePCA provably converges to zero at a geometric rate. An important feature of our theoretical guarantees is that they depend on average, as opposed to worst-case, properties of the missingness mechanism. Our numerical studies on both simulated and real data reveal that primePCA exhibits very encouraging performance across a wide range of scenarios, including settings where the data are not Missing Completely At Random.
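
The impute-then-refine loop described above can be illustrated with a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. The function names `impute_and_refine` and `subspace_distance`, the zero-filled SVD initialiser, the fixed iteration count and the rule for skipping rows with fewer than `K` observed entries are all hypothetical choices made here for brevity. The projection step is taken to be a per-row least-squares fit of the observed entries on the matching rows of the current basis.

```python
import numpy as np


def impute_and_refine(Y, mask, K, n_iter=30):
    """Sketch of an iterative impute-and-refine estimate of a K-dim principal subspace.

    Y    : (n, d) data matrix; values where mask is False are ignored.
    mask : (n, d) boolean array, True where an entry is observed.
    K    : target number of principal components.
    Returns a (d, K) matrix with orthonormal columns.
    """
    Y0 = np.where(mask, Y, 0.0)  # zero-fill the missing entries

    # Assumed initialiser: top-K right singular vectors of the zero-filled matrix.
    _, _, Vt = np.linalg.svd(Y0, full_matrices=False)
    V = Vt[:K].T

    for _ in range(n_iter):
        Y_hat = Y0.copy()
        for i in range(Y.shape[0]):
            obs = mask[i]
            if obs.sum() < K:
                continue  # too few observed entries to fit K scores; keep zero-fill
            # Least-squares scores of row i using only its observed coordinates.
            u, *_ = np.linalg.lstsq(V[obs], Y0[i, obs], rcond=None)
            # Impute the missing coordinates from the current basis.
            Y_hat[i, ~obs] = V[~obs] @ u
        # Refine: leading right singular space of the imputed matrix.
        _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
        V = Vt[:K].T
    return V


def subspace_distance(V1, V2):
    """Frobenius sin-theta distance between two K-dim subspaces (orthonormal bases)."""
    K = V1.shape[1]
    return np.sqrt(max(K - np.linalg.norm(V1.T @ V2, "fro") ** 2, 0.0))
```

On synthetic data with a strong low-rank signal and column-dependent observation probabilities, the returned basis can be compared with the true one through `subspace_distance`. Nothing in this sketch reproduces the theoretical guarantees stated in the abstract.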
|