Abstract Details

Activity Number: 315
Type: Invited
Date/Time: Tuesday, August 5, 2014 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #310646
Title: Randomized Approximation of Principal Components Analysis for Large Data Sets
Author(s): Daniel J. McDonald*+ and Darren Homrighausen
Companies: Indiana University and Colorado State University
Keywords: Nyström extension ; Singular value decomposition ; Grassmannian manifolds ; Random projection ; Big data
Abstract:

In this talk, we analyze an approximate method for performing a principal components analysis (PCA) on large data sets. PCA is a classical dimension reduction method that projects the data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. This projection can be used either for exploratory purposes or as an input to further analysis, e.g., regression. If the data have trillions of entries or more, the computational and space requirements for storing and manipulating the design matrix in fast memory are prohibitive. Recently, the Nyström and column-sampling methods for the randomized approximation of the singular value decomposition of large matrices have appeared in the numerical linear algebra literature. Using theory, simulations, and a real data example involving a large corpus of text data, we compare both the computational demands of these methods and the distances between the subspaces they generate and the subspace generated by PCA. Additionally, we propose a new sampling method that improves the approximations when principal components regression is the goal.
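
To make the comparison described above concrete, the following is a minimal NumPy sketch of the two approximations and of one common subspace distance. It is an illustration under stated assumptions, not the authors' implementation: columns are sampled uniformly without replacement, the distance is the Frobenius norm of the difference between orthogonal projectors, and all function names (`nystrom_subspace`, `column_sampling_subspace`, `projection_distance`, and so on) are hypothetical. The sampling method proposed in the talk is not reproduced here.

```python
import numpy as np


def sampled_columns_of_cov(X, idx):
    """Return L = S[:, idx] for S = Xc^T Xc without forming the p x p matrix S."""
    Xc = X - X.mean(axis=0)            # center each variable (column)
    return Xc.T @ Xc[:, idx]           # shape (p, l)


def nystrom_subspace(X, idx, d):
    """Orthonormal p x d basis for a Nystrom estimate of the principal subspace.

    Assumes the l x l sampled block has at least d strictly positive eigenvalues.
    """
    L = sampled_columns_of_cov(X, idx)
    S11 = L[idx, :]                    # l x l block of S on the sampled variables
    evals, evecs = np.linalg.eigh(S11) # eigenvalues returned in ascending order
    top = np.argsort(evals)[::-1][:d]  # indices of the d largest eigenvalues
    V, lam = evecs[:, top], evals[top]
    U = L @ (V / lam)                  # Nystrom extension to all p coordinates
    Q, _ = np.linalg.qr(U)             # extension is not orthonormal, so orthonormalize
    return Q


def column_sampling_subspace(X, idx, d):
    """Orthonormal p x d basis from the leading left singular vectors of L."""
    L = sampled_columns_of_cov(X, idx)
    U, _, _ = np.linalg.svd(L, full_matrices=False)
    return U[:, :d]


def pca_subspace(X, d):
    """Exact principal subspace: leading right singular vectors of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T


def projection_distance(A, B):
    """Frobenius distance between the orthogonal projectors onto span(A) and span(B)."""
    return np.linalg.norm(A @ A.T - B @ B.T)


# Small synthetic demonstration: rank-3 signal plus noise (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
n, p, d, l = 500, 100, 3, 20
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, p)) + 0.1 * rng.normal(size=(n, p))
idx = rng.choice(p, size=l, replace=False)   # uniform column sampling (an assumption)
truth = pca_subspace(X, d)
print("Nystrom distance:        ", projection_distance(nystrom_subspace(X, idx, d), truth))
print("Column-sampling distance:", projection_distance(column_sampling_subspace(X, idx, d), truth))
```

Both approximations touch only the p x l matrix of sampled covariance columns, which is where the savings in computation and storage come from when l is much smaller than p. The projector distance depends only on the spanned subspace, so orthonormalizing the Nyström extension with a QR step does not change the comparison.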


Authors who are presenting talks have a * after their name.

For information, contact jsm@amstat.org or phone (888) 231-3473.

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

ASA Meetings Department  •  732 North Washington Street, Alexandria, VA 22314  •  (703) 684-1221  •  meetings@amstat.org
Copyright © American Statistical Association.