All Times EDT

Abstract Details

Activity Number: 351 - Variable Selection and Computationally Intensive Methods
Type: Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Computing
Abstract #313391
Title: A Scalable Approach to Estimating the Rank of High-Dimensional Data
Author(s): Wenlan Zang* and Jen-hwa Chu and Michael J Kane
Companies: Yale University School of Public Health and Yale University School of Medicine and Yale University School of Public Health
Keywords: rank estimation; SVD; random matrices; Bayesian change point; Marcenko-Pastur; high-dimensional
Abstract:

A key challenge in the effective analysis of high-dimensional data is finding a low-dimensional, signal-rich subspace of the ambient space defined by the data. For linear subspaces, this is generally done by decomposing the design matrix into orthogonal components and then retaining those components with sufficient variation. The number of components retained is generally determined using ad hoc approaches, such as plotting the eigenvalues in decreasing order and looking for the "elbow" in the plot. While these approaches have been shown to be effective, a poorly calibrated heuristic or a misjudged elbow can leave an overabundance of noise or an underabundance of predictive information in the low-dimensional space. Here we propose a procedure that estimates the rank of a matrix by retaining the components whose variation exceeds that of a random matrix, whose eigenvalues follow a universal Marcenko-Pastur distribution. We also demonstrate the efficiency, scalability, and robustness of our dimension determination procedure on simulated and real data, and compare its performance to that of previous methods.
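
The idea of comparing observed eigenvalues with those of a pure-noise matrix can be illustrated with a minimal sketch. This is not the authors' procedure (the keywords indicate that it also involves a Bayesian change point step); it is a simple hard threshold at the upper edge of the Marcenko-Pastur support. The function names, the assumption that the noise variance `sigma2` is known, and the `slack` margin that guards against finite-sample fluctuation at the edge are all choices made for this illustration.

```python
import numpy as np


def mp_upper_edge(n_rows, n_cols, sigma2=1.0):
    """Upper edge of the Marcenko-Pastur support for the eigenvalues of
    X.T @ X / n_rows when X has i.i.d. entries with mean 0 and variance sigma2."""
    gamma = n_cols / n_rows
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def estimate_rank(X, sigma2=1.0, slack=0.1):
    """Count eigenvalues of X.T @ X / n that exceed the noise edge.

    sigma2 (noise variance) is assumed known here; slack is a hypothetical
    safety margin so that the largest pure-noise eigenvalue, which fluctuates
    around the edge in finite samples, is not counted as signal.
    """
    n, p = X.shape
    singular_values = np.linalg.svd(X, compute_uv=False)
    eigenvalues = singular_values ** 2 / n
    threshold = (1.0 + slack) * mp_upper_edge(n, p, sigma2)
    return int(np.sum(eigenvalues > threshold))


def simulate(n=500, p=200, true_rank=3, seed=0):
    """Low-rank signal plus unit-variance Gaussian noise (illustrative data)."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, true_rank))
    V = rng.standard_normal((true_rank, p))
    noise = rng.standard_normal((n, p))
    return U @ V + noise


if __name__ == "__main__":
    X = simulate()
    print("estimated rank:", estimate_rank(X))
```

In practice the noise variance is rarely known and would have to be estimated, for example from the bulk of the spectrum, and a fixed threshold is cruder than a procedure that models the transition between signal and noise eigenvalues.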


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program