All Times EDT

Abstract Details

Activity Number: 363 - Contributed Poster Presentations: IMS
Type: Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 2:00 PM
Sponsor: IMS
Abstract #314422
Title: Selecting meaningful principal components in heterogeneous data using signflips
Author(s): David Hong* and Yue Sheng and Edgar Dobriban
Companies: University of Pennsylvania and University of Pennsylvania and University of Pennsylvania
Keywords:
Abstract:

Principal component analysis is a ubiquitous method for discovering latent factors in data. Using it poses an important but challenging task: identifying which components capture signal in the data rather than noise. Parallel analysis via permutations is a popular approach with widespread use, empirical support, and recent work on its theoretical foundation using random matrix theory. In this approach, random permutations of the data provide a sort of null distribution for pure-noise eigenvalues; data eigenvalues greater than their “null”, i.e., noise, counterparts get selected. When the noise is heterogeneous, however, permutations can destroy that heterogeneous structure, significantly harming performance. This work proposes a new variant based on random signflips that addresses this shortcoming. Building on recent random matrix theoretic justifications for parallel analysis, we show that parallel analysis via signflips consistently selects perceptible components in certain high-dimensional and heterogeneous factor models; small signal components that do not separate from the noise are imperceptible and are not selected. Finally, we illustrate an application to single-cell RNA sequencing data.
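
The selection rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `signflip_parallel_analysis`, the number of trials, the 95% quantile threshold, and the rule of stopping at the first component that fails the comparison are all assumptions made for this example. Each trial multiplies every entry of the data matrix by an independent random sign, which scrambles low-rank signal while leaving each entry's magnitude (and hence the noise heterogeneity) in place.

```python
import numpy as np


def signflip_parallel_analysis(X, n_trials=20, quantile=0.95, seed=0):
    """Estimate the number of components whose singular values exceed
    their signflipped ("null") counterparts.

    Hypothetical sketch: n_trials, quantile, and the sequential stopping
    rule are illustrative choices, not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    # Singular values of the observed data, in decreasing order.
    data_sv = np.linalg.svd(X, compute_uv=False)

    # Singular values of independently signflipped copies of the data.
    null_sv = np.empty((n_trials, data_sv.size))
    for t in range(n_trials):
        signs = rng.choice([-1.0, 1.0], size=X.shape)
        null_sv[t] = np.linalg.svd(signs * X, compute_uv=False)

    # Per-component threshold: a quantile of the k-th null singular value.
    thresholds = np.quantile(null_sv, quantile, axis=0)

    # Select leading components until one fails to exceed its threshold.
    k = 0
    for observed, threshold in zip(data_sv, thresholds):
        if observed <= threshold:
            break
        k += 1
    return k
```

Working with singular values of the data matrix is equivalent to working with eigenvalues of the sample covariance up to squaring and scaling, so the comparison order is unchanged. Swapping the signflip step for an independent permutation of each column would give a sketch of the permutation variant the abstract contrasts against.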


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program