All Times EDT

Abstract Details

Activity Number: 52 - Contrastive Dimension Reduction: Exploring Differential Patterns in High-Dimensional Data
Type: Topic Contributed
Date/Time: Sunday, August 7, 2022 : 4:00 PM to 5:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322387
Title: Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis
Author(s): Philippe Boileau* and Nima S Hejazi and Sandrine Dudoit
Companies: University of California, Berkeley and Weill Cornell Medicine and University of California, Berkeley
Keywords: dimension reduction; sparsity; high-dimensional statistics; exploratory data analysis; unwanted variation
Abstract:

Statistical analyses of high-throughput sequencing data have reshaped the biological sciences. Despite myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others that incorporate subject-matter knowledge, have made progress on this problem. However, no existing procedure simultaneously recovers features that are both interpretable and relevant. Inspired by recent proposals for using control data to remove unwanted variation, we propose a variant of principal component analysis that extracts sparse and relevant biological signal. We find that this method produces more informative and interpretable embeddings than the linear (e.g., PCA, contrastive PCA, sparse PCA) and non-linear (e.g., UMAP, t-SNE) dimensionality reduction methods commonly used to explore high-dimensional biological data. We demonstrate this through the re-analysis of publicly available protein expression, microarray gene expression, and single-cell transcriptome sequencing datasets.
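
As a rough illustration of how control data can be contrasted against target data to obtain sparse loadings, the Python sketch below eigendecomposes the difference between a target covariance matrix and a scaled control (background) covariance matrix, then soft-thresholds the leading eigenvectors. It is not the authors' implementation. The function name `sparse_contrastive_pca`, the contrast parameter `gamma`, the soft-thresholding step used to induce sparsity, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def sparse_contrastive_pca(target, background, gamma=1.0, n_components=2,
                           threshold=0.0):
    """Hypothetical sketch: contrastive PCA followed by soft-thresholding.

    Assumption: sparsity is induced by a crude soft-threshold on the
    eigenvectors, which is a stand-in and not the method in the abstract.
    """
    # Column-center each dataset before estimating covariances.
    target_c = target - target.mean(axis=0)
    background_c = background - background.mean(axis=0)
    cov_target = target_c.T @ target_c / (target.shape[0] - 1)
    cov_background = background_c.T @ background_c / (background.shape[0] - 1)

    # Contrastive covariance: variation in the target data that is not
    # explained by the control data, weighted by gamma.
    contrast = cov_target - gamma * cov_background

    # The matrix is symmetric, so eigh applies; keep the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]

    # Soft-threshold small entries to zero, then rescale to unit length.
    sparse = np.sign(loadings) * np.maximum(np.abs(loadings) - threshold, 0.0)
    norms = np.linalg.norm(sparse, axis=0)
    norms[norms == 0] = 1.0
    sparse = sparse / norms

    return target_c @ sparse, sparse


# Simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
n, p = 200, 10

# Control data: unit noise plus strong unwanted variation in features 0 and 1.
background = rng.normal(size=(n, p))
background[:, :2] += rng.normal(scale=5.0, size=(n, 2))

# Target data: the same unwanted variation plus a two-group signal
# carried by features 2 and 3.
target = rng.normal(size=(n, p))
target[:, :2] += rng.normal(scale=5.0, size=(n, 2))
groups = np.repeat([-1.0, 1.0], n // 2)
target[:, 2:4] += 3.0 * groups[:, None]

scores, loadings = sparse_contrastive_pca(
    target, background, gamma=2.0, n_components=2, threshold=0.1
)
```

In this simulation the first column of `loadings` should place nearly all of its weight on features 2 and 3, since the variation in features 0 and 1 is shared with the control data and is therefore down-weighted by the contrast.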


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program