All Times EDT

Abstract Details

Activity Number: 244 - Statistical methods for microbiome data analysis and beyond
Type: Contributed
Date/Time: Wednesday, August 11, 2021, 10:00 AM to 11:50 AM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #318550
Title: Separating and Re-Integrating Latent Variables for Improved Classification of Large-Scale Genomic Data
Author(s): Yujia Pan* and Johann Gagnon-Bartsch
Companies: University of Michigan and University of Michigan
Keywords: gene expression; linear discriminant analysis; classification; prediction
Abstract:

Genomic datasets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation, which presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC), a linear discriminant-based ensemble classifier that accounts for latent variables without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic datasets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.


Authors who are presenting talks have a * after their name.
