All Times EDT

Abstract Details

Activity Number: 306 - SPEED: SPAAC SESSION II
Type: Topic-Contributed
Date/Time: Wednesday, August 11, 2021 : 3:30 PM to 5:20 PM
Sponsor: Biometrics Section
Abstract #317854
Title: A Sparse Negative Binomial Mixture Model for Clustering RNA-Seq Count Data
Author(s): YUJIA LI* and Tanbin Rahman and Tianzhou Ma and Lu Tang and George Tseng
Companies: Department of Biostatistics, University of Pittsburgh and Department of Biostatistics, MD Anderson Cancer Center and Department of Epidemiology and Biostatistics, University of Maryland and University of Pittsburgh and Department of Biostatistics, University of Pittsburgh
Keywords: cluster analysis; Gaussian mixture model; sparse K-means; feature selection
Abstract:

Clustering with variable selection is a challenging yet critical task for modern small-n-large-p data. Existing methods based on sparse Gaussian mixture models or sparse K-means provide solutions for continuous data. Given the prevalence of RNA-seq technology and the lack of count-data models for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models with a Gaussian assumption. In this paper, we develop a negative binomial mixture model with lasso or fused lasso gene regularization to cluster samples (small n) with high-dimensional gene features (large p). A modified EM algorithm is used for inference, and the Bayesian information criterion is used to determine tuning parameters. The method is compared with existing methods using extensive simulations and two real transcriptomic applications in rat brain and breast cancer studies. The results show superior performance of the proposed count data model in clustering accuracy, feature selection, and biological interpretation in pathways.
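
To make the idea of a negative binomial mixture concrete, here is a minimal sketch of one E-step: computing each sample's posterior cluster membership from raw counts. It is not the authors' implementation. The function names, the mean/dispersion parameterization, and the single dispersion shared across genes are all assumptions made for illustration, and the lasso or fused lasso penalty, the M-step, and BIC-based tuning described in the abstract are not shown.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + phi * mu**2, size r = 1 / phi.
    """
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def responsibilities(counts, means, weights, phi):
    """E-step sketch: posterior cluster probabilities for each sample.

    counts  : n samples x p genes (non-negative integers)
    means   : K clusters x p genes (positive cluster-specific means)
    weights : K mixing proportions summing to 1
    phi     : one dispersion shared by all genes (a simplification)
    """
    result = []
    for row in counts:
        # Log joint of sample and cluster, assuming genes are independent
        # given the cluster label.
        logs = [math.log(w) + sum(nb_logpmf(y, m, phi) for y, m in zip(row, mu_k))
                for w, mu_k in zip(weights, means)]
        # Normalize with the log-sum-exp trick to avoid underflow.
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        result.append([u / total for u in unnorm])
    return result

# Hypothetical toy data: two samples, three genes, two clusters.
counts = [[2, 3, 1], [40, 55, 38]]
means = [[2.0, 2.0, 2.0], [45.0, 45.0, 45.0]]
resp = responsibilities(counts, means, weights=[0.5, 0.5], phi=0.2)
```

In this toy example the first sample is assigned almost entirely to the low-mean cluster and the second to the high-mean cluster. Working on the counts directly, rather than on normalized continuous values, is the modeling choice the abstract argues for.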


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program