Abstract Details

Activity Number: 663 - Regression, Clustering and Gene Set Methods in Genomics
Type: Contributed
Date/Time: Thursday, August 1, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #305323
Title: FSCseq: Simultaneous Feature Selection and Clustering of RNA-Seq Data
Author(s): David Lim* and Naim U. Rashid and Joseph G Ibrahim
Companies: UNC Chapel Hill and University of North Carolina at Chapel Hill and UNC
Keywords: clustering; RNAseq; genomics; EM; feature selection; model-based

Clustering is a form of unsupervised learning that aims to uncover latent groups within data based on similarity across a set of features. A common application in biomedical research is delineating novel cancer subtypes from patient gene expression data, given a set of informative genes. However, it is typically unknown a priori which genes are informative in discriminating clusters, or what the optimal number of clusters is. Moreover, no existing method for unsupervised clustering of RNA-seq samples can adjust for between-sample global normalization factors and potential confounding variables while simultaneously selecting cluster-discriminatory genes and clustering subjects. To address this, we propose Feature Selection and Clustering of RNA-seq (FSCseq): a model-based clustering algorithm that uses a finite mixture of regressions model and a quadratic penalty method with a SCAD penalty. The penalized likelihood is maximized by a penalized EM algorithm, which allows us to include normalization factors and confounders in the model. Given the fitted model, our framework allows for subtype prediction in new patients via posterior probabilities of cluster membership. Our analyses demonstrate the utility of the method.
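
As a rough illustration of two ingredients named in the abstract, the Python sketch below shows the standard SCAD penalty (in the form given by Fan and Li, with the conventional a = 3.7) and a posterior cluster-membership calculation for a generic finite mixture. The function names, the default value of a, and the assumption that per-cluster log-likelihoods have already been computed are illustrative choices, not details taken from FSCseq itself.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated elementwise (hypothetical helper).

    lam is the tuning parameter; a > 2 controls where the penalty flattens.
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                        # |theta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    flat = np.full_like(t, lam**2 * (a + 1) / 2)            # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, flat))

def posterior_membership(loglik, mix_props):
    """Posterior cluster probabilities for each sample (hypothetical helper).

    loglik: (n_samples, n_clusters) log-likelihood of each sample under each
            cluster's model (assumed to be computed elsewhere).
    mix_props: (n_clusters,) mixing proportions summing to 1.
    """
    log_w = np.asarray(loglik, dtype=float) + np.log(mix_props)
    log_w = log_w - log_w.max(axis=1, keepdims=True)  # stabilize before exp
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

In an EM-style fit, a calculation like `posterior_membership` would play the role of the E-step, and the same calculation applied to a new sample's log-likelihoods would give its predicted cluster probabilities.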

Authors who are presenting talks have a * after their name.
