Abstract Details

Activity Number: 366 - SPEED: Recent Advances in Statistical Genomics and Genetics
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 11:15 AM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #332539
Title: Penalized Latent Dirichlet Allocation Model in Single Cell RNA Sequencing
Author(s): Xiaotian Wu* and Zhijin Wu and Hao Wu
Companies: Brown University and Brown University and Emory University
Keywords: single cell RNA sequencing; topic models; Latent Dirichlet Allocation; penalization

Single cell RNA sequencing (scRNA-seq) is a recently developed technology that quantifies RNA transcripts at the level of individual cells, providing cellular resolution of gene expression variation. scRNA-seq data are counts of RNA transcripts for all genes in a species' genome. We adapt Latent Dirichlet Allocation (LDA), a generative probabilistic model originating in natural language processing (NLP), to model scRNA-seq data by treating genes as words, cells as documents, and latent biological functions as topics. In LDA, each document is regarded as a collection of words generated from a mixture of topics, each topic having its own word usage frequency profile. We propose a penalized version of LDA that reflects the structure of scRNA-seq data, in which only a small subset of genes is expected to be topic-specific. We apply the penalized LDA to two scRNA-seq data sets to illustrate the usefulness of the model. Using inferred topic frequencies instead of word (gene) frequencies substantially improves the accuracy of cell type classification.
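
The mapping of cells to documents and genes to words can be illustrated with a minimal simulation sketch of the standard (unpenalized) LDA generative process. This is not the authors' penalized model, whose penalty is not specified in the abstract. The function name simulate_lda_counts, the Dirichlet hyperparameters alpha and eta, and the fixed per-cell sequencing depth are all assumptions made for illustration.

```python
import numpy as np


def simulate_lda_counts(n_cells=50, n_genes=200, n_topics=3,
                        depth=1000, alpha=0.5, eta=0.5, seed=0):
    """Simulate a cell-by-gene count matrix from a plain LDA model.

    Hypothetical helper for illustration only: cells play the role of
    documents, genes the role of words, and topics stand in for latent
    biological functions.
    """
    rng = np.random.default_rng(seed)

    # Topic-by-gene matrix: each row is one topic's gene usage profile
    # (a probability distribution over genes).
    beta = rng.dirichlet(np.full(n_genes, eta), size=n_topics)

    # Cell-by-topic matrix: each row is one cell's mixture over topics.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_cells)

    # Marginalizing over per-transcript topic assignments, each cell's
    # gene probabilities are its topic mixture applied to the profiles.
    gene_probs = theta @ beta

    # Draw `depth` transcripts per cell (assumed equal depth for all cells).
    counts = np.vstack([rng.multinomial(depth, p) for p in gene_probs])
    return counts, theta, beta


counts, theta, beta = simulate_lda_counts()
print(counts.shape)  # cells in rows, genes in columns
```

In this sketch, the rows of theta are the per-cell topic frequencies, the low-dimensional representation that the abstract proposes to use in place of raw gene frequencies for cell type classification.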

Authors who are presenting talks have a * after their name.
