All Times EDT

Abstract Details

Activity Number: 179 - Statistical Methods in Single-Cell Transcriptomics
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #313894
Title: A Joint Deep Learning Model for Clustering and Denoising of ScRNA-Seq Data with Batch Effect Removal
Author(s): Justin Lakkis* and David Wang and Xiangjie Li and Kui Wang and Gang Hu and Lyle Ungar and Mingyao Li
Companies: Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania and Graduate Group in Genomics and Computational Biology, University of Pennsylvania and Fuwai Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College and Department of Informatics Theory and Data Science, Nankai University and School of Statistics and Data Science, Nankai University and University of Pennsylvania and University of Pennsylvania
Keywords: Deep Learning; Genomics; Clustering; Machine Learning; scRNA-seq; Batch Effect
Abstract:

Recent developments in single-cell RNA-seq (scRNA-seq) technologies have led to major biological discoveries. An important step in scRNA-seq analysis is clustering. However, popular methods for scRNA-seq clustering are susceptible to batch effects, which can introduce serious bias into downstream analyses. Other methods can remove batch effects, but treat clustering as a separate step performed downstream of batch effect removal. To overcome these limitations, we present CarDEC (Count adapted regularization for Deep Embedded Clustering), a joint deep learning model that simultaneously clusters and denoises scRNA-seq data while correcting for batch effects in both tasks. To our knowledge, very few methods in the literature accomplish both tasks while removing batch effects in a single model. By benchmarking on a macaque retina dataset of over 20,000 cells that exhibits multiple layers of complex batch effects, we show that CarDEC achieves higher clustering accuracy than other methods that do not jointly cluster and denoise the data. We also show that both our clusters and our denoised expression counts are free of batch effects.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program