All Times EDT

Abstract Details

Activity Number: 181 - Statistical Methods in Gene Expression Data Analysis II
Type: Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #313851
Title: A Robust Distance Metric for Single Cell RNA-Sequencing Based on a Multinomial Model
Author(s): Nathan Dyjack* and Andrew Papanastasiou and Luca Pinello and Catalina Vallejos and Daniel Baker and Ben Langmead and Vladimir Braverman and Stephanie C Hicks
Companies: Johns Hopkins Bloomberg School of Public Health, Department of Biostatistics and The University of Edinburgh, MRC Human Genetics Unit and Massachusetts General Hospital / Harvard Medical School and The University of Edinburgh, MRC Human Genetics Unit and Johns Hopkins University Department of Computer Science and Johns Hopkins University Department of Computer Science and Johns Hopkins University Department of Computer Science and Johns Hopkins Bloomberg School of Public Health
Keywords: single-cell; RNA-sequencing; distance; multinomial
Abstract:

Recent technological advances have enabled genome-wide profiling of transcripts not only from bulk samples but also from individual cells. Previous work (Witten et al., 2011) demonstrated that, for bulk RNA-sequencing (RNA-seq) data, standard distance metrics (such as Euclidean distance, which assumes the data follow a Gaussian distribution) can be improved upon by a distance metric that directly models the nonnegative counts with a Poisson distribution. In contrast to bulk RNA-seq data, single-cell RNA-sequencing (scRNA-seq) data are sparser, with a higher fraction of observed zeros (a zero means that no unique molecular identifiers (UMIs) map to a given gene in a cell), and have recently been shown to exhibit technical variability that is better modeled with a multinomial distribution. Here, we present a novel distance metric derived from a likelihood ratio test based on a multinomial model of the UMI gene counts in scRNA-seq data. We provide evidence that our metric is computationally competitive with current best practice while substantially improving downstream clustering, as assessed on both simulated and in vivo ground-truth datasets.
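
As a rough illustration of the general idea, a dissimilarity between two cells can be built from a textbook two-sample multinomial likelihood ratio test: under the null hypothesis both cells share one vector of gene proportions, and under the alternative each cell has its own. The sketch below is an assumption about how such a statistic might be computed; the abstract does not give the formula, so this should not be read as the authors' metric. The function name `multinomial_lrt_dissimilarity` and the example counts are hypothetical.

```python
import numpy as np


def multinomial_lrt_dissimilarity(x, y):
    """Two-sample multinomial likelihood ratio statistic between two cells.

    x, y: 1-D arrays of nonnegative UMI counts over the same genes.
    NOTE: a generic textbook LRT, assumed for illustration only; it is
    not taken from the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must cover the same genes")
    n_x, n_y = x.sum(), y.sum()
    if n_x == 0 or n_y == 0:
        raise ValueError("each cell needs at least one count")

    def max_loglik(counts, total):
        # Maximized multinomial log-likelihood kernel: sum_g c_g * log(c_g / total),
        # with the convention 0 * log(0) = 0 (zero-count genes are skipped).
        nonzero = counts > 0
        return float(np.sum(counts[nonzero] * np.log(counts[nonzero] / total)))

    # Alternative: separate proportions per cell. Null: pooled proportions.
    separate = max_loglik(x, n_x) + max_loglik(y, n_y)
    pooled = max_loglik(x + y, n_x + n_y)

    # Clip tiny negative values that can arise from floating-point rounding.
    return max(2.0 * (separate - pooled), 0.0)


# Hypothetical UMI counts for three cells over four genes.
cell_a = [4, 0, 2, 0]
cell_b = [2, 0, 1, 0]   # same proportions as cell_a, half the depth
cell_c = [0, 3, 0, 3]   # expresses a disjoint set of genes

d_ab = multinomial_lrt_dissimilarity(cell_a, cell_b)
d_ac = multinomial_lrt_dissimilarity(cell_a, cell_c)
```

In this sketch, cells with proportional counts get a dissimilarity of approximately zero regardless of sequencing depth, while cells with different expression profiles get a larger value. Zero counts contribute nothing to the statistic, which suits sparse scRNA-seq data.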


Authors who are presenting talks have a * after their name.
