Abstract Details

Activity Number: 356
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 10:30 AM to 12:20 PM
Sponsor: Biometrics Section
Abstract #319101
Title: Excess False Positives in Negative-Binomial-Based Analysis of Data from RNA-Seq Experiments
Author(s): David Rocke* and Yilun Zhang
Companies: University of California at Davis and University of California at Davis
Keywords: RNA-Seq ; Gene Expression ; edgeR ; DESeq2 ; limma ; Negative Binomial

RNA-Seq data are increasingly used for whole-genome differential mRNA expression analysis in lieu of gene expression arrays such as those from Affymetrix and Illumina. Because the raw data in RNA-Seq consist of counts of fragments mapping to each gene or exon, and because the counts are over-dispersed, it is common to model their distribution as negative binomial. Yet empirically, methods based on the negative binomial often generate massively inflated numbers of false positives, whether they are applied to real data or to simulated negative binomial data. This appears to be a consequence of the fact that the negative binomial with unknown scale is not an exponential family distribution, and that, when it is treated as a quasi-likelihood, the link function, and thus the natural parameter, are functions of the scale parameter. Consequently, a linear model with negative binomial quasi-likelihood is not a proper generalized linear model unless the scale is known. We demonstrate that, even when the data are truly negative binomial, it is better to use transformation or weighting followed by standard linear models than to fit a version of a generalized linear model with estimated scale.
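
The exponential-family point can be made concrete with the standard mean-and-size parameterization of the negative binomial. The symbols below (mean \mu, size k, natural parameter \theta, linear predictor \eta) are notation assumed here for illustration and do not appear in the abstract; this is a sketch of the textbook form, not the authors' own derivation.

```latex
\begin{align*}
P(Y = y \mid \mu, k)
  &= \frac{\Gamma(y + k)}{\Gamma(k)\, y!}
     \left(\frac{k}{\mu + k}\right)^{k}
     \left(\frac{\mu}{\mu + k}\right)^{y},
     \qquad y = 0, 1, 2, \dots \\
  &= \exp\left\{ y\,\theta + k \log\left(1 - e^{\theta}\right)
     + \log\frac{\Gamma(y + k)}{\Gamma(k)\, y!} \right\}, \\
\theta &= \log\frac{\mu}{\mu + k},
\qquad
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{k}.
\end{align*}
% With a log link, \eta = \log\mu, the natural parameter is still
% \theta = \eta - \log(e^{\eta} + k), which depends on k.
```

For fixed k this is a one-parameter exponential family with natural parameter \theta, and the canonical link \log(\mu / (\mu + k)) itself involves k. When k is unknown, the term \log\Gamma(y + k) ties y and k together in a way that does not separate into a function of the data times a function of the parameters, which is the sense in which the two-parameter model falls outside the exponential family.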

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association