Impact of data analysis algorithm choice on RNA-seq gene expression estimation and downstream gene-based prediction
*May D. Wang, Emory-Georgia Tech Cancer Nanotechnology Center

As RNA-seq technologies mature, the choice of data analysis pipeline has become a critical challenge in clinical applications. The FDA-led Sequencing Quality Control (SEQC) Consortium has conducted a comprehensive investigation of 278 representative RNA-seq data analysis pipelines to determine the impact of algorithm choice on several aspects of gene expression output: reproducibility of expression estimates in comparison to qPCR reference data, repeatability of expression estimates across technical replicates, detection of low-expressing genes, detection of differentially expressed genes, and performance of RNA-seq-based predictive models in clinical settings. Results reveal that gene expression quality and downstream prediction vary significantly with pipeline components such as mapping, quantification, and normalization. This study established a general guideline for selecting safe RNA-seq data analysis pipelines to assist clinicians and bioinformaticians in achieving improved biological utility, reproducibility, repeatability, and effectiveness in decision making.
Key Dates
- April 30 - May 22, 2013: Invited Abstract Submission Open
- June 4, 2013: Online Registration Opens
- August 9 - August 23, 2013: Invited Abstract Editing
- August 23, 2013: Short Course materials due from Instructors
- August 26, 2013: Housing Deadline
- September 9, 2013: Cancellation Deadline and Registration Closes @ 11:59 pm EDT
- September 16 - September 18, 2013: Marriott Wardman Park, Washington, DC