RNA-Seq versus Microarray Predictive Modeling: Lessons Learned from the Sequencing Quality Control (SEQC) Consortium Neuroblastoma Study
*Russ Wolfinger, SAS Institute Inc 

Keywords: Predictive Modeling, RNA-Seq, Microarray, Next-Gen Sequencing

To systematically evaluate how well classifiers based on RNA deep sequencing (RNA-Seq) predict clinical endpoints, we generated gene expression profiles from 498 primary neuroblastomas using both RNA-Seq and microarrays. Discontinuous mapping of 30.8 billion reads revealed expression of >50,000 genes, >220,000 transcripts, and >510,000 exon junctions, >250,000 of which were newly discovered. The transcribed portion of the genome spanned 316 Mbp, including 39,052 novel exons in regions previously considered untranscribed. The neuroblastoma cohort was randomly divided into training and validation sets, and 360 predictive models for six clinical endpoints were generated and evaluated. Although prediction performance did not differ considerably across technical platforms, data processing pipelines, or feature levels (i.e., gene, transcript, and exon junction levels), RNA-Seq models based on the AceView database performed best on most endpoints. Collectively, our study reveals an unprecedented complexity of the neuroblastoma transcriptome and provides guidelines for the development of gene expression-based predictive classifiers using high-throughput technologies.
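
The random division of the cohort into training and validation sets can be illustrated with a minimal sketch. The abstract gives neither the split sizes nor the randomization procedure, so the equal split, the fixed seed, the sample identifiers, and the function name split_cohort below are all assumptions made for illustration, not the consortium's actual protocol.

```python
import random


def split_cohort(sample_ids, train_fraction=0.5, seed=0):
    """Randomly partition sample IDs into disjoint training and validation lists.

    train_fraction and seed are illustrative assumptions; the abstract
    states only that the cohort was randomly divided.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = int(round(len(ids) * train_fraction))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers standing in for the 498 primary neuroblastomas.
samples = [f"NB{i:03d}" for i in range(1, 499)]
train_ids, validation_ids = split_cohort(samples)
```

Fixing the split once, before any model is fit, means every model is trained and evaluated on the same samples, so differences in validation performance reflect the platform, pipeline, or feature level and not the partition.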