Abstract Details

Activity Number: 133 - Gene-Set Based Analysis in Genomic Studies
Type: Contributed
Date/Time: Monday, July 30, 2018 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #328671 Presentation
Title: Evaluating Statistical Classifiers for Detecting C9orf72 Amyotrophic Lateral Sclerosis Patients Based on Whole Blood RNAseq Data
Author(s): Wenting Wang*, Guolin Zhao, Feng Gao, Tzu-Ying Liu, Ayla Ergun, Jessica Hurt
Companies: Biogen, Biogen, Biogen, University of Michigan, Biogen, Biogen
Keywords: C9orf72 mutation; ALS; class-imbalanced outcome; RNAseq data; simulation; high-dimensional data

A hexanucleotide repeat expansion in chromosome 9 open reading frame 72 (C9orf72) is one of the most common genetic causes of amyotrophic lateral sclerosis (ALS). Biomarkers based on whole blood RNAseq data that differentiate C9orf72 ALS subjects may help in selecting the right patients for ALS-targeted therapy. We aimed to find an appropriate statistical classifier under two major challenges: the high dimensionality of RNAseq data and a class-imbalanced outcome due to the rarity of the C9orf72 mutation in ALS.

We developed a simulation framework to examine the performance of three classifiers (penalized support vector machine, lasso logistic regression, and random forest) under different class balancing strategies. RNAseq libraries were simulated at various imbalance ratios, based on RNAseq data of brain tissues from the public domain and RNAseq data of whole blood from an internal clinical trial.
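
As an illustration of one class balancing strategy, the following is a minimal sketch of random undersampling applied to labels generated at a chosen imbalance ratio. It is not the authors' framework: the abstract does not say how undersampling was carried out, so random removal of majority-class samples is an assumption, and the function names `make_imbalanced_labels` and `undersample_indices` are hypothetical.

```python
import random
from collections import Counter


def make_imbalanced_labels(n_samples, minority_fraction, seed=0):
    """Return shuffled 0/1 labels; 1 marks the (assumed) minority class."""
    rng = random.Random(seed)
    n_minority = max(1, round(n_samples * minority_fraction))
    labels = [1] * n_minority + [0] * (n_samples - n_minority)
    rng.shuffle(labels)
    return labels


def undersample_indices(labels, seed=0):
    """Randomly keep as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_keep = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        # Sample without replacement within each class.
        kept.extend(rng.sample(idx, n_keep))
    return sorted(kept)


# Hypothetical setting: 200 samples, 10% in the minority class.
labels = make_imbalanced_labels(n_samples=200, minority_fraction=0.1)
kept = undersample_indices(labels)
balanced = [labels[i] for i in kept]

print(Counter(labels))    # class counts before balancing
print(Counter(balanced))  # class counts after undersampling
```

In a setting like the one described, the retained indices would be used to subset the rows of the expression matrix before a classifier is fit, and the procedure would be repeated across imbalance ratios.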

Simulation studies showed that balancing strategies improved the performance of the three classifiers differently, depending on the imbalance ratio and the separation of classes. In particular, the penalized support vector machine with an undersampling strategy performed best for our problem.

Authors who are presenting talks have a * after their name.
