Name: 2018 Joint Statistical Meetings
Start: 2018-07-28T07:00:00+00:00
End: 2018-08-02
Location: Vancouver Convention Centre

Abstract Details

Activity Number:	29 - SPEED: An Ensemble of Advances in Genomics and Genetics
Type:	Contributed
Date/Time:	Sunday, July 29, 2018 : 2:00 PM to 3:50 PM
Sponsor:	Section on Statistics in Genomics and Genetics
Abstract #330102	Presentation
Title:	Gene Expression-Based Classification of Cancer Tumours via Penalized Probabilistic Principal Components Analysis
Author(s):	Wei Deng* and Radu V Craiu
Companies:	University of Toronto and University of Toronto
Keywords:	Classification; Gene expression; Probabilistic Principal Component Analysis; effective dimension; clustering
Abstract:	Probabilistic Principal Component Analysis (PPCA) is frequently used to pre-process noisy data. Although the number of principal components (PCs) provides insight into the complexity of sample dependence, cluster assignments based on PCs do not always perform well, because noise in the data can weaken the degree of cluster separation. We previously proposed a penalized profile log-likelihood criterion to select the effective dimension of high-dimensional data. Here we take advantage of the learned representation and propose training classification models in the projection space. We illustrate via simulations that this approach requires less training data and leads to faster computation for multiple classification algorithms. The proposed method was applied to the NCI-60 cell-line data to classify tumor types. With 30% and 50% of the samples used for training, we recorded 85% and 94% prediction accuracy, respectively, using a support vector machine (SVM). In contrast, classification based on the original data yielded 79% and 92% accuracy for the same training fractions. Our approach is able to leverage the molecular variation of tens of thousands of genes simultaneously to produce accurate tumor classifications quickly.
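
The idea of training a classifier in the projection space rather than on the original features can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `PCA` as a stand-in for the PPCA projection, a fixed `n_components` in place of the penalized profile log-likelihood criterion (which is not reproduced here), and synthetic data from `make_classification` instead of the NCI-60 data. The function name `compare_accuracy` and all parameter values are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_accuracy(train_fraction, n_components=10, seed=0):
    """Return (accuracy on original features, accuracy in projection space)."""
    # Synthetic high-dimensional data: few samples, many features.
    # Assumption: this merely mimics the shape of expression data.
    X, y = make_classification(
        n_samples=200,
        n_features=2000,
        n_informative=20,
        n_classes=3,
        n_clusters_per_class=1,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_fraction, stratify=y, random_state=seed
    )

    # Baseline: SVM trained directly on the standardized original features.
    raw_model = make_pipeline(StandardScaler(), SVC(kernel="linear"))

    # Projection-space model: reduce to n_components first, then train the SVM.
    # Assumption: n_components is fixed by hand here; the abstract instead
    # selects the effective dimension with a penalized criterion.
    projected_model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, random_state=seed),
        SVC(kernel="linear"),
    )

    raw_model.fit(X_train, y_train)
    projected_model.fit(X_train, y_train)
    return raw_model.score(X_test, y_test), projected_model.score(X_test, y_test)


if __name__ == "__main__":
    for fraction in (0.3, 0.5):
        raw_acc, proj_acc = compare_accuracy(fraction)
        print(f"train fraction {fraction}: raw={raw_acc:.2f}, projected={proj_acc:.2f}")
```

Fitting the scaler and the projection inside the pipeline means both are learned from the training split only, so the held-out accuracy is not contaminated by test samples. Accuracies on this synthetic data say nothing about the figures reported in the abstract.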

Authors who are presenting talks have a * after their name.

American Statistical Association