Name: 2018 Joint Statistical Meetings
Start: 2018-07-28T07:00:00+00:00
End: 2018-08-02
Location: Vancouver Convention Centre

Abstract Details

Activity Number:	133 - Gene-Set Based Analysis in Genomic Studies
Type:	Contributed
Date/Time:	Monday, July 30, 2018 : 8:30 AM to 10:20 AM
Sponsor:	Section on Statistics in Genomics and Genetics
Abstract #329193
Title:	Building a Genomic Signature via Transfer Learning on Both Labelled and Unlabelled High-Dimensional Data: a Case Study in Predicting Prostate Cancer Metastasis
Author(s):	Yang Liu* and Hossein Sharifi-Noghabi and Nicholas Erho and Raunak Shrestha and Mohammed Alshalalfa and Elai Davicioni and Colin Collins and Martin Ester
Companies:	GenomeDx Biosciences and Simon Fraser University and GenomeDx Biosciences and Vancouver Prostate Centre and GenomeDx Biosciences and GenomeDx Biosciences and Vancouver Prostate Centre and Simon Fraser University
Keywords:	transfer learning; high dimensional data; deep learning; auto encoder; high throughput gene expression data; elastic net
Abstract:	A common challenge in developing models (signatures) from high-dimensional gene expression data is the limited number of labelled samples (n << p). Recently, the clinical use of high-throughput genomic tests, such as Decipher, has produced large cohorts of genomic data with no clinical outcome available (unlabelled data). In this study, we develop a transfer learning approach based on denoising auto-encoders (DAE, a type of neural network) to bridge the gap between the labelled and unlabelled data. The initial layers of the DAE are trained on the unlabelled data and then transferred to the labelled data, on which the deeper layers are further trained. The resulting deep neural network produces the features for a penalized logistic regression that predicts the clinical outcome. We applied this approach to predicting prostate cancer metastasis, which yielded a model that outperforms all state-of-the-art genomic signatures. This performance increase was observed in five independent validation cohorts when the models were assessed for prediction accuracy and in survival analysis. Our approach also captured additional information over well-established clinical factors and other genomic signatures.
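
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data are simulated, the layer sizes, noise level, and learning rates are arbitrary, and the helper names (train_dae, encode, fit_elastic_net_logistic) are hypothetical. The abstract does not say how the deeper layers are trained on the labelled data, so the sketch assumes a second denoising auto-encoder layer; the elastic net penalty is taken from the keywords.

```python
import numpy as np


def train_dae(X, n_hidden, noise=0.2, lr=0.01, epochs=300, seed=0):
    """Fit a one-hidden-layer denoising auto-encoder; return its encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (p, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, p)), np.zeros(p)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)  # masking noise on the input
        H = np.tanh(X_noisy @ W1 + b1)               # encode the corrupted input
        err = (H @ W2 + b2 - X) / n                  # error against the clean input
        dH = (err @ W2.T) * (1.0 - H ** 2)           # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X_noisy.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1


def encode(X, W, b):
    """Apply a trained encoder layer."""
    return np.tanh(X @ W + b)


def fit_elastic_net_logistic(Z, y, alpha=0.01, l1_ratio=0.5, lr=0.1, epochs=500):
    """Elastic-net penalized logistic regression via proximal gradient descent."""
    n, d = Z.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = Z.T @ (prob - y) / n + alpha * (1.0 - l1_ratio) * w  # loss + L2 part
        w = w - lr * grad
        thresh = lr * alpha * l1_ratio                              # L1 part
        w = np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)        # soft-threshold
        b -= lr * np.mean(prob - y)
    return w, b


# Simulated data (assumption): many unlabelled samples, few labelled ones.
rng = np.random.default_rng(42)
n_genes, n_factors = 30, 3
loadings = rng.normal(size=(n_factors, n_genes))


def simulate(n):
    factors = rng.normal(size=(n, n_factors))
    X = factors @ loadings + 0.5 * rng.normal(size=(n, n_genes))
    y = (factors[:, 0] > 0).astype(float)
    return X, y


X_unlab, _ = simulate(500)      # outcomes discarded: treated as unlabelled
X_lab, y_lab = simulate(40)

# Standardize both sets with statistics from the unlabelled cohort.
mu, sd = X_unlab.mean(axis=0), X_unlab.std(axis=0)
X_unlab, X_lab = (X_unlab - mu) / sd, (X_lab - mu) / sd

# Step 1: train the initial layer on the unlabelled data.
W_a, b_a = train_dae(X_unlab, n_hidden=8)
# Step 2: transfer it to the labelled data and train a deeper layer there.
H_lab = encode(X_lab, W_a, b_a)
W_b, b_b = train_dae(H_lab, n_hidden=4, seed=1)
features = encode(H_lab, W_b, b_b)
# Step 3: the network's output features feed a penalized logistic regression.
w, b = fit_elastic_net_logistic(features, y_lab)
probs = 1.0 / (1.0 + np.exp(-(features @ w + b)))
```

In this sketch the first layer is frozen after transfer; fine-tuning it on the labelled data would be an equally plausible reading of the abstract.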

Authors who are presenting talks have a * after their name.

American Statistical Association