Abstract:
|
A common challenge in developing models (signatures) from high-dimensional gene expression data is the limited number of labelled samples relative to the number of features (n << p). Recently, the clinical use of high-throughput genomic tests, such as Decipher, has produced large cohorts of genomic data with no clinical outcome available (unlabelled data). In this study, we develop a transfer learning approach based on denoising auto-encoders (DAE, a type of neural network) to bridge the gap between the labelled and unlabelled data. The initial layers of the DAE are trained on the unlabelled data and then transferred to the labelled setting, where the labelled data are used to train the deeper layers. The resulting deep neural network produces the features for a penalized logistic regression that predicts the clinical outcome. We applied this approach to predict prostate cancer metastasis, which yielded a model that outperforms all state-of-the-art genomic signatures. This performance increase was observed in five independent validation cohorts, in both prediction accuracy and survival analysis. Our approach also captured additional information beyond well-established clinical factors and other genomic signatures.
|