Activity Number: 268 - Replicability and the Narrative of Scientific Research
Type: Invited
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: American Association for the Advancement of Science
Abstract #316860
Title: Replicability and Missing Data in Deep Learning and Clinical Prediction
Author(s): Naim Rashid* and David Lim and Joseph G Ibrahim
Companies: University of North Carolina At Chapel Hill, Dept of Biostatistics and University of North Carolina At Chapel Hill, Dept of Biostatistics and UNC
Keywords: Replicability; Deep Learning; Missing Data; RNA-seq; EHR data; Clinical Prediction
Abstract:
The replicability of statistical algorithms for clinical decision-making has been a significant concern in biomedical and translational research, where multiple factors may limit the generalizability of models trained on individual studies. In the first part of this talk, we describe recent work in high-dimensional data integration and meta-learning for both supervised learning (clinical prediction) and unsupervised learning (cluster discovery), and discuss applications to cancer subtype discovery and prediction. In the second part, we discuss missing data in deep neural networks and its impact on model generalizability. We introduce new methodology for principled handling of missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) patterns of missingness in feedforward neural networks, with the aim of improving regression and classification performance in the presence of missing data. We show that our methodology avoids manual selection of features to model the missingness mechanism and can flexibly handle multiple patterns of missingness across features in high-dimensional data. We demonstrate the performance of our approach on simulated and real electronic health record (EHR) datasets.
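For readers unfamiliar with the three missingness mechanisms named above, the following is a minimal illustrative sketch and not the authors' methodology. It simulates two correlated features and masks the second under each mechanism. The function names (simulate_missingness, observed_mean), the logistic missingness model, and all numeric settings are assumptions chosen only for illustration.

```python
import math
import random


def simulate_missingness(n=10_000, seed=0):
    """Return two features and missingness masks for x2 under each mechanism.

    Hypothetical illustration only: the logistic model and coefficients
    are assumptions, not taken from the talk.
    """
    rng = random.Random(seed)
    # x1 is always observed; x2 is correlated with x1 and may go missing.
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in x1]

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    masks = {
        # MCAR: the probability of missingness is constant, unrelated to the data.
        "MCAR": [rng.random() < 0.3 for _ in range(n)],
        # MAR: missingness in x2 depends only on the observed feature x1.
        "MAR": [rng.random() < sigmoid(2.0 * a) for a in x1],
        # MNAR: missingness in x2 depends on the unobserved value of x2 itself.
        "MNAR": [rng.random() < sigmoid(2.0 * b) for b in x2],
    }
    return x1, x2, masks


def observed_mean(values, mask):
    """Mean of the values whose mask entry is False (that is, observed)."""
    kept = [v for v, m in zip(values, mask) if not m]
    return sum(kept) / len(kept)


x1, x2, masks = simulate_missingness()
for name, mask in masks.items():
    # The true mean of x2 is 0; compare the complete-case mean per mechanism.
    print(name, round(observed_mean(x2, mask), 3))
```

In this toy setup the complete-case mean of x2 stays near its true value of 0 under MCAR but shifts downward under MAR and MNAR. Under MAR the shift could in principle be corrected using the observed x1, whereas under MNAR the missingness mechanism itself has to be modeled, which is the harder case the abstract refers to.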
Authors who are presenting talks have a * after their name.