Activity Number: 593 - Computationally Intensive and Machine Learning Methods
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #330168 | Presentation
Title: Robust Outlier Detection for Low and High-Dimensional Neuroimaging Data with Principal Components Analysis and Split-Half Resampling
Author(s): Derek Beaton* and Kelly M Sunderland and Abiramy Uthirakumaran and Stephen R Arnott and Robert Bartha and Sandra E Black and Leanne Casaubon and Morris Freedman and Richard H Swartz and Sean Symons and ONDRI Investigators and Malcolm A Binns and Stephen C Strother
Companies: Baycrest Health Sciences and Baycrest Health Sciences and Baycrest Health Sciences and Baycrest Health Sciences and Robarts Research and Sunnybrook Health Sciences Centre and Krembil Research Institute and Baycrest Health Sciences and Sunnybrook Health Sciences Centre and Sunnybrook Health Sciences Centre and ONDRI and Baycrest Health Sciences and Baycrest Health Sciences
Keywords: principal components analysis; split-half; resampling; outliers; Mahalanobis distance; neurodegenerative diseases
Abstract:
The Ontario Neurodegenerative Disease Research Initiative (ONDRI) has collected hundreds of thousands of variables per participant across many data modalities. Such large and complex data across heterogeneous cohorts could be highly susceptible to outliers. To ensure high-quality data and inference, we need methods that can identify outliers and make robust estimates. Few such methods exist. We propose a novel framework based on principal components analysis (PCA) and split-half resampling (SHR) to identify outliers via Mahalanobis distance (MD) and robust subspaces. PCA+SHR allows us to: (1) make predictive estimates of MD in rank-deficient or highly collinear data in order to find outliers, and (2) identify a robust subspace through reproducibility estimates. We applied PCA+SHR to resting-state functional neuroimaging data in one cohort of ONDRI participants (I = 109 << J = 32,768). Compared to existing techniques, PCA+SHR has several advantages: it provides multiple distance metrics with distributions for assessing outlierness (e.g., average estimates or stability estimates); through SHR, it provides a robust subspace; and it is a general-purpose tool for outlier detection and robust subspace estimation because it can be used on data of any dimensionality. Code with examples is available at https://www.github.com/derekbeaton/ours.
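To make the idea of predictive MD estimates concrete, the following is a minimal Python sketch, not the authors' implementation (their code is at the link above). It assumes one plausible reading of the abstract: PCA is fit on one random half of the rows, the held-out half is projected onto those components, and each row's squared component scores, scaled by the training-half component variances, give a predicted squared MD that is averaged over many splits. The function name `split_half_md`, its parameters, and the simulated data are hypothetical and used only for illustration.

```python
import numpy as np


def split_half_md(X, n_components, n_splits=100, seed=0):
    """Average predicted squared Mahalanobis distance per row (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_rows = X.shape[0]
    sums = np.zeros(n_rows)
    counts = np.zeros(n_rows)
    for _ in range(n_splits):
        perm = rng.permutation(n_rows)
        half_a, half_b = perm[: n_rows // 2], perm[n_rows // 2:]
        # Each half serves once as training set and once as held-out set.
        for train, test in ((half_a, half_b), (half_b, half_a)):
            center = X[train].mean(axis=0)
            # PCA of the training half via the SVD of the centered data.
            _, sing, vt = np.linalg.svd(X[train] - center, full_matrices=False)
            # Keep only components with non-negligible variance (rank-deficient data).
            k = min(n_components, int(np.sum(sing > 1e-10 * sing[0])))
            sd = sing[:k] / np.sqrt(len(train) - 1)
            # Project held-out rows and standardize by training component SDs.
            scores = (X[test] - center) @ vt[:k].T / sd
            sums[test] += np.sum(scores**2, axis=1)
            counts[test] += 1
    return sums / counts


# Simulated data with far fewer rows than columns (I = 40 << J = 200).
rng = np.random.default_rng(1)
Z = rng.standard_normal((40, 3))   # three latent factors
Z[0] = 8.0                         # row 0 is made extreme on every factor
W = rng.standard_normal((3, 200))
X = Z @ W + 0.1 * rng.standard_normal((40, 200))

distances = split_half_md(X, n_components=3, n_splits=20)
suspect = int(np.argmax(distances))  # row with the largest average predicted distance
```

Because each row's distance is computed only when that row is held out, an extreme row cannot mask itself by shaping the components it is scored against. Averaging over splits is just one summary; the per-split values could also be retained to examine their distribution.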
Authors who are presenting talks have a * after their name.