Abstract Details

Activity Number: 593 - Computationally Intensive and Machine Learning Methods
Type: Contributed
Date/Time: Wednesday, August 1, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #330168
Title: Robust Outlier Detection for Low and High-Dimensional Neuroimaging Data with Principal Components Analysis and Split-Half Resampling
Author(s): Derek Beaton* and Kelly M Sunderland and Abiramy Uthirakumaran and Stephen R Arnott and Robert Bartha and Sandra E Black and Leanne Casaubon and Morris Freedman and Richard H Swartz and Sean Symons and ONDRI Investigators and Malcolm A Binns and Stephen C Strother
Companies: Baycrest Health Sciences and Baycrest Health Sciences and Baycrest Health Sciences and Baycrest Health Sciences and Robarts Research and Sunnybrook Health Sciences Centre and Krembil Research Institute and Baycrest Health Sciences and Sunnybrook Health Sciences Centre and Sunnybrook Health Sciences Centre and ONDRI and Baycrest Health Sciences and Baycrest Health Sciences
Keywords: principal components analysis; split-half; resampling; outliers; Mahalanobis distance; neurodegenerative diseases

The Ontario Neurodegenerative Disease Research Initiative (ONDRI) has collected hundreds of thousands of variables per participant across many data modalities. Such large and complex data from heterogeneous cohorts could be highly susceptible to outliers. To ensure high-quality data and inference, we need methods that can both identify outliers and make robust estimates; few such methods exist. We propose a novel framework based on principal components analysis (PCA) and split-half resampling (SHR) that identifies outliers via Mahalanobis distance (MD) and also identifies robust subspaces. PCA+SHR allows us to (1) make predictive estimates of MD in rank-deficient or highly collinear data in order to find outliers, and (2) identify a robust subspace through reproducibility estimates. We applied PCA+SHR to resting-state functional neuroimaging data from one cohort of ONDRI participants (I = 109 participants ≪ J = 32,768 variables). Compared to existing techniques, PCA+SHR has many advantages: it provides multiple distance metrics, each with a distribution, for assessing outlierness (e.g., average estimates or stability estimates); through SHR, it provides a robust subspace; and because it can be applied to data of any dimensionality, it is a general-purpose tool for outlier detection and robust subspace estimation. Code with examples available:
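The following is not the authors' released code. It is a minimal NumPy sketch of the general idea described in the abstract, written under stated assumptions. For each resample, the rows are split at random into two halves. PCA (via a thin SVD, which never forms the J x J covariance and so works when I ≪ J) is fit on one half, and the held-out half is projected onto the first k components. The squared MD of each held-out row is then computed in that k-dimensional subspace as the sum over components of score² / eigenvalue. The roles of the halves are then swapped. Several choices here are assumptions, not details taken from the source: the function name `split_half_pca_md`, the fixed number of components `k`, the use of squared MD, and the reproducibility score. That score is a simple absolute cosine between matching loadings from the two halves, and it ignores any reordering of components between halves.

```python
import numpy as np


def split_half_pca_md(X, n_components=5, n_iter=100, seed=None):
    """Hypothetical sketch of split-half resampled PCA Mahalanobis distances.

    X : (I, J) array, rows are observations; J may far exceed I.
    Returns (avg_md2, avg_rep):
      avg_md2 -- per-row average out-of-sample squared MD in the k-dim
                 PCA subspace fit on the *other* half.
      avg_rep -- per-component average |cosine| between the loadings of the
                 two halves (crude reproducibility; ignores reordering).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # A centered half of n // 2 rows has at most n // 2 - 1 nonzero singular
    # values, so never keep more components than that.
    k = min(n_components, n // 2 - 1)
    if k < 1:
        raise ValueError("need at least 4 rows and n_components >= 1")
    rng = np.random.default_rng(seed)
    md_sum = np.zeros(n)
    md_count = np.zeros(n)
    rep_sum = np.zeros(k)
    for _ in range(n_iter):
        perm = rng.permutation(n)
        halves = (perm[: n // 2], perm[n // 2:])
        loadings = []
        # Fit on one half, test on the other, then swap roles.
        for fit_idx, test_idx in (halves, halves[::-1]):
            fit = X[fit_idx]
            mean = fit.mean(axis=0)
            # Thin SVD of the centered half (no J x J matrix is formed).
            _, s, vt = np.linalg.svd(fit - mean, full_matrices=False)
            eigvals = s[:k] ** 2 / (len(fit_idx) - 1)
            # Project held-out rows with the training half's mean and axes.
            scores = (X[test_idx] - mean) @ vt[:k].T
            md_sum[test_idx] += np.sum(scores ** 2 / eigvals, axis=1)
            md_count[test_idx] += 1
            loadings.append(vt[:k])
        # abs() removes the arbitrary sign of each singular vector.
        rep_sum += np.abs(np.sum(loadings[0] * loadings[1], axis=1))
    return md_sum / md_count, rep_sum / n_iter
```

A possible usage, with simulated rank-deficient data (40 rows, 500 columns) and one row shifted to act as an outlier: `md2, rep = split_half_pca_md(X, n_components=5, n_iter=50, seed=2)`. Because each distance is computed for rows that were held out of the PCA fit, an outlier cannot pull the components toward itself before being scored. Keeping the per-resample distances instead of only their average would give the kind of distribution for assessing outlierness that the abstract mentions; how to threshold such distances is not specified here.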

Authors who are presenting talks have a * after their name.
