
All Times EDT

Abstract Details

Activity Number: 356 - Statistical Learning: Methods and Applications
Type: Contributed
Date/Time: Wednesday, August 5, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #312540
Title: Flexible Feature Selection and Cluster Analysis for Heterogeneous Data with Application to a Diffusion Tensor Imaging Study
Author(s): Wanying Ma* and Luo Xiao and Jaroslaw Harezlak
Companies: Novartis Pharmaceuticals Company and North Carolina State University and Indiana University
Keywords: Functional data analysis; Functional principal component; Density transformation; Subset selection; Clustering; Data heterogeneity

Motivated by a diffusion tensor imaging (DTI) study of sports-related concussion (SRC), we propose a novel probabilistic subset search (PSS) algorithm that enables simultaneous flexible feature selection and clustering into homogeneous subgroups. We first summarize the raw data along each white matter tract by a density function, then apply the log-quantile density transformation to map each density into a Hilbert space. Using functional principal component analysis, the transformed curves are represented by functional principal component (FPC) scores from 27 tracts under each of four diffusion measures: fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (Dr), and axial diffusivity (Da). The PSS algorithm identifies the underlying clusters through a weighted subset search based on k-means. We evaluate its performance on data from the motivating DTI study of SRC. In this application, the PSS algorithm achieves high out-of-sample classification accuracy using a small subset of the high-dimensional features, and it identifies a two-cluster subgroup structure within the concussed group, confirming the heterogeneity of SRC.
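The preprocessing chain described in the abstract (density, then log-quantile density (LQD), then FPCA, then clustering) can be sketched as follows for a single tract and measure. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the standard LQD definition psi(t) = log q(t) = -log f(Q(t)), where Q is the quantile function of the density f and q = Q' is the quantile density. The simplifications are assumptions: Q is the empirical quantile function, f is a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth, and the grid in t is trimmed away from 0 and 1. Plain k-means on all FPC scores stands in for the PSS weighted subset search, which is not implemented here. The function names (`lqd_curve`, `fpca_scores`, `kmeans`) and the synthetic toy data are hypothetical.

```python
import numpy as np


def lqd_curve(samples, t_grid):
    """LQD psi(t) = -log f(Q(t)) for one subject's sample (simplified sketch).

    Assumptions: empirical quantiles for Q, Gaussian KDE with Silverman's
    rule-of-thumb bandwidth for f, and t_grid strictly inside (0, 1).
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)   # Silverman's rule of thumb
    q = np.quantile(x, t_grid)                 # Q(t) on the grid
    z = (q[:, None] - x[None, :]) / h          # shape: (grid points, samples)
    f_at_q = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    return -np.log(f_at_q)


def fpca_scores(curves, n_components):
    """FPCA of discretized curves (rows = subjects) via SVD of centered data.

    Scores equal projections onto the leading eigenfunctions up to a
    constant grid-spacing factor, since the grid is equally spaced.
    """
    X = np.asarray(curves, dtype=float)
    Xc = X - X.mean(axis=0)                    # subtract the mean curve
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def kmeans(features, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (stand-in for PSS)."""
    X = np.asarray(features, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):                      # next center: farthest point
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                              # assignments stable: converged
        labels = new_labels
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels


# Synthetic toy data (not DTI data): 20 "subjects" per group, 300 values each;
# the second group has twice the spread of the first.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.05, 0.95, 37)
samples = ([rng.normal(0.0, 1.0, 300) for _ in range(20)]
           + [rng.normal(0.0, 2.0, 300) for _ in range(20)])
curves = np.array([lqd_curve(s, t_grid) for s in samples])
scores = fpca_scores(curves, n_components=2)
labels = kmeans(scores, k=2)
```

Rescaling a sample by a factor c shifts its LQD curve by the constant log c, so in this toy example the two groups differ mainly along the first FPC. In the setting of the abstract, per-tract scores for the 27 tracts and four measures would be concatenated into one feature vector per subject before any selection or clustering step.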

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program