Abstract:
|
Clustering is of special interest in neuroimaging studies of mental illness, because psychiatrists believe that many psychiatric conditions present in multiple distinct, not yet identified subtypes. Subjects in such studies are often represented by their functional or structural connectivity matrices, viewed as networks, with one network per subject. Clustering with a large number of features is challenging in itself, and the network nature of the observations presents additional difficulties. Our goal here is to develop a clustering method that respects the network nature of the data, allows for feature selection, and scales well to high dimensions. Sparse K-means, a general method for clustering and feature selection in high dimensions, was proposed by Witten and Tibshirani (2010). Here we develop a network-aware sparse K-means algorithm, which uses a network-induced penalty to cluster weighted networks while performing feature selection. We also develop a full Gaussian mixture model version of the algorithm, which is particularly useful when features are highly correlated, as is the case in neuroimaging. We illustrate the method using fMRI data on schizophrenia.
|