Abstract:
|
Clustering has long been a popular unsupervised approach for identifying patterns in unlabelled data across many applications. Yet, because clustering is unsupervised, coming up with meaningful interpretations of the estimated clusters is often challenging. Meanwhile, in many real-life scenarios, noisy auxiliary variables, such as a subjective diagnostic opinion, may indicate some form of observed heterogeneity in the unlabelled data. By leveraging information from both the auxiliary variables and the unlabelled data, we seek to uncover more scientifically interpretable group structures that may remain hidden in completely unsupervised analyses. In this work, we develop a new statistical pattern discovery method named Supervised Convex Clustering (SCC) that borrows strength from both information sources and uses a joint convex fusion penalty to guide the search towards more interpretable patterns. To fit our model efficiently, we adopt a multi-block ADMM algorithm with provable convergence. Additionally, we develop extensions of SCC that allow for covariate adjustment as well as biclustering. We also demonstrate the practical advantages of SCC through simulations and a case study in genomics.
|