Abstract:
|
Clustering is a fundamental unsupervised learning technique that aims to discover groups of objects in a dataset. Biclustering extends clustering to two dimensions where both observations and features are grouped simultaneously. For example, clustering both cancerous tumors and genes or both documents and words. Triclustering is then the natural extension of clustering to three dimensions where the data are organized in a three-dimensional array, or tensor. We develop and study a convex formulation of the triclustering problem, which is guaranteed to obtain a unique global minimum. Convex triclustering generates an entire solution path of possible triclusters governed by one tuning parameter, and thus alleviates the need to specify the number of clusters a priori. We investigate the application of our method to biological datasets.
|