Abstract:
|
Clustering, which finds groups of similar observations, is a fundamental tool for exploratory analysis of big data. Recently, several authors have proposed convex approaches to clustering and to biclustering, which simultaneously groups features and observations. These methods fit a cluster mean for each observation and use a convex fusion penalty to encourage the means to fuse together; observations whose means have fused form a cluster. A major advantage of convex clustering is that a single tuning parameter determines both the number of clusters and the cluster assignments. As this tuning parameter is increased, observations begin to fuse together, yielding a continuous and nested family of clusters that we term the convex clustering solution path. In this paper, we present new fast algorithms to approximately compute the convex clustering and biclustering solution paths, as well as new visualization tools to explore the clustering solutions dynamically and interactively. Our tools, built with R and Shiny, allow users to watch their data form clusters or biclusters. We demonstrate these tools on examples from text mining and cancer genomics.
|