Abstract:
In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among samples that may remain hidden when each data view is clustered individually. We develop a convex formulation that inherits the strong statistical, mathematical, and empirical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs a different convex loss for each data view, combined with a joint convex fusion penalty that yields groups shared across views. Integrating mixed multi-view data is also challenging when each data source is high-dimensional. To perform feature selection, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our iGecco+ approach selects, from each data view, the features that best determine the groups. Through a series of numerical experiments and real data examples in genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.
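As a rough illustration of how the three ingredients fit together, the sketch evaluates an objective of the kind described: a convex loss per view, a fusion penalty that couples all views through one Euclidean norm per pair of samples, and a shifted group-lasso term that pulls each feature column toward a loss-specific center. The function name `igecco_plus_objective`, the fusion weights `w`, the feature weights `zeta`, and the unweighted sum of view losses are assumptions for illustration, not the authors' implementation. The sketch only evaluates the objective; it does not solve the optimization problem.

```python
import numpy as np

def igecco_plus_objective(X_views, U_views, losses, centers, lam, alpha, w, zeta):
    """Evaluate an iGecco+-style objective (hypothetical illustrative sketch).

    X_views, U_views: lists of (n, p_k) arrays (data and centroids for each view).
    losses: list of callables loss(X, U) -> float, one convex loss per view.
    centers: list of length-p_k arrays, the assumed loss-specific center per feature.
    w: (n, n) symmetric fusion weights; zeta: list of length-p_k feature weights.
    """
    n = X_views[0].shape[0]
    # Data-fit term: each view is scored by its own convex loss.
    fit = sum(loss(X, U) for loss, X, U in zip(losses, X_views, U_views))
    # Joint fusion penalty: one Euclidean norm per sample pair, taken over the
    # concatenated centroid differences from all views at once.
    fusion = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum(np.sum((U[i] - U[j]) ** 2) for U in U_views)
            fusion += w[i, j] * np.sqrt(sq)
    # Shifted group lasso: for each feature column, the norm of its deviation
    # from that feature's center (broadcasting subtracts c_j from column j).
    shifted = sum(
        np.sum(z * np.linalg.norm(U - c, axis=0))
        for U, c, z in zip(U_views, centers, zeta)
    )
    return fit + lam * fusion + alpha * shifted
```

Because all views share a single norm per sample pair, two samples are fused only when their centroids agree in every view at once, which is what produces common groups. Likewise, a feature's group-lasso term vanishes only when its whole centroid column sits at its center, in which case that feature carries no cluster information.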