Abstract:
|
Biclustering has become a popular tool, particularly in the analysis of gene expression datasets. Biclustering methods find subsets of genes that co-vary in only a subset of the samples. Standard clustering methods, by contrast, use the entire set of genes and can therefore miss important information. Biclusters of interest often manifest as rank-1 submatrices of the data matrix. This submatrix detection problem can be viewed as a factor analysis problem in which both the factors and the loadings are sparse. In this paper, we propose a new biclustering method that uses the Spike-and-Slab Lasso of Rockova and George (2016) to find such a sparse factorization of the data matrix. The factorization is computed with a fast, deterministic EM algorithm that identifies promising biclusters. The method, called Spike-and-Slab Lasso Biclustering, outperforms other biclustering methods in a variety of simulation settings.
|