Abstract:
We present BicMix, a flexible statistical framework for factorizing a high-dimensional data matrix into the product of two sparse, low-dimensional matrices. Our model is motivated by Bayesian sparse factor analysis, modified to encourage sparsity in both the loading and factor matrices. The three-layer three-parameter beta (TPB) prior used in BicMix has the behavior of a spike-and-slab prior but the computational tractability of a continuous prior. Motivated by confounding in the data, we model the loading and factor matrices as mixtures of sparse and dense components, capturing both sparse signals and experimental confounders. We fit the model with a fast variational expectation-maximization algorithm and show that our method outperforms other biclustering methods. We apply BicMix to a high-dimensional gene expression dataset to extract sparse components, and we show the biological meaning behind the biclusters. Because this model yields a natural solution to estimating sparse covariance matrices, we use the model parameters to construct a Markov random field for the genes and recover gene co-expression networks that are either specific to, or differential across, particular experimental covariates.
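The factorization described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the BicMix model or its inference algorithm: it draws sparse loading and factor matrices from a simple Bernoulli mask (a stand-in for the TPB prior), forms their noisy product, reads off a bicluster as the support of one component, and computes the gene-by-gene covariance implied under the additional assumption of unit-variance, uncorrelated factors. All function names, parameters, and defaults are hypothetical.

```python
import numpy as np

def simulate_sparse_factorization(p=30, n=20, k=3, density=0.2,
                                  noise_sd=0.1, seed=0):
    """Simulate Y = loadings @ factors + noise with sparse matrices.

    Sparsity comes from a Bernoulli mask, an illustrative stand-in
    for the TPB prior (assumption, not the authors' model).
    """
    rng = np.random.default_rng(seed)
    loadings = rng.normal(size=(p, k)) * (rng.random((p, k)) < density)
    factors = rng.normal(size=(k, n)) * (rng.random((k, n)) < density)
    noise = rng.normal(scale=noise_sd, size=(p, n))
    return loadings, factors, loadings @ factors + noise

def bicluster_support(loadings, factors, component):
    """Rows (genes) and columns (samples) with nonzero weight in one component."""
    rows = np.flatnonzero(loadings[:, component])
    cols = np.flatnonzero(factors[component, :])
    return rows, cols

def implied_covariance(loadings, noise_sd):
    """Gene covariance under the assumed unit-variance, uncorrelated factors."""
    p = loadings.shape[0]
    return loadings @ loadings.T + noise_sd ** 2 * np.eye(p)

loadings, factors, Y = simulate_sparse_factorization()
rows, cols = bicluster_support(loadings, factors, component=0)
cov = implied_covariance(loadings, noise_sd=0.1)
```

In this sketch, two genes that share no component have exactly zero implied covariance, which is one way to see why sparse loadings lead to a sparse covariance estimate.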
Copyright © American Statistical Association.