Abstract:
|
Using multiple studies in a single statistical analysis leverages heterogeneous data to distinguish signal from artifacts by identifying which signal is shared by some or all of the studies and which is specific to an individual study. The unsupervised identification of latent factors can be particularly useful for uncovering signal in high-dimensional settings, but existing extensions of factor analysis to the multi-study context can identify latent factors only if they are common to all studies or unique to a single study. In this work, we introduce Bayesian Combinatorial Multi-Study Factor Analysis (BCMSFA), which learns latent factors shared by any subset of studies. We do so by using the Indian Buffet Process to model the shared ownership of factors across studies. Our approach encourages sparse high-dimensional factor loading matrices through the multiplicative gamma process shrinkage prior. We estimate parameters using a computationally efficient Gibbs sampling algorithm. We demonstrate the robustness of BCMSFA through a broad range of simulations and apply it to multiple breast cancer gene expression datasets.
|
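The abstract combines two priors: an Indian Buffet Process (IBP) that decides which studies "own" which latent factors, and a multiplicative gamma process (MGP) that shrinks loadings toward zero as the factor index grows. The following is a minimal sketch, assuming Python with only the standard library, that draws from each prior so the roles are concrete. The function names, the hyperparameter defaults (`alpha`, `nu=3`, `a1=2.1`, `a2=3.1`), and the column-masking rule in `study_loadings` are hypothetical illustrations. They are not the BCMSFA specification, and no part of the Gibbs sampler is shown.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_studies: int, alpha: float, rng: random.Random) -> list[list[int]]:
    """Draw a binary studies-by-factors matrix Z from the Indian Buffet Process.

    Row s marks which latent factors study s owns; any subset of studies
    may share a factor.
    """
    counts: list[int] = []  # m_k: number of earlier studies owning factor k
    rows: list[list[int]] = []
    for s in range(1, num_studies + 1):
        # Existing factors: study s takes factor k with probability m_k / s.
        row = [1 if rng.random() < m / s else 0 for m in counts]
        # New factors: Poisson(alpha / s) factors not owned by earlier studies.
        new = sample_poisson(alpha / s, rng)
        row.extend([1] * new)
        counts = [m + z for m, z in zip(counts, row)] + [1] * new
        rows.append(row)
    num_factors = len(counts)
    return [r + [0] * (num_factors - len(r)) for r in rows]


def sample_mgp_loadings(p: int, num_factors: int, rng: random.Random,
                        nu: float = 3.0, a1: float = 2.1,
                        a2: float = 3.1) -> list[list[float]]:
    """Draw a p x K loading matrix under the multiplicative gamma process prior.

    lambda_jh ~ N(0, 1 / (phi_jh * tau_h)), phi_jh ~ Gamma(nu/2, rate nu/2),
    tau_h = prod_{l <= h} delta_l, delta_1 ~ Gamma(a1, 1), delta_l ~ Gamma(a2, 1).
    Hyperparameter defaults are assumptions chosen for illustration.
    """
    tau, taus = 1.0, []
    for h in range(num_factors):
        shape = a1 if h == 0 else a2
        tau *= rng.gammavariate(shape, 1.0)  # gammavariate takes (shape, scale)
        taus.append(tau)
    loadings = []
    for _ in range(p):
        row = []
        for h in range(num_factors):
            phi = rng.gammavariate(nu / 2, 2 / nu)  # rate nu/2 means scale 2/nu
            row.append(rng.gauss(0.0, 1.0 / math.sqrt(phi * taus[h])))
        loadings.append(row)
    return loadings


def study_loadings(loadings: list[list[float]], z_row: list[int]) -> list[list[float]]:
    """Hypothetical combination: zero the columns of factors the study does not own."""
    return [[lam * z for lam, z in zip(row, z_row)] for row in loadings]


rng = random.Random(0)
Z = sample_ibp(num_studies=4, alpha=2.0, rng=rng)
Lam = sample_mgp_loadings(p=10, num_factors=len(Z[0]), rng=rng)
Lam_study0 = study_loadings(Lam, Z[0])
```

Because the IBP is exchangeable across studies, the number of factors is not fixed in advance, and a factor can end up owned by one study, all studies, or any subset in between. That flexibility is what distinguishes the combinatorial setting from the "common or unique" dichotomy described above.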