Abstract:
|
Bayesian networks are widely used to model conditional independencies in multivariate data. Despite their popularity, the vast majority of existing approaches make a strong assumption of (partially) homogeneous sampling schemes. However, this assumption can be seriously violated, causing significant biases, when the underlying population is inherently heterogeneous. To explicitly account for the heterogeneity, we propose a novel Bayesian network model, termed BN-LTE, that embeds the heterogeneous data on a low-dimensional manifold and builds a Bayesian network conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while Bayesian networks are in general not identifiable from cross-sectional data because of Markov equivalence, the latent embedding makes the proposed BN-LTE identifiable, a “blessing of heterogeneity”. Two case studies (cancer genomics and single-cell transcriptomics) illustrate the unique capability of BN-LTE to infer patient-specific and cell-specific gene regulatory networks.
|