Abstract:
|
High-dimensional Gaussian graphical models are a powerful tool for learning connections among a large number of variables. While most prior work focuses on the case where all variables are measured simultaneously, we consider a new but practically common setting in which no simultaneous measurement of all variables is available and the sample size differs across pairs of nodes. This setting arises in estimating gene expression networks from single-cell sequencing data, functional connectivity from neuronal recordings, and sensor networks. In this paper, we address both graph estimation and edge-wise inference, developing novel methods and theoretical guarantees for this setting. We characterize how (i) the neighborhood recovery of a target node and (ii) the testing power and confidence interval width for a target edge each depend, in different ways, on the sample sizes of node pairs and quadruples. These results suggest that neighborhood recovery and edge-wise inference remain possible even when a proportion of nodes or node pairs have high missing rates. We also conduct simulations and real data experiments to validate our theory and testing methods.
|