Abstract:
|
Learning properties of large graphs from samples is an important problem in statistical network analysis, dating back to the early work of Goodman and Frank. We revisit the problem formulated by Frank (1978) of estimating the numbers of connected components in a graph of $N$ vertices based on the subgraph sampling model, where we observe the subgraph induced by $n$ vertices drawn uniformly at random. The key question is whether it is possible to achieve accurate estimation, i.e., vanishing normalized mean-square error, by sampling a vanishing fraction of the vertices. We show that this is impossible if the graph contains high-degree vertices or long induced cycles; otherwise, it is possible by accessing only sublinear number of samples. Optimal sample complexity bounds are obtained for several classes of graphs including forests, cliques and chordal graphs.
The methodology relies on topological identities of graph homomorphism numbers, which, in turn, also play a key role proving minimax lower bounds based on constructing random instances of graphs with matching structures of small subgraphs. If time permits, we will also discuss results for the neighborhood sampling model.
|