Abstract:
|
We study the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of the cluster (recovery). Specifically, we identify sharp boundaries of the signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible. We also develop tight computational thresholds: when the signal-to-noise ratio is below these thresholds, we prove that polynomial-time algorithms cannot solve these problems under the computational hardness conjectures of hypergraphic planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS) recovery. We also propose polynomial-time tensor algorithms that achieve reliable detection and recovery when the signal-to-noise ratio is above these thresholds. The interplay between sparsity and tensor structure results in dramatic differences between high-order tensor clustering and matrix clustering in the literature in terms of phase transition diagrams, algorithms, and proof techniques.
|