Abstract:
In classical classification, a data point is classified using only its individual covariates. Often, additional network information describing the connectivity relationships between data points is also available, and in principle it can be used to improve classification performance. In this work, we develop a general statistical framework for network-augmented classification. Under this framework, we derive the optimal Bayes classifiers for two general families of distributions that incorporate both node covariates and network link information: one generative and the other discriminative. Further, we establish consistency results for plug-in classifiers with respect to the optimal classifiers under the generative and discriminative families, respectively. We also apply the general approaches to two specific models and propose two effective classification methods for practical use. The proposed methods perform well in both simulation studies and real-world data examples.
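The abstract does not state the specific models, so the following is only a rough illustration of what a generative network-augmented plug-in classifier can look like, not the paper's method. It assumes a hypothetical two-class model in which covariates are Gaussian with a shared covariance (as in linear discriminant analysis) and links are drawn independently from a stochastic block model with edge probabilities `P[k, l]`. Under these assumptions, the Bayes rule for a new node scores each candidate class by its log prior, plus the covariate log-likelihood, plus the log-likelihood of the node's links and non-links to labeled nodes; the plug-in version replaces the unknown parameters with sample estimates. The helper names `fit_plugin` and `predict`, and all simulation settings, are hypothetical.

```python
import numpy as np

def fit_plugin(X, y, A):
    """Plug-in estimates for a hypothetical Gaussian-covariate + block-model link model.

    X: (n, p) covariates; y: (n,) labels in {0, 1} (each class assumed to have >= 2 nodes);
    A: (n, n) symmetric 0/1 adjacency matrix with zero diagonal.
    """
    classes = np.array([0, 1])
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance, as in LDA.
    centered = X - means[y]
    cov = centered.T @ centered / (len(y) - len(classes))
    # P[k, l]: fraction of ordered pairs (i, j), i != j, with y_i = k, y_j = l that are linked.
    P = np.zeros((2, 2))
    for k in classes:
        for l in classes:
            ik, il = y == k, y == l
            pairs = ik.sum() * il.sum() - (ik & il).sum()  # exclude self-pairs
            P[k, l] = A[np.ix_(ik, il)].sum() / pairs
    P = np.clip(P, 1e-6, 1 - 1e-6)  # keep logs finite
    return priors, means, cov, P

def predict(x, a, y_train, priors, means, cov, P):
    """Classify a new node from its covariates x and its 0/1 links a to the training nodes."""
    cov_inv = np.linalg.inv(cov)
    scores = []
    for k in range(2):
        d = x - means[k]
        covariate_term = -0.5 * d @ cov_inv @ d  # Gaussian log-density up to a class-free constant
        p = P[k, y_train]  # edge probability to each training node if the new node had label k
        link_term = np.sum(a * np.log(p) + (1 - a) * np.log(1 - p))
        scores.append(np.log(priors[k]) + covariate_term + link_term)
    return int(np.argmax(scores))

# Simulated example (hypothetical settings).
rng = np.random.default_rng(0)
P_true = np.array([[0.10, 0.02], [0.02, 0.10]])
n, m = 200, 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + y[:, None] * 1.0  # class-1 mean shifted by 1 in each coordinate
A = np.triu((rng.random((n, n)) < P_true[y][:, y]).astype(int), k=1)
A = A + A.T  # symmetric, zero diagonal

y_test = rng.integers(0, 2, size=m)
X_test = rng.normal(size=(m, 2)) + y_test[:, None] * 1.0
A_test = (rng.random((m, n)) < P_true[y_test][:, y]).astype(int)

params = fit_plugin(X, y, A)
preds = np.array([predict(X_test[i], A_test[i], y, *params) for i in range(m)])
accuracy = np.mean(preds == y_test)
print(f"test accuracy: {accuracy:.2f}")
```

In this toy setting the covariates alone separate the classes only weakly, while each test node's links carry strong evidence about its label, which is the kind of gain the network term is meant to capture. Treating links as conditionally independent given the labels is a simplifying assumption of this sketch.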