Abstract:
Classical supervised learning usually assumes that the training data points are independent samples. However, in many real-world applications, individuals are connected by a network and interact in complex ways, so the independence assumption may not hold. Incorporating network information into the model is expected to improve prediction performance, because it provides additional information about the relationships among individuals. In this talk, we focus on predicting a response variable, either continuous or categorical, using both covariates and network information. Specifically, we propose a matrix-variate model, which allows two-way dependence (among data points and among variables), to model the distribution of the variables associated with the nodes of a network. The network information is naturally incorporated into the matrix-variate model, and the relationship between the response variable and the predictors can be derived under this model. We develop efficient algorithms for parameter estimation and show that the estimators are consistent under mild conditions. Simulation studies and applications to data examples indicate that the proposed method works well.