Abstract:
Structured data arise in many settings, ranging from social networks among individuals, derived from email and phone records, to biological networks derived from high-throughput biological experiments. We propose a novel framework for examining the association between high-dimensional structured data and an outcome of interest. The proposed framework, called generalized matrix decomposition regression (GMDR), has two parts: GMDR estimation and GMDR inference. GMDR estimation directly regresses the outcome on a small set of derived predictors, and can be viewed as a generalization of principal component regression (PCR) that incorporates external structure. GMDR inference allows one to construct confidence intervals for the components of the unknown parameter based on a wide range of estimators. Simulations show that GMDR inference has greater power than existing methods when the external structure is informative about the structure of the unknown parameter. We illustrate the GMDR framework with an application to gut microbiome data from a cohort of premenopausal women, both to predict percent fat and to detect important taxa.
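To make the "PCR with external structure" idea concrete, here is a minimal sketch. It is not the authors' estimator; it rests on the following assumptions. Row structure is encoded by a positive-definite n×n matrix `H` and column structure by a positive-definite p×p matrix `Q`. The decomposition is the standard generalized matrix decomposition X = U D Vᵀ with Uᵀ H U = I and Vᵀ Q V = I. The low-dimensional predictors are taken to be Z = X Q V_K, which are then fitted by ordinary least squares. The function names `gmd` and `gmd_regression` are hypothetical.

```python
import numpy as np


def _sym_sqrt(M):
    """Square root and inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    sqrt_m = (V * np.sqrt(w)) @ V.T
    inv_sqrt_m = (V / np.sqrt(w)) @ V.T  # assumes all eigenvalues are strictly positive
    return sqrt_m, inv_sqrt_m


def gmd(X, H, Q, K):
    """Rank-K generalized matrix decomposition X ~ U D V^T with U^T H U = I, V^T Q V = I.

    Computed via the ordinary SVD of H^{1/2} X Q^{1/2}, then mapped back.
    """
    H_half, H_inv_half = _sym_sqrt(H)
    Q_half, Q_inv_half = _sym_sqrt(Q)
    U_t, d, V_t_T = np.linalg.svd(H_half @ X @ Q_half, full_matrices=False)
    U = H_inv_half @ U_t[:, :K]
    V = Q_inv_half @ V_t_T.T[:, :K]
    return U, d[:K], V


def gmd_regression(X, y, H, Q, K):
    """Hypothetical GMD analogue of PCR: regress y on the K leading GMD predictors."""
    U, d, V = gmd(X, H, Q, K)
    Z = X @ Q @ V  # n x K predictors; equals U * d because V^T Q V = I
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    beta = Q @ V @ gamma  # map back to p coefficients so that X @ beta == Z @ gamma
    return beta, Z
```

With `H` and `Q` set to identity matrices, `gmd` reduces to the ordinary SVD and `gmd_regression` becomes classical PCR on the top `K` principal components. With `K` equal to the number of columns of a full-column-rank `X`, the fitted coefficients coincide with ordinary least squares for any positive-definite `H` and `Q`.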