Abstract:
Scattered (multi-center) data, which are collected and stored separately at local data centers, can be highly heterogeneous when the centers differ substantially. Simply collating or pooling the data is therefore inadequate, and is sometimes infeasible because of data-privacy and bandwidth constraints. Data fusion methods that integrate scattered data are urgently needed. We present a general feature space fusion framework based on the multi-index model, which assumes that the response variable depends on several linear combinations of the predictors through unknown link functions. By fusing the feature spaces spanned by the regression indices estimated at each center, we borrow strength across multiple local centers and obtain more accurate estimates. We show theoretically that the fused feature space is asymptotically consistent under mild regularity conditions, and we establish the asymptotic convergence rate of the proposed algorithm. Because the method allows center-specific predictor distributions and link functions, it accommodates the heterogeneity inherent in scattered data.
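The abstract does not specify the fusion rule itself. This is a minimal sketch under stated assumptions. Each center is assumed to have already produced a p x d basis estimate of its index space, and the centers are assumed to share one d-dimensional space, while their link functions and predictor distributions may differ. The local spaces are fused by averaging their orthogonal projection matrices and keeping the top-d eigenvectors. That averaging rule is a swapped-in illustrative technique, not necessarily the paper's algorithm. The function names `projection`, `fuse_subspaces` and `subspace_distance`, along with the simulated local estimates, are hypothetical.

```python
import numpy as np

def projection(basis):
    """Orthogonal projection matrix onto the column space of a p x d basis."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the columns (reduced QR)
    return q @ q.T

def fuse_subspaces(local_bases, d):
    """Fuse center-specific index-space estimates into one d-dimensional space.

    Assumption: fusion = average of local projection matrices, then keep the
    leading d eigenvectors. Illustrative only, not the paper's fusion rule.
    """
    avg_proj = sum(projection(b) for b in local_bases) / len(local_bases)
    _, eigvecs = np.linalg.eigh(avg_proj)  # eigenvalues come in ascending order
    return eigvecs[:, -d:]                 # orthonormal basis of the fused space

def subspace_distance(a, b):
    """Frobenius distance between the projections onto span(a) and span(b)."""
    return np.linalg.norm(projection(a) - projection(b))

# Toy setting: p = 6 predictors, d = 2 shared indices, K = 5 centers.
rng = np.random.default_rng(0)
p, d, K = 6, 2, 5
true_basis = rng.standard_normal((p, d))
# Hypothetical local estimates: the true basis plus center-specific noise.
local = [true_basis + 0.3 * rng.standard_normal((p, d)) for _ in range(K)]

fused = fuse_subspaces(local, d)
print("mean local error:", np.mean([subspace_distance(b, true_basis) for b in local]))
print("fused error:     ", subspace_distance(fused, true_basis))
```

Projection matrices are used instead of raw bases because each center's basis is identifiable only up to an invertible linear transformation, whereas the projection onto its span is unique.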