Abstract:
|
We consider the problem of building regression models based on individual-level data from an "internal" study while utilizing summary-level information, such as information on parameters for reduced models, from external big-data sources. We provide a general theory of semi-parametric constrained maximum-likelihood inference that allows the distribution of covariates to remain completely unspecified. Extensions are developed for handling complex stratified sampling designs, such as case-control sampling, for the "internal" study. We use multiple real datasets and simulation studies to assess the performance of the proposed method and contrast it with that of the calibration methodology popular in survey sampling. Connections between the proposed methodology and the methods used for the analysis of two-phase study designs are also discussed.
|