Abstract Details

Activity Number: 358 - Contributed Poster Presentations: Biometrics Section
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: Biometrics Section
Abstract #330061
Title: Using Synthetic Data to Incorporate External Information into Regression Model Estimation
Author(s): Tian Gu* and Jeremy M.G. Taylor and Bhramar Mukherjee and Wenting Cheng
Companies: University of Michigan and University of Michigan and University of Michigan and University of Michigan
Keywords: Synthetic Data; Information Integration; Modeling; Regression Estimation

We consider the situation where there is a well-established regression model [Y|X] that uses a set of commonly available risk predictors X to predict an important outcome Y. A modest-sized dataset of size n containing Y, X, and B is available, where B is a new variable that is thought to be important and expected to enhance the prediction of Y. The challenge is to build a good model for [Y|X,B] that uses both the available dataset and the known model for [Y|X]. One popular proposal in the literature is the constrained maximum likelihood (CML) approach, which maximizes the likelihood for [Y|X,B] subject to constraints on the parameters that come from the known [Y|X] model. We propose a synthetic data approach, which consists of creating m additional synthetic observations and then analyzing the combined dataset of size n+m to estimate the parameters of the model [Y|X,B]. In two special cases, we show that the synthetic data approach with large m gives the same asymptotic variance for the parameters of the [Y|X,B] model as the CML approach. This provides some theoretical justification for the synthetic data approach and, together with its broad applicability, makes the approach very appealing.
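
The data-construction step of the synthetic data approach can be illustrated with a minimal sketch. The abstract does not say how the synthetic observations are generated, so everything specific here is an assumption made for illustration: a single predictor X, a known linear model Y | X ~ N(alpha0 + alpha1 X, sigma^2), synthetic X values resampled from the observed X, and B left missing in the synthetic rows because the known model carries no information about it. The function name `make_combined_dataset` and its parameters are hypothetical. The sketch stops at the combined dataset of size n+m; estimating [Y|X,B] from it would additionally need a method that handles the missing B (for example, multiple imputation), which is not shown.

```python
import numpy as np


def make_combined_dataset(y, x, b, alpha, sigma, m, rng):
    """Stack n internal rows (y, x, b) on m synthetic rows (y*, x*, NaN).

    Hypothetical helper: alpha = (alpha0, alpha1) and sigma describe an
    assumed known linear model Y | X ~ N(alpha0 + alpha1 * X, sigma^2).
    Returns an (n + m, 4) array with columns [Y, X, B, is_synthetic].
    """
    y, x, b = (np.asarray(v, dtype=float) for v in (y, x, b))
    n = len(y)

    # Synthetic X: resampled with replacement from the observed X (assumption).
    x_syn = rng.choice(x, size=m, replace=True)

    # Synthetic Y: drawn from the assumed known model for [Y | X].
    y_syn = alpha[0] + alpha[1] * x_syn + rng.normal(0.0, sigma, size=m)

    # B is not part of the known model, so it is left missing (assumption).
    b_syn = np.full(m, np.nan)

    internal = np.column_stack([y, x, b, np.zeros(n)])
    synthetic = np.column_stack([y_syn, x_syn, b_syn, np.ones(m)])
    return np.vstack([internal, synthetic])


# Toy usage with simulated internal data (all values are illustrative).
rng = np.random.default_rng(0)
n, m = 50, 500
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.0 * b + rng.normal(size=n)
combined = make_combined_dataset(y, x, b, alpha=(1.0, 2.5), sigma=1.4, m=m, rng=rng)
```

The indicator column keeps the internal and synthetic rows distinguishable, which is convenient if the later analysis treats them differently.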

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program