Abstract Details

Activity Number: 128 - SPEED: Biometrics and Biostatistics Part 1
Type: Contributed
Date/Time: Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract #307035 Presentation
Title: Synthetic Data Method to Incorporate External Information into a Current Study
Author(s): Tian Gu* and Jeremy Taylor and Bhramar Mukherjee
Companies: University of Michigan and University of Michigan and University of Michigan
Keywords: Constrained Maximum Likelihood; Data Integration; Prediction models; Synthetic Data

We consider the situation where there is a known regression model that can be used to predict an outcome, Y, from a set of predictor variables X. A new variable B is expected to enhance the prediction of Y. A modest-sized dataset of size n containing Y, X and B is available, and the challenge is to build an improved model for Y|X, B that uses both the available dataset and the known model for Y|X. We propose a synthetic data approach, which consists of creating m additional synthetic data observations and then analyzing the combined dataset of size n+m to estimate the parameters of the Y|X, B model. This combined dataset has missing values of B for m of the observations and is analyzed using methods that can handle missing data. We illustrate the method using multiple imputation in an example and some simulations. To provide analytical justification, we consider two special cases, where we show that our approach with very large m gives the same asymptotic variance for the parameters of the Y|X, B model as an alternative published constrained maximum likelihood estimation approach. This justification and the method's broad applicability make it appealing in more general cases.
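The steps described in the abstract can be sketched in code. The example below is a minimal illustration under strong assumptions and is not the authors' implementation: it assumes a linear Gaussian model with a single predictor X, creates synthetic X values by resampling the observed X, draws synthetic Y from the known Y|X model, and fills in the missing B with a simplified stochastic regression imputation rather than a full multiple imputation procedure. The names `synthetic_data_fit`, `known_coef`, `known_sd` and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols(design, response):
    """Return least-squares coefficients and the residual standard deviation."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    dof = design.shape[0] - design.shape[1]
    return coef, np.sqrt(resid @ resid / dof)


def synthetic_data_fit(x, b, y, known_coef, known_sd, m, n_imputations=20):
    """Estimate (intercept, X, B) coefficients of Y|X,B from n real + m synthetic rows."""
    n = len(y)

    # Step 1: create m synthetic (X, Y) rows; B is missing for these rows.
    # Assumption: X is resampled from the observed X, Y comes from the known Y|X model.
    x_syn = rng.choice(x, size=m, replace=True)
    y_syn = known_coef[0] + known_coef[1] * x_syn + rng.normal(0.0, known_sd, m)

    # Step 2: fit an imputation model for B given (X, Y) on the n complete rows.
    imp_coef, imp_sd = ols(np.column_stack([np.ones(n), x, y]), b)
    syn_design = np.column_stack([np.ones(m), x_syn, y_syn])

    x_all = np.concatenate([x, x_syn])
    y_all = np.concatenate([y, y_syn])

    # Step 3: impute B several times, fit Y|X,B on the combined n+m rows,
    # and average the point estimates across imputations.
    estimates = []
    for _ in range(n_imputations):
        b_syn = syn_design @ imp_coef + rng.normal(0.0, imp_sd, m)
        b_all = np.concatenate([b, b_syn])
        design = np.column_stack([np.ones(n + m), x_all, b_all])
        coef, _ = ols(design, y_all)
        estimates.append(coef)
    return np.mean(estimates, axis=0)


# Simulated example (hypothetical values): Y = 1 + 2 X + 1.5 B + noise, B = 0.5 X + noise.
n = 100
x = rng.normal(size=n)
b = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * b + rng.normal(size=n)

# The implied reduced model is Y|X = 1 + 2.75 X + noise with variance 1.5**2 + 1.
known_coef = np.array([1.0, 2.75])
known_sd = np.sqrt(3.25)

estimate = synthetic_data_fit(x, b, y, known_coef, known_sd, m=1000)
print(estimate)
```

A fuller treatment would draw the imputation model parameters from their posterior and combine variances across imputations with Rubin's rules. The sketch only averages point estimates.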

Authors who are presenting talks have a * after their name.
