Abstract Details

Activity Number: 354 - Topics in Machine Learning
Type: Contributed
Date/Time: Tuesday, July 31, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #327211
Title: Statistical Modeling for Pooling and Analyzing Multi-Site Data Sets Using Maximum Mean Discrepancy
Author(s): Hao Zhou*
Companies: University of Wisconsin Madison
Keywords: maximum mean discrepancy; generative adversarial network; meta analysis; domain adaptation; multi-task learning; multi-site

When sample sizes are small, pooling existing datasets can improve the ability to identify weak (but scientifically interesting) associations between a set of predictors and a response. However, differences between datasets in acquisition methods and in the distribution of participants or observations, especially distributional shifts in some predictors, may obscure real effects when the datasets are combined. We present a rigorous statistical treatment of this problem and identify conditions under which the distributional shift can be corrected. For the setting where the correction is identifiable, we provide an algorithm that is based on maximum mean discrepancy and is related to generative adversarial networks. We analyze properties of the framework relevant to testing model fit, constructing confidence intervals, and evaluating consistency. Our technical development is motivated by Alzheimer's disease (AD) studies, and we present empirical results showing that the framework enables harmonization of protein biomarkers even when the assays differ across sites.
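
For readers unfamiliar with maximum mean discrepancy (MMD), the following is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD with a Gaussian kernel, applied to two simulated "sites". It is not the algorithm described in the abstract, which is not specified here; the function names, the bandwidth value, and the simulated data are all illustrative assumptions.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


# Hypothetical one-dimensional "predictor" measured at two sites.
rng = random.Random(0)
site_a = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
site_b_same = [[rng.gauss(0.0, 1.0)] for _ in range(100)]     # same distribution
site_b_shifted = [[rng.gauss(2.0, 1.0)] for _ in range(100)]  # mean shifted by 2

mmd_same = mmd_squared(site_a, site_b_same)
mmd_shifted = mmd_squared(site_a, site_b_shifted)
print("no shift:", mmd_same)
print("shifted: ", mmd_shifted)
```

The estimate is close to zero when both samples come from the same distribution and grows when one site's distribution is shifted, which is what makes MMD a natural criterion for detecting and correcting distributional shift between sites.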

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program