Abstract:
|
Pooling data from different sources has become popular in many scientific fields. Pooling data from multiple epidemiologic cohort studies to form a consortium is one example. Pooled analysis of cohort consortium data offers several major benefits, including larger sample sizes. However, across-cohort heterogeneity (CHet) can be a major statistical challenge when using data from different studies, and a key question is when cohorts are too different to be combined reliably. A formal framework articulating CHet is lacking. Here, we formalize the concept of CHet in the context of cohort consortia and discuss its implications for health research. We discuss three types of CHet, defined in terms of the distributions of pertinent variables: heterogeneity in the joint distribution, heterogeneity in the conditional distribution, and heterogeneity in the exposure-outcome effects. Using the ECHO consortium for child health, we examine scenarios in which CHet influences the estimands of descriptive, prediction, and association/causation studies. Our framework highlights a new field for detecting and resolving CHet with respect to the diverse goals of statistical practice.
|