Abstract:
|
Humans are exposed to complex mixtures of chemicals as part of daily life. Statistical models for studying the health effects of these exposures aim to achieve several goals, including identifying important individual chemicals and estimating the individual, interaction, or joint effects of correlated chemicals. Popular models include generalized linear models with variable selection, Bayesian kernel machine regression, quantile g-computation, and factor models. However, no single model best addresses all of these goals. In practice, the model choice depends on the characteristics of the problem: the dimensionality of the mixture, the degree of multicollinearity, the sample size, the effect size, the noise level, and the hypothesized dose-response surface. We simulate datasets that vary these characteristics and estimate each model’s power to detect the individual chemicals or combinations of chemicals critical to the outcome. By comparing the models’ power, we identify the data settings in which each model demonstrates strengths and weaknesses, providing a guideline for model choice.
|