Conference Program

All Times ET

Wednesday, June 8
Computational Statistics
Machine Learning
Practice and Applications
Modeling + Non-Parametric Methods
Wed, Jun 8, 1:15 PM - 2:45 PM
Fayette
 

Non-parametric identification and estimation of interactions using stochastic intervention target parameters: implications for mixed exposure analysis. (310119)

Alan Hubbard, University of California Berkeley 
*David Brenton McCoy, University of California Berkeley 
Mark van der Laan, Graduate Group in Biostatistics and Center for Computational Biology, UC Berkeley 

Keywords: Targeted Learning, Mixed Exposures, Causal Inference, Non-parametric Statistics

There are many regression-based statistical methods for analyzing the effects of mixed environmental exposures on human health. This reliance on regression makes it difficult to efficiently estimate the effect of a joint exposure with complex relationships, and the results are often hard to interpret. Novel nonparametric methods with an interpretable target parameter for interactions are needed to ensure robust estimation of a joint exposure. Because it is not known a priori which interactions exist, the full data must be used both to identify interacting variables and to estimate a target parameter on these variables. To meet this challenge, we use a V-fold cross-validation framework to estimate a data-adaptive parameter in the training folds and a non-parametric interaction target parameter in the estimation folds. Our data-adaptive parameters are the variable sets used in the basis functions of the best-fitting multivariate adaptive regression spline model. The best-fitting model is determined using a Super Learner, which selects from an ensemble the model with the lowest cross-validated mean squared error (MSE). Variable sets are considered important based on ANOVA-like variance decompositions of the basis functions in the best-fitting model. Individual variables and variable sets that are used in all the training folds are considered consistent predictors. The interaction target parameter is applied to variable sets composed of two variables in the mixed exposure. This target parameter compares the expected outcome under a dual shift of both variables by some delta with the sum of the individual shifts. Other parameters exist for effect modification and individual variable shifts. Cross-validated targeted minimum loss-based estimation (TMLE) is used to update the initial expected outcomes given stochastic shift interventions. This method, called SuperNOVA, guarantees consistency, efficiency, and multiple robustness. SuperNOVA provides researchers with V-fold-specific and pooled results for each target parameter.
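To make the interaction target parameter concrete, the following is a minimal sketch of one plausible reading of it: the effect of shifting both exposures by delta, minus the sum of the effects of shifting each exposure alone. It is a plain plug-in contrast, not the cross-validated TMLE described in the abstract and not the SuperNOVA implementation. The names `outcome_regression` and `interaction_contrast`, the simulated data, and the coefficients are all hypothetical; `outcome_regression` stands in for a fitted estimate of the expected outcome given two exposures and a covariate.

```python
import random


def outcome_regression(a1, a2, w):
    # Hypothetical stand-in for a fitted outcome regression E[Y | A1, A2, W].
    # The 0.5 * a1 * a2 term is the only source of interaction.
    return 1.0 + a1 + 2.0 * a2 + 0.5 * a1 * a2 + w


def interaction_contrast(rows, q, delta):
    """Plug-in contrast: joint-shift effect minus the sum of single-shift effects.

    rows  -- iterable of (a1, a2, w) observations
    q     -- outcome regression, called as q(a1, a2, w)
    delta -- amount by which each exposure is shifted (assumed equal for both)
    """
    total = 0.0
    for a1, a2, w in rows:
        baseline = q(a1, a2, w)
        joint = q(a1 + delta, a2 + delta, w) - baseline
        only_first = q(a1 + delta, a2, w) - baseline
        only_second = q(a1, a2 + delta, w) - baseline
        total += joint - (only_first + only_second)
    return total / len(rows)


# Simulated exposures and covariate (illustrative data only).
rng = random.Random(0)
rows = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]

psi = interaction_contrast(rows, outcome_regression, delta=0.2)
print(round(psi, 6))
```

For a purely additive outcome regression the contrast is zero, so a nonzero value indicates that the joint shift does something the two individual shifts together do not. In this toy regression the contrast reduces to the product-term coefficient times delta squared.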