Abstract:
|
Along with its many challenges, Big Data present exciting opportunities to better understand risk factors, build improved predictors, and elucidate causal relationships. Still, there are many sources of association between two variables: direct effects, indirect effects, measured confounding, unmeasured confounding, and selection bias. Methods to distinguish causation from correlation are perhaps more pressing now than ever. We introduce a roadmap for translating a causal query into a statistical analysis: 1) a clear statement of the scientific question; 2) definition of the causal model and causal parameter; 3) assessment of identifiability, that is, linking the causal effect to a parameter estimable from the observed data distribution; 4) choice and implementation of estimators, including parametric and semi-parametric approaches; and 5) interpretation of findings. This framework ensures that the parameters being estimated match the questions posed, makes explicit the assumptions needed to interpret an estimate causally, and, when those assumptions are not met, provides guidance for future research. These concepts are illustrated with an application to HIV prevention and treatment in East Africa.
|