Multiple imputation is now well established as a practical and flexible method for analysing partially observed data under the missing at random assumption. However, in large datasets there are concerns about how to preserve heterogeneity in the relationships between variables during the imputation process.
Building on recent work, we describe an imputation model (and R software) that allows the covariance matrix of the variables to vary randomly across higher-level units, which may represent, for example, health districts or hospitals.
We further show how this approach can be adapted to (i) impute data consistent with the interactions and non-linear effects under investigation, (ii) include weights when the substantive model is weighted, and (iii) incorporate external information when available.
We illustrate with an example from the UK Clinical Practice Research Datalink.