Abstract:
|
Missing data occur both when constructing and when applying a prediction model. The manner in which missing data are handled in each phase can substantially affect the model's performance. Even a perfectly constructed, highly predictive model can perform poorly if missing data are not handled properly in the application phase. Traditional solutions for handling missing data during model construction, such as multiple imputation (MI), are impractical, computationally intensive, or hard to standardize, and can compromise predictive ability when the missing data patterns differ between the two phases (as is often seen in practice). During application, the sample mean is often imputed when the model is used in the clinic, which can lead to poor predictive performance. In this talk we frame the problem and propose several strategies for maintaining the predictive ability of a given model when it is used in a population with consequential missing data. We argue that a special, suitably constructed pattern mixture model is the best general approach for handling missing data in prediction problems. The selection model formulation can do well, but tends to perform poorly in many practical situations.
|