Abstract:
Logistic regression models are often used to predict a binary outcome of interest from a set of selected predictor variables, or covariates. In many studies, however, the true binary outcome is assessed with an imperfect test, and because of cost constraints only a subsample is validated with a gold-standard (error-free) method. A penalized-likelihood variable selection algorithm can be used to select predictor variables and consistently estimate the odds ratios at the same time. However, the discriminative power and classification accuracy (sensitivity, specificity, and area under the ROC curve, AUC) of the resulting logistic regression prediction models cannot be measured directly without knowing the true outcome status. Naive methods that ignore outcome misclassification can give biased estimates of classification accuracy. This paper proposes consistent estimators of the classification accuracy of the selected logistic models when outcomes are subject to misclassification. An implementation using publicly available R packages and an EM-based algorithm will be discussed.
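The simulation below illustrates why naive accuracy estimates are biased and how a validation subsample can correct them. It is a minimal sketch under stated assumptions, not the EM-based method proposed in the paper. It swaps in a simpler mean-score (plug-in) correction: validated subjects keep their gold-standard outcome, and every other subject's outcome is replaced by P(Y=1 | S, D), estimated within each (test result, prediction) cell of the validation subsample. This is valid here because sensitivity and specificity depend only on E[Y·D] and E[Y], and because the validation subsample is assumed to be drawn completely at random. All names (`simulate`, `accuracy`, `corrected_accuracy`), the model coefficients, the test's sensitivity (0.85) and specificity (0.90), the 20% validation fraction, and the 0.5 risk threshold are hypothetical. For simplicity the classifier uses the true risk rather than a fitted model. The example does not cover AUC.

```python
import math
import random


def simulate(n, seed=2024, test_sens=0.85, test_spec=0.90, validate_frac=0.2):
    """Hypothetical study data as (y, s, d, v) tuples.

    y: true outcome (gold standard), s: error-prone test result,
    d: classifier prediction, v: True if y was validated.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        risk = 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * x)))  # assumed logistic model
        y = 1 if rng.random() < risk else 0
        # Nondifferential misclassification: error rates depend only on y
        if y == 1:
            s = 1 if rng.random() < test_sens else 0
        else:
            s = 0 if rng.random() < test_spec else 1
        d = 1 if risk > 0.5 else 0  # predict "positive" when risk exceeds 0.5
        v = rng.random() < validate_frac  # validation subsample drawn at random
        rows.append((y, s, d, v))
    return rows


def accuracy(labels, preds):
    """Sensitivity and specificity of preds against (possibly fractional) labels."""
    tp = sum(y * d for y, d in zip(labels, preds))
    tn = sum((1 - y) * (1 - d) for y, d in zip(labels, preds))
    pos = sum(labels)
    return tp / pos, tn / (len(labels) - pos)


def corrected_accuracy(rows):
    """Mean-score correction: validated rows keep y, others get P(Y=1 | S, D)."""
    cells = {}
    for y, s, d, v in rows:
        if v:
            total, positives = cells.get((s, d), (0, 0))
            cells[(s, d)] = (total + 1, positives + y)
    # Estimated P(Y=1 | S=s, D=d) from the validated subjects in each cell
    q = {cell: pos / tot for cell, (tot, pos) in cells.items()}
    labels = [y if v else q[(s, d)] for y, s, d, v in rows]
    return accuracy(labels, [d for _, _, d, _ in rows])


rows = simulate(50_000)
preds = [d for _, _, d, _ in rows]
oracle = accuracy([y for y, _, _, _ in rows], preds)  # needs y for everyone
naive = accuracy([s for _, s, _, _ in rows], preds)   # treats s as the truth
corrected = corrected_accuracy(rows)                   # uses y only where v is True
print("oracle    sens=%.3f spec=%.3f" % oracle)
print("naive     sens=%.3f spec=%.3f" % naive)
print("corrected sens=%.3f spec=%.3f" % corrected)
```

In this setup the naive estimates, which treat the imperfect test result as the truth, understate both sensitivity and specificity. The corrected estimates track the oracle values, which would require gold-standard outcomes for every subject.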