Abstract:
|
One challenge in using electronic health records (EHRs) for research is that an individual's true phenotype status is not observed directly and must be inferred from information available in the EHR. Phenotyping algorithms are often used to define cases and controls, but it is difficult to balance the accuracy of phenotype classification against sample size. We explore an estimating equation (EE) approach that allows for more relaxed phenotype definitions and corrects the bias introduced by case contamination. Our approach relies on drawing a validation subset from a contaminated case pool and training a phenotyping model to distinguish cases from non-cases. Through simulation studies, we assess the performance of the EE method for bias correction, its robustness to specification of the phenotyping model, and its performance when the phenotyping model is fit using high-dimensional data methods. Finally, we apply the method to an EHR-based study of dilated cardiomyopathy. We find that our method outperforms other methods used for bias correction and can perform well even when high-dimensional data methods are necessary to fit the phenotyping model.
|