Abstract:
|
Large epidemiologic studies often rely on error-prone data sources, such as those that rely on electronic health records. Errors in even a single covariate can bias multiple regression coefficients, including the coefficients of precisely measured variables. Error-prone outcome variables can be an additional source of bias, particularly when the error is related to other regression variables. Validating a subsample of records can be a practical way to obtain data on the nature of the errors, which can inform statistical adjustment methods that avoid error-induced bias. Design-based estimation methods are attractive in settings where errors in multiple variables may be too complex to model reliably. The efficiency of these estimators can be improved by sampling more informative subjects into the validation subset. This talk will present strategies to improve the efficiency of design-based estimators, including generalized raking, multi-wave sampling, and the application of the multi-frame approach of Metcalf and Scott (2009) to accommodate multiple outcomes of interest. Concepts are demonstrated with numerical studies and an application to real data.
|