Abstract:
|
Statisticians are required to perform rigorous data analysis, but it is also their duty to make sure beforehand that their data are sound. The aim of this project was to build a data validation program to ensure the data quality of a prospective cohort database on systemic autoimmune rheumatic diseases that includes 253 patients. A SAS program was built to identify potential data entry errors and to generate an error report for each patient. A total of 152 validation conditions were created. Of the 1579 potential errors identified, 91% were real and were corrected. The most frequent types of errors were data entered in the wrong field or in an incorrect format (30%), inconsistencies between pairs of variables (25%), missing data (13%), and extreme values (12%). Many of these errors could have been prevented. Following this process, the data entry guide was enhanced and data entry personnel were informed about frequent problems. Data quality was much improved after the validation program was implemented, and validation remains an ongoing process. Statisticians should take a leading role in database development, data entry, and data validation in order to raise research teams' awareness of data quality.
|