Keywords: Data quality/integrity, data anomalies, error detection, clinical investigator site selection
On-site inspections are important to ensure the quality of trial data and the reliability of trial results submitted to the U.S. Food and Drug Administration. With the increasing size and complexity of clinical trials, statistical tools are needed to assist the site selection process and identify potentially problematic sites. We describe our experience with a centralized statistical monitoring platform as part of a Cooperative Research and Development Agreement (CRADA) between CluePoints and the FDA. The centralized statistical monitoring approach employed in the CRADA is based on a large number of statistical tests performed on all submitted subject-level data, in order to identify sites that differ from the others. An overall data inconsistency score is calculated from a high-dimensional p-value matrix to assess the inconsistency between the data from one site and the data from all sites. Sites are ranked by the data inconsistency score (-log(p), where p is an aggregated p-value). Operationally, only the highest-ranked and larger sites are recommended for inspection. Results from one deidentified application are provided to demonstrate the typical data anomaly findings from the Statistical Monitoring Applied to Research Trials (SMART) analysis. Sensitivity analyses are performed after excluding laboratory data and questionnaire data. Graphics from deidentified subject-level trial data are provided to illustrate abnormal data patterns, and possible causes of data anomalies are discussed. This data-driven approach can be effective and efficient in selecting sites that exhibit data anomalies, and it provides insights to statistical reviewers for conducting sensitivity analyses, subgroup analyses, and site-by-treatment effect explorations. However, challenges remain with messy data and with the lack of conformance to data standards.
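The ranking step described above can be sketched in a few lines. This is a minimal illustration, not the SMART implementation: the abstract does not specify how the p-value matrix is aggregated, so Stouffer's method is assumed here purely for illustration, and the site names and p-values are hypothetical.

```python
import math
from statistics import NormalDist

def inconsistency_scores(pvals_by_site):
    """Aggregate each site's p-values into one p-value (Stouffer's
    method, an assumed choice; SMART's aggregation is unspecified here)
    and return -log10(aggregated p) as the data inconsistency score."""
    nd = NormalDist()
    scores = {}
    for site, pvals in pvals_by_site.items():
        # Clip p-values away from 0 and 1 so the inverse CDF stays finite.
        clipped = [min(max(p, 1e-12), 1 - 1e-12) for p in pvals]
        # Stouffer: convert each p to a z-score, combine, convert back.
        z = sum(nd.inv_cdf(1 - p) for p in clipped) / math.sqrt(len(clipped))
        p_agg = min(max(1 - nd.cdf(z), 1e-300), 1.0)
        scores[site] = -math.log10(p_agg)
    return scores

# Hypothetical data: site_B returns consistently extreme test results,
# so it ranks above the unremarkable site_A.
scores = inconsistency_scores({
    "site_A": [0.6, 0.4, 0.7, 0.5],
    "site_B": [0.001, 0.02, 0.005, 0.01],
})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under this sketch, sites whose many per-test p-values are systematically small receive large inconsistency scores and rise to the top of the inspection list, matching the operational rule described above.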