Online Program

Thursday, September 13
Thu, Sep 13, 2:45 PM - 4:00 PM
Lincoln 5
Investigating Data Anomalies

Central Statistical Monitoring of Clinical Trials for Real-Time Detection of Data Anomalies (300769)

*Marc Buyse, CluePoints 

Keywords: Central statistical monitoring, data fraud, data inconsistencies, data quality, multicenter trials

Central statistical monitoring (CSM) has been advocated as an efficient way to ensure data quality of multicenter clinical trials. This talk will outline some key principles of CSM, and will discuss statistical models for the detection of centers with anomalous data. Mixed effects models and beta-binomial models are used to compare each center with all other centers in order to allow for the expected heterogeneity between centers. The use of these models will be illustrated with examples from actual trials.

CluePoints has developed a general, unsupervised approach to CSM that systematically compares all the data of patients from one center with the data of patients from all other centers. Statistical tests are performed using mixed effects models and beta-binomial models. The P-values of all these tests are summarized in a so-called “data inconsistency score” (DIS). Centers with a very high DIS should typically be scrutinized for further evidence of misunderstanding, sloppiness, data tampering, or fraud.
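The center-versus-rest comparison described above can be illustrated with a minimal sketch. It is not the CluePoints implementation. As a simplifying assumption, a Welch-type z-test on means stands in for the mixed effects and beta-binomial models. The score is taken to be the mean of -log10(P) across variables, which is a hypothetical choice because the abstract does not give the DIS formula. The names `center_vs_rest_pvalue`, `inconsistency_scores`, and the toy data are invented for illustration.

```python
import math
from statistics import mean, variance

def center_vs_rest_pvalue(center_values, other_values):
    """Two-sided P-value of a Welch-type z-test (normal approximation).

    Stand-in for the mixed effects / beta-binomial tests named in the text.
    """
    se = math.sqrt(variance(center_values) / len(center_values)
                   + variance(other_values) / len(other_values))
    diff = mean(center_values) - mean(other_values)
    if se == 0:
        return 1.0 if diff == 0 else 0.0
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(diff / se) / math.sqrt(2))

def inconsistency_scores(data):
    """Map each center to the mean of -log10(P) over its variables.

    `data` has the shape {center: {variable: [patient values]}}.
    The aggregation rule is an assumption, not the published DIS.
    """
    scores = {}
    for center, variables in data.items():
        logs = []
        for name, values in variables.items():
            # Pool the same variable from every other center.
            others = [x for c, vs in data.items() if c != center for x in vs[name]]
            p = center_vs_rest_pvalue(values, others)
            logs.append(-math.log10(max(p, 1e-300)))  # guard against log(0)
        scores[center] = mean(logs)
    return scores

# Toy data (invented): center D reports implausibly high, tightly clustered
# systolic blood pressure values.
data = {
    "A": {"sbp": [118, 125, 131, 122, 140, 128], "weight": [70, 82, 65, 91, 77, 68]},
    "B": {"sbp": [121, 135, 127, 119, 130, 138], "weight": [74, 66, 88, 79, 71, 83]},
    "C": {"sbp": [124, 129, 117, 133, 126, 136], "weight": [69, 85, 76, 72, 90, 64]},
    "D": {"sbp": [160, 162, 158, 161, 159, 163], "weight": [75, 78, 73, 80, 76, 74]},
}
scores = inconsistency_scores(data)
for center, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(center, round(score, 2))
```

In this toy data set, center D receives by far the highest score. Note that a test on means alone would miss anomalies that affect only the variance or the correlation structure, which is one reason a real system would run many different tests per variable.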

The operating characteristics of this approach have been studied through simulations. First, data from an actual clinical trial were contaminated in a given number of patients from a given number of centers. The DIS had very high specificity in all situations, and satisfactory sensitivity in situations likely to arise in practice, specifically when several variables were contaminated for several patients in a single center. Second, bivariate data were completely fabricated by the investigator in one center to mimic data coming from a known bivariate distribution in the other centers. Here again, the DIS had very high specificity and sensitivity, which suggests that fabricated data are hard to make statistically indistinguishable from genuine data.
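The first kind of simulation can be sketched as follows. This is a rough illustration under stated assumptions, not the study that was actually run: the clean data are simulated standard normal values rather than real trial data, contamination is a simple additive shift, the score is the mean of -log10(P) from a z-test on means, and the flagging threshold of 2.0 is arbitrary. All function names and parameters are hypothetical.

```python
import math
import random
from statistics import mean, variance

def p_center_vs_rest(center, rest):
    """Two-sided P-value of a Welch-type z-test (stand-in for the real models)."""
    se = math.sqrt(variance(center) / len(center) + variance(rest) / len(rest))
    z = (mean(center) - mean(rest)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def center_scores(data):
    """data[c][v] is the list of patient values for variable v in center c."""
    out = []
    for c, variables in enumerate(data):
        logs = []
        for v, values in enumerate(variables):
            rest = [x for k, other in enumerate(data) if k != c for x in other[v]]
            logs.append(-math.log10(max(p_center_vs_rest(values, rest), 1e-300)))
        out.append(mean(logs))  # assumed aggregation, not the published DIS
    return out

def simulate(n_runs=50, n_centers=10, n_patients=30, n_vars=5,
             bad_vars=3, bad_patients=20, shift=3.0, threshold=2.0, seed=1):
    """Contaminate center 0 in each run; return (sensitivity, specificity)."""
    rng = random.Random(seed)
    hits = 0       # runs in which the contaminated center is flagged
    clean_ok = 0   # clean centers correctly left unflagged
    for _ in range(n_runs):
        data = [[[rng.gauss(0, 1) for _ in range(n_patients)]
                 for _ in range(n_vars)] for _ in range(n_centers)]
        # Several variables contaminated for several patients in one center.
        for v in range(bad_vars):
            for i in range(bad_patients):
                data[0][v][i] += shift
        s = center_scores(data)
        hits += s[0] > threshold
        clean_ok += sum(x <= threshold for x in s[1:])
    return hits / n_runs, clean_ok / (n_runs * (n_centers - 1))

sensitivity, specificity = simulate()
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Lowering `bad_vars`, `bad_patients`, or `shift` shows how sensitivity degrades when contamination is sparse, which mirrors the qualitative finding reported above.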