Online Program

Thursday, September 13
Thu, Sep 13, 2:45 PM - 4:00 PM
Lincoln 5
Investigating Data Anomalies

Understanding the Individual Contributions to Multivariate Outliers in Assessments of Data Quality (300687)

Laura Castro-Schilo, JMP Division, SAS Institute 
Jianfeng Ding, JMP Division, SAS Institute 
*Richard C. Zink, TARGET PharmaSolutions 

Keywords: Data Visualization, Principal Components, Risk-Based Approaches, Statistical Monitoring

Mahalanobis distance is often recommended for identifying patients or clinical sites that are noteworthy in clinical trials. Patients who are extreme in one or more covariates may be considered outliers in that they lie some distance from the multivariate mean, which can be thought of as the center of the data cloud. Less often discussed are inliers: patients whose data are believed to be “too good to be true” and who are located near the centroid. To investigate these anomalies efficiently for potential lapses in data quality, it is important to understand how the individual variables contribute to each multivariate outlier. We describe how to identify multivariate inliers and outliers and how to summarize the contributions of variables to facilitate further review. We illustrate these methodologies using data from a multicenter clinical trial.
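
The abstract does not say how the per-variable contributions are computed, so the following is only a minimal sketch of one possible approach, not the authors' method. It relies on the identity that the squared Mahalanobis distance d² = zᵀS⁻¹z can be written as the sum over variables of z_j · (S⁻¹z)_j, where z is a patient's deviation from the multivariate mean and S is the sample covariance matrix. The function names and the toy two-variable data are hypothetical and are not drawn from the trial described above. The code uses only the Python standard library.

```python
from statistics import fmean


def column_means(rows):
    """Mean of each variable (column)."""
    return [fmean(col) for col in zip(*rows)]


def covariance(rows, means):
    """Sample covariance matrix with an n - 1 denominator."""
    n, p = len(rows), len(means)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def mahalanobis_contributions(rows):
    """Return (d_squared, per_variable_contributions) for each row.

    Assumed decomposition (not taken from the abstract):
    contribution_j = z_j * (S^-1 z)_j, which sums to d_squared.
    """
    means = column_means(rows)
    cov = covariance(rows, means)
    results = []
    for row in rows:
        dev = [x - m for x, m in zip(row, means)]
        weights = solve(cov, dev)  # S^-1 z without forming the inverse
        contrib = [d * w for d, w in zip(dev, weights)]
        results.append((sum(contrib), contrib))
    return results


# Hypothetical toy data: two covariates for six patients.
rows = [
    [120.0, 80.0],
    [125.0, 82.0],
    [118.0, 79.0],
    [130.0, 85.0],
    [122.0, 81.0],
    [160.0, 70.0],
]
results = mahalanobis_contributions(rows)
for patient, (d2, contrib) in enumerate(results):
    print(patient, round(d2, 3), [round(c, 3) for c in contrib])
```

Sorting patients by d² puts candidate outliers at the top and candidate inliers at the bottom; any cutoff for flagging either group is a separate choice that this sketch does not make. When variables are correlated, individual contributions under this decomposition can be negative, so they are best read as signed shares of d² rather than as independent per-variable distances.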