Abstract:
|
Using data collected by a third party makes all subsequent observations dependent on any data quality anomalies present in that data. With the advent of large, purchasable datasets used to make informed statistical decisions, it becomes paramount to understand how to detect and quantify the extent of anomalous data before analysis can begin. The most powerful methodology for detecting and correcting data quality anomalies within a dataset requires knowledge of the third party's business practices, which is often hard to acquire. Therefore, global and widely applicable methods for quantifying anomalies within a given dataset become more important. This paper explores quantifying data quality anomalies through statistical analysis checks applied to different types of data and presents a metric for evaluating purchased administrative data.
|