37 – Statistical Process Control and Quality Assurance
Dataset Quality Assessment for Business Analytics Use
James Wendelberger
Urban Science
Companies receive electronic datasets for use in their business products and services. These datasets do not always contain the data that is expected, so assessing dataset quality before use is important. We define an assessment process that consists of five steps. Step one checks the reasonableness of the file to give a "big picture" view of the dataset: it confirms basic file information to verify that the file and the format of its contents are as expected. The remaining four steps test, in turn, individual data values, the distributions of individual variables, the multivariate distributions of variables, and the likelihood that the data is correct given external information, constraints, or assumptions about the data. We present examples of each of these assessments. A dataset that passes the data quality assessment tests is accepted. A dataset that fails one or more tests is either corrected or imputed or, in the extreme case, requested anew from the vendor. When a new dataset is requested, the specific test failures may be disclosed to the vendor.
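The five steps can be illustrated with a minimal Python sketch. It is not the author's implementation: the column names (`dealer_id`, `units_sold`, `revenue`), the thresholds, and the externally reported total are all hypothetical, chosen only to show what one check of each kind might look like.

```python
import csv
import io
import statistics

# Hypothetical schema for an incoming vendor file (an assumption, not from the abstract).
EXPECTED_COLUMNS = ["dealer_id", "units_sold", "revenue"]


def check_file(text, expected_columns, min_rows=1):
    """Step 1: big-picture check that the header and row count are as expected."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    ok = reader.fieldnames == expected_columns and len(rows) >= min_rows
    return rows, ok


def check_values(rows):
    """Step 2: every individual value parses and is non-negative."""
    for row in rows:
        try:
            if int(row["units_sold"]) < 0 or float(row["revenue"]) < 0:
                return False
        except (TypeError, ValueError):
            return False
    return True


def check_univariate(rows, median_range=(1, 500)):
    """Step 3: a summary of one variable's distribution falls in a plausible range."""
    median_units = statistics.median(int(r["units_sold"]) for r in rows)
    return median_range[0] <= median_units <= median_range[1]


def check_multivariate(rows, price_range=(5.0, 50.0)):
    """Step 4: variables are jointly plausible (revenue per unit within a price band)."""
    for r in rows:
        units, revenue = int(r["units_sold"]), float(r["revenue"])
        if units == 0:
            if revenue != 0:
                return False
        elif not price_range[0] <= revenue / units <= price_range[1]:
            return False
    return True


def check_external(rows, reported_total_units, tolerance=0.05):
    """Step 5: the dataset total agrees with an externally reported figure."""
    total = sum(int(r["units_sold"]) for r in rows)
    return abs(total - reported_total_units) <= tolerance * reported_total_units


def assess(text, reported_total_units):
    """Run the five steps in order and stop at the first failure."""
    results = {}
    rows, results["file"] = check_file(text, EXPECTED_COLUMNS)
    if not results["file"]:
        return results
    steps = [
        ("values", lambda: check_values(rows)),
        ("univariate", lambda: check_univariate(rows)),
        ("multivariate", lambda: check_multivariate(rows)),
        ("external", lambda: check_external(rows, reported_total_units)),
    ]
    for name, step in steps:
        results[name] = step()
        if not results[name]:
            break
    return results


sample = "dealer_id,units_sold,revenue\nA,10,200.0\nB,20,380.0\nC,30,630.0\n"
print(assess(sample, reported_total_units=60))
```

In this sketch, the name of the first failed step is the kind of detail that could be passed back to the vendor with a new dataset request. Stopping at the first failure is a design choice of the sketch; a real pipeline might instead run every test and report all failures together.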