Abstract:
Statistical agencies and other organizations that collect and process data are often faced with data files that contain faulty values. When these errors result in inconsistencies, agencies usually correct them through a process known as edit-imputation. The dominant paradigm, due to Fellegi and Holt (1976), separates the task into an error localization phase and an imputation phase, and is based on finding the minimal set of changes needed to make each record consistent. While this approach has the advantage of minimizing changes to the original data, it can produce biased estimates, as it ignores the distribution of the data during error localization. In this talk I introduce a new procedure for edit-imputation of categorical data based on joint modeling. This model includes a flexible representation for the underlying true values, with support only on the consistent responses; a model for the location of errors; and a model for the observed faulty data. Estimation of all components is performed simultaneously using MCMC sampling. Through challenging data-based simulations I show how this method can deliver results superior to those obtained from the application of the Fellegi-Holt approach.
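To make the minimal-change idea behind error localization concrete, the following is a small brute-force sketch in Python. It is not the procedure from the talk or an implementation of the Fellegi-Holt algorithm as used in practice; the variables, category levels, edit rules, and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations, product

# Hypothetical toy variables and their category levels (not from the talk).
LEVELS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["no", "yes"],
}


def is_consistent(record):
    """Hypothetical edit rules: a child cannot be married or employed."""
    if record["age"] == "child" and record["marital"] == "married":
        return False
    if record["age"] == "child" and record["employed"] == "yes":
        return False
    return True


def localize_errors(record):
    """Return every smallest set of fields whose values can be changed
    so that the record passes all edit rules (brute-force search)."""
    fields = list(LEVELS)
    # Try field subsets in order of increasing size; stop at the first
    # size for which at least one subset can repair the record.
    for k in range(len(fields) + 1):
        solutions = []
        for subset in combinations(fields, k):
            # Enumerate all value assignments for the chosen fields.
            for values in product(*(LEVELS[f] for f in subset)):
                candidate = dict(record, **dict(zip(subset, values)))
                if is_consistent(candidate):
                    solutions.append(set(subset))
                    break
        if solutions:
            return solutions
    return []


# A record violating both rules: only changing "age" repairs it with one edit.
bad = {"age": "child", "marital": "married", "employed": "yes"}
unique_fix = localize_errors(bad)

# A record violating one rule: two different one-field repairs are possible.
ambiguous = {"age": "child", "marital": "married", "employed": "no"}
tied_fixes = localize_errors(ambiguous)
```

In the second toy record the minimal-change criterion cannot choose between blaming "age" and blaming "marital", and it makes no use of how often each combination occurs in the data. This is the kind of situation in which ignoring the data distribution during error localization may matter.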
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.