Automatic editing of tick-box data in business collections
*Kin Hoong Chung, Australian Bureau of Statistics 

Keywords: Categorical, business survey, CANCEIS, donor imputation

Categorical data is difficult to edit efficiently with selective editing methods, especially data from tick-box questions (questions where the respondent chooses an answer by ticking a box). Each tick-box is a Bernoulli variable, so the influence of any single response on the estimate is at most its estimation weight. Such data therefore lacks the small number of very large values that efficient selective editing relies on.
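
The bound on influence can be illustrated with a short Python sketch. The weights and reported values are hypothetical, chosen only for illustration, and the influence measure (weight times absolute change in the response) is one simple choice assumed here, not necessarily the score used in any ABS system.

```python
def influence(weight, reported, expected):
    """Absolute change in a weighted total if `reported` replaces `expected`."""
    return weight * abs(reported - expected)

# Hypothetical estimation weights for three businesses (illustrative only).
weights = [10.0, 25.0, 40.0]

# Tick-box (0/1) responses: the worst case is a flipped box, |1 - 0| = 1,
# so the influence can never exceed the unit's weight.
tickbox_influence = [influence(w, 1, 0) for w in weights]

# Quantitative response: a single large reporting error (hypothetical values)
# can dominate the estimate even for a unit with a small weight.
turnover_influence = influence(10.0, 5_000_000, 50_000)

print(tickbox_influence)   # [10.0, 25.0, 40.0]
print(turnover_influence)  # 49500000.0
```

Because every tick-box error is capped at the weight, ranking units by influence does not isolate a handful of records that account for most of the potential error, which is the situation selective editing relies on.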

Because the ABS is collecting more categorical business characteristics data, it needs an alternative to selective editing to edit this data efficiently. The obvious solution is automatic editing and imputation. After assessing several automatic editing systems, we selected CANCEIS for further study because we wanted to examine the performance of its donor imputation method on tick-box business survey data and, to a lesser extent, on mixed categorical and quantitative data.
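
As a rough illustration of the general idea behind donor imputation, the Python sketch below copies a missing tick-box value from the clean record that agrees with the recipient on the most matching fields. It is a simplified nearest-neighbour sketch under stated assumptions and does not reproduce the CANCEIS algorithm. The field names, records, and the simple mismatch-count distance are all hypothetical.

```python
def mismatch_count(a, b, fields):
    """Number of matching fields on which two records disagree."""
    return sum(a[f] != b[f] for f in fields)

def impute_from_nearest_donor(recipient, donors, match_fields, impute_fields):
    """Return a copy of `recipient` with `impute_fields` taken from the
    donor that is closest on `match_fields`."""
    donor = min(donors, key=lambda d: mismatch_count(recipient, d, match_fields))
    repaired = dict(recipient)
    for f in impute_fields:
        repaired[f] = donor[f]
    return repaired

# Hypothetical clean (donor) records; all field names are illustrative.
donors = [
    {"industry": "A", "size": "small", "uses_internet": 1, "has_website": 0},
    {"industry": "B", "size": "large", "uses_internet": 1, "has_website": 1},
]

# Hypothetical recipient with one tick-box response missing.
recipient = {"industry": "B", "size": "large", "uses_internet": 1, "has_website": None}

repaired = impute_from_nearest_donor(
    recipient,
    donors,
    match_fields=["industry", "size", "uses_internet"],
    impute_fields=["has_website"],
)
print(repaired["has_website"])  # 1
```

A production system would also have to check that the imputed record satisfies the edit rules and handle ties between equally close donors. Both are omitted here.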

In this paper, we present the results of evaluating CANCEIS on a subset of the Business Characteristics Survey data. We conclude by discussing the issues and unresolved questions concerning the use of CANCEIS on business characteristics data.