An Adaptive Approach to Hidiroglou-Berthelot Outlier Detection
*Matthew C Nelson, US Census Bureau
Keywords: outlier detection, Hidiroglou-Berthelot, periodic surveys, maximum likelihood logistic regression, ratio edit
The Hidiroglou-Berthelot (H-B) ratio edit is a common method of bivariate outlier detection within establishment surveys. Among its strengths is the ability to vary acceptance thresholds depending on the size of a given case. Yet parameterization requires adjustment of up to three variables, and determining optimal parameters is not a straightforward process. Moreover, outliers are identified in a strictly binary sense, with no allowance for uncertainty about whether a case is an outlier. In this paper I propose an adaptive variation of H-B, in which outlier probabilities are modeled against transformed terms of the H-B formula using maximum likelihood logistic regression. The model parameters are then trained using the results of supervised learning (user classification). I investigate several logistic models, considering how well they replicate H-B decision boundaries and how efficiently they can be trained. The goal of this research is to produce a flexible, continuously learning variation of H-B that still employs the terms of the original formula. The proposed methodology is demonstrated using example scenarios.
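
For readers unfamiliar with the baseline method, the following is a minimal sketch of the standard H-B ratio edit as it is commonly formulated (Hidiroglou and Berthelot, 1986), not the adaptive variation proposed in this paper. The function name `hb_edit`, the default parameter values, and the use of NumPy's default percentile interpolation are assumptions made for illustration. The three tuning parameters referred to above are taken to be U (size exponent), A (guard against quartiles that sit too close to the median), and C (width of the acceptance region). Strictly positive values are assumed in both periods.

```python
import numpy as np

def hb_edit(x_prev, x_curr, U=0.5, A=0.05, C=4.0):
    """Sketch of the classic H-B ratio edit (illustrative defaults).

    Returns the size-adjusted effects E and a boolean outlier flag per case.
    Assumes all values in x_prev and x_curr are strictly positive.
    """
    x_prev = np.asarray(x_prev, dtype=float)
    x_curr = np.asarray(x_curr, dtype=float)

    # Period-to-period ratio and its median.
    r = x_curr / x_prev
    r_med = np.median(r)

    # Symmetric transformation centred on the median ratio.
    s = np.where(r < r_med, 1.0 - r_med / r, r / r_med - 1.0)

    # Size adjustment: larger cases get proportionally tighter ratio bounds.
    E = s * np.maximum(x_prev, x_curr) ** U

    # Quartile-based distances, guarded by |A * median| against collapse.
    e_q1, e_med, e_q3 = np.percentile(E, [25, 50, 75])
    d_q1 = max(e_med - e_q1, abs(A * e_med))
    d_q3 = max(e_q3 - e_med, abs(A * e_med))

    # Binary decision: a case is an outlier if E falls outside the bounds.
    lower = e_med - C * d_q1
    upper = e_med + C * d_q3
    return E, (E < lower) | (E > upper)
```

The final line makes the strictly binary nature of the decision explicit: each case is either inside or outside the interval, with no measure of how uncertain that call is. The terms `s` and the size factor are the kind of quantities that a logistic model could take as inputs, although the specific transformations used in this paper are not reproduced here.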