Abstract Details

Activity Number: 39 - Topics in Clustering
Type: Contributed
Date/Time: Sunday, July 29, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329862
Title: Exploring Clustering Applications in Outlier Detection for Administrative Data Sources
Author(s): Elizabeth Ayres*
Companies: Statistics Canada
Keywords: clustering; outlier detection; feature selection; big data; machine learning

National statistical agencies are relying more heavily on administrative data sources, which are growing ever larger and therefore require efficient edit and imputation procedures. Outlier detection methods currently available at Statistics Canada are highly effective when the variable of interest follows a unimodal distribution, either on its own or within groups formed by a set of class variables. With large administrative data sources, it is often difficult to find a set of class variables that satisfies this assumption, and the effectiveness of outlier detection is reduced as a result. This is the case for our motivating application, which involves international merchandise trade data. This paper explores unsupervised clustering techniques capable of handling a mixture of quantitative and qualitative variables, with the goal of applying them to increase the efficacy of outlier detection. We propose a method that uses cluster analysis to isolate modal distributions as a pre-treatment to outlier detection. In addition, we examine a clustering method that performs outlier detection directly. These methods are contrasted with a standard approach commonly used for business surveys at Statistics Canada.
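
The idea of clustering as a pre-treatment can be illustrated with a toy sketch. This is not the method presented in the talk, whose details the abstract does not give. It assumes a single quantitative variable, a hypothetical two-cluster 1-D k-means (`two_means_1d`) standing in for a mixed-data clustering technique, and a modified z-score rule based on the median absolute deviation (`mad_outliers`) standing in for the agency's outlier detection methods. The data are made up for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: no spread around the median, flag nothing.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def two_means_1d(values, iterations=20):
    """Hypothetical stand-in clustering: 1-D k-means with k=2."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def clustered_outliers(values):
    """Cluster first, then apply the outlier rule within each cluster."""
    labels = two_means_1d(values)
    flags = [False] * len(values)
    for k in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        cluster_flags = mad_outliers([values[i] for i in idx])
        for i, flag in zip(idx, cluster_flags):
            flags[i] = flag
    return flags


# Made-up bimodal data: one mode near 10, one near 100, and a value of 30
# that is unremarkable globally but unusual relative to the mode near 10.
values = [9, 10, 10, 11, 11, 12, 10, 9, 30,
          98, 100, 101, 99, 102, 100, 103, 97]

global_flagged = [v for v, f in zip(values, mad_outliers(values)) if f]
cluster_flagged = [v for v, f in zip(values, clustered_outliers(values)) if f]
print("flagged without clustering:", global_flagged)
print("flagged after clustering:  ", cluster_flagged)
```

In this toy example the rule applied to the pooled data flags nothing, because the two modes inflate the spread estimate, while the same rule applied within clusters flags the value 30. Real administrative data with qualitative variables would need a clustering method suited to mixed types, as the abstract notes.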

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program