Abstract Details

Activity Number: 555
Type: Contributed
Date/Time: Wednesday, August 3, 2016 : 10:30 AM to 12:20 PM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #321443
Title: Distance-Based Anomaly Detection by Random Sampling
Author(s): Kalbi Zongo* and Charlotte Wickham and Sarah Emerson
Companies: Oregon State University and Oklahoma State University and Oregon State University
Keywords: Anomaly ; Detection ; Random sampling ; Isolation Forest ; Local Outlier Factor ; Repeated Impossible Discrimination Ensemble
Abstract:

Anomaly detection has been gaining attention in academia and industry, with many algorithms recently developed to identify diverse types of anomalies. Here we consider detection of punctual anomalies: single data points or clusters of data points that behave differently from the majority of data points. Many competent algorithms, such as RIDE (Repeated Impossible Discrimination Ensemble), LOF (Local Outlier Factor), and IF (Isolation Forest), involve explicit or implicit distance calculations, which can be computationally expensive. Both RIDE and IF are based on measures of point influence: how easily a point is isolated, or how strongly a point influences model fit. The implicit distance measures used by these methods add unnecessary complexity to the computations and randomness to the results. We propose a new method, RSOS (Random Sampling Outlier Score), which uses explicit pairwise distances to construct outlier scores and is made computationally efficient through subsampling. Our method outperforms RIDE, LOF, and IF in many scenarios and exhibits improved or competitive running time. We also investigate methods for, and the impact of, variable selection on the anomaly detection procedures.
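
The abstract does not specify how RSOS builds its scores, so the following is only an illustrative sketch of the general idea of a distance-based outlier score computed against random subsamples. It is not the authors' algorithm. The function name `subsample_outlier_scores`, its parameters, and the choice of "average nearest-neighbor distance within each subsample" as the score are all assumptions made for illustration.

```python
import math
import random


def subsample_outlier_scores(points, sample_size=8, n_rounds=30, seed=0):
    """Hypothetical sketch, NOT the RSOS method from the abstract.

    Scores each point by its Euclidean distance to the nearest member of a
    random subsample, averaged over several rounds. Larger scores suggest
    a point lies far from the bulk of the data.
    """
    n = len(points)
    if n < 2 or sample_size < 2:
        raise ValueError("need at least 2 points and sample_size >= 2")
    rng = random.Random(seed)
    m = min(sample_size, n)
    scores = [0.0] * n
    for _ in range(n_rounds):
        # Draw a subsample of indices without replacement.
        sample = rng.sample(range(n), m)
        for i, p in enumerate(points):
            # Explicit distances to sampled points only, skipping the point
            # itself so it cannot be its own nearest neighbor.
            nearest = min(math.dist(p, points[j]) for j in sample if j != i)
            scores[i] += nearest
    return [s / n_rounds for s in scores]


# Usage: a small grid of inliers plus one distant point.
data = [(float(x), float(y)) for x in range(5) for y in range(4)]
data.append((50.0, 50.0))
scores = subsample_outlier_scores(data)
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

In this sketch each round evaluates roughly `len(points) * sample_size` distances instead of all pairs, which is where subsampling would save computation; how the actual method samples and aggregates distances may differ.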


Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association