Abstract:
|
Outlier detection has long been of interest in many practical applications. The techniques used to detect outliers and noisy data vary widely depending on the domain, and outlier detection has been studied extensively in statistics, machine learning, data mining, and other major research areas. The methods developed are generally based on the premise that unusual observations have characteristics that differ from those of "normal" data. Outlier detection methods have received particular interest in applied areas such as fraud detection, network intrusion detection, crime, terrorism, and medical research. The advent of big data has brought new challenges to the methods used to detect outliers and noisy observations. For instance, "wide" data sets, with more variables than observations, have become increasingly common in medical fields, while data sets with millions or even billions of observations and thousands of variables are commonplace in data from social media streams. Here we propose a novel one-class convex hull peeling method to detect noisy observations in high-dimensional data sets, which is also well suited to wide data sets.
|
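To make the idea of convex hull peeling concrete, the sketch below shows the classic two-dimensional version ("onion peeling"), not the one-class method proposed here. The convex hull of the data is computed, its vertices are assigned to layer 0 and removed, and the process repeats on the remaining points. Points on the outermost layers are the natural outlier candidates. The hull is computed with Andrew's monotone chain algorithm. The function names `convex_hull` and `peel_depths` and the toy data are hypothetical and chosen only for illustration. Exact hull computation becomes expensive as the dimension grows, and when there are no more observations than variables every point in general position is a hull vertex, so this sketch only illustrates the peeling depth idea and does not address the high-dimensional or wide settings targeted above.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain: returns hull vertices (collinear edge points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain because it repeats the first of the other
    return lower[:-1] + upper[:-1]


def peel_depths(points: List[Point]) -> Dict[Point, int]:
    """Hypothetical helper: map each distinct point to its peeling layer (0 = outermost)."""
    remaining = set(points)
    depth: Dict[Point, int] = {}
    layer = 0
    while remaining:
        hull = convex_hull(list(remaining))
        for p in hull:
            depth[p] = layer
        remaining -= set(hull)  # peel off the current layer
        layer += 1
    return depth


# Toy data: an outer square, an inner square, and a central point
pts = [(0, 0), (4, 0), (4, 4), (0, 4),
       (1, 1), (3, 1), (3, 3), (1, 3),
       (2, 2)]
depths = peel_depths(pts)
```

In this toy example the outer square lies on layer 0, the inner square on layer 1, and the central point on layer 2. A peeling-based detector would flag points on the shallowest layers.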