Abstract:
|
Outlier detection is an important task in many real-world applications, such as financial fraud detection and transportation. Local outlier detection has become increasingly popular in recent years. Because it takes a local approach, it can identify objects that are outliers relative to their own neighborhood even though they would not be outliers in another region of the data set. The most representative local method is the local outlier factor (LOF), which quantifies the degree of outlyingness of an object $p$ as the ratio of the average density of its neighboring objects to its own density. However, this method has the drawback of being insensitive to differences in the density distribution within the object's neighborhood. Furthermore, because it is based purely on the reachability distance defined from nearest-neighbor information, it lacks an intuitive statistical interpretation. We therefore propose a method based on an individualized two-sample test that quantifies the degree of outlyingness in a more statistical way. We provide a theoretical upper bound on the outlying score, which offers general guidance for detecting local outliers. Simulations and real-data examples illustrate the method's performance.
|