Abstract:
Many widely used statistical procedures, including goodness-of-fit tests, feature selection and changepoint analysis, rely critically on estimating the entropy of a distribution. I will first present new results on a commonly used generalisation of the estimator originally proposed by Kozachenko and Leonenko (1987), which is based on the k-nearest neighbour distances of a sample of independent and identically distributed random vectors. These results show that, in up to three dimensions and under regularity conditions, the estimator is efficient for certain choices of k, in the sense of achieving the local asymptotic minimax lower bound. However, they also show that in higher dimensions a non-trivial bias precludes its efficiency, regardless of the choice of k. This motivates a new entropy estimator, formed as a weighted average of Kozachenko-Leonenko estimators for different values of k. A careful choice of weights reduces the bias of the original estimator and thus yields an efficient estimator in arbitrary dimensions, given sufficient smoothness. Our results provide theoretical insight and have important methodological implications.
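To make the two estimators concrete, here is a minimal pure-Python sketch. It is not the speaker's code. It assumes Euclidean distances, distinct sample points (continuous data) and brute-force neighbour search, and it uses the form of the Kozachenko-Leonenko estimator commonly written in the k-NN literature, H_hat = (1/n) sum_i log(rho_(k),i^d * V_d * (n - 1) / exp(psi(k))). In this formula rho_(k),i is the distance from X_i to its k-th nearest neighbour, V_d is the volume of the unit ball in R^d and psi is the digamma function. All function names are hypothetical. The abstract does not say how the weights are chosen, so the weighted version simply takes them as an input that must sum to one.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(k: int) -> float:
    """Digamma at a positive integer k: psi(k) = -gamma + sum_{j=1}^{k-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def log_unit_ball_volume(d: int) -> float:
    """log V_d, where V_d = pi^(d/2) / Gamma(1 + d/2) is the volume of the unit ball in R^d."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(1 + 0.5 * d)


def sorted_neighbour_distances(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """For each point, sorted Euclidean distances to all other points (brute force)."""
    n = len(x)
    return [sorted(math.dist(x[i], x[j]) for j in range(n) if j != i) for i in range(n)]


def kl_entropy(x: Sequence[Sequence[float]], k: int, nbr=None) -> float:
    """Kozachenko-Leonenko estimate based on k-th nearest neighbour distances."""
    n, d = len(x), len(x[0])
    if not 1 <= k <= n - 1:
        raise ValueError("k must lie in 1..n-1")
    nbr = sorted_neighbour_distances(x) if nbr is None else nbr
    # Assumes distinct points, so every k-th neighbour distance is positive.
    mean_log_rho = sum(math.log(dists[k - 1]) for dists in nbr) / n
    return d * mean_log_rho + log_unit_ball_volume(d) + math.log(n - 1) - digamma_int(k)


def weighted_kl_entropy(x: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Weighted average sum_k w_k * H_hat_k, with weights[k - 1] the weight on k.

    The choice of weights is left to the caller (hypothetical interface).
    """
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    nbr = sorted_neighbour_distances(x)  # computed once and reused for every k
    return sum(w * kl_entropy(x, k, nbr) for k, w in enumerate(weights, start=1) if w != 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [[rng.gauss(0.0, 1.0)] for _ in range(500)]
    true_h = 0.5 * math.log(2 * math.pi * math.e)  # entropy of N(0, 1)
    print(kl_entropy(sample, k=5), weighted_kl_entropy(sample, [0.0, 0.5, 0.0, 0.5]), true_h)
```

The neighbour search here costs O(n^2 log n) and is only meant for small illustrative samples. A tree-based nearest-neighbour query would be the usual choice for larger data. Because each estimate is linear in the mean log-distance, the weighted estimator reuses a single set of sorted distances for all values of k.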