Abstract:
|
In this paper, we propose Max-Random Forest, a novel nonparametric density estimator inspired by random forest. It is computationally friendly and empirically has better performances compared with other existing methods. Especially it is capable of estimating a density with over 100 dimensions using one hundred thousand data with a reasonable error. Under Lipschitz assumptions, we provide a non-asymptotic bound on estimation error in squared Hellinger distance for a simplified version of the algorithm. We further modify Max-Random Forest into a conditional density estimator, which is capable of estimating and sampling from a high-dimensional conditional density with decent accuracy. We finally apply the conditional density estimator to missing data imputation. In real data cases where number of total dimensions is over 10 and that of missing dimensions is 5, Max-Random Forest enjoys competitive performance compared with state-of-the-art methods.
|