The Arborist is an open-source implementation of the Random Forest algorithm, designed around four goals: scalable performance, parallelization, a language-agnostic implementation, and ready adaptation to common workflows.
The Arborist has been implemented to expose as much concurrency as possible, so that parallel hardware can exploit every available opportunity for acceleration. Particular attention has also been paid to data locality, in order to minimize costly data movement. This has proven beneficial for both the multicore and GPU versions, and is expected to be essential for upcoming out-of-core implementations.
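The abstract gives no implementation details, so the following Python sketch only illustrates the two ideas in the paragraph above, under simplifying assumptions. First, trees grown from independent bootstrap samples share nothing mutable, so they can be built concurrently. Second, a single presort of the predictor can be shared read-only by every tree instead of being redone per tree. The one-split "stump" learner and all names (`presort`, `fit_stump`, `fit_forest`, `predict_forest`) are hypothetical; they are not the Arborist's API or its splitting algorithm.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def presort(x):
    """Row indices ordered by predictor value; computed once and shared read-only."""
    return sorted(range(len(x)), key=lambda i: x[i])


def fit_stump(x, y, order, seed):
    """Fit a depth-1 regression tree (a stump) on a bootstrap sample.

    All mutable state is local and the randomness comes only from `seed`,
    so calls for different trees are independent and can run concurrently.
    """
    rng = random.Random(seed)
    n = len(x)
    counts = [0] * n  # bootstrap multiplicity of each row
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    total_w = n
    total_s = float(sum(c * yi for c, yi in zip(counts, y)))

    # Filtering the shared presorted order keeps it sorted: no per-tree re-sort.
    rows = [i for i in order if counts[i] > 0]
    best_gain, best = None, None
    left_w, left_s = 0, 0.0
    for k in range(len(rows) - 1):
        i, j = rows[k], rows[k + 1]
        left_w += counts[i]
        left_s += counts[i] * y[i]
        if x[i] == x[j]:
            continue  # no threshold separates equal predictor values
        right_w = total_w - left_w
        right_s = total_s - left_s
        # Maximizing S_L^2/W_L + S_R^2/W_R minimizes the within-node sum of squares.
        gain = left_s ** 2 / left_w + right_s ** 2 / right_w
        if best_gain is None or gain > best_gain:
            best_gain = gain
            best = ((x[i] + x[j]) / 2, left_s / left_w, right_s / right_w)
    if best is None:  # every drawn row has the same predictor value
        mean = total_s / total_w
        return (float("inf"), mean, mean)
    return best


def predict_stump(stump, v):
    threshold, left_mean, right_mean = stump
    return left_mean if v <= threshold else right_mean


def fit_forest(x, y, n_trees, workers=4):
    order = presort(x)  # one sort, reused by every tree
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in seed order, whatever order the threads finish in.
        return list(pool.map(lambda s: fit_stump(x, y, order, s), range(n_trees)))


def predict_forest(stumps, v):
    return sum(predict_stump(s, v) for s in stumps) / len(stumps)


# Toy step-function data: the response jumps from 0 to 1 at x = 10.
x = list(range(20))
y = [0.0] * 10 + [1.0] * 10
forest = fit_forest(x, y, n_trees=25)
print(predict_forest(forest, 2), predict_forest(forest, 17))
```

Because each tree depends only on its seed and on read-only shared inputs, the concurrent result is identical to building the trees one after another. In CPython, threads running pure-Python code do not actually execute in parallel because of the global interpreter lock, so the sketch demonstrates the independence structure rather than a speedup; native threads or a GPU are what turn that structure into acceleration.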
Quantile regression is supported natively, as is nonparametric resampling. Internalization of additional workflows is envisioned. Internalizing a workflow not only gives the user a "loopless" experience but, more importantly, allows intermediate state to be reused profitably.
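The abstract does not say how the Arborist computes quantiles. One standard technique, Meinshausen's quantile regression forests, records the leaf that each training row reaches in every tree; at prediction time, each training response is weighted by how often it shares a leaf with the query, and conditional quantiles are read off that weighted empirical distribution. The minimal Python sketch below assumes that technique and a hypothetical, hand-written toy forest. It also illustrates the reuse of intermediate state mentioned above: the stored training-leaf assignments serve every later quantile query. The function names `qrf_weights` and `weighted_quantile` are hypothetical, not part of Rborist's interface.

```python
def qrf_weights(train_leaves, query_leaves):
    """Quantile-regression-forest weight of each training row for one query.

    train_leaves[t][i] is the leaf that training row i reaches in tree t;
    query_leaves[t] is the leaf the query reaches in tree t. Row i receives
    (1/T) * sum over trees of 1[same leaf] / (number of rows in that leaf).
    """
    n_trees = len(train_leaves)
    weights = [0.0] * len(train_leaves[0])
    for leaves, q_leaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == q_leaf]
        for i in members:  # an empty leaf contributes nothing
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative (normalized) weight reaches q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= target:
            return v
    return pairs[-1][0]  # guards against round-off leaving cum just below target


# Hypothetical toy forest: two trees over six training rows.
y = [1, 2, 3, 10, 11, 12]
train_leaves = [
    [0, 0, 0, 1, 1, 1],  # tree 0
    [0, 0, 1, 1, 2, 2],  # tree 1
]
query_leaves = [0, 1]  # the query lands in leaf 0 of tree 0 and leaf 1 of tree 1
w = qrf_weights(train_leaves, query_leaves)
print([weighted_quantile(y, w, q) for q in (0.1, 0.5, 0.9)])  # [1, 3, 10]
```

Once the weights for a query are computed, any number of quantile levels can be read from them without revisiting the trees.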
The initial release supports R, with Python bindings in preparation. Source code for the Arborist is hosted on GitHub, and an R package implementation, "Rborist", is available on CRAN. The talk will focus on current features and briefly comment on the structure of the code.
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.