Abstract Details

Activity Number: 497 - Cloud and Distributed Computing for Statisticians
Type: Invited
Date/Time: Wednesday, August 1, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #326546 Presentation
Title: Distributed Machine Learning with H2O
Author(s): Navdeep Gill*
Keywords: machine learning; distributed computing

H2O is an open source, distributed machine learning platform designed for big data, with the added benefit that it is easy to use on a laptop as well as on a multi-node cluster. The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, and REST/JSON, as well as through a web interface.

Because H2O's algorithm implementations are distributed, the software can scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of Generalized Linear Models, Gradient Boosting Machines, Random Forest, Deep Neural Nets, Stacked Ensembles (aka "Super Learners"), dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.
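
As a rough illustration of why a distributed implementation can handle data that does not fit in one machine's RAM, the sketch below fits a one-variable linear model in map-reduce style: each chunk of data is reduced to a few sufficient statistics, and only those small summaries are combined. This is a generic sketch in plain Python, not H2O's actual implementation; the function names (`chunk_stats`, `combine`, `fit_line`) and the toy data are hypothetical.

```python
from functools import reduce


def chunk_stats(chunk):
    """Map step: summarise one chunk of (x, y) pairs as sufficient statistics."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics from two chunks add element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(stats):
    """Ordinary least squares for y = slope * x + intercept from pooled statistics."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data split across three "nodes"; every point lies on y = 2x + 1.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

pooled = reduce(combine, map(chunk_stats, chunks))
slope, intercept = fit_line(pooled)
print(slope, intercept)
```

No single step needs the full dataset in memory: each node only scans its own chunk, and the combine step handles five numbers per chunk regardless of chunk size.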

H2O machine learning code examples in R and Python will be demoed live and made available on GitHub so that participants can follow along on their laptops. For those interested in running the code on a multi-node Amazon EC2 cluster, an H2O AMI is also available.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program