Online Program

Friday, May 31
Data Science Technologies
Data Science Platforms: Deep Learning
Fri, May 31, 1:30 PM - 3:05 PM
Grand Ballroom E
 

Deep Learning Models at Scale with Apache Spark (305087)

Presentation

*Joseph Kurata Bradley, Databricks, Inc. 
Xiangrui Meng, Databricks, Inc. 

Keywords: data management, deep learning, Apache Spark, TensorFlow, big data

Data is the key ingredient in building high-quality production AI applications. In the training phase, more data of higher quality makes it possible to fit better models. In the production phase, managing input data and detecting changes in inputs and predictions are critical to understanding model behavior and maintaining the application.

Data management and machine learning tools have each advanced greatly in recent years, but largely separately. In this presentation, we will discuss several efforts at Databricks, in Apache Spark, and in other open source projects to unify data and AI and thereby simplify building production AI applications. Using an example of training and scoring a TensorFlow model on a Spark cluster, we will describe how data scientists can benefit from these unification efforts.
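
As an illustration of the abstract's point about detecting changes in production inputs, the following is a minimal sketch, not taken from the presentation, that compares a training-time sample of one numeric feature with a production sample using the Population Stability Index. The function name, the bin count, and the sample data are all hypothetical, and the sketch uses plain Python rather than Spark so that it runs on its own.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    num_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature; larger values mean more drift.

    Hypothetical helper for illustration; bins are equal-width over the
    range of the reference (training-time) sample.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if the reference sample is constant.
    width = (hi - lo) / num_bins or 1.0

    def bin_fractions(values: Sequence[float]) -> list:
        counts = [0] * num_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(idx, 0), num_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Made-up data: a training sample and a production sample shifted by 0.5.
training_sample = [i / 100 for i in range(100)]
production_sample = [x + 0.5 for x in training_sample]
print(population_stability_index(training_sample, production_sample))
```

A score of zero means the two binned distributions match; what score should trigger an alert depends on the application and is left open here. On a cluster, the same per-feature comparison would presumably be computed over distributed data rather than in-memory lists.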