Keywords: data management, deep learning, Apache Spark, TensorFlow, big data
Data is the key ingredient in building high-quality, production AI applications. In the training phase, more and higher-quality data make it possible to fit better models. In the production phase, managing input data and detecting changes in inputs and predictions are critical to understanding model behavior and maintaining the application.
Data management tools and machine learning tools have seen great advances in recent years, but separately. In this presentation, we will discuss several efforts at Databricks, in Apache Spark, and in other open-source projects to unify data and AI in order to simplify building production AI applications. Using an example of training and scoring a TensorFlow model on a Spark cluster, we will describe how data scientists can benefit from these unification efforts.