Online Program

Friday, May 18
Data Science
Data Science Platforms II
Fri, May 18, 3:30 PM - 5:00 PM
Grand Ballroom G

The Unified Analytics Platform: Unifying Big Data Workloads in Apache Spark (304550)


*Hossein Falaki, Databricks 

Keywords: Apache Spark, Data Science, Platform, Notebooks

Apache Spark was designed to offer a unified engine to support diverse workloads, such as SQL, graph processing, iterative machine learning, streaming, and batch data processing. Although this approach may seem counterintuitive, it offers some unique benefits. Most importantly, applications can combine workloads in ways that are not possible with specialized engines. However, any data practitioner will tell you that a powerful engine does not make a car. Data science is a team sport involving diverse personalities: engineers, statisticians, analysts, and managers. To function effectively, these teams require data and model management, version control, access control, resource management, security and user management, collaboration, and many more features. A unified analytics platform brings all of these together.