SDSS 2018 will offer one full-day and four half-day courses on Wednesday, May 16. Short courses are ticketed events that require an additional fee.
8:00 a.m. – 5:30 p.m.
SC1 - Data Science Workflows Using R and Spark
Instructor(s): Jim Harner, West Virginia University
This short course covers the data science process using R as a programming language and Spark as a big data platform. Powerful workflows are developed using the tidyr, dplyr, ggplot2, and sparklyr packages. Examples show how data are transported to and extracted from persistent data stores such as the Hadoop Distributed File System (HDFS), NoSQL databases, and relational databases. These data-based workflows extend to machine learning algorithms, model evaluation, and data visualization. TensorFlow for deep learning is introduced. Big data architectures are discussed, including the Docker containers used for building the course infrastructure called rspark (https://github.com/jharner/rspark). Attendees can optionally install Docker containers on their desktops or deploy them to Amazon Web Services (AWS) prior to the course (see the rspark repo).
8:00 a.m. – 12:00 p.m.
SC2 - H2O AutoML
Instructor(s): Navdeep Gill, H2O.ai
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. Although H2O has made it easier for practitioners to train and deploy machine learning models at scale, there is still a fair bit of knowledge and background in data science required to produce high-performing machine learning models. Deep neural networks, in particular, are notoriously difficult for a non-expert to tune properly. In this course, we provide an overview of the field of “Automatic Machine Learning” and introduce the new AutoML functionality in H2O. H2O’s AutoML provides an easy-to-use interface that automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model, which, in most cases, will be the top-performing model in the AutoML Leaderboard. H2O AutoML is available in all the H2O interfaces, including the H2O R package, Python module, and Flow web GUI. We will also provide code examples to get you started using AutoML.
SC3 - End-to-End Machine Learning and Model Deployment in SAS Viya
Instructor(s): Carlos Pinheiro, SAS & Data Science Tech Institute, France
In this course, you will learn to use the latest interface from SAS, which takes a pipeline flow approach to the following tasks:
- Access, manage, and explore data
- Develop and compare models
- Generate and register score code
- Publish champion models in a database
- Export score code to files
1:30 p.m. – 5:30 p.m.
SC4 - Cloudera Data Science Workbench (CDSW)
Join your peers in a Cloudera-hosted short course to discuss the data science needs across your organization.
Machine learning and data science are all about the data, but that data is often out of reach for analytics teams working at scale. Together we’ll explore how to leverage powerful open source tools to create a machine learning mixture that balances data scientists’ need for data access and flexible tooling with IT’s need for security and governance. Cloudera Data Science Workbench enables fast, easy, and secure self-service data science in a collaborative environment.
Ultimately, you’ll walk away prepared to find new value in your data and deliver that value to your organization.
SC5 - Shiny Essentials
Instructor(s): Mine Cetinkaya-Rundel, Duke University & RStudio
Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host stand-alone apps on a webpage, embed them in R Markdown documents, or build dashboards. This short course will introduce you to building web applications with Shiny, reactive programming, and customizing and deploying your apps for others to use. Please bring a laptop with you.