Online Program

Friday, February 19
CS16 Model Deployment and Diagnostics Fri, Feb 19, 3:45 PM - 5:15 PM
Emerald

Statistical Models in Production: A Taxonomy of Deployment Methods (303177)

*Neal Fultz, OpenMail 

Keywords: Data Engineering, Deployment

Most data analysis projects progress through the following phases:

1. Data preparation and cleaning (ETL)
2. Exploratory analysis (EDA)
3. Statistical modeling (Train)
4. Model evaluation (Test)
5. Decision-making in a production environment (Deploy)

This talk focuses on the final phase. It presents an overview of the ways statistical models may be used in production (real-time vs. batch, push vs. pull, transaction processing vs. data warehousing) and discusses the costs and benefits of each method. Large projects may have entirely separate teams for deployment, while smaller ones may have a single person handling every phase. A statistician embedded in a larger team can benefit from understanding how engineering requirements affect the earlier phases of analysis, and the solo practitioner must choose and implement the deployment method that maximizes business value. The talk will include real examples in R and Python, but the concepts should be generally useful.
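
As an illustration of the real-time versus batch distinction, here is a minimal Python sketch. It is not taken from the talk: the toy logistic model, its coefficients, and the names `predict`, `score_batch`, and `score_request` are all hypothetical, chosen only to show that the same fitted model can be applied ahead of time or on demand.

```python
import math

# Hypothetical coefficients for a toy logistic model (illustrative only,
# not from the talk).
COEF = {"intercept": -1.0, "visits": 0.5}


def predict(visits):
    """Return the toy model's predicted probability for one record."""
    z = COEF["intercept"] + COEF["visits"] * visits
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(records):
    """Batch: score every record in one pass and keep the results for lookup."""
    return {r["id"]: predict(r["visits"]) for r in records}


def score_request(record):
    """Real time: score a single record at the moment a request arrives."""
    return predict(record["visits"])


records = [{"id": "a", "visits": 2}, {"id": "b", "visits": 0}]
table = score_batch(records)                        # precomputed ahead of time
live = score_request({"id": "a", "visits": 2})      # computed on demand
```

Both paths call the same `predict` function, so they return the same score for the same input. What differs is when the computation happens and where the result is stored, which is the kind of cost and benefit trade-off the abstract refers to.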