Friday, February 19
CS16 Model Deployment and Diagnostics
Fri, Feb 19, 3:45 PM - 5:15 PM
Statistical Models in Production: A Taxonomy of Deployment Methods (303177)
*Neal Fultz, OpenMail
Keywords: Data Engineering, Deployment
Most data analysis projects progress through the following phases:
1. Data preparation and cleaning (ETL)
2. Exploratory analysis (EDA)
3. Statistical modeling (Train)
4. Model evaluation (Test)
5. Decision-making in a production environment (Deploy)
This talk focuses on the final phase. It presents an overview of the ways statistical models may be used in production (real-time vs. batch, push vs. pull, transaction processing vs. data warehousing) and discusses the costs and benefits of each. Large projects may have entirely separate teams for deployment, while smaller ones may have one person handling every phase. A statistician embedded in a larger team benefits from understanding how engineering requirements affect the earlier phases of analysis, while the solo practitioner must choose the deployment method that maximizes business value and then implement it. The talk includes real examples in R and Python, but the concepts should be generally useful.
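As a rough illustration of the real-time vs. batch distinction, the sketch below scores the same model in both styles. It is not taken from the talk: the coefficients, the feature names (`clicks`, `visits`), and the functions `predict`, `score_batch`, and `score_realtime` are all hypothetical, chosen only to make the contrast concrete.

```python
import math
from typing import Dict, Iterable, List

# Hypothetical fitted logistic-regression coefficients (assumed for illustration).
COEFS = {"intercept": -1.0, "clicks": 0.5, "visits": 0.25}


def predict(row: Dict[str, float]) -> float:
    """Return the predicted probability for a single record."""
    z = (
        COEFS["intercept"]
        + COEFS["clicks"] * row["clicks"]
        + COEFS["visits"] * row["visits"]
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(rows: Iterable[Dict[str, float]]) -> List[float]:
    """Batch style: score a whole collection of records in one scheduled pass."""
    return [predict(row) for row in rows]


def score_realtime(request: Dict[str, float]) -> Dict[str, float]:
    """Real-time style: score one incoming request and return a response at once."""
    return {"id": request["id"], "score": predict(request)}
```

In a batch deployment, something like `score_batch` would typically run on a schedule and write its results to a table for later lookup. In a real-time deployment, something like `score_realtime` would sit behind a service endpoint and answer each request as it arrives. The model is identical in both cases; what changes is the latency requirement and where the scores are stored.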