Abstract:
|
In industry, production data science models are built to generate value consistently: they run continuously over time and produce high-quality outputs from their data inputs. In practice, the value a deployed model generates may decline over time for reasons that include concept drift, that is, a change in the statistical properties of the modeled process. A common mitigation strategy in machine learning operations (MLOps) is to forgo further research effort and instead automatically retrain the model on a more up-to-date data set. This preserves the production model's value while freeing research and development effort for higher-value areas. However, automatic retraining in production may yield little benefit when it disregards common issues with either the data or the model. Drawing on examples from investment management and on simulated data, we discuss approaches to mitigating data and model selection risks, and we call for a more in-depth discussion of the tradeoffs between automatic and manual methods for retraining and redeploying models in production.
|