Keywords: forecasting, machine learning, data science, capacity planning, change points
Salesforce is a cloud-based enterprise software platform that supports a wide variety of customer use cases, from managing corporate sales teams to call centers. To support our diverse and growing customer base, Salesforce spends billions of dollars to build and operate first-party data centers and public cloud resources across the world. Since our customers' data is often business critical, it is imperative that our global infrastructure is always available.
In this presentation, we will share how the Salesforce infrastructure analytics team handles forecasting at scale in a multi-tenant infrastructure environment that is constantly growing and changing as our customers' workloads evolve. It is as though you are flying in a Boeing 777 and, mid-flight, chairs are thrown out the window, a group of passengers with heavy luggage decides to join, or a wing is removed, all while you try to predict the flight direction or anticipate incidents and flight degradations. We thus need a forecasting system that can predict key metrics over the next year while readjusting for sudden changes.
Our capacity bottleneck metrics are diverse: they show different types of seasonality, or none at all, and their growth trajectories often change. We forecast system metrics such as App and DB CPU utilization, storage, proxies for SAN IO, and customer traffic, among others.
At such a large scale, it is impossible to manually evaluate, tune parameters, and perform model selection for each individual forecast. To quickly produce robust forecasts for new, complex time series, we built a forecasting package, Shepherd, that handles metrics with varying seasonality and auto-detects both level shifts and changes in the trend using segmentation and hypothesis testing. The forecast is adjusted using change points as external regressors. Shepherd automatically selects, from a range of algorithms, the one with the lowest MAPE and the narrowest uncertainty intervals on the test dataset.
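The selection step can be illustrated with a minimal sketch. This is not Shepherd's code: the three candidate forecasters, the function names, the residual-based interval width, and the rule of ranking by MAPE first and breaking ties by interval width are all assumptions made for illustration, since the abstract does not say how the two criteria are combined.

```python
from statistics import mean, stdev


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical candidate algorithms; a real system would use richer models.
def naive_forecast(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def drift_forecast(train, horizon):
    """Extend the straight line through the first and last training points."""
    slope = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + slope * (h + 1) for h in range(horizon)]


def seasonal_naive_forecast(train, horizon, period=4):
    """Repeat the last full season of length `period`."""
    return [train[len(train) - period + (h % period)] for h in range(horizon)]


def interval_width(train, forecaster, warmup=4, z=1.96):
    """Assumed uncertainty proxy: width of a normal interval built from
    one-step-ahead residuals on the training data."""
    residuals = [
        train[t] - forecaster(train[:t], 1)[0]
        for t in range(warmup, len(train))
    ]
    return 2 * z * stdev(residuals)


def select_model(train, test, candidates):
    """Score each candidate on the holdout and return the best name and all scores.

    Scores are (MAPE, interval width) tuples, so the minimum is the lowest MAPE
    with ties broken by the narrower interval (an assumed ordering).
    """
    scores = {}
    for name, forecaster in candidates.items():
        forecast = forecaster(train, len(test))
        scores[name] = (mape(test, forecast), interval_width(train, forecaster))
    best = min(scores, key=scores.get)
    return best, scores


candidates = {
    "naive": naive_forecast,
    "drift": drift_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}

# Synthetic metric with a steady linear growth trend; last 4 points are the test set.
series = [10 + 2 * t for t in range(12)]
train, test = series[:8], series[8:]

best, scores = select_model(train, test, candidates)
print(best, scores)
```

On this synthetic linear series the drift model reproduces the holdout exactly, so it is the one selected. With real metrics the candidates would typically be full forecasting models, and the holdout would be long enough to cover at least one seasonal cycle.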