Forecasting demand and understanding sales drivers are among the most important tasks in retail analytics. Traditionally, however, sales modeling has relied predominantly on linear models and/or models with a small number of predictors. Real-world demand is determined by complex substitution and complementarity patterns among a large number of interrelated SKUs, by nonlinear effects of prices, promotions, and seasonality, and by many other factors, their lagged values, and their interactions; a realistic model has to account for all of these. We propose a conceptual model for sales modeling based on standard POS data available to any retailer and generate almost 500 potentially useful predictors of a focal SKU’s sales accordingly. In our comparison of three classes of models, Gradient Boosting Machines outperformed Random Forests and Elastic Nets. Using interpretable machine learning methods, we derived actionable insights into the importance of the various groups of predictors in the conceptual model and demonstrated how helpful it can be for marketing managers to decompose predictions into the effects of individual regressors using an approximation of Shapley values for feature attribution.
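
To make the idea of decomposing a prediction into the effects of individual regressors more concrete, the following minimal sketch approximates Shapley values by permutation sampling. It is an illustration under stated assumptions, not the procedure used in the article: the function `predict_sales`, its coefficients, the four feature names, and the synthetic background data are all hypothetical stand-ins for a fitted model and real POS data.

```python
import random
from statistics import mean


def predict_sales(row):
    """Hypothetical stand-in for a fitted demand model (e.g. a GBM's predict)."""
    base = 100.0
    price_effect = -40.0 * (row["price"] - 2.0)            # own-price effect
    promo_effect = 30.0 * row["promo"] * (1.0 + 0.5 * row["holiday"])  # promo x holiday interaction
    cross_effect = 15.0 * (row["competitor_price"] - 2.0)  # substitution effect
    return base + price_effect + promo_effect + cross_effect


def shapley_sampling(predict, x, background, n_permutations=200, seed=0):
    """Approximate Shapley values for instance x by sampling feature orderings.

    For each random ordering, features of the background rows are switched to
    the values in x one at a time; the change in the mean prediction at each
    step is credited to the feature that was switched.
    """
    rng = random.Random(seed)
    features = list(x)
    phi = {f: 0.0 for f in features}
    for _ in range(n_permutations):
        order = features[:]
        rng.shuffle(order)
        current = [dict(z) for z in background]  # fresh copy of background rows
        prev = mean(predict(r) for r in current)
        for f in order:
            for r in current:
                r[f] = x[f]
            new = mean(predict(r) for r in current)
            phi[f] += new - prev
            prev = new
    return {f: total / n_permutations for f, total in phi.items()}


# Synthetic background data (an assumption; real use would take observed rows).
data_rng = random.Random(42)
background = [
    {
        "price": data_rng.uniform(1.5, 2.5),
        "promo": data_rng.choice([0, 1]),
        "holiday": data_rng.choice([0, 1]),
        "competitor_price": data_rng.uniform(1.5, 2.5),
    }
    for _ in range(50)
]

# The observation to explain: a discounted, promoted SKU in a holiday week.
x = {"price": 1.6, "promo": 1, "holiday": 1, "competitor_price": 2.3}

attributions = shapley_sampling(predict_sales, x, background)
baseline = mean(predict_sales(r) for r in background)

for name, value in attributions.items():
    print(f"{name:>17}: {value:+.2f}")
print(f"baseline {baseline:.2f} + attributions = {baseline + sum(attributions.values()):.2f}")
```

The attributions sum to the difference between the prediction for `x` and the average prediction over the background rows, which is what lets a manager read each value as that regressor's share of the deviation from typical sales. For a real model with hundreds of predictors, a dedicated implementation would likely be preferable to this brute-force loop.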

The research was supported by the Russian Science Foundation (Project № 18-71-00119).
Cite this article
Antipov, E.A., Pokryshevskaya, E.B. Interpretable machine learning for demand modeling with high-dimensional data using Gradient Boosting Machines and Shapley values. J Revenue Pricing Manag 19, 355–364 (2020). https://doi.org/10.1057/s41272-020-00236-4
DOI: https://doi.org/10.1057/s41272-020-00236-4