Learning to Solve a Stochastic Orienteering Problem with Time Windows

Fynn Schmitt-Ulms¹²,
André Hottung¹³,
Meinolf Sellmann¹⁴ &
…
Kevin Tierney¹³

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13621)

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

Abstract

Reinforcement learning (RL) has seen increasing success at solving a variety of combinatorial optimization problems. These techniques have generally been applied to deterministic optimization problems with few side constraints, such as the traveling salesperson problem (TSP) or capacitated vehicle routing problem (CVRP). With this in mind, the recent IJCAI AI for TSP competition challenged participants to apply RL to a difficult routing problem involving optimization under uncertainty and time windows. We present the winning submission to the challenge, which uses the policy optimization with multiple optima (POMO) approach combined with efficient active search and Monte Carlo roll-outs. We present experimental results showing that our proposed approach outperforms the second-place approach by 1.7%. Furthermore, our computational results suggest that solving more realistic routing problems may not be as difficult as previously thought.
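To make the idea of a Monte Carlo roll-out concrete, the following is a minimal sketch, not the authors' implementation. It estimates the expected reward of one fixed route by sampling noisy travel times many times and averaging. The noise model (Euclidean distance scaled by a uniform factor), the rule that a late arrival yields only a penalty, and all names (`simulate_route`, `monte_carlo_value`, `coords`, `windows`, `rewards`, `penalty`) are assumptions made for illustration.

```python
import math
import random


def simulate_route(route, coords, windows, rewards, penalty, rng):
    """Simulate one noisy traversal of `route` and return its total reward."""
    time, total = 0.0, 0.0
    for prev, node in zip(route, route[1:]):
        dist = math.dist(coords[prev], coords[node])
        # Assumed noise model: travel time is the distance times a random factor.
        time += dist * rng.uniform(0.5, 1.5)
        opens, closes = windows[node]
        if time > closes:
            # Assumed rule: a late arrival earns no reward and incurs the penalty.
            total -= penalty
        else:
            # Arriving early means waiting until the time window opens.
            time = max(time, opens)
            total += rewards[node]
    return total


def monte_carlo_value(route, coords, windows, rewards, penalty,
                      samples=1000, seed=0):
    """Average the simulated reward of `route` over `samples` roll-outs."""
    rng = random.Random(seed)
    runs = (simulate_route(route, coords, windows, rewards, penalty, rng)
            for _ in range(samples))
    return sum(runs) / samples
```

Under these assumptions, calling `monte_carlo_value` on several candidate routes and keeping the one with the highest average is one simple way to compare routes when travel times are uncertain.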

Notes

1. We note that the TDOP is abbreviated as TD-OPSWTW in some works.
2. Although not the focus of our research, our approach can also generate complete solutions using the expected travel time for the supervised learning track; these tie the winning team’s solutions and are generated in less computation time.
3. We assume the penalty is large enough ($p > r_i$) that, in the version of the problem with recourse, we should always avoid the late arrival penalty at nodes.

Acknowledgments

Fynn Schmitt-Ulms was supported by the German Academic Exchange Service Research Internships in Science and Engineering (DAAD RISE) program. The computational experiments in this work have been performed using the Bielefeld GPU cluster. We thank the Bielefeld HPC.NRW team for their support.

Author information

Authors and Affiliations

McGill University, Montreal, Canada
Fynn Schmitt-Ulms
Decision and Operation Technologies Group, Bielefeld University, Bielefeld, Germany
André Hottung & Kevin Tierney
InsideOpt, Dover, DE, USA
Meinolf Sellmann

Corresponding author

Correspondence to André Hottung.

Editor information

Editors and Affiliations

SBA Research, Vienna, Austria
Dimitris E. Simos
Moscow Aviation Institute (National Research University), Moscow, Russia
Varvara A. Rasskazova
Università degli Studi di Milano-Bicocca, Milan, Italy
Francesco Archetti
Wilfrid Laurier University, Waterloo, ON, Canada
Ilias S. Kotsireas
University of Florida, Gainesville, FL, USA
Panos M. Pardalos

About this paper

Cite this paper

Schmitt-Ulms, F., Hottung, A., Sellmann, M., Tierney, K. (2022). Learning to Solve a Stochastic Orienteering Problem with Time Windows. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds) Learning and Intelligent Optimization. LION 2022. Lecture Notes in Computer Science, vol 13621. Springer, Cham. https://doi.org/10.1007/978-3-031-24866-5_8

DOI: https://doi.org/10.1007/978-3-031-24866-5_8
Published: 05 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24865-8
Online ISBN: 978-3-031-24866-5
eBook Packages: Computer Science, Computer Science (R0)
