Abstract
We study a simple technique, originally presented by Herlihy (ACM Trans. Program. Lang. Syst. 15(5):745–770, 1993), for executing concurrently, in a wait-free manner, blocks of code that have been programmed for sequential execution and require significant synchronization in order to be performed in parallel. We first present an implementation of this technique, called Sim, which employs a collect object. We describe a simple implementation of a collect object from a single shared object that supports atomic Add (or XOR) in addition to read; this implementation has step complexity O(1). Plugging this implementation into Sim yields constant step complexity for Sim as well. This allows us to derive lower bounds on the step complexity of implementing several shared objects, such as Add, XOR, collect, and snapshot objects, from LL/SC objects.
We then present a practical version of Sim, called PSim, which we have implemented on a real shared-memory machine. From a theoretical perspective, PSim has worse step complexity than Sim, its theoretical analog; in practice, though, we show experimentally that PSim is highly-efficient: it outperforms several state-of-the-art lock-based and lock-free synchronization techniques, even though it is wait-free, i.e. it satisfies a stronger progress condition than all the algorithms that it outperforms. We have used PSim to obtain highly-efficient wait-free implementations of stacks and queues.
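As an illustration of the collect construction mentioned above, here is a minimal sketch, assuming the special case of one announcement bit per thread and a single 64-bit word that supports atomic XOR and read; the identifiers toggles, announce, and collect are ours and do not come from the paper's pseudocode:

#include <stdatomic.h>
#include <stdint.h>

/* A single shared word that supports atomic XOR (fetch_xor) and read;
   bit i belongs to thread i, for up to 64 threads. */
static _Atomic uint64_t toggles = 0;

/* Thread i flips its own bit to announce a new pending request: one step. */
static void announce(int i) {
    atomic_fetch_xor(&toggles, (uint64_t)1 << i);
}

/* Collecting the announcements of all threads is a single atomic read: O(1) steps. */
static uint64_t collect(void) {
    return atomic_load(&toggles);
}

When only Add is available, roughly the same idea applies with each thread assigned its own d-bit field inside the word (cf. the notes below).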
Notes
We sometimes abuse notation and use op to denote an operation or an instance of it, depending on the context.
For simplicity we sometimes say that a request req by a thread \(p_{i}\) executes Attempt (or any other line of the pseudocode), meaning that the instance of SimApplyOp that is called by \(p_{i}\) for req executes Attempt (or the code line in question). Moreover, when we refer to the execution interval of some request req, we mean the execution interval of the instance of SimApplyOp that is invoked with parameter req.
For clarity of the proof, we consider each \(\mathit{req}_{j}^{i}\) as distinct.
For simplicity, we assume that d is a divisor of b, so that the d bits allocated to each thread are not split across two Add objects.
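As a small illustration of the index arithmetic this assumption simplifies, the following sketch locates thread i's field when the fields are spread over several b-bit Add objects; the constants B and D are assumptions chosen for the example, not values from the paper:

#define B 64   /* bits per Add object (assumed for the sketch) */
#define D 8    /* bits allocated to each thread (assumed)      */

typedef struct {
    unsigned word;    /* which Add object holds thread i's field */
    unsigned shift;   /* offset of the field inside that object  */
} field_loc;

static field_loc locate(unsigned i) {
    unsigned per_word = B / D;          /* exact, since D divides B */
    field_loc loc = { i / per_word, (i % per_word) * D };
    return loc;
}

/* Thread i then updates its field with a single Add of (1 << loc.shift)
   applied to Add object number loc.word. */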
We remark that in the real code we use a pool of nC+1 structures, where C>1 is a small constant, for performance reasons. However, using a pool of just 2n+1 structures is enough to prove correctness. For code simplicity, Algorithm 2 uses 2n+2 such structures.
In the real code, Pool is implemented as a one-dimensional array, and S is an index indicating one of its elements.
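A minimal sketch of this layout, assuming a fixed number of threads and using a placeholder type for the simulated object's state; the identifiers are illustrative and not taken from the real code:

#include <stdatomic.h>

#define N_THREADS 64                          /* assumption for the sketch */

typedef struct obj_state { long value; } obj_state;   /* placeholder state */

static obj_state Pool[2 * N_THREADS + 2];     /* 2n + 2 recycled copies of the state   */
static _Atomic unsigned S = 0;                /* index of the current copy inside Pool */

static obj_state *current_state(void) {       /* readers follow S into Pool */
    return &Pool[atomic_load(&S)];
}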
This problem occurs when a thread p reads some value A from a shared variable, and then some other thread p′ modifies the variable to a value B and back to A; when p resumes execution, it sees that the value of the variable has not changed and continues executing normally, which might be incorrect.
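To make the scenario concrete, the following hypothetical sketch shows how it arises in a Treiber-style lock-free stack; the node type and the pop routine are illustrative only:

#include <stdatomic.h>
#include <stddef.h>

typedef struct node { struct node *next; } node;

static _Atomic(node *) top = NULL;

static node *pop(void) {
    node *old, *next;
    do {
        old = atomic_load(&top);
        if (old == NULL)
            return NULL;
        next = old->next;
        /* If another thread now pops 'old', pops or frees further nodes, and
           later pushes 'old' back (so top goes A -> B -> A), the CAS below
           still succeeds, although 'next' may be a stale pointer: the ABA
           problem described above. */
    } while (!atomic_compare_exchange_weak(&top, &old, next));
    return old;
}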
We experimentally observed that MCS spin locks [27] have performance similar to that of CLH spin locks, so we present our results only for CLH locks.
The number of combining rounds determines how many times the combiner (in flat-combining) traverses the request list before it gives up serving other requests.
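The following hypothetical sketch shows what such a parameter controls; it is not the actual flat-combining code, and request_t, apply_sequential, and combiner_serve are illustrative names:

#include <stddef.h>

typedef struct request {
    struct request *next;
    int pending;                  /* set by the owner thread, cleared by the combiner */
    void *op, *result;
} request_t;

extern void *apply_sequential(void *op);    /* applies one operation on the object */

static void combiner_serve(request_t *head, int combining_rounds) {
    for (int r = 0; r < combining_rounds; r++)
        for (request_t *req = head; req != NULL; req = req->next)
            if (req->pending) {
                req->result = apply_sequential(req->op);   /* serve on the owner's behalf */
                req->pending = 0;
            }
    /* After combining_rounds traversals the combiner stops serving other
       requests, releases the lock, and resumes its own execution. */
}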
References
Afek, Y., Dauber, D., Touitou, D.: Wait-free made fast. In: Proceedings of the 27th ACM Symposium on Theory of Computing, pp. 538–547 (1995)
Afek, Y., Stupp, G., Touitou, D.: Long-lived adaptive collect with applications. In: Proceedings of the 40th Symposium on Foundations of Computer Science, pp. 262–272 (1999)
Amdahl, G.: Validity of the single processor approach to achieving large-scale computing capabilities. In: Proceedings of National Computer Conference of the American Federation of Information Processing Societies, pp. 483–485 (1967)
Anderson, J.H., Moir, M.: Universal constructions for multi-object operations. In: Proceedings of the 14th ACM Symposium on Principles of Distributed Computing, pp. 184–193 (1995)
Anderson, J.H., Moir, M.: Universal constructions for large objects. IEEE Trans. Parallel Distrib. Syst. 10(12), 1317–1332 (1999)
Attiya, H., Guerraoui, R., Ruppert, E.: Partial snapshot objects. In: Proceedings of the 20th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 336–343 (2008)
Barnes, G.: A method for implementing lock-free shared data structures. In: Proceedings of the 5th ACM Symposium on Parallel Algorithms and Architectures, pp. 261–270 (1993)
Berger, E.D., McKinley, K.S., Blumofe, R.D., Wilson, P.R.: Hoard: a scalable memory allocator for multithreaded applications. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 117–128 (2000)
Chuong, P., Ellen, F., Ramachandran, V.: A universal construction for wait-free transaction friendly data structures. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 335–344 (2010)
Conway, P., Kalyanasundharam, N., Donley, G., Lepak, K., Hughes, B.: Blade computing with the AMD Opteron processor (Magny-Cours). In: Hot Chips 21 (2009)
Craig, T.S.: Building FIFO and priority-queueing spin locks from atomic swap. Technical Report TR 93-02-02, Department of Computer Science, University of Washington (1993)
Fatourou, P., Kallimanis, N.D.: The RedBlue adaptive universal constructions. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 127–141 (2009)
Fatourou, P., Kallimanis, N.D.: Fast implementations of shared objects using Fetch&Add. Technical Report TR 02-2010, Department of Computer Science, University of Ioannina (2010)
Fatourou, P., Kallimanis, N.D.: A highly-efficient wait-free universal construction. Technical Report TR 01-2011, Department of Computer Science, University of Ioannina (2011)
Fatourou, P., Kallimanis, N.D.: Revisiting the combining synchronization technique. In: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 257–266. ACM, New York (2012)
Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: The source code for flat-combining. http://github.com/mit-carbon/Flat-Combining
Hendler, D., Incze, I., Shavit, N., Tzafrir, M.: Flat combining and the synchronization-parallelism tradeoff. In: Proceedings of the 22nd Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 355–364 (2010)
Hendler, D., Shavit, N., Yerushalmi, L.: A scalable lock-free stack algorithm. In: Proceedings of the 16th ACM Symposium on Parallel Algorithms and Architectures, pp. 206–215 (2004)
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)
Herlihy, M.: A methodology for implementing highly concurrent data objects. ACM Trans. Program. Lang. Syst. 15(5), 745–770 (1993)
Herlihy, M., Luchangco, V., Moir, M.: Space- and time-adaptive nonblocking algorithms. Electron. Notes Theor. Comput. Sci. 78(0), 260–280 (2003)
Herlihy, M., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. 12(3), 463–492 (1990)
Imbs, D., Raynal, M.: Help when needed, but no more: efficient Read/Write partial snapshot. In: Proceedings of the 23rd International Symposium on Distributed Computing, pp. 142–156 (2009)
Jayanti, P.: A time complexity lower bound for randomized implementations of some shared objects. In: Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pp. 201–210 (1998)
Kallimanis, N.D., Fatourou, P.: The source code for PSim. http://code.google.com/p/sim-universal-construction
Magnusson, P.S., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th International Parallel Processing Symposium, pp. 165–171 (1994)
Mellor-Crummey, J.M., Scott, M.L.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991)
Michael, M.M., Scott, M.L.: Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. In: Proceedings of the 15th ACM Symposium on Principles of Distributed Computing, pp. 267–275 (1996)
Oyama, Y., Taura, K., Yonezawa, A.: Executing parallel programs with synchronization bottlenecks efficiently. In: Proceedings of International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, pp. 182–204 (1999)
Shalev, O., Shavit, N.: Predictive log-synchronization. In: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, EuroSys ’06, pp. 305–315 (2006)
Shavit, N., Zemach, A.: Combining funnels: a dynamic approach to software combining. J. Parallel Distrib. Comput. 60(11), 1355–1387 (2000)
Taubenfeld, G.: Synchronization Algorithms and Concurrent Programming. Prentice Hall, New York (2006)
Treiber, R.K.: Systems programming: Coping with parallelism. Technical Report RJ 5118, IBM Almaden Research Center (1986)
Yew, P.C., Tzeng, N.F., Lawrie, D.H.: Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. C-36(4), 388–395 (1987)
Acknowledgements
A preliminary version of this paper, entitled “A highly-efficient wait-free universal construction”, appears in the Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’11), pp. 325–334, San Jose, California, USA, June 2011.
We would like to thank Dimitris Nikolopoulos, Angelos Bilas, and Manolis Katevenis for several useful discussions. We especially thank Dimitris Nikolopoulos for arranging the provision of access to some of the multi-core machines of the Department of Computer Science at Virginia Tech (VT) where we ran our experiments. Many thanks to Nir Shavit and Danny Hendler for providing the code of flat-combining. We would like to especially thank Nir Shavit for several fruitful discussions and for his valuable feedback on the paper. Thanks also to Faith Ellen for several useful comments on a preliminary version of this paper [13]. Moreover, we thank Victor Luchangco for his valuable feedback on the results and the presentation of the paper. Finally, we would like to thank the anonymous reviewers for their valuable feedback.
Nikolaos Kallimanis is supported by a PhD scholarship from the Empirikion Foundation, Athens, Greece. The work of Panagiota Fatourou is supported by the European Commission under the 7th Framework Programme through the TransForm (FP7-MCITN-238639) and CumuloNimbo (FP7-STREP-257993) projects, and by the GSRT-ESPA project GreenVM (Excellence-1643).
Cite this article
Fatourou, P., Kallimanis, N.D. Highly-Efficient Wait-Free Synchronization. Theory Comput Syst 55, 475–520 (2014). https://doi.org/10.1007/s00224-013-9491-y