Preliminary Experience with OpenMP Memory Management Implementation

Adrien Roussel,
Patrick Carribault &
Julien Jaeger

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 12295)

Included in the following conference series:

International Workshop on OpenMP

Abstract

As compute units evolve, memory heterogeneity is becoming common in HPC systems. Each memory level, however, often requires its own approach and interface. OpenMP 5.0 therefore defines memory-management constructs that let application developers exploit multiple memory spaces in a portable way. This paper presents an overview of memory management from applications to runtimes. We describe a convenient way to tune an application so that it uses memory-management constructs, and we detail a methodology for integrating those constructs into an OpenMP runtime that supports multiple memory types (DDR, MCDRAM and NVDIMM). We implement our design in the MPC framework and present results on a realistic benchmark.
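To make the notion of a memory-management construct concrete, here is a minimal sketch in C of how an application might request high-bandwidth memory through the OpenMP 5.0 allocator routines (`omp_alloc`, `omp_free`, and the predefined `omp_high_bw_mem_alloc` and `omp_default_mem_alloc` allocators). It is not taken from the paper or from MPC: the `hot_buffer` type and its helper functions are hypothetical names introduced for illustration, and the plain `malloc` path for compilers without OpenMP 5.0 support is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* _OPENMP >= 201811 means the compiler advertises OpenMP 5.0 support. */
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#define USE_OMP_ALLOCATORS 1
#else
#define USE_OMP_ALLOCATORS 0
#endif

/* Hypothetical wrapper: a buffer plus the allocator that produced it. */
typedef struct {
    double *data;
    size_t count;
#if USE_OMP_ALLOCATORS
    omp_allocator_handle_t allocator;
#endif
} hot_buffer;

/* Request `count` doubles, preferring high-bandwidth memory (e.g. MCDRAM). */
hot_buffer hot_buffer_create(size_t count)
{
    hot_buffer buf;
    buf.data = NULL;
    buf.count = count;
#if USE_OMP_ALLOCATORS
    /* First choice: the predefined high-bandwidth allocator. */
    buf.allocator = omp_high_bw_mem_alloc;
    buf.data = omp_alloc(count * sizeof *buf.data, buf.allocator);
    if (buf.data == NULL) {
        /* The runtime could not serve the request (for instance, the node
         * has no such memory): retry in the default memory space. */
        buf.allocator = omp_default_mem_alloc;
        buf.data = omp_alloc(count * sizeof *buf.data, buf.allocator);
    }
#else
    /* Assumed fallback for compilers without OpenMP 5.0 allocators. */
    buf.data = malloc(count * sizeof *buf.data);
#endif
    return buf;
}

/* Fill the buffer with `value` and return the sum of its elements. */
double hot_buffer_fill_and_sum(hot_buffer *buf, double value)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < buf->count; ++i) {
        buf->data[i] = value;
        sum += buf->data[i];
    }
    return sum;
}

/* Release the buffer with the same allocator that created it. */
void hot_buffer_destroy(hot_buffer *buf)
{
    assert(buf != NULL);
#if USE_OMP_ALLOCATORS
    omp_free(buf->data, buf->allocator);
#else
    free(buf->data);
#endif
    buf->data = NULL;
    buf->count = 0;
}
```

A caller would create the buffer once, run its bandwidth-bound loops on `buf.data`, and destroy it afterwards. Building with an OpenMP flag such as `-fopenmp` enables the allocator path where the compiler supports it; whether the data actually lands in MCDRAM depends on the runtime and the machine.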

Notes

1. Available at https://mpc.hpcframework.com/.
2. See https://memkind.github.io/memkind for more information.
3. Available at https://pmem.io.

Acknowledgments

This work was performed under the Exascale Computing Research collaboration, with the support of CEA, Intel and UVSQ.

Author information

Authors and Affiliations

CEA, DAM, DIF, 91297, Arpajon, France
Adrien Roussel, Patrick Carribault & Julien Jaeger
Exascale Computing Research Laboratory, Bruyères-le-Châtel, France
Adrien Roussel & Julien Jaeger
CEA, Laboratoire en Informatique Haute Performance pour le Calcul et la simulation, Université Paris-Saclay, 91680, Bruyères-le-Châtel, France
Adrien Roussel, Patrick Carribault & Julien Jaeger

Corresponding author

Correspondence to Adrien Roussel.

Editor information

Editors and Affiliations

Texas Advanced Computing Center (TACC), Austin, TX, USA
Kent Milfeld
Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
Texas Advanced Computing Center (TACC), Austin, TX, USA
Lars Koesterke
RWTH Aachen University, Aachen, Germany
Jannis Klinkenberg

About this paper

Cite this paper

Roussel, A., Carribault, P., Jaeger, J. (2020). Preliminary Experience with OpenMP Memory Management Implementation. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science, vol 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_20

DOI: https://doi.org/10.1007/978-3-030-58144-2_20
Published: 01 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58143-5
Online ISBN: 978-3-030-58144-2
eBook Packages: Computer Science, Computer Science (R0)
