Abstract
As applications grow in capability, they also grow in complexity, and much of this complexity is pushed into modules and libraries. At the same time, hardware configurations are becoming increasingly elaborate. Together, these two trends make it more and more difficult to understand, debug, and analyze the performance of applications.
To enable detailed insight into how applications use libraries, we present an approach, implemented in Score-P, that supports intuitive and robust creation of wrappers for arbitrary C/C++ libraries. Runtime analysis then uses these wrappers to track how applications interact with libraries and how libraries interact with each other, and to record the exact timing of library functions.
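The wrapper idea can be illustrated with a minimal C sketch. It is not Score-P's implementation or its generated output: `lib_dot`, `real_lib_dot`, `wrapped_lib_dot`, and `wrap_stats` are hypothetical names chosen for illustration, and `clock()` reports processor time, where a real measurement system would presumably use a higher-resolution timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical library function standing in for a call into a real library. */
double lib_dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}

/* Call count and accumulated time recorded for one wrapped function. */
struct wrap_stats {
    unsigned long calls;
    double seconds;
};

struct wrap_stats lib_dot_stats = {0, 0.0};

/* Pointer to the original function. Assumption: an interception tool would
   resolve this at link time or load time; here it is simply set directly. */
double (*real_lib_dot)(const double *, const double *, size_t) = lib_dot;

/* Wrapper with the same signature as the original: it records the timing,
   forwards the arguments unchanged, and returns the original result. */
double wrapped_lib_dot(const double *a, const double *b, size_t n)
{
    assert(real_lib_dot != NULL);
    clock_t start = clock();
    double result = real_lib_dot(a, b, n);
    clock_t stop = clock();
    lib_dot_stats.calls += 1;
    lib_dot_stats.seconds += (double)(stop - start) / CLOCKS_PER_SEC;
    return result;
}
```

An application that calls `wrapped_lib_dot` in place of `lib_dot` gets the same result, while `lib_dot_stats` accumulates how often the library function was called and how long it ran.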
This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Notes
- 1.
- 2.
- 3.
- 4.
Creating a source file for each function and trying to compile and link it is a common technique among configure tools. Doing this in a single compile-and-link step would require parsing the output of each supported compiler and version, which is not portable across compilers and linkers.
- 5.
Without this step, everything would initially be wrapped, including functions from system headers.
- 6.
- 7.
Acknowledgments
This research used resources of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, which is supported by the Office of Science of the Department of Energy under Contract DE-AC05-00OR22725.
This work is supported in part by the German Research Foundation (DFG) within the CRC 912 - HAEC.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Brendel, R., Wesarg, B., Tschüter, R., Weber, M., Ilsche, T., Oeste, S. (2019). Generic Library Interception for Improved Performance Measurement and Insight. In: Bhatele, A., Boehme, D., Levine, J., Malony, A., Schulz, M. (eds) Programming and Performance Visualization Tools. ESPT 2017, ESPT 2018, VPA 2017, VPA 2018. Lecture Notes in Computer Science, vol 11027. Springer, Cham. https://doi.org/10.1007/978-3-030-17872-7_2
DOI: https://doi.org/10.1007/978-3-030-17872-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17871-0
Online ISBN: 978-3-030-17872-7
eBook Packages: Computer Science, Computer Science (R0)