Abstract
Developing software applications for high-performance computing (HPC) requires careful optimizations targeting a myriad of increasingly complex, highly interrelated software, hardware and system components. The demands placed on minimizing energy consumption on extreme-scale HPC systems and the associated shift towards heterogeneous architectures add yet another level of complexity to program development and optimization. As a result, the software optimization process is often seen as daunting, cumbersome and time-consuming by software developers wishing to fully exploit HPC resources. To address these challenges, we have developed the Periscope Tuning Framework (PTF), an online automatic integrated tuning framework that combines performance analysis and performance tuning with respect to the many tuning parameters available to today’s software developers on modern HPC systems. This work introduces the architecture, tuning model and main infrastructure components of PTF, as well as its main tuning plugins and their evaluation.
Notes
PTF web site: http://periscope.in.tum.de.
The software environment on SuperMUC comprised the Intel Compiler 14, Parallel Environment 1.3, and OS SLE11 SP3. Details on SuperMUC can be found at: https://www.lrz.de/services/compute/supermuc/systemdescription
Due to the large number of flags, exhaustive search has not been used. It would have required over 27000 experiments.
Governors are processor policies to change frequency.
Acknowledgments
This work was supported by the European Commission FP7 project AutoTune under grant no. 288038.
Cite this article
Gerndt, M., Benkner, S., César, E. et al. A multi-aspect online tuning framework for HPC applications. Software Qual J 26, 1063–1096 (2018). https://doi.org/10.1007/s11219-017-9370-x
DOI: https://doi.org/10.1007/s11219-017-9370-x