A multi-aspect online tuning framework for HPC applications

Software Quality Journal

Abstract

Developing software applications for high-performance computing (HPC) requires careful optimizations targeting a myriad of increasingly complex, highly interrelated software, hardware and system components. The demands placed on minimizing energy consumption on extreme-scale HPC systems and the associated shift towards heterogeneous architectures add yet another level of complexity to program development and optimization. As a result, the software optimization process is often seen as daunting, cumbersome and time-consuming by software developers wishing to fully exploit HPC resources. To address these challenges, we have developed the Periscope Tuning Framework (PTF), an online automatic integrated tuning framework that combines both performance analysis and performance tuning with respect to the myriad of tuning parameters available to today’s software developer on modern HPC systems. This work introduces the architecture, tuning model and main infrastructure components of PTF as well as the main tuning plugins of PTF and their evaluation.

Notes

  1. PTF web site: http://periscope.in.tum.de.

  2. The software environment on SuperMUC comprised the Intel Compiler 14, Parallel Environment 1.3, and OS SLE11 SP3. Details on SuperMUC can be found at: https://www.lrz.de/services/compute/supermuc/systemdescription

  3. Due to the large number of flags, exhaustive search has not been used. It would have required over 27000 experiments.

  4. Governors are processor policies to change frequency.

Acknowledgments

This work was supported by the European Commission FP7 project AutoTune under grant no. 288038.

Author information

Corresponding author

Correspondence to Eduardo César.

About this article

Cite this article

Gerndt, M., Benkner, S., César, E. et al. A multi-aspect online tuning framework for HPC applications. Software Qual J 26, 1063–1096 (2018). https://doi.org/10.1007/s11219-017-9370-x

  • DOI: https://doi.org/10.1007/s11219-017-9370-x
