Abstract
Writing efficient general-purpose programs for Graphics Processing Units (GPUs) is a complex task. To program these processors efficiently, one has to understand their intricate architecture and memory subsystem, as well as their interaction with the Central Processing Unit (CPU). This paper presents GAP, an automatic parallelizer designed to translate sequential ANSI C code into parallel CUDA C programs. The implemented compiler was tested on a series of ANSI C programs. The generated code performed very well, achieving significant speed-ups for programs that expose a high degree of data parallelism. Thus, applying automatic parallelization to generate CUDA C code is feasible and realistic.
Acknowledgements
The authors are grateful to the Czestochowa University of Technology for granting access to GPU platforms provided by the MICLAB project No. POIG.02.03.00.24-093/13.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Kwiatkowski, J., Bajgoric, D. (2018). Automatic Parallelization of ANSI C to CUDA C Programs. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_40
DOI: https://doi.org/10.1007/978-3-319-78024-5_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)