Automatic Parallelization of ANSI C to CUDA C Programs

Jan Kwiatkowski & Dzanan Bajgoric

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

Writing efficient general-purpose programs for Graphics Processing Units (GPUs) is a complex task. To program these processors efficiently, one has to understand their intricate architecture and memory subsystem, as well as their interaction with the Central Processing Unit (CPU). This paper presents GAP, an automatic parallelizer designed to translate sequential ANSI C code into parallel CUDA C programs. The compiler was implemented and tested on a series of ANSI C programs. The generated code performed very well, achieving significant speed-ups for programs that expose a high degree of data parallelism. This indicates that applying automatic parallelization to generate CUDA C code is feasible and realistic.
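
The abstract only summarizes what GAP does, so the following is a minimal sketch, in plain C, of the kind of mapping a C-to-CUDA parallelizer targets: a sequential loop whose iterations are independent, and the one-thread-per-iteration CUDA kernel it corresponds to. It is not GAP's actual output or algorithm. The function names (saxpy_seq, saxpy_thread, saxpy_grid_emulated, count_mismatches) and the block size of 256 threads are hypothetical choices made for illustration, and the kernel launch is emulated with ordinary C loops so that the sketch compiles and runs without a GPU.

```c
#include <assert.h>
#include <stdlib.h>

/* Sequential ANSI C loop. Iteration i writes only y[i] and reads only
   x[i] and y[i], so there is no loop-carried dependence: the iterations
   may run in any order, which is what makes the loop data-parallel. */
void saxpy_seq(int n, float a, const float *x, float *y)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Plain C rendering of one CUDA thread. The equivalent CUDA C kernel is:

     __global__ void saxpy_kernel(int n, float a, const float *x, float *y)
     {
         int i = blockIdx.x * blockDim.x + threadIdx.x;
         if (i < n)
             y[i] = a * x[i] + y[i];
     }

   launched as saxpy_kernel<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);
   Each thread executes exactly one iteration of the original loop. */
static void saxpy_thread(int block_idx, int block_dim, int thread_idx,
                         int n, float a, const float *x, float *y)
{
    int i = block_idx * block_dim + thread_idx; /* global thread index */
    if (i < n)                                  /* last block may overshoot n */
        y[i] = a * x[i] + y[i];
}

/* Emulates the kernel launch on the CPU by visiting every (block, thread)
   pair in turn. A real GPU runs them concurrently and in no fixed order. */
void saxpy_grid_emulated(int n, float a, const float *x, float *y)
{
    const int block_dim = 256;                            /* threads per block (assumed) */
    const int grid_dim = (n + block_dim - 1) / block_dim; /* ceil(n / block_dim) blocks */
    int b, t;
    for (b = 0; b < grid_dim; b++)
        for (t = 0; t < block_dim; t++)
            saxpy_thread(b, block_dim, t, n, a, x, y);
}

/* Runs both versions on the same input and returns how many elements
   differ, or -1 if allocation fails. */
int count_mismatches(int n, float a)
{
    float *x, *y_seq, *y_par;
    int i, mismatches = 0;

    assert(n > 0); /* precondition: non-empty problem */
    x = malloc((size_t)n * sizeof *x);
    y_seq = malloc((size_t)n * sizeof *y_seq);
    y_par = malloc((size_t)n * sizeof *y_par);
    if (x == NULL || y_seq == NULL || y_par == NULL) {
        free(x);
        free(y_seq);
        free(y_par);
        return -1;
    }
    for (i = 0; i < n; i++) {
        x[i] = (float)i;
        y_seq[i] = y_par[i] = 1.0f;
    }
    saxpy_seq(n, a, x, y_seq);
    saxpy_grid_emulated(n, a, x, y_par);
    for (i = 0; i < n; i++)
        if (y_seq[i] != y_par[i])
            mismatches++;
    free(x);
    free(y_seq);
    free(y_par);
    return mismatches;
}
```

Calling count_mismatches(1000, 2.0f) runs both versions on 1000 elements (four blocks of 256 threads, with the last 24 threads masked by the i < n guard) and should return 0. The mapping is valid only because the loop carries no dependence. For a loop such as y[i] = y[i - 1] + x[i], concurrently running GPU threads could read y[i - 1] before it is written, so a parallelizer must reject or transform such a loop; the sequential emulation above would not expose that race, since it happens to visit threads in index order.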

Acknowledgements

The authors are grateful to the Czestochowa University of Technology for granting access to GPU platforms provided by the MICLAB project No. POIG.02.03.00.24-093/13.

Author information

Authors and Affiliations

Department of Informatics, Faculty of Computer Science and Management, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
Jan Kwiatkowski & Dzanan Bajgoric
ARM Norway, Olav Tryggvasons gate 39-41, 7011, Trondheim, Norway
Dzanan Bajgoric

Corresponding author

Correspondence to Jan Kwiatkowski.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Kwiatkowski, J., Bajgoric, D. (2018). Automatic Parallelization of ANSI C to CUDA C Programs. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_40

DOI: https://doi.org/10.1007/978-3-319-78024-5_40
Published: 23 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)