Software pipelined execution of stream programs on GPUs
A Udupa, R Govindarajan, MJ Thazhuthaveetil
2009 International Symposium on Code Generation and Optimization, 2009 • ieeexplore.ieee.org
The StreamIt programming model has been proposed to exploit parallelism in streaming applications on general-purpose multi-core architectures. This model allows programmers to specify the structure of a program as a set of filters that act upon data, and a set of communication channels between them. StreamIt graphs describe task, data and pipeline parallelism, which can be exploited on modern graphics processing units (GPUs), as they support abundant parallelism in hardware. In this paper, we describe the challenges in mapping StreamIt to GPUs and propose an efficient technique to software pipeline the execution of stream programs on GPUs. We formulate this problem, covering both the scheduling and the assignment of filters to processors, as an efficient integer linear program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. Furthermore, it takes into consideration the synchronization and bandwidth limitations of GPUs, and yields speedups between 1.87X and 36.83X over a single-threaded CPU.
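To give a feel for the filter-to-processor assignment that the abstract mentions, here is a minimal Python sketch. It is not the paper's ILP formulation. It swaps the ILP solver for an exhaustive search and ignores scheduling, data dependences, synchronization and bandwidth limits. The function name `assign_filters`, the per-filter work estimates and the objective (minimize the heaviest per-processor load) are all assumptions made for illustration.

```python
from itertools import product


def assign_filters(work, num_procs):
    """Try every filter-to-processor assignment and keep the one whose
    most heavily loaded processor has the smallest total work.

    Hypothetical stand-in for an ILP solver; practical only for tiny inputs.
    """
    best_load, best_assign = None, None
    for assign in product(range(num_procs), repeat=len(work)):
        # Sum the work of the filters placed on each processor.
        loads = [0] * num_procs
        for filt, proc in enumerate(assign):
            loads[proc] += work[filt]
        bottleneck = max(loads)
        if best_load is None or bottleneck < best_load:
            best_load, best_assign = bottleneck, assign
    return best_load, list(best_assign)


# Assumed work estimates for five filters (arbitrary units), two processors.
work = [4, 3, 3, 2, 2]
load, assignment = assign_filters(work, 2)
print("bottleneck load:", load)
print("filter -> processor:", assignment)
```

Here `assignment[i]` is the processor chosen for filter `i`. The search visits `num_procs ** len(work)` candidates, so it does not scale. A real formulation would hand an equivalent objective, together with the scheduling constraints, to an ILP solver.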