Zohouri et al., 2016 - Google Patents

Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs

Document ID: 6125461478272480784
Author: Zohouri H; Maruyama N; Smith A; Matsuda M; Matsuoka S
Publication year: 2016
Publication venue: SC'16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Snippet

We evaluate the power and performance of the Rodinia benchmark suite using the Altera SDK for OpenCL targeting a Stratix V FPGA against a modern CPU and GPU. We study multiple OpenCL kernels per benchmark, ranging from direct ports of the original GPU …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. incrementing the instruction counter, jump
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5045—Circuit design
- G06F17/5054—Circuit design for user-programmable logic devices, e.g. field programmable gate arrays [FPGA]
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/68—Processors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application

Similar Documents

Publication	Publication Date	Title
Zohouri et al.	2016	Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs
Waidyasooriya et al.	2016	OpenCL-based FPGA-platform for stencil computation and its optimization methodology
Löff et al.	2021	The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
Sengupta et al.	2010	Efficient Parallel Scan Algorithms for Manycore GPUs
Yu et al.	2018	S2FA: An accelerator automation framework for heterogeneous computing in datacenters
Krommydas et al.	2016	OpenDwarfs: Characterization of dwarf-based benchmarks on fixed and reconfigurable architectures
Rawat et al.	2015	SDSLc: A multi-target domain-specific compiler for stencil computations
Wang et al.	2017	A comprehensive framework for synthesizing stencil algorithms on FPGAs using OpenCL model
Kristien et al.	2019	High-level synthesis of functional patterns with Lift
Papakonstantinou et al.	2013	Efficient compilation of CUDA kernels for high-performance computing on FPGAs
Charara et al.	2019	Batched triangular dense linear algebra kernels for very small matrix sizes on GPUs
Verma et al.	2016	Accelerating workloads on FPGAs via OpenCL: A case study with OpenDwarfs
Lambert et al.	2018	Directive-based, high-level programming and optimizations for high-performance computing with FPGAs
Tithi et al.	2014	Exploiting spatial architectures for edit distance algorithms
Petrovič et al.	2023	Kernel tuning toolkit
CN105260222B (en)	2018-10-26	Start spacing optimization method between cycle flowing water iteration in a kind of reconfigurable compiling device
Brown	2021	Porting incompressible flow matrix assembly to FPGAs for accelerating HPC engineering simulations
Chen et al.	2015	MetaFork: a compilation framework for concurrency models targeting hardware accelerators and its application to the generation of parametric CUDA kernels
Vizcaino et al.	2024	Graph computing on long vector architectures (yes, it works!)
Lambert et al.	2020	In-depth optimization with the OpenACC-to-FPGA framework on an Arria 10 FPGA
Deest et al.	2017	One size does not fit all: Implementation trade-offs for iterative stencil computations on FPGAs
Zhao et al.	2015	Performance evaluation of NPB and SPEC CPU2006 on various SIMD extensions
Jin et al.	2018	Nuclear reactor simulation on OpenCL FPGA: A case study of RSBench
Shao et al.	2014	Map-reduce inspired loop parallelization on CGRA
Tang	2016	FPGA based acceleration of matrix decomposition and clustering algorithm using high level synthesis