Nakano, 2013 - Google Patents

Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models

Nakano, 2013

View PDF

Document ID: 2189599752277053666
Author: Nakano K
Publication year: 2013
Publication venue: IEICE TRANSACTIONS on Information and Systems

External Links

Cited by

Snippet

The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and …

Continue reading at www.jstage.jst.go.jp (PDF) (other versions)

230000015654 memory 0 title abstract description 141

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication	Publication Date	Title
Nakano	2012	Simple memory machine models for GPUs
Nakano	2013	Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models
Nguyen et al.	2010	3.5-D blocking optimization for stencil computations on modern CPUs and GPUs
Nakano	2013	The hierarchical memory machine model for GPUs
EP3526665B1 (en)	2020-07-15	Sorting for data-parallel computing devices
Pereira et al.	2015	PSkel: A stencil programming framework for CPU‐GPU systems
Nakano	2012	An optimal parallel prefix-sums algorithm on the memory machine models for GPUs
Chen et al.	2015	A high-throughput neural network accelerator
Liu	2015	Parallel and scalable sparse basic linear algebra subprograms
Nakano	2012	Efficient implementations of the approximate string matching on the memory machine models
Takafuji et al.	2017	C2CU: a CUDA C program generator for bulk execution of a sequential algorithm
Man et al.	2013	The approximate string matching on the hierarchical memory machine, with performance evaluation
Kasagi et al.	2013	An optimal offline permutation algorithm on the hierarchical memory machine, with the GPU implementation
Zou et al.	2020	Massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization
Nakano	2014	Asynchronous memory machine models with barrier synchronization
Kasagi et al.	2013	Offline permutation algorithms on the discrete memory machine with performance evaluation on the GPU
Kasagi et al.	2012	An implementation of conflict-free offline permutation on the GPU
Nunes et al.	2015	A fast approximate string matching algorithm on GPU
Tani et al.	2014	Bulk execution of oblivious algorithms on the unified memory machine, with GPU implementation
Zhou et al.	2018	FASTCF: FPGA-based accelerator for stochastic-gradient-descent-based collaborative filtering
US20220036243A1 (en)	2022-02-03	Apparatus with accelerated machine learning processing
Nakano	2013	Sequential memory access on the unified memory machine with application to the dynamic programming
Nishimura et al.	2017	Accelerating the Smith-waterman algorithm using bitwise parallel bulk computation technique on GPU
Nakano et al.	2013	The random address shift to reduce the memory access congestion on the discrete memory machine
Lin et al.	2019	MERIT: Tensor transform for memory-efficient vision processing on parallel architectures