Nakano, 2013 - Google Patents
Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine modelsNakano, 2013
View PDF- Document ID
- 2189599752277053666
- Author
- Nakano K
- Publication year
- Publication venue
- IEICE TRANSACTIONS on Information and Systems
External Links
Snippet
The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and …
- 230000015654 memory 0 title abstract description 141
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Nakano | Simple memory machine models for GPUs | |
Nakano | Optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on the memory machine models | |
Nguyen et al. | 3.5-D blocking optimization for stencil computations on modern CPUs and GPUs | |
Nakano | The hierarchical memory machine model for GPUs | |
EP3526665B1 (en) | Sorting for data-parallel computing devices | |
Pereira et al. | PSkel: A stencil programming framework for CPU‐GPU systems | |
Nakano | An optimal parallel prefix-sums algorithm on the memory machine models for GPUs | |
Chen et al. | A high-throughput neural network accelerator | |
Liu | Parallel and scalable sparse basic linear algebra subprograms | |
Nakano | Efficient implementations of the approximate string matching on the memory machine models | |
Takafuji et al. | C2CU: a CUDA C program generator for bulk execution of a sequential algorithm | |
Man et al. | The approximate string matching on the hierarchical memory machine, with performance evaluation | |
Kasagi et al. | An optimal offline permutation algorithm on the hierarchical memory machine, with the GPU implementation | |
Zou et al. | Massively simulating adiabatic bifurcations with FPGA to solve combinatorial optimization | |
Nakano | Asynchronous memory machine models with barrier synchronization | |
Kasagi et al. | Offline permutation algorithms on the discrete memory machine with performance evaluation on the GPU | |
Kasagi et al. | An implementation of conflict-free offline permutation on the GPU | |
Nunes et al. | A fast approximate string matching algorithm on GPU | |
Tani et al. | Bulk execution of oblivious algorithms on the unified memory machine, with GPU implementation | |
Zhou et al. | FASTCF: FPGA-based accelerator for stochastic-gradient-descent-based collaborative filtering | |
US20220036243A1 (en) | Apparatus with accelerated machine learning processing | |
Nakano | Sequential memory access on the unified memory machine with application to the dynamic programming | |
Nishimura et al. | Accelerating the Smith-waterman algorithm using bitwise parallel bulk computation technique on GPU | |
Nakano et al. | The random address shift to reduce the memory access congestion on the discrete memory machine | |
Lin et al. | MERIT: Tensor transform for memory-efficient vision processing on parallel architectures |