Applying recursive temporal blocking for stencil computations to deeper memory hierarchy

Document ID: 15976893884961054971
Author: Endo T
Publication year: 2018
Publication venue: 2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)

Snippet

Recent high-performance computer architectures have deeper memory hierarchies, including 3D-stacked memory and non-volatile memory. To achieve higher application performance, optimizations at the application-algorithm level are required. This paper takes …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Free address space management in non-volatile memory
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/885—Monitoring specific for caches

Similar Documents

Publication	Publication Date	Title
Ghose et al.	2019	Processing-in-memory: A workload-driven perspective
Aktulga et al.	2014	Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations
Akin et al.	2015	Data reorganization in memory using 3D-stacked DRAM
Jiang et al.	1997	Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors
Liu et al.	2018	Get out of the valley: Power-efficient address mapping for GPUs
Pena et al.	2014	Toward the efficient use of multiple explicitly managed memory subsystems
US20180004709A1 (en)	2018-01-04	System and method for GPU maximum register count optimization applied to general matrix-matrix multiplication
Wijs et al.	2012	Improving GPU sparse matrix-vector multiplication for probabilistic model checking
Endo	2018	Applying recursive temporal blocking for stencil computations to deeper memory hierarchy
Ma et al.	2017	Acceleration by inline cache for memory-intensive algorithms on FPGA via high-level synthesis
Rubin et al.	2014	MAPS: Optimizing massively parallel applications using device-level memory abstraction
Torabzadehkashi et al.	2019	Accelerating HPC applications using computational storage devices
Do et al.	2019	SNU-NPB 2019: parallelizing and optimizing NPB in OpenCL and CUDA for modern GPUs
Lucas et al.	2010	Multifrontal computations on GPUs and their multi-core hosts
Nagasaka et al.	2014	Cache-aware sparse matrix formats for Kepler GPU
Liu et al.	2021	NDS: N-dimensional storage
Ghose et al.	2019	A workload and programming ease driven perspective of processing-in-memory
Quislant et al.	2012	Hardware signature designs to deal with asymmetry in transactional data sets
Quislant et al.	2011	LS-Sig: Locality-sensitive signatures for transactional memory
Hu et al.	2013	GPU accelerated fast multipole methods for vortex particle simulation
Avron et al.	2012	Managing data-movement for effective shared-memory parallelization of out-of-core sparse solvers
Endo et al.	2015	Realizing extremely large-scale stencil applications on GPU supercomputers
Nocentino et al.	2010	Optimizing memory access on GPUs using Morton order indexing
Cabezas et al.	2015	GPU-SM: shared memory multi-GPU programming
US20150088936A1 (en)	2015-03-26	Statistical Analysis using a graphics processing unit