Endo, 2018 - Google Patents
Applying recursive temporal blocking for stencil computations to deeper memory hierarchy
Endo, 2018
- Document ID
- 15976893884961054971
- Author
- Endo T
- Publication year
- 2018
- Publication venue
- 2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)
Snippet
Recent high-performance computer architectures have deeper memory hierarchies, including 3D-stacked memory and non-volatile memory. To achieve higher application performance, optimizations at the application algorithm level are required. This paper takes …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0238—Free address space management in non-volatile memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/885—Monitoring specific for caches
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Ghose et al. | Processing-in-memory: A workload-driven perspective | |
Aktulga et al. | Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations | |
Akin et al. | Data reorganization in memory using 3D-stacked DRAM | |
Jiang et al. | Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors | |
Liu et al. | Get out of the valley: Power-efficient address mapping for GPUs | |
Pena et al. | Toward the efficient use of multiple explicitly managed memory subsystems | |
US20180004709A1 (en) | System and method for gpu maximum register count optimization applied to general matrix-matrix multiplication | |
Wijs et al. | Improving GPU sparse matrix-vector multiplication for probabilistic model checking | |
Endo | Applying recursive temporal blocking for stencil computations to deeper memory hierarchy | |
Ma et al. | Acceleration by inline cache for memory-intensive algorithms on FPGA via high-level synthesis | |
Rubin et al. | Maps: Optimizing massively parallel applications using device-level memory abstraction | |
Torabzadehkashi et al. | Accelerating HPC applications using computational storage devices | |
Do et al. | SNU-NPB 2019: parallelizing and optimizing NPB in OpenCL and CUDA for modern GPUs | |
Lucas et al. | Multifrontal computations on GPUs and their multi-core hosts | |
Nagasaka et al. | Cache-aware sparse matrix formats for Kepler GPU | |
Liu et al. | Nds: N-dimensional storage | |
Ghose et al. | A workload and programming ease driven perspective of processing-in-memory | |
Quislant et al. | Hardware signature designs to deal with asymmetry in transactional data sets | |
Quislant et al. | LS-Sig: Locality-sensitive signatures for transactional memory | |
Hu et al. | GPU accelerated fast multipole methods for vortex particle simulation | |
Avron et al. | Managing data-movement for effective shared-memory parallelization of out-of-core sparse solvers | |
Endo et al. | Realizing extremely large-scale stencil applications on GPU supercomputers | |
Nocentino et al. | Optimizing memory access on GPUs using morton order indexing | |
Cabezas et al. | GPU-SM: shared memory multi-GPU programming | |
US20150088936A1 (en) | Statistical Analysis using a graphics processing unit |