Applying recursive temporal blocking for stencil computations to deeper memory hierarchy

Endo, 2018

Document ID
15976893884961054971
Author
Endo T
Publication year
2018
Publication venue
2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA)

Snippet

Recent high-performance computer architectures have deeper memory hierarchies, including 3D-stacked memory and non-volatile memory. To achieve higher application performance, optimizations at the application algorithm level are required. This paper takes …
Full text (PDF) at www.el.gsic.titech.ac.jp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023 Free address space management
    • G06F12/0238 Free address space management in non-volatile memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F17/5009 Computer-aided design using simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/46 Multiprogramming arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored programme computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformations of program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/885 Monitoring specific for caches

Similar Documents

Publication (author or patent number), Title
Ghose et al. Processing-in-memory: A workload-driven perspective
Aktulga et al. Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations
Akin et al. Data reorganization in memory using 3D-stacked DRAM
Jiang et al. Application restructuring and performance portability on shared virtual memory and hardware-coherent multiprocessors
Liu et al. Get out of the valley: Power-efficient address mapping for GPUs
Pena et al. Toward the efficient use of multiple explicitly managed memory subsystems
US20180004709A1 (en) System and method for gpu maximum register count optimization applied to general matrix-matrix multiplication
Wijs et al. Improving GPU sparse matrix-vector multiplication for probabilistic model checking
Endo Applying recursive temporal blocking for stencil computations to deeper memory hierarchy
Ma et al. Acceleration by inline cache for memory-intensive algorithms on FPGA via high-level synthesis
Rubin et al. Maps: Optimizing massively parallel applications using device-level memory abstraction
Torabzadehkashi et al. Accelerating HPC applications using computational storage devices
Do et al. SNU-NPB 2019: parallelizing and optimizing NPB in OpenCL and CUDA for modern GPUs
Lucas et al. Multifrontal computations on GPUs and their multi-core hosts
Nagasaka et al. Cache-aware sparse matrix formats for Kepler GPU
Liu et al. Nds: N-dimensional storage
Ghose et al. A workload and programming ease driven perspective of processing-in-memory
Quislant et al. Hardware signature designs to deal with asymmetry in transactional data sets
Quislant et al. LS-Sig: Locality-sensitive signatures for transactional memory
Hu et al. GPU accelerated fast multipole methods for vortex particle simulation
Avron et al. Managing data-movement for effective shared-memory parallelization of out-of-core sparse solvers
Endo et al. Realizing extremely large-scale stencil applications on GPU supercomputers
Nocentino et al. Optimizing memory access on GPUs using morton order indexing
Cabezas et al. GPU-SM: shared memory multi-GPU programming
US20150088936A1 (en) Statistical Analysis using a graphics processing unit