Ben-Nun et al., 2015

Memory access patterns: The missing piece of the multi-GPU puzzle

Document ID: 3022111691053654845
Author: Ben-Nun T; Levy E; Barak A; Rubin E
Publication year: 2015
Publication venue: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis

Snippet

With the increased popularity of multi-GPU nodes in modern HPC clusters, it is imperative to develop matching programming paradigms for their efficient utilization. In order to take advantage of the local GPUs and the low-latency high-throughput interconnects that link …

Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
            - G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
              - G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
                - G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
              - G06F9/5061—Partitioning or combining of resources
            - G06F9/52—Programme synchronisation; Mutual exclusion, e.g. by means of semaphores; Contention for resources among tasks
          - G06F9/44—Arrangements for executing specific programmes
            - G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
      - G06F8/00—Arrangements for software engineering
        - G06F8/40—Transformations of program code
          - G06F8/41—Compilation
            - G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
              - G06F8/456—Parallelism detection
            - G06F8/44—Encoding
        - G06F8/70—Software maintenance or management
        - G06F8/60—Software deployment
      - G06F12/00—Accessing, addressing or allocating within memory systems or architectures
        - G06F12/02—Addressing or allocation; Relocation
          - G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
            - G06F12/023—Free address space management
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
        - G06F17/10—Complex mathematical operations
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
        - G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
          - G06F15/163—Interprocessor communication
      - G06F11/00—Error detection; Error correction; Monitoring
    - G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL

Similar Documents

Publication	Publication Date	Title
Ben-Nun et al.	2015	Memory access patterns: The missing piece of the multi-GPU puzzle
Venkataraman et al.	2013	Presto: distributed machine learning and graph processing with sparse matrices
Mittal et al.	2019	A survey of techniques for optimizing deep learning on GPUs
Lu et al.	2021	Optimizing depthwise separable convolution operations on GPUs
CN110383247B (en)	2023-04-28	Method executed by computer, computer readable medium and heterogeneous computing system
Unat et al.	2011	Mint: realizing CUDA performance in 3D stencil methods with annotated C
Löff et al.	2021	The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures
US20220057949A1 (en)	2022-02-24	Systems and methods for minimizing communications
Agrawal et al.	1995	An integrated runtime and compile-time approach for parallelizing structured and block structured applications
Dathathri et al.	2013	Generating efficient data movement code for heterogeneous architectures with distributed-memory
Komoda et al.	2013	Integrating multi-GPU execution in an OpenACC compiler
Rubin et al.	2014	MAPS: Optimizing massively parallel applications using device-level memory abstraction
Totoni et al.	2017	HPAT: high performance analytics with scripting ease-of-use
Shterenlikht et al.	2015	Fortran 2008 coarrays
KR20240090423A (en)	2024-06-21	System and method for auto-parallelization of processing codes for multi-processor systems with optimized latency
Cui et al.	2017	Directive-based partitioning and pipelining for graphics processing units
Andión et al.	2016	Locality-aware automatic parallelization for GPGPU with OpenHMPP directives
Peterson et al.	2019	Automatic halo management for the Uintah GPU-heterogeneous asynchronous many-task runtime
Bednárek et al.	2017	Improving matrix-based dynamic programming on massively parallel accelerators
Davis et al.	2012	Paradigmatic shifts for exascale supercomputing
Rapaport	2022	GPU molecular dynamics: Algorithms and performance
Ribeiro	2011	Contributions on memory affinity management for hierarchical shared memory multi-core platforms
Bagliy et al.	2023	Automatic parallelization of iterative loops nests on distributed memory computing systems
Torres et al.	2023	Supporting efficient overlapping of host-device operations for heterogeneous programming with CtrlEvents
Sung	2013	Data layout transformation through in-place transposition