
Koza et al., 2014 - Google Patents

Compressed multirow storage format for sparse matrices on graphics processing units


Document ID
7859158530022294227
Author
Koza Z
Matyka M
Szkoda S
Mirosław Ł
Publication year
2014
Publication venue
SIAM Journal on Scientific Computing


Snippet

A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern graphics processing units (GPUs). This format extends the standard compressed row storage (CRS) format and can be quickly converted to …
Continue reading at arxiv.org (PDF) (other versions)
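The snippet describes CMRS as an extension of the standard compressed row storage (CRS, also known as CSR) format for SpMV on GPUs. For orientation only, below is a minimal sketch of the CRS baseline that CMRS builds on: a scalar SpMV kernel in CUDA, one thread per row. All identifiers and the toy matrix are illustrative; this is not the paper's CMRS format or its kernels.

// csr_spmv.cu -- illustrative CRS (CSR) baseline only; the CMRS format is NOT implemented here.
#include <cstdio>
#include <cuda_runtime.h>

// Scalar CSR SpMV, y = A * x: one thread per row.
// row_ptr has n_rows + 1 entries; col_idx and val hold the nonzeros row by row.
__global__ void csr_spmv(int n_rows, const int *row_ptr, const int *col_idx,
                         const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main() {
    // 3x3 example matrix:
    // [10  0  2]
    // [ 0  5  0]
    // [ 1  0  7]
    const int n = 3;
    int   h_row_ptr[] = {0, 2, 3, 5};
    int   h_col_idx[] = {0, 2, 1, 0, 2};
    float h_val[]     = {10.f, 2.f, 5.f, 1.f, 7.f};
    float h_x[]       = {1.f, 1.f, 1.f};
    float h_y[n];

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_val, sizeof(h_val));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    csr_spmv<<<(n + 255) / 256, 256>>>(n, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("y[%d] = %g\n", i, h_y[i]);  // expect 12, 5, 8

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

A known weakness of this scalar CRS kernel is that neighboring threads read col_idx and val from distant memory locations, so accesses are poorly coalesced for all but very short rows; inefficiencies of this kind are what GPU-oriented alternatives to plain CRS, such as the format proposed in the paper, aim to reduce.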

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F 9/3889 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Programme initiating; Programme switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F 8/451 Code distribution
    • G06F 8/452 Loops
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F 8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored programme computers
    • G06F 15/78 Architectures of general purpose stored programme computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored programme computers
    • G06F 15/80 Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication | Publication Date | Title
Koza et al. Compressed multirow storage format for sparse matrices on graphics processing units
Besta et al. Slimsell: A vectorizable graph representation for breadth-first search
Filippone et al. Sparse matrix-vector multiplication on GPGPUs
Shanbhag et al. Efficient top-k query processing on massively parallel hardware
JP7220914B2 (en) Computer-implemented methods, computer-readable media and heterogeneous computing systems
Çatalyürek et al. Graph coloring algorithms for multi-core and massively multithreaded architectures
Breß et al. Efficient co-processor utilization in database query processing
Bauer et al. Singe: Leveraging warp specialization for high performance on GPUs
Lee et al. Locality-aware mapping of nested parallel patterns on gpus
Zinenko et al. Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling
Martín et al. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA
Hadade et al. Some useful optimisations for unstructured computational fluid dynamics codes on multicore and manycore architectures
Biferale et al. An optimized D2Q37 lattice Boltzmann code on GP-GPUs
Azad et al. Multithreaded algorithms for maximum matching in bipartite graphs
Elafrou et al. Sparsex: A library for high-performance sparse matrix-vector multiplication on multicore platforms
Ao et al. Performance optimization of the HPCG benchmark on the Sunway TaihuLight supercomputer
Diamond et al. Arbitrary modulus indexing
Zhang et al. Vectorized parallel sparse matrix-vector multiplication in PETSc using AVX-512
US7983890B2 (en) Method and apparatus performing automatic mapping for a multi-processor system
Liu Parallel and scalable sparse basic linear algebra subprograms
Kelefouras et al. A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD
Javanmard et al. Toward efficient architecture-independent algorithms for dynamic programs
Park et al. mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices
Huo et al. Porting irregular reductions on heterogeneous CPU-GPU configurations
Li et al. GPU matrix multiplication