Koza et al., 2014 - Google Patents
Compressed multirow storage format for sparse matrices on graphics processing units
- Document ID
- 7859158530022294227
- Author
- Koza Z
- Matyka M
- Szkoda S
- Mirosław Ł
- Publication year
- 2014
- Publication venue
- SIAM Journal on Scientific Computing
Snippet
A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern graphics processing units (GPUs). This format extends the standard compressed row storage (CRS) format and can be quickly converted to …
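For context on the snippet above, the sketch below shows the standard scalar CRS (CSR) SpMV kernel that the proposed compressed multirow storage format extends. This is not the paper's CMRS kernel, only the usual one-thread-per-row CRS baseline; the array names (`val`, `col_idx`, `row_ptr`) and the launch configuration are illustrative assumptions following common CRS conventions.

```cuda
#include <cuda_runtime.h>

// Baseline scalar CRS/CSR SpMV: y = A * x, one thread per matrix row.
// This is the standard format the paper's CMRS format builds on, not the
// CMRS kernel itself. Layout follows the usual CRS convention:
//   val[]     - nonzero values stored row by row
//   col_idx[] - column index of each nonzero
//   row_ptr[] - row_ptr[r] .. row_ptr[r+1] delimits row r's nonzeros
__global__ void spmv_crs_scalar(int n_rows,
                                const int   *row_ptr,
                                const int   *col_idx,
                                const float *val,
                                const float *x,
                                float       *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Illustrative launch: one thread per row, 256 threads per block.
// spmv_crs_scalar<<<(n_rows + 255) / 256, 256>>>(n_rows, row_ptr, col_idx, val, x, y);
```

In this scalar baseline each thread walks its own row, so neighboring threads in a warp load non-adjacent elements of `val` and `col_idx`; reorganizing the CRS arrays so that a warp can cooperate across several rows is the kind of inefficiency a multirow extension of CRS is meant to address.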
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/451—Code distribution
- G06F8/452—Loops
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
Publication | Title
---|---
Koza et al. | Compressed multirow storage format for sparse matrices on graphics processing units
Besta et al. | Slimsell: A vectorizable graph representation for breadth-first search
Filippone et al. | Sparse matrix-vector multiplication on GPGPUs
Shanbhag et al. | Efficient top-k query processing on massively parallel hardware
JP7220914B2 (en) | Computer-implemented methods, computer-readable media and heterogeneous computing systems
Çatalyürek et al. | Graph coloring algorithms for multi-core and massively multithreaded architectures
Breß et al. | Efficient co-processor utilization in database query processing
Bauer et al. | Singe: Leveraging warp specialization for high performance on GPUs
Lee et al. | Locality-aware mapping of nested parallel patterns on GPUs
Zinenko et al. | Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling
Martín et al. | Algorithmic strategies for optimizing the parallel reduction primitive in CUDA
Hadade et al. | Some useful optimisations for unstructured computational fluid dynamics codes on multicore and manycore architectures
Biferale et al. | An optimized D2Q37 lattice Boltzmann code on GP-GPUs
Azad et al. | Multithreaded algorithms for maximum matching in bipartite graphs
Elafrou et al. | SparseX: A library for high-performance sparse matrix-vector multiplication on multicore platforms
Ao et al. | Performance optimization of the HPCG benchmark on the Sunway TaihuLight supercomputer
Diamond et al. | Arbitrary modulus indexing
Zhang et al. | Vectorized parallel sparse matrix-vector multiplication in PETSc using AVX-512
US7983890B2 (en) | Method and apparatus performing automatic mapping for a multi-processor system
Liu | Parallel and scalable sparse basic linear algebra subprograms
Kelefouras et al. | A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD
Javanmard et al. | Toward efficient architecture-independent algorithms for dynamic programs
Park et al. | mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices
Huo et al. | Porting irregular reductions on heterogeneous CPU-GPU configurations
Li et al. | GPU matrix multiplication