
Koza et al., 2014 - Google Patents

Compressed multirow storage format for sparse matrices on graphics processing units


Document ID
7859158530022294227
Author
Koza Z
Matyka M
Szkoda S
Mirosław Ł
Publication year
2014
Publication venue
SIAM Journal on Scientific Computing


Snippet

A new format for storing sparse matrices is proposed for efficient sparse matrix-vector (SpMV) product calculation on modern graphics processing units (GPUs). This format extends the standard compressed row storage (CRS) format and can be quickly converted to …
Continue reading at arxiv.org (PDF) (other versions)
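The snippet describes CMRS as an extension of the standard compressed row storage (CRS, also known as CSR) format for SpMV on GPUs. For orientation only, below is a minimal sketch of the CRS baseline that CMRS builds on: a scalar SpMV kernel in CUDA, one thread per row. All identifiers and the toy matrix are illustrative; this is not the paper's CMRS format or its kernels.

// csr_spmv.cu -- illustrative CRS (CSR) baseline only; the CMRS format is NOT implemented here.
#include <cstdio>
#include <cuda_runtime.h>

// Scalar CSR SpMV, y = A * x: one thread per row.
// row_ptr has n_rows + 1 entries; col_idx and val hold the nonzeros row by row.
__global__ void csr_spmv(int n_rows, const int *row_ptr, const int *col_idx,
                         const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        float sum = 0.0f;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += val[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main() {
    // 3x3 example matrix:
    // [10  0  2]
    // [ 0  5  0]
    // [ 1  0  7]
    const int n = 3;
    int   h_row_ptr[] = {0, 2, 3, 5};
    int   h_col_idx[] = {0, 2, 1, 0, 2};
    float h_val[]     = {10.f, 2.f, 5.f, 1.f, 7.f};
    float h_x[]       = {1.f, 1.f, 1.f};
    float h_y[n];

    int *d_row_ptr, *d_col_idx;
    float *d_val, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, sizeof(h_row_ptr));
    cudaMalloc(&d_col_idx, sizeof(h_col_idx));
    cudaMalloc(&d_val, sizeof(h_val));
    cudaMalloc(&d_x, sizeof(h_x));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr, sizeof(h_row_ptr), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, sizeof(h_col_idx), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, h_val, sizeof(h_val), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

    csr_spmv<<<(n + 255) / 256, 256>>>(n, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i)
        printf("y[%d] = %g\n", i, h_y[i]);  // expect 12, 5, 8

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

A known weakness of this scalar CRS kernel is that neighboring threads read col_idx and val from distant memory locations, so accesses are poorly coalesced for all but very short rows; inefficiencies of this kind are what GPU-oriented alternatives to plain CRS, such as the format proposed in the paper, aim to reduce.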

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/30 Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F 9/3889 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for programme control, e.g. control unit
    • G06F 9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Programme initiating; Programme switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F 8/451 Code distribution
    • G06F 8/452 Loops
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformations of program code
    • G06F 8/41 Compilation
    • G06F 8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F 8/456 Parallelism detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F 12/023 Free address space management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/50 Computer-aided design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored programme computers
    • G06F 15/78 Architectures of general purpose stored programme computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored programme computers
    • G06F 15/80 Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication | Publication Date | Title
Koza et al. Compressed multirow storage format for sparse matrices on graphics processing units
Besta et al. Slimsell: A vectorizable graph representation for breadth-first search
Filippone et al. Sparse matrix-vector multiplication on GPGPUs
Shanbhag et al. Efficient top-k query processing on massively parallel hardware
JP7220914B2 (en) Computer-implemented methods, computer-readable media and heterogeneous computing systems
Çatalyürek et al. Graph coloring algorithms for multi-core and massively multithreaded architectures
Breß et al. Efficient co-processor utilization in database query processing
Bauer et al. Singe: Leveraging warp specialization for high performance on GPUs
Lee et al. Locality-aware mapping of nested parallel patterns on gpus
Zinenko et al. Modeling the conflicting demands of parallelism and temporal/spatial locality in affine scheduling
Martín et al. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA
Hadade et al. Some useful optimisations for unstructured computational fluid dynamics codes on multicore and manycore architectures
Biferale et al. An optimized D2Q37 lattice Boltzmann code on GP-GPUs
Azad et al. Multithreaded algorithms for maximum matching in bipartite graphs
Elafrou et al. Sparsex: A library for high-performance sparse matrix-vector multiplication on multicore platforms
Ao et al. Performance optimization of the HPCG benchmark on the Sunway TaihuLight supercomputer
Diamond et al. Arbitrary modulus indexing
Zhang et al. Vectorized parallel sparse matrix-vector multiplication in PETSc using AVX-512
US7983890B2 (en) Method and apparatus performing automatic mapping for a multi-processor system
Liu Parallel and scalable sparse basic linear algebra subprograms
Kelefouras et al. A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD
Javanmard et al. Toward efficient architecture-independent algorithms for dynamic programs
Park et al. mGEMM: Low-latency convolution with minimal memory overhead optimized for mobile devices
Huo et al. Porting irregular reductions on heterogeneous CPU-GPU configurations
Li et al. GPU matrix multiplication