Lee et al., 2022 - Google Patents
MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units
- Document ID
- 6824833216726758650
- Author
- Lee S
- Choi J
- Jung W
- Kim B
- Park J
- Kim H
- Ahn J
- Publication year
- 2022
- Publication venue
- ACM Transactions on Design Automation of Electronic Systems (TODAES)
Snippet
Mobile and edge devices become common platforms for inferring convolutional neural networks (CNNs) due to superior privacy and service quality. To reduce the computational costs of convolution (CONV), recent CNN models adopt depth-wise CONV (DW-CONV) and …
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING; COUNTING › G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/30386—Retrieval requests (information retrieval in structured data stores)
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/5045—Circuit design (computer-aided design)
- G06F9/46—Multiprogramming arrangements
- G06F8/41—Compilation (transformations of program code)
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F17/10—Complex mathematical operations
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply
- G06F11/00—Error detection; Error correction; Monitoring
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
Similar Documents
Publication | Publication Date | Title
---|---|---
Xu et al. | | A survey of design and optimization for systolic array-based DNN accelerators
Hegde et al. | | Extensor: An accelerator for sparse tensor algebra
Qu et al. | | Dota: detect and omit weak attentions for scalable transformer acceleration
Zhang et al. | | BoostGCN: A framework for optimizing GCN inference on FPGA
Mittal | | A survey of techniques for approximate computing
Mahajan et al. | | Tabla: A unified template-based framework for accelerating statistical machine learning
Nguyen et al. | | ShortcutFusion: From tensorflow to FPGA-based accelerator with a reuse-aware memory allocation for shortcut data
Chen et al. | | A high-throughput neural network accelerator
Lee et al. | | MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units
US12321849B1 (en) | | Performing hardware operator fusion
Liu et al. | | An efficient FPGA-based depthwise separable convolutional neural network accelerator with hardware pruning
Arora et al. | | Tensor slices: FPGA building blocks for the deep learning era
Que et al. | | Remarn: A reconfigurable multi-threaded multi-core accelerator for recurrent neural networks
Pellauer et al. | | Symphony: Orchestrating sparse and dense tensors with hierarchical heterogeneous processing
Potocnik et al. | | Optimizing foundation model inference on a many-tiny-core open-source risc-v platform
Raha et al. | | Efficient hardware acceleration of emerging neural networks for embedded machine learning: An industry perspective
Cicek et al. | | Energy efficient boosting of gemm accelerators for dnn via reuse
Ioannou et al. | | Streaming Overlay Architecture for Lightweight LSTM Computation on FPGA SoCs
Gan et al. | | High performance reconfigurable computing for numerical simulation and deep learning
Agullo et al. | | Task-based sparse hybrid linear solver for distributed memory heterogeneous architectures
Qararyah et al. | | An efficient hybrid deep learning accelerator for compact and heterogeneous CNNs
Shin et al. | | Pimflow: Compiler and runtime support for cnn models on processing-in-memory dram
CN113642722A (en) | | Chip for convolution calculation, control method thereof and electronic device
Gupta et al. | | Store-n-learn: Classification and clustering with hyperdimensional computing across flash hierarchy
Lee et al. | | Resa: Reconfigurable systolic array for multiple tiny dnn tensors