Lee et al., 2022 - Google Patents
MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units
- Document ID
- 6824833216726758650
- Author
- Lee S
- Choi J
- Jung W
- Kim B
- Park J
- Kim H
- Ahn J
- Publication year
- 2022
- Publication venue
- ACM Transactions on Design Automation of Electronic Systems (TODAES)
Snippet
Mobile and edge devices become common platforms for inferring convolutional neural networks (CNNs) due to superior privacy and service quality. To reduce the computational costs of convolution (CONV), recent CNN models adopt depth-wise CONV (DW-CONV) and …
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING; COUNTING › G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/30386—Retrieval requests (information retrieval in structured data stores)
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/5045—Circuit design (computer-aided design)
- G06F9/46—Multiprogramming arrangements
- G06F8/41—Compilation (transformations of program code)
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F17/10—Complex mathematical operations
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply
- G06F11/00—Error detection; Error correction; Monitoring
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
Similar Documents
Publication | Publication Date | Title
---|---|---
Xu et al. | | A survey of design and optimization for systolic array-based DNN accelerators
Hegde et al. | | Extensor: An accelerator for sparse tensor algebra
Qu et al. | | Dota: detect and omit weak attentions for scalable transformer acceleration
Zhang et al. | | BoostGCN: A framework for optimizing GCN inference on FPGA
Mittal | | A survey of techniques for approximate computing
Mahajan et al. | | Tabla: A unified template-based framework for accelerating statistical machine learning
Nguyen et al. | | ShortcutFusion: From tensorflow to FPGA-based accelerator with a reuse-aware memory allocation for shortcut data
Chen et al. | | A high-throughput neural network accelerator
Lee et al. | | MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units
US12321849B1 (en) | | Performing hardware operator fusion
Liu et al. | | An efficient FPGA-based depthwise separable convolutional neural network accelerator with hardware pruning
Arora et al. | | Tensor slices: FPGA building blocks for the deep learning era
Que et al. | | Remarn: A reconfigurable multi-threaded multi-core accelerator for recurrent neural networks
Pellauer et al. | | Symphony: Orchestrating sparse and dense tensors with hierarchical heterogeneous processing
Potocnik et al. | | Optimizing foundation model inference on a many-tiny-core open-source risc-v platform
Raha et al. | | Efficient hardware acceleration of emerging neural networks for embedded machine learning: An industry perspective
Cicek et al. | | Energy efficient boosting of gemm accelerators for dnn via reuse
Ioannou et al. | | Streaming Overlay Architecture for Lightweight LSTM Computation on FPGA SoCs
Gan et al. | | High performance reconfigurable computing for numerical simulation and deep learning
Agullo et al. | | Task-based sparse hybrid linear solver for distributed memory heterogeneous architectures
Qararyah et al. | | An efficient hybrid deep learning accelerator for compact and heterogeneous CNNs
Shin et al. | | Pimflow: Compiler and runtime support for cnn models on processing-in-memory dram
CN113642722A (en) | | Chip for convolution calculation, control method thereof and electronic device
Gupta et al. | | Store-n-learn: Classification and clustering with hyperdimensional computing across flash hierarchy
Lee et al. | | Resa: Reconfigurable systolic array for multiple tiny dnn tensors