Guo et al., 2023 - Google Patents
Cambricon-u: A systolic random increment memory architecture for unary computing
- Document ID
- 13538713478930611456
- Author
- Guo H
- Zhao Y
- Li Z
- Hao Y
- Liu C
- Song X
- Li X
- Du Z
- Zhang R
- Guo Q
- Chen T
- Xu Z
- Publication year
- 2023
- Publication venue
- Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
Snippet
Unary computing, whose arithmetics require only one logic gate, has enabled efficient DNN processing, especially on strictly power-constrained devices. However, unary computing still confronts the power efficiency bottleneck for buffering unary bitstreams. The buffering of …
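As a rough illustration of the single-gate arithmetic the snippet refers to, the sketch below multiplies two values encoded as unary bitstreams using only a bitwise AND. The encoding (one stream repeats its pattern, the other holds each bit for n cycles), the function names, and the stream length n are assumptions chosen so the result is exact. This is not the Cambricon-U design, whose details are truncated in the snippet.

```python
def unary_stream_repeat(x, n):
    """Encode x (0..n) as a stream of n*n bits: an n-bit pattern with x leading ones, repeated n times."""
    return [1 if (i % n) < x else 0 for i in range(n * n)]


def unary_stream_hold(y, n):
    """Encode y (0..n) as a stream of n*n bits: each bit of an n-bit pattern with y leading ones is held for n cycles."""
    return [1 if (i // n) < y else 0 for i in range(n * n)]


def unary_multiply(x, y, n):
    """Multiply x/n by y/n with one AND gate per cycle; returns the count of ones out of n*n."""
    a = unary_stream_repeat(x, n)
    b = unary_stream_hold(y, n)
    # The only arithmetic hardware modelled here is a single AND gate.
    product_stream = [p & q for p, q in zip(a, b)]
    return sum(product_stream)


# 3/8 * 5/8 = 15/64, so 15 of the 64 output bits are ones.
print(unary_multiply(3, 5, 8))  # → 15
```

Note that the streams grow to n*n bits for n-level operands, which hints at why buffering unary bitstreams can become the costly part of such a design.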
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/505—Adding; Subtracting in bit-parallel fashion, i.e. having a different digit-handling circuit for each denomination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2207/00—Indexing scheme relating to methods or arrangements for processing data by operating upon the order or content of the data handled
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F1/00—Details of data-processing equipment not covered by groups G06F3/00 - G06F13/00, e.g. cooling, packaging or power supply specially adapted for computer application
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Catthoor et al. | | Custom memory management methodology: Exploration of memory organisation for embedded multimedia system design |
Liang et al. | | High‐Level Synthesis: Productivity, Performance, and Software Constraints |
Lee et al. | | Application codesign of near-data processing for similarity search |
Nguyen et al. | | ShortcutFusion: From tensorflow to FPGA-based accelerator with a reuse-aware memory allocation for shortcut data |
TW201602813A (en) | | Systems, apparatuses, and methods for feature searching |
Drozd et al. | | Green IT engineering in the view of resource-based approach |
Li et al. | | MeNTT: A compact and efficient processing-in-memory number theoretic transform (NTT) accelerator |
Fu et al. | | 2-in-1 accelerator: Enabling random precision switch for winning both adversarial robustness and efficiency |
Gao et al. | | Millimeter-scale and billion-atom reactive force field simulation on sunway taihulight |
US9626334B2 (en) | | Systems, apparatuses, and methods for K nearest neighbor search |
Ghaffar et al. | | A low power in-DRAM architecture for quantized CNNs using fast Winograd convolutions |
Guo et al. | | Cambricon-u: A systolic random increment memory architecture for unary computing |
Li et al. | | A precision-scalable deep neural network accelerator with activation sparsity exploitation |
Potocnik et al. | | Optimizing foundation model inference on a many-tiny-core open-source risc-v platform |
Liu et al. | | FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-art to Future Opportunities |
Lee et al. | | MVP: An efficient CNN accelerator with matrix, vector, and processing-near-memory units |
Choi et al. | | A deep neural network training architecture with inference-aware heterogeneous data-type |
Zhang et al. | | Tensorcache: Reconstructing memory architecture with sram-based in-cache computing for efficient tensor computations in gpgpus |
Li et al. | | Mathematical framework for optimizing crossbar allocation for reram-based CNN accelerators |
Angizi et al. | | Processing-in-memory acceleration of mac-based applications using residue number system: A comparative study |
Servais et al. | | Adaptive computation reuse for energy-efficient training of deep neural networks |
Ghanbari et al. | | Energy-efficient acceleration of convolutional neural networks using computation reuse |
US20230064886A1 (en) | | Techniques for data type detection with learned metadata |
Haghi et al. | | O⁴-DNN: A Hybrid DSP-LUT-Based Processing Unit With Operation Packing and Out-of-Order Execution for Efficient Realization of Convolutional Neural Networks on FPGA Devices |
Luo et al. | | A single clock cycle approximate adder with hybrid prediction and error compensation methods |