CN111932436B - Deep learning processor architecture for intelligent parking - Google Patents
Info
- Publication number
- CN111932436B (application CN202010862272.8A)
- Authority
- CN
- China
- Prior art keywords
- memory
- deep learning
- data
- module
- inner product
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a deep learning processor architecture for intelligent parking, comprising a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller. The deep learning network acceleration module performs the data processing and implements each of the deep learning networks used by the parking system; it comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. The input feature map memory and the kernel memory both adopt an A/B dual-memory (ping-pong) mode. The invention improves computing efficiency, serves as a hardware accelerator for the multiple deep learning networks of a high-level video system, and achieves high computing power and low power consumption.
Description
Technical Field
The invention relates to the field of hardware architecture, and in particular to a deep learning processor architecture for intelligent parking.
Background
A high-level video intelligent roadside parking system (hereinafter, a high-level video system) is a system based on computer vision and deep learning technology. Compared with traditional intelligent parking systems such as geomagnetic sensing, it offers higher accuracy and can identify license plates, vehicle characteristics and the like. An existing high-level video system generally uses a high-definition camera and an edge processing unit to monitor and identify roadside parking spaces in real time; when vehicles enter or leave, information such as license plates and parking times is collected, analyzed and stored. Recognizing roadside parking generally requires several deep learning networks to identify parking space occupancy, vehicle characteristics, license plates and the like, and the networks differ in size and computation time.
Existing high-level video systems generally use a GPU plus a CPU as the processing unit. The GPU, as a general-purpose processor for artificial intelligence, is versatile and easy to use, but its computing power is comparatively low and its power consumption large, so it cannot meet the growing functional and performance requirements of high-level video systems. A deep learning processor architecture for smart parking is therefore needed that replaces the GPU and serves as a hardware accelerator for the multiple deep learning networks of a high-level video system.
Disclosure of Invention
The invention aims at: a deep learning processor architecture for intelligent parking that replaces the GPU, interacts with the CPU, and serves as a hardware accelerator for the multiple deep learning networks used by a high-level video system, achieving high computing power, low power consumption, high real-time performance and high accuracy.
The technical scheme adopted by the invention is as follows:
The invention relates to a deep learning processor architecture for intelligent parking, comprising a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller, wherein:
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
The synchronous control module comprises a synchronous control unit, a transmit data buffer, a receive data buffer and a receive address buffer;
The deep learning network acceleration module performs the data processing and implements each deep learning network used by the parking system. It comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. The input feature map memory, the kernel memory and the instruction controller read data from the external memory through the external memory reading module; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator. The output controller exchanges data with the inner product accelerator; the output feature map memory is connected to the inner product accelerator, and the data processing results of the inner product accelerator are sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
and the memory controller exchanges data with the deep learning network acceleration module: intermediate data is stored to the external memory through the memory controller and read back from the external memory when needed.
Furthermore, the parking system uses three deep learning networks: a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network.
Furthermore, the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three networks is accomplished by combining calls from a dedicated instruction set.
Furthermore, the kernel memory can also store weight data and bias data.
Further, the high-speed data interface module comprises a PCIe or USB3 high-speed data communication interface.
The invention comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller; the other four functional modules serve as auxiliary modules of the deep learning network acceleration module and realize its data interaction. The high-speed data interface module connects to external devices; external data passes through the DMA module and then through the synchronous control module to the deep learning network acceleration module. While the acceleration module processes the data, some intermediate data is stored to the external memory through the memory controller and read back from the external memory when needed. After the deep learning network acceleration module finishes processing, the data is output to the external device through the synchronous control module, the DMA module and the high-speed data communication interface.
The deep learning network acceleration module is the core module of the invention and comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. When feature maps and kernel data need to be processed, they are read from the external memory through the external memory reading module and stored in the corresponding input feature map memory and kernel memory. The inner product accelerator is the core computing unit of the acceleration module; because it has many parallel processing units, its processing speed exceeds the speed at which the input feature map memory and the kernel memory can be filled from the external memory. Both memories therefore adopt an A/B dual-memory, i.e. ping-pong, mode: while the inner product accelerator reads kernel memory A through the data reading module, kernel memory B is loaded from the external memory through the external memory reading module; as soon as the accelerator finishes its data interaction with kernel memory A it immediately starts on kernel memory B, and kernel memory A immediately starts loading the next batch. Data is conveyed to the inner product accelerator through the data reading module, and according to the instructions issued by the instruction controller the accelerator completes various computations. Some intermediate results of the inner product accelerator are sent to the output controller for secondary processing and returned to the accelerator; when the computation is complete, the result is sent through the output feature map memory to the external memory writing module and finally stored in the external memory.
In summary, by adopting the technical scheme, the invention has the beneficial effects that:
1. The deep learning processor architecture for intelligent parking according to the invention replaces the GPU and interacts with the CPU. Using the inner product accelerator as the core computing unit of the deep learning network acceleration module, and reading data into the A/B dual input feature map memory and kernel memory, it improves computing efficiency. Serving as a hardware accelerator for the multiple deep learning networks of a high-level video system, it achieves high computing power, low power consumption, high real-time performance and high accuracy of the system.
2. With the deep learning processor architecture for intelligent parking according to the invention, power consumption, computing speed, computing latency and circuit size are balanced and optimized when the deep learning networks are processed by the processor architecture, so that the high-level video intelligent roadside parking system is kept in an optimal state.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should not be considered as limiting the scope; for those skilled in the art, other related drawings may be obtained from these drawings without creative effort. The proportions of the components in the drawings of this specification do not represent the proportions of an actual design; the drawings are merely schematic diagrams of structure or position:
FIG. 1 is a block diagram of the present invention;
Fig. 2 is a block diagram of a deep learning network acceleration module.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
The present invention will be described in detail with reference to the accompanying drawings.
Example 1
As shown in figs. 1-2, the present invention is a deep learning processor architecture for smart parking, comprising a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller, wherein:
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
The synchronous control module comprises a synchronous control unit, a transmit data buffer, a receive data buffer and a receive address buffer;
The deep learning network acceleration module performs the data processing and implements each deep learning network used by the parking system. It comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. The input feature map memory, the kernel memory and the instruction controller read data from the external memory through the external memory reading module; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator. The output controller exchanges data with the inner product accelerator; the output feature map memory is connected to the inner product accelerator, and the data processing results of the inner product accelerator are sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
and the memory controller exchanges data with the deep learning network acceleration module: intermediate data is stored to the external memory through the memory controller and read back from the external memory when needed.
In the invention, the processor architecture is implemented on an FPGA chip; in this embodiment, a Kintex-7 series FPGA chip is selected. The architecture comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller; the other four functional modules serve as auxiliary modules of the deep learning network acceleration module and realize its data interaction. The high-speed data interface module connects to external devices. External data passes through the DMA module, i.e. the direct memory access module, and then through the synchronous control module to the deep learning network acceleration module. While the acceleration module processes the data, some intermediate data is stored to the external memory through the memory controller and read back from the external memory when needed. After the deep learning network acceleration module finishes processing, the data is output to the external device through the synchronous control module, the DMA module and the high-speed data communication interface.
The deep learning network acceleration module is the core module of the invention and comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. When feature maps and kernel data need to be processed, they are read from the external memory through the external memory reading module and stored in the corresponding input feature map memory and kernel memory. The inner product accelerator is the core computing unit of the acceleration module; because it has many parallel processing units, its processing speed exceeds the speed at which the input feature map memory and the kernel memory can be filled from the external memory. Both memories therefore adopt an A/B dual-memory, i.e. ping-pong, mode: while the inner product accelerator reads kernel memory A through the data reading module, kernel memory B is loaded from the external memory through the external memory reading module; as soon as the accelerator finishes its data interaction with kernel memory A it immediately starts on kernel memory B, and kernel memory A immediately starts loading the next batch, which improves data processing efficiency. Data is conveyed to the inner product accelerator through the data reading module, and according to the instructions issued by the instruction controller the accelerator completes various computations. Some intermediate results of the inner product accelerator are sent to the output controller for secondary processing and returned to the accelerator; when the computation is complete, the result is sent through the output feature map memory to the external memory writing module and finally stored in the external memory.
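The A/B scheduling can be pictured in software. The following minimal Python sketch illustrates only the ping-pong policy described above (the batch names and the single-loader model are invented for the illustration), alternating computation on one buffer with loading of the other:

```python
# Illustrative ping-pong (A/B dual-memory) schedule; not the hardware implementation.
batches = [f"batch{i}" for i in range(5)]   # data waiting in external memory

buffers = {"A": None, "B": None}
compute, load = "A", "B"
buffers[compute] = batches.pop(0)           # initial preload of buffer A

while buffers[compute] is not None:
    # the external memory reading module fills one buffer while the IPA drains the other
    buffers[load] = batches.pop(0) if batches else None
    print(f"IPA computes on {compute} ({buffers[compute]}), "
          f"loading into {load} ({buffers[load]})")
    buffers[compute] = None                 # this buffer has been consumed
    compute, load = load, compute           # swap roles: ping-pong
```

Provided each load completes within one compute interval, the inner product accelerator never waits for data, which is the point of the dual-memory mode.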
The inner product accelerator, denoted IPA, is the core computing unit of the deep learning network acceleration module and mainly completes multiplications and additions in parallel. The inner product accelerator receives a feature map vector of length 512 × 8 bits and a kernel vector of length 1024 × 8 bits and outputs their inner products, which, according to the instruction configuration from the instruction controller, can be 64-, 32-, 16- or 2-bit fixed-point numbers.
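As a software illustration only (not the hardware), the Python sketch below mimics this interface under the assumption, not stated explicitly in the text, that the 1024-element kernel vector holds two 512-element kernels, one per half of the accelerator, and that the configured output width is enforced by saturation:

```python
import numpy as np

def ipa_inner_products(feature, kernel, out_bits=32):
    """Mimic the IPA interface: 512 x 8-bit features, 1024 x 8-bit kernel data."""
    assert feature.size == 512 and kernel.size == 1024
    f = feature.astype(np.int64)
    k1, k2 = kernel[:512].astype(np.int64), kernel[512:].astype(np.int64)
    p1, p2 = int(f @ k1), int(f @ k2)       # two inner products computed "in parallel"
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, p1)), max(lo, min(hi, p2))   # saturate to output width

rng = np.random.default_rng(0)
feature = rng.integers(-128, 128, 512, dtype=np.int8)
kernel = rng.integers(-128, 128, 1024, dtype=np.int8)
print(ipa_inner_products(feature, kernel))
```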
The inner product accelerator contains 32 processing units in total; each processing unit consists of 16 multiplication units, and each multiplication unit is one DSP unit of the FPGA. Each DSP unit can receive two groups of data and perform two multiplications, so the IPA is logically split into IPA1 and IPA2, into which different data are injected respectively. Specifically:
A DSP48E in the FPGA can compute P = (D ± A) × B ± C, and here the DSP48E is configured as P = (D + A) × B + C. To compute the two multiplications P1 = A × K1 and P2 = A × K2 (each with a 16-bit result), the two multiplications are replaced by one multiplication with a 32-bit result, using the substitution P = A × ((K1 << 16) + K2). This yields a 32-bit output P whose upper 16 bits are P1 and whose lower 16 bits are P2. C is configured according to the output result to correct the borrow between the two halves: C = 0 if P2 ≥ 0, otherwise C = 1 << 16.
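The correctness of this two-in-one packing can be checked in software. The following Python sketch (an illustration, not the DSP hardware) emulates the configured operation P = A × ((K1 << 16) + K2) + C and verifies that both signed products are recovered from the single wide multiplication:

```python
def dsp48e_dual_multiply(a, k1, k2):
    """Recover P1 = A*K1 and P2 = A*K2 from one wide multiplication plus C."""
    p2 = a * k2
    c = 0 if p2 >= 0 else (1 << 16)          # borrow correction described in the text
    p = a * ((k1 << 16) + k2) + c            # single multiplication, 32-bit result

    def signed16(x):                         # reinterpret a 16-bit field as signed
        return x - (1 << 16) if x & 0x8000 else x

    return signed16((p >> 16) & 0xFFFF), signed16(p & 0xFFFF)

# spot-check across the signed 8-bit operand range
import random
random.seed(1)
for _ in range(100_000):
    a, k1, k2 = (random.randint(-128, 127) for _ in range(3))
    assert dsp48e_dual_multiply(a, k1, k2) == (a * k1, a * k2)
print("both products recovered for all sampled operands")
```

When P2 is negative, its two's-complement representation borrows 1 from the upper half of P; adding C = 1 << 16 cancels that borrow, which is why C depends only on the sign of P2.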
The external memory reading module reads data from the external memory, and the reading is controlled by a state machine with an idle state and four read states: read feature map, read kernel, read bias, and read instruction.
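A minimal software model of this state machine is sketched below; the state names follow the description, while the request-driven sequencing (one read per request, returning to idle) is an assumption for illustration:

```python
from enum import Enum, auto

class ReadState(Enum):
    IDLE = auto()
    READ_FEATURE_MAP = auto()
    READ_KERNEL = auto()
    READ_BIAS = auto()
    READ_INSTRUCTION = auto()

def next_state(state, request=None):
    """Serve one read request from IDLE, then return to IDLE when it completes."""
    if state is ReadState.IDLE and request is not None:
        return request                       # e.g. ReadState.READ_KERNEL
    return ReadState.IDLE

state = ReadState.IDLE
for req in [ReadState.READ_INSTRUCTION, None, ReadState.READ_FEATURE_MAP]:
    state = next_state(state, req)
    print(state.name)                        # READ_INSTRUCTION, IDLE, READ_FEATURE_MAP
```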
The external memory writing module performs, for each convolutional layer, the preload and the post-processing steps, such as activation and pooling operations, and then writes the data to the external memory.
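For illustration, the sketch below shows post-processing of the kind named above (ReLU activation followed by 2 × 2 max pooling); the actual operators and window sizes used by the module are not specified here, so these choices are assumptions:

```python
import numpy as np

def post_process(fmap):
    """ReLU activation followed by 2x2 max pooling on an HxW feature map."""
    activated = np.maximum(fmap, 0)                       # activation
    h, w = (activated.shape[0] // 2) * 2, (activated.shape[1] // 2) * 2
    blocks = activated[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))                        # pooling

fmap = np.array([[ 1, -2,  3,  0],
                 [-1,  5, -3,  2],
                 [ 0,  1, -1, -4],
                 [ 2, -2,  6,  1]])
print(post_process(fmap))   # [[5 3] [2 6]]
```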
The output controller computes the final output of each convolutional layer from the output of the inner product accelerator (IPA) and completes the data truncation.
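Data truncation here means reducing a wide accumulator value back to a narrow fixed-point output; a common scheme is an arithmetic right shift with saturation, sketched below with assumed shift and width parameters:

```python
def truncate(acc, shift=8, out_bits=8):
    """Reduce a wide accumulator value to a narrow signed fixed-point output."""
    scaled = acc >> shift                                 # drop fractional bits
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, scaled))                       # saturate to output range

print(truncate(70000))   # 70000 >> 8 = 273, saturates to 127
print(truncate(-3000))   # -3000 >> 8 = -12
```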
Example 2
This example is a further illustration of the present invention.
Based on the above embodiments, the parking system uses three deep learning networks: a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network. The deep learning vehicle recognition network is an object recognition network used for vehicle recognition; the deep learning license plate recognition network is a tiny-object recognition network used for license plate recognition; and the deep learning license plate number and character recognition network is a small-object recognition network used for recognizing license plate numbers and characters.
Furthermore, the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three networks is accomplished by combining calls from a dedicated instruction set, as sketched below. This saves chip circuit area and improves the forward compatibility of the chip design, so that the chip can, as far as possible, meet the requirements of new artificial intelligence networks that may appear in the future.
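The dedicated instruction set itself is not reproduced here; as a purely hypothetical illustration of the idea (several networks expressed as sequences of calls to shared, reusable substructures), consider:

```python
# Hypothetical illustration only: three networks described as sequences of
# shared substructure calls, dispatched to (stubbed) hardware routines.
SUBSTRUCTURES = {
    "CONV3x3": lambda cfg: print(f"conv3x3 {cfg}"),
    "POOL2x2": lambda cfg: print(f"pool2x2 {cfg}"),
    "FC":      lambda cfg: print(f"fc      {cfg}"),
}

NETWORK_PROGRAMS = {
    "vehicle_recognition":    [("CONV3x3", {"ch": 64}), ("POOL2x2", {}), ("FC", {"out": 10})],
    "plate_recognition":      [("CONV3x3", {"ch": 32}), ("POOL2x2", {}), ("FC", {"out": 4})],
    "plate_char_recognition": [("CONV3x3", {"ch": 16}), ("FC", {"out": 36})],
}

def run_network(name):
    for op, cfg in NETWORK_PROGRAMS[name]:
        SUBSTRUCTURES[op](cfg)   # the same substructures are reused by every network

run_network("plate_recognition")
```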
Example 3
This example is a further illustration of the present invention.
This embodiment is based on the above embodiments; in it, the kernel memory is further capable of storing weight data and bias data. The external memory reading module reads the feature map, the kernel, the weight data and the bias data from the external memory; the feature map is stored in the input feature map memory, while the kernel, the weight data and the bias data are stored in the kernel memory.
Example 4
This example is a further illustration of the present invention.
In this embodiment, based on the foregoing embodiments, the high-speed data interface module comprises a PCIe or USB3 high-speed data communication interface. The corresponding external device, such as a computer, should have a matching PCIe or USB3 data interface.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any changes or substitutions that can be conceived without creative effort by those skilled in the art within the technical scope of the present invention shall be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.
Claims (5)
1. A deep learning processor architecture for smart parking, characterized in that it comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller, wherein:
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
the synchronous control module comprises a synchronous control unit, a transmit data buffer, a receive data buffer and a receive address buffer;
the deep learning network acceleration module performs the data processing and implements each deep learning network used by the parking system; the deep learning network acceleration module comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module; the input feature map memory, the kernel memory and the instruction controller read data from the external memory through the external memory reading module; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator; the output controller exchanges data with the inner product accelerator; the output feature map memory is connected to the inner product accelerator, and the data processing results of the inner product accelerator are sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
the memory controller exchanges data with the deep learning network acceleration module; intermediate data is stored to the external memory through the memory controller and read back from the external memory when needed;
the inner product accelerator serves as the core computing unit of the deep learning network acceleration module and is denoted IPA;
the inner product accelerator comprises 32 processing units in total; each processing unit consists of 16 multiplication units, each multiplication unit being one DSP unit of an FPGA that receives two groups of data and performs two multiplications respectively, so that the IPA is divided into IPA1 and IPA2, into which different data are injected respectively; specifically:
a DSP48E in the FPGA performs P = (D ± A) × B ± C and is configured as P = (D + A) × B + C; the two multiplications P1 = A × K1 and P2 = A × K2, each with a 16-bit result, are replaced by one multiplication with a 32-bit result using the substitution P = A × ((K1 << 16) + K2); a 32-bit output P is obtained, in which the upper 16 bits of P are P1 and the lower 16 bits are P2; C is configured based on the output result: C = 0 if P2 ≥ 0, otherwise C = 1 << 16.
2. The deep learning processor architecture for smart parking of claim 1, wherein: the parking system uses three deep learning networks: a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network.
3. The deep learning processor architecture for smart parking of claim 2, wherein: the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three deep learning networks is accomplished by combining calls from a dedicated instruction set.
4. The deep learning processor architecture for smart parking of claim 1, wherein: the kernel memory is also capable of storing weight data and bias data.
5. The deep learning processor architecture for smart parking of claim 1, wherein: the high speed data interface module comprises a PCIe or USB3 high speed data communication interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010862272.8A CN111932436B (en) | 2020-08-25 | 2020-08-25 | Deep learning processor architecture for intelligent parking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111932436A CN111932436A (en) | 2020-11-13 |
CN111932436B (en) | 2024-04-19
Family
ID=73305183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010862272.8A Active CN111932436B (en) | 2020-08-25 | 2020-08-25 | Deep learning processor architecture for intelligent parking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111932436B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103336877A (en) * | 2013-07-25 | 2013-10-02 | 哈尔滨工业大学 | Satellite lithium ion battery residual life prediction system and method based on RVM (relevance vector machine) dynamic reconfiguration |
CN109948784A (en) * | 2019-01-03 | 2019-06-28 | 重庆邮电大学 | A Convolutional Neural Network Accelerator Circuit Based on Fast Filtering Algorithm |
CN110058883A (en) * | 2019-03-14 | 2019-07-26 | 成都恒创新星科技有限公司 | A kind of CNN accelerated method and system based on OPU |
CN110058882A (en) * | 2019-03-14 | 2019-07-26 | 成都恒创新星科技有限公司 | It is a kind of for CNN accelerate OPU instruction set define method |
CN212302545U (en) * | 2020-08-25 | 2021-01-05 | 成都恒创新星科技有限公司 | Deep learning processor architecture for intelligent parking |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12131250B2 (en) * | 2017-09-29 | 2024-10-29 | Intel Corporation | Inner product convolutional neural network accelerator |
- 2020-08-25: application CN202010862272.8A filed in China; granted as CN111932436B (status: Active)
Non-Patent Citations (1)
Title |
---|
A convolutional neural network coprocessor design based on programmable logic devices; 杨一晨, 梁峰, 张国和, 何平, 吴斌, 高震霆; Journal of Xi'an Jiaotong University; No. 07, 2018-07-10; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111932436A (en) | 2020-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543832B (en) | Computing device and board card | |
CN109522052B (en) | Computing device and board card | |
CN111860398B (en) | Remote sensing image target detection method, system and terminal device | |
CN104238993B (en) | The vector matrix product accelerator of microprocessor integrated circuit | |
CN108805272A (en) | A kind of general convolutional neural networks accelerator based on FPGA | |
CN109934339A (en) | A Universal Convolutional Neural Network Accelerator Based on One-Dimensional Systolic Array | |
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
CN115880132B (en) | Graphics processor, matrix multiplication task processing method, device and storage medium | |
CN111814957B (en) | Neural network operation method and related equipment | |
US11775808B2 (en) | Neural network computation device and method | |
CN110059797B (en) | Computing device and related product | |
CN109711540B (en) | Computing device and board card | |
CN110163349B (en) | Network model calculation method and device | |
CN107729944B (en) | Identification method and device of popular pictures, server and storage medium | |
CN111178513B (en) | Convolution implementation method and device of neural network and terminal equipment | |
CN110059809B (en) | Computing device and related product | |
CN110515872B (en) | Direct memory access method, device, special computing chip and heterogeneous computing system | |
CN111161705A (en) | Voice conversion method and device | |
US11256940B1 (en) | Method, apparatus and system for gradient updating of image processing model | |
CN111932436B (en) | Deep learning processor architecture for intelligent parking | |
CN212302545U (en) | Deep learning processor architecture for intelligent parking | |
CN112711051A (en) | Flight control system positioning method, device, equipment and storage medium | |
CN111645687A (en) | Lane changing strategy determining method, device and storage medium | |
CN109799483A (en) | A kind of data processing method and device | |
CN114968182A (en) | Operator splitting method, control method and device for storage and computation integrated chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |