
CN111932436B - Deep learning processor architecture for intelligent parking - Google Patents


Info

Publication number
CN111932436B
CN111932436B (application CN202010862272.8A; published as CN111932436A)
Authority
CN
China
Prior art keywords
memory
deep learning
data
module
inner product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010862272.8A
Other languages
Chinese (zh)
Other versions
CN111932436A (en)
Inventor
王铭宇
王堃
吴晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Star Innovation Technology Co ltd
Original Assignee
Chengdu Star Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Star Innovation Technology Co ltd filed Critical Chengdu Star Innovation Technology Co ltd
Priority to CN202010862272.8A
Publication of CN111932436A
Application granted
Publication of CN111932436B
Legal status: Active (current)
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning processor architecture for intelligent parking, comprising a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller. The deep learning network acceleration module performs data processing and implements each deep learning network used by the parking system; it comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. The input feature map memory and the kernel memory both adopt an A/B dual-memory mode. The invention improves computational efficiency, serves as a hardware accelerator for the multiple deep learning networks of a high-level video system, and achieves high computing power and low power consumption.

Description

Deep learning processor architecture for intelligent parking
Technical Field
The invention relates to the field of hardware architecture, in particular to a deep learning processor architecture for intelligent parking.
Background
The high-level video intelligent roadside parking system (hereinafter referred to as the high-level video system) is a system based on computer vision and deep artificial intelligence technology. Compared with traditional intelligent parking systems such as geomagnetic induction, it has the obvious advantages of high accuracy and the ability to identify license plates, vehicle characteristics and the like. An existing high-level video system generally uses a high-definition camera and an edge processing unit to monitor and identify roadside parking spaces in real time; when a vehicle enters or leaves, information such as the license plate and the parking time is collected, analyzed and stored. Recognition of roadside parking generally requires multiple deep learning networks to identify parking space occupancy, vehicle characteristics, license plates and the like, and the networks differ in size and computation time.
Existing high-level video systems generally use a GPU together with a CPU as the processing units. The GPU, as a general-purpose processor for artificial intelligence, offers strong versatility and ease of use, but its computing power is low and its power consumption large, so it cannot meet the growing functional and performance requirements of high-level video systems. There is therefore a need for a deep learning processor architecture for smart parking that replaces the GPU and serves as a hardware accelerator for the multiple deep learning networks of a high-level video system.
Disclosure of Invention
The invention aims to provide a deep learning processor architecture for intelligent parking that replaces a GPU, interacts with a CPU, and serves as a hardware accelerator for the various deep learning networks used by a high-level video system, thereby achieving high computing power, low power consumption, high real-time performance and high accuracy.
The technical scheme adopted by the invention is as follows:
the invention relates to a deep learning processor architecture for intelligent parking, which comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller,
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
The synchronous control module comprises synchronous control, a sending data buffer, a receiving data buffer and a receiving address buffer;
The deep learning network acceleration module is used for data processing and realizing each deep learning network used by the parking system; the deep learning network acceleration module comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module; the input feature map memory, the kernel memory and the instruction controller are connected to an external memory through the external memory reading module to read data; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator; the output controller is connected to the inner product accelerator for data interaction, the output feature map memory is connected to the inner product accelerator, and the data processing result of the inner product accelerator is sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
and the memory controller performs data interaction with the deep learning network acceleration module; intermediate data is stored into the external memory through the memory controller and can be read back in from the external memory.
Furthermore, the parking system uses three deep learning networks, namely a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network.
Furthermore, the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three deep learning networks is completed through combined calls from a special instruction set.
Furthermore, the kernel memory can also store weight data and bias data.
Further, the high-speed data interface module comprises a PCIe or USB3 high-speed data communication interface.
The invention comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller, with the other four functional modules serving as auxiliary modules of the deep learning network acceleration module to realize data interaction. The high-speed data interface module connects to external equipment; external data is transmitted through the DMA module and then through the synchronous control module to the deep learning network acceleration module. After the deep learning network acceleration module processes the data, some intermediate data is stored into the external memory through the memory controller and is read back from the external memory when required. After the deep learning network acceleration module finishes data processing, the data is output to the external equipment through the synchronous control module, the DMA module and the high-speed data communication interface.
The deep learning network acceleration module is the core module of the invention and comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. When the deep learning network acceleration module needs to process feature map and kernel data, the input feature maps and kernel data are read from the external memory through the external memory reading module and stored in the corresponding input feature map memory and kernel memory. The inner product accelerator is the core computing unit of the deep learning network acceleration module; it has many parallel processing units, and its processing speed exceeds the speed at which the input feature map memory and the kernel memory can read data from the external memory. Both memories therefore adopt an A/B dual-memory mode, i.e. a ping-pong dual-memory mode: while the inner product accelerator reads kernel memory A through the data reading module, kernel memory B reads data from the external memory through the external memory reading module; when the inner product accelerator finishes its data interaction with kernel memory A, it immediately begins data interaction with kernel memory B, and kernel memory A immediately begins reading the next batch of data. Data is conveyed to the inner product accelerator through the data reading module; meanwhile, according to the input instructions of the instruction controller, the inner product accelerator can complete various calculations. Some intermediate data of the inner product accelerator can be sent to the output controller for secondary processing and returned to the inner product accelerator. After the inner product accelerator completes its calculations, the result is sent through the output feature map memory to the external memory writing module and finally stored in the external memory.
In summary, by adopting the technical scheme, the invention has the beneficial effects that:
1. The deep learning processor architecture for intelligent parking of the invention replaces a GPU (graphics processing unit), interacts with a CPU (central processing unit), uses an inner product accelerator as the core computing unit of the deep learning network acceleration module, and reads data into the A/B dual (ping-pong) input feature map memory and kernel memory, which improves computing efficiency. Serving as a hardware accelerator for the multiple deep learning networks of a high-level video system, it achieves high computing power, low power consumption, high real-time performance and high accuracy.
2. With the deep learning processor architecture for intelligent parking of the invention, power consumption, calculation speed, calculation latency and circuit size are balanced and optimized when a deep learning network is processed by the processor architecture, so that the high-level video intelligent roadside parking system operates in an optimal state.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should not be considered limiting of its scope; for those skilled in the art, other related drawings may be obtained from these drawings without creative effort. The proportional relationships of the components in the drawings do not represent the proportional relationships of an actual design; the drawings are merely schematic diagrams of structure or position.
FIG. 1 is a block diagram of the present invention;
Fig. 2 is a block diagram of a deep learning network acceleration module.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the invention, i.e., the embodiments described are merely some, but not all, of the embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
It is noted that relational terms such as "first" and "second" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps.
The present invention will be described in detail with reference to the accompanying drawings.
Example 1
As shown in fig. 1-2, the present invention is a deep learning processor architecture for smart parking, comprising a high-speed data interface module, a DMA module, a synchronization control module, a deep learning network acceleration module and a memory controller,
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
The synchronous control module comprises synchronous control, a sending data buffer, a receiving data buffer and a receiving address buffer;
The deep learning network acceleration module is used for data processing and realizing each deep learning network used by the parking system; the deep learning network acceleration module comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module; the input feature map memory, the kernel memory and the instruction controller are connected to an external memory through the external memory reading module to read data; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator; the output controller is connected to the inner product accelerator for data interaction, the output feature map memory is connected to the inner product accelerator, and the data processing result of the inner product accelerator is sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
and the memory controller performs data interaction with the deep learning network acceleration module; intermediate data is stored into the external memory through the memory controller and can be read back in from the external memory.
In the invention, the processor architecture is implemented on an FPGA chip; a Kintex-7 series FPGA chip is selected in this embodiment. The invention comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller, with the other four functional modules serving as auxiliary modules of the deep learning network acceleration module to realize data interaction. The invention connects to external equipment through the high-speed data interface module; external data passes through the DMA module (direct memory access module) and is then sent through the synchronous control module to the deep learning network acceleration module. After the deep learning network acceleration module processes the data, some intermediate data is stored into the external memory through the memory controller and is read back from the external memory when needed. After the deep learning network acceleration module finishes data processing, the data is output to the external equipment through the synchronous control module, the DMA module and the high-speed data communication interface.
The deep learning network acceleration module is the core module of the invention and comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module. When the deep learning network acceleration module needs to process feature map and kernel data, the input feature maps and kernel data are read from the external memory through the external memory reading module and stored in the corresponding input feature map memory and kernel memory. The inner product accelerator is the core computing unit of the deep learning network acceleration module; it has many parallel processing units, and its processing speed exceeds the speed at which the input feature map memory and the kernel memory can read data from the external memory. Both memories therefore adopt an A/B dual-memory mode, i.e. a ping-pong dual-memory mode: while the inner product accelerator reads kernel memory A through the data reading module, kernel memory B reads data from the external memory through the external memory reading module; when the inner product accelerator finishes its data interaction with kernel memory A, it immediately begins data interaction with kernel memory B, and kernel memory A immediately begins reading the next batch of data, which improves data processing efficiency. Data is conveyed to the inner product accelerator through the data reading module; meanwhile, according to the input instructions of the instruction controller, the inner product accelerator can complete various calculations. Some intermediate data of the inner product accelerator can be sent to the output controller for secondary processing and returned to the inner product accelerator. After the inner product accelerator completes its calculations, the result is sent through the output feature map memory to the external memory writing module and finally stored in the external memory.
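The A/B dual-memory (ping-pong) operation described above can be illustrated with a short software sketch. This is a minimal model under assumptions: the bank size, the function names load_from_external and compute_inner_products, and the batch count are invented for the example, and the load and compute steps that run concurrently in hardware are serialized here by the loop.

    #include <stdint.h>
    #include <stdio.h>

    #define KERNEL_WORDS 1024   /* assumed capacity of one kernel memory bank */

    /* Two banks: while the IPA consumes one bank, the external-memory
     * reading module refills the other; the roles swap every batch. */
    static int8_t kernel_mem[2][KERNEL_WORDS];

    /* Stand-in for the external memory reading module. */
    static void load_from_external(int8_t *dst, int batch)
    {
        for (int i = 0; i < KERNEL_WORDS; i++)
            dst[i] = (int8_t)(batch + i);           /* dummy kernel data */
    }

    /* Stand-in for the inner product accelerator draining a bank. */
    static void compute_inner_products(const int8_t *kernels, int batch)
    {
        long acc = 0;
        for (int i = 0; i < KERNEL_WORDS; i++)
            acc += kernels[i];
        printf("batch %d consumed, checksum %ld\n", batch, acc);
    }

    int main(void)
    {
        int active = 0;
        load_from_external(kernel_mem[active], 0);  /* prime bank A */

        for (int batch = 0; batch < 4; batch++) {
            int shadow = 1 - active;
            /* In hardware the refill and the compute run concurrently;
             * here they are serialized only for illustration. */
            load_from_external(kernel_mem[shadow], batch + 1);
            compute_inner_products(kernel_mem[active], batch);
            active = shadow;                        /* ping-pong swap */
        }
        return 0;
    }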
The inner product accelerator, the core computing unit of the deep learning network acceleration module, is denoted IPA and is mainly used to complete parallel multiply-accumulate computation. The inner product accelerator receives a feature map vector of length 512 × 8 bits and a kernel vector of length 1024 × 8 bits and outputs their inner products, which can be 64-, 32-, 16- or 2-bit fixed-point numbers according to the instruction configuration of the instruction controller.
The inner product accelerator contains 32 processing units; each processing unit is composed of 16 multiplication units, and each multiplication unit is one DSP unit of the FPGA that can receive two groups of data and perform two multiplications respectively. We therefore refer to the two halves of the IPA as IPA1 and IPA2 and inject different data into each. The method is as follows:
A DSP48E in the FPGA can perform a calculation of the form P = (D ± A) × B ± C, and the DSP48E is configured as P = (D + A) × B + C. Two 16-bit multiplications, P1 = A × K1 and P2 = A × K2, need to be calculated; they are replaced by a single 32-bit multiplication using the substitution P = A × (K1 << 16 + K2). A 32-bit output P is obtained in which the upper 16 bits are P1 and the lower 16 bits are P2. C is configured based on the low product: C = 0 if P2 ≥ 0, otherwise C = 1 << 16.
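As a sanity check of this dual-multiplication scheme, the following sketch reproduces the substitution P = A × (K1 << 16 + K2) + C in plain C and verifies it exhaustively for signed 8-bit operands (an assumption consistent with the 8-bit feature map and kernel vectors above, so each product fits in 16 bits). It models the arithmetic only, not the DSP48E configuration itself.

    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Emulates one DSP48E computing two products at once: B packs the two
     * kernel values as (K1 << 16) + K2, and C = 1 << 16 compensates the
     * borrow into the upper half when the low product is negative. */
    int main(void)
    {
        for (int a = -128; a <= 127; a++)
            for (int k1 = -128; k1 <= 127; k1++)
                for (int k2 = -128; k2 <= 127; k2++) {
                    int32_t p1 = a * k1;               /* expected high product */
                    int32_t p2 = a * k2;               /* expected low product  */

                    int64_t b = (int64_t)k1 * 65536 + k2; /* (K1 << 16) + K2    */
                    int64_t c = (p2 < 0) ? (1 << 16) : 0; /* sign correction    */
                    int64_t p = (int64_t)a * b + c;       /* one wide multiply  */

                    int16_t lo = (int16_t)(p & 0xFFFF);   /* lower 16 bits      */
                    int32_t hi = (int32_t)(p >> 16);      /* upper bits         */
                    assert(lo == p2 && hi == p1);
                }

        puts("all 2^24 operand combinations unpack correctly");
        return 0;
    }

In hardware, C can be set before the multiply because the sign of P2 follows from the signs of A and K2; the sketch computes P2 directly only to drive the check.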
The external memory reading module reads data from the external memory, and the reading is controlled by a state machine with an idle state and four read states: read feature map, read kernel, read bias, and read instruction.
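A minimal software rendering of that state machine is sketched below. The state names follow the description; the transition order (feature map, kernel, bias, instruction, then back to idle) is an assumption about the original controller.

    #include <stdio.h>

    /* One idle state plus the four read states named in the description. */
    typedef enum {
        S_IDLE,
        S_READ_FEATURE_MAP,
        S_READ_KERNEL,
        S_READ_BIAS,
        S_READ_INSTRUCTION
    } ReadState;

    /* Assumed transition order; the real controller may branch differently. */
    static ReadState next_state(ReadState s, int start_request)
    {
        switch (s) {
        case S_IDLE:             return start_request ? S_READ_FEATURE_MAP : S_IDLE;
        case S_READ_FEATURE_MAP: return S_READ_KERNEL;
        case S_READ_KERNEL:      return S_READ_BIAS;
        case S_READ_BIAS:        return S_READ_INSTRUCTION;
        case S_READ_INSTRUCTION: return S_IDLE;
        }
        return S_IDLE;
    }

    int main(void)
    {
        static const char *names[] = { "idle", "read feature map", "read kernel",
                                       "read bias", "read instruction" };
        ReadState s = S_IDLE;
        for (int cycle = 0; cycle < 6; cycle++) {
            printf("cycle %d: %s\n", cycle, names[s]);
            s = next_state(s, /*start_request=*/cycle == 0);
        }
        return 0;
    }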
The external memory writing module performs, for each convolutional layer, preload and post-processing steps such as activation and pooling operations, and then writes the data to the external memory.
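A compact sketch of such post-processing follows, assuming a ReLU activation and 2×2 max pooling with stride 2; the description names the operations but not their exact variants, so both choices are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    /* ReLU applied element-wise before write-back (assumed variant). */
    static int8_t relu(int8_t x) { return x > 0 ? x : 0; }

    /* 2x2 max pooling with stride 2 over an h x w feature map (assumed). */
    static void relu_maxpool2x2(const int8_t *in, int8_t *out, int h, int w)
    {
        for (int y = 0; y < h / 2; y++)
            for (int x = 0; x < w / 2; x++) {
                int8_t m = -128;
                for (int dy = 0; dy < 2; dy++)
                    for (int dx = 0; dx < 2; dx++) {
                        int8_t v = relu(in[(2 * y + dy) * w + (2 * x + dx)]);
                        if (v > m) m = v;
                    }
                out[y * (w / 2) + x] = m;
            }
    }

    int main(void)
    {
        int8_t fmap[4 * 4] = { -3,  5,  2, -1,
                                7, -2,  0,  4,
                                1,  1, -6,  8,
                               -9,  3,  2,  2 };
        int8_t pooled[2 * 2];
        relu_maxpool2x2(fmap, pooled, 4, 4);
        for (int i = 0; i < 4; i++) printf("%d ", pooled[i]);
        printf("\n");   /* expected: 7 4 3 8 */
        return 0;
    }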
The output controller calculates the final output of each convolutional layer from the output of the inner product accelerator (IPA) and completes data truncation.
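Data truncation here typically means requantizing the wide accumulator output of the IPA back to the 8-bit feature map format. The patent does not give the exact scheme, so the round-shift-saturate sketch below is only one plausible implementation, with the shift amount treated as a per-layer parameter from the instruction configuration.

    #include <stdint.h>
    #include <stdio.h>

    /* Requantize a wide accumulator value to int8: round by adding half an
     * LSB before an arithmetic right shift, then saturate to [-128, 127].
     * The shift amount would come from the layer's instruction configuration. */
    static int8_t truncate_output(int32_t acc, int shift)
    {
        int32_t rounded = (acc + (1 << (shift - 1))) >> shift;
        if (rounded >  127) return  127;
        if (rounded < -128) return -128;
        return (int8_t)rounded;
    }

    int main(void)
    {
        int32_t samples[] = { 51200, -51200, 123456, -123456, 37 };
        for (int i = 0; i < 5; i++)
            printf("%8d -> %4d\n", samples[i], truncate_output(samples[i], 8));
        return 0;
    }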
Example two
This example is a further illustration of the present invention.
Based on the above embodiment, the parking system uses three deep learning networks: a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network. The deep learning vehicle recognition network is an object recognition network used for vehicle recognition, the deep learning license plate recognition network is a miniature object recognition network used for license plate recognition, and the deep learning license plate number and character recognition network is a small object recognition network used for license plate number and character recognition.
Furthermore, the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three deep learning networks is completed through combined calls from a special instruction set, as sketched below. This saves chip circuit area and improves the forward compatibility of the chip design, so that the chip can, as far as possible, meet the requirements of new artificial intelligence networks that may appear in the future.
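To make the substructure-reuse idea concrete, the sketch dispatches a small hypothetical instruction set over shared substructures; the opcodes and the two example network programs are illustrative assumptions, not the patent's actual special instruction set.

    #include <stdio.h>

    /* Hypothetical opcodes, each mapping to one reusable hardware substructure. */
    typedef enum { OP_CONV, OP_POOL, OP_ACTIVATE, OP_FC, OP_END } Op;

    static void dispatch(Op op)
    {
        switch (op) {
        case OP_CONV:     puts("  convolution substructure");     break;
        case OP_POOL:     puts("  pooling substructure");         break;
        case OP_ACTIVATE: puts("  activation substructure");      break;
        case OP_FC:       puts("  fully-connected substructure"); break;
        case OP_END:      break;
        }
    }

    int main(void)
    {
        /* Each network is just a different instruction sequence over the
         * same substructures, so one accelerator serves all the networks. */
        Op vehicle_net[] = { OP_CONV, OP_ACTIVATE, OP_POOL, OP_CONV,
                             OP_ACTIVATE, OP_FC, OP_END };
        Op plate_net[]   = { OP_CONV, OP_ACTIVATE, OP_CONV, OP_ACTIVATE,
                             OP_FC, OP_END };

        puts("vehicle recognition network:");
        for (int i = 0; vehicle_net[i] != OP_END; i++) dispatch(vehicle_net[i]);

        puts("license plate recognition network:");
        for (int i = 0; plate_net[i] != OP_END; i++) dispatch(plate_net[i]);
        return 0;
    }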
Example III
This example is a further illustration of the present invention.
This embodiment is based on the above embodiments; here the kernel memory can also store weight data and bias data. The external memory reading module reads the feature map, kernel, weight data and bias data from the external memory; the feature map is stored in the input feature map memory, while the kernel, weight data and bias data are stored in the kernel memory.
Example IV
This example is a further illustration of the present invention.
In this embodiment, based on the foregoing embodiments, the high-speed data interface module includes a PCIe or USB3 high-speed data communication interface. The corresponding external device, such as a computer, should have a matching PCIe or USB3 data interface.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any change or substitution that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention shall be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.

Claims (5)

1. A deep learning processor architecture for smart parking, characterized by: comprises a high-speed data interface module, a DMA module, a synchronous control module, a deep learning network acceleration module and a memory controller,
The high-speed data interface module is used for connecting external equipment and carrying out data interaction;
The synchronous control module comprises synchronous control, a sending data buffer, a receiving data buffer and a receiving address buffer;
The deep learning network acceleration module is used for data processing and realizing each deep learning network used by the parking system; the deep learning network acceleration module comprises an external memory reading module, an input feature map memory, a kernel memory, an instruction controller, a data reading module, an inner product accelerator, an output controller, an output feature map memory and an external memory writing module; the input feature map memory, the kernel memory and the instruction controller are connected to an external memory through the external memory reading module to read data; the input feature map memory and the kernel memory both adopt an A/B dual-memory mode and are connected to the data reading module, which conveys their data to the inner product accelerator; the instruction controller is connected to the inner product accelerator; the output controller is connected to the inner product accelerator for data interaction, the output feature map memory is connected to the inner product accelerator, and the data processing result of the inner product accelerator is sent through the output feature map memory to the external memory writing module and finally stored in the external memory;
The memory controller performs data interaction with the deep learning network acceleration module; intermediate data is stored into the external memory through the memory controller and can be read back in from the external memory;
the inner product accelerator is used as a core computing unit of a deep learning network acceleration module and is represented by IPA;
the inner product accelerator comprises 32 processing units in total; each processing unit consists of 16 multiplication units, and each multiplication unit is one DSP unit of an FPGA that receives two groups of data to perform two multiplications respectively, so the IPA is divided into IPA1 and IPA2, into which different data are injected respectively, specifically as follows:
a DSP48E in the FPGA performs a calculation of the form P = (D ± A) × B ± C and is configured as P = (D + A) × B + C; two 16-bit multiplications, P1 = A × K1 and P2 = A × K2, are calculated, and the two multiplications are replaced by one 32-bit multiplication using the substitution P = A × (K1 << 16 + K2); a 32-bit output P is obtained, in which the upper 16 bits of P are P1 and the lower 16 bits are P2; C is configured based on the output result: C = 0 if P2 ≥ 0, otherwise C = 1 << 16.
2. The deep learning processor architecture for smart parking of claim 1, wherein: the parking system uses three deep learning networks, namely a deep learning vehicle recognition network, a deep learning license plate recognition network, and a deep learning license plate number and character recognition network.
3. The deep learning processor architecture for smart parking of claim 2, wherein: the deep learning vehicle recognition network, the deep learning license plate recognition network, and the deep learning license plate number and character recognition network are all decomposed into reusable substructures, and hardware acceleration of the three deep learning networks is completed through combined calls from a special instruction set.
4. The deep learning processor architecture for smart parking of claim 1, wherein: the kernel memory is also capable of storing weight data and bias data.
5. The deep learning processor architecture for smart parking of claim 1, wherein: the high speed data interface module comprises a PCIe or USB3 high speed data communication interface.
CN202010862272.8A 2020-08-25 2020-08-25 Deep learning processor architecture for intelligent parking Active CN111932436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010862272.8A CN111932436B (en) 2020-08-25 2020-08-25 Deep learning processor architecture for intelligent parking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010862272.8A CN111932436B (en) 2020-08-25 2020-08-25 Deep learning processor architecture for intelligent parking

Publications (2)

Publication Number Publication Date
CN111932436A (en) 2020-11-13
CN111932436B (en) 2024-04-19

Family

ID=73305183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010862272.8A Active CN111932436B (en) 2020-08-25 2020-08-25 Deep learning processor architecture for intelligent parking

Country Status (1)

Country Link
CN (1) CN111932436B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336877A (en) * 2013-07-25 2013-10-02 哈尔滨工业大学 Satellite lithium ion battery residual life prediction system and method based on RVM (relevance vector machine) dynamic reconfiguration
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A Convolutional Neural Network Accelerator Circuit Based on Fast Filtering Algorithm
CN110058883A (en) * 2019-03-14 2019-07-26 成都恒创新星科技有限公司 A kind of CNN accelerated method and system based on OPU
CN110058882A (en) * 2019-03-14 2019-07-26 成都恒创新星科技有限公司 It is a kind of for CNN accelerate OPU instruction set define method
CN212302545U (en) * 2020-08-25 2021-01-05 成都恒创新星科技有限公司 Deep learning processor architecture for intelligent parking

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12131250B2 (en) * 2017-09-29 2024-10-29 Intel Corporation Inner product convolutional neural network accelerator

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336877A (en) * 2013-07-25 2013-10-02 哈尔滨工业大学 Satellite lithium ion battery residual life prediction system and method based on RVM (relevance vector machine) dynamic reconfiguration
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A Convolutional Neural Network Accelerator Circuit Based on Fast Filtering Algorithm
CN110058883A (en) * 2019-03-14 2019-07-26 成都恒创新星科技有限公司 A kind of CNN accelerated method and system based on OPU
CN110058882A (en) * 2019-03-14 2019-07-26 成都恒创新星科技有限公司 It is a kind of for CNN accelerate OPU instruction set define method
CN212302545U (en) * 2020-08-25 2021-01-05 成都恒创新星科技有限公司 Deep learning processor architecture for intelligent parking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Convolutional Neural Network Coprocessor Design Based on Programmable Logic Devices; 杨一晨; 梁峰; 张国和; 何平; 吴斌; 高震霆; Journal of Xi'an Jiaotong University; 2018-07-10 (Issue 07); full text *

Also Published As

Publication number Publication date
CN111932436A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
CN109522052B (en) Computing device and board card
CN111860398B (en) Remote sensing image target detection method, system and terminal device
CN104238993B (en) The vector matrix product accelerator of microprocessor integrated circuit
CN108805272A (en) A kind of general convolutional neural networks accelerator based on FPGA
CN109934339A (en) A Universal Convolutional Neural Network Accelerator Based on One-Dimensional Systolic Array
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
CN115880132B (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN111814957B (en) Neural network operation method and related equipment
US11775808B2 (en) Neural network computation device and method
CN110059797B (en) Computing device and related product
CN109711540B (en) Computing device and board card
CN110163349B (en) Network model calculation method and device
CN107729944B (en) Identification method and device of popular pictures, server and storage medium
CN111178513B (en) Convolution implementation method and device of neural network and terminal equipment
CN110059809B (en) Computing device and related product
CN110515872B (en) Direct memory access method, device, special computing chip and heterogeneous computing system
CN111161705A (en) Voice conversion method and device
US11256940B1 (en) Method, apparatus and system for gradient updating of image processing model
CN111932436B (en) Deep learning processor architecture for intelligent parking
CN212302545U (en) Deep learning processor architecture for intelligent parking
CN112711051A (en) Flight control system positioning method, device, equipment and storage medium
CN111645687A (en) Lane changing strategy determining method, device and storage medium
CN109799483A (en) A kind of data processing method and device
CN114968182A (en) Operator splitting method, control method and device for storage and computation integrated chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant