
CN111914753A — Low-power-consumption intelligent gun aiming image processing system and method based on deep learning

Info

Publication number
CN111914753A
Authority
CN
China
Prior art keywords
module
deep learning
image
intelligent
software
Prior art date
Legal status
Pending
Application number
CN202010765005.9A
Other languages
Chinese (zh)
Inventor
毕笃彦
王敬梅
王生慧
王洪涛
石青松
Current Assignee
Xi'an Jsbound Technology Corp
Original Assignee
Xi'an Jsbound Technology Corp
Priority date
Filing date
Publication date
Application filed by Xi'an Jsbound Technology Corp filed Critical Xi'an Jsbound Technology Corp
Priority to CN202010765005.9A
Publication of CN111914753A

Classifications

    • G06V 20/00 — Scenes; scene-specific elements
    • G06F 8/38 — Creation or generation of source code for implementing user interfaces
    • G06F 8/72 — Code refactoring
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 1/20 — Processor architectures; processor configuration, e.g. pipelining
    • H04N 19/70 — Video coding characterised by syntax aspects, e.g. related to compression standards
    • H04N 21/43632 — Adapting the video stream to a specific local network involving a wired protocol, e.g. IEEE 1394
    • H04N 21/43635 — HDMI
    • G06V 2201/07 — Target detection


Abstract

The invention belongs to the technical field of sighting instruments, and discloses a low-power-consumption intelligent gun-sight image processing system and method based on deep learning. The system comprises an acquisition interface module, a core processing module and an algorithm software module. The invention uses a deep learning algorithm and a low-power processor to realize efficient target detection, tracking and automatic-fire control for an intelligent gun sight; it offers strong detection and recognition capability, a high degree of intelligence, high working efficiency and good extensibility, and effectively addresses the high power consumption and the lack of intelligent image processing in traditional gun-sight image processing chips. Introducing deep learning and a low-power processor into the intelligent gun-sight design raises the level of target detection, tracking and automatic fire achievable in gun-sight development, enables intelligent target detection, recognition and tracking, effectively improves the intelligence level of sniper-rifle sights, and facilitates their adoption.

Description

Low-power-consumption intelligent gun aiming image processing system and method based on deep learning
Technical Field
The invention belongs to the technical field of sighting instruments, and particularly relates to a low-power-consumption intelligent gun-sight image processing system and method based on deep learning.
Background
At present, driven by the urgent need for automation and intelligence in modern combat weapons, conventional optical and photoelectric rifle sights have begun to give way to intelligent sights. An intelligent sight adds an image sensor and an intelligent image processing module to a traditional optical sight, or adds an intelligent image processing module to a photoelectric sight, so that the sight can perform intelligent processing of scene images. It can automatically find, or assist a sniper in quickly finding, targets in the image; perform target recognition and tracking measurement; calculate the position error between the target and the bullet impact point in real time; and assist the sniper with aided or automatic firing of the rifle, greatly improving the sniper's probability of finding the target and shooting hit rate.
Because a sniper rifle is used in varied environments with complex backgrounds where targets are hard to find, the demands on the sniper are high, and long-term training is usually required before missions can be carried out. If an intelligent image processing module can automatically find and track targets, or assist in doing so, the demands on the sniper are greatly reduced and engagement efficiency is greatly improved. The intelligent image processing module mainly solves the following problems:
The difficulty of finding targets. During a mission, external illumination varies greatly across environments and times of day, and the target background is complex, so finding a target quickly requires considerable accumulated experience and intense concentration. Intelligent target detection and tracking that prompts when a target appears and then tracks it can effectively reduce the sniper's burden.
Accurate target tracking and measurement. During an engagement, given environmental factors such as wind speed, atmospheric pressure and humidity, accurately judging the target's current position and movement speed is the key to success, and normally requires long-term experience and repeated trial adjustment. An intelligent target tracking and measurement algorithm can track the target accurately and measure the deviation between the target position and the impact point, effectively improving engagement accuracy once the lead is determined.
Low power consumption and long operating time. Because the rifle is mainly used in the field, reducing power consumption to ensure long operation, and reducing volume to ensure portability, are important problems. Traditional image processing chips such as DSPs, FPGAs and GPUs consume much power and are bulky, which has seriously hindered the application of intelligent image processing in gun sights. A low-power, miniaturized image processing chip for the intelligent sight would greatly promote the adoption of such sights.
Through the above analysis, the problems and defects of the prior art are as follows: (1) Targets are hard to find: illumination varies greatly across environments and times of day, the background is complex, and finding a target quickly requires accumulated experience and intense concentration. (2) Accurate target tracking and measurement are lacking: judging the target's position and speed under given wind, pressure and humidity conditions is the key to a successful engagement and requires long experience and repeated trial adjustment. (3) Calibration differs between snipers: each sniper's physique and experience differ, so the parameters to be calibrated for finding and engaging targets differ as well; for example, some snipers habitually aim slightly high while others aim slightly low, so the rifle calibration requirements differ, and an intelligent database is needed from which different calibration schemes can be selected for different snipers. (4) Low power consumption and long operation are required: the rifle is mainly used in the field, and traditional image processing chips such as DSPs, FPGAs and GPUs consume much power and are bulky, seriously hindering the application of intelligent image processing in gun sights.
The difficulty of solving the above problems is considerable. Machine vision captures an image with a camera and extracts and analyzes image information with a computer; compared with human vision, which has evolved over millennia, computer image processing capability remains far inferior. Human visual processing is massively parallel and draws on rich experience with visual models, while conventional computer processing is serial, its models are simple, and its storage modes and capacity are limited, so quickly and reliably finding and recognizing targets in complex environments is very difficult. With the development of deep learning in recent years, systems trained and modeled on large amounts of data already recognize specific targets well; for face recognition against simple backgrounds, for example, recognition rates on database faces have reached or even exceeded those of the human eye after extensive deep learning training and modeling. In complex environments, however, computer target detection and recognition still falls short of human vision.
When tracking and engaging a moving target, the position difference between the target and the aiming point must be computed and judged continuously, and the trigger moment must be calculated quickly and accurately in real time. Human vision relies mainly on experience, whereas a computer needs exact calculation; target tracking involves accurate target feature description, detection and matching, so the demands on the algorithm's real-time performance, accuracy and robustness are extremely high.
Intelligent per-user calibration requires long-term data accumulation and well-trained models; because it involves big-data analysis and data mining, both good algorithms and good models are needed.
On the implementation side, the sight must be portable, reliable, highly intelligent and long-running, so the implementation requirements for an intelligent gun sight are very high.
The significance of solving these problems and defects is as follows: once the latest deep learning, data mining and low-power high-performance processors are introduced, the intelligent gun-sight functions described above become realizable. The intelligent gun sight provides an advanced technical means of helping a sniper quickly find and accurately engage targets, so users get up to speed quickly, saving a great deal of training time and training materiel. The invention can also be extended to other equipment for target finding, tracking and monitoring.
Disclosure of Invention
Aiming at the problems in the prior art, and in order to improve the performance of sniper rifles with intelligent image processing technology as soon as possible, the invention provides a low-power-consumption intelligent gun-sight image processing system and method based on deep learning.
The invention is realized as a low-power-consumption intelligent gun-sight image processing system based on deep learning, comprising:
the acquisition interface module, used for acquiring multiple image channels from the gun sight's image sensors;
the core processing module, comprising a low-power SoC, a flash unit, a DDR unit, a clock unit and an inter-board connection unit to the acquisition interface module;
the algorithm software module, comprising a low-level software driver module, a configuration management module, a target detection and recognition algorithm module, a target tracking algorithm module, a UI (user interface) design module and a task scheduling module.
Further, the acquisition interface module comprises:
a serial communication signal interface to the gun-sight fire control system, an image storage and recording interface for images captured during search and engagement, a gun-sight image output interface, a debugging interface, a power management unit and an inter-board connection unit.
Further, the acquisition interface module comprises visible light and infrared video input acquisition interfaces:
the visible light video input interface is LVDS, comprising four data differential pairs and one clock differential pair;
the infrared video input is HDMI Type-C, converted to BT.1120 by an HDMI decoder chip and fed into a MIPI channel;
the infrared video PAL interface uses an MMCX plug with coaxial cable transmission; the signals are PAL_IN and GND, the resolution is 640×512 @ 30 Hz, and the signal is converted to BT.656 by a PAL decoder chip and fed into a MIPI channel.
Further, the core processing module comprises a low-power SoC, a flash unit, a DDR unit, a clock unit and an inter-board connection unit.
Further, the algorithm software module is used for UI adjustment and for detection, recognition and tracking of new target types, forming detection, recognition and tracking measurement suited to the target classes the application requires. It comprises: a low-level driver software module, a configuration management software module, an intelligent target detection and recognition algorithm software module, a target tracking algorithm software module, a UI (user interface) editing software module and a task scheduling software module.
Another object of the present invention is to provide a low-power-consumption intelligent gun-sight image processing method based on deep learning, applied to any of the above systems, wherein the method comprises:
performing target detection, tracking and automatic fire for the intelligent gun sight with a deep learning algorithm and a low-power processor, thereby realizing control of the gun sight.
The invention also aims to provide a construction method of a low-power-consumption intelligent gun sight image processing system based on deep learning, which comprises the following steps:
step one, using circuit-board layout software, establish a separate folder for each of the acquisition interface module and the core processing module, and route and fabricate the boards as needed;
step two, using the HiSilicon HiMPP software development platform, compile the algorithm software code directly into an executable, download it into the core processing module's flash, and execute the intelligent gun sight's image acquisition, interface configuration, task scheduling, target detection, recognition and tracking measurement;
step three, in the driver, configuration management and image acquisition software code, modify and compile the corresponding code according to the image resolution, frame rate and image type;
step four, in the target detection and recognition algorithm software code, train and update the deep learning target detection model according to the type of target to be engaged and the application environment;
step five, in the UI editing software code, modify the code according to user requirements;
step six, in the task scheduling software code, modify the flow scheduling and the protocol according to the control requirements of the user interface.
Another object of the present invention is to provide a program storage medium for receiving user input, whose stored computer program causes an electronic device to execute the above low-power-consumption intelligent gun-sight image processing method based on deep learning.
Another object of the present invention is to provide a computer program product stored on a computer-readable medium, comprising a computer-readable program which, when executed on an electronic device, provides a user input interface to implement the low-power-consumption intelligent gun-sight image processing method based on deep learning.
By combining all of the above technical schemes, the invention has the following advantages and positive effects: it uses a deep learning algorithm and a low-power processor to realize efficient target detection, tracking and automatic-fire control for an intelligent gun sight; it offers strong detection and recognition capability, a high degree of intelligence, high working efficiency and good extensibility; and it effectively addresses the high power consumption and the lack of intelligent image processing in traditional gun-sight image processing chips. Introducing deep learning and a low-power processor into the intelligent gun-sight design raises the level of target detection, tracking and automatic fire achievable in gun-sight development, enables intelligent target detection, recognition and tracking, effectively improves the intelligence level of sniper-rifle sights, and facilitates their adoption.
Drawings
Fig. 1 is a block diagram of a low-power consumption intelligent gun aiming image processing system based on deep learning according to an embodiment of the present invention;
in the figure: 1. an acquisition interface module; 2. a core processing module; 3. and (4) an algorithm software module.
Fig. 2 is a schematic diagram of a low-power consumption intelligent gun sight image processing system based on deep learning according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for constructing a low-power-consumption intelligent gun sight image processing system based on deep learning according to an embodiment of the present invention.
Fig. 4 is a diagram illustrating a definition of an optical input interface according to an embodiment of the present invention.
Fig. 5 is a diagram of an infrared HDMI interface according to an embodiment of the present invention.
Fig. 6 is a block diagram of a core module according to an embodiment of the present invention.
Fig. 7 is an internal structural view of the Hi3519AV100 according to the embodiment of the present invention.
Fig. 8 is a block diagram of the image recognition and tracking processing module according to an embodiment of the present invention.
Fig. 9 is a flow chart of MPP internal processing provided by an embodiment of the present invention.
Fig. 10 is a data flow diagram of a typical MPP common video buffer pool provided by an embodiment of the present invention.
FIG. 11 is a diagram of the location of the NNIE acceleration engine in the system provided by an embodiment of the present invention.
Fig. 12 is a flowchart of implementation of a system initialization function according to an embodiment of the present invention.
Fig. 13 is a flowchart of a continuous scanning process according to an embodiment of the present invention.
FIG. 14 is a flowchart of task scheduling software provided by an embodiment of the present invention.
Fig. 15 is a diagram of a YOLO-3 network structure according to an embodiment of the present invention.
Fig. 16 is a diagram of a darknet-53 network architecture provided by an embodiment of the present invention.
Fig. 17 is a flowchart of developing a classification model for target detection under the Caffe framework according to an embodiment of the present invention.
Fig. 18 is a schematic diagram of a phase correlation tracking algorithm provided by an embodiment of the present invention.
Fig. 19 is a power on self test screen diagram according to an embodiment of the present invention.
Fig. 20 is a diagram of an image display screen provided by an embodiment of the present invention.
Fig. 21 is a diagram of a deep learning target detection screen according to an embodiment of the present invention.
Fig. 22 is a diagram of a target mark screen according to an embodiment of the present invention.
Fig. 23 is a diagram of an attack screen provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Aiming at the problems in the prior art, the invention provides a low-power-consumption intelligent gun sight image processing system and a low-power-consumption intelligent gun sight image processing method based on deep learning, and the invention is described in detail below with reference to the attached drawings.
As shown in fig. 1, a low-power consumption intelligent gun sight image processing system based on deep learning according to an embodiment of the present invention includes: the system comprises an acquisition interface module 1, a core processing module 2 and an algorithm software module 3.
The acquisition interface module 1 receives multi-channel image input (infrared, low-light or visible light) from the gun sight's image sensors; receives control signal input from, and feeds image processing status back to, the gun-sight fire control system; provides the gun-sight image output interface through which the sniper observes images and detection/tracking information; provides the module's debugging interface; provides the power management unit 1-1 and the inter-board connection unit 1-2; and stores and records the gun-sight images captured during search and engagement for post-mission analysis and evaluation.
The core processing module 2 comprises a low-power SoC unit 2-1, a flash unit 2-2, a DDR unit 2-3, a clock unit 2-4, and an inter-board connection unit 2-6 connected to the acquisition interface unit 2-5, realizing image processing and analysis as well as target detection, recognition and tracking.
The algorithm software module 3 works with the gun-sight hardware modules to realize the main intelligent image processing functions. It comprises low-level driver software, configuration management software, intelligent target detection and recognition algorithm software, target tracking algorithm software, UI (user interface) editing software and task scheduling software. The invention can be used for image acquisition and intelligent target detection, recognition and tracking in a variety of intelligent gun-sight designs. The module is characterized by low power consumption, miniaturization and intelligence.
The acquisition interface module, the core processing module and the algorithm software module provided by the embodiment of the invention correspond respectively to two hardware board design schematic files and one software development file.
The acquisition interface module provided by the embodiment of the invention comprises a multi-path image acquisition unit design, a communication signal interface unit design, an image storage and recording unit design, an image output interface unit design, a debugging interface unit design, a power management unit design and an inter-board connection unit design.
The core processing module provided by the embodiment of the invention comprises low-power-consumption SOC selection, flash unit design, DDR unit design, clock unit design and inter-board connection unit design.
The algorithm software provided by the embodiment of the invention comprises low-level driver software, configuration management software, intelligent target detection and recognition algorithm software, target tracking algorithm software, UI (user interface) editing software and task scheduling software. Developers can extend this basis to realize UI adjustment and detection, recognition and tracking of new target types, finally forming detection, recognition and tracking measurement suited to the target classes the application requires.
As shown in fig. 3, the method for constructing a low-power-consumption intelligent gun sight image processing system based on deep learning according to an embodiment of the present invention includes the following steps:
and S101, establishing independent folders for the acquisition interface module and the core processing module respectively by using circuit board wiring software, and wiring and manufacturing boards according to requirements.
S102, directly compiling software by adopting a HIMPP Haisi software development platform on the basis of an algorithm software code to form an operation code, downloading the operation code into a core processing module flash, and executing the functions of image acquisition, interface configuration, task scheduling, target detection, identification and tracking measurement of the intelligent gun sight.
S103, in the driving software, the configuration management software and the image acquisition software codes, modifying and compiling the corresponding software codes according to the image resolution, the frame rate and the image types.
And S104, in the target detection and identification algorithm software code, training and changing a deep learning target detection model according to the type and application environment of the target to be attacked.
And S105, modifying the code according to the requirement of the user in the UI editing software code.
And S106, modifying the flow scheduling and the protocol according to the control requirement of the user interface in the task scheduling software code.
The technical solution of the present invention is further described with reference to the following examples.
Fig. 2 shows the relationship among the modules. The low-power-consumption intelligent gun-sight image processing system based on deep learning performs image acquisition (visible light/infrared) and image control (denoising, brightness, contrast, zoom in/out) through the acquisition interface module together with the gun sight's camera, keys and fire control system, and provides a debugging interface and power management. The core processing module uses the multimedia processor's MPP image preprocessing and image compression functions, together with the low-power processor's deep-learning target detection/recognition and target tracking/measurement functions, to realize, under the coordination of the detection, recognition and tracking software, the target finding, tracking and measurement required by the intelligent gun sight. The algorithm module is based on the YOLO-3 deep learning algorithm and the KCF target tracking algorithm; development is carried out in the HiMPP integrated development environment, producing executable code that is downloaded into the core processor's flash to realize the intelligent image processing the gun-sight system requires. On top of these hardware modules and algorithm software codes, developers only need to attend to the image interface types of their particular gun-sight project; with the intelligent computing flow in place, new gun-sight requirements can be realized quickly through secondary development and parameter configuration of hardware and software.
The low-power-consumption intelligent gun sight image processing system based on deep learning provided by the embodiment of the invention is further described with reference to fig. 3.
First, using the EDA circuit-board layout tool PADS, separate folders are established for the acquisition interface module and the core processing module, and the boards are routed and fabricated as needed. Using the HiSilicon HiMPP software development platform, the algorithm software code is compiled directly into an executable and downloaded into the core processing module's flash, after which the intelligent gun sight's image acquisition, interface configuration, task scheduling, target detection, recognition and tracking measurement functions can run. In the driver, configuration management and image acquisition code, the corresponding software is modified and compiled according to image resolution, frame rate and image type to meet the application's needs. In the target detection and recognition algorithm code, the deep learning target detection model is trained and updated according to the type of target to be engaged and the application environment (urban, desert, forest, grassland, etc.). In the UI editing code, the code is modified according to user requirements. In the task scheduling code, existing control instructions can be added or removed according to the control requirements of the user interface.
During development, debugging or operation, the relevant designs and programs are modified as required to suit actual needs. The invention uses PADS for the hardware modules' schematic and layout design, and the HiMPP platform under Linux for software design.
Developers can modify and upgrade the hardware from the module hardware design files, and upgrade program functions through software changes. Because the design has been through practical development and debugging tests, when the usage environment is identical, the intelligent gun-sight image processing functions can be realized simply by fabricating boards and downloading software from the files provided. Starting from the verified schematics and development source code, a designer can easily modify the design data and parameter configuration to meet project requirements.
Second, the acquisition interface module
The acquisition interface module acquires multiple image channels from the gun sight's image sensors, and comprises visible light and infrared video input acquisition interfaces.
The visible light video input interface is LVDS, comprising four data differential pairs and one clock differential pair. The resolution is 2592×1944 @ 30 Hz. The interface definition is shown in fig. 4.
The infrared video input is HDMI Type-C; the interface definition is shown in fig. 5. Because the data volume is large, it occupies a MIPI channel: the signal is converted to BT.1120 by an HDMI decoder chip and fed into the MIPI channel.
The infrared video PAL interface uses an MMCX plug with coaxial cable transmission; the signals are PAL_IN and GND, and the resolution is 640×512 @ 30 Hz. Since the data volume is small, it occupies only one channel: the signal is converted to BT.656 by a PAL decoder chip and fed into the MIPI channel.
Third, the core processing module. This is the core processing component of the invention, realizing reception, conversion, preprocessing, detection, marking, recognition and tracking measurement of the acquired images. The hardware comprises a low-power SoC, a flash unit, a DDR unit, a clock unit and a connection unit to the acquisition interface unit; the basic schematic is shown in fig. 6.
Considering the image recognition and tracking processing module's requirements for small product size, low power consumption, a high rate of domestic components, continued component supply, and after-sale maintainability and support, the invention selects the domestic HiSilicon AI chip Hi3519AV100 as the main processor of the image recognition and tracking processing module. The chip integrates dual Cortex-A53 cores, supports the Linux operating system, supports industry-leading multi-channel 4K sensor input and multi-channel ISP image processing, supports the HDR10 high-dynamic-range standard, supports multi-channel panoramic hardware stitching, and supports 8K30/4K120 video recording. The internal structure is shown in fig. 7:
processor core
2× ARM Cortex-A53 @ 1.4 GHz, 32 KB I-cache, 32 KB D-cache, 256 KB L2 cache; supports NEON acceleration; integrated FPU processing unit.
Video encoding and decoding
Supporting JPEG Baseline;
H.265/H.264 codec maximum resolution: 8192x8192;
H.265/H.264 codec performance:
3840x2160@60fps + 720p@30fps;
3840x2160@60fps decoding;
3840x2160@30fps encoding + 3840x2160@30fps decoding;
supporting various code rate control modes such as CBR/VBR/AVBR/FIXQP/QPMAP;
the maximum code rates of H.265/H.264 coding output are respectively as follows: 120Mbps/200 Mbps;
support 8 region of interest (ROI) encodings.
NNIE
supports various classification neural networks such as AlexNet, VGG, ResNet and GoogleNet;
supports various target detection neural networks such as Faster R-CNN, SSD and YOLO-2;
2.0 TOPS neural network compute performance;
supports a complete API and toolchain (compiler, simulator), making it easy to adapt customer-customized networks.
ISP and image processing
supports multi-channel time-division multiplexing and can process video input from multiple sensors;
support for 3A (AE/AWB/AF) functionality, 3A parameters user adjustable;
supporting de-stationary pattern noise (FPN);
two-frame exposure WDR and LocalToneMapping are supported, and strong light suppression and backlight compensation are supported;
dead pixel correction and lens shading correction are supported;
supporting image dynamic contrast enhancement and edge enhancement processing;
supporting color difference correction (CAC) and removing purple edges;
supporting defogging;
supporting 6-Dof digital anti-shake and Rolling-shutter correction;
support multiple scaling outputs, scaling factor: 1/15.5-16 x;
supports OSD overlay of up to 8 regions during encoding preprocessing;
provides a PC-side ISP tuning tool.
Video input interface
supports 12-lane serial image sensor input;
supports serial input from up to 5 sensors;
maximum input resolution: 7680x 4320;
supports 10/12/14-bit Bayer RGB DC-timing video input;
supports BT.656 and BT.1120 video input;
supports 1-4 channels of YUV input via MIPI virtual channels.
Video output interface
supports the HDMI2.0 interface, with output up to 4Kx2K (4096x2160)@60fps;
supports a 4-lane MIPI DSI interface, with output up to 1080p@60fps;
supports 6/8/16/24-bit digital LCD/BT.656/BT.1120 interfaces, with output up to 1080p@60fps RGB/YUV data;
support 2 independent high definition video output channels (DHD0, DHD 1);
supporting non-homologous display of any two interfaces;
network interface
1 gigabit ethernet interface;
two interface modes of RGMII and RMII are supported;
10/100Mbit/s half duplex or full duplex is supported;
1000Mbit/s full duplex is supported;
and the TSO is supported, and the CPU overhead is reduced.
Peripheral interface
2 SDIO3.0 interfaces, wherein:
SDIO0 supports SDXC cards;
SDIO1 supports connecting a WiFi module;
1 USB3.0/PCIe2.0 multiplexed interface:
configurable as USB3.0-only or PCIe2.0 x1 + USB2.0;
when used as a PCIe2.0 interface, RC and EP functions are supported;
when used as a USB3.0 interface, Host/Device configuration is supported;
1 USB2.0 interface supporting Host/Device configuration;
internal POR (power-on reset) signal output is supported, and external reset input is also supported;
the internal RTC is supported and can be independently powered by a battery;
an integrated 4-channel LSADC;
9 UART interfaces (part of pins are multiplexed with other pins);
supporting a plurality of I2C interfaces, SPI interfaces and GPIO interfaces;
external memory interface
32bit DDR4/LPDDR4 interface
SPI Nor Flash interface
SPI Nand Flash interface
NAND Flash interface
Support for eMMC5.1 interface
Power consumption
Typical scenario (4Kx2K (3840x2160) @30fps coding + neural network algorithm) power consumption: 1.9W
Supporting multiple power saving modes
Operating voltage
The core voltage is 0.8V;
the IO voltage is 1.8V/3.3V;
the voltage of a DDR4 SDRAM interface is 1.2V;
the LPDDR4 interface voltage is 1.1V.
Fourth, the algorithm software module. It performs the image processing in concert with the hardware modules, and comprises a low-level software driver module, a configuration management module, a target detection module, a recognition algorithm module, a target tracking algorithm design and implementation module, a UI (user interface) design module and a task scheduling design module. The software design framework is shown in fig. 8.
The application program is built mainly on the HiSilicon chip's media processing platform MPP, the neural network acceleration engine NNIE hardware processing unit, the open-source computer vision library OpenCV, the Linux operating system, the Caffe deep learning framework, and the YOLO-3 target classification and detection algorithm.
The Linux operating system side comprises the NFS file system, the neural network acceleration engine NNIE driver, process management, memory management, the HDMI driver, the MIPI driver and the media processing platform MPP driver.
The driver of the media processing platform MPP comprises a video input module VI driver, a video processing subsystem module VPSS driver, a video coding compression module VENC driver, a video decoding module VDEC driver and a video output module VO driver.
The application program mainly realizes the following functions: power-on self-test; acquisition of video images in different scenes; image enhancement; image color-format conversion; target recognition, detection and tracking; overlaying and compositing target tracking-frame information onto the video images; real-time OLED (organic light emitting diode) display of the composited video image data; encoding, compression, video storage and Ethernet transmission of the composited video image data; decoding, display and playback of stored compressed video; serial-port input of system instructions and serial-port output of system status feedback; fire-control GPIO output control; and power-management GPIO output control.
according to the main functions in the image recognition, tracking and processing module, the main components of the application program are as follows:
the system initialization section;
a user login information processing thread;
a serial-port instruction data input processing thread;
a serial-port instruction data output processing thread;
an infrared video image processing thread;
a visible light video image processing thread;
a thread that overlays serial-port instruction data and graphic characters to composite the video image;
a composited-video compression, recording and storage thread;
a composited-video display and stored-video playback thread;
a target detection and tracking processing thread.
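For orientation, here is a minimal sketch of how such an application might spawn these processing threads under Linux; the thread function names are hypothetical placeholders for the components above, not identifiers from the patent.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical thread entry points, one per processing thread listed above. */
void *uart_rx_thread(void *arg)   { /* parse incoming serial instructions   */ return NULL; }
void *uart_tx_thread(void *arg)   { /* report system status over serial     */ return NULL; }
void *ir_video_thread(void *arg)  { /* infrared video image processing      */ return NULL; }
void *vis_video_thread(void *arg) { /* visible-light video image processing */ return NULL; }
void *osd_thread(void *arg)       { /* overlay instructions and graphics    */ return NULL; }
void *record_thread(void *arg)    { /* compress and store composited video  */ return NULL; }
void *display_thread(void *arg)   { /* display and stored-video playback    */ return NULL; }
void *track_thread(void *arg)     { /* target detection and tracking        */ return NULL; }

int main(void)
{
    void *(*entries[])(void *) = {
        uart_rx_thread, uart_tx_thread, ir_video_thread, vis_video_thread,
        osd_thread, record_thread, display_thread, track_thread,
    };
    pthread_t tids[sizeof entries / sizeof entries[0]];

    /* The system initialization section (parameters, serial port, network,
       MPP, GPIO, power-on self-test) would run here before any threads. */

    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        if (pthread_create(&tids[i], NULL, entries[i], NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
    }
    for (size_t i = 0; i < sizeof entries / sizeof entries[0]; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```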
As shown in fig. 9, the main internal processing flow of the HiSilicon media processing platform is divided into modules such as video input (VI), video processing (VPSS), video encoding (VENC), video decoding (VDEC), video output (VO) and region management (REGION).
Wherein:
the VI module captures a video image, can perform processing such as shearing, denoising and the like on the video image, and outputs a plurality of paths of image data with different resolutions.
The decoding module decodes the encoded video code stream, transmits the analyzed image data to VPSS for image processing, and then transmits the image data to VO for display. The video code stream of H.265/H.264/JPEG format can be decoded.
The VPSS module receives the images sent by the VI and the decoding module, can perform image enhancement, sharpening and other processing on the images, and realizes that multiple paths of image data with different resolutions are output from the same source for encoding, previewing or capturing.
The encoding module receives image data which is captured by the VI and output after VPSS processing, can superpose OSD images set by a user through the Region module, then encodes according to different protocols and outputs corresponding code streams.
And the VO module receives the output image processed by the VPSS, can perform processing such as playing control and the like, and finally outputs the output image to peripheral video equipment according to an output protocol configured by a user.
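As a concrete illustration of how these modules are chained, the following sketch binds VI to VPSS and VPSS to VENC with the HiMPP system-bind call, mirroring the VI → VPSS → VENC flow of fig. 9. It follows the style of HiSilicon's HiMPP sample code; header, enum and function names (e.g. HI_ID_VIU vs. HI_ID_VI) vary between SDK releases, so treat it as an assumption-laden sketch rather than the patent's own source.

```c
#include "hi_comm_sys.h"   /* MPP_CHN_S; header names vary by SDK release */
#include "mpi_sys.h"       /* HI_MPI_SYS_Bind */

/* Bind VI(dev0,chn0) -> VPSS(grp0) and VPSS(grp0,chn0) -> VENC(chn0). */
static int bind_pipeline(void)
{
    MPP_CHN_S src, dst;

    src.enModId = HI_ID_VIU;   /* HI_ID_VI in newer SDKs */
    src.s32DevId = 0; src.s32ChnId = 0;
    dst.enModId = HI_ID_VPSS; dst.s32DevId = 0; dst.s32ChnId = 0;
    if (HI_MPI_SYS_Bind(&src, &dst) != HI_SUCCESS) return -1;

    src.enModId = HI_ID_VPSS; src.s32DevId = 0; src.s32ChnId = 0;
    dst.enModId = HI_ID_VENC; dst.s32DevId = 0; dst.s32ChnId = 0;
    if (HI_MPI_SYS_Bind(&src, &dst) != HI_SUCCESS) return -1;

    return 0;
}
```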
Video buffer pool
The video buffer pool mainly provides large-block physical memory management for the media services. It allocates and reclaims memory, acting as a memory cache pool so that physical memory resources are used reasonably by each media processing module. A typical MPP common video buffer pool data flow is shown in fig. 10.
A video buffer pool is a group of buffer blocks of the same size with contiguous physical addresses. The common video buffer pools must be configured before system initialization; the number of pools and the size and number of their blocks differ from service to service.
All video input channels obtain video buffer blocks from a common video buffer pool to store captured images. As shown in fig. 10, VI obtains buffer block Bm from common pool B; Bm is sent on to the VPSS, and once the VPSS has processed it, the input block Bm is released back to the common pool. Assuming the VPSS channels work in USER mode, VPSS channel 0 obtains block Bi from common pool B as its output image buffer and sends it to the VENC, while VPSS channel 1 obtains block Bk as its output image buffer and sends it to the VO; Bi is released back to the pool after the VENC has encoded it, and Bk after the VO has displayed it.
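The sketch below shows how such a common pool might be configured before MPP initialization, using for illustration the infrared-channel geometry quoted in the initialization section later (one common pool of 15 blocks for 1280x1024 YUV422SP images at 10-bit width). The structure and function names follow older HiMPP releases (VB_CONF_S / HI_MPI_VB_SetConf); newer SDKs rename them (e.g. VB_CONFIG_S / HI_MPI_VB_SetConfig), so this is an assumed, version-dependent sketch.

```c
#include <string.h>
#include "hi_comm_vb.h"
#include "mpi_vb.h"
#include "mpi_sys.h"

static int vb_pool_init(void)
{
    VB_CONF_S vb;
    memset(&vb, 0, sizeof(vb));

    vb.u32MaxPoolCnt = 1;                       /* one common pool */
    /* 1280 x 1024, YUV422SP at 10-bit: rough estimate of ~2 bytes per pixel
       for each of the two planes; real code would use the SDK's size helper. */
    vb.astCommPool[0].u32BlkSize = 1280 * 1024 * 2 * 2;
    vb.astCommPool[0].u32BlkCnt  = 15;          /* 15 blocks in the pool */

    if (HI_MPI_VB_SetConf(&vb) != HI_SUCCESS) return -1;
    if (HI_MPI_VB_Init() != HI_SUCCESS)       return -1;
    if (HI_MPI_SYS_Init() != HI_SUCCESS)      return -1;  /* after VB init */
    return 0;
}
```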
Deep learning acceleration engine (NNIE or NPU)
The deep learning acceleration engine NNIE is a dedicated deep learning accelerator for CNN/RCNN-style neural network structures, usable in application scenarios such as image classification and target detection. The position of the NNIE acceleration engine in the system is shown in fig. 11. The Hi3519AV100 contains one NNIE.
Characteristics
The NNIE acceleration engine's characteristics are as follows:
supporting N × N convolution;
support for Pooling (Max and Average);
supporting Stride;
supporting Pad;
support for activation functions (Relu, Sigmoid, and TanH);
support LRN operations;
BN (batch normalization) is supported;
support the multiplication and addition operation (Inner Product) of the vector and the matrix;
support for Concat;
support for Eltwise;
supporting 8-bit data and parameter modes;
the support data and the parameter bit width are configurable;
parameter compression and parameter sparseness are supported;
the method supports the input image to be a single channel (gray scale image) and a three-channel (RGB format);
support image pre-processing (averaging and pixel value scaling);
supporting image batch processing;
and supporting the reporting of the intermediate layer result.
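To make the engine's use concrete, here is a heavily abridged sketch in the shape of HiSilicon's NNIE sample code: load a compiled model, then run a blocking forward pass. The function and structure names (HI_MPI_SVP_NNIE_LoadModel, HI_MPI_SVP_NNIE_Forward, SVP_NNIE_FORWARD_CTRL_S, the headers) are assumptions drawn from typical NNIE SDKs and differ between releases; blob allocation, task buffers and post-processing are elided.

```c
#include "hi_comm_svp.h"
#include "hi_nnie.h"
#include "mpi_nnie.h"

/* One forward pass of a compiled NNIE model (e.g. a converted YOLO network).
 * Input/output blob setup and task-buffer allocation are elided. */
static int nnie_forward_once(SVP_MEM_INFO_S *model_buf,
                             SVP_SRC_BLOB_S src[], HI_U32 src_num,
                             SVP_DST_BLOB_S dst[], HI_U32 dst_num)
{
    SVP_NNIE_MODEL_S        model;
    SVP_NNIE_HANDLE         handle;
    SVP_NNIE_FORWARD_CTRL_S ctrl = {0};

    if (HI_MPI_SVP_NNIE_LoadModel(model_buf, &model) != HI_SUCCESS)
        return -1;

    ctrl.u32SrcNum = src_num;   /* number of input blobs  */
    ctrl.u32DstNum = dst_num;   /* number of output blobs */

    /* bInstant = HI_TRUE blocks until the hardware pass completes. */
    if (HI_MPI_SVP_NNIE_Forward(&handle, src, &model, dst,
                                &ctrl, HI_TRUE) != HI_SUCCESS)
        return -1;

    /* dst[] now holds raw network outputs; YOLO-style box decoding and NMS
     * post-processing would run on the CPU from here. */
    return 0;
}
```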
System initialization function
The image recognition and tracking processing module's application initialization mainly comprises: system parameter initialization, serial port initialization, network port initialization, media processing platform MPP initialization, fire-control and system-status GPIO port initialization, and power-on self-test. The implementation flow of the system initialization function is shown in fig. 12.
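A minimal sketch of this initialization order, with hypothetical helper names standing in for the steps of fig. 12, might look as follows:

```c
/* Hypothetical helpers, one per init step in fig. 12. */
int sys_param_init(void); int uart_init(void); int net_init(void);
int mpp_init(void); int gpio_init(void); int power_on_self_test(void);

int system_init(void)
{
    if (sys_param_init() != 0) return -1;  /* defaults: field of view, zoom, ... */
    if (uart_init()      != 0) return -1;  /* /dev/ttyAMA1, 115200 8N1           */
    if (net_init()       != 0) return -1;  /* Ethernet for video streaming       */
    if (mpp_init()       != 0) return -1;  /* VB pools, VI/VPSS/VENC/VDEC/VO     */
    if (gpio_init()      != 0) return -1;  /* fire-control and status GPIO ports */
    return power_on_self_test();           /* verify sensors, NNIE, display      */
}
```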
system parameter and system menu initialization
(1) System parameters:
field-of-view selection, with infrared as the default;
decompression and display of H.265-encoded compressed files, with the infrared video image as the default video source for display;
the target display devices for decompressed video comprise a liquid crystal screen and an OLED micro display screen; the OLED micro display is the default;
electronic zoom setting: default 1; selectable zoom factors 1 to 4;
the number of detected and recognized targets defaults to 0;
the target flag defaults to 0;
the target tracking-frame position defaults, x_position and y_position, are both set to 0;
the target tracking-frame width and height both default to 0;
the target tracking movement speed defaults to 0 m/s.
(2) System menu initialization: the shooting scene can be selected from desert, gobi, grassland, jungle, ocean, etc., with desert as the default; the engagement types are person, vehicle, target, etc., with target as the default.
Target selection: after the shooter selects a scene and an engagement type according to the actual shooting scene, the image recognition and tracking processing module detects and recognizes that type in the scene; at most 5 candidate targets can be recognized simultaneously. After one target is selected, the trajectory calculation module and the laser rangefinder compute and provide the coordinates of the fire-control firing point; the image recognition and tracking processing module generates an aiming division line from those coordinates and controls the firearm's firing module through the GPIO port.
(3) Serial port initialization
open the serial device; the device file is "/dev/ttyAMA1";
set the serial port parameters:
baud rate 115200; 1 start bit; 8 data bits; 1 stop bit; no parity;
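A minimal sketch of this serial setup with the standard Linux termios API follows; only the device path and the 115200-8N1 parameters come from the text above, the rest is ordinary POSIX boilerplate.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Open /dev/ttyAMA1 at 115200 baud, 8 data bits, no parity, 1 stop bit. */
int uart_open_115200_8n1(void)
{
    int fd = open("/dev/ttyAMA1", O_RDWR | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { close(fd); return -1; }

    cfmakeraw(&tio);                 /* raw mode: no line editing/translation */
    cfsetispeed(&tio, B115200);
    cfsetospeed(&tio, B115200);
    tio.c_cflag &= ~(PARENB | CSTOPB | CSIZE);  /* no parity, 1 stop bit */
    tio.c_cflag |= CS8 | CLOCAL | CREAD;        /* 8 data bits           */

    if (tcsetattr(fd, TCSANOW, &tio) != 0) { close(fd); return -1; }
    return fd;
}
```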
(4) media processing platform MPP initialization
Initialization of the video input VI, the image video processing subsystem VPSS and the video coding compression module VENC:
video input VI configuration: the number of video inputs in the working state is 1;
the video input module MIPI device number (MipiDev) for the infrared video image is 0;
the video input module MIPI device number (MipiDev) for the visible light video image is 4;
the infrared video image resolution is 1280x1024 and the frame rate is 30;
the visible light video image resolution is 2592x1944 and the frame rate is 25;
the infrared video image video input device number ViDev is 0 and the pipe number ViPipe is 0;
the visible light video image video input device number ViDev is 1 and the pipe number ViPipe is 0;
the Group number VPSS_GRP of the infrared video image video processing subsystem is 0, and the physical channel number VPSS_CHN is 0;
the Group number VPSS_GRP of the visible light video image video processing subsystem is 1, and the physical channel number VPSS_CHN is 0;
the VPSS physical channel pixel format for the infrared video image is YUV420SP, the dynamic range is DYNAMIC_RANGE_SDR8, the video format is VIDEO_FORMAT_LINEAR, and the compression mode is COMPRESS_MODE_SEG;
the VPSS physical channel pixel format for the visible light video image is YUV420SP, the dynamic range is DYNAMIC_RANGE_SDR8, the video format is VIDEO_FORMAT_LINEAR, and the compression mode is COMPRESS_MODE_SEG;
apply for common video image buffer blocks for the infrared video image: 1 buffer pool, 15 blocks per pool, block width 1280, block height 1024, block pixel format YUV422SP, bit width 10, compression mode COMPRESS_MODE_NONE, alignment the default 32-bit alignment;
apply for common video image buffer blocks for the visible light video image: 1 buffer pool, 15 blocks per pool, block width 2592, block height 1944, block pixel format YUV422SP, bit width 10, compression mode COMPRESS_MODE_NONE, alignment the default 32-bit alignment;
establishing a binding relationship between an infrared video image video input module VI and a corresponding video processing subsystem VPSS;
establishing a binding relationship between a visible light video image video input module VI and a corresponding video processing subsystem VPSS;
the compression type of infrared video image coding is H265, the coding profile is Main Profile, and the compressed stream mode is frame mode (the options being frame mode and stream mode);
the compression type of visible light video image coding is H265, the coding profile is Main Profile, and the compressed stream mode is frame mode (the options being frame mode and stream mode);
for infrared video image coding, the input image width is 1280, the input image height is 1024, the output image width is 1280, the output image height is 1024, and the frame buffer size is 1280x1024x2;
for visible light video image coding, the input image width is 2592, the input image height is 1944, the output image width is 2592, the output image height is 1944, and the frame buffer size is 2592x1944x2;
the rate control mode of infrared video image coding is constant bit rate (CBR), the group-of-pictures (GOP) length is 30 with one I-frame per second as the reference frame, the rate statistics period is 1 second, the source and target frame rates are both 30, and the bit rate is 5 Mbit/s;
the rate control mode of visible light video image coding is constant bit rate (CBR), the group-of-pictures (GOP) length is 25 with one I-frame per second as the reference frame, the rate statistics period is 1 second, the source and target frame rates are both 25, and the bit rate is 5 Mbit/s;
establishing a binding relationship between an infrared video image video processing subsystem VPSS and a corresponding video coding compression module VENC;
establishing a binding relationship between the visible light video image video processing subsystem VPSS and the corresponding video coding compression module VENC;
initialization of the video decoding module VDEC and the image video processing subsystem VPSS:
applying for a common video image buffer pool for the infrared video image: the number of buffer pools is 1, the number of buffer blocks per pool is 10, the block width is 1280, the block height is 1024, the block pixel format is YUV420SP, the bit width is 10, the compression mode is COMPRESS_MODE_SEG, and the alignment mode is 0 (default alignment);
applying for a common video image buffer pool for the visible light video image: the number of buffer pools is 1, the number of buffer blocks per pool is 10, the block width is 2592, the block height is 1944, the block pixel format is YUV420SP, the bit width is 10, the compression mode is COMPRESS_MODE_SEG, and the alignment mode is 0 (default alignment);
the compression type of infrared video image decoding is H265, the decoded image width is 1280, the decoded image height is 1024, the stream mode is frame mode (the options being frame mode and stream mode), the video decoding mode is VIDEO_DEC_MODE_IP, the number of reference frames is 3, the number of display frames is 2, and the number of frame buffers is 6;
the compression type of visible light video image decoding is H265, the decoded image width is 2592, the decoded image height is 1944, the stream mode is frame mode (the options being frame mode and stream mode), the video decoding mode is VIDEO_DEC_MODE_IP, the number of reference frames is 3, the number of display frames is 2, and the number of frame buffers is 6;
in the group attribute of the infrared video image video processing subsystem VPSS, the image width is 1280, the image height is 1024, the dynamic range is DYNAMIC_RANGE_HDR10, and the pixel format is YUV420SP;
in the group attribute of the visible light video image video processing subsystem VPSS, the image width is 2592, the image height is 1944, the dynamic range is DYNAMIC_RANGE_HDR10, and the pixel format is YUV420SP;
for the physical channel of the infrared video image video processing subsystem VPSS, the output image width is 1280, the output image height is 1024, the working mode is automatic, the compression mode is COMPRESS_MODE_SEG, the dynamic range is DYNAMIC_RANGE_HDR10, the pixel format is YUV420SP, the aspect-ratio type is none (the options being none, automatic and manual), and the video format is VIDEO_FORMAT_LINEAR;
for the physical channel of the visible light video image video processing subsystem VPSS, the output image width is 1280, the output image height is 1024, the working mode is automatic, the compression mode is COMPRESS_MODE_SEG, the dynamic range is DYNAMIC_RANGE_HDR10, the pixel format is YUV420SP, the aspect-ratio type is none (the options being none, automatic and manual), and the video format is VIDEO_FORMAT_LINEAR;
establishing a binding relationship between the infrared video decoding module VDEC and the video processing subsystem VPSS;
establishing a binding relationship between the visible light video decoding module VDEC and the video processing subsystem VPSS;
initialization of the image video output module VO:
since the resolution of the OLED microdisplay is 1280x1024, the interface timing of the video output module VO for decoded infrared and visible light video images is 1280x1024, with a frame rate of 30;
if an external high-definition liquid crystal display is connected, the interface timing of the VO for decoded infrared and visible light video images is likewise 1280x1024 with a frame rate of 30;
the VO device for decoded infrared and visible light video image output is the high-definition device DHD1 (the ultra-high-definition device DHD0 does not support the LCD interface);
the current design targets HDMI video output, so the interface type of the VO for decoded infrared and visible light video images is HDMI;
the background color of the VO display device is COLOR_RGB_BLUE, the dynamic range is DYNAMIC_RANGE_HDR10, the physical channel mode is VO_MODE_1MUX, the pixel format is YUV420SP, and both the image resolution and the display resolution are 1280x1024; the initial partition mode configured in the video layer defaults to VO_PART_MODE_SINGLE;
fire control output GPIO port initialization
Initialize the fire-control output GPIO port to output a low level. The specific steps are as follows (a minimal sketch follows the list):
open the GPIO device; the GPIO device file is "/dev/gpiochipX", where X is 0-14;
set the GPIO pin to the output direction;
write the GPIO pin level to low.
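As a minimal illustration of these steps, the Python sketch below drives a fire-control pin low from user space. The steps above use the character-device interface "/dev/gpiochipX"; for brevity the sketch uses the legacy sysfs GPIO interface instead, and the pin number is a hypothetical, board-specific assumption.

    GPIO_PIN = 7  # hypothetical fire-control output pin; board-specific

    def init_fire_control_gpio(pin: int) -> None:
        base = f"/sys/class/gpio/gpio{pin}"
        # Export the pin to user space (ignore the error if already exported).
        try:
            with open("/sys/class/gpio/export", "w") as f:
                f.write(str(pin))
        except OSError:
            pass
        # Set the pin to the output direction.
        with open(f"{base}/direction", "w") as f:
            f.write("out")
        # Drive the default low level.
        with open(f"{base}/value", "w") as f:
            f.write("0")

    init_fire_control_gpio(GPIO_PIN)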
LSADC initialization for battery voltage detection
The LSADC (low-speed ADC) converts an external analog signal into a proportional digital value, enabling measurement of the analog signal; it is used for battery level detection, key detection and the like. The chip provides 1 LSADC with 4 independent channels;
sampling precision setting
The sampling precision can be set through LSADC_CTRL9[9:0] according to application requirements.
When the sampling precision is set to 10 bits, all 10 bits of the sampling result are valid.
When the sampling precision is set below 10 bits, the high bits of the sampling result are valid; for example, at a precision of 8 bits, the upper 8 bits of the sampling result are valid (a short example follows).
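A short Python illustration of this high-bit-valid rule:

    def lsadc_sample_value(raw10: int, precision: int) -> int:
        """Keep the upper `precision` bits of a 10-bit LSADC sample."""
        assert 0 <= raw10 < 1024 and 1 <= precision <= 10
        return raw10 >> (10 - precision)

    # A 10-bit raw sample read at 8-bit precision keeps its upper 8 bits.
    assert lsadc_sample_value(0b1011011010, 8) == 0b10110110
    assert lsadc_sample_value(0b1011011010, 10) == 0b1011011010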
The continuous scanning process flow is shown in FIG. 13.
In continuous reading mode (LSADC_CTRL0[mode_sel] = 1), the CPU sets the scan time interval Tscan, the glitch width (tglitch) and the active-channel mask (ch_vld) according to the application scenario, and starts the LSADC. Within one interval Tscan, the LSADC completes the scan of one active channel (a channel is active when its enable bit in LSADC_CTRL0 is set). When the next scan instant arrives, the scan of the next active channel begins; after all active channels have been scanned, the next round of scanning starts (a behavioural sketch follows).
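The Python sketch below imitates this round-robin behaviour with the hardware registers abstracted into ordinary arguments; the names and the stubbed channel read are illustrative assumptions:

    import time

    def continuous_scan(ch_vld: int, tscan_s: float, read_channel, rounds: int = 2):
        """ch_vld: 4-bit mask of active channels (the chip provides 4 channels)."""
        active = [ch for ch in range(4) if ch_vld & (1 << ch)]
        for _ in range(rounds):            # one round covers all active channels
            for ch in active:              # one active channel per interval Tscan
                print(f"channel {ch}: {read_channel(ch)}")
                time.sleep(tscan_s)

    # Example with a stubbed ADC read on channels 0 and 2:
    continuous_scan(ch_vld=0b0101, tscan_s=0.01, read_channel=lambda ch: 512 + ch)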
Power-on self-test
After initialization completes, each module within the image detection and tracking processing module performs a self-check. The self-check items are as follows:
checking the mounting state of the SD card and whether the default state after SD initialization is correct;
checking whether the fire-control GPIO pin is set to the output direction and whether its default output level is low;
checking whether the power-control GPIO pins are set to the output direction; by default, the power supply of the infrared video input device is turned on and the power supply of the visible light video input device is turned off;
performing one battery voltage measurement through the LSADC battery voltage detection function to verify that the function works and that the battery voltage is normal (a hedged conversion sketch follows the list);
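A hedged Python sketch of the voltage conversion behind the battery check; the reference voltage, divider ratio and limits are hypothetical board-specific assumptions, not values from this specification:

    VREF = 1.8                 # ADC reference voltage in volts (assumption)
    DIVIDER = 4.0              # external resistor-divider ratio (assumption)
    V_MIN, V_MAX = 6.4, 8.4    # plausible limits for a 2S lithium pack (assumption)

    def battery_voltage(adc_code: int) -> float:
        """Convert a 10-bit LSADC code into the battery voltage."""
        return adc_code / 1023.0 * VREF * DIVIDER

    def battery_ok(adc_code: int) -> bool:
        return V_MIN <= battery_voltage(adc_code) <= V_MAX

    print(battery_voltage(1000), battery_ok(1000))   # ~7.04 V, True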
fourthly, system task scheduling process
Overview of the function
The system receives and collects instructions, schedules and executes the instructions, completes the functions of target detection, identification and tracking measurement, and controls the automatic firing of the firearm through intelligent calculation.
Function implementation
The task scheduling software flowchart is shown in fig. 14:
power up the sight, perform the start-up self-test, and initialize the system;
determine the working scene from keyboard input; the infrared scene is powered on by default;
perform image acquisition and image preprocessing for the selected scene (including histogram equalization, contrast stretching, brightness adjustment, denoising, defogging and similar operations);
on a keyboard command, determine whether the deep learning target detection function is enabled; if so, the deep learning detection algorithm detects the required targets (personnel, vehicles, targets and the like) according to the requested target types, and marks each target's bounding box and class;
on a keyboard command, determine whether target marking is required; if so, start the target tracking program to track the target robustly;
during tracking, continuously read the attack division point and attack command sent by the fire control system over the serial port; once attack confirmation is received, continuously compute the deviation between the target position and the attack division center, and when the deviation falls below a given threshold, automatically send a firing command to the fire control system to realize automatic firing;
throughout the process, the whole scheduling control flow can be recorded as full-course video on a key command, to facilitate later analysis and upgrade improvement (a minimal sketch of this scheduling loop follows).
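A minimal Python sketch of one pass of this scheduling loop, with trivial stand-ins for the real capture, detection, tracking and fire-control modules; every helper here is a hypothetical placeholder, not an SDK call:

    import math

    FIRE_THRESHOLD_PX = 5.0             # firing deviation threshold (assumption)

    def capture_and_preprocess(scene):  # equalization, denoising, defogging, ...
        return {"scene": scene}

    def detect_targets(frame):          # deep-learning detector stand-in
        return [{"cls": "person", "box": (100, 100, 40, 80)}]

    def track_target(frame):            # KCF tracker stand-in
        return (120.0, 140.0)

    def scheduling_step(scene, division_center, attack_confirmed, fire):
        frame = capture_and_preprocess(scene)
        detect_targets(frame)                       # keyboard-gated in practice
        tx, ty = track_target(frame)
        if attack_confirmed:
            dev = math.hypot(tx - division_center[0], ty - division_center[1])
            if dev < FIRE_THRESHOLD_PX:
                fire()                              # send the firing command

    scheduling_step("infrared", (118.0, 142.0), True, lambda: print("FIRE"))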
Algorithms used by the system
The invention provides a target detection algorithm based on deep learning, a robust target tracking algorithm and an intelligent triggering model algorithm, which comprise the following contents:
deep learning target detection algorithm and software compiling and downloading process
The smart gun sight is mainly applied in complex environments, particularly for the detection of small targets. Compared with large targets, small targets occupy fewer pixels and have less distinctive features, giving a lower detection rate and a higher false-alarm rate. Traditional pattern-recognition image target detection algorithms cannot meet this application, so the invention adopts an intelligent target detection algorithm based on the YOLO-3 deep learning network model; to accelerate it, the original YOLO-3 is pruned, reducing the parameters, improving the detection probability and speeding up the computation.
The YOLO-3 adopts the Darknet-53 classification network structure to extract target features, uses the passthrough structure of YOLO-2 to detect fine-grained features, and further employs feature maps at three different scales (large, medium and small) to realize reliable and rapid detection of the target.
The idea of the YOLO series of algorithms is as follows: first, the input image is passed through the feature extraction network, producing a feature map of a certain size (for example 13x13); the input image is correspondingly divided into 13x13 cells; the cell containing the center coordinate of a ground-truth target is responsible for predicting that target; each cell predicts a fixed number of bounding boxes, and only the bounding box with the largest IoU (intersection over union) against the ground truth is selected to predict the target.
The predicted output feature map has two dimensions: a spatial one, e.g. 13x13, and a depth one, B x (5 + C), where B is the number of bounding boxes predicted per cell, C is the number of classes (20 for the VOC data set), and 5 stands for the 4 coordinate values plus 1 bounding-box confidence (objectness score).
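A quick numeric check of these dimensions in Python:

    B, C = 3, 20            # 3 boxes per cell, 20 VOC classes
    depth = B * (5 + C)     # 4 coordinates + 1 objectness per box
    print((13, 13, depth))  # (13, 13, 75)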
The YOLO-3 model is somewhat more complex than earlier models, but the improvements in speed and accuracy are very noticeable, and speed can be traded against accuracy by changing the model structure and by pruning and cutting the network;
the structure of YOLO-3 is shown in FIG. 15, in which:
DBL: short for Darknetconv2d_BN_Leaky, the basic component of YOLO-3, composed of convolution + BN + Leaky ReLU. In YOLO-3, BN and Leaky ReLU are inseparable parts of the convolutional layer (except for the final convolution), together forming the smallest component.
resn: n is a number — res1, res2, ..., res8 — indicating how many res_units this res_block contains; the res_block is a large component of YOLO-3. YOLO-3 borrows the residual structure of ResNet; using it allows a deeper network (from darknet-19 in YOLO-2 to darknet-53 in YOLO-3). The res_block is illustrated in the lower right corner of fig. 12; its basic component is also the DBL.
concat: tensor concatenation, splicing an intermediate darknet layer with the upsampled output of a later layer. Concatenation differs from the residual-layer add: concatenation expands the tensor dimension, while add does not change it.
The entire YOLO-3 body comprises 252 layers, including 23 add layers (mainly for constructing res_blocks — one add layer per res_unit, 1+2+8+8+4 = 23 in total). The numbers of BN layers and LeakyReLU layers are identical (72 each), reflecting the structure that every BN layer is followed by a LeakyReLU layer. There are 75 convolutional layers, 72 of which are followed by the BN + LeakyReLU combination to form the basic DBL module. The structure diagram shows 2 upsampling and 2 concat operations, consistent with the tabular analysis. Each res_block is preceded by a zero-padding layer, for a total of 5 res_blocks.
Backbone network structure
A general detection task model has a classification network (model) as a backbone network, for example, Faster R-CNN uses VGG as the backbone network, and YOLO-2 uses classification network darknet-19 as the backbone network; the YOLO-3 selects darknet-53 as a backbone network; FIG. 16 shows a structure of the darknet-53 network;
The YOLO-3 structure contains no pooling layers and no fully connected layers. In the forward pass, tensor size is reduced by changing the stride of the convolution kernel; for example, stride (2,2) halves each side of the image (reducing the area to 1/4). As in YOLO-2, five such reductions shrink the feature map to 1/32 of the input size, so an input of 416x416 yields an output of 13x13 (416/32 = 13); the input picture size is therefore usually required to be a multiple of 32.
Darknet-53 is the feature extraction network; YOLO-3 uses its convolutional layers (53 in total, located before each Res layer) to extract features. The multi-scale feature fusion and detection branches are not part of this backbone structure; the detection branch is fully convolutional, and the number of convolution kernels in its last convolutional layer is 255, directed at the 80 classes of the COCO data set: 3 x (80 + 4 + 1) = 255.
In network training, YOLO-3 also adopts the multi-scale training method of YOLO-2. It still employs a series of 3x3 and 1x1 convolutions: the 3x3 convolutions increase the number of feature-map channels, and the 1x1 convolutions compress the feature representation after each 3x3 convolution.
(1) Convolutional layers:
The YOLO-3 network takes a 416x416-pixel, 3-channel input (with the random parameter set to 1, the input size may vary in multiples of 32). Each convolutional layer applies BN to its input data. The initial convolutional layer uses 32 convolution kernels, each of size 3x3 with stride 1.
(2) Res layer:
Five Res layers of different scales and depths are used; they perform only the residual operation between the outputs of different layers.
(3) Darknet-53 Structure:
From layer 0 to layer 74 there are 53 convolutional layers in total, the remainder being Res layers. As the main network structure used by YOLO-3 for feature extraction, Darknet uses a series of 3x3 and 1x1 convolutional layers (selected and integrated from convolutional layers that perform well in mainstream network structures).
(4) YOLO layers (corresponding to the Region layer in YOLO-2):
From layer 75 to layer 105 lies the feature fusion part of the YOLO-3 network, divided into three scales (13x13, 26x26 and 52x52). At each scale, feature maps from different levels are stacked, and local feature fusion between scales is realized with 3x3 and 1x1 convolution kernels (YOLO-2 instead used FC layers for global feature fusion). The final output feature map is a tensor of depth 75 (3 x (1 + 4 + 20) = 75), where 20 is the number of classes in the VOC data set.
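The three detection scales follow from strides 32, 16 and 8 on a 416x416 input; a short Python check:

    INPUT = 416
    for stride in (32, 16, 8):
        side = INPUT // stride
        print(f"{side}x{side}x{3 * (1 + 4 + 20)}")
    # prints 13x13x75, 26x26x75, 52x52x75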
The SoC adopted by the invention mainly supports the Caffe deep learning framework; the development flow of its neural network acceleration engine NNIE is shown in fig. 17. A model trained under Caffe is compiled offline with the NNIE compilation tool: depending on the selected mode, the compiler turns the caffemodel into a data/instruction file that can be loaded and executed on the simulator, on the simulation library, or on the board. Early in model development, the simulator is used for a first evaluation of the trained model's accuracy, performance and bandwidth; once the model meets expectations, the simulation library is used for complete functional simulation, and finally the program is ported to the board.
Calling the NNIE neural network acceleration engine hardware in the Haisi chip requires compiling the Caffe model into a *.wk data file; the files to prepare are the label file label.txt, the data set files (images), the model file (GraphDF) and the picture preprocessing file (MeanFile);
the specific steps of compiling and generating the wk data file are as follows:
The first step: train the network model offline on a PC. The data set images are fed into the model file GraphDF and compared against the label file label.txt; the result is a trained model file and a weight parameter file.
The second step: import the two trained files into the RuyiStudio software provided by Haisi, and use the file-format conversion function of the nnie mapper plug-in to convert the model files into *.wk data files recognizable by the Haisi software platform.
The third step: deploy the *.wk data file to the target recognition and tracking system board for operation (an illustrative configuration sketch for the conversion step follows).
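For illustration only, the Python sketch below writes a mapper configuration file of the kind the conversion step consumes; every field name and value is an assumption recalled from typical NNIE toolchain examples and must be checked against the Haisi documentation for the concrete SDK version:

    lines = [
        "[prototxt_file] ./deploy.prototxt",
        "[caffemodel_file] ./trained.caffemodel",
        "[image_list] ./imageList.txt",
        "[mean_file] ./mean.txt",
        "[net_type] 0",
        "[batch_num] 1",
        "[compile_mode] 0",
        "[is_simulation] 0",
        "[instruction_name] ./inst",
    ]
    with open("mapper.cfg", "w") as f:
        f.write("\n".join(lines) + "\n")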
Image tracking algorithm and software flow
To improve the shooting precision of firearms, the invention selects the KCF algorithm, whose processing speed and precision are both outstanding. The algorithm principle is as follows:
Determine the target frame size, construct and collect positive and negative samples using the circulant matrix of a region 2.5 times the target frame size around the target, and train the target classifier with a ridge regression model. The samples are transformed so that the training data matrix is circulant and hence diagonalized by the discrete Fourier transform, greatly reducing computation and storage and achieving fast, effective target localization and tracking.
Because the fast Fourier transform turns correlation in the spatial domain into multiplication in the frequency domain, the computation speed is greatly increased. The transformation principle is as follows:
Let x and x' denote two image signals. The phase-correlation response is the normalized cross-power spectrum

r = [F(x) ∘ F(x')*] / |F(x) ∘ F(x')*|

where F(x) and F(x') are the 2-dimensional Fourier transforms of x and x' respectively, F(x')* is the complex conjugate of F(x'), and ∘ denotes the element-wise (dot) product. The 2-dimensional phase correlation function R of x and x' is the inverse Fourier transform of r. When the two image signals are similar, the phase correlation function is close to a standard delta function; when they are not similar, the peak distribution is irregular. The height of the peak indicates the degree of similarity of the image match, and the position of the peak point indicates the displacement between the two images.
The task of the tracking process is to estimate the target state of the current frame based on the target state of the previous frame. Spatial phase correlation is used to mark objects in the current frame image. If the target spatial displacement is (Δ m, Δ n), the current frame image x can be written as:
x(m′,n′)=y(m+Δm,n+Δn)
where y is the target template. By the Fourier shift theorem, the spatial shift (Δm, Δn) changes only the frequency-domain phase spectrum of the current frame image x:

F(x)(u, v) = F(y)(u, v) · e^(j2π(uΔm/M + vΔn/N))

where M and N are the image height and width. The corresponding processing flow is shown in fig. 18.
First, the original image is transformed into the frequency domain by the fast Fourier transform (FFT); the phase-correlation filter is transformed into the frequency domain in the same way; finally, the phase-correlation operation is performed in the frequency domain to obtain the response function, which is then transformed back to the spatial domain by the IFFT. The peak point of the spatial response function is the current-frame target position.
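A minimal numpy sketch of this phase-correlation flow (illustrative; a real tracker adds windowing and feature extraction):

    import numpy as np

    def phase_correlation(x, y):
        """Return the (row, col) shift that maps template y onto image x."""
        X, Y = np.fft.fft2(x), np.fft.fft2(y)
        cross = X * np.conj(Y)
        r = cross / (np.abs(cross) + 1e-12)      # keep only the phase
        R = np.real(np.fft.ifft2(r))             # spatial response function
        dm, dn = np.unravel_index(np.argmax(R), R.shape)
        # Map peaks beyond the half-size back to negative shifts (circular FFT).
        if dm > x.shape[0] // 2:
            dm -= x.shape[0]
        if dn > x.shape[1] // 2:
            dn -= x.shape[1]
        return int(dm), int(dn)

    # Shift a random template by (3, -5) and recover the displacement.
    rng = np.random.default_rng(0)
    y = rng.random((64, 64))
    x = np.roll(y, (3, -5), axis=(0, 1))
    print(phase_correlation(x, y))               # (3, -5)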
The implementation flow of the KCF algorithm is as follows:
In frame It, samples are taken around the current position pt and a regressor is trained; the regressor can compute the response of any small-window sample.
In frame It+1, samples are taken around the previous position pt, and the response of each sample is evaluated with the trained regressor.
The sample with the strongest response is taken as the current-frame position pt+1 (a compact sketch of this train/detect cycle follows).
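A compact numpy sketch of the KCF train/detect cycle with a Gaussian kernel, assuming raw single-channel pixel features and omitting the cosine window, multi-channel features and online model update of a full KCF tracker:

    import numpy as np

    LAMBDA, SIGMA = 1e-4, 0.5

    def gaussian_correlation(x1, x2):
        """Gaussian kernel correlation of all circular shifts of x1 against x2."""
        c = np.real(np.fft.ifft2(np.fft.fft2(x1) * np.conj(np.fft.fft2(x2))))
        d = (np.sum(x1 ** 2) + np.sum(x2 ** 2) - 2.0 * c) / x1.size
        return np.exp(-np.maximum(d, 0.0) / SIGMA ** 2)

    def train(x, y):
        """x: template patch; y: Gaussian-shaped regression target."""
        k = gaussian_correlation(x, x)
        return np.fft.fft2(y) / (np.fft.fft2(k) + LAMBDA)    # ridge regression

    def detect(alpha_hat, x, z):
        """Response over shifts of search patch z; its argmax is the target."""
        k = gaussian_correlation(z, x)
        response = np.real(np.fft.ifft2(np.fft.fft2(k) * alpha_hat))
        idx = np.unravel_index(np.argmax(response), response.shape)
        return tuple(int(i) for i in idx)

    # Toy run: searching the template itself should respond at shift (0, 0).
    rng = np.random.default_rng(1)
    x = rng.random((32, 32))
    yy, xx = np.mgrid[0:32, 0:32]
    y = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
    y = np.roll(y, (-16, -16), axis=(0, 1))   # put the regression peak at (0, 0)
    print(detect(train(x, y), x, x))          # (0, 0)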
Intelligent percussion model and software process
The intelligent trigger model establishes the speed, direction and straight-line distance relations between the target aiming point and the attack division center point, estimates the meeting time of the target and the attack division center according to the target motion model, and determines when to send the trigger signal.
The basic steps are as follows:
compute the target trajectory model from the frame-by-frame target positions provided by the robust tracking algorithm, and predict the target speed, acceleration and motion direction;
check whether the coordinate deviation between the attack division center position and the predicted next-frame target position is smaller than a given value; if it is, generate and send a firing signal to the fire control system; if not, continue the tracking calculation (a minimal sketch of this decision follows).
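A minimal Python sketch of this firing decision under a constant-acceleration motion model; the frame interval and deviation threshold are illustrative assumptions:

    import math

    DT = 1.0 / 30.0    # frame interval at 30 fps (assumption)
    THRESHOLD = 3.0    # allowed deviation in pixels (assumption)

    def predict_next(positions):
        """Predict the next position from the last three tracked positions."""
        (x0, y0), (x1, y1), (x2, y2) = positions[-3:]
        vx, vy = (x2 - x1) / DT, (y2 - y1) / DT              # velocity
        ax, ay = (x2 - 2 * x1 + x0) / DT ** 2, (y2 - 2 * y1 + y0) / DT ** 2
        return (x2 + vx * DT + 0.5 * ax * DT ** 2,
                y2 + vy * DT + 0.5 * ay * DT ** 2)

    def should_fire(positions, division_center):
        px, py = predict_next(positions)
        dev = math.hypot(px - division_center[0], py - division_center[1])
        return dev < THRESHOLD

    track = [(100.0, 80.0), (102.0, 80.5), (104.0, 81.0)]
    print(should_fire(track, (106.0, 81.5)))   # True: predicted point is (106.0, 81.5)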
The invention effectively solves the problems of high power consumption, short working time and lack of intelligent image processing in traditional gun-sight image processing chips; it can intelligently detect, identify and track targets, effectively raises the intelligence level of the sniper rifle sight, and is convenient to popularize and use.
Fourthly, experimental results.
1) The power on self test screen of the present invention is shown in fig. 19.
2) Fig. 20 shows an image display screen according to the present invention.
3) The deep learning target detection screen of the present invention is shown in fig. 21.
4) The object marker screen of the present invention is shown in fig. 22.
5) The attack screen of the present invention is shown in fig. 23.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.
The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A low-power consumption intelligent gun aiming image processing system based on deep learning is characterized in that the low-power consumption intelligent gun aiming image processing system based on deep learning comprises:
the acquisition interface module is used for acquiring a plurality of paths of images of the image sensor of the gun aiming device;
the core processing module comprises a low-power SOC, a flash unit, a DDR unit, a clock unit and an inter-board connection unit with the acquisition interface unit;
the system comprises an algorithm software module, a software bottom driving module, a configuration management module, a target detection module, an identification algorithm design realization module, a target tracking algorithm design realization module, a UI (user interface) design module and a task scheduling design module.
2. The deep learning based low power consumption smart gun sight image processing system of claim 1, wherein the acquisition interface module comprises: the system comprises a serial communication signal interface of a gun aiming fire control system, an image storage and recording interface of a gun aiming device in the searching and attacking process, an image output interface of the gun aiming device, a debugging interface, a power supply management unit and an inter-board connecting unit.
3. The deep learning based low power consumption smart gun sight image processing system of claim 1, wherein the acquisition interface module comprises visible light and infrared video input acquisition interfaces;
the visible light video input interface is LVDS and comprises four paths of data differential signals and one path of clock differential signal;
the infrared video input is HDMI type C, converted into BT1120 through an HDMI decoding chip, and sent into an MIPI channel;
the infrared video PAL interface connector is an MMCX plug with coaxial cable transmission; the signals are defined as PAL_IN and GND; the resolution is 640x512 at 30 Hz, converted into BT656 by a PAL decoding chip and sent into the MIPI channel.
4. The deep learning-based low-power intelligent gun sight image processing system as claimed in claim 1, wherein the core processing module comprises a low-power SOC selection module, a flash unit, a DDR unit, a clock unit and an inter-board connection unit.
5. The deep learning based low power consumption smart gun sight image processing system as claimed in claim 1, wherein the algorithm software module is used for UI interface adjustment, novel target detection, identification and tracking, forming detection, identification and tracking measurements suitable for application of a desired class of targets, comprising: the system comprises a bottom driving software module, a configuration management software module, an intelligent target detection module, an identification algorithm design software module, a target tracking algorithm software module, a UI (user interface) editing software module and a task scheduling software module.
6. A low-power-consumption intelligent gun sight image processing method based on deep learning, which applies the low-power-consumption intelligent gun sight image processing system based on deep learning according to any one of claims 1 to 5, is characterized by comprising the following steps: the target detection, tracking and automatic firing of the intelligent gun sight are carried out by adopting a deep learning algorithm and a low-power-consumption processor, so that the control of the gun sight is realized.
7. A method for constructing a low-power-consumption intelligent gun aiming image processing system based on deep learning is characterized by comprising the following steps of:
establishing independent folders for an acquisition interface module and a core processing module respectively by using circuit board wiring software, and wiring and manufacturing boards according to needs;
step two, adopting an HIMPP Haisi software development platform, directly compiling software on the basis of an algorithm software code to form an operation code, downloading the operation code into a core processing module flash, and executing image acquisition, interface configuration, task scheduling, target detection, identification and tracking measurement of the intelligent gun aiming;
modifying and compiling corresponding software codes in the driving software, the configuration management software and the image acquisition software codes according to the image resolution, the frame rate and the image type;
step four, in the target detection and recognition algorithm software code, training and changing a deep learning target detection model according to the type and the application environment of the target to be attacked;
fifthly, in the UI editing software codes, modifying the codes according to the requirements of users;
and step six, in the task scheduling software code, modifying the flow scheduling and the protocol according to the control requirement of the user interface.
8. A program storage medium that receives user input, the stored computer program causing an electronic device to perform the deep learning based low power consumption smart gun sight image processing method of claim 6.
9. A computer program product stored on a computer readable medium, comprising a computer readable program that, when executed on an electronic device, provides a user input interface to implement the deep learning based low power smart gun aim image processing method of claim 6.
10. A terminal is characterized in that the terminal is provided with the low-power-consumption intelligent gun aiming image processing system based on deep learning of any one of claims 1-5.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201110