CN109447101B - Vehicle position identification method and device based on deep CNN and storage medium
- Publication number
- CN109447101B (application CN201811024682.4A)
- Authority
- CN
- China
- Prior art keywords
- convolutional layer
- network
- feature extraction
- extraction network
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The embodiment of the invention provides a vehicle position identification method based on a deep CNN, which comprises the following steps: training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer; taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer, and detecting and identifying the position of a vehicle; the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep CNN. The embodiment of the invention also provides an electronic device and a non-transitory computer-readable storage medium for realizing the method. The invention can accurately and effectively identify the positions of small vehicle targets captured by an unmanned aerial vehicle, even when the unmanned aerial vehicle is flying at high speed.
Description
Technical Field
The embodiment of the invention relates to the technical field of pattern recognition, in particular to a vehicle position recognition method and device based on deep CNN and a storage medium.
Background
Target detection is an important research field of computer vision, and with the development of unmanned aerial vehicle technology, especially its use in military and security applications, vehicle detection and identification in unmanned aerial vehicle aerial images has urgent application requirements. In unmanned aerial vehicle aerial photography scenes, the background environment is complex, the field of view changes sharply, and there are many interfering objects; moreover, as the flying height and pitch angle of the unmanned aerial vehicle change, the size and shooting angle of the vehicles also vary greatly, which makes target detection very difficult.
In a traditional vehicle detection algorithm, a sample classifier is first constructed: features are set manually, a large number of positive and negative samples are selected for feature extraction, and a feature-based sample classifier is obtained with machine learning algorithms such as a Support Vector Machine (SVM) or AdaBoost. The target is then detected: the image is traversed with a fixed-size window in a sliding-window manner, and each window is classified with the trained classifier to find the detected targets. In addition, there are the optical flow method and the inter-frame difference method, which detect targets by analyzing the vector field of moving targets in the image; the inter-frame difference method uses the difference between frames to extract the background under the assumption that the background is static, thereby detecting the target. Both methods detect moving targets through background modeling and motion analysis, and their detection performance is poor when the scene changes greatly. With the development of deep learning, convolutional neural networks have been widely applied to image classification and target detection, and their accuracy is generally superior to that of traditional algorithms. The network structure of a common deep-learning-based target detection algorithm mainly consists of convolutional layers: a feature map of the image is obtained through a certain number of convolutional layers, and classification is then carried out on the feature map by methods such as sliding-window scanning or candidate region generation, thereby realizing target detection.
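The conventional pipeline described above can be illustrated with a short sketch. The following Python code is purely illustrative and is not part of the patented method; the window size, sliding step and HOG parameters are assumptions chosen for readability.

```python
# Illustrative sketch of the traditional pipeline: hand-crafted HOG features,
# a linear SVM classifier, and a fixed-size sliding window over the image.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

WIN = 64    # assumed window size in pixels
STEP = 16   # assumed sliding step in pixels

def extract_features(patch):
    # Hand-crafted HOG descriptor for one grayscale patch.
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_classifier(pos_patches, neg_patches):
    # Positive samples contain vehicles, negative samples do not.
    X = np.array([extract_features(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    clf = LinearSVC()
    clf.fit(X, y)
    return clf

def detect(image, clf):
    # Traverse the image with a fixed-size window and classify each window.
    detections = []
    h, w = image.shape
    for top in range(0, h - WIN + 1, STEP):
        for left in range(0, w - WIN + 1, STEP):
            patch = image[top:top + WIN, left:left + WIN]
            if clf.predict([extract_features(patch)])[0] == 1:
                detections.append((left, top, WIN, WIN))
    return detections
```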
The traditional technology mainly aims at vehicle detection from low viewing angles, such as parking lots and road monitoring, and its detection performance in unmanned aerial vehicle aerial scenes is not ideal. Firstly, in unmanned aerial vehicle aerial scenes, vehicle postures are varied and the shooting angle of the vehicle changes greatly, so it is difficult to select suitable features for vehicle classification. Secondly, when the unmanned aerial vehicle flies higher, the vehicle target becomes smaller and the target pixel size is smaller, with only edge information and little, or even missing, texture information, which brings great challenges to target detection. Thirdly, the unmanned aerial vehicle flies fast, so the background changes greatly during shooting and background modeling cannot be performed. Finally, although neural networks perform well in detecting large targets, small targets lose much information after several rounds of pooling or downsampling, making detection on the final feature map difficult. Such methods are difficult to adapt to the multi-scale, multi-angle dynamic changes of vehicle targets in aerial photography scenes, and their accuracy is generally low. Therefore, finding a method for accurately identifying small target positions (such as vehicle positions) in images taken by an unmanned aerial vehicle while the camera device moves rapidly, for example when the unmanned aerial vehicle flies at high speed, is a problem of great concern to the industry.
Disclosure of Invention
In view of the foregoing problems in the prior art, embodiments of the present invention provide a vehicle position identification method and apparatus based on a deep CNN, and a storage medium.
According to a first aspect of embodiments of the present invention, there is provided a vehicle position identification method based on a deep CNN, including:
training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer; taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer, and detecting and identifying the position of a vehicle; wherein the feature extraction network with the upsampled convolutional layer and the target detection network together form a deep CNN (i.e., a deep convolutional neural network).
According to the method provided by the embodiment of the invention, the last convolutional layer of the network is upsampled when the feature extraction network is trained, so as to obtain a feature extraction network with an upsampled convolutional layer, which is then combined with the target detection layer to form the deep convolutional neural network; in this way, the positions of small vehicle targets captured by an unmanned aerial vehicle can be accurately and effectively identified even when the unmanned aerial vehicle is flying at high speed.
According to a second aspect of the embodiments of the present invention, there is provided a vehicle position recognition apparatus based on a deep CNN, including:
the feature extraction network acquisition module, used for training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer;
the vehicle position identification module, used for taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer and detecting and identifying the position of a vehicle;
the feature extraction network with the upsampled convolutional layer and the target detection network jointly form a deep CNN.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the deep CNN-based vehicle position identification method provided by any of the various possible implementations of the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for deep CNN-based vehicle location identification provided in any one of the various possible implementations of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a vehicle position identification method based on a deep CNN according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target detection network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a vehicle position recognition apparatus based on a deep CNN in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the effect of recognizing the vehicle position in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Currently, target position detection is an important research field in computer vision, and with the development of unmanned aerial vehicle technology and its use in military and security applications, vehicle detection and identification in unmanned aerial vehicle aerial images has urgent application requirements. In unmanned aerial vehicle aerial photography scenes, the background environment is complex, the field of view changes sharply, and there are many interfering objects; moreover, as the flying height and pitch angle of the unmanned aerial vehicle change, the size and shooting angle of the vehicles also vary greatly, which makes target detection very difficult.
In a traditional vehicle detection algorithm, a sample classifier is first constructed: features are set manually, a large number of positive and negative samples are selected for feature extraction, and an existing machine learning algorithm is used to obtain the feature-based sample classifier. The target is then detected: the image is traversed with a fixed-size window in a sliding-window manner, and each window is classified with the trained classifier to find the detected targets. In addition, the optical flow method and the inter-frame difference method are available; both detect moving targets through background modeling and motion analysis, and their detection performance is poor when the scene changes greatly. With the development of deep learning, convolutional neural networks have been widely applied to image classification and target detection, and their accuracy is generally superior to that of traditional algorithms. The network structure of a common deep-learning-based target detection algorithm mainly consists of convolutional layers: a feature map of the image is obtained through a certain number of convolutional layers, and classification is then carried out on the feature map by methods such as sliding-window scanning or candidate region generation, thereby realizing target detection.
The traditional technology mainly aims at low-viewing-angle scenes such as vehicle detection in parking lots, and its detection performance in unmanned aerial vehicle aerial scenes is not ideal. Firstly, in unmanned aerial vehicle aerial scenes, vehicle postures are varied and the shooting angle of the vehicle changes greatly, so it is difficult to select suitable features for vehicle classification. Secondly, when the unmanned aerial vehicle flies higher, the vehicle target becomes smaller and the target pixel size is smaller, with only edge information and little, or even missing, texture information, which brings great challenges to target detection. Thirdly, the unmanned aerial vehicle flies fast, so the background changes greatly during shooting and background modeling cannot be performed. Finally, although neural networks perform well in detecting large targets, small targets lose much information after several rounds of pooling or downsampling, making detection on the final feature map difficult. Such methods are difficult to adapt to the multi-scale, multi-angle dynamic changes of vehicle targets in aerial photography scenes, and their accuracy is generally low. Therefore, the focus of research in the industry is how to find a method for accurately identifying small target positions (such as vehicle positions) photographed by an unmanned aerial vehicle while the camera device moves rapidly, for example when the unmanned aerial vehicle flies at high speed.
In order to effectively explain how the technical solution of the present invention solves the foregoing technical problem, an embodiment of the present invention provides a vehicle position identification method based on a deep CNN. Referring to FIG. 1, the method includes: 101. training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer; 102. detecting and identifying the position of the vehicle by taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer. The feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep CNN.
In step 101, since the invention is primarily directed at vehicle detection in unmanned aerial vehicle aerial scenes, conventional vehicle detection data sets cannot be used directly for this scene. The sample data set adopted by the invention mainly has three sources: 1. Pictures shot by the unmanned aerial vehicle, where the flying height of the unmanned aerial vehicle is required to vary greatly during shooting, so that vehicle samples shot from various angles can be obtained, thereby ensuring the richness of the samples and preventing under-fitting. 2. Pictures collected from the network; many movies and online videos contain aerial footage. 3. Remote sensing image data sets, such as satellite remote sensing images like VEDAI, mainly used to supplement data for small samples. In deep learning, the larger the number of samples, the better the learning ability, so some image processing methods are needed to expand the data set. To expand the data set, N samples are cut from each sample by random cropping, and operations such as mirror flipping, rotation and brightness change are then applied to all samples. The number of vehicles in each sample is counted, the position of each vehicle in the sample is marked with a rectangular box, and the center coordinates, width and height of each rectangular box are stored; each sample generates its own label file, and the file names correspond one-to-one with the sample names.
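A minimal sketch of the data-set expansion step described above is given below, assuming Pillow is used for image handling. The number of crops, crop ratio, rotation angles and brightness factors are illustrative assumptions, and the corresponding rectangular-box labels (center x, y, width, height) would need to be transformed consistently with each operation, which is omitted here for brevity.

```python
# Hedged sketch of data-set expansion: random cropping, mirror flipping,
# rotation and brightness change applied to one sample image.
import random
from PIL import Image, ImageEnhance

N_CROPS = 4  # assumed number of random crops per sample

def augment(sample_path):
    img = Image.open(sample_path).convert("RGB")
    w, h = img.size
    crops = []
    for _ in range(N_CROPS):
        cw, ch = int(w * 0.8), int(h * 0.8)            # assumed crop ratio
        left = random.randint(0, w - cw)
        top = random.randint(0, h - ch)
        crops.append(img.crop((left, top, left + cw, top + ch)))
    augmented = []
    for im in crops:
        augmented.append(im)
        augmented.append(im.transpose(Image.FLIP_LEFT_RIGHT))       # mirror flip
        augmented.append(im.rotate(random.choice([90, 180, 270])))  # rotation
        factor = random.uniform(0.6, 1.4)                           # brightness change
        augmented.append(ImageEnhance.Brightness(im).enhance(factor))
    return augmented
```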
For the training of the feature extraction network, a deep convolutional neural network framework is first used for training to obtain the feature extraction network, whose main function is to use a deep neural network to obtain highly robust image features of vehicles.
In order to train the feature extraction network, positive samples for training need to be obtained from the sample data set, that is, vehicle samples are cut out according to the marked rectangular boxes. In addition, samples containing no vehicle are used as negative samples, thereby forming a binary classification network. The network consists of multiple convolutional layers connected in sequence, the initial parameters of each layer are initialized with normally distributed random numbers, and softmax is used as the objective function of the network.
The network consists of 26 convolutional layers, and residual layers are added to the network to prevent the vanishing-gradient problem in deep networks. In addition, regularization layers and 1 × 1 convolutional layers are used to improve the convergence speed and training time of the network. The input size of the network is 512 × 512, so all input samples are first normalized to 512 × 512 pixels. After multiple downsampling steps, the final convolutional layer has a size of 16 × 16 pixels. A softmax layer follows to perform the final classification. A network framework with a classification accuracy above 95% is obtained through training, the final classification layer is then removed, and only the first 26 convolutional layers are used as the feature extraction network.
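The following PyTorch sketch outlines a backbone in the spirit of the feature extraction network described above, with residual connections, normalization layers, 1 × 1 convolutions and a 512 × 512 input downsampled to 16 × 16. It uses far fewer than the 26 convolutional layers of the embodiment, and the channel widths (which stand in for n1, n2 and n3 of FIG. 2) are assumptions.

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Residual block used against vanishing gradients in the deep network."""
    def __init__(self, ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(ch, ch // 2, 1), nn.BatchNorm2d(ch // 2), nn.LeakyReLU(0.1),
            nn.Conv2d(ch // 2, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        return x + self.block(x)

def down(cin, cout):
    # stride-2 convolution halves the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.1))

class FeatureExtractor(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                                  nn.BatchNorm2d(32), nn.LeakyReLU(0.1))
        self.to128 = nn.Sequential(down(32, 64), Residual(64),
                                   down(64, 128), Residual(128))    # 512 -> 128
        self.to64 = nn.Sequential(down(128, 256), Residual(256))    # 128 -> 64
        self.to32 = nn.Sequential(down(256, 512), Residual(512))    # 64 -> 32
        self.to16 = nn.Sequential(down(512, 1024), Residual(1024))  # 32 -> 16
        # classification head used only during pre-training; removed afterwards
        self.classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                        nn.Linear(1024, num_classes))

    def forward(self, x, with_classifier=True):
        x = self.to128(self.stem(x))
        f64 = self.to64(x)    # 64 x 64 map (stands in for the n1-channel layer)
        f32 = self.to32(f64)  # 32 x 32 map (stands in for the n2-channel layer)
        f16 = self.to16(f32)  # 16 x 16 map (stands in for the n3-channel layer)
        if with_classifier:
            # the softmax objective corresponds to cross-entropy on these logits
            return self.classifier(f16)
        return f64, f32, f16
```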
In step 102, the feature extraction network with the upsampled convolutional layer is used as the front network of the target detection layer. In the technical solution of the invention, target detection is aimed at only one category, namely vehicles, so the targets do not need to be classified in the target detection network. The network therefore mainly predicts the vehicle position, which is represented by four parameters {x, y, w, h}: the center coordinates (x, y) and the width and height (w, h) of the target. Prediction of the target position is achieved using logistic regression as the objective function.
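A minimal sketch of such a detection layer is shown below, assuming a grid-style head in which every cell of the incoming feature map predicts an objectness score and a box {x, y, w, h}. The sigmoid (logistic) outputs for the center offsets and the exponential for the width and height are illustrative conventions, not details taken verbatim from the patent.

```python
import torch
import torch.nn as nn

class DetectionLayer(nn.Module):
    """Predicts, for every grid cell, an objectness score and a box {x, y, w, h}."""
    def __init__(self, in_ch):
        super().__init__()
        self.pred = nn.Conv2d(in_ch, 5, kernel_size=1)  # 1 objectness + 4 box values

    def forward(self, feat):
        p = self.pred(feat)
        obj = torch.sigmoid(p[:, 0:1])  # logistic output: probability a vehicle is present
        xy = torch.sigmoid(p[:, 1:3])   # center offsets within the cell, in [0, 1]
        wh = torch.exp(p[:, 3:5])       # width/height scale factors (assumed convention)
        return obj, xy, wh
```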
Since the targets in unmanned aerial vehicle aerial photography are small, many small-target features are no longer obvious, or even disappear, in the final 16 × 16 convolutional layer. The target detection network of the invention therefore first upsamples the feature extraction network to obtain a 32 × 32 convolutional layer. To improve the robustness of the target features, this layer is concatenated with the last 32 × 32 convolutional layer of the feature extraction network to form a new convolutional layer. After the features are fused by several convolutional layers, the target detection layer is used to detect vehicle targets.
Most vehicle targets can be extracted and detected by this target detection layer, but some smaller targets, especially vehicles in remote sensing images, still cannot be detected. To address the detection of such very small targets, another detection scale is added. First, the last convolutional layer in front of the previous detection layer is upsampled to obtain a 64 × 64 convolutional layer, which is combined with the last 64 × 64 convolutional layer of the feature extraction network to obtain a new network layer. After the features of these convolutional layers are fused, the target detection layer is used to detect the smallest vehicle targets.
The whole network, comprising the feature extraction network and the target detection layers, has 73 layers in total and realizes an end-to-end network for vehicle detection: an image to be detected is input, and the number and position information of the vehicles in the sample image are obtained through the network.
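A hypothetical end-to-end usage sketch follows: an input image is normalized to the 512 × 512 network input, passed through a trained detector, and the number and positions of vehicles are read out. The detector interface and the (objectness map, box map) output convention are assumptions for illustration only.

```python
import numpy as np
import torch
from PIL import Image

def detect_vehicles(image_path, model, score_thresh=0.5):
    # normalize the input image to the assumed 512 x 512 network input
    img = Image.open(image_path).convert("RGB").resize((512, 512))
    x = torch.from_numpy(np.asarray(img)).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    model.eval()
    with torch.no_grad():
        scores, boxes = model(x)              # assumed: (1,1,H,W) scores, (1,4,H,W) boxes
    keep = scores[0, 0] > score_thresh        # H x W boolean mask of confident cells
    found = boxes[0].permute(1, 2, 0)[keep]   # one {x, y, w, h} row per detected vehicle
    print(f"{found.shape[0]} vehicles detected")
    return found
```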
According to the method provided by the embodiment of the invention, the last convolutional layer of the network is upsampled when the feature extraction network is trained, so as to obtain a feature extraction network with an upsampled convolutional layer, which is then combined with the target detection layer to form the deep convolutional neural network; in this way, the positions of small vehicle targets captured by an unmanned aerial vehicle can be accurately and effectively identified even when the unmanned aerial vehicle is flying at high speed.
The sample data set is crucial to training the feature extraction network in the above embodiments, and directly determines the properties and training results of the feature extraction network. It should be noted that while the sample data set is chosen primarily for the type of target to be identified, its sources may be varied. Therefore, based on the content of the foregoing embodiments, as an alternative embodiment, the sources of the sample data set may include: pictures shot by the unmanned aerial vehicle, pictures obtained from the network, and remote sensing image data sets. For the pictures shot by the unmanned aerial vehicle, the flying height of the unmanned aerial vehicle is required to vary greatly during shooting so that vehicle samples shot from various angles can be obtained, thereby ensuring the richness of the samples and preventing under-fitting. The pictures obtained from the network mainly come from aerial footage in movies and online videos. The remote sensing image data sets include satellite remote sensing images such as VEDAI, and are mainly used to supplement data for small samples.
In the embodiments of the present invention, the feature extraction network with the upsampled convolutional layer is the technical core that realizes the essential spirit of the invention and underlies its various embodiments. It is precisely the upsampled convolutional layer that allows the deep convolutional neural network of each embodiment to accurately and effectively identify the positions of small and very small vehicles. Based on the above embodiments, a specific embodiment of combining the upsampled convolutional layer with the feature extraction network to form a deep convolutional neural network is provided below, so as to fully describe the core content of the invention and help the reader fully understand its spirit. Referring to FIG. 2, the figure includes a feature extraction network 201, where the input represents the picture to be processed. The feature extraction network 201 includes: a 512 × 512 × 3 convolutional layer, a 64 × 64 × n1 convolutional layer, a 32 × 32 × n2 convolutional layer, and a 16 × 16 × n3 convolutional layer. In the foregoing embodiment, upsampling the last convolutional layer of the feature extraction network to obtain the feature extraction network with the upsampled convolutional layer includes: upsampling the last convolutional layer of the feature extraction network to form a 32 × 32 × n3 convolutional layer; adding (concatenating along the channel dimension) the 32 × 32 × n3 convolutional layer to the 32 × 32 × n2 convolutional layer of the feature extraction network to obtain a 32 × 32 × (n3 + n2) convolutional layer, and fusing the 32 × 32 × (n3 + n2) convolutional layer to obtain a 32 × 32 × M convolutional layer. The feature extraction network, together with the 32 × 32 × n3 convolutional layer, the 32 × 32 × (n3 + n2) convolutional layer and the 32 × 32 × M convolutional layer, forms the feature extraction network with the upsampled convolutional layer.
Practical verification shows that this upsampling preliminarily solves the identification problem for some vehicle positions. However, it is still difficult to effectively identify some of the smaller targets that appear when the unmanned aerial vehicle flies at a higher altitude. For this reason, a convolutional neural network with greater depth needs to be constructed, and the embodiment of the present invention therefore further deepens the whole target detection network. Please continue to refer to FIG. 2. In FIG. 2, after the 32 × 32 × (n3 + n2) convolutional layer is fused to obtain the 32 × 32 × M convolutional layer, the method further includes:
upsampling the 32 × 32 × M convolutional layer to form a 64 × 64 × M convolutional layer; adding the 64 × 64 × M convolutional layer to the 64 × 64 × n1 convolutional layer of the feature extraction network to obtain a 64 × 64 × (M + n1) convolutional layer, and fusing it to obtain a 64 × 64 × M1 convolutional layer. The feature extraction network with the upsampled convolutional layer formed above, together with the 64 × 64 × M convolutional layer, the 64 × 64 × (M + n1) convolutional layer and the 64 × 64 × M1 convolutional layer, forms another feature extraction network with an upsampled convolutional layer. As can be seen from FIG. 2, the technical solution of this embodiment essentially forms two target detection networks. On the one hand, the 32 × 32 × M convolutional layer outputs a feature map to the target detection layer so that the target detection layer can detect vehicle positions, forming a first target detection network that can detect the positions of smaller vehicles. On the other hand, the 64 × 64 × M1 convolutional layer outputs a feature map to the target detection layer so that the target detection layer can detect vehicle positions, forming a second target detection network, which can be used to detect the positions of very small vehicles when the unmanned aerial vehicle flies at a higher altitude. That is, by deepening the feature extraction network and matching it with the target detection layer, vehicle targets of different small sizes can be effectively detected.
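To make the two-scale structure of FIG. 2 concrete, the following hedged PyTorch sketch wires the upsampling, channel concatenation and fusion steps together, reusing the FeatureExtractor and DetectionLayer sketches given earlier. The concrete values chosen for n1, n2, n3, M and M1 are assumptions; only the shape relationships (16 × 16 → 32 × 32 → 64 × 64, with channels n3 + n2 fused to M and M + n1 fused to M1) are taken from the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse(cin, cout):
    # a short stack of convolutions that fuses the concatenated feature maps
    return nn.Sequential(
        nn.Conv2d(cin, cout, 1), nn.BatchNorm2d(cout), nn.LeakyReLU(0.1),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.LeakyReLU(0.1),
    )

class VehicleDetector(nn.Module):
    def __init__(self, n1=256, n2=512, n3=1024, M=256, M1=128):
        super().__init__()
        self.backbone = FeatureExtractor()   # backbone sketch from the training step above
        self.fuse32 = fuse(n3 + n2, M)       # 32x32x(n3+n2) -> 32x32xM
        self.fuse64 = fuse(M + n1, M1)       # 64x64x(M+n1)  -> 64x64xM1
        self.detect32 = DetectionLayer(M)    # first target detection network
        self.detect64 = DetectionLayer(M1)   # second target detection network

    def forward(self, x):
        f64, f32, f16 = self.backbone(x, with_classifier=False)
        up32 = F.interpolate(f16, scale_factor=2)         # 16x16xn3 -> 32x32xn3
        m32 = self.fuse32(torch.cat([up32, f32], dim=1))  # concat -> 32x32xM
        up64 = F.interpolate(m32, scale_factor=2)         # 32x32xM -> 64x64xM
        m64 = self.fuse64(torch.cat([up64, f64], dim=1))  # concat -> 64x64xM1
        # two detection scales: m32 for smaller vehicles, m64 for the smallest ones
        return self.detect32(m32), self.detect64(m64)
```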
The technical solution of the invention is realized by organically combining various functions. In order to combine these functions and truly realize the technical effects of the invention, the steps and functions of the invention should be packaged in a modular way. Therefore, based on the content of the foregoing embodiments, the embodiments of the present invention also provide a vehicle position identification apparatus based on the deep CNN, which is used to execute the deep CNN-based vehicle position identification method in the foregoing method embodiments. Referring to FIG. 3, the deep CNN-based vehicle position recognition device 303 includes:
a feature extraction network obtaining module 301, configured to train a feature extraction network by using a sample data set, and upsample a last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer;
a vehicle position identification module 302, configured to detect and identify a position of a vehicle by using the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer;
and the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep CNN.
According to the device embodiment provided by the invention, the feature extraction network acquisition module upsamples the last convolutional layer of the network when the feature extraction network is trained, so as to obtain a feature extraction network with an upsampled convolutional layer, which then cooperates with the vehicle position identification module to form the deep convolutional neural network; in this way, the positions of small vehicle targets captured by an unmanned aerial vehicle can be accurately and effectively identified even when the unmanned aerial vehicle is flying at high speed.
The method of the embodiment of the invention relies on an electronic device for its implementation, so the relevant electronic device is introduced here. An embodiment of the present invention provides an electronic device; as shown in FIG. 4, the electronic device includes: a processor 401, a memory 402, and a bus 403;
the processor 401 and the memory 402 communicate with each other through the bus 403; the processor 401 is configured to call the program instructions in the memory 402 to execute the deep CNN-based vehicle position identification method provided by the foregoing embodiment, for example, including: training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer; taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer, and detecting and identifying the position of a vehicle; the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep convolutional neural network (CNN).
An embodiment of the present invention further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions cause a computer to execute the deep CNN-based vehicle position identification method provided by the foregoing embodiment, for example, including: training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer; taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer, and detecting and identifying the position of a vehicle; the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep convolutional neural network (CNN).
The above embodiments provide details of various possible solutions to the technical problems faced by the present invention. On this basis, the technical effects of the technical solution are explained so as to help the reader thoroughly understand it, which also demonstrates the practicability and effectiveness of the invention. Referring to FIG. 5, the figure includes:
a recognition effect 501 for very small vehicle positions and a recognition effect 502 for small vehicle positions. The recognition effect 502 for small vehicle positions is produced by the first target detection network formed in the foregoing embodiment, in which the 32 × 32 × M convolutional layer outputs a feature map to the target detection layer so that the target detection layer can detect the positions of smaller vehicles. The recognition effect 501 for very small vehicle positions is produced by the second target detection network formed in the foregoing embodiment, in which the 64 × 64 × M1 convolutional layer outputs a feature map to the target detection layer; this second target detection network can be used to detect the positions of very small vehicles when the unmanned aerial vehicle flies at a higher altitude. As can be seen from FIG. 5, by deepening the feature extraction network and matching it with the target detection layer, vehicle targets of different small sizes can be effectively detected, and both the recognition effect 501 for very small vehicle positions and the recognition effect 502 for small vehicle positions show relatively clear and good detection results.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above-described embodiments of the vehicle position identification device and the like are merely illustrative; units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the various embodiments or some parts of the methods of the embodiments. Finally, the method of the present application is only a preferred embodiment and is not intended to limit the scope of the embodiments of the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the embodiments of the present invention should be included in the protection scope of the embodiments of the present invention.
Claims (5)
1. A vehicle position identification method based on a deep CNN, characterized by comprising the following steps:
training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer;
taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer, and detecting and identifying the position of a vehicle;
the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep CNN, the deep CNN comprises a first target detection network and a second target detection network, and the first target detection network and the second target detection network are used for respectively identifying the positions of vehicle targets in pictures shot at different heights;
the upsampling the last convolutional layer of the feature extraction network to obtain the feature extraction network with the upsampled convolutional layer comprises the following steps:
upsampling the last convolutional layer of the feature extraction network to form a convolutional layer of 32 x n 3;
adding the 32 × n3 convolution layer to the 32 × n2 of the feature extraction network to obtain 32 × n (3 + n2) convolution layers, and fusing the 32 × n (n3+ n2) convolution layers to obtain 32 × M convolution layers;
wherein the feature extraction network and the 32 × n3 convolutional layer, the 32 × 32 (n3+ n2 convolutional layer and the 32 × M convolutional layer together form the feature extraction network with the upsampled convolutional layer;
after the 32 × M convolutional layer is obtained after the fusing of the 32 × n (n3+ n2), the method further includes:
sampling the 32 x M convolutional layer to form a 64 x M convolutional layer;
adding the 64 × M convolutional layer to 64 × n1 of the feature extraction network to obtain a 64 × 64 (M + n1) convolutional layer, and fusing the 64 × M1 convolutional layer to obtain a 64 × M1 convolutional layer;
wherein, the formed feature extraction network with the upsampled convolutional layer and the 64 × M convolutional layer, the 64 × (M + n1) convolutional layer and the 64 × M1 convolutional layer form another feature extraction network with the upsampled convolutional layer;
the 32 × M convolutional layer outputs a feature map to the target detection layer, so that the target detection layer detects the position of the vehicle to form a first target detection network;
and the 64 × M1 convolution layer outputs a characteristic map to the target detection layer, so that the target detection layer detects the position of the vehicle to form a second target detection network.
2. The deep CNN-based vehicle location identification method according to claim 1, wherein the source of the sample data set comprises:
pictures shot by the unmanned aerial vehicle, pictures obtained from the network, and remote sensing image data sets.
3. A vehicle position recognition apparatus based on a deep CNN, characterized by comprising:
the feature extraction network acquisition module, used for training a feature extraction network by adopting a sample data set, and upsampling the last convolutional layer of the feature extraction network to obtain a feature extraction network with an upsampled convolutional layer;
the vehicle position identification module, used for taking the feature extraction network with the upsampled convolutional layer as a front network of a target detection layer and detecting and identifying the position of a vehicle;
the feature extraction network with the upsampled convolutional layer and the target detection network jointly form the deep CNN, the deep CNN comprises a first target detection network and a second target detection network, and the first target detection network and the second target detection network are used for respectively identifying the positions of vehicle targets in pictures shot at different heights;
the feature extraction network acquisition module is specifically configured to:
upsampling the last convolutional layer of the feature extraction network to form a 32 × 32 × n3 convolutional layer;
adding the 32 × 32 × n3 convolutional layer to the 32 × 32 × n2 convolutional layer of the feature extraction network to obtain a 32 × 32 × (n3 + n2) convolutional layer, and fusing the 32 × 32 × (n3 + n2) convolutional layer to obtain a 32 × 32 × M convolutional layer;
wherein the feature extraction network, the 32 × 32 × n3 convolutional layer, the 32 × 32 × (n3 + n2) convolutional layer and the 32 × 32 × M convolutional layer together form the feature extraction network with the upsampled convolutional layer;
after the 32 × 32 × M convolutional layer is obtained by fusing the 32 × 32 × (n3 + n2) convolutional layer, the feature extraction network acquisition module is further configured to:
upsample the 32 × 32 × M convolutional layer to form a 64 × 64 × M convolutional layer;
add the 64 × 64 × M convolutional layer to the 64 × 64 × n1 convolutional layer of the feature extraction network to obtain a 64 × 64 × (M + n1) convolutional layer, and fuse the 64 × 64 × (M + n1) convolutional layer to obtain a 64 × 64 × M1 convolutional layer;
wherein the feature extraction network with the upsampled convolutional layer formed above, together with the 64 × 64 × M convolutional layer, the 64 × 64 × (M + n1) convolutional layer and the 64 × 64 × M1 convolutional layer, forms another feature extraction network with an upsampled convolutional layer;
the 32 × 32 × M convolutional layer outputs a feature map to the target detection layer, so that the target detection layer detects the position of the vehicle, forming the first target detection network;
and the 64 × 64 × M1 convolutional layer outputs a feature map to the target detection layer, so that the target detection layer detects the position of the vehicle, forming the second target detection network.
4. An electronic device, comprising:
at least one processor;
and at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 2.
5. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811024682.4A CN109447101B (en) | 2018-09-04 | 2018-09-04 | Vehicle position identification method and device based on deep CNN and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811024682.4A CN109447101B (en) | 2018-09-04 | 2018-09-04 | Vehicle position identification method and device based on deep CNN and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109447101A CN109447101A (en) | 2019-03-08 |
CN109447101B true CN109447101B (en) | 2021-07-09 |
Family
ID=65532561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811024682.4A Active CN109447101B (en) | 2018-09-04 | 2018-09-04 | Vehicle position identification method and device based on deep CNN and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109447101B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778472A (en) * | 2016-11-17 | 2017-05-31 | 成都通甲优博科技有限责任公司 | The common invader object detection and recognition method in transmission of electricity corridor based on deep learning |
CN107368787A (en) * | 2017-06-16 | 2017-11-21 | 长安大学 | A kind of Traffic Sign Recognition algorithm that application is driven towards depth intelligence |
CN107862287A (en) * | 2017-11-08 | 2018-03-30 | 吉林大学 | A kind of front zonule object identification and vehicle early warning method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106295678B (en) * | 2016-07-27 | 2020-03-06 | 北京旷视科技有限公司 | Neural network training and constructing method and device and target detection method and device |
-
2018
- 2018-09-04 CN CN201811024682.4A patent/CN109447101B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN109447101A (en) | 2019-03-08 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| CB02 | Change of applicant information | Address after: 100094 Room 303, 3/F, building 1, area 2, courtyard 81, Beiqing Road, Haidian District, Beijing. Applicant after: Beijing yuetu Data Technology Development Co., Ltd. Address before: 100085 room A515, building 79, Shuang Qing Lu, Haidian District, Beijing. Applicant before: BEIJING YUETU REMOTE SENSING TECHNOLOGY DEVELOPMENT Co.,Ltd.
| GR01 | Patent grant |