CN111178451A

CN111178451A - A license plate detection method based on YOLOv3 network

Info

Publication number: CN111178451A
Application number: CN202010002151.6A
Authority: CN
Inventors: 屈景怡; 冯晓赛; 杨俊�
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2020-01-02
Filing date: 2020-01-02
Publication date: 2020-05-19

Abstract

The invention provides a license plate detection method based on a YOLOv3 network, comprising the following steps. Data preprocessing: collecting pictures containing license plates and sorting them; expanding, by means of data balancing, the smaller numbers of new energy license plates, embassy and consulate license plates, and civil aviation license plates so that each equals the number of blue license plates; and labeling the pictures to form a dataset. Feature extraction: encoding the dataset obtained in data preprocessing, inputting it into the Darknet-53 network, and taking the outputs of the last two residual layers as feature matrices. Classification prediction: splicing the two feature matrices of different dimensions obtained in feature extraction and sending them to a logistic regression classifier, which outputs the location and type of the license plate. The method adopts the anchor box mechanism and uses a single feature extraction pass to predict both the position and the category of the license plate target, which reduces the amount of calculation and improves the calculation speed.

Description

License plate detection method based on YOLOv3 network

Technical Field

The invention belongs to the technical field of big data and deep learning, and particularly relates to a license plate detection method based on a YOLOv3 network.

Background

With the rapid development of civil aviation in China, airports keep growing in scale. The variety of equipment and vehicles in an airport poses a huge challenge to automatic vehicle management systems, of which license plate detection is an important part. At present, a number of license plate detection systems have been put into commercial use in China. There are two main approaches to license plate detection at home and abroad: traditional methods based on prior features, and methods based on deep learning. Traditional methods model the license plate mainly from features such as contour, texture, and color. They require little computation but have low accuracy and poor robustness.

After the deep-learning-based AlexNet network won the ILSVRC competition in 2012, convolutional neural networks produced a wealth of results, which greatly promoted the development of deep-learning-based target detection. At present, deep-learning-based target detection networks are mainly divided into two-step and single-step detection networks. Two-step detection networks are mainly the R-CNN series, such as R-CNN, SPP-Net, and Fast R-CNN, and split the target detection task into two steps: first detecting candidate boxes (region proposals) of suspected targets, then extracting the features of the candidate boxes and inferring the probability that each candidate box belongs to each target class, thereby obtaining the position and type of the target. Single-step detection networks obtain the class probability and position of the target directly through a single feature extraction pass, and are faster than two-step methods.

Typical single-step detection networks include YOLO, YOLO9000, YOLOv3, and SSD. The YOLOv3 network outputs the category and bounding box of the predicted target directly by regression; it detects quickly but performs poorly on groups of small targets. In license plate detection, the license plate target is relatively large and there are no groups of small targets, so YOLOv3 is used as the base network for license plate detection.

Disclosure of Invention

In view of this, the present invention aims to provide a license plate detection method based on a YOLOv3 network that can adapt to complex environments and has high detection accuracy.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

A license plate detection method based on a YOLOv3 network comprises the following steps. Data preprocessing: collecting pictures containing license plates and sorting them; expanding the numbers of new energy license plates, embassy and consulate license plates, and civil aviation license plates by a data balancing method so that each equals the number of blue license plates; and labeling the pictures to form a dataset. Feature extraction: encoding the dataset obtained in data preprocessing, inputting it into the Darknet-53 network, and taking the outputs of the last two residual layers as feature matrices. Classification prediction: splicing the two feature matrices of different dimensions obtained in feature extraction, sending them into a logistic regression classifier, and outputting the position and type of the license plate.

Further, the data balancing method is: expanding the number of picture samples by distortion, rotation, and adding random noise.

Further, the Darknet-53 network consists of a series of 1 × 1 and 3 × 3 convolutional layers, each followed by a BN layer and a LeakyReLU layer.

Further, in the feature extraction, the Darknet-53 network includes an activation function, convolutional layers, shortcut connection layers, routing layers, up-sampling layers, and YOLO layers, and the YOLO layer includes the anchor box parameters, the target categories, and the number of preselected boxes.

Further, YOLOv3 uses the anchor box parameters to select target candidate boxes as follows: YOLOv3 divides the input image into s × s grids, and each grid predicts, according to the anchor box parameters and the multi-scale scaling of the feature map, the positions of n candidate boxes and the confidence of the target type corresponding to the suspected target in each candidate box. The target candidate box position is obtained from the anchor box parameters as shown in the following formula:

In the above formula, c_x and c_y represent the grid position, the center of the anchor box is the center of the grid, b_h and b_w represent the height and width offsets of the target candidate box relative to the anchor box, b_x and b_y represent the center offset, and σ is the logistic (sigmoid) function.

Further, the specific method of classification prediction is as follows: the outputs of the last two residual layers of the Darknet-53 network in the feature extraction are up-sampled and tensor-spliced to obtain two feature maps of different sizes, and license plate targets of different sizes are predicted on these two scales. At each scale, five parameters x, y, w, h, and p are predicted, where x, y, w, and h correspond to the upper-left abscissa, upper-left ordinate, width, and height of the license plate target bounding box in the dataset label, and p represents the category of the license plate target and the confidence of that category.

Further, the logistic regression classifier is used in classification prediction as follows: YOLOv3 implements the logistic regression from feature maps to output parameters mainly through three loss functions: the target localization offset loss L_loc(l, g), used to determine the position of the license plate target; the target confidence loss L_conf(o, c), used to determine the probability that the license plate target belongs to different license plate types; and the target classification loss L_cla(O, C), used to represent the type of license plate to which the target belongs, where λ₁, λ₂, λ₃ are balance coefficients:

L(O,o,C,c,l,g)＝λ₁L_loc(l,g)+λ₂L_conf(o,c)+λ₃L_cla(O,C)

Target confidence loss: the target confidence represents the probability that a target exists in the target rectangular box, and the target confidence loss adopts the binary cross-entropy loss, where o_i ∈ {0,1} indicates whether a target really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists.

The Sigmoid probability of whether a target exists in the predicted target rectangular box i is obtained from the predicted value c_i through the sigmoid function.

Target class loss: the target class loss L_cla(O, C) also adopts the binary cross-entropy loss, where O_ij ∈ {0,1} indicates whether a target of the j-th class really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists.

The Sigmoid probability that a target of the j-th class exists in the bounding box i predicted by the network is obtained from the predicted value C_ij through the sigmoid function.

Target localization loss: the target localization loss L_loc(l, g) is the sum of squares of the differences between the true offset values and the predicted offset values, where l denotes the coordinate offsets of the predicted rectangular box (the network predicts offsets rather than coordinates directly) and g denotes the coordinate offsets between the default box and the real target box matched with it.

The superscript m ∈ {x, y, w, h}. b^x, b^y, b^w, b^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the predicted target rectangular box; c^x, c^y, p^w, p^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the default preselected box; g^x, g^y, g^w, g^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the real target rectangular box matched with the default preselected box. These parameters are all mapped onto the prediction feature map.

Compared with the prior art, the license plate detection method based on the YOLOv3 network has the following advantages:

(1) The license plate detection method based on the YOLOv3 network adopts the anchor box mechanism and uses a single feature extraction pass to predict both the position and the category of the license plate target, which reduces the amount of calculation and improves the calculation speed.

(2) The method sets anchor boxes suited to license plate targets, so that the extraction of target candidate boxes is more targeted; and, because license plate targets occupy a relatively large proportion of the whole picture, the number of multi-scale feature predictions is arranged accordingly, which improves the detection speed without reducing the detection accuracy.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 illustrates the data balancing method according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating the YOLOv3 anchor box mechanism according to an embodiment of the present invention;

FIG. 3 is a structural diagram of the license plate detection network based on the YOLOv3 network according to an embodiment of the present invention;

FIG. 4 is a block diagram of the feature extraction network Darknet-53 according to an embodiment of the present invention;

FIG. 5 shows the reduction in the amount of computation compared with YOLOv3 according to an embodiment of the present invention;

FIG. 6 shows the network hyper-parameter settings according to an embodiment of the present invention;

FIG. 7 compares the license plate detection performance of the embodiment of the present invention with that of the YOLOv3 network.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted", "coupled", and "connected" are to be construed broadly: a connection may be fixed, removable, or integral; it may be mechanical or electrical; and two elements may be connected directly, connected indirectly through an intervening medium, or in internal communication with each other. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific situation.

The present invention will be described in detail below with reference to the embodiments and the accompanying drawings.

Explanation of terms:

YOLOv3: YOLOv3 is an object detection network proposed by Joseph Redmon et al. in: Joseph Redmon, Ali Farhadi, YOLOv3: An Incremental Improvement, arXiv:1804.02767.

Darknet-53 network: the feature extraction network of YOLOv3.

A license plate detection method based on a YOLOv3 network, as shown in FIGS. 1 to 7, includes the following steps. Data preprocessing: pictures containing license plates are collected and sorted.

The data preprocessing mainly covers five license plate types: blue license plates of small civilian vehicles, yellow license plates, new energy license plates, embassy and consulate license plates, and civil-aviation-specific license plates.

New energy license plates, embassy and consulate license plates, and civil aviation license plates are fewer in number, so their numbers are expanded by a data balancing method until the number of each license plate type is basically the same, which achieves the purpose of balancing the data.

The data balancing method is shown in FIG. 1: the number of picture samples is expanded by distortion, rotation, and adding random noise. The four pictures in FIG. 1 show, from left to right: the original picture, the distorted picture, the rotated picture, and the picture with added noise.

The pictures are labeled manually in the format of the Pascal VOC dataset.
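As an illustration only, the balancing step can be sketched in Python as below. The sketch assumes grayscale images stored as lists of pixel rows; the helper names (`shear`, `rotate180`, `add_noise`, `balance`), the particular distortion, the rotation angle, and the noise level are hypothetical choices, since the text does not specify them.

```python
import random


def rotate180(img):
    """Rotate a 2-D grayscale image (list of rows) by 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def shear(img, shift=1):
    """Crude distortion: cyclically shift row r to the right by r * shift pixels."""
    width = len(img[0])
    out = []
    for r, row in enumerate(img):
        k = (r * shift) % width
        out.append(row[width - k:] + row[:width - k])
    return out


def add_noise(img, sigma=10.0, rng=None):
    """Add Gaussian noise to every pixel and clamp the result to [0, 255]."""
    rng = rng or random.Random(0)
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def balance(samples_by_type, reference="blue", rng=None):
    """Expand every type with fewer samples than `reference` up to its count.

    Assumes every type has at least one original picture.
    """
    rng = rng or random.Random(0)
    target = len(samples_by_type[reference])
    ops = [shear, rotate180, lambda im: add_noise(im, rng=rng)]
    balanced = {}
    for plate_type, images in samples_by_type.items():
        out = list(images)
        i = 0
        while len(out) < target:
            # Cycle through the originals and the three augmentations.
            out.append(ops[i % len(ops)](images[i % len(images)]))
            i += 1
        balanced[plate_type] = out
    return balanced


img = [[0, 50], [100, 200]]
data = {"blue": [img] * 6, "new_energy": [img] * 2, "embassy": [img]}
print({k: len(v) for k, v in balance(data).items()})
```

In practice an image library would apply the same three operations to real pictures, and the bounding box labels would have to be transformed together with the pixels.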
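For readers unfamiliar with the format, a Pascal VOC annotation is an XML file that lists the class name and bounding box corners of each object. The following sketch reads such an annotation with the Python standard library and converts each box to the upper-left corner, width, and height form. The file name, the class name `new_energy`, and the coordinates are made-up sample values, and `read_voc` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample annotation in the Pascal VOC layout.
SAMPLE = """<annotation>
  <filename>plate_0001.jpg</filename>
  <size><width>416</width><height>416</height><depth>3</depth></size>
  <object>
    <name>new_energy</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>260</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""


def read_voc(xml_text):
    """Return a list of (class_name, x, y, w, h) with (x, y) the upper-left corner."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), xmin, ymin, xmax - xmin, ymax - ymin))
    return boxes


print(read_voc(SAMPLE))
```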

Feature extraction: the dataset obtained in data preprocessing is encoded and input into the Darknet-53 network, and the outputs of the last two residual layers are taken as feature matrices.

The outputs of the last two residual layers of the Darknet-53 network are up-sampled and tensor-spliced to obtain two feature maps of different sizes, which are used to detect license plate targets of different sizes.

The Darknet-53 network used in the feature extraction comprises an activation function, convolutional layers, shortcut connection layers (shortcut connections), routing layers, up-sampling layers, and YOLO layers (the YOLO layer implements splicing and feature extraction after the feature map is up-sampled).

It should be noted that Darknet-53 has no pooling layer in the conventional sense; instead, it reduces the dimensions of the feature map by adjusting the stride of the convolution.

Darknet-53 is composed of a series of 1 × 1 and 3 × 3 convolutional layers. Each convolutional layer is followed by a BN layer and a LeakyReLU layer.
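To make the stride-based down-sampling concrete, the sketch below applies the usual convolution output-size formula. The 416 × 416 input size and the padding values are assumptions for illustration, and the LeakyReLU slope of 0.1 is the value used by the Darknet framework rather than a figure stated in this text.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def leaky_relu(x, slope=0.1):
    """LeakyReLU activation: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x


size = 416  # assumed input resolution
sizes = [size]
for _ in range(5):
    # A 3 x 3 convolution with stride 2 and padding 1 halves the feature map.
    size = conv_out_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)
print(sizes)

# 1 x 1 (padding 0) and 3 x 3 (padding 1) convolutions with stride 1 keep the size.
print(conv_out_size(13, 1, 1, 0), conv_out_size(13, 3, 1, 1))
```

Five stride-2 convolutions therefore give a total down-sampling factor of 32 without any pooling layer.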

The YOLO layer includes YOLOv3-specific parameters such as the anchor box parameters, the target categories, and the number of preselected boxes. YOLOv3 uses the anchor box parameters to select target candidate boxes as follows: YOLOv3 divides the input image into s × s grids, and each grid, according to the anchor box parameters and the multi-scale scaling of the feature map, predicts the positions of n candidate boxes and the confidence of the target type corresponding to the suspected target in each candidate box, (t_x, t_y, t_w, t_h, t_o), where (t_x, t_y) represents the center coordinates of the candidate box, (t_w, t_h) represents the width and height of the candidate box, and t_o represents the target class confidence.

The method for obtaining the target candidate frame position from the anchor box parameters is shown in the following formula (1):
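Formula (1) is not reproduced in this text. For reference, the bounding box decoding published in the YOLOv3 paper cited above uses the same symbols and reads as follows, with p_w and p_h denoting the anchor box width and height; that the formula (1) intended here is identical is an assumption.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h}
\end{aligned}
```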

In formula (1) above, c_x and c_y represent the grid position. As shown in FIG. 2, the grid position is given by the coordinates of the upper-left corner of the grid, the dashed box represents the anchor box, and the gray box represents the offset of the target candidate box relative to the anchor box. The target candidate box is located in grid (c_x, c_y); p_h and p_w are the height and width of the anchor box, whose center is the center of the grid; b_h and b_w represent the height and width offsets of the target candidate box relative to the anchor box; b_x and b_y represent the center offset; and σ is the logistic (sigmoid) function.
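A minimal sketch of how such a decoding could be computed is given below. It assumes the standard YOLOv3 form (sigmoid on the center terms plus the grid position, exponential scaling of the anchor size); `decode_box` and its argument layout are hypothetical names, and all values are in grid-cell units.

```python
import math


def sigmoid(t):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(t, cell, anchor):
    """Turn raw predictions (t_x, t_y, t_w, t_h) into a box (b_x, b_y, b_w, b_h).

    cell   -- (c_x, c_y), upper-left corner of the grid cell
    anchor -- (p_w, p_h), width and height of the anchor box
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x   # center stays inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # anchor size scaled by a positive factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Zero raw outputs place the center in the middle of the cell and keep the anchor size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), anchor=(2.0, 1.0)))
```

Because the sigmoid output lies strictly between 0 and 1, the predicted center cannot leave the grid cell that produced it.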

Classification prediction: the two feature matrices of different dimensions obtained in the feature extraction are spliced and sent into a logistic regression classifier, which outputs the position and type of the license plate.

The two feature matrices of different sizes obtained in the feature extraction are up-sampled and tensor-spliced, and license plate targets of different sizes are predicted on two scales. At each scale, five parameters x, y, w, h, and p are predicted, where x, y, w, and h correspond to the upper-left abscissa, upper-left ordinate, width, and height of the license plate target bounding box in the dataset label, and p represents the category of the license plate target and the confidence of that category.
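As a rough illustration of what predicting on two scales means for tensor sizes, the sketch below computes the output shape of each prediction head. The 416 × 416 input, the strides 32 and 16, and three anchor boxes per grid cell are assumptions borrowed from the standard YOLOv3 configuration; the five classes are the five license plate types named in the data preprocessing. `head_shape` is a hypothetical helper.

```python
def head_shape(input_size, stride, anchors_per_cell, num_classes):
    """Shape (rows, cols, channels) of one YOLO prediction head."""
    grid = input_size // stride
    # Per anchor: 4 box terms (x, y, w, h), 1 confidence, and one score per class.
    channels = anchors_per_cell * (4 + 1 + num_classes)
    return grid, grid, channels


for stride in (32, 16):
    print(stride, head_shape(416, stride, anchors_per_cell=3, num_classes=5))
```

Under these assumptions the coarse scale is a 13 × 13 grid and the fine scale a 26 × 26 grid, each with 30 channels.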

In classification prediction, YOLOv3 implements the logistic regression from feature maps to output parameters mainly through three loss functions: the target localization offset loss L_loc(l, g), used to determine the position of the license plate target; the target confidence loss L_conf(o, c), used to determine the probability that the license plate target belongs to different license plate types; and the target classification loss L_cla(O, C), used to represent the type of license plate to which the target belongs. λ₁, λ₂, λ₃ are balance coefficients:

L(O,o,C,c,l,g)＝λ₁L_loc(l,g)+λ₂L_conf(o,c)+λ₃L_cla(O,C) (2)

(1) Target confidence loss: the target confidence represents the probability that a target exists in the target rectangular box, and the target confidence loss adopts the binary cross-entropy loss (Binary Cross Entropy), where o_i ∈ {0,1} indicates whether a target really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists.

The Sigmoid probability of whether a target exists in the predicted target rectangular box i is obtained from the predicted value c_i through the sigmoid function.
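The confidence loss formula itself is not reproduced in this text. A standard binary cross-entropy form consistent with the definitions of o_i and c_i above would be the following, where the hatted symbol for the Sigmoid probability is notation assumed here:

```latex
L_{conf}(o, c) = -\sum_{i} \Big( o_i \ln \hat{c}_i + (1 - o_i) \ln\big(1 - \hat{c}_i\big) \Big),
\qquad \hat{c}_i = \mathrm{Sigmoid}(c_i) = \frac{1}{1 + e^{-c_i}}
```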

(2) Target class loss: the target class loss L_cla(O, C) also adopts the binary cross-entropy loss, where O_ij ∈ {0,1} indicates whether a target of the j-th class really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists.

The Sigmoid probability that a target of the j-th class exists in the bounding box i predicted by the network is obtained from the predicted value C_ij through the sigmoid function.
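Both the confidence loss and the class loss are binary cross-entropy losses applied to sigmoid outputs, so one helper can serve for both. The sketch below is an illustration under that reading; the function names, the clamping constant `eps`, and the choice to sum over every box passed in (rather than only boxes matched to a target) are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(targets, logits, eps=1e-12):
    """Binary cross-entropy summed over paired 0/1 targets and raw predictions."""
    total = 0.0
    for t, z in zip(targets, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def class_loss(O, C):
    """Class loss: O[i][j] is the 0/1 label and C[i][j] the raw score of class j in box i."""
    return sum(bce_loss(O_i, C_i) for O_i, C_i in zip(O, C))


# Confidence loss for three boxes, the first of which contains a plate.
print(bce_loss([1, 0, 0], [2.0, -1.0, -3.0]))
# Class loss for one box labeled as the second of five license plate types.
print(class_loss([[0, 1, 0, 0, 0]], [[-2.0, 3.0, -2.0, -2.0, -2.0]]))
```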

(3) Target localization loss: the target localization loss L_loc(l, g) is the sum of squares of the differences between the true offset values and the predicted offset values, where l denotes the coordinate offsets of the predicted rectangular box; the network predicts offsets and does not directly predict coordinates.

The superscript m ∈ {x, y, w, h}. b^x, b^y, b^w, b^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the predicted target rectangular box. c^x, c^y, p^w, p^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the default preselected box. g^x, g^y, g^w, g^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the real target rectangular box matched with the default preselected box. These parameters are all mapped onto the prediction feature map.
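The offset formulas are not reproduced in this text. The sketch below assumes a common encoding, in which position offsets are plain differences from the default box and size offsets are logarithms of the size ratios, and then sums the squared differences between predicted and true offsets. Both the encoding and the names `offsets` and `loc_loss` are assumptions for illustration.

```python
import math


def offsets(box, default):
    """Encode a box (x, y, w, h) relative to a default box (c_x, c_y, p_w, p_h)."""
    x, y, w, h = box
    c_x, c_y, p_w, p_h = default
    return (x - c_x, y - c_y, math.log(w / p_w), math.log(h / p_h))


def loc_loss(pred_boxes, true_boxes, default_boxes):
    """Sum over boxes and over m in {x, y, w, h} of the squared offset difference."""
    total = 0.0
    for b, g, d in zip(pred_boxes, true_boxes, default_boxes):
        l_off = offsets(b, d)  # predicted offsets
        g_off = offsets(g, d)  # true offsets
        total += sum((l_m - g_m) ** 2 for l_m, g_m in zip(l_off, g_off))
    return total


default = [(0.0, 0.0, 1.0, 1.0)]
print(loc_loss([(1.0, 1.0, 2.0, 2.0)], [(2.0, 1.0, 2.0, 2.0)], default))
```

In the example the predicted and true boxes differ only by one unit in x, so the loss equals the square of that difference.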

The specific method of classification prediction is as follows: in processing the feature map into target classes, logistic regression is used in place of Softmax, the advantage being that the license plate labels being classified need not be mutually exclusive. In addition, logistic regression is used to give each anchor box an objectness score, that is, how likely it is that the location contains a license plate. This step is performed before prediction, so that unnecessary anchor boxes can be eliminated to reduce the amount of computation.
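The difference between the two classifiers can be seen in a small sketch: Softmax forces the class scores of a box to sum to 1, whereas independent logistic (sigmoid) outputs can be high for several labels at once. The second helper shows how an objectness score could be used to discard anchor boxes early. The raw scores, the 0.5 threshold, and the name `keep_boxes` are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Softmax over a list of raw scores (shifted by the maximum for stability)."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]


def keep_boxes(objectness_logits, threshold=0.5):
    """Indices of anchor boxes whose objectness probability reaches the threshold."""
    return [i for i, z in enumerate(objectness_logits) if sigmoid(z) >= threshold]


scores = [2.0, 1.5, -3.0, -3.0, -3.0]  # raw scores for five plate types
print([round(p, 3) for p in softmax(scores)])             # competes: sums to 1
print([round(sigmoid(z), 3) for z in scores])             # independent per label
print(keep_boxes([-3.0, 0.2, 4.0]))                       # boxes kept for prediction
```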

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A license plate detection method based on a YOLOv3 network, characterized by comprising the following steps:

Data preprocessing: collecting pictures containing license plates and sorting them; expanding the numbers of new energy license plates, embassy and consulate license plates, and civil aviation license plates by means of data balancing so that each equals the number of blue license plates; and labeling the pictures to form a dataset;

Feature extraction: encoding the dataset obtained in data preprocessing, inputting it into the Darknet-53 network, and taking the outputs of the last two residual layers as feature matrices;

Classification prediction: splicing the two feature matrices of different dimensions obtained in the feature extraction and sending them to a logistic regression classifier to output the location and type of the license plate.

2. The license plate detection method based on a YOLOv3 network according to claim 1, characterized in that the data balancing method is: expanding the number of picture samples by distortion, rotation, and adding random noise.

3. The license plate detection method based on a YOLOv3 network according to claim 1, characterized in that the Darknet-53 network consists of a series of 1 × 1 and 3 × 3 convolutional layers, each convolutional layer being followed by a BN layer and a LeakyReLU layer.

4. The license plate detection method based on a YOLOv3 network according to claim 1, characterized in that, in the feature extraction, the Darknet-53 network comprises an activation function, convolutional layers, shortcut connection layers, routing layers, up-sampling layers, and YOLO layers, and the YOLO layer includes the anchor box parameters, the target categories, and the number of preselected boxes.

5. The license plate detection method based on a YOLOv3 network according to claim 4, characterized in that YOLOv3 uses the anchor box parameters to select target candidate boxes as follows: YOLOv3 divides the input image into s × s grids, and each grid predicts, according to the anchor box parameters and the multi-scale scaling of the feature map, the positions of n candidate boxes and the confidence of the target type corresponding to the suspected target in each candidate box; the target candidate box position is obtained from the anchor box parameters as shown in the following formula:

In the above formula, c_x and c_y represent the grid position, the center of the anchor box is the center of the grid, b_h and b_w represent the height and width offsets of the target candidate box relative to the anchor box, b_x and b_y represent the center offset, and σ is the logistic (sigmoid) function.

6. The license plate detection method based on a YOLOv3 network according to claim 1, characterized in that the specific method of classification prediction is: the outputs of the last two residual layers of the Darknet-53 network in the feature extraction are up-sampled and tensor-spliced to obtain two feature maps of different sizes, and license plate targets of different sizes are predicted on these two scales; at each scale, five parameters x, y, w, h, and p are predicted, where x, y, w, and h correspond to the upper-left abscissa, upper-left ordinate, width, and height of the license plate target bounding box in the dataset label, and p represents the category of the license plate target and the confidence of that category.

7. The license plate detection method based on a YOLOv3 network according to claim 6, characterized in that the logistic regression classifier is used in classification prediction as follows: YOLOv3 implements the logistic regression from feature maps to output parameters mainly through three loss functions: the target localization offset loss L_loc(l, g), used to determine the location of the license plate target; the target confidence loss L_conf(o, c), used to determine the probability that the license plate target belongs to different license plate types; and the target classification loss L_cla(O, C), used to represent the type of license plate to which the target belongs, where λ₁, λ₂, λ₃ are balance coefficients:

L(O,o,C,c,l,g)=λ₁L_loc(l,g)+λ₂L_conf(o,c)+λ₃L_cla(O,C)

Target confidence loss: the target confidence represents the probability that a target exists in the target rectangular box, and the target confidence loss adopts the binary cross-entropy loss, where o_i ∈ {0,1} indicates whether a target really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists;

The Sigmoid probability of whether a target exists in the predicted target rectangular box i is obtained from the predicted value c_i through the sigmoid function;

Target class loss: the target class loss L_cla(O, C) also adopts the binary cross-entropy loss, where O_ij ∈ {0,1} indicates whether a target of the j-th class really exists in the predicted target bounding box i: 0 means it does not exist and 1 means it exists;

The Sigmoid probability that a target of the j-th class exists in the bounding box i predicted by the network is obtained from the predicted value C_ij through the sigmoid function;

Target localization loss: the target localization loss L_loc(l, g) is the sum of squares of the differences between the true offset values and the predicted offset values, where l denotes the coordinate offsets of the predicted rectangular box (the network predicts offsets rather than coordinates directly) and g denotes the coordinate offsets between the default box and the real target box matched with it;

The superscript m ∈ {x, y, w, h}; b^x, b^y, b^w, b^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the predicted target rectangular box; c^x, c^y, p^w, p^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the default preselected box; g^x, g^y, g^w, g^h are respectively the upper-left abscissa, upper-left ordinate, width, and height of the real target rectangular box matched with the default preselected box; these parameters are all mapped onto the prediction feature map.