Disclosure of Invention
In view of this, the present invention is directed to providing a YOLOv3 network-based license plate detection method that adapts to complex environments and has high detection accuracy.
To achieve this purpose, the technical solution of the invention is realized as follows:
A license plate detection method based on a YOLOv3 network comprises the following steps: data preprocessing: collecting pictures containing license plates and sorting them; expanding the numbers of new energy license plates, embassy license plates and civil aviation license plates by a data balancing method until each equals the number of blue license plates, and labeling the pictures to form a data set; feature extraction: encoding the data set obtained in data preprocessing, inputting the encoded data set into a Darknet-53 network, and taking the outputs of the last two residual layers as feature matrices; classification prediction: concatenating the two feature matrices of different dimensions obtained in feature extraction, feeding the concatenated feature matrix into a logistic regression classifier, and outputting the position and type of the license plate.
Further, the data balancing method comprises expanding the number of picture samples by warping, rotation and adding random noise.
Further, the Darknet-53 network consists of a series of 1 × 1 and 3 × 3 convolutional layers, each followed by a BN layer and a LeakyReLU layer.
Further, in the feature extraction, the Darknet-53 network includes an activation function, convolutional layers, shortcut connection layers, routing layers, upsampling layers and YOLO layers, and the YOLO layer includes the anchor box parameters, the target categories and the number of preselected boxes.
Further, YOLOv3 uses the anchor box parameters to select target candidate boxes as follows: YOLOv3 divides the input image into s × s grid cells; according to the anchor box parameters and the multi-scale scaling of the feature map, each grid cell predicts the positions of n candidate boxes and the confidence of the target type of the suspected target in each candidate box. The position of a target candidate box is obtained from the anchor box parameters by the following formula:
$$b_x = \sigma(t_x) + c_x,\quad b_y = \sigma(t_y) + c_y,\quad b_w = p_w e^{t_w},\quad b_h = p_h e^{t_h}$$
In the above formula, $c_x, c_y$ represent the grid position, the center of the anchor box being the center of the grid cell; $b_h, b_w$ represent the height and width offset of the target candidate box with respect to the anchor box, whose height and width are $p_h, p_w$; $b_x, b_y$ represent the center offset; $t_x, t_y, t_w, t_h$ are the values predicted by the network; and $\sigma$ is the logistic (sigmoid) function.
Further, the specific method of classification prediction is as follows: the outputs of the last two residual layers of the Darknet-53 network obtained in feature extraction are upsampled and concatenated as tensors to obtain two feature maps of different sizes, and license plate targets of different sizes are predicted on these two scales. At each scale, five parameters x, y, w, h and p are predicted, where x, y, w and h correspond to the abscissa of the upper-left corner, the ordinate of the upper-left corner, the width and the height of the license plate bounding box in the data set label, and p represents the category of the license plate target and the confidence of that category.
Further, the logistic regression classifier is used in classification prediction as follows: YOLOv3 implements the logistic regression from the feature map to the output parameters mainly through three loss functions: the target location offset loss $L_{loc}(l, g)$, used to determine the position of the license plate target; the target confidence loss $L_{conf}(o, c)$, used to determine the probability that a license plate target exists in a predicted box; and the target classification loss $L_{cla}(O, C)$, used to indicate the type of license plate to which the target belongs. The three losses are combined with the balance coefficients $\lambda_1, \lambda_2, \lambda_3$:
$$L(O, o, C, c, l, g) = \lambda_1 L_{loc}(l, g) + \lambda_2 L_{conf}(o, c) + \lambda_3 L_{cla}(O, C)$$
Target confidence loss: the target confidence represents the probability that a target exists in the target rectangular box, and the target confidence loss uses the binary cross-entropy loss:
$$L_{conf}(o, c) = -\sum_i \left( o_i \ln \hat{c}_i + (1 - o_i) \ln (1 - \hat{c}_i) \right)$$
where $o_i \in \{0, 1\}$ indicates whether a target really exists in the predicted target bounding box i (0 means absent, 1 means present), and $\hat{c}_i$ is the Sigmoid probability that a target exists in the predicted target rectangular box i, obtained from the predicted value $c_i$ by the sigmoid function:
$$\hat{c}_i = \mathrm{Sigmoid}(c_i)$$
Target class loss: the target class loss $L_{cla}(O, C)$ also uses the binary cross-entropy loss:
$$L_{cla}(O, C) = -\sum_i \sum_j \left( O_{ij} \ln \hat{C}_{ij} + (1 - O_{ij}) \ln (1 - \hat{C}_{ij}) \right)$$
where $O_{ij} \in \{0, 1\}$ indicates whether a target of the j-th class really exists in the predicted target bounding box i (0 means absent, 1 means present), and $\hat{C}_{ij}$ is the Sigmoid probability that a target of the j-th class exists in the predicted target bounding box i, obtained from the predicted value $C_{ij}$ by the sigmoid function:
$$\hat{C}_{ij} = \mathrm{Sigmoid}(C_{ij})$$
Target location loss: the target location loss $L_{loc}(l, g)$ uses the sum of squares of the differences between the true offset values and the predicted offset values:
$$L_{loc}(l, g) = \sum_i \sum_{m \in \{x, y, w, h\}} \left( \hat{l}_i^m - \hat{g}_i^m \right)^2$$
where $\hat{l}$ denotes the predicted rectangular box coordinate offset (the network predicts offsets rather than coordinates directly), $\hat{g}$ denotes the coordinate offset between the matched real target box and the default box, and the superscript $m \in \{x, y, w, h\}$. $b_x, b_y, b_w, b_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the predicted target rectangular box; $c_x, c_y, p_w, p_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the default preselected box; $g_x, g_y, g_w, g_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the real target rectangular box matched with the default preselected box. These parameters are all mapped onto the prediction feature map.
Compared with the prior art, the license plate detection method based on the YOLOv3 network has the following advantages:
(1) The license plate detection method based on the YOLOv3 network adopts an anchor box mechanism and predicts both the position and the category of a license plate target from a single feature extraction pass, which reduces the amount of computation and increases the computation speed.
(2) The license plate detection method based on the YOLOv3 network uses anchor box settings suited to license plate targets, so that the extraction of target candidate boxes is more targeted; and, because license plate targets occupy a relatively large proportion of the whole picture, the number of multi-scale feature predictions is set accordingly, which improves the detection speed without reducing the detection accuracy.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, removable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific situation.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
Explanation of terms:
YOLOv3: an object detection network proposed by Joseph Redmon et al. in: Joseph Redmon, Ali Farhadi, "YOLOv3: An Incremental Improvement", arXiv:1804.02767.
Darknet-53 network: the backbone (feature extraction) network of YOLOv3.
A license plate detection method based on a YOLOv3 network, as shown in fig. 1 to 7, includes the following steps: data preprocessing: pictures containing license plates are collected and sorted.
The data preprocessing mainly covers five license plate types: blue license plates of small civil vehicles, yellow license plates, new energy license plates, embassy license plates, and license plates special for civil aviation.
The numbers of new energy license plates, embassy license plates and civil aviation license plates are expanded by a data balancing method, so that the numbers of the various license plates are basically the same and the data are balanced.
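By way of non-limiting illustration, the balancing target can be sketched in Python as follows. The function name `augmentation_plan` and the picture counts are hypothetical values chosen for the example; the embodiment does not state the actual numbers of pictures per type.

```python
import math

def augmentation_plan(counts, reference="blue"):
    """For each plate type, compute how many augmented pictures are needed to
    reach the reference count, and how many per original picture (rounded up)."""
    target = counts[reference]
    plan = {}
    for plate_type, n in counts.items():
        missing = max(target - n, 0)
        per_image = math.ceil(missing / n) if n else 0
        plan[plate_type] = {"missing": missing, "per_image": per_image}
    return plan

# Hypothetical picture counts per license plate type (assumed for illustration).
counts = {
    "blue": 5000,
    "yellow": 4800,
    "new_energy": 1000,
    "embassy": 250,
    "civil_aviation": 100,
}
plan = augmentation_plan(counts)
for plate_type, entry in plan.items():
    print(plate_type, entry)
```

With these assumed counts, the rarer the plate type, the more augmented copies each original picture has to contribute.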
The data balancing method is shown in fig. 1: the number of picture samples is expanded by warping, rotation and adding random noise. The four pictures in fig. 1 show, from left to right: the original picture before expansion, the warped picture, the rotated picture, and the picture with added noise.
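By way of non-limiting illustration, the three expansion operations can be sketched in dependency-free Python on a small grayscale picture stored as a list of rows. The horizontal shear used for warping, the nearest-neighbor rotation, the Gaussian noise model and all parameter values are assumptions made for the example; the embodiment does not specify the exact transforms.

```python
import math
import random

def rotate(img, degrees):
    """Rotate about the picture center (nearest neighbor, zero fill)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel that lands on (x, y).
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

def warp(img, shear):
    """Horizontal shear: row y is shifted by shear * (y - center row)."""
    h, w = len(img), len(img[0])
    cy = (h - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        shift = round(shear * (y - cy))
        for x in range(w):
            sx = x - shift
            if 0 <= sx < w:
                out[y][x] = img[y][sx]
    return out

def add_noise(img, sigma, rng):
    """Add Gaussian noise and clamp to the 8-bit range [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = [[(x * 8 + y * 4) % 256 for x in range(32)] for y in range(16)]
augmented = [warp(original, 0.3), rotate(original, 10), add_noise(original, 8.0, rng)]
```

When the geometric transforms are applied to labeled pictures, the bounding-box labels would need to be transformed in the same way.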
The pictures are labeled manually in the format of the Pascal VOC data set.
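By way of non-limiting illustration, a label in the Pascal VOC XML layout can be produced with the Python standard library as sketched below. The file name, picture size, class name `blue_plate` and box coordinates are hypothetical; only the element names (`annotation`, `size`, `object`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`) follow the Pascal VOC convention.

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC style annotation tree.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for name, corners in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), corners):
            ET.SubElement(box, tag).text = str(value)
    return root

# Hypothetical picture containing one blue license plate.
root = voc_annotation("car_0001.jpg", 1280, 720,
                      [("blue_plate", (520, 410, 700, 470))])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```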
Feature extraction: the data set obtained in data preprocessing is encoded and input into a Darknet-53 network, and the outputs of the last two residual layers are taken as feature matrices.
The outputs of the last two residual layers of the Darknet-53 network are upsampled and concatenated as tensors to obtain two feature maps of different sizes, which are used to detect license plate targets of different sizes.
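By way of non-limiting illustration, upsampling and tensor splicing can be sketched in dependency-free Python on feature maps stored as nested lists indexed [channel][row][column]. The 2× nearest-neighbor upsampling and the toy shapes are assumptions made for the example.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a [channel][row][col] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row twice
        out.append(rows)
    return out

def concat_channels(a, b):
    """Tensor splicing: stack two maps with equal height and width along channels."""
    if (len(a[0]), len(a[0][0])) != (len(b[0]), len(b[0][0])):
        raise ValueError("spatial sizes must match before splicing")
    return a + b

def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

# Toy stand-ins: a deeper, coarser map (3 x 2 x 2) and a shallower, finer map (2 x 4 x 4).
deep = [[[c * 10 + r * 2 + k for k in range(2)] for r in range(2)] for c in range(3)]
shallow = [[[0.5] * 4 for _ in range(4)] for _ in range(2)]

fused = concat_channels(upsample2x(deep), shallow)
print(shape(deep), shape(fused))
```

In this sketch the coarse map serves the larger targets, while the fused map keeps the finer resolution and carries the channels of both inputs.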
The Darknet-53 network in the feature extraction comprises an activation function, convolutional layers, shortcut connection layers (Shortcut Connections), routing layers, upsampling layers and YOLO layers (the YOLO layer implements the splicing and feature extraction performed after the feature map is upsampled).
It should be noted that Darknet-53 has no pooling layer in the conventional sense; instead, it reduces the dimensions of the feature map by adjusting the stride of the convolution.
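By way of non-limiting illustration, the effect of the convolution stride on the feature map size can be computed as sketched below. The 416 × 416 input size is an assumption made for the example; the kernel size of 3, padding of 1 and the five stride-2 convolutions correspond to the published Darknet-53 configuration.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 416  # assumed input width/height
sizes = [size]
for _ in range(5):  # five stride-2 convolutions instead of pooling layers
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes.append(size)

print(sizes)  # [416, 208, 104, 52, 26, 13]
```

Each stride-2 convolution halves the width and height, so the deepest feature map is 32 times smaller than the input along each side.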
Darknet-53 is composed of a series of 1 × 1 and 3 × 3 convolutional layers. Each convolutional layer is followed by a BN layer and a LeakyReLU layer.
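By way of non-limiting illustration, one convolution + BN + LeakyReLU unit can be sketched in dependency-free Python for the 1 × 1 case, where the convolution reduces to a per-pixel weighted sum over channels. The weights, the BN statistics and the negative slope of 0.1 are assumptions made for the example, and BN is shown in its inference form with fixed mean and variance.

```python
import math

def conv1x1(fmap, weights):
    """1x1 convolution on a [channel][row][col] map; weights is [out_ch][in_ch]."""
    height, width = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(w * fmap[k][y][x] for k, w in enumerate(kernel)) for x in range(width)]
         for y in range(height)]
        for kernel in weights
    ]

def batch_norm(fmap, gamma, beta, mean, var, eps=1e-5):
    """Per-channel normalization with fixed statistics (inference form)."""
    return [
        [[gamma[c] * (v - mean[c]) / math.sqrt(var[c] + eps) + beta[c] for v in row]
         for row in channel]
        for c, channel in enumerate(fmap)
    ]

def leaky_relu(fmap, slope=0.1):
    """Pass positive values through; scale negative values by the slope."""
    return [[[v if v > 0 else slope * v for v in row] for row in channel]
            for channel in fmap]

def conv_bn_leaky(fmap, weights, gamma, beta, mean, var):
    return leaky_relu(batch_norm(conv1x1(fmap, weights), gamma, beta, mean, var))

x = [[[1.0, -2.0]], [[3.0, 4.0]]]        # 2 channels, 1 x 2 pixels
weights = [[1.0, 1.0], [1.0, -1.0]]      # 2 output channels
y = conv_bn_leaky(x, weights, gamma=[1.0, 1.0], beta=[0.0, 0.0],
                  mean=[0.0, 0.0], var=[1.0, 1.0])
print(y)
```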
The YOLO layer includes YOLOv3-specific parameters such as the anchor box parameters, the target categories and the number of preselected boxes. YOLOv3 uses the anchor box parameters to select target candidate boxes as follows: YOLOv3 divides the input image into s × s grid cells, and, according to the anchor box parameters and the multi-scale scaling of the feature map, each grid cell predicts the positions of n candidate boxes and the confidence of the target type of the suspected target in each candidate box, $(t_x, t_y, t_w, t_h, t_o)$, where $(t_x, t_y)$ represents the coordinates of the center of the candidate box, $(t_w, t_h)$ represents the width and height of the candidate box, and $t_o$ represents the target class confidence.
The method for obtaining the target candidate box position from the anchor box parameters is shown in the following formula (1):
$$b_x = \sigma(t_x) + c_x,\quad b_y = \sigma(t_y) + c_y,\quad b_w = p_w e^{t_w},\quad b_h = p_h e^{t_h} \tag{1}$$
In the above formula (1), $c_x, c_y$ represent the grid position. As shown in fig. 2, the grid position is given by the upper-left coordinates of the grid cell, the dashed box represents the anchor box, and the gray box represents the offset of the target candidate box from the anchor box. The target candidate box is located in the grid cell $(c_x, c_y)$; $p_h, p_w$ are the height and width of the anchor box, whose center is the center of the grid cell; $b_h, b_w$ represent the height and width offset of the target candidate box with respect to the anchor box; $b_x, b_y$ represent the center offset; and $\sigma$ is the logistic (sigmoid) function.
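By way of non-limiting illustration, obtaining a candidate box from the anchor box parameters can be sketched in Python as follows, assuming the parameterization published for YOLOv3 (sigmoid of the center outputs added to the grid position, anchor size scaled by the exponential of the size outputs). The function name `decode_box`, the grid cell and the anchor size are hypothetical.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_box(t, cell, anchor):
    """Map raw outputs (t_x, t_y, t_w, t_h) to a box (b_x, b_y, b_w, b_h).

    cell   = (c_x, c_y): grid position, in grid units
    anchor = (p_w, p_h): anchor box width and height, in grid units
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = sigmoid(t_x) + c_x   # the sigmoid keeps the center inside the cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * math.exp(t_w)  # the anchor size is rescaled, never negative
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h

# All-zero outputs give a box centered in the cell with exactly the anchor size.
box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 4), anchor=(3.5, 1.2))
print(box)  # (6.5, 4.5, 3.5, 1.2)
```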
Classification prediction: the two feature matrices of different dimensions obtained in feature extraction are concatenated and fed into a logistic regression classifier, which outputs the position and type of the license plate.
Using the two feature maps of different sizes obtained in feature extraction by upsampling and tensor concatenation, license plate targets of different sizes are predicted on two scales. At each scale, five parameters x, y, w, h and p are predicted, where x, y, w and h correspond to the abscissa of the upper-left corner, the ordinate of the upper-left corner, the width and the height of the license plate bounding box in the data set label, and p represents the category of the license plate target and the confidence of that category.
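By way of non-limiting illustration, the number of candidate boxes produced by a two-scale prediction can be compared with a three-scale prediction as sketched below. The grid sizes 13, 26 and 52 (a 416 × 416 input) and three candidate boxes per grid cell are assumptions made for the example; the embodiment does not state s or n.

```python
def candidate_boxes(grid_sizes, boxes_per_cell):
    """Total candidate boxes over all scales: sum of s * s * n."""
    return sum(s * s * boxes_per_cell for s in grid_sizes)

two_scales = candidate_boxes([13, 26], boxes_per_cell=3)
three_scales = candidate_boxes([13, 26, 52], boxes_per_cell=3)

print(two_scales, three_scales)            # 2535 10647
print(round(two_scales / three_scales, 3))
```

Under these assumptions, dropping the finest scale removes most of the candidate boxes, since the finest grid contributes the most cells.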
In classification prediction, YOLOv3 implements the logistic regression from the feature maps to the output parameters mainly through three loss functions: the target location offset loss $L_{loc}(l, g)$, used to determine the position of the license plate target; the target confidence loss $L_{conf}(o, c)$, used to determine the probability that a license plate target exists in a predicted box; and the target classification loss $L_{cla}(O, C)$, used to indicate the type of license plate to which the target belongs. The three losses are combined with the balance coefficients $\lambda_1, \lambda_2, \lambda_3$:
$$L(O, o, C, c, l, g) = \lambda_1 L_{loc}(l, g) + \lambda_2 L_{conf}(o, c) + \lambda_3 L_{cla}(O, C) \tag{2}$$
(1) Target confidence loss: the target confidence represents the probability that a target exists in the target rectangular box, and the target confidence loss uses the binary cross-entropy loss (Binary Cross Entropy):
$$L_{conf}(o, c) = -\sum_i \left( o_i \ln \hat{c}_i + (1 - o_i) \ln (1 - \hat{c}_i) \right)$$
where $o_i \in \{0, 1\}$ indicates whether a target really exists in the predicted target bounding box i (0 means absent, 1 means present), and $\hat{c}_i$ is the Sigmoid probability that a target exists in the predicted target rectangular box i, obtained from the predicted value $c_i$ by the sigmoid function:
$$\hat{c}_i = \mathrm{Sigmoid}(c_i)$$
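By way of non-limiting illustration, the binary cross-entropy confidence loss can be sketched in Python as follows, with the sigmoid applied to the raw predicted values. The function name `confidence_loss` and the sample values are hypothetical, and no numerical safeguards against extreme inputs are included.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def confidence_loss(o, c):
    """Binary cross-entropy between labels o_i in {0, 1} and sigmoid(c_i)."""
    total = 0.0
    for o_i, c_i in zip(o, c):
        c_hat = sigmoid(c_i)
        total -= o_i * math.log(c_hat) + (1 - o_i) * math.log(1 - c_hat)
    return total

o = [1, 0, 1]                  # whether a target really exists in box i
confident = [4.0, -4.0, 3.0]   # raw outputs that agree with the labels
undecided = [0.0, 0.0, 0.0]    # raw outputs with sigmoid = 0.5 everywhere

print(confidence_loss(o, confident))
print(confidence_loss(o, undecided))
```

Outputs that agree with the labels give a small loss, while undecided outputs contribute ln 2 per box.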
(2) Target class loss: the target class loss $L_{cla}(O, C)$ also uses the binary cross-entropy loss:
$$L_{cla}(O, C) = -\sum_i \sum_j \left( O_{ij} \ln \hat{C}_{ij} + (1 - O_{ij}) \ln (1 - \hat{C}_{ij}) \right)$$
where $O_{ij} \in \{0, 1\}$ indicates whether a target of the j-th class really exists in the predicted target bounding box i (0 means absent, 1 means present), and $\hat{C}_{ij}$ is the Sigmoid probability that a target of the j-th class exists in the predicted target bounding box i, obtained from the predicted value $C_{ij}$ by the sigmoid function: $\hat{C}_{ij} = \mathrm{Sigmoid}(C_{ij})$.
(3) Target location loss: the target location loss $L_{loc}(l, g)$ uses the sum of squares of the differences between the true offset values and the predicted offset values:
$$L_{loc}(l, g) = \sum_i \sum_{m \in \{x, y, w, h\}} \left( \hat{l}_i^m - \hat{g}_i^m \right)^2$$
where $\hat{l}$ denotes the predicted rectangular box coordinate offset (the network predicts offsets rather than coordinates directly), $\hat{g}$ denotes the coordinate offset between the matched real target box and the default box, and the superscript $m \in \{x, y, w, h\}$. $b_x, b_y, b_w, b_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the predicted target rectangular box. $c_x, c_y, p_w, p_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the default preselected box. $g_x, g_y, g_w, g_h$ are respectively the upper-left abscissa, the upper-left ordinate, the width and the height of the real target rectangular box matched with the default preselected box. These parameters are all mapped onto the prediction feature map.
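By way of non-limiting illustration, the sum-of-squares location loss can be sketched in Python as follows. The offset definitions used here (coordinate difference for x and y, logarithm of the size ratio for w and h) are an assumption chosen to be consistent with obtaining a box by shifting from the grid position and rescaling the default box; the function names and sample boxes are hypothetical.

```python
import math

def offsets(box, default_box):
    """Offsets of a box (x, y, w, h) relative to a default box (c_x, c_y, p_w, p_h)."""
    x, y, w, h = box
    c_x, c_y, p_w, p_h = default_box
    return (x - c_x, y - c_y, math.log(w / p_w), math.log(h / p_h))

def location_loss(predicted, real, defaults):
    """Sum over boxes i and m in {x, y, w, h} of (l_hat - g_hat) squared."""
    total = 0.0
    for b, g, d in zip(predicted, real, defaults):
        l_hat = offsets(b, d)   # predicted offset
        g_hat = offsets(g, d)   # offset of the matched real box
        total += sum((l - t) ** 2 for l, t in zip(l_hat, g_hat))
    return total

defaults = [(6.0, 4.0, 3.5, 1.2)]    # (c_x, c_y, p_w, p_h)
real = [(6.5, 4.5, 3.5, 1.2)]        # (g_x, g_y, g_w, g_h)
predicted = [(6.7, 4.5, 3.5, 1.2)]   # (b_x, b_y, b_w, b_h)

print(location_loss(predicted, real, defaults))  # about 0.04
print(location_loss(real, real, defaults))       # 0.0
```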
The specific method of classification prediction is as follows: when the feature map is processed into target classes, logistic regression is used instead of Softmax, which has the advantage that the license plate labels being classified need not be mutually exclusive. In addition, logistic regression is used to score each anchor box for whether it encloses a target, i.e., how likely it is that this location is a license plate. This step is performed before prediction, so that unnecessary anchor boxes can be eliminated to reduce the amount of computation.
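By way of non-limiting illustration, the difference between per-class logistic scores and Softmax, and the elimination of anchor boxes by their objectness score, can be sketched in Python as follows. The raw scores, the candidate records and the threshold of 0.5 are assumptions made for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    m = max(scores)                          # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def filter_by_objectness(candidates, threshold=0.5):
    """Keep candidates whose sigmoid objectness score reaches the threshold."""
    return [c for c in candidates if sigmoid(c["t_o"]) >= threshold]

# Two classes with equally strong evidence and one with weak evidence.
class_scores = [2.0, 2.0, -3.0]
independent = [sigmoid(s) for s in class_scores]  # each class scored on its own
exclusive = softmax(class_scores)                 # classes compete, sum is 1

candidates = [{"id": 0, "t_o": 3.0}, {"id": 1, "t_o": -2.0}, {"id": 2, "t_o": 0.1}]
kept = filter_by_objectness(candidates)

print([round(p, 3) for p in independent])
print([round(p, 3) for p in exclusive])
print([c["id"] for c in kept])  # [0, 2]
```

With logistic scores both strong classes keep a high probability, whereas Softmax forces them to share the probability mass.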
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.