Disclosure of Invention
In order to solve the above problems in the prior art, the present invention provides a helmet detection method, device, electronic device and storage medium based on an improved YOLOv5s. The technical problem to be solved by the invention is realized by the following technical scheme:
the invention relates to a helmet detection method based on improved YOLOv5s, which comprises the following steps:
constructing a YOLOv5s-Light target detection framework;
training the YOLOv5s-Light target detection framework by using a target image training set to obtain a YOLOv5s-Light target detection training framework;
inputting the image set to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set;
and detecting the target detection set by using a riding detection algorithm to obtain a target detection result.
In one embodiment of the invention, the YOLOv5s-Light target detection framework is constructed, and comprises the following steps:
based on the idea of lightweight network, performing first convolution structure transformation on a residual block of a first YOLOv5s network to obtain a second YOLOv5s network;
replacing an activation function of a deep level residual block of the second YOLOv5s network with an H-Swish function to obtain a third YOLOv5s network;
adding an SE attention module in a residual block of the third YOLOv5s network to obtain a fourth YOLOv5s network based on a channel attention mechanism;
compressing the number of channels output by each layer of the fourth YOLOv5s network at the same ratio to obtain a fifth YOLOv5s network;
and performing second convolution structure transformation on the fifth YOLOv5s network to obtain the YOLOv5s-Light target detection framework.
In one embodiment of the invention, the first convolution structure transformation comprises:
changing the ordinary convolution in the Bottleneck of the residual block in the first YOLOv5s network into a group convolution;
removing the inverted structure of the Bottleneck of the residual block in the first YOLOv5s network.
In one embodiment of the invention, the second convolution structure transform comprises:
replacing the convolution of the non-residual-block Bottleneck in the fifth YOLOv5s network with a depthwise separable convolution.
In one embodiment of the invention, the target image training set comprises a first head target image, a first helmet-worn head image, and a first rider image.
In an embodiment of the present invention, detecting the target detection set by using a riding detection algorithm to obtain a target detection result includes:
initializing a rider image set wearing a helmet and a rider image set not wearing the helmet;
detecting a test set with the YOLOv5s-Light target detection framework to obtain a second head target image and/or a second helmet-free rider image;
using the second head target image and the second helmet-free rider image to derive an optimal IoU threshold;
detecting the target detection set by using the YOLOv5s-Light target detection framework to obtain a third head target image, a third helmet-worn head image and a third rider image;
placing the third helmet-worn head image in the helmet-worn rider image set;
deriving a third helmet-free rider image using the third head target image, the third rider image and the optimal IoU threshold;
placing the third helmet-free rider image in the helmet-free rider image set, the helmet-worn rider image set and the helmet-free rider image set together constituting the target detection result.
In one embodiment of the present invention, using the second head target image and the second helmet-free rider image to derive an optimal IoU threshold comprises:
step 4.3.1, initializing an empty threshold set and an IoU threshold, wherein the initial value of the IoU threshold is thres;
step 4.3.2, detecting a test set with the YOLOv5s-Light target detection framework to obtain a second rider image;
step 4.3.3, calculating the IoU values between said second head target image and said second rider image to obtain a second IoU value set;
step 4.3.4, comparing each IoU value in the second IoU value set with thres: if the IoU value is greater than or equal to thres, adding the corresponding second head target image to the threshold set; if the IoU value is less than thres, ignoring that second head target image; continuing until every IoU value in the second IoU value set has been compared with thres;
step 4.3.5, calculating the difference between the number of second head target images in the threshold set and the number of second helmet-free rider images, and adding the difference to a target difference set;
step 4.3.6, after the difference has been added to the target difference set, decreasing thres by 0.01 and emptying the threshold set;
step 4.3.7, repeating steps 4.3.4 to 4.3.6 until thres reaches 0; the value of thres corresponding to the smallest difference in the target difference set is the optimal IoU threshold.
In one embodiment of the present invention, there is provided a helmet detection device based on improved YOLOv5s, comprising:
the model building module is used for building a YOLOv5s-Light target detection framework;
the model training module is used for training the YOLOv5s-Light target detection framework by using a target image training set to obtain a YOLOv5s-Light target detection training framework;
the information processing module is used for inputting the image set to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set;
and the riding detection module is used for detecting the target detection set by utilizing a riding detection algorithm to obtain a target detection result.
One embodiment of the present invention provides a helmet detection electronic device based on improved YOLOv5s, including: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the above method steps when executing the computer program stored in the memory.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the helmet detection method based on improved YOLOv5s.
Compared with the prior art, the invention has the beneficial effects that:
the application discloses a helmet detection method based on improved YOLOv5s, the method comprises the steps of firstly constructing a YOLOv5s-Light target detection frame, greatly reducing the number of network parameters on the basis of the YOLOv5s target detection frame through the YOLOv5s-Light target detection frame, achieving construction of a lightweight model, then obtaining a target detection set through the YOLOv5s-Light target detection frame, and finally detecting the target detection set through a riding detection algorithm to obtain a target detection result.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
It should be noted that the terms "first", "second" and "third" in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", "third" may explicitly or implicitly include one or more of the features.
Example one
Referring to fig. 1, fig. 1 is a flowchart of a helmet detection method based on improved YOLOv5s according to an embodiment of the present invention. The embodiment discloses a helmet detection method based on improved YOLOv5s, which comprises the following steps:
step 1, constructing a YOLOv5s-Light target detection framework.
Specifically, the present embodiment improves the algorithm of the first YOLOv5s network by adopting depthwise separable convolution modules, an SE (Squeeze-and-Excite) attention module and an improved inverted residual structure in the network model design, with the H-Swish function as the activation function. The lightweight YOLOv5s-Light target detection framework is thus obtained while the detection frame rate and accuracy are maintained; the number of model parameters is greatly reduced, the efficiency of the model is improved, and a lightweight, easily deployable helmet detection method is realized.
Further, step 1 comprises:
step 1.1, based on the idea of lightweight network, performing first convolution structure transformation on a residual block of a first YOLOv5s network to obtain a second YOLOv5s network.
Further, the first convolution structure transformation includes:
changing the ordinary convolution in the Bottleneck of the residual block in the first YOLOv5s network into a group convolution;
removing the inverted structure of the Bottleneck of the residual block in the first YOLOv5s network.
Specifically, the ordinary convolution in the standard residual block Bottleneck of the first YOLOv5s network is changed to a group convolution, and the inverted structure of the residual block Bottleneck of the first YOLOv5s network is removed to reduce the network parameters.
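The parameter saving from the group convolution can be checked with a short calculation. The sketch below is illustrative only (the channel sizes and group count are assumptions, not values from the patent): a k × k convolution with g groups maps each group of c_in/g input channels to c_out/g output channels, dividing the weight count by g.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with `groups` groups (bias ignored)."""
    assert c_in % groups == 0 and c_out % groups == 0
    # each group maps c_in/groups input channels to c_out/groups output channels
    return groups * (c_in // groups) * (c_out // groups) * k * k

# Illustrative sizes (assumed, not from the patent):
standard = conv_params(128, 128, 3)            # ordinary convolution
grouped  = conv_params(128, 128, 3, groups=4)  # group convolution
print(standard, grouped, standard // grouped)  # the grouped version is 4x smaller
```

With 4 groups the weight count drops by exactly a factor of 4, which is why group convolution is a common first step when slimming a network for edge deployment.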
The first YOLOv5s network is the original YOLOv5s network, which is difficult to deploy directly on an edge computing platform and slow in detection. Since the detection scene of this embodiment is actual traffic, the real-time requirement on detection is high. Under the lightweight network idea, this embodiment greatly reduces the network parameters without changing the overall structure of the network.
And step 1.2, replacing the activation function of the deep level residual block of the second YOLOv5s network with an H-Swish function to obtain a third YOLOv5s network.
Referring to fig. 2, fig. 2 is an operation flowchart of replacing the activation function of the deep residual blocks of the second YOLOv5s network with the H-Swish function according to an embodiment of the present invention. In this embodiment, the activation function of the deep residual blocks of the second YOLOv5s network is replaced with the H-Swish function to obtain a third YOLOv5s network, while the activation function of the shallow residual blocks remains LeakyReLU; this encourages the network to learn sparser image features with fewer parameters.
Further, the expression of the H-Swish function is:
H-Swish(x) = x · ReLU6(x + 3) / 6
where ReLU6 represents the ReLU activation function clipped at 6 for hidden neuron outputs, and x represents the input variable of the second YOLOv5s network.
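A minimal numeric sketch of this activation, assuming the standard H-Swish definition built on ReLU6:

```python
def relu6(x):
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)

def h_swish(x):
    """H-Swish(x) = x * ReLU6(x + 3) / 6, a piecewise approximation of Swish."""
    return x * relu6(x + 3.0) / 6.0

# For x <= -3 the output is 0; for x >= 3 it passes x through unchanged;
# in between it interpolates smoothly, which is cheap on edge hardware.
print(h_swish(-4.0), h_swish(0.0), h_swish(3.0), h_swish(10.0))
```

Because it uses only clipping and multiplication, H-Swish avoids the exponential of the original Swish, which suits edge computing platforms.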
And step 1.3, adding an SE attention module in a residual block of the third YOLOv5s network based on a channel attention mechanism to obtain a fourth YOLOv5s network.
Specifically, the SE attention module is a Squeeze-and-Excite module. An SE attention module is added to the residual blocks of the third YOLOv5s network so that global pooling can be used to weight the channels of the feature map, improving the network's ability to learn features of the main target.
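The squeeze-and-excite operation can be sketched as follows. This is a NumPy illustration only; the layer sizes, reduction ratio and weights are assumptions for demonstration, not values from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excite on a (C, H, W) feature map.

    w1: (C // r, C) weights of the squeeze FC layer (reduction ratio r)
    w2: (C, C // r) weights of the excite FC layer
    """
    s = x.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excite: FC -> ReLU -> FC -> sigmoid
    return x * e[:, None, None]                # reweight each channel of x

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))       # C=8 channels, 4x4 spatial (assumed sizes)
w1 = rng.standard_normal((2, 8)) * 0.1   # reduction ratio r=4
w2 = rng.standard_normal((8, 2)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # same shape as the input; each channel scaled by a weight in (0, 1)
```

The output shape is unchanged; the block only rescales channels, so it can be dropped into an existing residual block at small parameter cost.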
And step 1.4, compressing the number of channels output by each layer of the fourth YOLOv5s network at the same ratio to obtain a fifth YOLOv5s network.
Specifically, the number of channels output by each layer of the fourth YOLOv5s network is compressed by the same ratio to further reduce the parameter count of the fourth YOLOv5s network; the compression ratio can be adjusted according to the hardware of the edge computing platform actually deployed.
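Such uniform channel compression amounts to applying a width multiplier to every layer's output channel count. The sketch below is illustrative (the layer widths, ratio and rounding divisor are assumptions, not the patent's actual configuration):

```python
def compress_channels(channels, ratio, divisor=1):
    """Scale every layer's output channel count by the same `ratio`.

    `divisor` optionally rounds each result to a hardware-friendly multiple
    (many frameworks use 8); at least `divisor` channels are always kept.
    """
    out = []
    for c in channels:
        scaled = max(divisor, int(round(c * ratio / divisor)) * divisor)
        out.append(scaled)
    return out

# Illustrative layer widths (assumed) compressed to half:
print(compress_channels([32, 64, 128, 256, 512], 0.5))
```

Since convolution parameters scale roughly with the product of input and output channels, halving every width cuts the parameter count of those layers by about four.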
And step 1.5, performing second convolution structure transformation on the fifth YOLOv5s network to obtain a YOLOv5s-Light target detection framework.
Further, the second convolution structure transform includes:
the convolution of the non-residual-block Bottleneck in the fifth YOLOv5s network is replaced with a depthwise separable convolution (Depthwise Separable Convolution).
Referring to fig. 3a and fig. 3b, fig. 3a is a schematic diagram of a standard convolution operation according to an embodiment of the present invention; FIG. 3b is a diagram illustrating a depth separable convolution operation according to an embodiment of the present invention.
Specifically, the convolution of the non-residual-block Bottleneck in the fifth YOLOv5s network is replaced by a depthwise separable convolution; that is, the ordinary convolution is split into a depthwise convolution that filters each channel independently without summing across channels, and a 1 × 1 pointwise convolution that recombines the channels, thereby reducing the parameters.
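The parameter reduction from this split can again be checked arithmetically (the sizes below are illustrative assumptions, not the patent's layer sizes): a standard k × k convolution costs k·k·c_in·c_out weights, while the depthwise convolution costs k·k·c_in and the 1 × 1 pointwise convolution costs c_in·c_out.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k convolution (one filter per channel,
    no cross-channel summation) followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Illustrative sizes (assumed, not from the patent):
std = standard_conv_params(128, 128, 3)
sep = depthwise_separable_params(128, 128, 3)
print(std, sep, round(std / sep, 1))  # roughly an 8x reduction at these sizes
```

For a 3 × 3 kernel the saving approaches a factor of 9 as the channel counts grow, which is the main source of the overall parameter reduction reported below.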
The network structure of the resulting improved YOLOv5s-Light target detection framework is shown in Table 1.
TABLE 1
Wherein, Conv represents the depthwise separable convolution operation (including the activation function and the BN layer), bneck represents the improved residual block, LRE represents the use of the Leaky-ReLU activation function, HS represents the use of the H-Swish activation function, and concat(x) represents the channel fusion of the previous layer's output with the output of the x-th layer of the YOLOv5s-Light target detection framework.
Further, the number of parameters of the improved YOLOv5s network is reduced by a factor of 17.9 compared with that before the improvement, realizing the construction of a lightweight model.
And 2, training the YOLOv5s-Light target detection framework by using a target image training set to obtain a YOLOv5s-Light target detection training framework.
In particular, the target image training set includes a first head target image (head), a first helmet-worn head image (helmet), and a first rider image (motorperson).
And 3, inputting the image set to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set.
Specifically, the present embodiment detects only whether the rider of a motorcycle or electric vehicle wears a helmet, and does not detect other vehicles or pedestrians. Scene images of a traffic intersection are captured by a high-definition camera erected at the intersection, and each captured frame is input as an image to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set. The target detection set comprises head target images, helmet-worn head images and rider images.
A head target image indicates that a head target can be detected in the image; a helmet-worn head image indicates that a helmeted head target can be detected in the image; a rider image indicates that rider targets with or without helmets can be detected in the image, where the rider's vehicle is only a motorcycle or an electric vehicle.
And 4, detecting the target detection set by using a riding detection algorithm to obtain a target detection result.
Referring to fig. 4 and 5, fig. 4 is a diagram illustrating the effect of a riding detection algorithm according to an embodiment of the present invention; fig. 5 is a diagram illustrating a partial helmet detection effect according to an embodiment of the present invention.
Specifically, pedestrians are automatically filtered out of the head target images detected in step 3 by a riding detection algorithm, which filters out all pedestrian targets whose Intersection over Union (IoU) with the category target frame of the rider image is smaller than an adaptive threshold.
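For axis-aligned target frames, the IoU used in this filtering step can be sketched as follows (a standard computation with boxes given as (x1, y1, x2, y2) corners; this is an illustration, not code from the patent):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes  -> 1.0
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes   -> 0.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # half overlap     -> 1/3
```

A pedestrian's head frame overlaps no rider frame, so its IoU with every rider box stays near zero and falls below the adaptive threshold.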
Further, step 4 comprises:
and 4.1, initializing a rider image set with a helmet and a rider image set without the helmet.
Specifically, two empty sets are initialized, one set H of images of a rider wearing a helmet and one set U of images of a rider without a helmet.
And 4.2, detecting a test set with the YOLOv5s-Light target detection framework to obtain a second head target image and/or a second helmet-free rider image.
Specifically, the test set is input into the YOLOv5s-Light target detection framework, and detection yields a second head target image and/or a second helmet-free rider image; both are image sets. The images of the test set are used with the YOLOv5s-Light target detection framework to obtain the optimal IoU threshold.
Step 4.3, the optimal IoU threshold is obtained using the second head target image and the second helmet-free rider image.
Further, step 4.3 comprises:
Step 4.3.1, initializing an empty threshold set and an IoU threshold, wherein the initial value of the IoU threshold is thres;
specifically, an empty set T is initialized as the threshold set, and the IoU threshold is initialized as a variable thres, with a preferred initial value of 1.
Step 4.3.2, detecting the test set with the YOLOv5s-Light target detection framework to obtain a second rider image;
specifically, the test set is input into a YOLOv5s-Light target detection framework for detection, and a plurality of second rider images are obtained.
Step 4.3.3, calculating the IoU values between the second head target images and the second rider images to obtain a second IoU value set;
specifically, the IoU values are calculated by the intersection-over-union formula, where A is the target frame of a second head target image and B is the target frame of a second rider image:
IoU(A, B) = |A ∩ B| / |A ∪ B|
Step 4.3.4, comparing each IoU value in the second IoU value set with thres: if the IoU value is greater than or equal to thres, adding the corresponding second head target image to the threshold set; if the IoU value is less than thres, ignoring that second head target image; continuing until every IoU value in the second IoU value set has been compared with thres;
Step 4.3.5, calculating the difference between the number of second head target images in the threshold set and the number of second helmet-free rider images, and adding the difference to a target difference set;
Step 4.3.6, after the difference has been added to the target difference set, decreasing thres by 0.01 and emptying the threshold set;
Step 4.3.7, repeating steps 4.3.4 to 4.3.6 until thres reaches 0; the value of thres corresponding to the smallest difference in the target difference set is the optimal IoU threshold.
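Steps 4.3.1 to 4.3.7 amount to sweeping thres from 1 down to 0 in steps of 0.01 and keeping the value whose retained head count best matches the helmet-free rider count. A pure-Python sketch, under the assumption that the head/rider IoU values have already been computed (the function name and inputs are illustrative):

```python
def optimal_iou_threshold(head_rider_ious, n_helmet_free_riders, step=0.01):
    """Sweep thres from 1.0 down to 0.0 in `step` decrements.

    head_rider_ious: IoU value of each detected head with its rider frame
                     (the "second IoU value set").
    n_helmet_free_riders: number of second helmet-free rider images.
    Returns the thres minimizing |#{IoU >= thres} - n_helmet_free_riders|.
    """
    best_thres, best_diff = 1.0, float("inf")
    thres = 1.0
    while thres >= 0.0:
        kept = sum(1 for v in head_rider_ious if v >= thres)  # threshold set size
        diff = abs(kept - n_helmet_free_riders)               # target difference
        if diff < best_diff:
            best_thres, best_diff = thres, diff
        thres = round(thres - step, 2)  # step 4.3.6: decrease thres by 0.01
    return best_thres

# Toy example (assumed values): two helmet-free riders, four head detections.
print(optimal_iou_threshold([0.9, 0.8, 0.3, 0.1], 2))
```

In the toy example the sweep first reaches a zero difference at thres = 0.8, where exactly two heads overlap their rider frames strongly enough, matching the two helmet-free riders.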
And 4.4, detecting the target detection set by using a YOLOv5s-Light target detection framework to obtain a third head target image, a third helmet-wearing head image and a third rider image.
Specifically, inputting the target detection set into the YOLOv5s-Light target detection framework yields a third head target image, a third helmet-worn head image and a third rider image, all of which are image sets.
And 4.5, placing the third helmet-worn head image into the helmet-worn rider image set.
Specifically, the acquired third helmet-worn head image is placed into the helmet-worn rider image set to filter the detected images.
Step 4.6, a third helmet-free rider image is obtained using the third head target image, the third rider image and the optimal IoU threshold value.
Specifically, the IoU values between the third head target images and the third rider images are calculated; if an IoU value is greater than the optimal IoU threshold, the corresponding rider image is taken as a third helmet-free rider image.
And 4.7, placing the third helmet-free rider image into the helmet-free rider image set; the helmet-worn rider image set and the helmet-free rider image set together constitute the target detection result.
Specifically, the acquired third helmet-worn head image is placed in the helmet-worn rider image set, and the third helmet-free rider image is placed in the helmet-free rider image set, completing the detection and classification of the target images.
In summary, in this embodiment the YOLOv5s-Light target detection training framework performs target detection on the image set to be detected to obtain the target frames of the head target images, the helmet-worn head images and the rider images; the number of network parameters is greatly reduced, by a factor of 17.9 compared with that before the improvement, realizing the construction of a lightweight model. The riding detection algorithm then filters the target frames of the head target images, the helmet-worn head images and the rider images, so that motorcycle and electric vehicle riders are effectively detected while other targets such as pedestrians are filtered out.
Example two
Referring to fig. 6, fig. 6 is a schematic structural diagram of a helmet detection device based on the improved YOLOv5s according to an embodiment of the present invention.
The embodiment discloses a helmet detection device based on improved YOLOv5s, including:
the model building module 1 is used for building a YOLOv5s-Light target detection framework;
the model training module 2 is used for training the YOLOv5s-Light target detection framework by using a target image training set to obtain a YOLOv5s-Light target detection training framework;
the information processing module 3 is used for inputting the image set to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set;
and the riding detection module 4 is used for detecting the target detection set by utilizing a riding detection algorithm to obtain a target detection result.
In an embodiment of the present invention, the model building module 1 constructs the YOLOv5s-Light target detection framework through the lightweight network idea, activation function replacement, the channel attention mechanism and network channel number compression.
In an embodiment of the invention, the model training module 2 is configured to train the YOLOv5s-Light target detection framework with the target image training set to obtain the YOLOv5s-Light target detection training framework.
In an embodiment of the present invention, the information processing module 3 is configured to input the set of images to be detected into a YOLOv5s-Light target detection training framework to obtain a target detection set, where the target detection set includes a head target image, a head image wearing a helmet, and a rider image.
In an embodiment of the present invention, the riding detection module 4 is configured to detect the target detection set with the riding detection algorithm to obtain the target detection result, where the riding detection algorithm filters out all pedestrian targets whose Intersection over Union (IoU) with the category target frame of the rider image is smaller than an adaptive threshold.
EXAMPLE III
Referring to fig. 7, fig. 7 is a schematic structural diagram of a helmet detection electronic device based on the improved YOLOv5s according to an embodiment of the present invention.
The embodiment discloses a helmet detection electronic device based on improved YOLOv5s, including: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the embodiments when executing the computer program.
The helmet detection electronic device based on the improved YOLOv5s provided by the embodiment of the present invention can implement the above method embodiments, and the implementation principle and technical effect are similar, and are not described herein again.
Example four
Yet another embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing a YOLOv5s-Light target detection framework;
training the YOLOv5s-Light target detection framework by using a target image training set to obtain a YOLOv5s-Light target detection training framework;
inputting an image set to be detected into the YOLOv5s-Light target detection training framework to obtain a target detection set;
and detecting the target detection set by using a riding detection algorithm to obtain a target detection result.
The computer-readable storage medium provided by the embodiment of the present invention may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments, and the invention is not to be construed as limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all of these shall be considered as falling within the protection scope of the invention.