
CN116912796A - Novel dynamic cascade YOLOv8-based automatic driving target identification method and device - Google Patents

Novel dynamic cascade YOLOv8-based automatic driving target identification method and device

Info

Publication number
CN116912796A
Authority
CN
China
Prior art keywords
automatic driving
target recognition
yolov8
driving target
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899627.4A
Other languages
Chinese (zh)
Inventor
洪远
姜明新
杜强
黄俊闻
项靖
王杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology
Priority to CN202310899627.4A
Publication of CN116912796A
Legal status: Pending

Classifications

    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/07 Target detection
    • Y02T10/40 Engine management systems

Abstract

The invention discloses an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8. Pre-acquired original images of traffic vehicles are preprocessed and divided into a training set and a test set; an automatic driving target recognition network based on the novel dynamic cascade YOLOv8 is constructed, in which the Backbone of the YOLOv8 network is replaced as a whole by a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced by a ShareSepHead detection head that shares convolution weights across scales; an improved PolyLoss is adopted as the loss function of the automatic driving target recognition network; the network is trained on the training set; the test set is then fed into the trained network to evaluate it. The invention improves the accuracy and speed of target recognition in automatic driving and thereby helps safeguard the safety of automatic driving.

Description

An automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8

Technical field

The invention belongs to the field of application of deep learning to computer vision, and specifically relates to an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8.

Background art

Target detection, one of the core problems of computer vision, aims to find the category and location of specific targets in an image. It is now widely used in many fields, such as autonomous driving, remote sensing imagery, video surveillance and medical inspection.

YOLO has been updated continuously since 2016 and has now reached v8. In 2016 the single-stage (one-stage) target detection approach, represented by YOLOv1, appeared. Looking back over the development of single-stage target detection, from the first single-stage detector YOLOv1 to YOLOv8 in 2023, the YOLO series has evolved alongside single-stage detection and has become the typical representative of one-stage methods.

Although YOLOv8 can detect targets quickly in simple images, complex scenes require more time; for example, in real traffic congestion with large numbers of vehicles and pedestrians, detection slows down. Real-time performance is critical for decision-making in autonomous driving, so processing speed still needs to be improved. Accuracy also needs to improve: autonomous driving requires highly accurate detection results to ensure correct responses to various traffic conditions. Although YOLOv8 may perform well in some scenarios, its detection accuracy is still insufficient in complex traffic situations. The backbone of the existing YOLOv8 is fast on simple images but needs more time on complex images with many targets, and the existing YOLOv8 detection head contains many parameters, resulting in high computational complexity. In autonomous driving systems, computing resources are limited, so a more efficient model design is needed to guarantee target detection in embedded or resource-constrained environments.

Summary of the invention

Purpose of the invention: the present invention proposes an automatic driving target recognition method and device based on a novel dynamic cascade YOLOv8 that can perform accurate target detection in automatic driving.

Technical solution: the present invention proposes an automatic driving target recognition method based on a novel dynamic cascade YOLOv8, which comprises the following steps:

(1) Preprocess pre-acquired original images of traffic vehicles and divide them into a training set and a test set;

(2) Construct an automatic driving target recognition network based on the novel dynamic cascade YOLOv8, in which the Backbone of the YOLOv8 network is replaced as a whole by a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced by a ShareSepHead detection head that shares convolution weights across scales;

(3) Adopt the improved PolyLoss as the loss function of the automatic driving target recognition network;

(4) Train the automatic driving target recognition network on the training set;

(5) Feed the test set into the trained automatic driving target recognition network and evaluate the network.

Further, the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select the best route for each image to be detected. The image to be detected first passes through the first backbone network, which extracts first-level multi-scale features; these features are sent to the dynamic router, which judges the difficulty of the image by mapping the features to a difficulty score through two linear mapping layers. If the image is judged "simple", the first-level multi-scale features are sent to the head part of YOLOv8; if it is judged "difficult", the image to be detected and its first-level multi-scale features are sent to the second backbone network, which extracts second-level multi-scale features that are then sent to the head part of YOLOv8.

Further, the novel dynamic cascade backbone network in step (2) is implemented as follows:

For an input image x, the first backbone B1 first extracts its multi-scale features F1:

F1 = B1(x) = {F1(l)}, l = 1, …, L

where L is the number of stages, i.e. the number of multi-scale features. The router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:

φ = R(F1)

If the router classifies the input image as a "simple" image, the neck-head D1 that follows immediately outputs the detection result y:

y = D1(F1)

If the router classifies the input image as a "complex" image, the multi-scale features need to be further enhanced by the second backbone; a composite connection module G embeds the multi-scale features F1 into H:

H = G(F1)

where G is the DHLC connection of CBNet. The input image x is fed into the second backbone B2, whose features are enhanced by element-wise summation with the corresponding embedded features H at each stage:

F2 = B2(x, H)

The detection result is then decoded by the second neck-head D2:

y = D2(F2)

Further, the ShareSepHead detection head in step (2) shares convolution weights between different layers while computing the BN (BatchNorm) statistics independently; the ShareSepHead comprises, connected in sequence, a first convolution layer, a first depthwise separable convolution layer, a second depthwise separable convolution layer, a second convolution layer and a BN normalization layer.

Further, the first convolution layer is a 3x3 convolution layer that changes the number of channels of the input feature map from x to c2*2; the first depthwise separable convolution layer first applies a convolution to each input channel separately and then combines the features across channels; the second depthwise separable convolution layer reduces the number of channels of the input feature map from c2*2 to c2; the second convolution layer is a 1*1 convolution layer that changes the number of channels of the input feature map from c2 to 4*self.reg_max; each detection head passes through BN normalization, which improves gradient propagation and training speed by normalizing the data of each mini-batch.

Further, the improved PolyLoss in step (3) comprises a combined loss function and a weighted binary cross-entropy loss. PolyLoss combines the binary cross-entropy loss with Focal Loss, adjusting the weight and shape of the loss function to improve the handling of hard samples and the balance between positive and negative samples; the weighted binary cross-entropy loss between the prediction and the ground-truth label measures how well the prediction matches the label; an alpha_factor is introduced to weight the loss so that the losses of positive and negative samples are adjusted to different degrees; and a polynomial adjustment factor is introduced to account for the uncertainty of the sample probability prediction.

Further, a "simple" image is a single-target image, and a "difficult" image is an image containing two or more kinds of targets.

Based on the same inventive concept, the present invention proposes an apparatus comprising a memory and a processor, wherein:

the memory is configured to store a computer program capable of running on the processor;

the processor is configured to execute, when running the computer program, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 as described above.

Based on the same inventive concept, the present invention proposes a storage medium on which a computer program is stored; when the computer program is executed by at least one processor, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 as described above are implemented.

Beneficial effects: compared with the prior art, the present invention has the following beneficial effects. The automatic driving target recognition network based on the novel dynamic cascade YOLOv8 enables the YOLOv8 backbone to adaptively select an inference route for input images of different difficulty, improving the efficiency of feature extraction. To improve YOLOv8 detection accuracy, a newly improved PolyLoss loss function is used, simplifying the hyperparameter search space and adjusting the polynomial coefficients. To upgrade the YOLOv8 detection head so that it uses fewer parameters, runs more efficiently and achieves higher accuracy, a novel shared detection head is used, enhancing model capacity for higher performance. As a result, target detection for automatic driving becomes more accurate.

Description of the drawings

Figure 1 is a schematic diagram of the structure of the dynamic cascade backbone network;

Figure 2 is a schematic diagram of the structure of the detection head with shared convolution weights and separate batch normalization layers.

Detailed description of the embodiments

The present invention is further described in detail below in conjunction with the accompanying drawings.

The present invention proposes an automatic driving target recognition method based on a novel dynamic cascade YOLOv8, which comprises the following steps.

Step S1: the KITTI dataset is selected; it is already divided into a training set and a test set. Performance is evaluated on this autonomous driving dataset.

Step S2: on the basis of the YOLOv8 network, the Backbone is replaced as a whole by a novel dynamic cascade (Dynamic Cascade) backbone network.

As shown in Figure 1, the novel dynamic cascade (Dynamic Cascade) backbone has two cascaded backbone networks, with a dynamic router inserted between them to automatically select the best route for each image to be detected.

Adaptive router: to better judge the difficulty of an image, the router makes its judgment based on the input multi-scale feature information. Assuming the first backbone has output multi-scale features, the router first compresses them to reduce its computational complexity, obtaining compressed features via a global pooling operation and a channel-dimension concatenation operation. The features are then mapped to a difficulty score through two linear mapping layers.
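As an illustration of this design, a minimal PyTorch sketch of such an adaptive router follows. The class name DifficultyRouter, the hidden width and the use of AdaptiveAvgPool2d are assumptions of the sketch; the patent fixes only the global pooling, the channel-dimension concatenation and the two linear mapping layers.

import torch
import torch.nn as nn

class DifficultyRouter(nn.Module):
    """Maps multi-scale backbone features to a difficulty score in (0, 1)."""

    def __init__(self, channels_per_stage, hidden_dim=256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # global pooling of each stage
        total_c = sum(channels_per_stage)          # size after channel-dim concat
        self.fc1 = nn.Linear(total_c, hidden_dim)  # first linear mapping layer
        self.fc2 = nn.Linear(hidden_dim, 1)        # second linear mapping layer

    def forward(self, multi_scale_feats):
        # Compress each stage's feature map to a channel vector, then concatenate.
        pooled = [self.pool(f).flatten(1) for f in multi_scale_feats]
        x = torch.cat(pooled, dim=1)
        # Two linear layers map the compressed features to a difficulty score.
        phi = torch.sigmoid(self.fc2(torch.relu(self.fc1(x))))
        return phi.squeeze(1)  # shape (batch,), values in (0, 1)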

Two cascaded networks: for an input image x, the first backbone B1 first extracts its multi-scale features F1:

F1 = B1(x) = {F1(l)}, l = 1, …, L

where L is the number of stages, i.e. the number of multi-scale features. The router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:

φ = R(F1)

"Simple" images exit at the first backbone, while "complex" images require further processing. A "simple" image contains a single target, such as a single pedestrian or a single vehicle; a "complex" image contains two or more targets of multiple classes. Specifically, if the router classifies the input image as a "simple" image, the neck-head D1 that follows immediately outputs the detection result y:

y = D1(F1)

Conversely, if the router classifies the input image as a "complex" image, the multi-scale features need to be further enhanced by the second backbone instead of being decoded immediately by the neck-head D1. In particular, a composite connection module G embeds the multi-scale features F1 into H:

H = G(F1)

where G is the DHLC connection of CBNet. The input image x is then fed into the second backbone B2, whose features are enhanced by element-wise summation with the corresponding embedded features H at each stage:

F2 = B2(x, H)

The detection result is then decoded by the second neck-head D2:

y = D2(F2)

Through the above process, a "simple" image passes through only one backbone, while a "complex" image passes through two. Clearly, such an architecture allows a trade-off between computation (i.e. speed) and accuracy.
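A minimal sketch of this routing logic is given below. It assumes the DifficultyRouter sketched above, the module names backbone1/backbone2, a DHLC-style composite_connection and a 0.5 threshold, none of which are fixed by the patent; for simplicity it also routes a whole batch together, whereas per-image routing would split the batch by score.

import torch.nn as nn

class DynamicCascadeBackbone(nn.Module):
    """Routes each input through one or two backbones based on predicted difficulty."""

    def __init__(self, backbone1, backbone2, composite_connection, router,
                 neck_head1, neck_head2, threshold=0.5):
        super().__init__()
        self.b1, self.b2 = backbone1, backbone2
        self.g = composite_connection   # DHLC-style connection module G
        self.router = router            # e.g. the DifficultyRouter sketched above
        self.d1, self.d2 = neck_head1, neck_head2
        self.threshold = threshold

    def forward(self, x):
        f1 = self.b1(x)                 # F1 = B1(x): first-level multi-scale features
        phi = self.router(f1)           # difficulty score in (0, 1)
        if phi.mean() < self.threshold:
            return self.d1(f1)          # "simple": decode F1 directly
        h = self.g(f1)                  # H = G(F1): embed F1 for the second backbone
        f2 = self.b2(x, h)              # F2 = B2(x, H): stage-wise enhanced features
        return self.d2(f2)              # "complex": decode with the second neck-head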

Step S3: on the basis of the YOLOv8 network, the default CIoU loss function is modified to the new PolyLoss classification loss function, improving detection accuracy.

This loss function combines the ideas of binary cross-entropy loss (BCEWithLogitsLoss) and Focal Loss (FL) and is used for target classification in the target detection task. It contains the following parts.

Combined loss function: PolyLoss combines binary cross-entropy loss and Focal Loss, adjusting the weight and shape of the loss function to improve the model's handling of hard samples and of the balance between positive and negative samples.

Weighted binary cross-entropy loss: PolyLoss first uses nn.BCEWithLogitsLoss to compute the binary cross-entropy loss between the predictions and the ground-truth labels. This part of the loss measures how well the predictions match the labels.

Focal Loss adjustment: to handle hard samples, PolyLoss borrows the idea of Focal Loss. By modulating with the predicted probability, samples with low predicted probability contribute more to the loss, increasing the attention paid to hard samples.

Loss weight adjustment: PolyLoss weights the loss by introducing an alpha_factor. This factor is determined by the value of the ground-truth label, so that the losses of positive and negative samples are adjusted to different degrees.

Polynomial adjustment: in the last step, PolyLoss introduces a polynomial adjustment factor that accounts for the uncertainty of the sample probability prediction. By adjusting the shape and coefficients of the polynomial, the loss can be increased when the sample probability is low or high, further strengthening the focus on hard samples.

In the target detection task, the PolyLoss loss function thus combines the ideas of binary cross-entropy loss and Focal Loss and, through polynomial and weight adjustment, provides a loss computation that handles hard samples and balances positive and negative samples. This helps the model learn challenging target classification tasks better.
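As a hedged illustration of how these parts might fit together, the following PyTorch sketch implements a Poly-1-style focal loss. The hyperparameters gamma, alpha and epsilon and the exact placement of the polynomial term are assumptions of the sketch; the patent names the ingredients but not their precise combination.

import torch
import torch.nn as nn

class PolyFocalLoss(nn.Module):
    """BCE + focal modulation + alpha weighting + Poly-1 polynomial term (sketch)."""

    def __init__(self, gamma=2.0, alpha=0.25, epsilon=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss(reduction="none")
        self.gamma, self.alpha, self.epsilon = gamma, alpha, epsilon

    def forward(self, logits, targets):
        bce = self.bce(logits, targets)                  # binary cross-entropy part
        p = torch.sigmoid(logits)
        p_t = targets * p + (1.0 - targets) * (1.0 - p)  # probability of the true class
        focal = (1.0 - p_t) ** self.gamma                # Focal Loss modulation
        alpha_factor = targets * self.alpha + (1.0 - targets) * (1.0 - self.alpha)
        # Poly-1 term: a first-order polynomial in (1 - p_t) added to the focal loss.
        poly = self.epsilon * (1.0 - p_t) ** (self.gamma + 1)
        return (alpha_factor * (focal * bce + poly)).mean()

In a YOLOv8-style trainer such a loss would presumably replace only the classification term, leaving the box-regression losses unchanged.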

Step S4: on the basis of the YOLOv8 network, the detection head in the last part of the network is replaced by the novel ShareSepHead detection head that shares convolution weights across scales, forming the automatic driving target recognition network based on the novel dynamic cascade YOLOv8.

The original YOLOv8 detection head is the last layer of the network and is responsible for producing the prediction results of target detection. Depending on the size of the input image and the design of the network, it maps feature maps to grids of different scales, and each grid cell is responsible for detecting and locating one or more targets. At each scale the detection head outputs a set of predicted boxes, each consisting of several attributes, typically the bounding-box coordinates (center, width and height), class probabilities and a confidence score for the presence of a target. These predicted boxes are post-processed by non-maximum suppression (NMS) to filter overlapping boxes and keep the most accurate detections. The detection head usually combines convolution layers and fully connected layers, adapting to targets of different scales through different kernel sizes and strides, and its output passes through appropriate activation functions and normalization operations to keep the predictions within a suitable range and provide good interpretability and robustness.

As shown in Figure 2, the novel ShareSepHead detection head shares convolution weights across scales but computes the BN (BatchNorm) statistics independently. This is a shared detection head: real-time detectors usually use a separate head for each feature scale to strengthen model capacity and performance, rather than sharing one head across scales. Here the detection-head parameters are shared across scales, but different batch normalization (BN) layers are used per scale, reducing head parameters while maintaining accuracy. BN is also more efficient than other normalization layers, because at inference time it directly uses the statistics computed during training.

After passing through the YOLOv8 head (neck), the features enter the ShareSepHead detection-head part to predict the results. Each head comprises, connected in sequence, a first convolution layer, a first depthwise separable convolution layer, a second depthwise separable convolution layer, a second convolution layer and a BN normalization layer.

The first layer, Conv3*3, is a 3x3 convolution layer that changes the number of channels of the input feature map from x to c2*2. It helps extract features and increases the channel count so that target information is better captured.

The second layer, a DWConv3*3 depthwise separable convolution, first applies a convolution to each input channel separately and then combines the features across channels. This helps reduce computation and improves the efficiency of the model.

The third part, another DWConv3*3 depthwise separable convolution layer, reduces the number of channels of the input feature map from c2*2 to c2. Similar to the previous step, this layer continues to reduce the channel count and extracts higher-level features.

The fourth part, a Conv1*1 convolution layer, changes the number of channels of the input feature map from c2 to 4*self.reg_max and is responsible for predicting the coordinate information of the bounding box.

The detection-head parameter information is shared between the heads.

Each detection head passes through BN normalization, which improves gradient propagation and training speed. By normalizing the data of each mini-batch, BN keeps the activations in the network within a relatively small range, alleviating gradient vanishing and gradient explosion, promoting gradient propagation and accelerating the training of the network.
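A hedged PyTorch sketch of such a head is given below. The class name ShareSepHead, the decomposition of the depthwise separable layers into grouped plus pointwise convolutions, and the assumption that the neck outputs the same channel count at every scale are choices of the sketch; only the layer sequence and the shared-weights/per-scale-BN idea come from the description above.

import torch.nn as nn

class ShareSepHead(nn.Module):
    """Box branch whose convolutions are shared across scales, with per-scale BN."""

    def __init__(self, in_channels, c2, reg_max, num_scales=3):
        super().__init__()
        # Shared layers: one set of weights, reused for every feature scale.
        self.conv1 = nn.Conv2d(in_channels, c2 * 2, 3, padding=1)    # x -> c2*2
        self.dw1 = nn.Sequential(                                    # depthwise separable
            nn.Conv2d(c2 * 2, c2 * 2, 3, padding=1, groups=c2 * 2),  # per-channel conv
            nn.Conv2d(c2 * 2, c2 * 2, 1),                            # pointwise combine
        )
        self.dw2 = nn.Sequential(                                    # c2*2 -> c2
            nn.Conv2d(c2 * 2, c2 * 2, 3, padding=1, groups=c2 * 2),
            nn.Conv2d(c2 * 2, c2, 1),
        )
        self.conv2 = nn.Conv2d(c2, 4 * reg_max, 1)                   # c2 -> 4*reg_max
        # Separate BN per scale: shared weights, independent statistics.
        self.bns = nn.ModuleList(nn.BatchNorm2d(4 * reg_max) for _ in range(num_scales))

    def forward(self, feats):
        # feats: one feature map per scale; same convolutions everywhere, per-scale BN.
        return [self.bns[i](self.conv2(self.dw2(self.dw1(self.conv1(f)))))
                for i, f in enumerate(feats)]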

Step S5: train the automatic driving target recognition network based on the novel dynamic cascade YOLOv8 constructed in step S4 on the divided dataset, then evaluate the performance of the trained network, finally achieving target recognition in automatic driving.

Based on the same inventive concept, the present invention proposes an apparatus comprising a memory and a processor, wherein the memory is configured to store a computer program capable of running on the processor, and the processor is configured to execute, when running the computer program, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 as described above.

Based on the same inventive concept, the present invention also proposes a storage medium on which a computer program is stored; when the computer program is executed by at least one processor, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 as described above are implemented.

The technical solution of the present invention has thus been described with reference to the specific embodiments shown in the drawings, but the protection scope of the present invention is not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the present invention.

Claims (9)

1. An automatic driving target recognition method based on a novel dynamic cascade YOLOv8, characterized by comprising the following steps:
(1) preprocessing pre-acquired original images of traffic vehicles and dividing them into a training set and a test set;
(2) constructing an automatic driving target recognition network based on the novel dynamic cascade YOLOv8, wherein the Backbone of the YOLOv8 network is replaced as a whole by a novel dynamic cascade backbone network, and the detection head in the last part of the YOLOv8 network is replaced by a ShareSepHead detection head that shares convolution weights across scales;
(3) adopting the improved PolyLoss as the loss function of the automatic driving target recognition network;
(4) training the automatic driving target recognition network on the training set;
(5) feeding the test set into the trained automatic driving target recognition network and evaluating the network.

2. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 1, characterized in that the novel dynamic cascade backbone network in step (2) has two cascaded backbone networks, with a dynamic router inserted between them to automatically select the best route for each image to be detected; the image to be detected passes through the first backbone network, which extracts first-level multi-scale features, and these features are sent to the dynamic router, which judges the difficulty of the image by mapping the features to a difficulty score through two linear mapping layers; if the image is judged "simple", the first-level multi-scale features are sent to the head part of YOLOv8; if it is judged "difficult", the image to be detected and its first-level multi-scale features are sent to the second backbone network, which extracts second-level multi-scale features that are sent to the head part of YOLOv8.

3. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 1, characterized in that the novel dynamic cascade backbone network in step (2) is implemented as follows:
for an input image x, the first backbone B1 first extracts its multi-scale features F1:
F1 = B1(x) = {F1(l)}, l = 1, …, L
where L is the number of stages, i.e. the number of multi-scale features; the router R then uses these multi-scale features F1 to predict a difficulty score φ ∈ (0, 1) for the image:
φ = R(F1)
if the router classifies the input image as a "simple" image, the neck-head D1 that follows immediately outputs the detection result y:
y = D1(F1)
if the router classifies the input image as a "complex" image, the multi-scale features need to be further enhanced by the second backbone, and a composite connection module G embeds the multi-scale features F1 into H:
H = G(F1)
where G is the DHLC connection of CBNet; the input image x is fed into the second backbone B2, whose features are enhanced by element-wise summation with the corresponding embedded features H at each stage:
F2 = B2(x, H)
and the detection result is decoded by the second neck-head D2:
y = D2(F2).

4. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 1, characterized in that the ShareSepHead detection head in step (2) shares convolution weights between different layers while computing the BN statistics independently; the ShareSepHead comprises, connected in sequence, a first convolution layer, a first depthwise separable convolution layer, a second depthwise separable convolution layer, a second convolution layer and a BN normalization layer.

5. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 4, characterized in that the first convolution layer is a 3x3 convolution layer that changes the number of channels of the input feature map from x to c2*2; the first depthwise separable convolution layer first applies a convolution to each input channel separately and then combines the features across channels; the second depthwise separable convolution layer reduces the number of channels of the input feature map from c2*2 to c2; the second convolution layer is a 1*1 convolution layer that changes the number of channels of the input feature map from c2 to 4*self.reg_max; each detection head passes through BN normalization, which improves gradient propagation and training speed by normalizing the data of each mini-batch.

6. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 1, characterized in that the improved PolyLoss in step (3) comprises a combined loss function and a weighted binary cross-entropy loss; PolyLoss combines the binary cross-entropy loss with Focal Loss, adjusting the weight and shape of the loss function to improve the handling of hard samples and the balance between positive and negative samples; the weighted binary cross-entropy loss between the prediction and the ground-truth label measures how well the prediction matches the label; an alpha_factor is introduced to weight the loss so that the losses of positive and negative samples are adjusted to different degrees; and a polynomial adjustment factor is introduced to account for the uncertainty of the sample probability prediction.

7. The automatic driving target recognition method based on a novel dynamic cascade YOLOv8 according to claim 2, characterized in that the "simple" image is a single-target image and the "difficult" image is an image containing two or more kinds of targets.

8. An apparatus, characterized by comprising a memory and a processor, wherein: the memory is configured to store a computer program capable of running on the processor; and the processor is configured to execute, when running the computer program, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to any one of claims 1-7.

9. A storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is executed by at least one processor, the steps of the automatic driving target recognition method based on the novel dynamic cascade YOLOv8 according to any one of claims 1-7 are implemented.
CN202310899627.4A 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device Pending CN116912796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310899627.4A CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310899627.4A CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device

Publications (1)

Publication Number Publication Date
CN116912796A true CN116912796A (en) 2023-10-20

Family

ID=88350767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310899627.4A Pending CN116912796A (en) 2023-07-21 2023-07-21 Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device

Country Status (1)

Country Link
CN (1) CN116912796A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252904A (en) * 2023-11-15 2023-12-19 南昌工程学院 Target tracking method and system based on long-range space perception and channel enhancement
CN117252904B (en) * 2023-11-15 2024-02-09 南昌工程学院 Target tracking method and system based on long-range spatial perception and channel enhancement
CN117496362A (en) * 2024-01-02 2024-02-02 环天智慧科技股份有限公司 Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head
CN117496362B (en) * 2024-01-02 2024-03-29 环天智慧科技股份有限公司 Land coverage change detection method based on self-adaptive convolution kernel and cascade detection head
CN118865445A (en) * 2024-07-18 2024-10-29 广东海洋大学 A method, device, system and storage medium for identifying abnormal behavior of fish
CN118628890A (en) * 2024-08-02 2024-09-10 中山大学 An autonomous driving planning method based on reasoning decision consistency
CN118628890B (en) * 2024-08-02 2024-11-12 中山大学 An autonomous driving planning method based on reasoning decision consistency
CN119323681A (en) * 2024-12-16 2025-01-17 合肥合知芯微电子有限公司 Target detection tracking method and system based on automatic detection

Similar Documents

Publication Publication Date Title
CN111723748B (en) Infrared remote sensing image ship detection method
CN112396002B (en) SE-YOLOv 3-based lightweight remote sensing target detection method
CN110472627B (en) An end-to-end SAR image recognition method, device and storage medium
CN116912796A (en) Novel dynamic cascade YOLOv 8-based automatic driving target identification method and device
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
CN107609525A (en) Remote Sensing Target detection method based on Pruning strategy structure convolutional neural networks
WO2016037300A1 (en) Method and system for multi-class object detection
CN111814889B (en) A single-stage object detection method using anchor-free module and enhanced classifier
CN110781744A (en) A small-scale pedestrian detection method based on multi-level feature fusion
Fan et al. A novel sonar target detection and classification algorithm
CN111753682A (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN116721398A (en) A Yolov5 target detection method based on cross-stage routing attention module and residual information fusion module
CN109902576B (en) A training method and application of a head and shoulders image classifier
CN113159215A (en) Small target detection and identification method based on fast Rcnn
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN113128564A (en) Typical target detection method and system based on deep learning under complex background
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN114241250A (en) A cascade regression target detection method, device and computer-readable storage medium
CN111815582A (en) A two-dimensional code region detection method with improved background prior and foreground prior
Li et al. An outstanding adaptive multi-feature fusion YOLOv3 algorithm for the small target detection in remote sensing images
CN111291785B (en) Target detection method, device, equipment and storage medium
CN116612450A (en) Point cloud scene-oriented differential knowledge distillation 3D target detection method
CN114155524B (en) Single-stage 3D point cloud target detection method and device, computer equipment, and medium
CN117911697A (en) Hyperspectral target tracking method, system, medium and device based on large model segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination