
CN116129207B - A Method for Image Data Processing with Multi-Scale Channel Attention

A Method for Image Data Processing with Multi-Scale Channel Attention

Info

Publication number
CN116129207B
CN116129207B
Authority
CN
China
Prior art keywords
channels
attention
input data
global
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207A (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202310414590.1A priority Critical patent/CN116129207B/en
Publication of CN116129207A publication Critical patent/CN116129207A/en
Application granted granted Critical
Publication of CN116129207B publication Critical patent/CN116129207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses an image data processing method with multi-scale channel attention. By extracting both the global features and the local features of the input data, the method makes a convolutional neural network attend more closely to the overall information and the local details of the input, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Description

A Method for Image Data Processing with Multi-Scale Channel Attention

Technical Field

The invention relates to the field of computer vision, and in particular to an image data processing method with multi-scale channel attention.

Background

The channel attention mechanism can significantly improve the expressiveness and generalization ability of a model at low computational cost, and is easy to integrate into existing convolutional neural network architectures. Owing to these advantages, it has been widely applied in deep learning tasks such as image classification, object detection, and semantic segmentation.

The essence of the channel attention mechanism is to compute a weighted average of the features of the different channels, yielding richer, more stable, and more reliable feature representations.

Existing channel attention mechanisms include SE, ECA, and CA. These mechanisms attend only to the detail information in some local feature or to the semantic information in the global feature, but not to both at the same time, which leads to insufficiently rich feature representations along the channel dimension.

Summary of the Invention

The purpose of the present invention is to provide an image data processing method with multi-scale channel attention.

The problem to be solved by the present invention is as follows:

To propose an image data processing method with multi-scale channel attention that extracts the global and local features of the input data, so that the convolutional neural network attends more closely to the overall information and the local details of the input, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

The technical scheme adopted by the image data processing method with multi-scale channel attention is as follows:

An image data processing method with multi-scale channel attention comprises the following steps:

S21: Digitize the input data (an original image or a feature map): convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism;

S22.1: Within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function. Through global average pooling of the feature map and an element-wise transformation, the global channel attention adaptively adjusts the weights of the different channels, so that the model can focus on the more important features, improving its classification performance and robustness. Global average pooling is computed as:

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where $y$ denotes the global average pooling result and $X$ is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

The adaptive kernel size is computed as:

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k; in the present invention $\gamma$ and b are set to 2 and 1 respectively;
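For concreteness, a minimal Python sketch of this adaptive kernel-size rule follows; the exact rounding-to-odd behaviour is an assumption, mirroring common ECA-style implementations:

```python
import math

def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Kernel size k = |log2(C)/gamma + b/gamma|_odd for the 1-D convolution.

    gamma=2 and b=1 follow the values given in the text; rounding k to an
    odd integer is assumed to work as in ECA-style implementations.
    """
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

print(adaptive_kernel_size(256))  # log2(256)/2 + 1/2 = 4.5 -> k = 5
```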

The Sigmoid activation function, also known as the S-shaped growth curve, is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;
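Under these definitions, the global branch of step S22.1 admits the following PyTorch sketch. Treating the pooled channel vector as a length-C sequence for the 1-D convolution, and deferring the Sigmoid to the fusion step S22.3, are reading choices, not details fixed by the text:

```python
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Global branch sketch: GAP -> adaptive-size 1-D conv -> per-channel scores."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        k = adaptive_kernel_size(channels, gamma, b)  # from the sketch above
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); global average pooling over the spatial dimensions
        y = x.mean(dim=(2, 3))                   # (B, C)
        y = self.conv(y.unsqueeze(1))            # (B, 1, C): cross-channel interaction
        return y.transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1) channel scores
```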

S22.2: The local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one sixteenth as many channels as the input, and the second convolution restores the number of channels of the position where the module is embedded. Local channel attention helps the model better capture the local information in the input features; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0;
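A corresponding sketch of the local branch, assuming PyTorch and the stated one-sixteenth channel reduction (the hidden width is clamped to at least 1 channel as a guard for small inputs, an added assumption):

```python
class LocalChannelAttention(nn.Module):
    """Local branch sketch: 1x1 conv (shrink to C/16) -> ReLU -> 1x1 conv (expand to C)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)  # shrink to one sixteenth of the input channels
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),   # first conv: C -> C/r
            nn.ReLU(inplace=True),                        # zero out negative activations
            nn.Conv2d(hidden, channels, kernel_size=1),   # second conv: back to C
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)  # (B, C, H, W) per-position channel scores
```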

S22.3: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel.

S22.4: Compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization;

S22.5: Multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features.
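Putting steps S22.3 through S22.5 together, a sketch of the full block might look as follows; fusing the two branches by addition is an assumption, since the text only speaks of a fusion operation:

```python
class MultiScaleChannelAttention(nn.Module):
    """Full-block sketch: fuse global and local scores, Sigmoid, reweight the input."""

    def __init__(self, channels: int, reduction: int = 16, gamma: int = 2, b: int = 1):
        super().__init__()
        self.global_att = GlobalChannelAttention(channels, gamma, b)
        self.local_att = LocalChannelAttention(channels, reduction)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_att(x)        # (B, C, 1, 1), broadcasts over H and W
        l = self.local_att(x)         # (B, C, H, W)
        w = torch.sigmoid(g + l)      # fusion by addition is an assumption; weights in (0, 1)
        return x * w                  # pixel-wise reweighting of the input (S22.5)
```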

Further, the two-dimensional convolutions of the above step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.

Further, steps S22.1 and S22.2 extract the global and local features of the input data through the global and local channel attention mechanisms respectively, and step S22.3 fuses the outputs of the global and local attention, i.e., performs feature fusion on the different features, so that the convolutional neural network attends more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Beneficial effects of the present invention: the low detection accuracy and high missed-detection rate caused by the dense aggregation and severe occlusion of small targets in complex scenes can be further alleviated by the image data processing method with multi-scale channel attention. By extracting the global and local features of the data and fusing the different features, the method makes the convolutional neural network attend more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the image data processing method with multi-scale channel attention of the present invention;

Figure 2 is a schematic diagram of the rectified-linear behaviour of the ReLU function in the present invention;

Figure 3 is a schematic diagram of the data normalization performed by the sigmoid function in the present invention.

Embodiments

The present invention is further described clearly and completely below in conjunction with the accompanying drawings, but the protection scope of the present invention is not limited thereto.

Embodiment

As shown in Figures 1 to 3, an image data processing method with multi-scale channel attention comprises the following steps:

S21: Digitize the input data (an original image or a feature map): convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism, as shown in Figure 1;

S22.1: Within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, as shown in the left column of Figure 1. Through global average pooling of the feature map and an element-wise transformation, the global channel attention adaptively adjusts the weights of the different channels, so that the model can focus on the more important features, improving its classification performance and robustness. Global average pooling is computed as:

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where y denotes the global average pooling result and X is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

The adaptive kernel size is computed as:

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k; in this embodiment $\gamma$ and b are set to 2 and 1 respectively;

The Sigmoid activation function, also known as the S-shaped growth curve and shown in Figure 3, is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;

S22.2: The local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them; the ReLU activation sets the output of some neurons to 0, which reduces the interdependence among parameters and mitigates overfitting. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one sixteenth as many channels as the input, and the second convolution restores the number of channels of the position where the module is embedded. Local channel attention helps the model better capture the local information in the input features, as shown in the right column of Figure 1; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0, as shown in Figure 2;

S22.3: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel.

S22.4: Compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization, as shown in Figure 1;

S22.5: Multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features, as shown in Figure 1.

The two-dimensional convolutions of the above step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.
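As a usage example of the sketches above, the shrink-then-expand shape trace for r = 16 on toy sizes (the concrete tensor sizes are illustrative, not from the text):

```python
x = torch.randn(2, 64, 32, 32)                     # B x C x H x W with C = 64
block = MultiScaleChannelAttention(channels=64, reduction=16)
print(block(x).shape)                              # torch.Size([2, 64, 32, 32])
# Inside the local MLP the channels go 64 -> 4 -> 64, i.e. the feature scale
# shrinks from H x W x C to H x W x C/r and expands back to H x W x C.
```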

Steps S22.1 and S22.2 above extract the global and local features of the input data through the global and local channel attention mechanisms respectively, and step S22.3 fuses the outputs of the global and local attention, i.e., performs feature fusion on the different features, so that the convolutional neural network attends more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

The embodiment disclosed above is a preferred embodiment of the present invention, but the invention is not limited thereto. Those of ordinary skill in the art can readily grasp the spirit of the present invention from the above embodiment and make various extensions and changes; as long as they do not depart from the spirit of the present invention, they fall within the protection scope of the present invention.

Claims (2)

1. An image data processing method with multi-scale channel attention, characterized in that it comprises the following steps:

S21: digitize the input data, i.e., an original image or a feature map: convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism;

S22.1: within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, wherein the global average pooling is computed as

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where y denotes the global average pooling result and X is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

the adaptive kernel size is computed as

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k;

the Sigmoid activation function, also known as the S-shaped growth curve, is computed as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;

S22.2: the local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features; the MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them; the two-dimensional convolutions change only the number of channels of the input data, the first convolution outputting one sixteenth as many channels as the input and the second convolution restoring the number of channels of the position where the module is embedded; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0;

S22.3: fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel;

S22.4: compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization;

S22.5: multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features.

2. The image data processing method with multi-scale channel attention according to claim 1, characterized in that the two-dimensional convolutions of step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.
CN202310414590.1A 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention Active CN116129207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Publications (2)

Publication Number Publication Date
CN116129207A CN116129207A (en) 2023-05-16
CN116129207B true CN116129207B (en) 2023-08-04

Family

ID=86301329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414590.1A Active CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Country Status (1)

Country Link
CN (1) CN116129207B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094343B (en) * 2024-04-23 2024-07-16 安徽大学 Attention mechanism-based LSTM machine residual service life prediction method
CN118397281B (en) * 2024-06-24 2024-09-17 湖南工商大学 Image segmentation model training method, segmentation method and device based on artificial intelligence
CN119252334A (en) * 2024-10-14 2025-01-03 山东合成生物技术有限公司 A screening method and system for synthetic biological probiotics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 A 3D point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 A Chinese character generation method using Chinese character skeleton information to alleviate the problem of network model collapse
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN110853051B (en) * 2019-10-24 2022-06-03 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN113627295A (en) * 2021-07-28 2021-11-09 中汽创智科技有限公司 Image processing method, device, equipment and storage medium
CN114842553B (en) * 2022-04-18 2025-01-03 安庆师范大学 Action Detection Method Based on Residual Shrinkage Structure and Non-local Attention

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 A 3D point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 A Chinese character generation method using Chinese character skeleton information to alleviate the problem of network model collapse
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a voiceprint recognition model based on multi-scale feature joint attention; 章予希; China Master's Theses Full-text Database, Information Science and Technology; pp. I136-362 *

Also Published As

Publication number Publication date
CN116129207A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN116129207B (en) A Method for Image Data Processing with Multi-Scale Channel Attention
CN114119638B (en) Medical image segmentation method integrating multi-scale features and attention mechanisms
CN111768432B (en) Moving object segmentation method and system based on Siamese deep neural network
CN111402129B (en) Binocular stereo matching method based on joint up-sampling convolutional neural network
CN112257794B (en) YOLO-based lightweight target detection method
WO2020056791A1 (en) Method and apparatus for super-resolution reconstruction of multi-scale dilated convolution neural network
CN110136062B (en) A Super-Resolution Reconstruction Method for Joint Semantic Segmentation
CN112149504A (en) A hybrid convolutional residual network combined with attention for action video recognition
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN112131959A (en) 2D human body posture estimation method based on multi-scale feature reinforcement
CN115082675B (en) A transparent object image segmentation method and system
CN114821249B (en) Vehicle weight recognition method based on grouping aggregation attention and local relation
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
CN110866938A (en) Full-automatic video moving object segmentation method
CN109740552A (en) A Target Tracking Method Based on Parallel Feature Pyramid Neural Network
CN116758407A (en) Underwater small target detection method and device based on CenterNet
CN117876842A (en) A method and system for detecting anomalies of industrial products based on generative adversarial networks
CN117912083A (en) Light-weight facial expression recognition method based on linear self-attention
CN115171052B (en) Crowded crowd attitude estimation method based on high-resolution context network
CN116309632A (en) 3D Liver Semantic Segmentation Method Based on Multi-scale Cascade Feature Attention Strategy
CN114926734B (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN110210419A (en) The scene Recognition system and model generating method of high-resolution remote sensing image
CN114519383A (en) Image target detection method and system
CN114492755A (en) Object Detection Model Compression Method Based on Knowledge Distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant