
CN116129207B - A Method for Image Data Processing with Multi-Scale Channel Attention

A Method for Image Data Processing with Multi-Scale Channel Attention

Info

Publication number
CN116129207B
CN116129207B
Authority
CN
China
Prior art keywords
channels
attention
input data
global
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310414590.1A
Other languages
Chinese (zh)
Other versions
CN116129207A (en)
Inventor
刘刚
王冰冰
周杰
王磊
史魁杰
曾辉
张金烁
胡莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Normal University
Original Assignee
Jiangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Normal University filed Critical Jiangxi Normal University
Priority to CN202310414590.1A priority Critical patent/CN116129207B/en
Publication of CN116129207A publication Critical patent/CN116129207A/en
Application granted granted Critical
Publication of CN116129207B publication Critical patent/CN116129207B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Facsimile Image Signal Circuits (AREA)

Abstract

The invention discloses an image data processing method with multi-scale channel attention. By extracting both the global features and the local features of the input data, the method makes a convolutional neural network attend more closely to the overall information and the local details of the input, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Description

A Method for Image Data Processing with Multi-Scale Channel Attention

Technical Field

The invention relates to the field of computer vision, and in particular to an image data processing method with multi-scale channel attention.

Background

The channel attention mechanism can significantly improve the expressiveness and generalization ability of a model at low computational cost, and is easy to integrate into existing convolutional neural network architectures. Owing to these advantages, it has been widely applied in deep learning tasks such as image classification, object detection, and semantic segmentation.

The essence of the channel attention mechanism is to compute a weighted average of the features of the different channels, yielding richer, more stable, and more reliable feature representations.

Existing channel attention mechanisms include SE, ECA, and CA. These mechanisms attend only to the detail information in some local feature or to the semantic information in the global feature, but not to both at the same time, which leads to insufficiently rich feature representations along the channel dimension.

Summary of the Invention

The purpose of the present invention is to provide an image data processing method with multi-scale channel attention.

The problem to be solved by the present invention is as follows:

To propose an image data processing method with multi-scale channel attention that extracts the global and local features of the input data, so that the convolutional neural network attends more closely to the overall information and the local details of the input, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

The technical scheme adopted by the image data processing method with multi-scale channel attention is as follows:

An image data processing method with multi-scale channel attention comprises the following steps:

S21: Digitize the input data (an original image or a feature map): convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism;

S22.1: Within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function. Through global average pooling of the feature map and an element-wise transformation, the global channel attention adaptively adjusts the weights of the different channels, so that the model can focus on the more important features, improving its classification performance and robustness. Global average pooling is computed as:

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where $y$ denotes the global average pooling result and $X$ is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

The adaptive kernel size is computed as:

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k; in the present invention $\gamma$ and b are set to 2 and 1 respectively;
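For concreteness, a minimal Python sketch of this adaptive kernel-size rule follows; the exact rounding-to-odd behaviour is an assumption, mirroring common ECA-style implementations:

```python
import math

def adaptive_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Kernel size k = |log2(C)/gamma + b/gamma|_odd for the 1-D convolution.

    gamma=2 and b=1 follow the values given in the text; rounding k to an
    odd integer is assumed to work as in ECA-style implementations.
    """
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

print(adaptive_kernel_size(256))  # log2(256)/2 + 1/2 = 4.5 -> k = 5
```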

The Sigmoid activation function, also known as the S-shaped growth curve, is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;
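Under these definitions, the global branch of step S22.1 admits the following PyTorch sketch. Treating the pooled channel vector as a length-C sequence for the 1-D convolution, and deferring the Sigmoid to the fusion step S22.3, are reading choices, not details fixed by the text:

```python
import torch
import torch.nn as nn

class GlobalChannelAttention(nn.Module):
    """Global branch sketch: GAP -> adaptive-size 1-D conv -> per-channel scores."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        k = adaptive_kernel_size(channels, gamma, b)  # from the sketch above
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); global average pooling over the spatial dimensions
        y = x.mean(dim=(2, 3))                   # (B, C)
        y = self.conv(y.unsqueeze(1))            # (B, 1, C): cross-channel interaction
        return y.transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1) channel scores
```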

S22.2: The local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one sixteenth as many channels as the input, and the second convolution restores the number of channels of the position where the module is embedded. Local channel attention helps the model better capture the local information in the input features; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0;
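A corresponding sketch of the local branch, assuming PyTorch and the stated one-sixteenth channel reduction (the hidden width is clamped to at least 1 channel as a guard for small inputs, an added assumption):

```python
class LocalChannelAttention(nn.Module):
    """Local branch sketch: 1x1 conv (shrink to C/16) -> ReLU -> 1x1 conv (expand to C)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)  # shrink to one sixteenth of the input channels
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),   # first conv: C -> C/r
            nn.ReLU(inplace=True),                        # zero out negative activations
            nn.Conv2d(hidden, channels, kernel_size=1),   # second conv: back to C
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)  # (B, C, H, W) per-position channel scores
```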

S22.3: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel.

S22.4: Compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization;

S22.5: Multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features.
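Putting steps S22.3 through S22.5 together, a sketch of the full block might look as follows; fusing the two branches by addition is an assumption, since the text only speaks of a fusion operation:

```python
class MultiScaleChannelAttention(nn.Module):
    """Full-block sketch: fuse global and local scores, Sigmoid, reweight the input."""

    def __init__(self, channels: int, reduction: int = 16, gamma: int = 2, b: int = 1):
        super().__init__()
        self.global_att = GlobalChannelAttention(channels, gamma, b)
        self.local_att = LocalChannelAttention(channels, reduction)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_att(x)        # (B, C, 1, 1), broadcasts over H and W
        l = self.local_att(x)         # (B, C, H, W)
        w = torch.sigmoid(g + l)      # fusion by addition is an assumption; weights in (0, 1)
        return x * w                  # pixel-wise reweighting of the input (S22.5)
```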

Further, the two-dimensional convolutions of the above step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.

Further, steps S22.1 and S22.2 extract the global and local features of the input data through the global and local channel attention mechanisms respectively, and step S22.3 fuses the outputs of the global and local attention, i.e., performs feature fusion on the different features, so that the convolutional neural network attends more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Beneficial effects of the present invention: the low detection accuracy and high missed-detection rate caused by the dense aggregation and severe occlusion of small targets in complex scenes can be further alleviated by the image data processing method with multi-scale channel attention. By extracting the global and local features of the data and fusing the different features, the method makes the convolutional neural network attend more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the image data processing method with multi-scale channel attention of the present invention;

Figure 2 is a schematic diagram of the rectified-linear behaviour of the ReLU function in the present invention;

Figure 3 is a schematic diagram of the data normalization performed by the sigmoid function in the present invention.

Embodiments

The present invention is further described clearly and completely below in conjunction with the accompanying drawings, but the protection scope of the present invention is not limited thereto.

Embodiment

As shown in Figures 1 to 3, an image data processing method with multi-scale channel attention comprises the following steps:

S21: Digitize the input data (an original image or a feature map): convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: Perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism, as shown in Figure 1;

S22.1: Within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, as shown in the left column of Figure 1. Through global average pooling of the feature map and an element-wise transformation, the global channel attention adaptively adjusts the weights of the different channels, so that the model can focus on the more important features, improving its classification performance and robustness. Global average pooling is computed as:

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where y denotes the global average pooling result and X is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

The adaptive kernel size is computed as:

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k; in this embodiment $\gamma$ and b are set to 2 and 1 respectively;

The Sigmoid activation function, also known as the S-shaped growth curve and shown in Figure 3, is computed as:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;

S22.2: The local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features. The MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them; the ReLU activation sets the output of some neurons to 0, which reduces the interdependence among parameters and mitigates overfitting. The two-dimensional convolutions change only the number of channels of the input data: the first convolution outputs one sixteenth as many channels as the input, and the second convolution restores the number of channels of the position where the module is embedded. Local channel attention helps the model better capture the local information in the input features, as shown in the right column of Figure 1; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0, as shown in Figure 2;

S22.3: Fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel.

S22.4: Compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization, as shown in Figure 1;

S22.5: Multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features, as shown in Figure 1.

The two-dimensional convolutions of the above step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.
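As a usage example of the sketches above, the shrink-then-expand shape trace for r = 16 on toy sizes (the concrete tensor sizes are illustrative, not from the text):

```python
x = torch.randn(2, 64, 32, 32)                     # B x C x H x W with C = 64
block = MultiScaleChannelAttention(channels=64, reduction=16)
print(block(x).shape)                              # torch.Size([2, 64, 32, 32])
# Inside the local MLP the channels go 64 -> 4 -> 64, i.e. the feature scale
# shrinks from H x W x C to H x W x C/r and expands back to H x W x C.
```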

Steps S22.1 and S22.2 above extract the global and local features of the input data through the global and local channel attention mechanisms respectively, and step S22.3 fuses the outputs of the global and local attention, i.e., performs feature fusion on the different features, so that the convolutional neural network attends more closely to the overall information and the local details of the input data, thereby alleviating the target-aggregation and target-occlusion problems that arise in complex scenes.

The embodiment disclosed above is a preferred embodiment of the present invention, but the invention is not limited thereto. Those of ordinary skill in the art can readily grasp the spirit of the present invention from the above embodiment and make various extensions and changes; as long as they do not depart from the spirit of the present invention, they fall within the protection scope of the present invention.

Claims (2)

1. An image data processing method with multi-scale channel attention, characterized in that it comprises the following steps:

S21: digitize the input data, i.e., an original image or a feature map: convert the extracted features to numerical form, store them as a tensor matrix, and normalize them so that the convolutional neural network converges faster;

S22: perform feature extraction and feature fusion on the input data by combining a global channel attention mechanism with a local channel attention mechanism;

S22.1: within the global channel attention mechanism, apply global average pooling, a one-dimensional convolutional layer whose kernel size is selected adaptively, and a Sigmoid activation function, wherein the global average pooling is computed as

$$y_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} X_{c,i,j}$$

where y denotes the global average pooling result and X is the input image of size W×H×C, with W, H, and C the width, height, and number of channels of the input image, and i and j the pixel positions along the width and height;

the adaptive kernel size is computed as

$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}}$$

where k is the kernel size of the one-dimensional convolution, C is the number of channels, $|\cdot|_{\mathrm{odd}}$ indicates that k may only take odd values, and $\gamma$ and b adjust the ratio between C and k;

the Sigmoid activation function, also known as the S-shaped growth curve, is computed as

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

where x is the input;

S22.2: the local channel attention mechanism uses a multi-layer perceptron (MLP) implemented with two-dimensional convolutions to extract local features; the MLP consists of two two-dimensional convolutions with kernel size 1 and a ReLU activation between them; the two-dimensional convolutions change only the number of channels of the input data, the first convolution outputting one sixteenth as many channels as the input and the second convolution restoring the number of channels of the position where the module is embedded; the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0;

S22.3: fuse the outputs of the global attention and the local attention, activate the fused data with the Sigmoid function to obtain the final attention weights, and then multiply the activated data with the input data pixel by pixel;

S22.4: compress with the Sigmoid function, which maps any input to a value in the interval (0, 1), guaranteeing normalization;

S22.5: multiply the input data with the activated data pixel by pixel, which weights the different positions of the input data so that more attention is paid to the global and local features.

2. The image data processing method with multi-scale channel attention according to claim 1, characterized in that the two-dimensional convolutions of step S22.2 change only the number of channels of the input data, and within the whole MLP the inter-channel attention is estimated by first shrinking and then expanding the channels of the input data, where the shrinkage ratio is r: after shrinking, the feature scale is H×W×C/r; the ReLU activation function is applied; and after expanding, the feature scale is H×W×C.
CN202310414590.1A 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention Active CN116129207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310414590.1A CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Publications (2)

Publication Number Publication Date
CN116129207A CN116129207A (en) 2023-05-16
CN116129207B true CN116129207B (en) 2023-08-04

Family

ID=86301329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414590.1A Active CN116129207B (en) 2023-04-18 2023-04-18 A Method for Image Data Processing with Multi-Scale Channel Attention

Country Status (1)

Country Link
CN (1) CN116129207B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094343B (en) * 2024-04-23 2024-07-16 安徽大学 Attention mechanism-based LSTM machine residual service life prediction method
CN118397281B (en) * 2024-06-24 2024-09-17 湖南工商大学 Image segmentation model training method, segmentation method and device based on artificial intelligence
CN119252334A (en) * 2024-10-14 2025-01-03 山东合成生物技术有限公司 A screening method and system for synthetic biological probiotics

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 A 3D point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 A Chinese character generation method using Chinese character skeleton information to alleviate the problem of network model collapse
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157307B (en) * 2016-06-27 2018-09-11 浙江工商大学 A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN110853051B (en) * 2019-10-24 2022-06-03 北京航空航天大学 Cerebrovascular image segmentation method based on multi-attention dense connection generation countermeasure network
CN113627295A (en) * 2021-07-28 2021-11-09 中汽创智科技有限公司 Image processing method, device, equipment and storage medium
CN114842553B (en) * 2022-04-18 2025-01-03 安庆师范大学 Action Detection Method Based on Residual Shrinkage Structure and Non-local Attention

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489358A (en) * 2020-03-18 2020-08-04 华中科技大学 A 3D point cloud semantic segmentation method based on deep learning
CN112017198A (en) * 2020-10-16 2020-12-01 湖南师范大学 Right ventricle segmentation method and device based on self-attention mechanism multi-scale features
CN112784764A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and global attention mechanism
CN115240201A (en) * 2022-09-21 2022-10-25 江西师范大学 A Chinese character generation method using Chinese character skeleton information to alleviate the problem of network model collapse
CN115761258A (en) * 2022-11-10 2023-03-07 山西大学 Image direction prediction method based on multi-scale fusion and attention mechanism
CN115880225A (en) * 2022-11-10 2023-03-31 北京工业大学 Dynamic illumination human face image quality enhancement method based on multi-scale attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a voiceprint recognition model based on multi-scale feature joint attention; 章予希; China Master's Theses Full-text Database, Information Science and Technology; pp. I136-362 *

Also Published As

Publication number Publication date
CN116129207A (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN116129207B (en) A Method for Image Data Processing with Multi-Scale Channel Attention
CN114119638B (en) Medical image segmentation method integrating multi-scale features and attention mechanisms
CN111768432B (en) Moving object segmentation method and system based on Siamese deep neural network
CN111402129B (en) Binocular stereo matching method based on joint up-sampling convolutional neural network
CN112257794B (en) YOLO-based lightweight target detection method
WO2020056791A1 (en) Method and apparatus for super-resolution reconstruction of multi-scale dilated convolution neural network
CN110136062B (en) A Super-Resolution Reconstruction Method for Joint Semantic Segmentation
CN112149504A (en) A hybrid convolutional residual network combined with attention for action video recognition
CN112364931A (en) Low-sample target detection method based on meta-feature and weight adjustment and network model
CN112131959A (en) 2D human body posture estimation method based on multi-scale feature reinforcement
CN115082675B (en) A transparent object image segmentation method and system
CN114821249B (en) Vehicle weight recognition method based on grouping aggregation attention and local relation
CN107767416A (en) The recognition methods of pedestrian's direction in a kind of low-resolution image
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
CN110866938A (en) Full-automatic video moving object segmentation method
CN109740552A (en) A Target Tracking Method Based on Parallel Feature Pyramid Neural Network
CN116758407A (en) Underwater small target detection method and device based on CenterNet
CN117876842A (en) A method and system for detecting anomalies of industrial products based on generative adversarial networks
CN117912083A (en) Light-weight facial expression recognition method based on linear self-attention
CN115171052B (en) Crowded crowd attitude estimation method based on high-resolution context network
CN116309632A (en) 3D Liver Semantic Segmentation Method Based on Multi-scale Cascade Feature Attention Strategy
CN114926734B (en) Solid waste detection device and method based on feature aggregation and attention fusion
CN110210419A (en) The scene Recognition system and model generating method of high-resolution remote sensing image
CN114519383A (en) Image target detection method and system
CN114492755A (en) Object Detection Model Compression Method Based on Knowledge Distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant