
CN118154984A - Unsupervised neighborhood classification superpixel generation method and system based on fusion guided filtering - Google Patents

Unsupervised neighborhood classification superpixel generation method and system based on fusion guided filtering

Info

Publication number
CN118154984A
CN118154984A (application CN202410422107.9A)
Authority
CN
China
Prior art keywords
image
superpixel
features
pixel
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410422107.9A
Other languages
Chinese (zh)
Other versions
CN118154984B (en)
Inventor
张永霞
孙银隆
白清芳
王南南
郭强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University of Finance and Economics filed Critical Shandong University of Finance and Economics
Priority to CN202410422107.9A
Publication of CN118154984A
Application granted
Publication of CN118154984B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to an unsupervised neighborhood classification superpixel generation method and system fusing guided filtering, comprising the following steps: downsampling an input image to obtain a low-resolution image; extracting features from the low-resolution image using a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module; performing pixel neighborhood classification based on the features to obtain an association map between pixels and their nine neighboring superpixels; and inputting the low-resolution image and the association map into a guided filtering module, and performing joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result. The method generates superpixels for unlabeled images quickly and accurately; its fast superpixel generation framework reduces the number of model parameters and speeds up model inference.

Description

Unsupervised neighborhood classification superpixel generation method and system based on fusion guided filtering

Technical Field

The present invention belongs to the technical field of image processing, and in particular relates to an unsupervised neighborhood classification superpixel generation method and system fusing guided filtering.

Background Art

Superpixel segmentation clusters pixels according to their features and divides an image into a number of connected regions, each of which is a superpixel. Compared with raw pixels, superpixels can represent an image effectively with far fewer elements. Superpixel segmentation is therefore often used as a preprocessing step to reduce the computational complexity of subsequent computer vision tasks such as image segmentation, object tracking and saliency detection. Accordingly, image superpixel segmentation methods must be both accurate and efficient.

In 2018, Yang et al. first applied the fully convolutional network (FCN) to image superpixel segmentation: the image is first initialized into a rectangular grid according to the target number of superpixels; an encoder-decoder structure then converts the superpixel generation problem into a 9-way classification problem by assigning each pixel to one of its 9 adjacent grid cells. FCN is the first truly end-to-end superpixel generation model. Building on FCN, Wang et al. designed an association implantation (AI) module between pixels and superpixels, which enables the network to explicitly capture the relationship between a pixel and its surrounding superpixels, and proposed a boundary-aware loss to make the network attend more to boundary information. However, neither SSN (the superpixel sampling network) nor FCN constrains the connectivity of superpixels, so disconnected superpixels may be generated. To guarantee connectivity, both require a post-processing step that merges trivial superpixels, and this step is typically a separate, non-neural component. To eliminate the post-processing step, Yuan et al. proposed the superpixel interpolation network (SIN) based on the idea of interpolation: it generates a superpixel result at an initial resolution according to the target number of superpixels, and then produces a segmentation result at the same resolution as the input image by alternating interpolation in the horizontal and vertical directions, guaranteeing superpixel connectivity throughout. All of the above are supervised learning methods and require a large amount of labeled data for training to achieve good performance.

Summary of the Invention

In view of the above deficiencies of the prior art, the present invention provides an unsupervised neighborhood classification superpixel generation method and system fusing guided filtering to solve the above technical problems.

In a first aspect, the present invention provides an unsupervised neighborhood classification superpixel generation method fusing guided filtering, comprising:

downsampling an input image to obtain a low-resolution image;

extracting features from the low-resolution image using a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;

performing pixel neighborhood classification based on the features to obtain an association map between pixels and their nine neighboring superpixels;

inputting the low-resolution image and the association map into a guided filtering module, and performing joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result.

In an optional embodiment, the method further comprises:

training the lightweight network model sequentially using a gradient rescaling module;

constructing a target loss function and using it to optimize the lightweight network model, the target loss function being:

$$L = L_{\mathrm{clu}} + \lambda_1 L_{\mathrm{recon}} + \lambda_2 L_{\mathrm{smooth}} + \lambda_3 L_{\mathrm{edge}};$$

where $L_{\mathrm{clu}}$ denotes the clustering loss, $L_{\mathrm{recon}}$ the reconstruction loss, $L_{\mathrm{smooth}}$ the boundary smoothing loss, $L_{\mathrm{edge}}$ the edge loss, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ weight coefficients.

In an optional embodiment, extracting features from the low-resolution image using the lightweight network model comprises:

the multi-scale pyramid attention mechanism applies convolutions with several dilation rates to the initial feature vector of the low-resolution image to extract features at multiple scales;

the ECA channel attention mechanism weights the features of the multiple scales so as to capture feature information of the low-resolution image at different scales;

the channel attention features of the multiple scales are fused to obtain multi-scale attention features;

the depthwise separable U-Net module applies depthwise convolution and pointwise convolution to the multi-scale attention features to obtain features that preserve color and spatial detail;

a gradient rescaling module performs feature reconstruction to obtain the features of the low-resolution image.

In an optional embodiment, performing pixel neighborhood classification based on the features to obtain the association map between pixels and their nine neighboring superpixels comprises:

computing the similarity between each pixel and its neighboring superpixels based on the features, and converting the similarities into assignment probabilities;

assigning each pixel to the neighboring superpixel with the highest probability.

In an optional embodiment, the method further comprises:

the clustering loss

$$L_{\mathrm{clu}} = \frac{1}{N}\sum_{i=1}^{N} H(q_i) - H(\bar{q}), \qquad \bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i;$$

where $\bar{q}_S$ denotes the average probability of pixels being assigned to the $S$-th of the 9 surrounding superpixels; the first term, the entropy $H(q_i)$ of each pixel's assignment, guarantees the accuracy of superpixel generation; the second term, the negative entropy of the pixel-averaged assignment vector, drives the superpixels to form more uniformly.

In an optional embodiment, the method further comprises:

the reconstruction loss

$$L_{\mathrm{recon}} = L_c + \beta L_s;$$

$L_c$ denotes the reconstruction error of the color features:

$$L_c = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=1}^{C_{col}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2;$$

$L_s$ denotes the reconstruction error of the spatial features:

$$L_s = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=C_{col}+1}^{C_{col}+C_{sp}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2;$$

where $\beta$ denotes the weight parameter between $L_c$ and $L_s$, $H$ and $W$ denote the height and width of the image, $c$ denotes the feature-channel index, $C_{col}$ denotes the number of color feature channels, and $C_{sp}$ denotes the number of spatial feature channels.

In an optional embodiment, the method further comprises:

the boundary smoothing loss

$$L_{\mathrm{smooth}} = \frac{1}{HW}\sum_{i=1}^{HW}\Big( \|\partial_x q_i\|_1\, e^{-\|\partial_x f_i\|^2/\sigma} + \|\partial_y q_i\|_1\, e^{-\|\partial_y f_i\|^2/\sigma} \Big);$$

where $\partial_x$ and $\partial_y$ denote the gradient operations in the $x$ and $y$ directions on the soft assignment $q$ of the segmentation result or on the original input image $f$, $H$ and $W$ denote the height and width of the original input image, and $q_i$ and $f_i$ denote the soft assignment vector and the features of the $i$-th pixel of $q$ and of the input image $f$, respectively.

In an optional embodiment, the method further comprises:

the loss of the original-resolution output

$$L_{\mathrm{edge}} = KL(E_I \| E_Q) + KL(E_I \| E_R);$$

edges of the images are computed with a 3×3 Laplacian kernel and then converted into the range [0, 1] by a Softmax operation, yielding the edge maps $E_I$, $E_Q$ and $E_R$; the KL divergence between $E_I$, $E_Q$ and $E_R$ is then computed to measure the difference between the probability distributions. The lower the edge loss, the more similar the two distributions are; by minimizing the edge loss, the edge distribution of the reconstructed image is brought closest to the edge distribution of the original image.

In an optional embodiment, inputting the low-resolution image and the association map into the guided filtering module and performing joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result comprises:

modeling the relationship between the superpixel result and the input image with a linear model, expressed as $T = aI + b$, where $T$ denotes the superpixel segmentation result, $I$ denotes the input image, and $a$ and $b$ denote coefficients;

expressing the mapping relationship between the low-resolution image $I_l$ and the segmentation result $T_l$ within a local window $w_k$ as:

$$T_l^i = a_k I_l^i + b_k, \quad \forall i \in w_k;$$

where $i$ denotes the pixel index, $I_l^i$ and $T_l^i$ denote the pixels with index $i$ in $I_l$ and $T_l$ respectively, $k$ denotes the index of the rectangular window $w_k$, and $a_k$ and $b_k$ denote the coefficients of the linear function in window $w_k$;

minimizing the difference between the resulting image and the original image yields:

$$a_k = \frac{\frac{1}{|w|}\sum_{i\in w_k} I_l^i T_l^i - \mu_k \bar{T}_k}{\sigma_k^2 + \varepsilon};$$

$$b_k = \bar{T}_k - a_k \mu_k;$$

where $|w|$ denotes the number of pixels in $w_k$, $\bar{T}_k$ denotes the mean of the $T_l$ pixels located in window $w_k$, $\sigma_k^2$ and $\mu_k$ denote the variance and mean of the $I_l$ pixels in $w_k$, and $\varepsilon$ denotes the regularization parameter; $w_k$ slides over the image with a stride of 1 using a $3\times3$ window, so each pixel is covered by 9 windows, and averaging the 9 groups of coefficients gives the mapping coefficients $\bar{a}$ and $\bar{b}$ for the low-resolution image;

upsampling $\bar{a}$ and $\bar{b}$ yields $A_h$ and $B_h$, and the high-resolution output $T_h$ is finally obtained as:

$$T_h = A_h I_h + B_h;$$

by using the original input image $I_h$ as the guidance image, the fast superpixel generation network compensates for the loss of color and spatial edge information caused by image downsampling.

In a second aspect, the present invention provides an unsupervised neighborhood classification superpixel generation system fusing guided filtering, comprising:

a downsampling module, configured to downsample an input image to obtain a low-resolution image;

a feature extraction module, configured to extract features from the low-resolution image using a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;

a classification processing module, configured to perform pixel neighborhood classification based on the features to obtain an association map between pixels and their nine neighboring superpixels;

an upsampling module, configured to input the low-resolution image and the association map into a guided filtering module and perform joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result.

The beneficial effect of the present invention is that the unsupervised neighborhood classification superpixel generation method and system fusing guided filtering provide a fast and accurate unsupervised superpixel segmentation method applicable to digital images such as natural images, indoor images and medical images. The multi-scale attention mechanism, the guidance image and the edge loss function emphasize the edge information of the image, improving boundary adherence and smoothness. At the same time, the fast superpixel generation framework reduces the number of model parameters and speeds up model inference.

The unsupervised neighborhood classification superpixel generation method and system fusing guided filtering provided by the present invention offer the following specific advantages:

1) The fast superpixel generation framework uses a fast guided filtering module combined with the original-resolution image for joint upsampling, which better compensates for the image information lost during downsampling while preserving the edge detail of the image.

2) The multi-scale pyramid attention mechanism is combined with the depthwise separable U-Net module to form a lightweight feature extraction network. Within the multi-scale pyramid attention mechanism, the ECA channel attention mechanism weights features of different scales so that feature information at different scales in the image can be captured; the channel attention features of the different scales are then fused into multi-scale attention features, further enhancing the feature expressiveness of the network.

3) In the depthwise convolution stage, a separate filter is applied to each input channel, reducing the number of parameters while retaining spatial information. The pointwise convolution stage then uses 1×1 convolution kernels to map the output of the depthwise convolution to the required output channels, producing the final feature representation. In this module, skip connections directly connect inputs of a given dimension to outputs of the same dimension, so that more low-level feature information is passed to the high-level feature layers; this lets the model learn feature representations at different levels, so the extracted features better preserve color and spatial detail.

4) A robust loss function fully accounts for the edge information of the image: by comparing the distribution differences between the edges of the original image, the reconstructed image and the superpixels, it drives the model to generate superpixel segmentation results closer to the original image, guaranteeing the accuracy of superpixel segmentation.

In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.

Brief Description of the Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.

FIG. 2 is another schematic flow chart of a method according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of a method according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of the multi-scale pyramid attention module of a method according to an embodiment of the present invention.

FIG. 5 is a schematic diagram of the ECA channel attention module of a method according to an embodiment of the present invention.

FIG. 6 shows segmentation results of a method according to an embodiment of the present invention.

Detailed Description of the Embodiments

In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present invention.

The unsupervised neighborhood classification superpixel generation method fusing guided filtering provided in the embodiments of the present invention is executed by a computer device; correspondingly, the unsupervised neighborhood classification superpixel generation system fusing guided filtering runs in the computer device.

FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention, whose execution subject may be an unsupervised neighborhood classification superpixel generation system fusing guided filtering. The order of the steps in the flow chart may be changed, and some steps may be omitted, according to different requirements.

As shown in FIG. 1, the method includes:

Step 110: downsample the input image to obtain a low-resolution image;

Step 120: extract features from the low-resolution image using a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;

Step 130: perform pixel neighborhood classification based on the features to obtain an association map between pixels and their nine neighboring superpixels;

Step 140: input the low-resolution image and the association map into a guided filtering module, and perform joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result.

To facilitate understanding of the present invention, the unsupervised neighborhood classification superpixel generation method fusing guided filtering is further described below, based on its principle, in combination with the embodiments and the drawings.

Specifically, referring to FIG. 2 and FIG. 3, the unsupervised neighborhood classification superpixel generation method fusing guided filtering includes:

S1. Convert the input image into a low-resolution image.

The input image is a natural image $I_h \in \mathbb{R}^{H \times W \times 3}$. The image $I_h$ is first downsampled to obtain the low-resolution input image $I_l$, and its spatial coordinate information is added to the image's feature vector: an expansion mapping embeds the low-resolution input image $I_l$ into a 5-dimensional feature vector (3 color channels plus the $x$ and $y$ coordinates), which is normalized to obtain the initial feature vector.
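For illustration, the following is a minimal PyTorch sketch of step S1, assuming bilinear downsampling, coordinates normalized to [0, 1], and per-channel standardization; the function name and these specific choices are not taken from the patent:

```python
import torch
import torch.nn.functional as F

def to_low_res_features(image: torch.Tensor, scale: float = 0.5) -> torch.Tensor:
    """image: (B, 3, H, W) in [0, 1] -> (B, 5, h, w) normalized initial features."""
    low = F.interpolate(image, scale_factor=scale, mode="bilinear",
                        align_corners=False)
    b, _, h, w = low.shape
    ys = torch.linspace(0.0, 1.0, h, device=low.device)
    xs = torch.linspace(0.0, 1.0, w, device=low.device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([xx, yy]).expand(b, 2, h, w)  # 2 spatial channels
    feats = torch.cat([low, coords], dim=1)            # 3 color + 2 spatial
    mean = feats.mean(dim=(2, 3), keepdim=True)
    std = feats.std(dim=(2, 3), keepdim=True)
    return (feats - mean) / (std + 1e-6)
```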

S2. Feature extraction.

The multi-scale pyramid attention module (shown in FIG. 4) combines an atrous spatial pyramid pooling structure with the ECA channel attention mechanism, extracting and weighting features more finely at different scales and thereby obtaining a more expressive feature representation. Specifically, convolutions with different dilation rates $r = 0, 1, 2$ first extract features from the input feature map $F$ at different scales:

$$F_r = \delta(K_r *_r F), \quad r = 0, 1, 2;$$

the ECA channel attention mechanism then weights the features of the different scales, so that feature information at different scales in the image can be captured. Finally, the channel attention features of the different scales are fused into multi-scale attention features, further enhancing the expressive power of the network's features,

where $*_r$ is the convolution operation with dilation rate $r \in \{0, 1, 2\}$, $K_r$ is the corresponding dilated convolution kernel, and $\delta$ is a nonlinear function implemented by ReLU.
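A minimal sketch of the dilated-convolution branch follows; since a dilation rate of 0 is not valid in standard frameworks, the rates here are taken as 1, 2 and 3 purely for illustration, with the ECA weighting and fusion applied afterwards:

```python
import torch.nn as nn

class DilatedPyramid(nn.Module):
    """One 3x3 convolution branch per dilation rate, ReLU after each."""
    def __init__(self, ch: int, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Returns one feature map per scale; ECA weighting and fusion follow.
        return [self.act(branch(x)) for branch in self.branches]
```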

Compared with conventional two-dimensional convolution, the ECA channel attention module (shown in FIG. 5) uses a one-dimensional convolution to extract the relationships between channels, which requires fewer parameters and lower computational complexity. The ECA module therefore improves model performance without noticeably increasing the parameter count or the consumption of computing resources. Through the one-dimensional convolution, the ECA module effectively extracts the correlations and dependencies between channels of the feature map. When computing the channel attention weights, a local convolution is used, i.e., only the relationships between channels within a local region of the feature map are considered. This localized operation helps reduce the confusion of local information and lets the module focus on the important features in the image; at the same time, the ECA module still influences the entire feature map globally, ensuring its perception of the global information of the image.

Each output feature vector $F_r$ of the dilated convolutions is passed through a global average pooling operation, whose result is denoted $z$:

$$z = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W} F_r(x, y);$$

where $H$ and $W$ are the height and width of the input feature map. Then $z$ is passed through a 1-dimensional convolution layer, whose result is denoted $u$:

$$u = \mathrm{C1D}_k(z);$$

where $\mathrm{C1D}_k$ is a 1-dimensional convolution and $k$ is the size of the convolution kernel. Passing $u$ through the Sigmoid activation function $\sigma$ gives the weights $\omega$:

$$\omega = \sigma(u);$$

Finally, the input feature map and $\omega$ are multiplied element-wise to obtain the output feature vector of the ECA module:

$$\tilde{F}_r = \omega \odot F_r.$$
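The following is a minimal PyTorch sketch of the ECA weighting just described; the fixed kernel size k = 3 is an assumption (ECA-Net derives k adaptively from the channel count):

```python
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x):                            # x: (B, C, H, W)
        z = self.pool(x).squeeze(-1).transpose(1, 2)                # (B, 1, C)
        w = self.gate(self.conv(z)).transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1)
        return x * w                                 # element-wise reweighting
```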

The depthwise separable U-Net module (DSUM) adopted here improves and optimizes the original U-Net; as a component of the lightweight U-Net module, it uses a more efficient convolution scheme to increase the inference speed and computational efficiency of the model. Depthwise separable convolution decomposes the standard convolution into two steps: depthwise convolution and pointwise convolution. In the depthwise convolution stage, a separate filter is applied to each input channel, reducing the number of parameters while retaining spatial information. The pointwise convolution stage then uses 1×1 convolution kernels to map the output of the depthwise convolution to the required output channels, producing the final feature representation. In this module, skip connections directly connect inputs of a given dimension to outputs of the same dimension, so that more low-level feature information is passed to the high-level feature layers; this lets the model learn feature representations at different levels, so the extracted features better preserve color and spatial detail.
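As a sketch, one depthwise separable building block of the kind the DSUM stacks could look as follows; channel counts and the placement of the activation are illustrative:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        # Pointwise: 1x1 convolution mixing channels to the output width.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))
```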

A gradient rescaling module performs feature reconstruction to obtain the features of the low-resolution image.

S3. Pixel classification yields the pixel-to-nine-neighborhood-superpixel association map.

The similarity between each pixel and its neighboring superpixels is computed based on the features and converted into assignment probabilities; each pixel is then assigned to the neighboring superpixel with the highest probability.

The goal of superpixel segmentation is to assign each pixel to one superpixel of the set $\mathcal{S} = \{s_1, \dots, s_M\}$, where each superpixel represents a set of pixels and the parameter $M$ is an upper bound on the number of superpixels. Computing the association map for all pixel-superpixel pairs, however, is unnecessary and computationally expensive. Instead, for a given pixel $p$, the candidates are restricted to the set $\mathcal{N}_p$ of 9 grid cells in its neighborhood: for each pixel $p$, only assignments to its 9 surrounding neighborhood grids are considered. The association map of pixel-superpixel pairs can therefore be represented as a tensor $Q \in \mathbb{R}^{h \times w \times |\mathcal{N}_p|}$, where $|\mathcal{N}_p| = 9$.

Superpixel segmentation is regarded as a nine-neighborhood classification task, with $Q_{p,s} \in [0, 1]$ as the probabilistic representation of assigning pixel $p$ to superpixel $s$. The pixel at position $p$ is assigned to a neighboring superpixel via $\arg\max_{s \in \mathcal{N}_p} Q_{p,s}$, and $Q$ is obtained by optimizing the defined target loss function.
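A minimal sketch of the nine-way soft assignment and the final hard assignment, assuming the network head emits 9 logits per pixel (one per surrounding grid cell):

```python
import torch

def soft_assignment(logits: torch.Tensor) -> torch.Tensor:
    """logits: (B, 9, h, w) -> per-pixel probabilities over the 9 neighbors."""
    return torch.softmax(logits, dim=1)

def hard_labels(q: torch.Tensor) -> torch.Tensor:
    """Assign each pixel to its most probable neighboring superpixel."""
    return q.argmax(dim=1)  # (B, h, w), values in 0..8
```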

S4. The fast guided filtering module performs joint upsampling.

The difficulty in fast superpixel generation lies in obtaining the high-resolution segmentation result from the low-resolution one; the key is to mine the relationship between the input image and its segmentation result. Since the goal of superpixel generation is to detect and preserve image edges accurately, the relationship between the superpixel result and the input image can be modeled by a linear model, expressed as $T = aI + b$, where $T$ denotes the superpixel segmentation result, $I$ denotes the input image, and $a$ and $b$ denote coefficients. Within the same image, inputs of different resolutions should have the same linear relationship with their superpixel segmentation results. On this basis, given the low-resolution input $I_l$ and its superpixel result $T_l$, learning the linear relationship between the two and applying it to the original-resolution input image produces the processing result at the original resolution. The mapping relationship between the low-resolution image $I_l$ and the segmentation result $T_l$ within a local window $w_k$ can be expressed as:

$$T_l^i = a_k I_l^i + b_k, \quad \forall i \in w_k;$$

where $i$ denotes the pixel index, $I_l^i$ and $T_l^i$ denote the pixels with index $i$ in $I_l$ and $T_l$ respectively, $k$ denotes the index of the rectangular window $w_k$, and $a_k$ and $b_k$ denote the coefficients of the linear function in window $w_k$. Minimizing the difference between the resulting image and the original image yields:

$$a_k = \frac{\frac{1}{|w|}\sum_{i\in w_k} I_l^i T_l^i - \mu_k \bar{T}_k}{\sigma_k^2 + \varepsilon};$$

$$b_k = \bar{T}_k - a_k \mu_k;$$

where $|w|$ denotes the number of pixels in $w_k$, $\bar{T}_k$ denotes the mean of the $T_l$ pixels located in window $w_k$, $\sigma_k^2$ and $\mu_k$ denote the variance and mean of the $I_l$ pixels in $w_k$, and $\varepsilon$ denotes the regularization parameter. $w_k$ slides over the image with a stride of 1, so each pixel is covered by multiple windows, yielding multiple groups of coefficients. Here a $3\times3$ window is used, so each pixel is covered by 9 windows; averaging the 9 groups of coefficients gives the mapping coefficients $\bar{a}$ and $\bar{b}$ for the low-resolution image. Upsampling $\bar{a}$ and $\bar{b}$ yields $A_h$ and $B_h$, and the high-resolution output $T_h$ is finally obtained as:

$$T_h = A_h I_h + B_h;$$

By using the original input image $I_h$ as the guidance image, the fast superpixel generation network compensates for the loss of color and spatial edge information caused by image downsampling, guaranteeing the accuracy of the superpixels generated by upsampling.
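The following single-channel sketch implements the guided-filter upsampling equations above; realizing the 3×3 box averaging with avg_pool2d and using bilinear upsampling of the coefficients are illustrative choices:

```python
import torch
import torch.nn.functional as F

def box_mean(x: torch.Tensor) -> torch.Tensor:
    """Mean over a 3x3 window with stride 1 (each pixel covered by 9 windows)."""
    return F.avg_pool2d(x, 3, stride=1, padding=1, count_include_pad=False)

def guided_upsample(I_low, T_low, I_high, eps: float = 1e-4):
    """I_low, T_low: (B, 1, h, w); I_high: (B, 1, H, W) guidance image."""
    mu = box_mean(I_low)                         # window mean of I_l
    t_bar = box_mean(T_low)                      # window mean of T_l
    var = box_mean(I_low * I_low) - mu * mu      # window variance of I_l
    cov = box_mean(I_low * T_low) - mu * t_bar   # window covariance of I_l, T_l
    a = cov / (var + eps)                        # a_k
    b = t_bar - a * mu                           # b_k
    a_bar, b_bar = box_mean(a), box_mean(b)      # average the 9 coefficient sets
    size = I_high.shape[-2:]
    A = F.interpolate(a_bar, size=size, mode="bilinear", align_corners=False)
    B = F.interpolate(b_bar, size=size, mode="bilinear", align_corners=False)
    return A * I_high + B                        # T_h = A_h * I_h + B_h
```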

S5. Compute the target loss function and update the model parameters by gradient descent.

The target loss function is as follows:

$$L = L_{\mathrm{clu}} + \lambda_1 L_{\mathrm{recon}} + \lambda_2 L_{\mathrm{smooth}} + \lambda_3 L_{\mathrm{edge}};$$

where $L_{\mathrm{clu}}$ denotes the clustering loss, $L_{\mathrm{recon}}$ the reconstruction loss, $L_{\mathrm{smooth}}$ the boundary smoothing loss, and $L_{\mathrm{edge}}$ the loss of the original-resolution output; $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote weight coefficients, set to 1, 1 and 2.5 respectively in the experiments.

(1) Clustering loss $L_{\mathrm{clu}}$

To guarantee the accuracy and uniformity of superpixel generation, we adopt the clustering loss proposed in [34], which considers the assignment probability of each pixel as well as the uniformity of the superpixels:

$$L_{\mathrm{clu}} = \frac{1}{N}\sum_{i=1}^{N} H(q_i) - H(\bar{q}), \qquad \bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i.$$

where $\bar{q}_S$ denotes the average probability of pixels being assigned to the $S$-th of the 9 surrounding superpixels. The first term, the entropy $H(q_i)$ of each pixel's assignment, guarantees the accuracy of superpixel generation. The second term, the negative entropy of the pixel-averaged assignment vector, drives the superpixels to form more uniformly. By trading off accuracy against uniformity, a more spatially consistent and uniform segmentation result is obtained.
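A minimal PyTorch sketch of this clustering loss, i.e., the mean per-pixel entropy plus the negative entropy of the averaged assignment vector:

```python
import torch

def clustering_loss(q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """q: (B, 9, h, w) soft assignments that sum to 1 along dim 1."""
    per_pixel_entropy = -(q * (q + eps).log()).sum(dim=1).mean()  # mean H(q_i)
    q_bar = q.mean(dim=(0, 2, 3))                                 # averaged vector
    neg_entropy_of_mean = (q_bar * (q_bar + eps).log()).sum()     # -H(q_bar)
    return per_pixel_entropy + neg_entropy_of_mean
```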

(2) Reconstruction loss $L_{\mathrm{recon}}$

To guarantee the validity and reliability of the learned image features, the FRL in the GRM reconstructs the learned clustering features into reconstructed image features $\hat{f}$, and the reconstruction loss is computed with the mean squared error (MSE):

$$L_{\mathrm{recon}} = L_c + \beta L_s.$$

where $L_c$ denotes the reconstruction error of the color features:

$$L_c = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=1}^{C_{col}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2.$$

$L_s$ denotes the reconstruction error of the spatial features:

$$L_s = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=C_{col}+1}^{C_{col}+C_{sp}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2;$$

$\beta$ denotes the weight parameter between $L_c$ and $L_s$, $H$ and $W$ denote the height and width of the image, $c$ denotes the feature-channel index, $C_{col}$ denotes the number of color feature channels, and $C_{sp}$ denotes the number of spatial feature channels, with $C_{col} = 3$ and $C_{sp} = 2$.
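A minimal sketch of the reconstruction loss, assuming the channel layout of the 5-dimensional features from S1 (channels 0 to 2 color, 3 to 4 spatial):

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(f_hat: torch.Tensor, f: torch.Tensor,
                        beta: float = 1.0) -> torch.Tensor:
    """f_hat, f: (B, 5, h, w); beta weights L_s against L_c."""
    l_c = F.mse_loss(f_hat[:, :3], f[:, :3])   # color channels
    l_s = F.mse_loss(f_hat[:, 3:], f[:, 3:])   # spatial channels
    return l_c + beta * l_s
```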

(3) Boundary smoothing loss $L_{\mathrm{smooth}}$

To keep the image superpixels consistent with the input image at boundaries, a Gaussian kernel function is used to relate the input image $f$ to the predicted soft pixel assignment $q$:

$$L_{\mathrm{smooth}} = \frac{1}{HW}\sum_{i=1}^{HW}\Big( \|\partial_x q_i\|_1\, e^{-\|\partial_x f_i\|^2/\sigma} + \|\partial_y q_i\|_1\, e^{-\|\partial_y f_i\|^2/\sigma} \Big);$$

where $\partial_x$ and $\partial_y$ denote the gradient operations in the $x$ and $y$ directions on the soft assignment $q$ of the segmentation result or on the original input image $f$, $H$ and $W$ denote the height and width of the original input image, and $q_i$ and $f_i$ denote the soft assignment vector and the features of the $i$-th pixel of $q$ and of the input image $f$, respectively; $\sigma$ is set to 8 in the experiments.
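A minimal sketch of this edge-aware boundary smoothing term, using finite differences for the gradients and the bandwidth sigma = 8 stated above:

```python
import torch

def smoothness_loss(q: torch.Tensor, img: torch.Tensor,
                    sigma: float = 8.0) -> torch.Tensor:
    """q: (B, 9, H, W) soft assignments; img: (B, C, H, W) input image."""
    dqx = (q[..., :, 1:] - q[..., :, :-1]).abs().sum(1)        # |dQ/dx|
    dqy = (q[..., 1:, :] - q[..., :-1, :]).abs().sum(1)        # |dQ/dy|
    dix = (img[..., :, 1:] - img[..., :, :-1]).pow(2).sum(1)   # |dI/dx|^2
    diy = (img[..., 1:, :] - img[..., :-1, :]).pow(2).sum(1)   # |dI/dy|^2
    return (dqx * torch.exp(-dix / sigma)).mean() + \
           (dqy * torch.exp(-diy / sigma)).mean()
```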

(4) Edge loss $L_{\mathrm{edge}}$

To further improve boundary adherence and segmentation accuracy and to guide the model to learn edge information better, inspired by [37], this loss function focuses on the edge information of the input image $I$, the pixel-superpixel association map $Q$ and the reconstructed image $R$, and uses the KL divergence to measure the similarity of the edge distributions between them:

$$L_{\mathrm{edge}} = KL(E_I \| E_Q) + KL(E_I \| E_R).$$

The edges of the images are computed with a 3×3 Laplacian kernel and then converted into the range [0, 1] by a Softmax operation, yielding the edge maps $E_I$, $E_Q$ and $E_R$. The KL divergence between $E_I$, $E_Q$ and $E_R$ then measures the difference between the probability distributions; the lower the edge loss, the more similar the two distributions are. By minimizing the edge loss, we can therefore make the edge distribution of the reconstructed image as close as possible to that of the original image, preserving important edge information.
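A minimal sketch of the edge loss on single-channel maps: 3×3 Laplacian edge responses, softmax-normalized over all pixels, compared with KL divergence; the exact pairing of the KL terms between the three edge maps is an assumption:

```python
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def edge_map(x: torch.Tensor) -> torch.Tensor:
    """x: (B, 1, H, W) -> per-image edge distribution over pixels in [0, 1]."""
    e = F.conv2d(x, LAPLACIAN.to(x.device), padding=1).abs()
    return torch.softmax(e.flatten(1), dim=1)

def edge_loss(img, assoc, recon):
    e_i, e_q, e_r = edge_map(img), edge_map(assoc), edge_map(recon)
    def kl(p, q):
        return (p * ((p + 1e-8) / (q + 1e-8)).log()).sum(dim=1).mean()
    return kl(e_i, e_q) + kl(e_i, e_r)
```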

The fast superpixel generation method based on unsupervised pixel-neighborhood classification performs well in superpixel segmentation. It was validated on the same experimental data as other unsupervised superpixel segmentation methods, with experiments on the test set of the BSDS500 dataset, which contains 200 images of pixel resolution 481×321 or 321×481. Three metrics commonly used in superpixel image segmentation are employed:

(1) Achievable Segmentation Accuracy (ASA)

ASA evaluates the overall accuracy of superpixel segmentation; it accounts for the spatial adjacency between superpixels and is a connectivity-based metric. ASA is computed as:

$$ASA = \frac{\sum_{k=1}^{N} \max_i |s_k \cap g_i|}{\sum_i |g_i|},$$

where $N$ denotes the number of superpixels in the image, $s_k$ is the $k$-th superpixel produced by the algorithm, and $g_i$ is the $i$-th segment of the corresponding ground-truth label. ASA ranges from 0 to 1; the closer the value is to 1, the closer the segmentation result is to the ground-truth label.

(2) Boundary Recall (BR)

BR evaluates the boundary-detection ability of superpixel segmentation; it measures the proportion of true boundaries recovered by the algorithm. BR is computed as:

$$BR = \frac{|B_g \cap B_s|}{|B_g|},$$

where $B_g$ denotes the boundary set of the ground-truth label and $B_s$ denotes the boundary set of the superpixel result generated by the algorithm. BR ranges from 0 to 1; the closer the value is to 1, the closer the boundaries extracted by the algorithm are to the true boundaries.

(3) Under-Segmentation Error (USE)

USE evaluates under-segmentation in superpixel segmentation, i.e., superpixels that leak across ground-truth object boundaries. USE is computed as:

$$USE = \frac{1}{HW} \sum_{i} \sum_{k:\, s_k \cap g_i \neq \emptyset} \min\big(|s_k \cap g_i|,\, |s_k \setminus g_i|\big),$$

where $s_k$ is the $k$-th superpixel produced by the algorithm and $g_i$ is the $i$-th segment of the corresponding ground-truth label. USE ranges from 0 to 1; the closer the value is to 0, the higher the agreement between the segmentation result and the ground-truth label.
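For reference, a minimal numpy sketch of the three metrics following the formulas above; seg and gt are integer label maps of the same shape, and this BR uses exact pixel matching without the small boundary tolerance that benchmark code usually allows:

```python
import numpy as np

def asa(seg: np.ndarray, gt: np.ndarray) -> float:
    total = 0
    for s in np.unique(seg):
        counts = np.bincount(gt[seg == s])
        total += counts.max()                 # best-matching ground-truth segment
    return total / seg.size

def boundary_mask(lab: np.ndarray) -> np.ndarray:
    b = np.zeros_like(lab, dtype=bool)
    b[:-1, :] |= lab[:-1, :] != lab[1:, :]
    b[:, :-1] |= lab[:, :-1] != lab[:, 1:]
    return b

def br(seg: np.ndarray, gt: np.ndarray) -> float:
    bg, bs = boundary_mask(gt), boundary_mask(seg)
    return (bg & bs).sum() / max(bg.sum(), 1)  # recalled ground-truth boundary

def use(seg: np.ndarray, gt: np.ndarray) -> float:
    err = 0
    for s in np.unique(seg):
        counts = np.bincount(gt[seg == s])
        inter = counts[counts > 0]            # overlaps with each GT segment
        err += np.minimum(inter, inter.sum() - inter).sum()
    return err / seg.size
```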

In some embodiments, the unsupervised neighborhood classification superpixel generation system fusing guided filtering may include multiple functional modules composed of computer program segments. The computer programs of the program segments may be stored in the memory of a computer device and executed by at least one processor to perform the unsupervised neighborhood classification superpixel generation function fusing guided filtering (described in detail with reference to FIG. 1).

In this embodiment, the unsupervised neighborhood classification superpixel generation system fusing guided filtering may be divided into multiple functional modules according to the functions it performs. A module as referred to in the present invention is a series of computer program segments that can be executed by at least one processor and can complete a fixed function, and that are stored in a memory. In this embodiment, the functions of the modules are detailed in the following embodiments.

a downsampling module, configured to downsample an input image to obtain a low-resolution image;

a feature extraction module, configured to extract features from the low-resolution image using a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;

a classification processing module, configured to perform pixel neighborhood classification based on the features to obtain an association map between pixels and their nine neighboring superpixels;

an upsampling module, configured to input the low-resolution image and the association map into a guided filtering module and perform joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result.

Optionally, as an embodiment of the present invention, it further includes:

training the lightweight network model sequentially using a gradient rescaling module;

constructing a target loss function and using it to optimize the lightweight network model, the target loss function being:

$$L = L_{\mathrm{clu}} + \lambda_1 L_{\mathrm{recon}} + \lambda_2 L_{\mathrm{smooth}} + \lambda_3 L_{\mathrm{edge}};$$

where $L_{\mathrm{clu}}$ denotes the clustering loss, $L_{\mathrm{recon}}$ the reconstruction loss, $L_{\mathrm{smooth}}$ the boundary smoothing loss, $L_{\mathrm{edge}}$ the edge loss, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ weight coefficients.

Optionally, as an embodiment of the present invention, extracting features from the low-resolution image using the lightweight network model includes:

the multi-scale pyramid attention mechanism applies convolutions with several dilation rates to the initial feature vector of the low-resolution image to extract features at multiple scales;

the ECA channel attention mechanism weights the features of the multiple scales so as to capture feature information of the low-resolution image at different scales;

the channel attention features of the multiple scales are fused to obtain multi-scale attention features;

the depthwise separable U-Net module applies depthwise convolution and pointwise convolution to the multi-scale attention features to obtain features that preserve color and spatial detail;

a gradient rescaling module performs feature reconstruction to obtain the features of the low-resolution image.

Optionally, as an embodiment of the present invention, performing pixel neighborhood classification based on the features to obtain the association map between pixels and their nine neighboring superpixels includes:

computing the similarity between each pixel and its neighboring superpixels based on the features, and converting the similarities into assignment probabilities;

assigning each pixel to the neighboring superpixel with the highest probability.

Optionally, as an embodiment of the present invention, it further includes:

the clustering loss

$$L_{\mathrm{clu}} = \frac{1}{N}\sum_{i=1}^{N} H(q_i) - H(\bar{q}), \qquad \bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i;$$

where $\bar{q}_S$ denotes the average probability of pixels being assigned to the $S$-th of the 9 surrounding superpixels; the first term, the entropy $H(q_i)$ of each pixel's assignment, guarantees the accuracy of superpixel generation; the second term, the negative entropy of the pixel-averaged assignment vector, drives the superpixels to form more uniformly.

Optionally, as an embodiment of the present invention, it further includes:

the reconstruction loss

$$L_{\mathrm{recon}} = L_c + \beta L_s;$$

$L_c$ denotes the reconstruction error of the color features:

$$L_c = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=1}^{C_{col}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2;$$

$L_s$ denotes the reconstruction error of the spatial features:

$$L_s = \frac{1}{HW}\sum_{x=1}^{H}\sum_{y=1}^{W}\sum_{c=C_{col}+1}^{C_{col}+C_{sp}} \big(f_c(x,y) - \hat{f}_c(x,y)\big)^2;$$

where $\beta$ denotes the weight parameter between $L_c$ and $L_s$, $H$ and $W$ denote the height and width of the image, $c$ denotes the feature-channel index, $C_{col}$ denotes the number of color feature channels, and $C_{sp}$ denotes the number of spatial feature channels.

Optionally, as an embodiment of the present invention, it further includes:

the boundary smoothing loss

$$L_{\mathrm{smooth}} = \frac{1}{HW}\sum_{i=1}^{HW}\Big( \|\partial_x q_i\|_1\, e^{-\|\partial_x f_i\|^2/\sigma} + \|\partial_y q_i\|_1\, e^{-\|\partial_y f_i\|^2/\sigma} \Big);$$

where $\partial_x$ and $\partial_y$ denote the gradient operations in the $x$ and $y$ directions on the soft assignment $q$ of the segmentation result or on the original input image $f$, $H$ and $W$ denote the height and width of the original input image, and $q_i$ and $f_i$ denote the soft assignment vector and the features of the $i$-th pixel of $q$ and of the input image $f$, respectively.

Optionally, as an embodiment of the present invention, it further includes:

the loss of the original-resolution output

$$L_{\mathrm{edge}} = KL(E_I \| E_Q) + KL(E_I \| E_R);$$

edges of the images are computed with a 3×3 Laplacian kernel and then converted into the range [0, 1] by a Softmax operation, yielding the edge maps $E_I$, $E_Q$ and $E_R$; the KL divergence between $E_I$, $E_Q$ and $E_R$ is then computed to measure the difference between the probability distributions. The lower the edge loss, the more similar the two distributions are; by minimizing the edge loss, the edge distribution of the reconstructed image is brought closest to the edge distribution of the original image.

Optionally, as an embodiment of the present invention, inputting the low-resolution image and the association map into the guided filtering module and performing joint upsampling with the original input image as the guidance image to obtain the superpixel segmentation result includes:

modeling the relationship between the superpixel result and the input image with a linear model, expressed as $T = aI + b$, where $T$ denotes the superpixel segmentation result, $I$ denotes the input image, and $a$ and $b$ denote coefficients;

expressing the mapping relationship between the low-resolution image $I_l$ and the segmentation result $T_l$ within a local window $w_k$ as:

$$T_l^i = a_k I_l^i + b_k, \quad \forall i \in w_k;$$

where $i$ denotes the pixel index, $I_l^i$ and $T_l^i$ denote the pixels with index $i$ in $I_l$ and $T_l$ respectively, $k$ denotes the index of the rectangular window $w_k$, and $a_k$ and $b_k$ denote the coefficients of the linear function in window $w_k$;

minimizing the difference between the resulting image and the original image yields:

$$a_k = \frac{\frac{1}{|w|}\sum_{i\in w_k} I_l^i T_l^i - \mu_k \bar{T}_k}{\sigma_k^2 + \varepsilon};$$

$$b_k = \bar{T}_k - a_k \mu_k;$$

where $|w|$ denotes the number of pixels in $w_k$, $\bar{T}_k$ denotes the mean of the $T_l$ pixels located in window $w_k$, $\sigma_k^2$ and $\mu_k$ denote the variance and mean of the $I_l$ pixels in $w_k$, and $\varepsilon$ denotes the regularization parameter; $w_k$ slides over the image with a stride of 1 using a $3\times3$ window, so each pixel is covered by 9 windows, and averaging the 9 groups of coefficients gives the mapping coefficients $\bar{a}$ and $\bar{b}$ for the low-resolution image;

upsampling $\bar{a}$ and $\bar{b}$ yields $A_h$ and $B_h$, and the high-resolution output $T_h$ is finally obtained as:

$$T_h = A_h I_h + B_h;$$

by using the original input image $I_h$ as the guidance image, the fast superpixel generation network compensates for the loss of color and spatial edge information caused by image downsampling.

Although the present invention has been described in detail with reference to the drawings and in combination with the preferred embodiments, the present invention is not limited thereto. Without departing from the spirit and essence of the present invention, a person of ordinary skill in the art may make various equivalent modifications or substitutions to the embodiments of the present invention; any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall likewise fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the claims.

Claims (10)

1. An unsupervised neighborhood classification superpixel generation method based on fusion guided filtering, characterized by comprising:
downsampling an input image to obtain a low-resolution image;
extracting features from the low-resolution image with a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;
performing pixel neighborhood classification based on the features to obtain an association map between pixels and nine-neighborhood superpixel pairs;
inputting the low-resolution image and the association map into a guided filtering module, and performing joint upsampling with the original input image as the guide map to obtain the superpixel segmentation result.

2. The method according to claim 1, characterized in that the method further comprises:
sequentially training the lightweight network model with a gradient rescaling module;
constructing a target loss function and optimizing the lightweight network model with it, the target loss function being

$L = \lambda_1 L_{cluster} + \lambda_2 L_{recon} + \lambda_3 L_{smooth} + L_{edge}$

where $L_{cluster}$ denotes the clustering loss, $L_{recon}$ the reconstruction loss, $L_{smooth}$ the boundary smoothing loss, $L_{edge}$ the edge loss, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ the weight coefficients.

3. The method according to claim 1, characterized in that extracting features from the low-resolution image with the lightweight network model comprises:
the multi-scale pyramid attention mechanism applies convolutions with multiple dilation rates to the initial feature vector of the low-resolution image to extract features at multiple scales;
an ECA channel attention mechanism weights the features of the multiple scales so as to capture feature information of the low-resolution image at different scales;
the channel attention features of the multiple scales are fused to obtain multi-scale attention features;
the depthwise separable U-Net module applies depthwise convolution and pointwise convolution to the multi-scale attention features to obtain features that preserve color and spatial information;
feature reconstruction is performed with the gradient rescaling module to obtain the features of the low-resolution image.

4. The method according to claim 1, characterized in that performing pixel neighborhood classification based on the features to obtain the association map between pixels and nine-neighborhood superpixel pairs comprises:
computing, based on the features, the similarity between a pixel and its neighboring superpixels, and converting the similarity into assignment probabilities;
assigning the pixel to the neighboring superpixel with the highest probability.

5. The method according to claim 2, characterized in that the method further comprises the clustering loss

$L_{cluster} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{s=1}^{9} q_{i,s}\,\log q_{i,s} + \sum_{s=1}^{9} \bar{q}_{s}\,\log \bar{q}_{s}$

where $q_{i,s}$ is the probability that pixel $i$ belongs to the $s$-th of its nine surrounding superpixels, $N$ is the number of pixels, and $\bar{q}_{s}$ denotes the probability of the $s$-th of the nine surrounding superpixels averaged over all pixels; the first term is the entropy of each pixel's assignment, which guarantees the accuracy of superpixel generation, and the second term is the negative entropy of the pixel-averaged assignment vector, which drives the superpixels to form more uniformly.

6. The method according to claim 2, characterized in that the method further comprises the reconstruction loss

$L_{recon} = L_c + \lambda L_s$

where $L_c$ denotes the reconstruction error of the color features,

$L_c = \frac{1}{H W C_1}\sum_{c=1}^{C_1}\sum_{i=1}^{H W}\bigl(x_{i,c} - \hat{x}_{i,c}\bigr)^2$

$L_s$ denotes the reconstruction error of the spatial features,

$L_s = \frac{1}{H W C_2}\sum_{c=C_1+1}^{C_1+C_2}\sum_{i=1}^{H W}\bigl(x_{i,c} - \hat{x}_{i,c}\bigr)^2$

and $\lambda$ is the weight parameter between $L_c$ and $L_s$, $H$ and $W$ denote the height and width of the image, $c$ the image feature-channel index, $C_1$ the number of image color-feature channels, and $C_2$ the number of image spatial-feature channels.

7. The method according to claim 2, characterized in that the method further comprises the boundary smoothing loss

$L_{smooth} = \frac{1}{H W}\sum_{i=1}^{H W}\Bigl(\bigl\|\partial_x Q_i\bigr\|_1\, e^{-\|\partial_x I_i\|_2} + \bigl\|\partial_y Q_i\bigr\|_1\, e^{-\|\partial_y I_i\|_2}\Bigr)$

where $\partial_x$ and $\partial_y$ denote the gradient operations in the $x$ and $y$ directions on the soft assignment $Q$ of the segmentation result or on the original input image $I$, $H$ and $W$ denote the height and width of the original input image, and $Q_i$ and $I_i$ denote the soft-assignment vector and the feature of the $i$-th pixel of $Q$ and of the input image $I$, respectively.

8. The method according to claim 2, characterized in that the method further comprises the loss on the original-resolution output

$L_{edge} = D_{KL}\bigl(E \,\|\, \hat{E}_c\bigr) + D_{KL}\bigl(E \,\|\, \hat{E}_s\bigr)$

edge computation is performed on the image with a 3×3 Laplacian kernel, and the response is converted into the range [0, 1] by a Softmax operation, yielding the edge maps $E$, $\hat{E}_c$ and $\hat{E}_s$; subsequently, the KL divergence among $E$, $\hat{E}_c$ and $\hat{E}_s$ is computed to measure the difference between two probability distributions, a lower edge-loss value indicating that the two distributions are more similar; by minimizing the edge loss, the edge distribution of the reconstructed image is brought closest to that of the original image.

9. The method according to claim 1, characterized in that inputting the low-resolution image and the association map into the guided filtering module and performing joint upsampling with the original input image as the guide map to obtain the superpixel segmentation result comprises:
modeling the relationship between the superpixel result and the input image with a linear model, which can be expressed as $Q = aI + b$, where $Q$ denotes the superpixel segmentation result, $I$ denotes the input image, and $a$ and $b$ denote coefficients;
expressing the mapping between the low-resolution image $I$ and the segmentation result $Q$ within a local window $\omega_k$ as

$Q_i = a_k I_i + b_k, \quad \forall i \in \omega_k$

where $i$ denotes the pixel index, $I_i$ and $Q_i$ denote the pixels of $I$ and $Q$ with index $i$, $k$ denotes the index of the rectangular window $\omega_k$, and $a_k$ and $b_k$ denote the coefficients of the linear function in window $\omega_k$;
minimizing the difference between the result image and the original image yields

$a_k = \dfrac{\frac{1}{|\omega|}\sum_{i\in\omega_k} I_i Q_i - \mu_k \bar{Q}_k}{\sigma_k^2 + \epsilon}$

$b_k = \bar{Q}_k - a_k \mu_k$

where $|\omega|$ denotes the number of pixels in $\omega_k$, $\bar{Q}_k$ denotes the mean of the $Q$ pixels located in window $\omega_k$, $\sigma_k^2$ and $\mu_k$ denote the variance and mean of the $I$ pixels in $\omega_k$, and $\epsilon$ denotes the regularization parameter; the window slides over the image with a stride of 1 using windows of size $3\times 3$, so each pixel is covered by 9 windows, and averaging the 9 sets of coefficients gives the mapping coefficients $\bar{a}$ and $\bar{b}$ of the low-resolution image;
upsampling $\bar{a}$ and $\bar{b}$ generates $A$ and $B$, and the final high-resolution output $Q_h$ is

$Q_h = A \odot I_h + B$

The fast superpixel generation network compensates for the loss of color and spatial edge information caused by image downsampling by using the original input image $I_h$ as the guide image.

10. An unsupervised neighborhood classification superpixel generation system based on fusion guided filtering, characterized by comprising:
a downsampling module, configured to downsample an input image to obtain a low-resolution image;
a feature extraction module, configured to extract features from the low-resolution image with a lightweight network model, the lightweight network model comprising a multi-scale pyramid attention mechanism and a depthwise separable U-Net module;
a classification processing module, configured to perform pixel neighborhood classification based on the features to obtain an association map between pixels and nine-neighborhood superpixel pairs;
an upsampling module, configured to input the low-resolution image and the association map into a guided filtering module and perform joint upsampling with the original input image as the guide map to obtain the superpixel segmentation result.

Illustrative, non-authoritative code sketches of the mechanisms named in the claims follow below.
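For illustration, a minimal PyTorch sketch of the multi-scale pyramid attention and depthwise separable convolution of the kind claim 3 describes: parallel dilated convolutions, ECA channel weighting, and fusion. The dilation rates (1, 2, 4), the channel width, and the class names are assumptions of this sketch, not the patented implementation.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling, a 1-D conv
    across the channel descriptors, and a sigmoid gate."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.pool(x)                               # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))   # (B, 1, C)
        w = torch.sigmoid(w.transpose(1, 2).unsqueeze(-1))
        return x * w                                   # channel-weighted features

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class PyramidAttention(nn.Module):
    """Parallel dilated convolutions extract features at several scales;
    ECA weights the concatenated maps before a 1x1 fusion."""
    def __init__(self, channels: int = 32, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)
        self.eca = ECA()
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fuse(self.eca(multi))

x = torch.randn(1, 32, 60, 80)
print(PyramidAttention()(x).shape)                     # torch.Size([1, 32, 60, 80])
```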
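A sketch of the pixel-to-nine-neighborhood classification of claims 1 and 4, assuming the common formulation in which a 9-channel head scores the 3×3 grid of candidate superpixel cells around each pixel and a softmax converts the scores into assignment probabilities; the head design and feature dimension are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborhoodClassifier(nn.Module):
    """Scores, for every pixel, the 9 superpixel cells in its 3x3 grid
    neighborhood; softmax turns the scores into an association map Q."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.assoc_head = nn.Conv2d(feat_dim, 9, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return F.softmax(self.assoc_head(feats), dim=1)  # (B, 9, H, W)

feats = torch.randn(1, 32, 60, 80)
q = NeighborhoodClassifier()(feats)
hard = q.argmax(dim=1)   # hard label in {0..8}: which neighbor cell wins
```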
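One plausible reading of the clustering loss of claim 5, with a per-pixel entropy term (confident assignments) and the negative entropy of the pixel-averaged assignment vector (uniform superpixel usage):

```python
import torch

def clustering_loss(q: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """q: (B, 9, H, W) soft association map.
    First term: mean per-pixel assignment entropy.
    Second term: negative entropy of the average assignment vector."""
    pixel_entropy = -(q * (q + eps).log()).sum(dim=1).mean()
    q_mean = q.mean(dim=(0, 2, 3))              # average over all pixels
    neg_marginal_entropy = (q_mean * (q_mean + eps).log()).sum()
    return pixel_entropy + neg_marginal_entropy
```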
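A sketch of the reconstruction loss of claim 6, assuming the first channels of the feature tensor are color and the remaining channels are spatial coordinates; the channel split `c_color` and the weight `lam` are placeholders.

```python
import torch

def reconstruction_loss(x: torch.Tensor, x_hat: torch.Tensor,
                        c_color: int = 3, lam: float = 0.5) -> torch.Tensor:
    """x, x_hat: (B, C1+C2, H, W) original and reconstructed features.
    L_c is the mean squared error on the color channels, L_s on the
    spatial channels; lam weights L_s against L_c."""
    l_c = ((x[:, :c_color] - x_hat[:, :c_color]) ** 2).mean()
    l_s = ((x[:, c_color:] - x_hat[:, c_color:]) ** 2).mean()
    return l_c + lam * l_s
```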
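A sketch of the edge-aware boundary smoothing loss of claim 7, assuming the common form in which gradients of the soft assignment are damped where the input image itself has strong gradients:

```python
import torch

def smoothness_loss(q: torch.Tensor, img: torch.Tensor) -> torch.Tensor:
    """q: (B, 9, H, W) soft assignments; img: (B, C, H, W) input image.
    Penalizes assignment gradients except across image edges."""
    dqx = (q[:, :, :, 1:] - q[:, :, :, :-1]).abs().sum(1, keepdim=True)
    dqy = (q[:, :, 1:, :] - q[:, :, :-1, :]).abs().sum(1, keepdim=True)
    dix = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean(1, keepdim=True)
    diy = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dqx * torch.exp(-dix)).mean() + (dqy * torch.exp(-diy)).mean()
```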
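A sketch of the edge loss of claim 8: a 3×3 Laplacian response is turned into a spatial probability distribution by softmax and compared with KL divergence. Reading the loss as KL between the original edge map and a single reconstructed edge map is a simplifying assumption of this sketch; claim 8 names three edge maps.

```python
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def edge_map(gray: torch.Tensor) -> torch.Tensor:
    """gray: (B, 1, H, W). Laplacian response mapped to a spatial
    probability distribution via softmax."""
    e = F.conv2d(gray, LAPLACIAN, padding=1)
    b, _, h, w = e.shape
    return F.softmax(e.view(b, -1), dim=1).view(b, 1, h, w)

def edge_loss(orig: torch.Tensor, recon: torch.Tensor) -> torch.Tensor:
    p, q = edge_map(orig), edge_map(recon)
    # KL(p || q): low when the reconstructed edges match the original's.
    return (p * ((p + 1e-8).log() - (q + 1e-8).log())).sum(dim=(1, 2, 3)).mean()
```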
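The target loss of claim 2 as a weighted sum; the weight values below are placeholders, since the claim only states that three weight coefficients exist and leaves the edge term unweighted in this reading:

```python
def total_loss(l_cluster, l_recon, l_smooth, l_edge,
               lam1: float = 1.0, lam2: float = 1.0, lam3: float = 1.0):
    # Hypothetical weights; the claim does not disclose their values.
    return lam1 * l_cluster + lam2 * l_recon + lam3 * l_smooth + l_edge
```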
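A sketch of the guided-filter joint upsampling of claim 9, following the standard guided-filter algebra: per-window coefficients from local means and variances, averaging over the 9 overlapping 3×3 windows, and bilinear upsampling of the averaged coefficients before applying them to the full-resolution guide. Treating the guide as single-channel is a simplification of this sketch.

```python
import torch
import torch.nn.functional as F

def box_mean(x: torch.Tensor, r: int = 1) -> torch.Tensor:
    """Mean over a (2r+1)x(2r+1) window with stride 1; for r=1 each
    pixel is covered by 9 windows, matching claim 9."""
    k = 2 * r + 1
    return F.avg_pool2d(x, k, stride=1, padding=r, count_include_pad=False)

def guided_upsample(q_lr: torch.Tensor, i_lr: torch.Tensor,
                    i_hr: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """q_lr: (B, 9, h, w) low-res association map; i_lr: (B, 1, h, w)
    low-res guide; i_hr: (B, 1, H, W) full-res guide (original image).
    Returns the full-res association map Q_h = A * I_h + B."""
    mean_i = box_mean(i_lr)
    mean_q = box_mean(q_lr)
    cov_iq = box_mean(i_lr * q_lr) - mean_i * mean_q
    var_i = box_mean(i_lr * i_lr) - mean_i * mean_i
    a = cov_iq / (var_i + eps)            # per-window linear coefficients
    b = mean_q - a * mean_i
    a, b = box_mean(a), box_mean(b)       # average the 9 overlapping windows
    size = i_hr.shape[-2:]
    a = F.interpolate(a, size=size, mode='bilinear', align_corners=False)
    b = F.interpolate(b, size=size, mode='bilinear', align_corners=False)
    return a * i_hr + b
```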
CN202410422107.9A 2024-04-09 2024-04-09 Method and system for generating non-supervision neighborhood classification superpixels by fusing guided filtering Active CN118154984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410422107.9A CN118154984B (en) 2024-04-09 2024-04-09 Method and system for generating non-supervision neighborhood classification superpixels by fusing guided filtering


Publications (2)

Publication Number Publication Date
CN118154984A true CN118154984A (en) 2024-06-07
CN118154984B CN118154984B (en) 2024-10-29

Family

ID=91288783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410422107.9A Active CN118154984B (en) 2024-04-09 2024-04-09 Method and system for generating non-supervision neighborhood classification superpixels by fusing guided filtering

Country Status (1)

Country Link
CN (1) CN118154984B (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111899169A (en) * 2020-07-02 2020-11-06 佛山市南海区广工大数控装备协同创新研究院 A method for segmentation network of face image based on semantic segmentation
CN112070779A (en) * 2020-08-04 2020-12-11 武汉大学 A road segmentation method for remote sensing images based on weakly supervised learning of convolutional neural network
CN113408462A (en) * 2021-06-29 2021-09-17 西南交通大学 Landslide remote sensing information extraction method based on convolutional neural network and classification thermodynamic diagram
CN114581721A (en) * 2022-03-15 2022-06-03 东北林业大学 Multispectral image lightweight classification method based on binary neural network
WO2024021413A1 (en) * 2022-07-26 2024-02-01 南京邮电大学 Image segmentation method combining super-pixels and multi-scale hierarchical feature recognition
CN116503746A (en) * 2023-06-29 2023-07-28 南京信息工程大学 Infrared small target detection method based on multilayer nested non-full-mapping U-shaped network
CN117557579A (en) * 2023-11-23 2024-02-13 电子科技大学长三角研究院(湖州) Method and system for assisting non-supervision super-pixel segmentation by using cavity pyramid collaborative attention mechanism
CN117710308A (en) * 2023-12-14 2024-03-15 河南理工大学 Steel wire rope damage detection method based on exponential weighting guided filtering

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SUZUKI T et al.: "Superpixel segmentation via convolutional neural networks with regularized information maximization", IEEE, 8 May 2020 (2020-05-08) *
SUN Yinlong: "Research on fast and accurate unsupervised image superpixel generation algorithms", CNKI master's thesis, 1 May 2024 (2024-05-01) *
ZHANG Yongxia et al.: "Fast unsupervised image superpixel generation with fused guided filtering", Journal of Computer-Aided Design & Computer Graphics, 24 May 2024 (2024-05-24) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118570099A (en) * 2024-08-02 2024-08-30 中国人民解放军海军青岛特勤疗养中心 Postoperative rehabilitation effect monitoring system based on image enhancement

Also Published As

Publication number Publication date
CN118154984B (en) 2024-10-29

Similar Documents

Publication Publication Date Title
CN114119638B (en) Medical image segmentation method integrating multi-scale features and attention mechanisms
CN110570353B (en) Densely connected generative adversarial network single image super-resolution reconstruction method
CN112287940A (en) Semantic segmentation method of attention mechanism based on deep learning
Shi et al. Exploiting multi-scale parallel self-attention and local variation via dual-branch transformer-CNN structure for face super-resolution
CN111353940A (en) An image super-resolution reconstruction method based on deep learning iterative upsampling
CN111368846A (en) Road ponding identification method based on boundary semantic segmentation
CN114972024B (en) Image super-resolution reconstruction device and method based on graph representation learning
CN111369442A (en) Remote sensing image super-resolution reconstruction method based on fuzzy kernel classification and attention mechanism
CN112818920B (en) A combined spatial and spectral change detection method for dual-phase hyperspectral images
CN118967449B (en) A super-resolution method for pathological slice images based on diffusion model
Liu et al. Face super-resolution reconstruction based on self-attention residual network
CN118154984B (en) Method and system for generating non-supervision neighborhood classification superpixels by fusing guided filtering
US20240404018A1 (en) Image processing method and apparatus, device, storage medium and program product
CN118691815A (en) A high-quality automatic instance segmentation method for remote sensing images based on fine-tuning of the SAM large model
CN113793267A (en) Self-supervision single remote sensing image super-resolution method based on cross-dimension attention mechanism
CN106203269A (en) A kind of based on can the human face super-resolution processing method of deformation localized mass and system
Yang et al. Variation learning guided convolutional network for image interpolation
CN115937704A (en) Remote sensing image road segmentation method based on topology perception neural network
CN115565107A (en) Video significance prediction method based on double-flow architecture
CN114022362A (en) Image super-resolution method based on pyramid attention mechanism and symmetric network
CN111724428A (en) A depth map sampling and reconstruction method based on the signal model on the map
CN118521929A (en) Unmanned aerial vehicle aerial photography small target detection method based on improved RT-DETR network
CN118505508A (en) Image super-resolution method based on multidimensional information perception
CN117351182A (en) Deep learning-based porcelain insulator surface reflection removal method and system
CN111582057A (en) Face verification method based on local receptive field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant