
CN111027576B - Co-saliency detection method based on co-saliency generative adversarial network - Google Patents


Info

Publication number
CN111027576B
Authority
CN
China
Prior art keywords
saliency
generator
collaborative
training
generative adversarial
Prior art date
Legal status
Active
Application number
CN201911368623.3A
Other languages
Chinese (zh)
Other versions
CN111027576A (en)
Inventor
钱晓亮
白臻
任航丽
曾黎
邢培旭
程塨
姚西文
刘向龙
岳伟超
王芳
刘玉翠
赵素娜
王慰
毋媛媛
吴青娥
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry
Priority to CN201911368623.3A
Publication of CN111027576A
Application granted
Publication of CN111027576B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/46 — Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a co-saliency detection method based on a co-saliency generative adversarial network. The steps are as follows: construct a co-saliency generative adversarial network model, then train it in two stages. In the first training stage, the network is pre-trained with a labeled salient-object database; after this stage, the network is able to perform saliency detection on a single image. In the second training stage, starting from the model parameters trained in the first stage, the network is trained with labeled co-saliency data groups whose co-salient objects belong to the same category; after this stage, the trained model can be used directly for category co-saliency detection. The training process of the invention is simple, its detection efficiency is high, and it offers strong generality and high accuracy.

Description

Co-saliency detection method based on co-saliency generative adversarial network

Technical Field

The present invention relates to the technical fields of computer vision and machine learning, and in particular to a co-saliency detection method based on a co-saliency generative adversarial network.

Background Art

With the advent of the big-data era and the emergence of websites and mobile storage devices that provide all kinds of information resources, vast amounts of digital image and video data flood our lives, so giving computers the ability to quickly and accurately acquire and retain more of the useful information has become essential. Co-saliency detection is based on the human biological visual attention mechanism: by detecting the common salient objects in multiple related scene images and in videos, it extracts the object information that best represents such images, automatically filters out redundancy and noise in the images, and reduces the time and space complexity of subsequent algorithms, thereby allowing computing resources to be allocated preferentially and improving the execution efficiency of downstream image tasks.

Many co-saliency detection methods exist; extracting the saliency features of a single image and capturing similarity cues across multiple images are the key sub-problems of the task. With the development of deep learning, existing co-saliency methods can be divided into two categories according to whether they employ deep learning. Non-deep-learning methods typically perform co-saliency detection with hand-crafted features and manually designed similarity measures, so the extracted features and similarity information limit detection performance and significantly affect accuracy. Deep-learning-based co-saliency detection methods, by contrast, extract more effective information with deep models and greatly improve co-saliency detection performance. However, the amount of annotated co-saliency data available is limited, which constrains the application of deep learning techniques.

Summary of the Invention

Aiming at the technical problem that existing deep-learning-based co-saliency detection methods are limited by insufficient training data, which greatly affects detection accuracy, the present invention proposes a co-saliency detection method based on a co-saliency generative adversarial network. It effectively exploits the saliency of each single image together with the internal correlation information of an image group of the same category to perform co-saliency detection among same-category images; the training and detection processes are simple and the detection efficiency is high.

In order to achieve the above purpose, the technical solution of the present invention is realized as follows: a co-saliency detection method based on a co-saliency generative adversarial network, whose steps are as follows:

Step 1: Build the co-saliency generative adversarial network model: design the network architectures of the generator and the discriminator in the co-saliency generative adversarial network according to the characteristics of the co-saliency detection task, and build the co-saliency generative adversarial network model;

Step 2: Train the co-saliency generative adversarial network model in two stages: in the first training stage, pre-train the network with a labeled salient-object database; in the second training stage, based on the model parameters trained in the first stage, train the network with co-saliency data groups whose co-salient objects belong to the same category;

Step 3: Category co-saliency detection: use the generator of the co-saliency generative adversarial network model trained in Step 2 as a category co-saliency detector, take images belonging to the same category as the detector's input, and directly output, end to end, the co-saliency maps corresponding to the same-category images.

In Step 1 the generator network adopts a U-Net structure and is a fully convolutional network in which the kernel sizes, strides and padding values of the convolutional and deconvolutional layers are set symmetrically, and the last three convolutional layers and the first three deconvolutional layers apply a Dropout operation. The discriminator network is also fully convolutional: after multi-layer convolution it outputs a two-dimensional probability matrix, according to which the discriminator performs patch-level real/fake discrimination of its input image. The generator learns the mapping between the original image and the co-saliency ground-truth map so as to generate a co-saliency map, and the discriminator distinguishes the co-saliency maps generated by the generator from the ground-truth maps as real or fake.

In the first and second training stages of Step 2: when training the generator, the discriminator's network parameters are fixed, the probability that images generated by the generator are judged real by the discriminator is raised, and the generator's parameters are updated; when training the discriminator, the generator's parameters are fixed, the discriminator is made to raise the probability of judging real samples as real and lower the probability of judging generated fake samples as real, and the discriminator's parameters are updated.

In the first and second training stages, the loss function of the generator is:

LG = LG1 + λ·LG2  (1)

θG* = arg min_{θG} LG(θG)  (2)

where LG1 is the generator's adversarial loss, LG2 is the generator's pixel loss, λ is a coefficient that weights the losses, and θG denotes the generator's network model parameters; the adversarial loss LG1 of the generator is:

LG1 = BCE(D(G(Im), Im), Areal)  (3)

BCE(x, y) = y·log x + (1−y)·log(1−x)  (4)

The pixel loss LG2 of the generator is:

LG2 = ||Sm − G(Im)||1  (5)

where Im and Sm denote the m-th input original image and its corresponding salient-object ground-truth map, G(·) denotes the pseudo saliency map generated by the generator, and D(·,·) denotes the two-dimensional probability matrix output by the discriminator. Areal is a two-dimensional matrix whose elements are all 1, with the same size as the probability matrix D(·,·). The function BCE(·,·) computes the binary cross-entropy between the two-dimensional probability matrix D(·,·) and the matrix Areal; its expression is given in formula (4), where x and y are the arguments of BCE(·,·). LG2 is the 1-norm loss between the salient-object ground-truth map and the generated image.

The loss function of the discriminator is expressed as:

LD = BCE(D(Sm, Im), Areal) + BCE(D(G(Im), Im), Afake)  (6)

θD* = arg min_{θD} LD(θD)  (7)

where θD denotes the discriminator's network model parameters and Afake is a two-dimensional matrix whose elements are all 0, with the same size as the probability matrix D(·,·).
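For readability, the objectives in Eqs. (1)-(7) can be restated compactly in LaTeX (a transcription only; note that, for the minimizations to be meaningful, BCE is read with the standard sign convention as the negative of the expression printed in Eq. (4)):

```latex
\begin{aligned}
LG(\theta_G) &= \underbrace{\mathrm{BCE}\big(D(G(I_m), I_m),\, A_{\mathrm{real}}\big)}_{LG_1}
  + \lambda\,\underbrace{\big\lVert S_m - G(I_m)\big\rVert_1}_{LG_2},
  & \theta_G^{*} &= \arg\min_{\theta_G} LG(\theta_G),\\[2pt]
LD(\theta_D) &= \mathrm{BCE}\big(D(S_m, I_m),\, A_{\mathrm{real}}\big)
  + \mathrm{BCE}\big(D(G(I_m), I_m),\, A_{\mathrm{fake}}\big),
  & \theta_D^{*} &= \arg\min_{\theta_D} LD(\theta_D).
\end{aligned}
```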

In the first training stage, the labeled salient-object database is used as the training sample set to train the co-saliency generative adversarial network, which automatically learns the mapping between original images and salient-object ground-truth maps. The concrete implementation is as follows:

Images with pixel-level labels from the salient-object databases PASCAL-1500, HKU-IS and DUTS are used as training data. All original images and the corresponding salient-object ground-truth maps are resized to the generator's input size; each original image is fed into the generator to obtain a pseudo saliency map of the same size, which is compared pixel-wise with the ground-truth map. The pseudo saliency map is concatenated with the original image along the channel dimension as a fake sample, and the ground-truth map is concatenated with the original image as a real sample; both are fed into the discriminator. By minimizing the loss functions, the Adam optimization algorithm is used to iteratively update the generator's network model parameters θG and the discriminator's network model parameters θD.

In Step 2 the image groups are divided according to the category of the common salient objects contained in the images. On the basis of the model parameters trained in the first stage, the co-saliency generative adversarial network is trained in a second stage with the image group corresponding to a given category, so that it learns the mapping between original images and co-saliency ground-truth maps. The concrete implementation is as follows:

Three public co-saliency detection databases that are already grouped by salient-object category — CoSal2015, iCoseg and MSRC-A — are used. All original images and the corresponding co-saliency ground-truth maps are resized to the generator's input size, 50% of the image data in each group are directly selected at random as category training samples, and the network trained in the first stage is trained further, so that the generator automatically learns to extract the common saliency information of the category's samples, yielding a co-saliency detection model for a single category of images.

For any image that belongs to one of the categories of the training samples used in the second-stage training, the image is fed into the generator of the trained co-saliency detection model of the corresponding category; its size must be adjusted to the generator's input size before input, and the image output by the generator is the co-saliency map of the input image:

CoSm = G*(Im)  (8)

where G*(·) denotes the generator of the co-saliency generative adversarial network trained in the two stages, and CoSm is the co-saliency map finally generated for the image Im to be detected.

Compared with the prior art, the beneficial effects of the present invention are as follows. Based on a co-saliency generative adversarial network, a two-stage training mechanism is used for co-saliency detection. Its advantages are: 1) using a salient-object database as training data alleviates the series of training problems that an insufficient amount of co-saliency data would cause; 2) with the two-stage training mechanism, salient-object data serve as the first-stage training data, guaranteeing that the network trained in this stage can detect saliency in a single image; then, exploiting the network's memory, and on the basis of the single-image salient-object detection ability the network has retained, co-saliency image groups serve as the second-stage training data, so that the network learns to capture the correlation information among images of the same category and fuses the single-image saliency detection information with the correlation information within the group, finally acquiring category co-saliency detection ability. Experiments show that the training and detection process of the invention is simple and its detection efficiency high; it significantly improves the generality and accuracy of co-saliency detection and is of great significance for the rapid acquisition of key information.

Brief Description of the Drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a flow chart of the present invention.

FIG. 2 compares the subjective results of the present invention and existing algorithms on the CoSal2015 database.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

As shown in FIG. 1, a co-saliency detection method based on a co-saliency generative adversarial network uses the trained co-saliency generative adversarial network to perform co-saliency detection. Its steps are as follows:

Step 1: Build the co-saliency generative adversarial network model: design the network architectures of the generator and the discriminator in the co-saliency generative adversarial network according to the characteristics of the co-saliency detection task, and build the co-saliency generative adversarial network model.

The co-saliency generative adversarial network model consists of a generator network model and a discriminator network model. A suitable generator network is designed to learn the mapping between original images and co-saliency maps, so that the co-saliency maps generated by the generator are as close as possible to the co-saliency ground-truth maps. A suitable discriminator network is designed to distinguish, as well as possible, the generator's co-saliency maps from the co-saliency ground-truth maps as real or fake, thereby assisting the training of the generator.

The generator network adopts the U-Net structure as a whole and is a fully convolutional network comprising eight convolutional layers and eight deconvolutional layers arranged symmetrically, which guarantees that the generator's output image has the same size as its input. The kernel sizes, strides and padding values of the convolutional and deconvolutional layers are set symmetrically. The last three convolutional layers and the first three deconvolutional layers add a Dropout operation with rate 0.5, i.e., 50% of the previous layer's activations are randomly zeroed, which effectively prevents overfitting. In the generator architecture, on top of the symmetric encoder-decoder structure, short (skip) connections are added to concatenate feature maps of the same size; see the design of the U-Net architecture (Ronneberger O., Fischer P., Brox T., et al., "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Proc. Medi. Image Comput. and Comput. Assis., 2015, pp. 234-241). The purpose of the concatenation is to ensure that the generated image preserves details such as object edges. The network structure of the generator G is shown in Table 1, and an illustrative code sketch follows the table.

Table 1. Network structure of the generator

(Table 1 is reproduced only as an image in the original publication; its contents are not available in the text.)
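Since Table 1 survives only as an image, the following PyTorch sketch illustrates one plausible realization of the generator described above: eight 4×4/stride-2 convolutions, eight mirrored deconvolutions with skip connections, and Dropout(0.5) on the last three convolutional and first three deconvolutional blocks. The channel widths, normalization placement and output activation are assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn

def down(cin, cout, norm=True, dropout=False):
    """4x4/stride-2/pad-1 conv block: halves the spatial resolution."""
    layers = [nn.Conv2d(cin, cout, 4, 2, 1)]
    if norm:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2))
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

def up(cin, cout, dropout=False):
    """Mirrored 4x4/stride-2/pad-1 deconv block: doubles the resolution."""
    layers = [nn.ConvTranspose2d(cin, cout, 4, 2, 1), nn.BatchNorm2d(cout), nn.ReLU()]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

class UNetGenerator(nn.Module):
    """8 conv + 8 deconv U-Net with skip connections; Dropout(0.5) sits on the
    last three conv and first three deconv blocks, as the description states."""
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 512, 512, 512, 512, 512]     # assumed widths
        self.downs = nn.ModuleList(
            down(chans[i], chans[i + 1], norm=0 < i < 7, dropout=i >= 5)
            for i in range(8))                                  # dropout: last 3
        self.ups = nn.ModuleList(
            up(chans[i] if i == 8 else chans[i] * 2,            # *2: skip concat
               chans[i - 1], dropout=i > 5)                     # dropout: first 3
            for i in range(8, 0, -1))
        self.out = nn.Tanh()   # assumed output activation; map ends up in [0, 1)

    def forward(self, x):
        skips = []
        for d in self.downs:
            x = d(x)
            skips.append(x)
        skips = skips[:-1][::-1]        # everything except the 1x1 bottleneck
        for i, u in enumerate(self.ups):
            x = u(x)
            if i < len(skips):          # concatenate same-size encoder features
                x = torch.cat([x, skips[i]], dim=1)
        return self.out(x)
```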

The discriminator network module adopts an encoder structure and is likewise a fully convolutional network. The network structure of the discriminator D is shown in Table 2. The discriminator outputs a 28×28 two-dimensional matrix, each element of which is the probability that the corresponding patch of the input image is judged real, i.e., the input image is discriminated at patch level; see (Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, et al., "Image-to-Image Translation with Conditional Adversarial Networks," arXiv preprint arXiv:1611.07004).

Table 2. Network structure of the discriminator

Layer     Input        Kernel size  Stride  Padding  Output
Conv 1    256×256×6    4            2       1        128×128×64
Conv 2    128×128×64   4            2       1        64×64×128
Conv 3    64×64×128    4            2       1        32×32×256
Conv 4    32×32×256    3            1       0        30×30×512
Conv 5    30×30×512    3            1       0        28×28×1
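A minimal PyTorch sketch that follows Table 2's shapes exactly — the 6-channel input being a saliency map concatenated with the original image — might look as follows; the LeakyReLU and Sigmoid activations are assumptions, since Table 2 specifies only the tensor shapes:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator following Table 2: the 6-channel input
    is a saliency map concatenated with the original image, and the output is a
    28x28 matrix of per-patch 'real' probabilities."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1),     # 256x256x6 -> 128x128x64
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),   # -> 64x64x128
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # -> 32x32x256
            nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 3, stride=1, padding=0),  # -> 30x30x512
            nn.LeakyReLU(0.2),
            nn.Conv2d(512, 1, 3, stride=1, padding=0),    # -> 28x28x1
            nn.Sigmoid(),                                 # per-patch probability
        )

    def forward(self, saliency_map, image):
        # Argument order mirrors the patent's D(., .): map first, image second.
        return self.net(torch.cat([saliency_map, image], dim=1)).squeeze(1)
```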

Step 2: Train the co-saliency generative adversarial network model in two stages: in the first training stage, pre-train the network with a labeled salient-object database rich in data; in the second training stage, based on the model parameters trained in the first stage, train the network for category co-saliency detection with co-saliency data groups whose co-salient objects belong to the same category.

The present invention designs a two-stage training mechanism: pre-training in the first stage equips the generator of the co-saliency generative adversarial network with a preliminary ability to detect salient objects in a single image, and the second training stage further equips the generator with the ability to detect the co-salient objects of multiple images of a given category.

In the two-stage training process, the loss function of the generator is expressed as:

LG = LG1 + λ·LG2  (1)

θG* = arg min_{θG} LG(θG)  (2)

The generator's loss function consists of an adversarial loss and a pixel loss. LG1 and LG2 denote the generator's adversarial loss and pixel loss in the two-stage training, respectively, and λ is a coefficient that weights the losses; the present invention sets λ = 100. θG denotes the network model parameters of the generator G. When training the generator, the discriminator's parameters are fixed and the probability that images generated by the generator are judged real by the discriminator is raised as far as possible, thereby updating the generator's parameters. The adversarial loss of the generator is expressed as:

LG1 = BCE(D(G(Im), Im), Areal)  (3)

BCE(x, y) = y·log x + (1−y)·log(1−x)  (4)

The pixel loss of the generator is expressed as:

LG2 = ||Sm − G(Im)||1  (5)

where Im and Sm denote the m-th input original image and its corresponding saliency ground-truth map; in the first stage Im and Sm come from the salient-object databases required for training, while in the second stage they come from the co-saliency databases. G(·) denotes the pseudo saliency map generated by the generator; the original image Im and the pseudo saliency map are concatenated and used as the input of the discriminator, and D(·,·) denotes the probability matrix output by the discriminator. Its parameters θD are the weights and biases of all neurons of the convolutional layers in the trained discriminator network and are determined by the network structure; the discriminator's output is a two-dimensional probability matrix. The image fed to the discriminator has size 256×256×3 and the output is a 28×28 two-dimensional probability matrix, so each element of the matrix is the probability, in the range [0, 1], that the image patch of size roughly 9×9 at the corresponding position is judged real by the discriminator. Areal is a two-dimensional matrix whose elements are all 1, with the same size as the probability matrix D(·,·); each of its elements corresponds to one image patch of the input sample and states that the probability of that patch being judged real is 1. The function BCE(·,·) computes the binary cross-entropy between the probability matrix D(·,·) and the matrix Areal; its expression is given in formula (4), where x and y are its arguments. LG2 is the 1-norm loss between the salient-object ground-truth map and the generated image.

In the two-stage training process, the loss function of the discriminator is expressed as:

LD(θD) = BCE(D(Sm, θD), Areal) + BCE(D(G(Im, θG), θD), Afake)  (6)

θD* = arg min_{θD} LD(θD)  (7)

In formula (6), θD denotes the network model parameters of the discriminator, Areal is a two-dimensional matrix whose elements are all 1, and Afake is a two-dimensional matrix whose elements are all 0, both with the same size as the probability matrix D(·,·). When training the discriminator, the generator's parameters are fixed so that the discriminator raises, as far as possible, the probability of judging real samples as real and lowers the probability of judging generated fake samples as real, thereby updating the discriminator's parameters.
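Under the same assumptions as the sketches above, the losses of Eqs. (1)-(7) could be computed as follows in PyTorch; `F.binary_cross_entropy` implements the negative of the expression printed in Eq. (4), which is the quantity actually minimized, and λ = 100 as stated in the text:

```python
import torch
import torch.nn.functional as F

def generator_loss(D, G, image, gt_map, lam=100.0):
    """LG = LG1 + lambda * LG2 (Eq. 1): adversarial BCE against an all-ones
    target A_real (Eq. 3) plus the L1 pixel loss (Eq. 5), lambda = 100."""
    fake = G(image)
    pred = D(fake, image)                     # 28x28 patch probabilities
    lg1 = F.binary_cross_entropy(pred, torch.ones_like(pred))
    lg2 = F.l1_loss(fake, gt_map)
    return lg1 + lam * lg2

def discriminator_loss(D, G, image, gt_map):
    """LD (Eq. 6): real pairs scored against A_real (all ones), generated
    pairs against A_fake (all zeros); G's output is detached so that only
    the discriminator's parameters receive gradients here."""
    real_pred = D(gt_map, image)
    fake_pred = D(G(image).detach(), image)
    return (F.binary_cross_entropy(real_pred, torch.ones_like(real_pred))
            + F.binary_cross_entropy(fake_pred, torch.zeros_like(fake_pred)))
```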

In the first training stage, a labeled salient-object database rich in data is used as the sample set for the first-stage training of the co-saliency generative adversarial network, so that the model learns the mapping between original images and salient-object ground-truth maps and first acquires the ability to detect salient objects in a single image. On the basis of the model parameters trained in the first stage, the model inherits the single-image saliency detection ability; the image groups are divided according to the category of the common salient objects contained in the images, and for a given category the corresponding image group is used for the second-stage training of the network. The co-saliency generative adversarial network thus learns the mapping between original images and co-saliency ground-truth maps, and the trained category co-saliency generative adversarial network acquires the ability to detect the common salient objects among multiple images.

First-stage training: a total of 21,517 images with pixel-level labels from the salient-object databases PASCAL-1500, HKU-IS and DUTS are used as training data. All original images and the corresponding salient-object ground-truth maps are resized to 256×256×3, and the original images are fed into the generator. The generated images have size 256×256×3 and are compared pixel-wise with the salient-object ground-truth maps; each generated image is concatenated with the original image along the channel dimension as a fake sample, and each ground-truth map is concatenated with the original image as a real sample, both being fed into the discriminator. The Adam optimization algorithm is used to iteratively update the model parameters, with the batch size, learning rate, Dropout rate and number of epochs set to 1, 0.0002, 0.5 and 100, respectively. The batch size is the number of samples used for one update of the network parameters, the learning rate is the magnitude of each parameter update during training, and an epoch is one pass through all training samples. The Dropout operation is added only in convolutional layers 6-8 and deconvolutional layers 1-3, giving the generator network structure a degree of robustness.
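The alternating update described above might be arranged as in the following sketch; `loader` is an assumed DataLoader yielding resized image/ground-truth pairs, and the hyperparameters follow the text (Adam, batch size 1, learning rate 0.0002, 100 epochs):

```python
import torch

# Sketch of the stage-one alternating optimization; G, D and the loss
# functions are the sketches above, and `loader` is an assumed DataLoader
# over the PASCAL-1500 / HKU-IS / DUTS images resized to 256x256.
G, D = UNetGenerator(), PatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

for epoch in range(100):                      # Epoch = 100 in stage one
    for image, gt_map in loader:              # batch size 1 per the text
        opt_d.zero_grad()
        discriminator_loss(D, G, image, gt_map).backward()
        opt_d.step()                          # update D with G fixed

        opt_g.zero_grad()
        generator_loss(D, G, image, gt_map).backward()
        opt_g.step()                          # update G; D is not stepped
```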

Second-stage training: on the basis of the co-saliency generative adversarial network parameters trained in the first stage — that is, taking the first-stage parameters as the initial parameters of the second stage — the model is trained to acquire category co-saliency detection ability. Three public co-saliency detection databases, CoSal2015, iCoseg and MSRC-A, are used for model training. Before training, all original images and the corresponding ground-truth maps are resized to 256×256. The images in these three databases are already grouped according to whether their co-salient objects belong to the same class, so 50% of the data in each group are directly selected at random for the second-stage training. Training on samples of the same category drives the model to learn to extract the common saliency information of that category, yielding a co-saliency detection model for a single category of images. Except that the number of epochs is 400, the training parameter settings are the same as in the first stage.
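Stage two differs only in initialization, data and epoch count; a sketch (the checkpoint file names and `category_loader` are illustrative assumptions):

```python
# Stage two: initialize from the stage-one parameters and fine-tune on one
# category's group (50% of its images). File names are illustrative.
G.load_state_dict(torch.load("stage1_G.pt"))
D.load_state_dict(torch.load("stage1_D.pt"))

for epoch in range(400):                      # Epoch = 400 in stage two
    for image, cosal_gt in category_loader:   # assumed per-category DataLoader
        opt_d.zero_grad()
        discriminator_loss(D, G, image, cosal_gt).backward()
        opt_d.step()
        opt_g.zero_grad()
        generator_loss(D, G, image, cosal_gt).backward()
        opt_g.step()
```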

Step 3: Category co-saliency detection: use the generator of the co-saliency generative adversarial network model trained in Step 2 as the category co-saliency detector, take images belonging to the same category as the second-stage training samples as the detector's input, and directly output, end to end, the co-saliency maps corresponding to the images.

The generator of the two-stage-trained co-saliency generative adversarial network model serves directly as the category co-saliency detector. For any image that belongs to one of the categories of the training samples used in the second stage, the image is fed into the generator of the trained co-saliency detection model of the corresponding category; before input its size is unified to 256×256×3, and the image output by the generator, also of size 256×256×3, is the co-saliency map of the input image. The image generated by the generator is taken directly as the co-saliency map, as shown in the following formula:

CoSm = G*(Im)  (8)

where G*(·) denotes the generator of the co-saliency generative adversarial network trained in the two stages, and CoSm is the co-saliency map finally generated for the image Im to be detected.
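At inference time, Eq. (8) reduces to a single forward pass through the trained generator; a sketch (the file name is illustrative):

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

# CoS_m = G*(I_m): resize the query image to the generator's input size,
# run one forward pass, and take the output as the co-saliency map.
G.eval()
img = Image.open("query.jpg").convert("RGB").resize((256, 256))
x = TF.to_tensor(img).unsqueeze(0)            # shape (1, 3, 256, 256)
with torch.no_grad():
    cosal_map = G(x)                          # the co-saliency map CoS_m
```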

This completes co-saliency detection for a group of images containing objects of the same category, i.e., image co-saliency detection.

Experiments for the present invention were carried out on a workstation with an Intel(R) Xeon E5-2650 v3 @ 2.30 GHz ×20 CPU, an NVIDIA GTX TITAN-XP GPU and 128 GB of graphics memory; the software environment was Ubuntu 16.04 with the deep learning framework PyTorch 1.0.

To verify the detection performance and efficiency of the present invention, the method of the present invention was compared with six co-saliency detection methods on the CoSal2015 database in terms of per-image detection time and subjective results. On the CoSal2015 database, under the same hardware environment, the detection times of the four methods with public code — SACS-R, SACS, CBCS and ESMG — were compared with that of the present invention, as shown in Table 3.

Table 3. Comparison of the detection times of existing algorithms on the CoSal2015 database

Algorithm        SACS-R   SACS     CBCS     ESMG     Present invention
Code type        MATLAB   MATLAB   MATLAB   MATLAB   Python
Detection time   8.873 s  2.652 s  1.688 s  1.723 s  0.562 s

The methods participating in the subjective comparison are LDAW (D. Zhang, J. Han, C. Li et al., "Co-saliency detection via looking deep and wide," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 2994-3002), SACS-R and SACS (X. Cao, Z. Tao, B. Zhang et al., "Self-Adaptively Weighted Co-Saliency Detection via Rank Constraint," IEEE Trans. Image Process., vol. 23, no. 9, pp. 4175-4186, 2014), AUM (J. Han, G. Cheng, Z. Li et al., "A Unified Metric Learning-Based Framework for Co-Saliency Detection," IEEE Trans. on Cir. and Sys. for Vid. Tech., vol. 28, no. 10, pp. 2473-2483, 2018), CBCS (H. Fu, X. Cao, and Z. Tu, "Cluster-Based Co-Saliency Detection," IEEE Trans. Image Process., vol. 22, no. 10, pp. 3766-3778, 2013) and ESMG (A. Joulin, K. Tang, and L. Feifei, "Efficient Image and Video Co-localization with Frank-Wolfe Algorithm," in Proc. IEEE Eur. Conf. Comput. Vis., 2014, pp. 253-268). The subjective comparison results for some groups of the CoSal2015 database — starfish, frog and rubber — are shown in FIG. 2.

As FIG. 2 and Table 3 show, the present invention takes the shortest time to perform co-saliency detection on an image, i.e., it is the most efficient; and compared with the other existing methods, the co-saliency maps obtained by the present invention are the closest to the ground-truth maps.

The present invention comprises the construction of a co-saliency generative adversarial network model and two-stage model training. In the first training stage, an annotated salient-object dataset rich in data is used as the training set to alleviate the training problems caused by the insufficient amount of annotated data in the co-saliency field, while at the same time equipping the trained network with single-image saliency detection ability. In the second stage, the model is trained with annotated co-saliency image groups; using the model's memory, it further acquires category co-saliency detection ability on top of its single-image saliency detection ability. The two-stage-trained generator serves as the category co-saliency detector and outputs, end to end, the co-saliency maps of images of that category. The invention effectively uses the network's memory, the saliency of each single image and the internal correlation information of same-category image groups to perform co-saliency detection among multiple images; the training and detection processes are simple and the detection efficiency is high.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (5)

1. A co-saliency detection method based on a co-saliency generative adversarial network, characterized in that its steps are as follows:

Step 1: build a co-saliency generative adversarial network model: design the network architectures of the generator and the discriminator in the co-saliency generative adversarial network according to the characteristics of the co-saliency detection task, and build the co-saliency generative adversarial network model;

Step 2: train the co-saliency generative adversarial network model in two stages: in the first training stage, pre-train the co-saliency generative adversarial network with a labeled salient-object database; in the second training stage, based on the model parameters trained in the first stage, train the network with co-saliency data groups whose co-salient objects belong to the same category;

Step 3: category co-saliency detection: use the generator of the co-saliency generative adversarial network model trained in Step 2 as a category co-saliency detector, take images belonging to the same category as the input of the category co-saliency detector, and directly output, end to end, the co-saliency maps corresponding to the same-category images;

in the first training stage, the labeled salient-object database is used as the training sample set to train the co-saliency generative adversarial network, which automatically learns the mapping between original images and salient-object ground-truth maps; the concrete implementation is as follows:

images with pixel-level labels from the salient-object databases PASCAL-1500, HKU-IS and DUTS are used as training data; all original images and the corresponding salient-object ground-truth maps are resized to the generator's input size; each original image is fed into the generator to obtain a pseudo saliency map of the same size, which is compared pixel-wise with the ground-truth map; the pseudo saliency map is concatenated with the original image along the channel dimension as a fake sample and the ground-truth map is concatenated with the original image as a real sample, and both are fed into the discriminator; by minimizing the loss functions, the Adam optimization algorithm is used to iteratively update the generator's network model parameters θG and the discriminator's network model parameters θD;

in Step 2 the image groups are divided according to the category of the common salient objects contained in the images; based on the model parameters trained in the first stage, for a given category the corresponding image group is used to train the co-saliency generative adversarial network in a second stage, so that it learns the mapping between original images and co-saliency ground-truth maps; the concrete implementation is as follows:

three public co-saliency detection databases already grouped by salient-object category — CoSal2015, iCoseg and MSRC-A — are used; all original images and the corresponding co-saliency ground-truth maps are resized to the generator's input size, 50% of the image data in each group are directly selected at random as category training samples, and the network trained in the first stage is trained further so that the generator automatically learns to extract the common saliency information of the category's samples, yielding a co-saliency detection model for a single category of images.

2. The co-saliency detection method based on a co-saliency generative adversarial network according to claim 1, characterized in that in Step 1 the generator network adopts a U-Net structure and is a fully convolutional network, in which the kernel sizes, strides and padding values of the convolutional and deconvolutional layers are set symmetrically and the last three convolutional layers and the first three deconvolutional layers apply a Dropout operation; the discriminator network is also a fully convolutional network: after multi-layer convolution it outputs a two-dimensional probability matrix, according to which patch-level real/fake discrimination of the discriminator's input image is performed; the generator learns the mapping between the original image and the co-saliency ground-truth map so as to generate a co-saliency map, and the discriminator distinguishes the co-saliency maps generated by the generator from the ground-truth maps as real or fake.

3. The co-saliency detection method based on a co-saliency generative adversarial network according to claim 1 or 2, characterized in that in the first and second training stages of Step 2: when training the generator, the discriminator's network parameters are fixed, the probability that images generated by the generator are judged real by the discriminator is raised, and the generator's parameters are updated; when training the discriminator, the generator's parameters are fixed, the discriminator is made to raise the probability of judging real samples as real and to lower the probability of judging generated fake samples as real, and the discriminator's parameters are updated.

4. The co-saliency detection method based on a co-saliency generative adversarial network according to claim 3, characterized in that in the first and second training stages the loss function of the generator is:

LG = LG1 + λ·LG2  (1)

θG* = arg min_{θG} LG(θG)  (2)

where LG1 is the generator's adversarial loss, LG2 is the generator's pixel loss, λ is a coefficient that weights the losses, and θG denotes the generator's network model parameters; the adversarial loss LG1 of the generator is:

LG1 = BCE(D(G(Im), Im), Areal)  (3)

BCE(x, y) = y·log x + (1−y)·log(1−x)  (4)

the pixel loss LG2 of the generator is:

LG2 = ||Sm − G(Im)||1  (5)

where Im and Sm denote the m-th input original image and its corresponding salient-object ground-truth map, G(·) denotes the pseudo saliency map generated by the generator, D(·,·) denotes the two-dimensional probability matrix output by the discriminator, and Areal is a two-dimensional matrix whose elements are all 1, with the same size as the probability matrix D(·,·); the function BCE(·,·) computes the binary cross-entropy between the two-dimensional probability matrix D(·,·) and the matrix Areal, its expression being given in formula (4), where x and y are the arguments of BCE(·,·);

the loss function of the discriminator is expressed as:

LD = BCE(D(Sm, Im), Areal) + BCE(D(G(Im), Im), Afake)  (6)

θD* = arg min_{θD} LD(θD)  (7)

where θD denotes the discriminator's network model parameters and Afake is a two-dimensional matrix whose elements are all 0, with the same size as the probability matrix D(·,·).

5. The co-saliency detection method based on a co-saliency generative adversarial network according to claim 1, characterized in that for any image that belongs to one of the categories of the training samples used in the second-stage training, the image is fed into the generator of the trained co-saliency detection model of the corresponding category, its size being adjusted to the generator's input size before input, and the image output by the generator is the co-saliency map of the input image:

CoSm = G*(Im)  (8)

where G*(·) denotes the generator of the co-saliency generative adversarial network trained in the two stages, and CoSm is the co-saliency map finally generated for the image Im to be detected.
CN201911368623.3A 2019-12-26 2019-12-26 Co-saliency detection method based on co-saliency generative adversarial network Active CN111027576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911368623.3A CN111027576B (en) 2019-12-26 2019-12-26 Co-saliency detection method based on co-saliency generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911368623.3A CN111027576B (en) 2019-12-26 2019-12-26 Co-saliency detection method based on co-saliency generative adversarial network

Publications (2)

Publication Number Publication Date
CN111027576A CN111027576A (en) 2020-04-17
CN111027576B true CN111027576B (en) 2020-10-30

Family

ID=70213922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911368623.3A Active CN111027576B (en) 2019-12-26 2019-12-26 Co-saliency detection method based on co-saliency generative adversarial network

Country Status (1)

Country Link
CN (1) CN111027576B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898507B (en) * 2020-07-22 2022-06-03 武汉大学 Deep learning method for predicting earth surface coverage category of label-free remote sensing image
CN112507933B (en) * 2020-12-16 2022-09-16 南开大学 Saliency target detection method and system based on centralized information interaction
CN112598043B (en) * 2020-12-17 2023-08-18 杭州电子科技大学 A Cooperative Saliency Detection Method Based on Weakly Supervised Learning
CN112651940B (en) * 2020-12-25 2021-09-17 郑州轻工业大学 Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN114743027B (en) * 2022-04-11 2023-01-31 郑州轻工业大学 Cooperative saliency detection method guided by weakly supervised learning
CN114820487B (en) * 2022-04-15 2025-04-11 国网黑龙江省电力有限公司电力科学研究院 Transmission line information extraction method from UAV images based on generative adversarial network
CN115035310A (en) * 2022-05-19 2022-09-09 桂林理工大学 Topographic feature line extraction method and device and storage medium
CN116109496A (en) * 2022-11-15 2023-05-12 济南大学 X-ray film enhancement method and system based on dual-stream structure protection network
CN116994006B (en) * 2023-09-27 2023-12-08 江苏源驶科技有限公司 Collaborative saliency detection method and system for fusing image saliency information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107123150A (en) * 2017-03-25 2017-09-01 复旦大学 The method of global color Contrast Detection and segmentation notable figure
CN109711283A (en) * 2018-12-10 2019-05-03 广东工业大学 An Algorithm for Occlusion Expression Recognition Combined with Double Dictionary and Error Matrix
CN110110576A (en) * 2019-01-03 2019-08-09 北京航空航天大学 A kind of traffic scene thermal infrared semanteme generation method based on twin semantic network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845471A (en) * 2017-02-20 2017-06-13 深圳市唯特视科技有限公司 A kind of vision significance Forecasting Methodology based on generation confrontation network
CN107346436B (en) * 2017-06-29 2020-03-24 北京以萨技术股份有限公司 Visual saliency detection method fusing image classification
CN109727264A (en) * 2019-01-10 2019-05-07 南京旷云科技有限公司 Image generating method, the training method of neural network, device and electronic equipment
CN110310343B (en) * 2019-05-28 2023-10-03 西安万像电子科技有限公司 Image processing method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107123150A (en) * 2017-03-25 2017-09-01 复旦大学 The method of global color Contrast Detection and segmentation notable figure
CN109711283A (en) * 2018-12-10 2019-05-03 广东工业大学 An Algorithm for Occlusion Expression Recognition Combined with Double Dictionary and Error Matrix
CN110110576A (en) * 2019-01-03 2019-08-09 北京航空航天大学 A kind of traffic scene thermal infrared semanteme generation method based on twin semantic network

Also Published As

Publication number Publication date
CN111027576A (en) 2020-04-17

Similar Documents

Publication Publication Date Title
CN111027576B (en) Co-saliency detection method based on co-saliency generative adversarial network
Xu et al. Learning deep structured multi-scale features using attention-gated crfs for contour prediction
CN110765860B (en) Tumble judging method, tumble judging device, computer equipment and storage medium
CN108734210B (en) An object detection method based on cross-modal multi-scale feature fusion
Li et al. Cross-modal attentional context learning for RGB-D object detection
CN110852383B (en) Target detection method and device based on attention mechanism deep learning network
US20180114071A1 (en) Method for analysing media content
CN110516536A (en) A Weakly Supervised Video Behavior Detection Method Based on the Complementation of Temporal Category Activation Maps
US8379994B2 (en) Digital image analysis utilizing multiple human labels
CN110503076B (en) Video classification method, device, equipment and medium based on artificial intelligence
CN113468996B (en) Camouflage object detection method based on edge refinement
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN112528845B (en) A deep learning-based physical circuit diagram recognition method and its application
CN110781744A (en) A small-scale pedestrian detection method based on multi-level feature fusion
US11276249B2 (en) Method and system for video action classification by mixing 2D and 3D features
CN107247952B (en) Deep supervision-based visual saliency detection method for cyclic convolution neural network
CN114743045A (en) A Small-Sample Object Detection Method Based on Dual-branch Region Proposal Network
Dong et al. Holistic and Deep Feature Pyramids for Saliency Detection.
Khellal et al. Pedestrian classification and detection in far infrared images
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
JP2015036939A (en) Feature extraction program and information processing apparatus
US11868878B1 (en) Executing sublayers of a fully-connected layer
CN117315752A (en) Training method, device, equipment and medium for face emotion recognition network model
Chen et al. LMSA‐Net: A lightweight multi‐scale aware network for retinal vessel segmentation
Zhu et al. A novel simple visual tracking algorithm based on hashing and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant