CN113837200A

CN113837200A - Autonomous learning method in visual saliency detection

Info

Publication number: CN113837200A
Application number: CN202111012352.5A
Authority: CN
Inventors: 王涵宇; 王致畅; 边疆; 裴轶敏; 章涛; 潘晨
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2021-08-31
Filing date: 2021-08-31
Publication date: 2021-12-24

Abstract

The invention discloses an autonomous learning method in visual saliency detection, comprising the following steps: (1) two parallel visual perception channels are constructed from two supervised deep SOD models, forming a salient object detection framework with dual visual information flows; (2) the difference between the binarized masks of the saliency maps output by the two perception channels at the same moment is compared to judge the degree of perceptual saturation of the salient object region; (3) if the perceptual saturation is high, the saliency maps output by the two channels are superimposed to generate a final saliency map, whose binarized mask is regarded as a high-confidence object region; a certain number of such high-confidence, automatically labeled object regions are collected to form a training sample set labeled autonomously by the algorithm, which is used for further autonomous learning updates of the two supervised deep SOD models in step (1).

Description

Autonomous learning method in visual saliency detection

Technical Field

The invention relates to the technical field of computer vision, and in particular to a method for autonomous learning in salient object detection by means of a visual perception saturation mechanism.

Background

Visual attention and visual saliency are fundamental research topics in psychology, neurobiology, cognitive science, and computer vision. In recent decades, hundreds of fixation prediction (FP) and salient object detection (SOD) methods have been proposed for visual attention modeling. The best-performing methods at present are deep SOD models built on deep learning frameworks. However, the most difficult step in constructing a deep learning model is manually labeling a large number of training images at the pixel level, and overall system performance depends on this manually labeled data. Manual labeling is time-consuming and expensive. Furthermore, models trained on finely labeled datasets tend to overfit and generalize poorly. Deep SOD models that can be trained autonomously without manual intervention have therefore become a research hotspot.

We note that most deep SOD models published to date process information in a single perception pathway. For example, in SOD based on a fully convolutional network (FCN), different features come from different convolutional layers, and feature fusion is always performed within the same FCN; a single deep SOD model rarely exhibits perceptual fusion at the decision level. Although a few multi-channel or multi-branch detection models can fuse the results of several perception channels, most of them aim only at extracting and fusing multi-scale features, and rarely reveal the interaction and interrelation within a multi-channel perception system.

Psychological and physiological experiments show that human intuition and memory can produce visual perception simultaneously and interact with each other. For example, the human binocular system forms two channels for processing visual information, and likely serves other functions besides forming stereoscopic vision. We believe that simulating human binocular perception by establishing two parallel, slightly different SOD perception channels may benefit the salient object detection task. A system with multiple perception channels can simultaneously produce object perceptions and output the differences between them. When the outputs of the channels are very similar, that is, the difference is very small, the same object is being detected by multiple channels at once, and perception tends toward saturation. Smaller multi-channel perceptual differences correspond to higher visual perceptual saturation, which can serve as a confidence level for the object detected by the multi-channel system. Using this mechanism, a multi-channel algorithm can automatically find high-confidence salient objects in an image, and the high-confidence object regions can be used as automatically labeled samples for iteratively updating the deep SOD models.

Disclosure of Invention

In view of this, the invention proposes a salient object detection framework with two perception channels that simulates human binocular vision: a high-confidence object region is found by comparing the difference between the two channels' visual perceptions; a new training sample set is constructed from the high-confidence object regions; and the object detection models are continuously optimized through iterative self-learning. The purpose of the invention is achieved by the following technical scheme:

1) constructing two parallel visual perception channels from two different supervised deep SOD models, forming a salient object detection framework with dual visual information flows;

2) judging the degree of perceptual saturation of the salient object region by comparing the difference between the binarized masks of the saliency maps output by the two perception channels at the same moment; if the difference is small, the saturation is high, and if the difference is large, the saturation is low.

3) When the perceptual saturation output in step 2) exceeds a preset empirical threshold, visual perception is considered close to saturation; the saliency maps output by the two channels can be superimposed to generate a final saliency map, and the binarized mask of that saliency map is regarded as a high-confidence object region. A certain number of such high-confidence, automatically labeled object regions are collected to form a training sample set labeled by the algorithm itself, which is used for further self-learning updates of the two supervised deep SOD models in step 1). If the perceptual saturation value output in step 2) is below the preset threshold, visual perception is undersaturated and the detected salient object region has low confidence, so it is not selected for the training sample set.

4) After the SOD models in step 1) are updated, the method obtains salient object detection results with better performance.

5) With a limited amount of test data, after the deep SOD models in the method have been iteratively updated 1-2 times, salient object detection accuracy no longer improves noticeably and system performance tends toward saturation; to obtain better performance, the algorithm needs to process a larger-scale data set, collect more high-confidence training samples, and iteratively update the SOD models in the two channels.
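
The five steps above can be read as a self-training loop. The following is a minimal sketch of that loop only, not the implementation of the invention: the names `self_learning`, `label_fn`, and `retrain` are hypothetical, each model is assumed to be a callable that maps an image to a saliency map, and `label_fn` is assumed to return a binary mask when the two channels agree closely enough and `None` otherwise.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np

# Hypothetical interfaces, assumed for illustration only.
Model = Callable[[np.ndarray], np.ndarray]  # image -> saliency map
Labeler = Callable[[np.ndarray, np.ndarray], Optional[np.ndarray]]
Sample = Tuple[np.ndarray, np.ndarray]      # (image, automatically labeled mask)
Retrain = Callable[[Model, List[Sample]], Model]


def self_learning(
    images: Sequence[np.ndarray],
    model_a: Model,
    model_b: Model,
    label_fn: Labeler,
    retrain: Retrain,
    rounds: int = 2,
) -> Tuple[Model, Model, List[int]]:
    """Run a few rounds of automatic labeling followed by model updates.

    Returns the two updated models and the number of samples that were
    accepted in each round.
    """
    history: List[int] = []
    for _ in range(rounds):
        samples: List[Sample] = []
        for image in images:
            # Steps 1) and 2): run both channels; label_fn compares them.
            mask = label_fn(model_a(image), model_b(image))
            # Step 3): keep only regions the two channels agree on.
            if mask is not None:
                samples.append((image, mask))
        history.append(len(samples))
        if not samples:
            break  # nothing reliable was found, so no update is possible
        # Step 4): update both channels with the self-labeled samples.
        model_a = retrain(model_a, samples)
        model_b = retrain(model_b, samples)
    return model_a, model_b, history
```

The default of two rounds mirrors the observation in step 5) that, on a fixed data set, accuracy stops improving after 1-2 updates; the returned `history` makes it easy to see when the number of accepted samples stops growing.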

Drawings

FIG. 1 is a block diagram of the autonomous learning method in visual saliency detection.

Detailed Description

The present invention is further illustrated by the following specific examples, but the present invention is not limited to these examples.

The invention is intended to cover alternatives, modifications, and equivalents that may be included within the spirit and scope of the invention. In the following description of the preferred embodiments of the present invention, specific details are set forth in order to provide a thorough understanding of the present invention, and it will be apparent to those skilled in the art that the present invention may be practiced without these specific details.

As shown in FIG. 1, the self-learning method in visual salient object detection of the present invention is implemented by the following steps:

1) A parallel two-channel salient object detection system is designed. The system consists of two detection channels with similar structures, simulates the human binocular system, and obtains high-confidence object perception through a visual saturation mechanism;

2) Each detection channel can adopt a recent deep SOD model, such as PiCANet and F3Net. First, an initial training set is used for offline pre-training to produce two deep SOD models, which form the initial system. The initial system performs salient object detection on an image to obtain the saliency maps output by the two channels, and thresholding these saliency maps yields two binary mask images. The two mask regions are compared, and the difference between the masks is measured with an F-measure (see formula (1)); the larger the F-measure value, the higher the perceptual saturation, and conversely, the lower the saturation.
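
Formula (1) is not reproduced in this text. As a hedged reconstruction, assuming it is the standard F-measure used in the salient object detection literature, with M1 treated as the prediction and M2 as the reference (both the form and this role assignment are assumptions, not taken from the filing), it would read:

```latex
F_\beta = \frac{(1+\beta^2)\, P \cdot R}{\beta^2 P + R}, \qquad
P = \frac{|M_1 \cap M_2|}{|M_1|}, \qquad
R = \frac{|M_1 \cap M_2|}{|M_2|} \tag{1}
```

Here |M| denotes the number of foreground pixels in a mask, P plays the role of precision, and R the role of recall. Under this form the measure equals 1 when the two masks coincide and 0 when they do not overlap. A value of 0.3 for the weighting factor is common in the SOD literature, but the value used by the method is not stated here.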

Where:

β² is an empirical weighting factor, M1 is the binary mask output by one perception channel, and M2 is the binary mask output by the other perception channel.

3) When the F-measure value representing perceptual saturation is greater than the threshold th, the saliency maps output by the two perception channels are superimposed and fused into a new output map, and the binary mask of that map is used as an automatic annotation and becomes a new training sample. By testing on a large number of images, a certain number of high-confidence automatic annotations can be collected to form a new training sample set.

4) The new training sample set can be used to iteratively update the deep SOD models in both detection channels.

5) Because the test data is limited, the number of high-confidence labeled samples obtained automatically by the method stops growing after reaching a certain number and tends toward saturation. This in turn limits the iterative updating of the deep SOD models: after 1-2 iterations of updating, salient object detection accuracy no longer improves noticeably, and system performance tends toward saturation. To achieve better performance, the algorithm can process a larger-scale data set, generate more high-confidence training samples, and iteratively update the SOD models in the two channels.
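
Steps 2) and 3) of this embodiment, that is, thresholding, mask comparison, and fusion, can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, the F-measure takes the standard precision/recall form with the first mask treated as the prediction, superposition is modeled as a pixel-wise average, and the values `level=0.5`, `beta2=0.3`, and `th=0.9` are placeholders, since the text does not give the thresholds it uses.

```python
import numpy as np


def binarize(saliency, level=0.5):
    """Threshold a saliency map with values in [0, 1] into a boolean mask."""
    return np.asarray(saliency, dtype=float) >= level


def mask_f_measure(m1, m2, beta2=0.3):
    """F-measure between two boolean masks.

    Assumption: m1 is treated as the prediction and m2 as the reference.
    """
    m1 = np.asarray(m1, dtype=bool)
    m2 = np.asarray(m2, dtype=bool)
    overlap = np.logical_and(m1, m2).sum()
    if overlap == 0:
        return 0.0  # no shared foreground, so perception is far from saturated
    precision = overlap / m1.sum()
    recall = overlap / m2.sum()
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))


def auto_label(sal1, sal2, th=0.9, level=0.5, beta2=0.3):
    """Return a fused binary mask if the two channels agree, otherwise None."""
    saturation = mask_f_measure(binarize(sal1, level), binarize(sal2, level), beta2)
    if saturation <= th:
        return None  # undersaturated: the region is not trusted as a label
    # Assumption: "superposition" is modeled as a pixel-wise average.
    fused = (np.asarray(sal1, dtype=float) + np.asarray(sal2, dtype=float)) / 2.0
    return binarize(fused, level)
```

Calling `auto_label` on every test image and keeping the non-`None` results yields the new training sample set described in step 3). Because the measure is asymmetric in its two arguments when `beta2` differs from 1, swapping the channels can change the score slightly.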

The foregoing is illustrative of the preferred embodiments of the present invention only and is not to be construed as limiting the claims. The present invention is not limited to the above embodiments, and the specific structure thereof is allowed to vary. In general, all changes which come within the scope of the invention as defined by the independent claims are intended to be embraced therein.

Claims

1. A self-learning method in visual saliency detection, characterized in that it is realized by the following steps:

1) Two parallel visual perception channels are constructed from two different supervised deep salient object detection (SOD) models, forming a salient object detection framework with dual visual information flows; the perceptual saturation of the salient object region is measured by comparing the difference between the binarized masks of the saliency maps output by the two perception channels at the same moment; if the mask difference is small, the saturation is high, and if the difference is large, the saturation is low; the perceptual saturation of the object can be used as an expression of object detection reliability.

2) When the perceptual saturation output in step 1) exceeds a preset empirical threshold, visual perception is considered close to saturation, and the saliency maps output by the two channels can be superimposed to generate a final saliency map, whose binarized mask is regarded as a high-confidence object region; a certain number of such high-confidence, automatically labeled object regions are collected to form a training sample set labeled autonomously by the algorithm, which is used for further self-learning updates of the two supervised deep SOD models in step 1); conversely, if the perceptual saturation value output in step 1) is below the preset threshold, visual perception is undersaturated, and the salient object region output by the detection has low reliability and is not selected for the training sample set.

3) After a certain number of self-labeled training samples have been collected, the two SOD models in step 1) can be retrained and updated; the updated SOD models obtain salient object detection results with better performance; with limited test data, as the SOD models are updated iteratively, the salient object detection performance of the whole system eventually stops improving and tends toward saturation.