
CN113837200A - Autonomous learning method in visual saliency detection - Google Patents


Info

Publication number
CN113837200A
CN113837200A
Authority
CN
China
Prior art keywords
saturation
sod
detection
saliency map
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111012352.5A
Other languages
Chinese (zh)
Inventor
王涵宇
王致畅
边疆
裴轶敏
章涛
潘晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University filed Critical China Jiliang University
Priority to CN202111012352.5A priority Critical patent/CN113837200A/en
Publication of CN113837200A publication Critical patent/CN113837200A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an autonomous learning method for visual saliency detection, comprising the following steps: (1) constructing two parallel visual perception channels from two supervised deep SOD models, forming a salient object detection framework with dual visual information flows; (2) comparing the difference between the binarized masks of the saliency maps output by the two channels at the same moment, in order to judge the perceptual saturation of the salient object region; (3) if the perceptual saturation is high, superimposing the saliency maps output by the two channels into a final saliency map whose binarized mask is regarded as a high-confidence object region. A number of such high-confidence, automatically annotated object regions are collected to form a training sample set labeled autonomously by the algorithm, which is used for further autonomous learning updates of the two supervised deep SOD models in step (1).

Description

Autonomous learning method in visual saliency detection
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for autonomous learning in salient object detection by means of a visual perception saturation mechanism.
Background
Visual attention and visual saliency are fundamental research issues in psychology, bioneurology, cognitive science, and computer vision. In recent decades, hundreds of eye-fixation prediction (FP) and salient object detection (SOD) methods have been proposed for visual attention modeling. The best-performing approaches at present are deep SOD models built on deep learning frameworks. However, the most laborious step in constructing a deep learning model is manually annotating a large number of training images at the pixel level, and overall system performance depends on this manually labeled data. Manual annotation is time-consuming and expensive. Furthermore, models trained on finely labeled datasets tend to overfit and generalize poorly. A deep SOD model that can be trained autonomously, without manual intervention, has therefore become a research hotspot.
We note that most deep SOD models published to date process information in a single sensing pathway. For example, in SOD based on fully convolutional networks (FCNs), different features come from different convolutional layers, but feature fusion is always performed within the same FCN; a single deep SOD model rarely exhibits perceptual fusion at the decision level. Although a few multi-channel or multi-branch detection models do fuse results from multiple sensing channels, most of them only target the extraction and fusion of multi-scale features, and rarely exploit the interaction and interrelation of a multi-channel perception system.
Psychological and physiological experiments show that human intuition and memory produce visual percepts simultaneously and interact with each other. For example, the human binocular system forms two channels for processing visual information, and it plausibly serves functions beyond forming stereoscopic vision. We believe that simulating human binocular perception by establishing two parallel, slightly different SOD sensing channels may benefit the salient object detection task. A system with multiple sensing channels can simultaneously generate object percepts and output the differences between them. When the outputs of the channels are very similar, i.e., the difference is very small, the same object has been detected by multiple channels at the same time, and perception tends toward saturation; a smaller multi-channel perceptual difference corresponds to higher visual perceptual saturation, which can be read as a confidence level for the object detected by the multi-channel system. Using this mechanism, a multi-channel algorithm can automatically find high-confidence salient objects in an image, and the high-confidence object regions serve as automatically annotated samples for iteratively updating the deep SOD models.
Disclosure of Invention
In view of this, the invention proposes a salient object detection framework with two perception channels that simulates human binocular vision: finding high-confidence object regions by comparing the difference between the two channels' visual percepts; constructing a new training sample set from the high-confidence object regions; and continuously optimizing the object detection models through self-iterative learning. The purpose of the invention is achieved by the following technical scheme:
1) Constructing two parallel visual perception channels from two different supervised deep SOD models, forming a salient object detection framework with dual visual information flows.
2) Judging the perceptual saturation of the salient object region by comparing the difference between the binarized masks of the saliency maps output by the two perception channels at the same moment; a small difference means high saturation, a large difference low saturation.
3) When the perceptual saturation output in step 2) exceeds a preset empirical threshold, visual perception is considered close to saturation; the saliency maps output by the two channels are superimposed into a final saliency map, and the binary mask of this map is regarded as a high-confidence object region. A number of such automatically annotated object regions are collected to form an algorithm-labeled training sample set, used for further autonomous learning updates of the two supervised deep SOD models in step 1). Conversely, if the perceptual saturation output in step 2) is below the preset threshold, visual perception is under-saturated; the detected salient object region then has low confidence and is not selected into the training sample set.
4) Once the SOD models in step 1) are updated, the method obtains salient object detection results with better performance.
5) With a limited amount of test data, after 1-2 iterative updates of the deep SOD models, salient object detection accuracy no longer improves noticeably and system performance tends to saturate; to obtain better performance, the algorithm must process a larger-scale dataset, collect more high-confidence training samples, and iteratively update the SOD models in the two channels.
Drawings
FIG. 1 is a block diagram of an autonomous learning method in visual saliency detection;
Detailed Description
The present invention is further illustrated by the following specific examples, but the present invention is not limited to these examples.
The invention is intended to cover alternatives, modifications, equivalents, and alternatives that may be included within the spirit and scope of the invention. In the following description of the preferred embodiments of the present invention, specific details are set forth in order to provide a thorough understanding of the present invention, and it will be apparent to those skilled in the art that the present invention may be practiced without these specific details.
As shown in FIG. 1, the implementation of the self-learning method in the detection of the visually significant object of the present invention comprises the following steps:
1) A parallel two-channel salient object detection system is designed. The system consists of two detection channels with similar structures; it simulates the human binocular system and obtains high-confidence object percepts through a visual saturation mechanism.
2) Each detection channel can adopt a current state-of-the-art deep SOD model, such as PiCANet or F3Net. First, an initial training set is used for offline pre-training, producing two deep SOD models that form the initial system. The initial system detects salient objects in an image, yielding the saliency maps output by the two channels; thresholding these saliency maps gives two binary mask images. The two mask regions are compared, and the difference between the masks is measured with an F-measure (see formula (1)); the larger the F-measure value, the higher the perceptual saturation, and conversely, the lower the saturation.
F_β = (1 + β²) · Precision · Recall / (β² · Precision + Recall)        (1)

where Precision = |M1 ∩ M2| / |M2| and Recall = |M1 ∩ M2| / |M1|; β² is an empirical weighting factor, M1 is the binary mask output by one perception channel, and M2 is the binary mask output by the other.
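As a concrete illustration, the saturation measure of formula (1) can be sketched in Python. The β² value and the treatment of M1 as the reference mask are assumptions for illustration, not values fixed by the patent:

```python
import numpy as np

def perceptual_saturation(m1, m2, beta2=0.3):
    """F-measure between two binary masks, as in formula (1).

    m1, m2: boolean arrays from thresholding the two channels' saliency
    maps; beta2 is the empirical weighting factor (value assumed here).
    Returns a value in [0, 1]; higher means more saturated perception.
    """
    overlap = np.logical_and(m1, m2).sum()
    if overlap == 0:
        return 0.0                       # disjoint masks: no saturation
    precision = overlap / m2.sum()       # fraction of M2 confirmed by M1
    recall = overlap / m1.sum()          # fraction of M1 confirmed by M2
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

m = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
print(perceptual_saturation(m, m))    # identical masks → 1.0
print(perceptual_saturation(m, ~m))   # disjoint masks → 0.0
```

Identical masks give the maximum saturation of 1.0; fully disjoint masks give 0.0, matching the rule that a small mask difference means high saturation.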
3) When the F-measure value representing perceptual saturation exceeds the threshold th, the output saliency maps of the two perception channels are superimposed and fused into a new output map, and the binary mask of this map is used as an automatic annotation, becoming a new training sample. Over a large number of test images, a sufficient number of high-confidence automatic annotation maps can be collected to form a new training sample set.
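A minimal sketch of this fusion step follows; the threshold values th and bin_th are assumed here, since the patent leaves them as empirical settings:

```python
import numpy as np

def fuse_and_label(s1, s2, saturation, th=0.7, bin_th=0.5):
    """Step 3): when perceptual saturation exceeds th, superpose the two
    channels' saliency maps and take the fused map's binary mask as an
    automatic annotation; otherwise discard the image as low-confidence.
    th and bin_th are illustrative values, not fixed by the patent."""
    if saturation <= th:
        return None                     # under-saturated: not trustworthy
    fused = (s1 + s2) / 2.0             # superposition of the two outputs
    return fused > bin_th               # auto-labeled object mask

s1 = np.full((2, 2), 0.6)               # channel-1 saliency map
s2 = np.full((2, 2), 0.8)               # channel-2 saliency map
print(fuse_and_label(s1, s2, saturation=0.9).all())   # True (sample kept)
print(fuse_and_label(s1, s2, saturation=0.3))         # None (discarded)
```

Returning None for under-saturated images implements the rule that low-confidence detections never enter the training sample set.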
4) The new training sample set may be used for iterative updating of the deep SOD model in both detection channels.
5) Because the test data are limited, the number of high-confidence annotated samples obtained automatically by the method stops growing after reaching a certain quantity and tends to saturate. Constrained by this, after 1-2 iterative updates of the deep SOD models, salient object detection accuracy no longer improves noticeably and system performance tends to saturate. To achieve better performance, the algorithm can process a larger-scale dataset, generate more high-confidence training samples, and iteratively update the SOD models in the two channels.
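Putting the pieces together, one self-learning round (detect, compare, collect, retrain) might look like the following sketch. StubSOD, the thresholds, and the no-op retrain are illustrative stand-ins, not the patent's actual PiCANet/F3Net channels:

```python
import numpy as np

# Hypothetical stand-ins for the two deep SOD channels; a real model
# would map an image to a [0, 1] saliency map and support fine-tuning
# on (image, mask) pairs.
class StubSOD:
    def __init__(self, bias):
        self.bias = bias
    def predict(self, image):
        return np.clip(image + self.bias, 0.0, 1.0)
    def retrain(self, samples):
        pass   # placeholder: a real model would fine-tune here

def f_measure(m1, m2, beta2=0.3):
    """Mask agreement per formula (1); beta2 is an assumed value."""
    tp = np.logical_and(m1, m2).sum()
    if tp == 0:
        return 0.0
    p, r = tp / m2.sum(), tp / m1.sum()
    return (1 + beta2) * p * r / (beta2 * p + r)

def self_training_round(models, images, th=0.7, bin_th=0.5):
    """One iteration of steps 2)-4): collect high-confidence auto-labels
    and update both channel models with them."""
    samples = []
    for img in images:
        s1, s2 = models[0].predict(img), models[1].predict(img)
        m1, m2 = s1 > bin_th, s2 > bin_th
        if f_measure(m1, m2) > th:            # perception near saturation
            fused_mask = ((s1 + s2) / 2) > bin_th
            samples.append((img, fused_mask))
    for m in models:
        m.retrain(samples)                    # iterative update
    return samples

models = [StubSOD(0.05), StubSOD(-0.05)]
imgs = [np.full((4, 4), 0.8), np.full((4, 4), 0.5)]
labels = self_training_round(models, imgs)
print(len(labels))   # → 1 (only the high-agreement image is kept)
```

Repeating this round until the collected-sample count stops growing reproduces the saturation behavior described in step 5).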
The foregoing is illustrative of the preferred embodiments of the present invention only and is not to be construed as limiting the claims. The present invention is not limited to the above embodiments, and the specific structure thereof is allowed to vary. In general, all changes which come within the scope of the invention as defined by the independent claims are intended to be embraced therein.

Claims (1)

1. An autonomous learning method in visual saliency detection, characterized in that it is realized by the following steps:
1) Constructing two parallel visual perception channels by means of two different supervised deep salient object detection (SOD) models, forming a salient object detection framework with dual visual information flows; comparing the difference between the binarized masks of the saliency maps output by the two channels at the same moment to measure the perceptual saturation of the salient object region; a small mask difference means high saturation and a large difference low saturation; the saturation of object perception serves as an expression of object detection confidence.
2) When the perceptual saturation output in step 1) exceeds a preset empirical threshold, visual perception is considered close to saturation; the saliency maps output by the two channels are superimposed into a final saliency map, whose binarized mask is regarded as a high-confidence object region; a number of such high-confidence, automatically annotated object regions are collected to form a training sample set labeled autonomously by the algorithm, used for further autonomous learning updates of the two supervised deep SOD models in step 1); conversely, if the perceptual saturation output in step 1) is below the preset threshold, visual perception is under-saturated; the detected salient object region then has low confidence and is not selected into the training sample set.
3) After a number of autonomously annotated training samples are collected, the two SOD models in step 1) can be retrained and updated; the updated SOD models achieve salient object detection results with better performance; with limited test data, continued iterative updating of the SOD models no longer improves the system's salient object detection performance, which tends to saturate.
CN202111012352.5A 2021-08-31 2021-08-31 Autonomous learning method in visual saliency detection Pending CN113837200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111012352.5A CN113837200A (en) 2021-08-31 2021-08-31 Autonomous learning method in visual saliency detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111012352.5A CN113837200A (en) 2021-08-31 2021-08-31 Autonomous learning method in visual saliency detection

Publications (1)

Publication Number Publication Date
CN113837200A true CN113837200A (en) 2021-12-24

Family

ID=78961746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111012352.5A Pending CN113837200A (en) 2021-08-31 2021-08-31 Autonomous learning method in visual saliency detection

Country Status (1)

Country Link
CN (1) CN113837200A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015163830A1 (en) * 2014-04-22 2015-10-29 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Target localization and size estimation via multiple model learning in visual tracking
CN106530684A (en) * 2015-09-11 2017-03-22 杭州海康威视系统技术有限公司 Method and device of processing traffic road information
CN106780468A (en) * 2016-12-22 2017-05-31 中国计量大学 View-based access control model perceives the conspicuousness detection method of positive feedback
CN108647695A (en) * 2018-05-02 2018-10-12 武汉科技大学 Soft image conspicuousness detection method based on covariance convolutional neural networks
CN110443784A (en) * 2019-07-11 2019-11-12 中国科学院大学 A kind of effective conspicuousness prediction model method
CN110751005A (en) * 2018-07-23 2020-02-04 合肥工业大学 Pedestrian detection method integrating depth perception features and kernel extreme learning machine
CN111432207A (en) * 2020-03-30 2020-07-17 北京航空航天大学 A perceptual HD video coding method based on salient object detection and saliency guidance


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAN LI et al.: "Visual Saliency Based on Scale-Space Analysis in the Frequency Domain", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 4, pages 996-1010, XP011495348, DOI: 10.1109/TPAMI.2012.147 *
HOU QIBIN: "Autonomous Learning of Semantic Segmentation Based on Visual Attention Mechanism", China Doctoral Dissertations Full-text Database, Information Science and Technology, pages 1-105 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20211224)