CN112070784A - Perception edge detection method based on context enhancement network - Google Patents
- Publication number: CN112070784A
- Application number: CN202010965729.8A
- Authority: CN (China)
- Prior art keywords: feature, output, convolution, unit, CLSTM
- Prior art date: 2020-09-15
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/13—Image data processing; Image analysis; Segmentation; Edge detection
- G06F18/241—Pattern recognition; Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045—Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/048—Neural networks; Activation functions
- G06N3/049—Neural networks; Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Neural networks; Learning methods
- G06V10/44—Extraction of image or video features; Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06T2207/10004—Image acquisition modality; Still image; Photographic image
- G06T2207/20081—Special algorithmic details; Training; Learning
- G06T2207/20084—Special algorithmic details; Artificial neural networks [ANN]
Abstract
A perceptual edge detection method based on a context-enhancement network. The method uses a bidirectional recurrent neural network to capture the intrinsic connections between multi-scale contextual information, enhancing the expressive power of the feature pyramid and improving edge detection accuracy, with the advantage of high speed.
Description
Technical Field
The present invention relates to the field of image processing, and in particular to a perceptual edge detection method based on a context-enhancement network.
Background Art
An image edge is the set of pixels around which the gray level exhibits step or roof changes, and it is one of the most basic features of an image. Image edges often carry most of the information in an image. Edge detection plays an important role in computer vision, image processing, and related applications, and is a key step in image analysis and recognition; image edge detection has therefore long been an active research topic.
Traditional edge detection methods, such as the Canny and Sobel operators, usually locate edge points using features such as color and brightness. However, because natural images are complex, accurate edge detection is hard to achieve from gradient and color features alone. Some researchers have therefore combined multiple low-level features such as gradient, color, and brightness in a data-driven way, as in gPb+UCM and Structured Edges. Although these methods improve on purely gradient-based ones, they still rely only on low-level image features and struggle to detect edges robustly in special scenes. To capture high-level image features, other researchers have used convolutional neural networks to extract image-patch features, yielding classic edge detection methods such as N4-Fields and DeepEdge. Thanks to the strong feature-extraction capability of convolutional neural networks, these methods improved detection accuracy, but patch-wise processing makes it difficult to capture global image features. Building on this, Xie et al. proposed an end-to-end edge detection method that combines multi-level, multi-scale features and greatly improves accuracy; Liu et al. then fused receptive-field features of different sizes within each scale to strengthen the representation at each scale and further improve accuracy.
Although the multi-scale edge detection methods proposed in recent years have improved detection accuracy, they generally suffer from weak semantic information and frequent misclassification at large scales, and from sparse spatial information and imprecise localization at small scales. Simply weighting the outputs of different scales does not solve these problems; how to make the multi-scale features of an image complement one another and how to improve the classification accuracy at each scale remain open difficulties.
Summary of the Invention
In existing multi-scale edge detection methods, the features of the individual scales are independent of one another, so large-scale features carry little semantic information and are frequently misclassified, while small-scale features carry little spatial information and lose small-object detail severely. To address these problems, the present invention provides a perceptual edge detection method based on a context-enhancement network. The method improves edge detection accuracy by capturing the intrinsic connections between multi-scale contextual information, and it has the advantage of high speed.
The technical solution that achieves the object of the present invention is as follows:
A perceptual edge detection method based on a context-enhancement network comprises the following steps:
(1) Obtain an image training data set and train the edge detection model. The edge detection model comprises a feature extraction stage, a bidirectional recurrent neural network stage, and a classification stage, as follows:
Feature extraction: a trainable feature extraction network maps a sample image x to five groups of d-dimensional features (here d = 21), i.e. the feature groups [x1, x2, x3, x4, x5]. The network consists of five CSU modules, each containing a vertical branch and a lateral branch; the number of vertical convolutions N in each CSU module is, from front to back, {2, 2, 3, 3, 3}. The vertical branch extracts high-dimensional image features, and the lateral branch performs feature aggregation and upsampling; the process can be expressed by formulas (1) and (2),
where i indexes the i-th CSU module, n indexes the n-th vertical convolution within the module, W denotes the vertical convolution kernel parameters (kernel size 3×3 throughout, likewise below), dW denotes the lateral convolution kernel parameters (kernel size 1×1), Φ is the ReLU activation function, up(·) is the bilinear interpolation function, and the aggregation operation is implemented with a 1×1 convolution;
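The exact forms of formulas (1) and (2) appear as images in the original filing, so the PyTorch sketch below is only one plausible reading of a CSU module under the assumptions stated above: a vertical branch of N 3×3 convolutions with ReLU, a lateral 1×1 aggregation, and bilinear upsampling to a 21-channel side output. The class and argument names are illustrative, not from the source.

```python
import torch.nn as nn
import torch.nn.functional as F

class CSU(nn.Module):
    """One CSU module: a vertical branch of N 3x3 conv+ReLU layers
    (formula (1)) extracts high-dimensional features; a lateral 1x1
    convolution aggregates them and bilinear upsampling restores the
    input resolution (formula (2))."""
    def __init__(self, in_ch, mid_ch, n_convs, out_ch=21, scale=1):
        super().__init__()
        layers = []
        ch = in_ch
        for _ in range(n_convs):                     # vertical branch
            layers += [nn.Conv2d(ch, mid_ch, 3, padding=1),
                       nn.ReLU(inplace=True)]
            ch = mid_ch
        self.vertical = nn.Sequential(*layers)
        self.lateral = nn.Conv2d(mid_ch, out_ch, 1)  # 1x1 aggregation
        self.scale = scale                           # stage's downsampling factor

    def forward(self, x):
        feat = self.vertical(x)                      # high-dimensional features
        side = self.lateral(feat)                    # aggregate to 21 channels
        if self.scale > 1:                           # up(.) = bilinear interpolation
            side = F.interpolate(side, scale_factor=self.scale,
                                 mode="bilinear", align_corners=False)
        return feat, side                            # feat feeds the next stage
```

Chaining five such modules with n_convs = {2, 2, 3, 3, 3} and per-stage scales would yield the five 21-channel feature groups [x1, ..., x5].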
(2) Feed the five feature groups [x1, x2, x3, x4, x5] into two recurrent neural networks, one in forward order and one in reverse order, to obtain the five feature outputs h_1^f, ..., h_5^f of the forward recurrent neural network and the five outputs h_1^b, ..., h_5^b of the backward recurrent neural network. Each recurrent neural network consists of five CLSTM units connected in series; a CLSTM unit contains three gates, namely the input gate i_t, the output gate o_t, and the forget gate f_t. The recursion proceeds as follows:
Input: the memory feature c_{t-1} of the previous CLSTM unit, the current input x_t, and the output h_{t-1} of the previous CLSTM unit,
2-1) The forget gate f_t filters the memory feature c_{t-1} of the previous CLSTM unit; the output c' is given by formula (3):
$c' = f_t \cdot c_{t-1}$ (3),
$f_t = \sigma(W_f * [x_t, h_{t-1}] + b_f)$ (4),
where W_f is the 1×1 convolution weight, b_f is a bias term, x_t is the current input, h_{t-1} is the output of the previous CLSTM unit, σ is the Sigmoid activation function, and c_{t-1} is the memory feature of the previous CLSTM unit. The forget gate selects and filters the memory features of the previous unit: the feature information of x_t and h_{t-1} is fused by a 1×1 convolution, a weight map is generated through the Sigmoid activation function, and this weight map is used to select and filter the memory features;
2-2) The input gate i_t filters the input features and weights them together with the filtered memory feature c', giving the memory feature c_t of the current CLSTM unit as formula (5):
$c_t = c' + (1 + i_t) \times \Phi(W_x * [x_t, h_{t-1}] + b_c)$ (5),
$i_t = \sigma(W_i * [x_t, h_{t-1}] + b_i)$ (6),
where W_x and W_i are convolution weights, b_i and b_c are bias terms, Φ is the ReLU function, σ is the Sigmoid function, and * denotes the convolution operation. The input gate models the importance of the input features and generates a feature weight matrix that selectively enhances them: the current CLSTM unit fuses the feature information of x_t and h_{t-1} by a 1×1 convolution, generates a weight map through the Sigmoid activation function, and uses this weight map to selectively enhance the fused input features;
2-3) The output gate o_t selectively enhances the memory feature, giving the current CLSTM output h_t as formula (7):
$h_t = (o_t + 1) \times \Phi(c_t)$ (7),
$o_t = \sigma(W_o * [x_t, h_{t-1}] + b_o)$ (8),
where Φ is the ReLU function, σ is the Sigmoid function, * denotes the convolution operation, W_o is the convolution weight, and b_o is a bias term;
Output: the memory feature c_t of the current CLSTM unit and the output h_t of the current CLSTM unit;
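The gate equations (3)-(8) above translate directly into a convolutional LSTM cell. The sketch below assumes 1×1 convolutions over the channel-concatenated pair [x_t, h_{t-1}], as the text states; the class and layer names are illustrative.

```python
import torch
import torch.nn as nn

class CLSTMCell(nn.Module):
    """Convolutional LSTM unit implementing formulas (3)-(8): the forget
    gate filters the previous memory, the input gate selectively enhances
    the fused input, and the output gate enhances the new memory."""
    def __init__(self, channels):
        super().__init__()
        # each gate fuses [x_t, h_{t-1}] with a 1x1 convolution (bias included)
        self.w_f = nn.Conv2d(2 * channels, channels, 1)  # forget gate
        self.w_i = nn.Conv2d(2 * channels, channels, 1)  # input gate
        self.w_o = nn.Conv2d(2 * channels, channels, 1)  # output gate
        self.w_x = nn.Conv2d(2 * channels, channels, 1)  # candidate features

    def forward(self, x_t, h_prev, c_prev):
        xh = torch.cat([x_t, h_prev], dim=1)
        f_t = torch.sigmoid(self.w_f(xh))                # formula (4)
        c_filtered = f_t * c_prev                        # formula (3)
        i_t = torch.sigmoid(self.w_i(xh))                # formula (6)
        cand = torch.relu(self.w_x(xh))
        c_t = c_filtered + (1 + i_t) * cand              # formula (5)
        o_t = torch.sigmoid(self.w_o(xh))                # formula (8)
        h_t = (o_t + 1) * torch.relu(c_t)                # formula (7)
        return h_t, c_t
```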
(3) Apply a convolution operation to each of the forward-branch feature tensors h_1^f, ..., h_5^f and backward-branch feature tensors h_1^b, ..., h_5^b to reduce the number of feature channels to 1, obtaining s_1^f, ..., s_5^f and s_1^b, ..., s_5^b, as expressed by formulas (9) and (10);
(4) Concatenate the feature tensors s_1^f, ..., s_5^f and s_1^b, ..., s_5^b along dimension 1 to obtain the tensor s_o, fuse s_o with a weighted 1×1 convolution, and pass the result through the Sigmoid activation function to obtain the final output s_out, as in formula (11):
$s_{out} = W_{s_o} * s_o$ (11),
(5) Pass s_1^f, ..., s_5^f and s_1^b, ..., s_5^b through the Sigmoid activation function separately to obtain the ten branch outputs;
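Steps (2)-(5) can be read as a single bidirectional module. The sketch below chains the CLSTMCell above over the five scale features in forward and reverse order, reduces each of the ten outputs to one channel, and fuses them with a 1×1 convolution as in formula (11). Sharing the CLSTM weights across time steps and using zero initial states are assumptions, not statements from the filing.

```python
import torch
import torch.nn as nn

class BiRecursiveFusion(nn.Module):
    """Steps (2)-(5): forward and backward CLSTM chains over the five
    scale features, per-branch 1x1 reduction to one channel, channel
    concatenation, and 1x1 weighted fusion. Assumes the CLSTMCell
    sketch defined earlier."""
    def __init__(self, channels=21):
        super().__init__()
        self.fwd = CLSTMCell(channels)   # forward chain (shared weights: assumption)
        self.bwd = CLSTMCell(channels)   # backward chain (same assumption)
        self.reduce = nn.ModuleList([nn.Conv2d(channels, 1, 1) for _ in range(10)])
        self.fuse = nn.Conv2d(10, 1, 1)  # weighted fusion of the ten maps

    def _chain(self, cell, feats):
        h = torch.zeros_like(feats[0])   # zero initial state (assumption)
        c = torch.zeros_like(feats[0])
        outs = []
        for x_t in feats:
            h, c = cell(x_t, h, c)       # gate equations (3)-(8)
            outs.append(h)
        return outs

    def forward(self, feats):            # feats: [x1..x5], equal shapes
        hf = self._chain(self.fwd, feats)              # forward order
        hb = self._chain(self.bwd, feats[::-1])[::-1]  # reverse order
        side = [conv(h) for conv, h in zip(self.reduce, hf + hb)]
        s_o = torch.cat(side, dim=1)                   # concat on dimension 1
        s_out = torch.sigmoid(self.fuse(s_o))          # formula (11) + Sigmoid
        branches = [torch.sigmoid(s) for s in side]    # ten branch outputs
        return s_out, branches
```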
(6) Compute the loss between all outputs and the labels using the cross entropy l_w(X_i), and optimize the whole network by minimizing the loss L_w. Because the labels in the data set are produced by multiple annotators, this technical solution takes a weighted average of the multiple labels and partitions the pixels by a threshold θ: pixels with 0 < y_i < θ are ambiguous points for which no loss is computed, pixels with y_i = 0 are non-edge points, and pixels with y_i ≥ θ are edge points,
where |Y+| and |Y-| denote the total numbers of positive and negative samples in each batch, respectively, the hyperparameter λ is the positive/negative sample balance parameter, X denotes the network output, and y_i is a label pixel; the final loss function is therefore given by formula (16), where h_k^f is the output of the k-th stage of the forward recurrent neural network, h_k^b is the output of the k-th stage of the backward recurrent neural network, s_out is the output value of the final network, and |I| is the total number of pixels in image I;
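Formulas (12)-(16) likewise appear as images in the filing; the sketch below implements a class-balanced cross entropy of the kind described, with a λ-weighted positive/negative balance, ambiguous pixels masked out, and the losses of the ten branch outputs and the fused output summed and normalized by the pixel count |I|. The exact weighting in the filing may differ, and the default values of theta and lam are illustrative.

```python
import torch

def weighted_bce(pred, y, theta=0.5, lam=1.1):
    """Class-balanced cross entropy l_w for one sigmoid output map.
    y holds the weighted average of the multi-annotator labels in [0, 1];
    pixels with 0 < y < theta are ambiguous and contribute no loss."""
    pos = (y >= theta).float()                    # edge pixels
    neg = (y == 0).float()                        # non-edge pixels
    n_pos, n_neg = pos.sum(), neg.sum()
    alpha = lam * n_neg / (n_pos + n_neg + 1e-8)  # weight on positives
    beta = n_pos / (n_pos + n_neg + 1e-8)         # weight on negatives
    eps = 1e-8
    return -(alpha * pos * torch.log(pred + eps)
             + beta * neg * torch.log(1.0 - pred + eps)).sum()

def total_loss(branches, s_out, y):
    """Formula (16), approximately: sum l_w over the ten branch outputs
    and the fused output s_out, normalized by the pixel count |I|."""
    loss = sum(weighted_bce(s, y) for s in branches)
    loss = loss + weighted_bce(s_out, y)
    return loss / y.numel()
```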
(7) Repeat steps (1) to (6) until the network converges.
This technical solution uses a bidirectional recurrent neural network to strengthen the intrinsic connections between multi-scale contexts, addressing the insufficient use of per-scale information and the limited edge detection accuracy of current multi-scale methods. Exploiting the fact that the multi-scale contextual receptive fields grow from small to large in temporal order, the method analyzes and models the multi-scale contextual information along the time dimension. Following the idea of recurrent neural networks, top-down and bottom-up recurrent branches progressively capture contextual connections in different directions and over different ranges, enhancing the expressive power of the feature pyramid. The method offers both high detection accuracy and high speed.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the internal structure of the CLSTM unit in the embodiment;
Figure 2 is a schematic diagram of the overall network structure in the embodiment;
Figure 3 is a schematic flowchart of the method in the embodiment.
Detailed Description of Embodiments
The content of the present invention is further described below with reference to the accompanying drawings and an embodiment, which are not intended to limit the present invention.
Embodiment:
Referring to Figures 2 and 3, a perceptual edge detection method based on a context-enhancement network comprises the following steps:
(1) Obtain an image training data set and train the edge detection model. The edge detection model comprises a feature extraction stage, a bidirectional recurrent neural network stage, and a classification stage, as follows:
Feature extraction: a trainable feature extraction network maps a sample image x to five groups of d-dimensional features (here d = 21), i.e. the feature groups [x1, x2, x3, x4, x5]. The network consists of five CSU modules, each containing a vertical branch and a lateral branch; the number of vertical convolutions N in each CSU module is, from front to back, {2, 2, 3, 3, 3}. The vertical branch extracts high-dimensional image features, and the lateral branch performs feature aggregation and upsampling; the process can be expressed by formulas (1) and (2),
where i indexes the i-th CSU module, n indexes the n-th vertical convolution within the module, W denotes the vertical convolution kernel parameters (kernel size 3×3 throughout, likewise below), dW denotes the lateral convolution kernel parameters (kernel size 1×1), Φ is the ReLU activation function, up(·) is the bilinear interpolation function, and the aggregation operation is implemented with a 1×1 convolution;
(2) Feed the five feature groups [x1, x2, x3, x4, x5] into two recurrent neural networks, one in forward order and one in reverse order, to obtain the five feature outputs h_1^f, ..., h_5^f of the forward recurrent neural network and the five outputs h_1^b, ..., h_5^b of the backward recurrent neural network. Each recurrent neural network consists of five CLSTM units connected in series; a CLSTM unit contains three gates, namely the input gate i_t, the output gate o_t, and the forget gate f_t. The internal structure of the CLSTM unit is shown in Figure 1, and the recursion proceeds as follows:
Input: the memory feature c_{t-1} of the previous CLSTM unit, the current input x_t, and the output h_{t-1} of the previous CLSTM unit,
2-1) The forget gate f_t filters the memory feature c_{t-1} of the previous CLSTM unit; the output c' is given by formula (3):
$c' = f_t \cdot c_{t-1}$ (3),
$f_t = \sigma(W_f * [x_t, h_{t-1}] + b_f)$ (4),
where W_f is the 1×1 convolution weight, b_f is a bias term, x_t is the current input, h_{t-1} is the output of the previous CLSTM unit, σ is the Sigmoid activation function, and c_{t-1} is the memory feature of the previous CLSTM unit. The forget gate selects and filters the memory features of the previous unit: the feature information of x_t and h_{t-1} is fused by a 1×1 convolution, a weight map is generated through the Sigmoid activation function, and this weight map is used to select and filter the memory features;
2-2) The input gate i_t filters the input features and weights them together with the filtered memory feature c', giving the memory feature c_t of the current CLSTM unit as formula (5):
$c_t = c' + (1 + i_t) \times \Phi(W_x * [x_t, h_{t-1}] + b_c)$ (5),
$i_t = \sigma(W_i * [x_t, h_{t-1}] + b_i)$ (6),
where W_x and W_i are convolution weights, b_i and b_c are bias terms, Φ is the ReLU function, σ is the Sigmoid function, and * denotes the convolution operation. The input gate models the importance of the input features and generates a feature weight matrix that selectively enhances them: the current CLSTM unit fuses the feature information of x_t and h_{t-1} by a 1×1 convolution, generates a weight map through the Sigmoid activation function, and uses this weight map to selectively enhance the fused input features;
2-3) The output gate o_t selectively enhances the memory feature, giving the current CLSTM output h_t as formula (7):
$h_t = (o_t + 1) \times \Phi(c_t)$ (7),
$o_t = \sigma(W_o * [x_t, h_{t-1}] + b_o)$ (8),
where Φ is the ReLU function, σ is the Sigmoid function, * denotes the convolution operation, W_o is the convolution weight, and b_o is a bias term;
Output: the memory feature c_t of the current CLSTM unit and the output h_t of the current CLSTM unit;
(3) Apply a convolution operation to each of the forward-branch feature tensors h_1^f, ..., h_5^f and backward-branch feature tensors h_1^b, ..., h_5^b to reduce the number of feature channels to 1, obtaining s_1^f, ..., s_5^f and s_1^b, ..., s_5^b, as expressed by formulas (9) and (10);
(4) Concatenate the feature tensors s_1^f, ..., s_5^f and s_1^b, ..., s_5^b along dimension 1 to obtain the tensor s_o, fuse s_o with a weighted 1×1 convolution, and pass the result through the Sigmoid activation function to obtain the final output s_out, as in formula (11):
$s_{out} = W_{s_o} * s_o$ (11),
(5) Pass s_1^f, ..., s_5^f and s_1^b, ..., s_5^b through the Sigmoid activation function separately to obtain the ten branch outputs;
(6) Compute the loss between all outputs and the labels using the cross entropy l_w(X_i), and optimize the whole network by minimizing the loss L_w. Because the labels in the data set are produced by multiple annotators, this technical solution takes a weighted average of the multiple labels and partitions the pixels by a threshold θ: pixels with 0 < y_i < θ are ambiguous points for which no loss is computed, pixels with y_i = 0 are non-edge points, and pixels with y_i ≥ θ are edge points,
where |Y+| and |Y-| denote the total numbers of positive and negative samples in each batch, respectively, the hyperparameter λ is the positive/negative sample balance parameter, X denotes the network output, and y_i is a label pixel; the final loss function is therefore given by formula (16), where h_k^f is the output of the k-th stage of the forward recurrent neural network, h_k^b is the output of the k-th stage of the backward recurrent neural network, s_out is the output value of the final network, and |I| is the total number of pixels in image I;
(7) Repeat steps (1) to (6) until the network converges.
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010965729.8A | 2020-09-15 | 2020-09-15 | Perception edge detection method based on context enhancement network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112070784A (en) | 2020-12-11 |
CN112070784B (en) | 2022-07-01 |
Family
ID=73696724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010965729.8A (granted as CN112070784B, active) | Perception edge detection method based on context enhancement network | 2020-09-15 | 2020-09-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112070784B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570497A (en) * | 2016-10-08 | 2017-04-19 | Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences | Text detection method and device for scene images |
US10701394B1 (en) * | 2016-11-10 | 2020-06-30 | Twitter, Inc. | Real-time video super-resolution with spatio-temporal networks and motion compensation |
CN107180248A (en) * | 2017-06-12 | 2017-09-19 | Guilin University of Electronic Technology | Hyperspectral image classification method based on an associated-loss enhancement network |
US10289903B1 (en) * | 2018-02-12 | 2019-05-14 | Avodah Labs, Inc. | Visual sign language translation training device and method |
CN108595632A (en) * | 2018-04-24 | 2018-09-28 | Fuzhou University | Hybrid neural network text classification method fusing abstract and body features |
CN109886971A (en) * | 2019-01-24 | 2019-06-14 | Xi'an Jiaotong University | Method and system for image segmentation based on a convolutional neural network |
CN110322009A (en) * | 2019-07-19 | 2019-10-11 | Nanjing Meihua Software System Co., Ltd. | Image prediction method based on multilayer convolutional long short-term memory neural networks |
CN110705457A (en) * | 2019-09-29 | 2020-01-17 | Beijing Research Institute of Uranium Geology | Building change detection method for remote sensing images |
CN111222580A (en) * | 2020-01-13 | 2020-06-02 | Southwest University of Science and Technology | High-precision crack detection method |
CN111539916A (en) * | 2020-04-08 | 2020-08-14 | Sun Yat-sen University | Adversarially robust image saliency detection method and system |
Non-Patent Citations (10)
Title |
---|
H. Cho et al., "Biomedical named entity recognition using deep neural networks with contextual information," BMC Bioinformatics * |
Jinzheng Cai et al., "Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function," Computer Vision and Pattern Recognition * |
Runmin Wu et al., "A mutual learning method for salient object detection with intertwined multi-supervision," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) * |
Liu Pengli, "Research on an edge detection algorithm based on feature re-extraction from deep convolutional neural networks," China Master's Theses Full-text Database (Information Science and Technology) * |
Yang Guohua, "Research and implementation of dialogue state tracking based on cascaded neural networks," China Doctoral Dissertations Full-text Database (Information Science and Technology) * |
Ouyang Ning et al., "Image super-resolution reconstruction combining perceptual edge constraints and a multi-scale fusion network," Journal of Computer Applications * |
Wang Shuaishuai et al., "Lane line detection based on fully convolutional neural networks," Digital Manufacturing Science * |
Wang Yaling, "Research on text classification based on word-sense disambiguation convolutional neural networks," China Master's Theses Full-text Database (Information Science and Technology) * |
Qin Feng et al., "Topic-fused CLSTM sentiment classification of short texts," Journal of Anhui University of Technology (Natural Science Edition) * |
Ma Huizhu et al., "Research directions and keywords of computer-aided project acceptance: 2012 acceptance results and notes for 2013," Journal of Electronics & Information Technology * |
Also Published As
Publication number | Publication date |
---|---|
CN112070784B (en) | 2022-07-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
2024-09-27 | TR01 | Transfer of patent right | Patentee after: Guangxi Zhengshichang Information Technology Co., Ltd., Room 2, 9th Floor, Unit 1, Building 15, No. 63-1 Taoyuan Road, Qingxiu District, Nanning, Guangxi Zhuang Autonomous Region 530000, China. Patentee before: Guilin University of Electronic Technology, 1 Jinji Road, Qixing District, Guilin, Guangxi Zhuang Autonomous Region 541004, China. |