
CN109711413B - Image semantic segmentation method based on deep learning - Google Patents

Image semantic segmentation method based on deep learning

Info

Publication number: CN109711413B (application published as CN109711413A)
Application number: CN201811646148.7A
Authority: CN (China)
Inventors: 郭敏, 丁晓, 马苗, 陈昱莅, 裴炤
Applicant and assignee: Shaanxi Normal University
Filing date: 2018-12-30
Legal status: Active (granted)


Abstract

An image semantic segmentation method based on deep learning, consisting of four parts: data set processing, construction of a deep semantic segmentation network, network training and parameter learning, and semantic segmentation of test images. The method takes both the RGB image and the grayscale image of the input as inputs to the network model, making full use of the edge information in the grayscale image and effectively enriching the input features. It combines a convolutional neural network with bidirectional gated recurrent units so that, on top of the learned local image features, more context dependencies and global feature information are captured. Coordinate information is added to the feature maps through the first and second coordinate channel modules, enriching the model's coordinate features, improving its generalization ability, and producing semantic segmentation results with high resolution and precise boundaries.

Description

Image Semantic Segmentation Method Based on Deep Learning

Technical Field

The invention belongs to the technical field of computer vision and deep learning, and in particular relates to an image semantic segmentation method based on deep learning.

Background Art

Image semantic segmentation is the understanding and recognition of image content at the pixel level. Its purpose is to establish a one-to-one mapping between each pixel and a semantic category and to segment the image according to that semantic information. It is widely used in scene understanding, autonomous driving, medical image analysis, robot vision, and other fields. Image semantic segmentation is the cornerstone of image understanding, and the quality of the segmentation result directly affects all subsequent processing of the image content; research on image semantic segmentation technology therefore has great practical significance.

Most traditional image semantic segmentation methods rely on hand-crafted feature extraction and probabilistic graphical models such as random forests, conditional random fields (CRF), and Markov random fields (MRF). These methods can only learn shallow representations and cannot produce accurate, fine-grained segmentation results. Since 2012, with the rapid development of deep learning, image semantic segmentation methods based on convolutional neural networks have become a research hotspot. In 2014, Hariharan et al. proposed SDS (simultaneous detection and segmentation), a method that couples object detection with semantic segmentation: it first uses the MCG method to extract multiple candidate regions from each image, then uses two CNN paths to extract bounding-box features and foreground-region features, fuses the information from the two paths, and finally generates the segmentation result with non-maximum suppression (NMS). Besides SDS, similar region-proposal-based methods include R-CNN and SPP, but such methods depend on large numbers of region proposals, resulting in very high memory consumption, long training times, and low segmentation accuracy.

To further reduce memory overhead and improve segmentation accuracy, Long et al. proposed the fully convolutional network (FCN) model in 2015, which converts the final fully connected layers of a deep convolutional neural network into convolutional layers, forming an end-to-end, pixel-to-pixel fully convolutional framework that brought image semantic segmentation into a new era. Kendall et al. proposed SegNet, a deep convolutional encoder-decoder architecture consisting of a convolutional encoding network and a deconvolutional decoding network in which each encoder layer corresponds to a decoder layer; the output of the final encoder is fed into a soft-max classifier for pixel-by-pixel classification. Building on FCN, Chen et al. proposed the more mature Deeplab-CRF model, which uses an optimized deep convolutional neural network (DCNN) to obtain a coarse score map, upsamples it to the original image size by bilinear interpolation, and then iteratively refines it with a fully connected conditional random field (CRF) to obtain fine segmentation results.

These semantic segmentation methods have two shortcomings. First, the model input is generally a single RGB image; such a narrow input may cause local features to be missed. Second, these methods all rely on convolutional neural networks for feature extraction and do not make full use of the image's local feature information and global context dependencies, leading to very rough segmentation edges and very low segmentation accuracy.

Summary of the Invention

The technical problem to be solved by the present invention is to overcome the defects of existing methods and provide a deep-learning-based image semantic segmentation method with high segmentation accuracy and strong generalization ability.

The technical solution adopted to solve the above technical problem comprises the following steps:

S1. Data set processing

Divide the image data set into a training image set and a test image set, and apply data augmentation to the training image set so that the number of training images grows to the order of tens of thousands;

S2. Construct the deep semantic segmentation network

The deep semantic segmentation network consists of a parallel deep neural network module, a feature fusion module, and a Softmax classification layer. The parallel deep neural network module extracts features from the input image; the feature fusion module performs a weighted fusion of the output feature maps of the parallel deep neural networks to obtain a new feature map; and the Softmax classification layer converts pixel class label prediction scores into a pixel class label prediction probability distribution map;

The parallel deep neural network module consists of a first deep neural network module and a second deep neural network module with identical network structures. The input of the first deep neural network module is the RGB image of the input image, and the input of the second deep neural network module is the grayscale image of the input image;

The first deep neural network module consists of a full convolutional network module, a first coordinate channel module, a first recurrent layer module, a second coordinate channel module, a second recurrent layer module, and a spatial pyramid pooling module. The first and second coordinate channel modules share the same structure, as do the first and second recurrent layer modules. The full convolutional network module extracts local features from the input image; the first recurrent layer module captures the image's context dependencies and global feature information; the first coordinate channel module concatenates i, j, and r coordinate channels onto the feature map output by the full convolutional network module to form a new feature map, so as to learn more coordinate feature information and improve the model's generalization ability; and the spatial pyramid pooling module applies convolutions at multiple sampling rates to the feature map output by the second recurrent layer module, extracting feature information from regions of different scales;
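For concreteness, a minimal PyTorch sketch of one branch of this architecture is given below. The class and attribute names (`Branch`, `backbone`, `coord1`, and so on) are illustrative placeholders rather than the patent's actual implementation; each submodule is assumed to be supplied separately.

```python
import torch.nn as nn

class Branch(nn.Module):
    """One parallel branch: FCN backbone -> coord channels -> recurrent layer
    -> coord channels -> recurrent layer -> spatial pyramid pooling."""
    def __init__(self, backbone, coord1, rnn1, coord2, rnn2, aspp):
        super().__init__()
        self.backbone = backbone          # local feature extraction
        self.coord1, self.rnn1 = coord1, rnn1
        self.coord2, self.rnn2 = coord2, rnn2
        self.aspp = aspp                  # multi-rate atrous convolutions

    def forward(self, x):
        f = self.backbone(x)              # local features
        f = self.rnn1(self.coord1(f))     # add i/j/r channels, scan vertically and horizontally
        f = self.rnn2(self.coord2(f))     # second coordinate + recurrent pass
        return self.aspp(f)               # multi-scale region features
```

The parallel module would then be two such `Branch` instances, one fed the RGB image and one the grayscale image.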

S3. Deep semantic segmentation network training and parameter learning

S31. Network model parameter initialization: initialize the parameters of the full convolutional network module with the ResNet101 model pre-trained on the ImageNet data set, initialize the parameters of the first and second recurrent layer modules with a standard uniform distribution, and initialize the parameters of the convolution layers of the spatial pyramid pooling module with a standard Gaussian distribution;
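A hedged sketch of this initialization in PyTorch; the attribute names follow the placeholder `Branch` above, and torchvision's pretrained ResNet-101 stands in for the ImageNet pre-trained model:

```python
import torch.nn as nn
from torchvision import models

def init_params(branch):
    # Backbone: copy ImageNet pre-trained ResNet-101 weights (non-matching keys skipped).
    pretrained = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
    branch.backbone.load_state_dict(pretrained.state_dict(), strict=False)
    # Recurrent layers: standard uniform initialization (assumed to mean U(0, 1)).
    for rnn in (branch.rnn1, branch.rnn2):
        for p in rnn.parameters():
            nn.init.uniform_(p)
    # Spatial pyramid pooling convolutions: standard Gaussian N(0, 1).
    for m in branch.aspp.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.normal_(m.weight)
```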

S32. Train the deep semantic segmentation network with the augmented training image set, generating a pixel class prediction label probability distribution map, and compute the prediction loss from the predicted label probabilities and the original label probabilities, specifically adopting the mixed loss function L(θ) as the objective function,

L(θ) = L1(θ) + L2(θ)

where L1(θ) is the cross-entropy loss function, L2(θ) is the L2 regularization term, and θ denotes the parameters of the deep semantic segmentation network;

S33. Optimize the objective function with the stochastic gradient descent algorithm and update the network model parameters with the backpropagation algorithm, ending training when the value of the objective function no longer decreases;

S4. Perform semantic segmentation on the test images

S41. Input the test image set into the deep semantic segmentation network trained in step S3;

S42. The parallel deep neural network module extracts features from the input test image set

The RGB image of the test image is used as the input of the first deep neural network module, and the grayscale image of the test image is used as the input of the second deep neural network module;

The feature extraction process of the first deep neural network module is as follows: the full convolutional network module extracts local features from the RGB image of the test image through atrous convolution, max-pooling, and convolution operations; the feature map output by the full convolutional network module passes through the first coordinate channel module to obtain a new feature map, which is sent to the first recurrent layer module for horizontal and vertical scanning to learn the image's global feature information; the feature map output by the first recurrent layer module passes through the second coordinate channel module to obtain a new feature map, which is sent to the second recurrent layer module for horizontal and vertical scanning to capture the image's global feature information; the feature map output by the second recurrent layer module is fed into the spatial pyramid pooling module, which applies convolutions at multiple sampling rates to extract feature information from regions of different scales;

The feature extraction process of the second deep neural network module is the same as that of the first deep neural network module;

S43. Perform a weighted fusion of the feature map output by the first deep neural network module and the feature map output by the second deep neural network module to obtain a new feature map;

S44. Send the result of step S43 to the Softmax classification layer for pixel class label prediction, obtaining the object category of each pixel in the image, and upsample to the original image size by bilinear interpolation to obtain a fine semantic segmentation map.
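A minimal sketch of steps S43-S44; the equal fusion weight `alpha` is an assumption, since the patent does not state the fusion weights:

```python
import torch
import torch.nn.functional as F

def fuse_and_segment(feat_rgb, feat_gray, out_size, alpha=0.5):
    """Weighted fusion of the two branch outputs (S43), then per-pixel
    classification and bilinear upsampling to the original size (S44)."""
    fused = alpha * feat_rgb + (1.0 - alpha) * feat_gray
    probs = F.softmax(fused, dim=1)                 # pixel class probabilities
    probs = F.interpolate(probs, size=out_size, mode="bilinear", align_corners=False)
    return probs.argmax(dim=1)                      # object category per pixel
```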

As a preferred technical solution, the first recurrent layer module consists of two bidirectional gated recurrent units, each with 150 neurons.

As a preferred technical solution, the spatial pyramid pooling module consists of four atrous convolutions with different sampling rates; the convolution kernel size is 3×3 and the dilation rates are 4, 6, 8, and 12.
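A hedged sketch of such a module; how the four branch outputs are combined is not specified in the text, so the summation below is an assumption:

```python
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    """Four parallel 3x3 atrous convolutions with dilation rates 4, 6, 8, 12."""
    def __init__(self, in_ch, out_ch, rates=(4, 6, 8, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )

    def forward(self, x):
        # padding == dilation keeps the spatial size unchanged for a 3x3 kernel
        return sum(branch(x) for branch in self.branches)
```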

As a preferred technical solution, the i, j, and r coordinate channels in step S2 consist of an i coordinate channel, a j coordinate channel, and an r coordinate channel, each an e×f coordinate matrix. The elements of rows 1 through e of the i coordinate channel are 0, 1, ..., e−1 in turn; the elements of columns 1 through f of the j coordinate channel are 0, 1, ..., f−1 in turn; e and f are positive integers. The r coordinate channel is defined by a formula in m and n [formula image not reproduced], where m is any element of the i coordinate channel and n is the element of the j coordinate channel at the same position as m. The elements of the i and j coordinate channels are linearly scaled to the range [−1, 1].

As a preferred technical solution, the learning rate for parameter learning in step S3 decays according to the following formula:

lt = l0 × (1 − t/tmax)^power

where t is the number of iterations, tmax is the maximum number of iterations, l0 is the initial learning rate, lt is the learning rate at the t-th iteration, and power is 0.9.
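A hedged sketch of this "poly" decay with SGD in PyTorch; the initial learning rate, maximum iteration count, momentum, and weight decay follow the values given in Embodiment 1 below and are otherwise assumptions:

```python
import torch

def make_optimizer(model, lr0=0.003, max_iter=35000, power=0.9):
    opt = torch.optim.SGD(model.parameters(), lr=lr0,
                          momentum=0.9, weight_decay=1e-4)
    # l_t = l_0 * (1 - t / max_iter) ** power, stepped once per iteration
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda t: (1.0 - t / max_iter) ** power)
    return opt, sched
```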

The beneficial effects of the present invention are as follows:

The present invention takes both the RGB image and the grayscale image of the input as inputs to the network model, making full use of the edge information in the grayscale image and effectively enriching the input features. It combines a convolutional neural network with bidirectional gated recurrent units so that, on top of the learned local image features, more context dependencies and global feature information are captured. Coordinate information is added to the feature maps through the first and second coordinate channel modules, enriching the model's coordinate features, improving its generalization ability, and producing semantic segmentation results with high resolution and precise boundaries.

Description of the Drawings

Figure 1 is a flow chart of the image semantic segmentation method based on deep learning.

Figure 2 shows the structure of the first deep neural network module.

Figure 3 shows semantic segmentation maps of some test images in the WeizmannHorse dataset.

Figure 4 shows semantic segmentation maps of some test images in the StanfordBackground dataset.

Detailed Description of the Embodiments

The present invention is described in further detail below in conjunction with the accompanying drawings and embodiments, but the present invention is not limited to these embodiments.

Embodiment 1

The WeizmannHorse dataset is an image segmentation dataset consisting of 328 images; some of them are shown in Figure 3. The network model is trained on the PyTorch platform, and the code is written in Python. The image semantic segmentation method based on deep learning of this embodiment, shown in Figure 1, proceeds as follows:

S1. Data set processing

Randomly select 200 images from the WeizmannHorse dataset as the training image set and the remaining 128 images as the test image set, and apply data augmentation to the training image set to increase the number of training images to 11,000;
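The patent does not specify which augmentation operations are used; a hedged torchvision sketch with common choices might look like the following (for segmentation, the same geometric transforms must also be applied to the label masks):

```python
from torchvision import transforms

# Assumed pipeline -- the source only states that augmentation raises the
# training set from 200 to 11,000 images, not which operations are used.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomResizedCrop(330, scale=(0.5, 1.0)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])
```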

S2. Construct the deep semantic segmentation network

The deep semantic segmentation network consists of a parallel deep neural network module, a feature fusion module, and a Softmax classification layer. The parallel deep neural network module extracts features from the input image; the feature fusion module performs a weighted fusion of the output feature maps of the two parallel deep neural networks to obtain a new feature map; and the Softmax classification layer converts pixel class label prediction scores into a pixel class label prediction probability distribution map;

The parallel deep neural network module consists of a first deep neural network module and a second deep neural network module with identical structures. The input of the first deep neural network module is the RGB image of the input image, and the input of the second deep neural network module is the grayscale image of the input image;

In Figure 2, the first deep neural network module consists of a full convolutional network module, a first coordinate channel module, a first recurrent layer module, a second coordinate channel module, a second recurrent layer module, and a spatial pyramid pooling module. The first and second coordinate channel modules share the same structure, as do the first and second recurrent layer modules;

The full convolutional network module extracts local features from the input image. It consists of the first through fifth convolution groups of the ResNet101 network from the Deeplab_largeFOV model; the first through third convolution groups use convolution and max-pooling operations, while the fourth and fifth convolution groups use convolution and atrous convolution operations;
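One way to approximate such a backbone with torchvision is sketched below; replacing the stride of the last two stages with dilation is a common stand-in for the atrous fourth and fifth convolution groups, not the patent's exact configuration:

```python
import torch.nn as nn
from torchvision import models

def make_backbone():
    # ResNet-101 whose conv4/conv5 groups use dilation instead of stride,
    # approximating the Deeplab-style atrous backbone described above.
    resnet = models.resnet101(
        weights=models.ResNet101_Weights.IMAGENET1K_V1,
        replace_stride_with_dilation=[False, True, True],
    )
    # Keep conv1 .. layer4; drop the pooling and classification head.
    return nn.Sequential(*list(resnet.children())[:-2])
```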

The first recurrent layer module consists of two bidirectional gated recurrent units with 150 neurons each and is used to capture the image's context dependencies and global feature information. First, with a 1×1 block size, the feature map X is divided into G×K non-overlapping region blocks, where G and K are the height and width of X. One bidirectional gated recurrent unit then scans vertically along each column of X, one pass from top to bottom and one from bottom to top, reading one region block at a time; the output predictions are concatenated by coordinate index into a composite feature map X′. Likewise, the other bidirectional gated recurrent unit scans horizontally along each row of X′, one pass from left to right and one from right to left, again reading one region block at a time, and the output predictions are concatenated by coordinate index into a new composite feature map X″, which carries contextual information from the entire image;
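A minimal sketch of this two-pass scan in the ReNet style, assuming 1×1 blocks so each pixel is one step of the sequence; the hidden size of 150 follows the text, everything else is illustrative:

```python
import torch
import torch.nn as nn

class RecurrentLayer(nn.Module):
    """Vertical then horizontal bidirectional GRU sweeps over a (B, C, H, W) map."""
    def __init__(self, in_ch, hidden=150):
        super().__init__()
        self.vgru = nn.GRU(in_ch, hidden, bidirectional=True, batch_first=True)
        self.hgru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def _sweep(self, x, gru):
        # Treat every column of x as one sequence; the bidirectional GRU
        # covers both scan directions (top-down and bottom-up).
        b, c, h, w = x.shape
        seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        out, _ = gru(seq)
        return out.reshape(b, w, h, -1).permute(0, 3, 2, 1)  # (B, 2*hidden, H, W)

    def forward(self, x):
        x = self._sweep(x, self.vgru)                                   # vertical scan
        x = self._sweep(x.transpose(2, 3), self.hgru).transpose(2, 3)   # horizontal scan
        return x
```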

The first coordinate channel module concatenates i, j, and r coordinate channels onto the feature map output by the full convolutional network module to form a new feature map, so as to learn more coordinate feature information and improve the model's generalization ability. The i, j, and r coordinate channels consist of an i coordinate channel, a j coordinate channel, and an r coordinate channel, each an e×f coordinate matrix. The elements of rows 1 through e of the i coordinate channel are 0, 1, ..., e−1 in turn; the elements of columns 1 through f of the j coordinate channel are 0, 1, ..., f−1 in turn; e and f are positive integers. The r coordinate channel is defined by a formula in m and n [formula image not reproduced], where m is any element of the i coordinate channel and n is the element of the j coordinate channel at the same position as m; the elements of the i and j coordinate channels are linearly scaled to the range [−1, 1];

The spatial pyramid pooling module applies convolutions at multiple sampling rates to the feature map output by the second recurrent layer module, extracting feature information from regions of different scales. The module consists of four atrous convolutions with different sampling rates; the convolution kernel size is 3×3 and the dilation rates are 4, 6, 8, and 12;

S3. Deep semantic segmentation network training and parameter learning

S31. Network model parameter initialization: initialize the parameters of the full convolutional network module with the ResNet101 model pre-trained on the ImageNet data set, initialize the parameters of the first and second recurrent layer modules with a standard uniform distribution, and initialize the parameters of the convolution layers of the spatial pyramid pooling module with a standard Gaussian distribution;

S32. Crop the images of the augmented training image set to 330×330, train the deep semantic segmentation network with the cropped training images, generate a pixel class prediction label probability distribution map, and compute the prediction loss from the predicted label probabilities and the original label probabilities, specifically adopting the mixed loss function L(θ) as the objective function,

L(θ) = L1(θ) + L2(θ)

where L1(θ) is the cross-entropy loss function, L2(θ) is the L2 regularization term, and θ denotes the parameters of the deep semantic segmentation network;

The cross-entropy loss function L1(θ) of this embodiment is:

L1(θ) = −(1/(N·B)) Σ_p Σ_q ŷ_pq ln(y_pq)

where y_pq is the predicted label probability vector, ŷ_pq is the original label probability vector, the inner sum runs over the C pixel classes and the outer sum over all pixels of the batch, N is the number of pixels per image (330×330 = 108900), B is the batch size (10), C is the number of pixel classes (2), and ln(·) is the natural logarithm;

The L2 regularization term L2(θ) of this embodiment is:

L2(θ) = (λ/(N·B)) Σ_s w_s²

where λ is the regularization coefficient (a positive number), N is the number of pixels per image (330×330 = 108900), B is the batch size (10), S is the number of weight parameters w (a positive integer) over which the sum runs, and w denotes the weight parameters;
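Under these definitions, a hedged PyTorch version of the mixed loss; the normalization of the L2 term follows the reconstruction above, and the value of `lam` is an assumption:

```python
import torch
import torch.nn.functional as F

def mixed_loss(logits, target, model, lam=1e-4):
    """Cross-entropy over all pixels plus an L2 penalty on the weights."""
    # L1: pixel-wise cross-entropy, averaged over the N*B pixels of the batch.
    l1 = F.cross_entropy(logits, target)
    # L2: sum of squared weight parameters, scaled by lambda / (N * B).
    n_pixels = target.numel()
    l2 = lam / n_pixels * sum((w ** 2).sum()
                              for name, w in model.named_parameters()
                              if "weight" in name)
    return l1 + l2
```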

S33. Optimize the objective function with the stochastic gradient descent algorithm and update the network model parameters with the backpropagation algorithm, ending training when the value of the objective function no longer decreases. To accelerate model convergence, the learning rate for parameter learning decays according to the following formula:

lt = l0 × (1 − t/tmax)^power

where t is the number of iterations and t ≤ 35000 (tmax = 35000), l0 is the initial learning rate (0.003), lt is the learning rate at the t-th iteration, the gradient decay is 0.0001, and power is 0.9;

S4. Perform semantic segmentation on the test images

S41. Input the test image set into the deep semantic segmentation network trained in step S3;

S42. The parallel deep neural network module extracts features from the input test image set

The RGB image of the test image is used as the input of the first deep neural network module, and the corresponding grayscale image is used as the input of the second deep neural network module;

The feature extraction process of the first deep neural network module is as follows: the full convolutional network module extracts local features from the RGB image of the test image through atrous convolution, max-pooling, and convolution operations; the feature map output by the full convolutional network module passes through the first coordinate channel module to obtain a new feature map, which is sent to the first recurrent layer module for horizontal and vertical scanning to learn the image's global feature information; the feature map output by the first recurrent layer module passes through the second coordinate channel module to obtain a new feature map, which is sent to the second recurrent layer module for horizontal and vertical scanning to capture the image's global feature information; the feature map output by the second recurrent layer module is fed into the spatial pyramid pooling module, which applies convolutions at multiple sampling rates to extract feature information from regions of different scales;

The feature extraction process of the second deep neural network module is the same as that of the first deep neural network module;

S43. Perform a weighted fusion of the feature map output by the first deep neural network module and the feature map output by the second deep neural network module to obtain a new feature map;

S44. Send the result of step S43 to the Softmax classification layer for pixel class label prediction, obtaining the object category of each pixel in the image, and upsample to the original image size by bilinear interpolation to obtain a fine semantic segmentation map.

The method of this embodiment is used to semantically segment the 128 test images of the WeizmannHorse dataset. Semantic segmentation maps of some test images are shown in Figure 3, where the first row is the input image, the second row is the corresponding color label image, and the third row is the corresponding semantic segmentation map.

Embodiment 2

The StanfordBackground dataset is an image segmentation dataset consisting of 715 images; some of them are shown in Figure 4. The network model is trained on the PyTorch platform, and the code is written in Python.

This embodiment follows the deep-learning-based image semantic segmentation method above. In step S1, 573 images are randomly selected from the StanfordBackground dataset as the training image set and the remaining 142 images as the test image set; data augmentation increases the number of training images to 13,752. In step S32, the images of the augmented training image set are cropped to 421×421, the cropped training images are used to train the deep semantic segmentation network, a pixel class prediction label probability distribution map is generated, and the prediction loss is computed from the predicted label probabilities and the original label probabilities, specifically adopting the mixed loss function L(θ) as the objective function,

L(θ) = L1(θ) + L2(θ)

where L1(θ) is the cross-entropy loss function, L2(θ) is the L2 regularization term, and θ denotes the parameters of the deep semantic segmentation network;

The cross-entropy loss function L1(θ) of this embodiment is:

L1(θ) = −(1/(N·B)) Σ_p Σ_q ŷ_pq ln(y_pq)

where y_pq is the predicted label probability vector, ŷ_pq is the original label probability vector, the inner sum runs over the C pixel classes and the outer sum over all pixels of the batch, N is the number of pixels per image (421×421 = 177241), B is the batch size (6), C is the number of pixel classes (8), and ln(·) is the natural logarithm;

The L2 regularization term L2(θ) of this embodiment is:

L2(θ) = (λ/(N·B)) Σ_s w_s²

where λ is the regularization coefficient (a positive number), N is the number of pixels per image (421×421 = 177241), B is the batch size (6), S is the number of weight parameters w (a positive integer) over which the sum runs, and w denotes the weight parameters. In step S33, the objective function is optimized with the stochastic gradient descent algorithm and the network model parameters are updated with the backpropagation algorithm, ending training when the value of the objective function no longer decreases; to accelerate model convergence, the learning rate for parameter learning decays according to the following formula:

lt = l0 × (1 − t/tmax)^power

where t is the number of iterations and t ≤ 35000 (tmax = 35000), l0 is the initial learning rate (0.001), lt is the learning rate at the t-th iteration, the gradient decay is 0.0001, and power is 0.9;

The other operation steps and parameters are the same as in Embodiment 1.

The method of this embodiment is used to semantically segment the 142 test images of the StanfordBackground dataset. Semantic segmentation maps of some test images are shown in Figure 4, where the first row is the input image, the second row is the corresponding color label image, and the third row is the corresponding semantic segmentation map.

Claims (5)

1. An image semantic segmentation method based on deep learning, characterized by comprising the following steps:
S1, processing data sets
Dividing an image data set into a training image set and a test image set, performing a data augmentation operation on the training image set, and increasing the number of training images to the order of tens of thousands;
S2, constructing a deep semantic segmentation network
The deep semantic segmentation network is composed of a parallel deep neural network module, a feature fusion module and a Softmax classification layer, wherein the parallel deep neural network module is used for carrying out feature extraction on an input image, the feature fusion module carries out weighting fusion on an output feature map of the parallel deep neural network to obtain a new feature map, and the Softmax classification layer converts a pixel class label prediction score into a pixel class label prediction probability distribution map;
the parallel deep neural network module consists of a first deep neural network module and a second deep neural network module, the network structures of the first deep neural network module and the second deep neural network module are the same, the input of the first deep neural network module is an RGB image of an input image, and the input of the second deep neural network module is a gray image of the input image;
the first deep neural network module consists of a full convolution network module, a first coordinate channel module, a first circulation layer module, a second coordinate channel module, a second circulation layer module and a spatial pyramid pooling module, wherein the first coordinate channel module and the second coordinate channel module have the same structure, the first circulation layer module and the second circulation layer module have the same structure, the full convolution network module is used for extracting local features of an input image, the first circulation layer module is used for capturing context dependency and global feature information of the image, the first coordinate channel module is used for connecting i, j and r coordinate channels to a feature map output by the full convolution network module to form a new feature map so as to learn more coordinate feature information and improve the generalization capability of the model, and the spatial pyramid pooling module is used for performing convolution operation on the feature map output by the second circulation layer module at a plurality of sampling rates to extract feature information of different scale areas;
S3, deep semantic segmentation network training and parameter learning
S31, initializing network model parameters: performing parameter initialization on the full convolution network module by using a pre-training model of ResNet101 on the ImageNet data set, performing parameter initialization on the first circulation layer module and the second circulation layer module by using a standard uniform distribution, and performing parameter initialization on the convolution layers of the spatial pyramid pooling module by using a standard Gaussian distribution;
S32, training the deep semantic segmentation network by using the training image set after data augmentation to generate a pixel class prediction label probability distribution graph, and calculating the prediction loss by using the predicted label probability and the original label probability, specifically adopting the mixed loss function L(θ) as the objective function,
L(θ) = L1(θ) + L2(θ)
where L1(θ) is the cross-entropy loss function, L2(θ) is the L2 regularization term, and θ is a parameter of the deep semantic segmentation network;
the cross-entropy loss function L1(θ) is:
L1(θ) = −(1/(N·B)) Σ_p Σ_q ŷ_pq ln(y_pq)
where y_pq is the predicted label probability vector, ŷ_pq is the original label probability vector, N is the number of pixels of each picture, B is the batch size, C is the number of pixel categories over which the inner sum runs, and ln(·) is the natural logarithm;
the L2 regularization term L2(θ) is:
L2(θ) = (λ/(N·B)) Σ_s w_s²
where λ is a regularization coefficient and is a positive number, N is the number of pixels of each image, B is the batch size, S is the number of parameters of w, S is a positive integer, and w is a weight parameter;
S33, optimizing the objective function by adopting a stochastic gradient descent algorithm, and updating the network model parameters by adopting a backpropagation algorithm until the value of the objective function no longer decreases, so as to finish training;
S4, performing semantic segmentation on the test image
S41, inputting the test image set into the deep semantic segmentation network trained in the step S3;
S42, the parallel deep neural network module performs feature extraction on the input test image set
The RGB image of the test image is used as the input of the first deep neural network module, and the gray image of the test image is used as the input of the second deep neural network module;
the first deep neural network module feature extraction process comprises the following steps: the full convolution network module performs local feature extraction on the RGB image of the test image through atrous convolution, max-pooling and convolution operations; the feature map output by the full convolution network module passes through the first coordinate channel module to obtain a new feature map, which is sent to the first circulation layer module for horizontal and vertical scanning to learn the global feature information of the image; the feature map output by the first circulation layer module passes through the second coordinate channel module to obtain a new feature map, which is sent to the second circulation layer module for horizontal and vertical scanning to capture the global feature information of the image; the feature map output by the second circulation layer module is input into the spatial pyramid pooling module, convolution operations are performed at a plurality of sampling rates, and feature information of regions of different scales is extracted;
the second deep neural network module feature extraction process is the same as the first deep neural network module feature extraction process;
s43, carrying out weighted fusion on the feature map output by the first deep neural network module and the feature map output by the second deep neural network module to obtain a new feature map;
and S44, sending the result of step S43 into the Softmax classification layer to perform pixel class label prediction to obtain the object class of each pixel in the image, and performing a bilinear interpolation operation to upsample to the original image size, obtaining a fine semantic segmentation image.
2. The deep learning based image semantic segmentation method according to claim 1, characterized in that: the first circulation layer module consists of two bidirectional gated recurrent units, and the number of neurons of each bidirectional gated recurrent unit is 150.
3. The deep learning based image semantic segmentation method according to claim 1, characterized in that: the spatial pyramid pooling module is formed by 4 atrous convolutions with different sampling rates, the convolution kernel size of the atrous convolutions is 3×3, and the dilation rates are 4, 6, 8 and 12 respectively.
4. The deep learning based image semantic segmentation method according to claim 1, characterized in that: in the step S2, the i, j and r coordinate channels consist of an i coordinate channel, a j coordinate channel and an r coordinate channel, each being an e×f coordinate matrix; the elements of rows 1 through e of the i coordinate channel are 0, 1, ..., e−1 in turn; the elements of columns 1 through f of the j coordinate channel are 0, 1, ..., f−1 in turn; e and f are positive integers; the r coordinate channel is defined by a formula in m and n [formula image not reproduced], where m is any element in the i coordinate channel and n is the element in the j coordinate channel at the same position as m; the elements in the i coordinate channel and the j coordinate channel are linearly scaled to the range [−1, 1].
5. The deep learning based image semantic segmentation method according to claim 1, characterized in that: the learning rate of the parameter learning in the step S3 is attenuated according to the following formula:
lt = l0 × (1 − t/tmax)^power
where t is the number of iterations, tmax is the maximum number of iterations, l0 is the initial learning rate, lt is the learning rate for the t-th iteration, and power is 0.9.
Application CN201811646148.7A was filed on 2018-12-30 by Shaanxi Normal University, claiming priority to the same application, and granted as CN109711413B.

Publications: CN109711413A, published 2019-05-03; CN109711413B, granted 2023-04-07.

Families Citing this family (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245665B (en) * 2019-05-13 2023-06-06 天津大学 Image Semantic Segmentation Method Based on Attention Mechanism
CN110289081B (en) * 2019-05-14 2021-11-02 杭州电子科技大学 Epilepsy detection method based on adaptive weighted feature fusion of deep network stack models
CN110163878A (en) * 2019-05-28 2019-08-23 四川智盈科技有限公司 A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN110189337A (en) * 2019-05-31 2019-08-30 广东工业大学 A Semantic Segmentation Method for Autonomous Driving Images
CN110175613B (en) * 2019-06-03 2021-08-10 常熟理工学院 Streetscape image semantic segmentation method based on multi-scale features and codec model
CN110310289A (en) * 2019-06-17 2019-10-08 北京交通大学 Lung tissue image segmentation method based on deep learning
CN110264483B (en) * 2019-06-19 2023-04-18 东北大学 Semantic image segmentation method based on deep learning
CN110232418B (en) * 2019-06-19 2021-12-17 达闼机器人有限公司 Semantic recognition method, terminal and computer readable storage medium
CN110276402B (en) * 2019-06-25 2021-06-11 北京工业大学 Salt body identification method based on deep learning semantic boundary enhancement
CN110298849A (en) * 2019-07-02 2019-10-01 电子科技大学 Hard exudate dividing method based on eye fundus image
CN110472483B (en) * 2019-07-02 2022-11-15 五邑大学 SAR image-oriented small sample semantic feature enhancement method and device
CN110348537B (en) 2019-07-18 2022-11-29 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110390314B (en) * 2019-07-29 2022-02-15 深兰科技(上海)有限公司 Visual perception method and equipment
CN110517329B (en) * 2019-08-12 2021-05-14 北京邮电大学 A deep learning image compression method based on semantic analysis
CN111062947B (en) * 2019-08-14 2023-04-25 深圳市智影医疗科技有限公司 X-ray chest radiography focus positioning method and system based on deep learning
CN110619639A (en) * 2019-08-26 2019-12-27 苏州同调医学科技有限公司 Method for segmenting radiotherapy image by combining deep neural network and probability map model
CN110675421B (en) * 2019-08-30 2022-03-15 电子科技大学 Cooperative segmentation method of depth image based on few annotation boxes
CN112465826B (en) * 2019-09-06 2023-05-16 上海高德威智能交通系统有限公司 Video semantic segmentation method and device
CN110619633B (en) * 2019-09-10 2023-06-23 武汉科技大学 Liver image segmentation method based on multipath filtering strategy
CN110807462B (en) * 2019-09-11 2022-08-30 浙江大学 Training method insensitive to context of semantic segmentation model
CN110717921B (en) * 2019-09-26 2022-11-15 哈尔滨工程大学 Full convolution neural network semantic segmentation method of improved coding and decoding structure
CN110728683B (en) * 2019-09-29 2021-02-26 吉林大学 An Image Semantic Segmentation Method Based on Dense Connections
CN110729045A (en) * 2019-10-12 2020-01-24 闽江学院 A Tongue Image Segmentation Method Based on Context-Aware Residual Networks
CN110889854B (en) * 2019-10-16 2023-12-05 深圳信息职业技术学院 Sketch part segmentation method, system, device and storage medium based on multi-scale deep learning
CN111783811B (en) * 2019-10-30 2024-06-21 北京京东尚科信息技术有限公司 Pseudo tag generation method and device
CN110880182B (en) * 2019-11-18 2022-08-26 东声(苏州)智能科技有限公司 Image segmentation model training method, image segmentation device and electronic equipment
CN110866922B (en) * 2019-11-19 2023-05-16 中山大学 Image Semantic Segmentation Model and Modeling Method Based on Reinforcement Learning and Migration Learning
CN110890155B (en) * 2019-11-25 2022-10-28 中国科学技术大学 Multi-class arrhythmia detection method based on lead attention mechanism
CN111079744B (en) * 2019-12-06 2020-09-01 鲁东大学 Intelligent vehicle license plate identification method and device suitable for complex illumination environment
CN111160109B (en) * 2019-12-06 2023-08-18 北京联合大学 A road segmentation method and system based on deep neural network
CN111161273B (en) * 2019-12-31 2023-03-21 电子科技大学 Medical ultrasonic image segmentation method based on deep learning
CN111160311B (en) * 2020-01-02 2022-05-17 西北工业大学 Yellow river ice semantic segmentation method based on multi-attention machine system double-flow fusion network
CN111325259A (en) * 2020-02-14 2020-06-23 武汉大学 Remote sensing image classification method based on deep learning and binary coding
CN111400492B (en) * 2020-02-17 2022-08-19 合肥工业大学 Hierarchical feature text classification method and system based on SFM-DCNN
CN111310509A (en) * 2020-03-12 2020-06-19 北京大学 Real-time barcode detection system and method based on logistics waybill
CN113496442A (en) * 2020-03-19 2021-10-12 荷盛崧钜智财顾问股份有限公司 Graph representation generation system, graph representation generation method and graph representation intelligent module thereof
CN111539412B (en) * 2020-04-21 2021-02-26 上海云从企业发展有限公司 Image analysis method, system, device and medium based on OCR
CN111507423B (en) * 2020-04-24 2023-06-09 国网湖南省电力有限公司 Engineering quantity measuring method for cleaning transmission line channel
CN111583390B (en) * 2020-04-28 2023-05-02 西安交通大学 3D Semantic Map Reconstruction Method Based on Deep Semantic Fusion Convolutional Neural Network
CN111612803B (en) * 2020-04-30 2023-10-17 杭州电子科技大学 A semantic segmentation method for vehicle images based on image clarity
CN111738265B (en) * 2020-05-20 2022-11-08 山东大学 Semantic segmentation method, system, medium and electronic device for RGB-D images
CN111666842B (en) * 2020-05-25 2022-08-26 东华大学 Shadow detection method based on double-current-cavity convolution neural network
CN111860827B (en) * 2020-06-04 2023-04-07 西安电子科技大学 Multi-target positioning method and device of direction-finding system based on neural network model
CN111915619A (en) * 2020-06-05 2020-11-10 华南理工大学 A fully convolutional network semantic segmentation method with dual feature extraction and fusion
CN111754520B (en) * 2020-06-09 2023-09-15 江苏师范大学 Deep learning-based cerebral hematoma segmentation method and system
CN111932501A (en) * 2020-07-13 2020-11-13 太仓中科信息技术研究院 Seal ring surface defect detection method based on semantic segmentation
CN111870279B (en) * 2020-07-31 2022-01-28 西安电子科技大学 Method, system and application for segmenting left ventricular myocardium of ultrasonic image
CN111899274B (en) * 2020-08-05 2024-03-29 大连交通大学 Particle size analysis method based on deep learning TEM image segmentation
CN111914948B (en) * 2020-08-20 2024-07-26 上海海事大学 Ocean current machine blade attachment self-adaptive identification method based on rough and fine semantic segmentation network
CN112149547B (en) * 2020-09-17 2023-06-02 南京信息工程大学 Water Body Recognition Method Based on Image Pyramid Guidance and Pixel Pair Matching
CN112164077B (en) * 2020-09-25 2023-12-29 陕西师范大学 Cell instance segmentation method based on bottom-up path enhancement
CN112163111B (en) * 2020-09-28 2022-04-01 杭州电子科技大学 Rotation-invariant semantic information mining method
CN112184714B (en) * 2020-11-10 2023-08-22 平安科技(深圳)有限公司 Image segmentation method, device, electronic equipment and medium
CN112571425B (en) * 2020-11-30 2022-04-01 汕头大学 An autonomous control method and system for leak location of a leak-plugging robot with pressure
CN112465840B (en) * 2020-12-10 2023-02-17 重庆紫光华山智安科技有限公司 Semantic segmentation model training method, semantic segmentation method and related device
CN112541916B (en) * 2020-12-11 2023-06-23 华南理工大学 A Dense Connection Based Image Segmentation Method for Waste Plastics
CN112580509B (en) * 2020-12-18 2022-04-15 中国民用航空总局第二研究所 Logical reasoning pavement detection method and system
CN112508030A (en) * 2020-12-18 2021-03-16 山西省信息产业技术研究院有限公司 Tunnel crack detection and measurement method based on double-depth learning model
CN112507338B (en) * 2020-12-21 2023-02-14 华南理工大学 Improved system based on deep learning semantic segmentation algorithm
CN112989919B (en) * 2020-12-25 2024-04-19 首都师范大学 Method and system for extracting target object from image
CN112651440B (en) * 2020-12-25 2023-02-14 陕西地建土地工程技术研究院有限责任公司 Soil effective aggregate classification and identification method based on deep convolutional neural network
CN112629863B (en) * 2020-12-31 2022-03-01 苏州大学 Bearing fault diagnosis method based on dynamic joint distributed alignment network under variable working conditions
CN112669325B (en) * 2021-01-06 2022-10-14 大连理工大学 A Video Semantic Segmentation Method Based on Active Learning
CN112614131A (en) * 2021-01-10 2021-04-06 复旦大学 Pathological image analysis method based on deformation representation learning
CN112766195B (en) * 2021-01-26 2022-03-29 西南交通大学 Electrified railway bow net arcing visual detection method
CN112837326B (en) * 2021-01-27 2024-04-09 南京中兴力维软件有限公司 Method, device and equipment for detecting carryover
CN112508032A (en) * 2021-01-29 2021-03-16 成都东方天呈智能科技有限公司 Face image segmentation method and segmentation network for context information of association
CN112991266A (en) * 2021-02-07 2021-06-18 复旦大学 Semantic segmentation method and system for small sample medical image
CN112884788B (en) * 2021-03-08 2022-05-10 中南大学 An optic cup and optic disc segmentation method and imaging method based on rich context network
CN112990304B (en) * 2021-03-12 2024-03-12 国网智能科技股份有限公司 Semantic analysis method and system suitable for power scene
CN113159278B (en) * 2021-03-16 2025-07-18 无锡信捷电气股份有限公司 Segmentation network system
CN112950645B (en) * 2021-03-24 2023-05-12 中国人民解放军国防科技大学 Image semantic segmentation method based on multitask deep learning
CN113033570B (en) * 2021-03-29 2022-11-11 同济大学 An Image Semantic Segmentation Method Based on Improved Atrous Convolution and Multi-level Feature Information Fusion
CN113269197B (en) * 2021-04-25 2024-03-08 南京三百云信息科技有限公司 Certificate image vertex coordinate regression system and identification method based on semantic segmentation
CN113205523A (en) * 2021-04-29 2021-08-03 浙江大学 Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization
CN113269786B (en) * 2021-05-19 2022-12-27 青岛理工大学 Assembly image segmentation method and device based on deep learning and guided filtering
CN113222033A (en) * 2021-05-19 2021-08-06 北京数研科技发展有限公司 Monocular image estimation method based on multi-classification regression model and self-attention mechanism
CN113191367B (en) * 2021-05-25 2022-07-29 华东师范大学 Semantic segmentation method based on dense scale dynamic network
CN113487622B (en) * 2021-05-25 2023-10-31 中国科学院自动化研究所 Head and neck organ image segmentation method, device, electronic equipment and storage medium
CN113450311B (en) * 2021-06-01 2023-01-13 国网河南省电力公司漯河供电公司 Defect detection method and system for screw with pin based on semantic segmentation and spatial relationship
CN113468969B (en) * 2021-06-03 2024-05-14 江苏大学 Spatial representation method for overlapping electronic components based on improved monocular depth estimation
CN113421269B (en) * 2021-06-09 2024-06-07 南京瑞易智能科技有限公司 Real-time semantic segmentation method based on double-branch deep convolutional neural network
CN113280820B (en) * 2021-06-09 2022-11-29 华南农业大学 Method and system for extracting orchard visual navigation path based on neural network
CN113592894B (en) * 2021-08-29 2024-02-02 浙江工业大学 Image segmentation method based on boundary box and co-occurrence feature prediction
CN114005123B (en) * 2021-10-11 2024-05-24 北京大学 Digital reconstruction system and method for printed text layout
CN114202489B (en) * 2021-10-29 2024-11-26 湖北大学 PCB board mark point reflective spot segmentation method based on deep learning
CN114359558B (en) * 2021-12-14 2024-11-12 重庆大学 A roof image segmentation method based on hybrid framework
CN114445631A (en) * 2022-01-29 2022-05-06 智道网联科技(北京)有限公司 Pavement full-factor image semantic segmentation method and device based on deep learning
CN114882212B (en) * 2022-03-23 2024-06-04 上海人工智能创新中心 Semantic segmentation method and device based on prior structure
CN114663660B (en) * 2022-04-07 2025-04-01 天津大学 A method for image semantic segmentation based on configurable context path
CN114913189B (en) * 2022-05-31 2024-07-02 东北大学 Coal gangue image segmentation method, device and equipment based on deep neural network
CN115049603B (en) * 2022-06-07 2024-06-07 安徽大学 A method and system for intestinal polyp segmentation based on small sample learning
CN115631127B (en) * 2022-08-15 2023-09-19 无锡东如科技有限公司 An image segmentation method for industrial defect detection
CN115423810B (en) * 2022-11-04 2023-03-14 国网江西省电力有限公司电力科学研究院 Blade icing form analysis method for wind generating set
CN116188479B (en) * 2023-02-21 2024-04-02 北京长木谷医疗科技股份有限公司 Hip joint image segmentation method and system based on deep learning
CN115861323B (en) * 2023-02-28 2023-06-06 泉州装备制造研究所 Leather defect detection method based on refined segmentation network
CN117351520B (en) * 2023-10-31 2024-06-11 广州恒沙数字科技有限公司 Foreground-background hybrid image generation method and system based on generative networks
CN119048747A (en) * 2024-11-01 2024-11-29 北京星网船电科技有限公司 Method and system for detecting room obstacle target based on multi-mode information

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548159A (en) * 2016-11-08 2017-03-29 中国科学院自动化研究所 Reticulate pattern facial image recognition method and device based on full convolutional neural networks
WO2018217828A1 (en) * 2017-05-23 2018-11-29 Intel Corporation Methods and apparatus for discriminative semantic transfer and physics-inspired optimization of features in deep learning
CN109035263A (en) * 2018-08-14 2018-12-18 电子科技大学 Brain tumor image automatic segmentation method based on convolutional neural networks

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184226A (en) * 2015-08-11 2015-12-23 北京新晨阳光科技有限公司 Digit recognition method and device, and neural network training method and device
US9760807B2 (en) * 2016-01-08 2017-09-12 Siemens Healthcare Gmbh Deep image-to-image network learning for medical image analysis
CN106295139B (en) * 2016-07-29 2019-04-02 汤一平 Tongue self-diagnosis health cloud service system based on deep convolutional neural networks
US10360494B2 (en) * 2016-11-30 2019-07-23 Altumview Systems Inc. Convolutional neural network (CNN) system based on resolution-limited small-scale CNN modules
CN106803247B (en) * 2016-12-13 2021-01-22 上海交通大学 A method for image recognition of microaneurysm based on multi-level screening convolutional neural network
CN106709568B (en) * 2016-12-16 2019-03-22 北京工业大学 Object detection and semantic segmentation of RGB-D images based on deep convolutional networks
CN108319972B (en) * 2018-01-18 2021-11-02 南京师范大学 An End-to-End Disparity Network Learning Method for Image Semantic Segmentation
CN108268870B (en) * 2018-01-29 2020-10-09 重庆师范大学 Multi-scale feature fusion ultrasound image semantic segmentation method based on adversarial learning
CN108427961B (en) * 2018-02-11 2020-05-29 陕西师范大学 Synthetic Aperture Focused Imaging Depth Evaluation Method Based on Convolutional Neural Networks
CN108564587A (en) * 2018-03-07 2018-09-21 浙江大学 Large-scale remote sensing image semantic segmentation method based on fully convolutional neural networks
CN108898145A (en) * 2018-06-15 2018-11-27 西南交通大学 Image salient target detection method combining deep learning

Also Published As

Publication number Publication date
CN109711413A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109711413B (en) Image semantic segmentation method based on deep learning
Gao et al. Domain-adaptive crowd counting via high-quality image translation and density reconstruction
CN110363716B (en) High-quality reconstruction method for composite degraded images based on conditional generative adversarial networks
CN107480726A (en) Scene semantic segmentation method based on full convolution and long short-term memory units
CN108986050A (en) Image and video enhancement method based on multi-branch convolutional neural networks
CN113111716B (en) A method and device for semi-automatic labeling of remote sensing images based on deep learning
CN114821052B (en) Three-dimensional brain tumor nuclear magnetic resonance image segmentation method based on self-adjustment strategy
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through a gate mechanism
CN114037674A (en) A method and device for segmentation and detection of industrial defect images based on semantic context
CN112560719B (en) High-resolution image water body extraction method based on multi-scale convolution-multi-core pooling
CN113554653A (en) Semantic segmentation method for long-tail distribution of point cloud data based on mutual information calibration
CN110852199A (en) A Foreground Extraction Method Based on Double Frame Encoding and Decoding Model
CN112329771A (en) Building material sample identification method based on deep learning
CN114359675A (en) A saliency map generation method for hyperspectral images based on semi-supervised neural network
CN114638408A (en) A Pedestrian Trajectory Prediction Method Based on Spatio-temporal Information
CN116957921A (en) Image rendering method, device, equipment and storage medium
CN119206568A (en) Video sequence segmentation method based on selective scanning visual state space model
CN110633706B (en) Semantic segmentation method based on pyramid network
CN120014256A (en) Image semi-supervised semantic segmentation method and system based on pixel-level correction
CN114926826A (en) Scene text detection system
Liu et al. Dsma: Reference-based image super-resolution method based on dual-view supervised learning and multi-attention mechanism
CN116152575B (en) Weakly supervised object localization method, device and medium based on class activation sampling guidance
CN109583584B (en) Method and system for enabling a CNN with fully connected layers to accept inputs of arbitrary shape
Zhang et al. Scale-progressive multi-patch network for image dehazing
CN117422644A (en) Depth image completion method based on Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant