CN118229724A - Method and device for acquiring an opacity map in a portrait matting process
- Publication number: CN118229724A
- Application number: CN202410250379.5A
- Authority: CN (China)
- Prior art keywords: map, portrait, opacity, binary mask, output
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
Description
Technical Field

The present application belongs to the field of image processing technology and, more specifically, relates to a method and device for acquiring an opacity map during portrait matting.
Background

Portrait matting, i.e., separating the portrait foreground of an image from its background and obtaining the corresponding opacity map (alpha matte), is widely used in film and television production, photographic post-processing, advertising design, education, and training. Commonly used portrait matting models are mainly deep learning models, whose learning and generalization capabilities depend strongly on the quality and scale of the dataset. The design of the annotation tool directly affects dataset quality and annotator efficiency, so designing an efficient and user-friendly annotation tool is crucial.

The annotation output may be an opacity map. There are currently two common methods for obtaining ground-truth data for portrait matting (i.e., opacity maps): one is to manually separate the portrait foreground from the background using the various tools and techniques provided by Adobe Photoshop; the other is to photograph the same foreground multiple times against several monochrome backgrounds, keeping the foreground and camera fixed to precisely control the shooting conditions, and then recover the ground truth by triangulation.

Both methods are complicated to operate and demand considerable expertise from annotators, which makes collecting ground-truth data for portrait matting very difficult.
Summary of the Invention

In view of the defects of the related art, the purpose of the present application is to provide a method and device for acquiring an opacity map during portrait matting, aiming to solve the problem that ground-truth data for portrait matting is difficult to collect.

In a first aspect, an embodiment of the present application provides a method for acquiring an opacity map during portrait matting, comprising:

obtaining the original image to be matted;

performing portrait segmentation and binarization on the original image to obtain a first binary mask; performing hair segmentation and binarization on the original image to obtain a second binary mask;

performing adaptive morphological processing on the first and second binary masks to obtain an initial trimap of the portrait foreground;

executing an iterative process comprising: taking the initial trimap, or the trimap obtained in the previous iteration, as prior information and feeding it together with the original image into a trained portrait matting network model to obtain the opacity map output by the model; segmenting and binarizing the opacity map to obtain a third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap for the current iteration;

when the opacity map output by the portrait matting network model meets the ground-truth standard or the maximum number of iterations is reached, outputting the opacity map of the last iteration.
In some embodiments, the portrait matting network model includes an encoder module, a decoder module, a bridge block embedded between the encoder and decoder modules, and a propagation refinement module.

Obtaining the opacity map output by the portrait matting network model includes:

inputting the prior information and the original image into the encoder for feature extraction to obtain a first output, which includes the extracted low-level detail features and high-level semantic features;

inputting the first output into the bridge block, which forms the skip connection between the encoder and decoder modules, to obtain a second output;

concatenating the first and second outputs and inputting them into the decoder for upsampling, which outputs an initial opacity map;

inputting the original image and the initial opacity map into the propagation refinement module for iterative refinement, which outputs the refined opacity map.
In some embodiments, the propagation refinement module includes at least two propagation units, each comprising two ResBlock sub-units and one convolutional LSTM (ConvLSTM) sub-unit.
In some embodiments, the method further comprises:

receiving a first input from the user, the first input being used to fine-tune the trimap serving as prior information; and/or

receiving a second input from the user, the second input being used to fine-tune the opacity map output by the portrait matting network model.
In some embodiments, the joint training loss of the portrait matting network model includes:

a prediction loss, characterizing the absolute difference between the ground-truth and predicted opacity within the uncertain region;

a Laplacian loss, characterizing the L1 distance between the Laplacian pyramids of the ground-truth and predicted opacity within the uncertain region;

a composition loss, characterizing the absolute difference between the original image and a composite image generated from the predicted opacity, the foreground image, and the background image.
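The prediction and composition terms above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the exact normalization and the omission of the Laplacian-pyramid term are assumptions made for brevity.

```python
import numpy as np

def matting_losses(alpha_p, alpha_gt, image, fg, bg, unknown, eps=1e-6):
    """Sketch of two of the three joint-training terms, restricted to the
    trimap's uncertain region. All arrays hold floats in [0, 1]; `unknown`
    is a boolean mask of the uncertain region."""
    w = unknown.astype(float)
    n = w.sum() + eps
    # prediction loss: mean absolute alpha error inside the unknown region
    pred = (w * np.abs(alpha_p - alpha_gt)).sum() / n
    # composition loss: re-composite the image from the predicted alpha
    # and compare it to the original, again inside the unknown region
    recomposed = alpha_p[..., None] * fg + (1.0 - alpha_p[..., None]) * bg
    comp = (w[..., None] * np.abs(image - recomposed)).sum() / (3.0 * n)
    return pred, comp
```

A perfect prediction drives both terms to zero, since the recomposed image then matches the original exactly.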
In some embodiments, the adaptive morphological processing includes:

applying a distance transform to a target binary mask to obtain a distance map, and taking the maximum of the distance map as the portrait size parameter, where the target binary mask is the first, second, or third binary mask;

deriving adaptive parameters from the portrait size parameter, the adaptive parameters including a dilation parameter and an erosion parameter;

dilating and eroding the target binary mask according to the adaptive parameters to obtain a fourth binary mask after dilation and erosion;

determining the foreground region of the trimap from the target binary mask, determining the uncertain region of the trimap from the target binary mask and the fourth binary mask, and taking the area outside the foreground and uncertain regions as the background region of the trimap.
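The steps above can be sketched in a dependency-free way. This is an illustrative sketch, not the patent's implementation: the erosion-count distance transform, the wrap-around `np.roll` dilation, and the size-to-kernel divisor `scale` are all assumptions; a production version would use a proper distance transform and border handling.

```python
import numpy as np

def dilate(mask, k=1):
    """Naive binary dilation with a (2k+1)x(2k+1) square element.
    np.roll wraps at the borders, which is acceptable for this sketch."""
    out = mask.copy()
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(mask, (dy, dx), axis=(0, 1))
    return out

def erode(mask, k=1):
    # erosion is dilation of the complement
    return ~dilate(~mask, k)

def portrait_size(mask):
    """Maximum of the Chebyshev distance map, computed as the number of
    3x3 erosions until the mask vanishes (a slow stand-in for a real
    distance transform)."""
    d, m = 0, mask.copy()
    while m.any():
        m, d = erode(m), d + 1
    return d

def adaptive_trimap(mask, scale=3):
    """Trimap values: 255 = foreground, 128 = uncertain, 0 = background.
    The kernel size adapts to the portrait size; the divisor `scale` is an
    assumed hyperparameter, not taken from the patent."""
    k = max(1, portrait_size(mask) // scale)
    fg = erode(mask, k)                 # confident foreground
    unknown = dilate(mask, k) & ~fg     # band around the mask boundary
    tri = np.zeros(mask.shape, dtype=np.uint8)
    tri[fg] = 255
    tri[unknown] = 128
    return tri
```

Because the kernel size grows with the portrait, large figures get a proportionally wider uncertain band than small ones, which is the point of making the morphology adaptive.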
In a second aspect, an embodiment of the present application further provides a device for acquiring an opacity map during portrait matting, comprising:

an original image acquisition unit, configured to obtain the original image to be matted;

a segmentation and binarization unit, configured to perform portrait segmentation and binarization on the original image to obtain a first binary mask, and to perform hair segmentation and binarization on the original image to obtain a second binary mask;

a morphological processing unit, configured to perform adaptive morphological processing on the first and second binary masks to obtain an initial trimap of the portrait foreground;

an iteration unit, configured to execute an iterative process comprising: taking the initial trimap, or the trimap obtained in the previous iteration, as prior information and feeding it together with the original image into the trained portrait matting network model to obtain the opacity map output by the model; segmenting and binarizing the opacity map to obtain a third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap for the current iteration;

an opacity map output unit, configured to output the opacity map of the last iteration when the opacity map output by the portrait matting network model meets the ground-truth standard or the maximum number of iterations is reached.
In a third aspect, an embodiment of the present application provides an electronic device comprising at least one memory for storing a program and at least one processor for executing the program stored in the memory; when the stored program is executed, the processor performs the method described in the first aspect or any possible implementation thereof.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation thereof.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a processor, causes the processor to perform the method described in the first aspect or any possible implementation thereof.
With the method and device provided by the embodiments of the present application, portrait segmentation, hair segmentation, and binarization are applied to the original image to obtain the first and second binary masks; adaptive morphological processing is then applied to these masks to obtain an initial trimap, yielding more accurate prior information. A pre-trained portrait matting network model is then used to obtain the opacity map in an iterative loop; once the iteration condition is met, the opacity map is output as ground-truth data, achieving automated, accurate, and fast acquisition of ground truth.
Brief Description of the Drawings

To illustrate the technical solutions of the present application or the related art more clearly, the drawings required by the embodiments or the related-art descriptions are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a first flowchart of the method for acquiring an opacity map during portrait matting provided by an embodiment of the present application;

FIG. 2 is a second flowchart of the method for acquiring an opacity map during portrait matting provided by an embodiment of the present application;

FIG. 3 is a third flowchart of the method for acquiring an opacity map during portrait matting provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of the device for acquiring an opacity map during portrait matting provided by an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description

To make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it.

The term "and/or" herein describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. The symbol "/" herein indicates an "or" relationship between the associated objects; for example, A/B means A or B.

The terms "first", "second", etc. in the specification and claims distinguish different objects rather than describing a particular order of the objects. For example, a first response message and a second response message distinguish different response messages, not a particular order of the response messages.
FIG. 1 is a flowchart of the method for acquiring an opacity map during portrait matting provided by an embodiment of the present application. As shown in FIG. 1, the method includes at least the following steps:

S101: Obtain the original image to be matted.

S102: Perform portrait segmentation and binarization on the original image to obtain a first binary mask; perform hair segmentation and binarization on the original image to obtain a second binary mask.

Specifically, binarizing an image means setting the gray value of each pixel to 0 or 255, so that the whole image shows only black and white. The purpose of binarization is to separate the target of interest from the background; in this embodiment the targets of interest are the portrait region and the hair region of the original image. Both segmentation-and-binarization steps can be implemented with any general-purpose learned segmentation model. In practice, the portrait and hair segmentation models can be continually updated to produce more accurate initial trimaps.
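The binarization described above can be sketched with a simple threshold. The threshold value 128 below is an assumption for illustration; in practice it would be derived from the segmentation model's output probabilities.

```python
import numpy as np

def binarize(gray, thresh=128):
    """Set every pixel to 255 (target of interest) or 0 (background)."""
    return np.where(gray >= thresh, 255, 0).astype(np.uint8)

gray = np.array([[10, 200],
                 [130, 90]], dtype=np.uint8)
mask = binarize(gray)  # [[0, 255], [255, 0]]
```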
S103: Perform adaptive morphological processing on the first and second binary masks to obtain an initial trimap of the portrait foreground.

Specifically, adaptive morphological processing of the first and second binary masks assigns every pixel of the original image to the foreground region, the uncertain region, or the background region, yielding the initial trimap of the portrait foreground.
S104: Execute an iterative process comprising: taking the initial trimap, or the trimap obtained in the previous iteration, as prior information and feeding it together with the original image into the trained portrait matting network model to obtain the opacity map output by the model; segmenting and binarizing the opacity map to obtain a target binary mask; and performing adaptive morphological processing on the target binary mask to obtain the trimap for the current iteration.

Specifically, in the first iteration the initial trimap serves as the prior information fed into the trained portrait matting network model together with the original image; in subsequent iterations, the trimap obtained in the previous iteration serves as the prior. The opacity map output by the model in the current iteration is segmented and binarized into a target binary mask, which is then adaptively morphologically processed to output a trimap, completing one full iteration.
S105: When the opacity map output by the portrait matting network model meets the ground-truth standard or the maximum number of iterations is reached, output the opacity map of the last iteration.

Specifically, once the opacity map output by the model during iteration meets the ground-truth annotation standard, or the maximum number of iterations is reached, the opacity map output in the last iteration is emitted as both the matting result and the ground-truth annotation.
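The S104/S105 loop can be sketched as follows. The callables `model`, `refine_trimap`, and `meets_standard` are placeholders for the trained matting network, the segment-binarize-morphology chain, and the quality check respectively; none of their names come from the patent.

```python
def acquire_alpha(image, trimap, model, refine_trimap, meets_standard, max_iters=5):
    """Iterate: predict an opacity map from (image, trimap prior), and if it
    is not yet good enough, derive a new trimap from it for the next pass."""
    alpha = None
    for _ in range(max_iters):
        alpha = model(image, trimap)   # opacity map for this pass
        if meets_standard(alpha):      # ground-truth quality reached
            break
        trimap = refine_trimap(alpha)  # prior for the next pass
    return alpha
```

Whichever condition fires first (quality standard or iteration budget), the last predicted opacity map is returned.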
With the method provided by this embodiment, portrait segmentation, hair segmentation, and binarization of the original image yield the first and second binary masks; adaptive morphological processing of these masks yields an initial trimap, providing more accurate prior information. A pre-trained portrait matting network model then produces the opacity map in an iterative loop; once the iteration condition is met, the opacity map is output as ground-truth data, achieving automated, accurate, and fast acquisition of ground truth.
In some embodiments, the method for acquiring an opacity map during portrait matting further includes:

receiving a first input from the user, used to fine-tune the trimap serving as prior information; and/or

receiving a second input from the user, used to fine-tune the opacity map output by the portrait matting network model.

Specifically, before the trimap serving as prior information is fed into the portrait matting network model, an annotator can fine-tune it, adjusting the class labels of pixels in local regions to obtain a more accurate prior.

Likewise, before the opacity map output by the portrait matting network model is segmented and binarized, an annotator can fine-tune it to change the semantic information of pixels in local regions.

A fine-tune operation may add annotations in the form of polygons, rectangles, circles, polylines, line segments, or points to local pixels, changing their semantic class, for example correcting foreground pixels misclassified as background, or modifying the extent of the uncertain region.
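A minimal sketch of one such correction, using a rectangle annotation to relabel pixels in a trimap (the function name and the 0/128/255 encoding are illustrative assumptions; the patent also allows polygon, circle, polyline, segment, and point annotations, which are omitted here):

```python
import numpy as np

def apply_rect_annotation(trimap, row0, col0, row1, col1, label):
    """Relabel every pixel inside the half-open rectangle
    [row0:row1, col0:col1) with `label`
    (0 = background, 128 = uncertain, 255 = foreground)."""
    out = trimap.copy()
    out[row0:row1, col0:col1] = label
    return out

tri = np.zeros((4, 4), dtype=np.uint8)               # all marked background
fixed = apply_rect_annotation(tri, 1, 1, 3, 3, 255)  # correct a 2x2 patch to foreground
```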
By combining deep-learning predictions with simple manual fine-tuning, the method provided by this embodiment quickly obtains ground-truth data, reducing the difficulty of annotation work during matting and improving annotation efficiency.
FIG. 2 is a second flowchart of the method for acquiring an opacity map during portrait matting provided by an embodiment of the present application. As shown in FIG. 2, the portrait matting network model comprises an encoder, a decoder, a bridge block embedded between them, and a propagation refinement module. In S104, obtaining the opacity map output by the model specifically includes:

inputting the prior information and the original image into the encoder for feature extraction to obtain a first output, which includes the extracted low-level detail features and high-level semantic features;

inputting the first output into the bridge block, which forms the skip connection between the encoder and decoder modules, to obtain a second output;

concatenating the first and second outputs and inputting them into the decoder for upsampling, which outputs an initial opacity map;

inputting the original image and the initial opacity map into the propagation refinement module for iterative refinement, which outputs the refined opacity map.
Specifically, the portrait matting network is an encoder-decoder network with skip connections that predicts the binary mask of the portrait foreground from the original image. It comprises the encoder module, the decoder module, the bridge block embedded between them, and the propagation refinement module.

The encoder module extracts the rich semantic and detail features of the portrait image sample and the prior information (the trimap), producing the first output. The decoder module upsamples the first output, preserving the necessary local detail, and predicts a preliminary opacity map. The propagation refinement module, composed of three propagation units, further refines the decoder output to obtain a finer prediction.
编码器模块的输入通过随后的卷积层和最大池化层转换为降采样的特征图。具体来说,编码器模块有14个卷积层和5个最大池化层。The input of the encoder module is converted into a downsampled feature map through subsequent convolutional layers and maximum pooling layers. Specifically, the encoder module has 14 convolutional layers and 5 maximum pooling layers.
解码器模块使用非池化层(即反向最大池化操作)和卷积层来上采样特征图,并输出粗糙的不透明度图。解码器模块使用了一个比编码器网络更小的结构来减少参数的数量和加快训练过程。具体来说,解码器模块有6个卷积层,5个非池化层,以及最后的alpha预测层。The decoder module uses unpooling layers (i.e., reverse max-pooling operations) and convolutional layers to upsample the feature maps and outputs a coarse opacity map. The decoder module uses a smaller structure than the encoder network to reduce the number of parameters and speed up training. Specifically, the decoder module has 6 convolutional layers, 5 unpooling layers, and a final alpha prediction layer.
编码器模块和解码器模块之间插入了桥接块,以利用不同感受野中的局部上下文。如图2所示,桥接块由三个膨胀卷积层组成。编码器模块和桥接块输出的特征被连接并输入解码器模块。本申请实施例中遵循U-net的风格,在编码器模块和解码器模块之间进行跳跃连接,以保留精细的细节。A bridge block is inserted between the encoder module and the decoder module to exploit local context at different receptive fields. As shown in Figure 2, the bridge block consists of three dilated convolutional layers. The features output by the encoder module and the bridge block are concatenated and fed into the decoder module. The embodiments of the present application follow the U-Net style, adding skip connections between the encoder module and the decoder module to preserve fine details.
传播细化模块包含至少两个传播单元,每个传播单元由两个ResBlock和一个卷积LSTM子单元组成,其输入为原始图像及解码器模块的输出结果。如图2所示,传播细化模块包含三个传播单元,在每次循环迭代中,将输入图像、融合特征和先前的不透明度传播结果作为输入。ResBlocks从输入中提取特征,而卷积LSTM在传播步骤之间保留记忆。传播单元逐步细化预测的不透明度图,产生具有更准确的边缘细节和更少不良伪影的最终结果。The propagation refinement module contains at least two propagation units, each of which consists of two ResBlocks and a convolutional LSTM subunit, and its input is the original image and the output of the decoder module. As shown in Figure 2, the propagation refinement module contains three propagation units, which take the input image, fused features, and previous opacity propagation results as input in each loop iteration. ResBlocks extract features from the input, while the convolutional LSTM retains memory between propagation steps. The propagation unit gradually refines the predicted opacity map, producing a final result with more accurate edge details and fewer undesirable artifacts.
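桥接块中膨胀卷积扩大感受野的作用可以用一个单通道的极简示例说明。The role of the bridge block's dilated convolutions in enlarging the receptive field can be illustrated with a minimal single-channel NumPy sketch; the averaging kernel and the dilation rates (2, 4, 8) below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def dilated_conv3x3(x, kernel, dilation):
    """Naive single-channel 3x3 dilated convolution with zero padding.
    A dilation of d samples the 3x3 taps d pixels apart, enlarging the
    receptive field without adding parameters."""
    d = dilation
    h, w = x.shape
    xp = np.pad(x, d, mode="constant")
    out = np.zeros_like(x, dtype=float)
    for ki in range(3):
        for kj in range(3):
            out += kernel[ki, kj] * xp[ki * d: ki * d + h, kj * d: kj * d + w]
    return out

# Stack three dilated convolutions with growing (illustrative) rates,
# mimicking the bridge block's three dilated convolutional layers.
feat = np.random.rand(32, 32)
k = np.full((3, 3), 1.0 / 9.0)  # simple averaging kernel for the demo
for rate in (2, 4, 8):
    feat = dilated_conv3x3(feat, k, rate)
```

三层串联后,每个输出像素的感受野覆盖了远大于 3×3 的输入邻域,这正是桥接块“利用不同感受野中的局部上下文”的含义。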
在一些实施例中,人像抠图网络模型的联合训练损失包括:In some embodiments, the joint training loss of the portrait matting network model includes:
预测损失,用于计算输入的三分图中不确定区域真实不透明度和预测不透明度之间的绝对差;Prediction loss, which is used to calculate the absolute difference between the true opacity and the predicted opacity in the uncertain region of the input trimap;
拉普拉斯损失,用于计算输入的三分图中不确定区域真实不透明度和预测不透明度的拉普拉斯金字塔之间的L1距离;Laplacian loss, which is used to calculate the L1 distance between the Laplacian pyramids of the true opacity and the predicted opacity in the uncertain region of the input trimap;
合成损失,用于计算原始图像与合成图像之间的绝对差异,合成图像基于预测不透明度、前景图像和背景图像生成。The synthesis loss is used to calculate the absolute difference between the original image and the synthesized image, which is generated based on the predicted opacity, foreground image and background image.
具体地,人像抠图网络模型训练过程中使用的联合损失函数为预测损失LT、拉普拉斯损失Llap和合成损失Lcom之和:Specifically, the joint loss function used in training the portrait matting network model is the sum of the prediction loss LT, the Laplacian loss Llap and the composition loss Lcom:
L=LT+Llap+Lcom L= LT + Llap + Lcom
预测损失LT,用于表征不确定区域中真实不透明度αg和预测不透明度αp之间的绝对差,具体为:The prediction loss LT characterizes the absolute difference between the true opacity αg and the predicted opacity αp in the uncertain region, specifically:

L_T = \sum_i W_i^T \sqrt{(\alpha_p^i - \alpha_g^i)^2 + \varepsilon^2}
其中,i表示像素索引,WiT∈{0,1}表示像素i是否属于不确定区域。为了保证计算稳定性,令ε=10^-6。Where i denotes the pixel index, and WiT ∈ {0,1} indicates whether pixel i belongs to the uncertain region. For numerical stability, ε = 10^-6.
拉普拉斯损失Llap,用于表征不确定区域真实不透明度αg和预测不透明度αp的拉普拉斯金字塔之间的L1距离,具体为:The Laplacian loss Llap characterizes the L1 distance between the Laplacian pyramids of the true opacity αg and the predicted opacity αp in the uncertain region, specifically:

L_{lap} = \sum_k \| \mathrm{Lap}_k(\alpha_p) - \mathrm{Lap}_k(\alpha_g) \|_1
其中,Lapk表示拉普拉斯金字塔的第k层。Wherein, Lap k represents the kth layer of the Laplacian pyramid.
合成损失Lcom,用于表征输入图像I与根据预测不透明度αp、前景图像F和背景图像B生成的合成图像之间的绝对差异,具体为:The composition loss Lcom characterizes the absolute difference between the input image I and the composite image generated from the predicted opacity αp, the foreground image F and the background image B, specifically:
Lcom = ||αpF + (1 − αp)B − I||
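上述联合损失 L = LT + Llap + Lcom 可以用如下 NumPy 草图实现。The joint loss above can be sketched in NumPy as follows. This is a sketch under stated assumptions: αg denotes the ground-truth opacity, the pyramid is built with a simple 2× box filter on power-of-two image sizes, LT is averaged over the uncertain region, and Llap is computed on the full map; the patent's exact pyramid construction and weighting are not specified there:

```python
import numpy as np

def box_down(x):
    # 2x average-pool; stands in for a Gaussian blur + downsample.
    h, w = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def box_up(x, shape):
    # Nearest-neighbour upsample back to `shape` (assumes powers of two).
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)[: shape[0], : shape[1]]

def laplacian_pyramid(x, levels=3):
    pyr, cur = [], x
    for _ in range(levels):
        down = box_down(cur)
        pyr.append(cur - box_up(down, cur.shape))  # band-pass residual
        cur = down
    pyr.append(cur)                                # low-frequency residual
    return pyr

def joint_loss(alpha_p, alpha_g, F, B, I, W, eps=1e-6):
    # Prediction loss: eps-smoothed absolute difference, averaged over
    # the uncertain region indicated by W (values in {0, 1}).
    l_t = np.sum(W * np.sqrt((alpha_p - alpha_g) ** 2 + eps ** 2)) / max(W.sum(), 1)
    # Laplacian loss: L1 distance between pyramid levels.
    l_lap = sum(np.abs(pp - pg).mean()
                for pp, pg in zip(laplacian_pyramid(alpha_p),
                                  laplacian_pyramid(alpha_g)))
    # Composition loss: |alpha_p*F + (1 - alpha_p)*B - I|.
    l_com = np.abs(alpha_p * F + (1 - alpha_p) * B - I).mean()
    return l_t + l_lap + l_com
```

当预测不透明度等于真实不透明度且输入图像恰为其合成结果时,该损失退化为 ε 量级的残差,可用作实现的自检。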
在一些实施例中,自适应形态学处理,具体包括:In some embodiments, the adaptive morphological processing specifically includes:
对目标二值掩码做距离变换获得距离图,以距离图的最大值作为人像尺寸参数;目标二值掩码为第一二值掩码、第二二值掩码或第三二值掩码;Performing a distance transform on the target binary mask to obtain a distance map, and taking the maximum value of the distance map as the portrait size parameter; the target binary mask is the first binary mask, the second binary mask or the third binary mask;
基于人像尺寸参数获取自适应参数,自适应参数包括区域膨胀参数和腐蚀参数;Acquiring adaptive parameters based on the portrait size parameter, where the adaptive parameters include regional dilation parameters and an erosion parameter;
基于自适应参数对目标二值掩码分别做膨胀和腐蚀处理,获得膨胀及腐蚀处理后的第四二值掩码;Based on the adaptive parameters, the target binary mask is dilated and eroded respectively to obtain a fourth binary mask after dilation and erosion;
基于目标二值掩码确定三分图的前景区域,基于目标二值掩码和第四二值掩码确定三分图的不确定区域,除前景区域和不确定区域之外的区域作为三分图的背景区域。The foreground region of the trimap is determined based on the target binary mask, the uncertain region of the trimap is determined based on the target binary mask and the fourth binary mask, and the remaining region is taken as the background region of the trimap.
具体地,对二值掩码进行自适应形态学处理,获取人像前景的三分图,是指通过对二值掩码进行自适应地膨胀和腐蚀来生成三分图,即根据图像的前景尺寸来计算适应的形态学参数,从而对人像或头发进行分区膨胀腐蚀。Specifically, performing adaptive morphological processing on the binary mask to obtain a trimap of the portrait foreground means generating the trimap by adaptively dilating and eroding the binary mask, that is, computing adaptive morphological parameters according to the foreground size of the image, and then performing region-wise dilation and erosion on the portrait or hair.
首先通过对目标二值掩码做距离变换来获得距离图,以距离图的最大值作为人像尺寸参数。距离变换描述的是图像中像素点与某个区域块的距离:区域块内的像素点值为0,临近区域块的像素点取较小的值,距离区域块越远的像素点值越大。First, a distance map is obtained by applying a distance transform to the target binary mask, and the maximum value of the distance map is taken as the portrait size parameter. The distance transform describes the distance from each pixel in the image to a given region block: pixels inside the region block have a value of 0, pixels near the region block take small values, and the farther a pixel is from the region block, the larger its value.
基于人像尺寸参数获取自适应参数,自适应参数具体包括区域膨胀参数和腐蚀参数。头部区域膨胀参数满足:Adaptive parameters are obtained based on the portrait size parameter; they specifically include regional dilation parameters and an erosion parameter. The head-region dilation parameter satisfies:
head_dilate_intersize=D*head_parameter/100head_dilate_intersize = D * head_parameter / 100
其中,D为人像尺寸参数,head_parameter为可调节的头部区域膨胀系数,一般取3.5。Where D is the portrait size parameter and head_parameter is an adjustable head-region dilation coefficient, generally set to 3.5.
身体区域膨胀参数满足:The body-region dilation parameter satisfies:
body_dilate_intersize=D*body_parameter/100body_dilate_intersize = D * body_parameter / 100
其中,body_parameter为可调节的身体区域膨胀系数,一般取1.5。Where body_parameter is an adjustable body-region dilation coefficient, generally set to 1.5.
腐蚀参数满足:The erosion parameter satisfies:
erode_intersize=D*body_parameter/100 erode_intersize = D*body_parameter/100
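上述人像尺寸参数与自适应参数的计算可以用如下草图实现。A minimal sketch of the computation above, assuming a two-pass city-block distance transform (the patent does not specify the distance metric) and the default coefficients head_parameter = 3.5 and body_parameter = 1.5 quoted in the text:

```python
import numpy as np

def distance_map(mask):
    """Two-pass city-block distance transform: for each foreground pixel,
    the distance to the nearest background pixel."""
    h, w = mask.shape
    inf = float(h + w)
    d = np.where(mask > 0, inf, 0.0)
    for i in range(h):                      # forward pass (top-left -> bottom-right)
        for j in range(w):
            if d[i, j] > 0:
                up = d[i - 1, j] if i > 0 else inf
                left = d[i, j - 1] if j > 0 else inf
                d[i, j] = min(d[i, j], up + 1, left + 1)
    for i in range(h - 1, -1, -1):          # backward pass (bottom-right -> top-left)
        for j in range(w - 1, -1, -1):
            if d[i, j] > 0:
                down = d[i + 1, j] if i < h - 1 else inf
                right = d[i, j + 1] if j < w - 1 else inf
                d[i, j] = min(d[i, j], down + 1, right + 1)
    return d

mask = np.zeros((20, 20), dtype=np.uint8)
mask[5:15, 5:15] = 1                        # a 10x10 foreground block

D = distance_map(mask).max()                # portrait size parameter
head_dilate_intersize = D * 3.5 / 100       # head_parameter = 3.5
body_dilate_intersize = D * 1.5 / 100       # body_parameter = 1.5
erode_intersize = D * 1.5 / 100             # erosion reuses body_parameter per the text
```

实际工程中可用 cv2.distanceTransform 替换上面的手写两遍扫描,这里仅为说明原理。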
进一步地,基于前述的自适应参数对头发和人像的二值掩码分别做膨胀和腐蚀处理,获得膨胀及腐蚀处理后的第四二值掩码。Furthermore, based on the aforementioned adaptive parameters, the binary masks of the hair and the portrait are respectively dilated and eroded to obtain a fourth binary mask after dilation and erosion.
将原始二值掩码作为三分图的前景区域,膨胀及腐蚀处理后的第四二值掩码减去原始二值掩码的结果作为三分图的不确定区域,其余像素作为三分图的背景区域。The original binary mask serves as the foreground region of the trimap, the result of subtracting the original binary mask from the dilated-and-eroded fourth binary mask serves as the uncertain region of the trimap, and the remaining pixels serve as the background region of the trimap.
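由二值掩码经膨胀、腐蚀生成三分图的过程可以用如下草图说明。Trimap generation from a binary mask can be sketched as follows. Note an assumption: this sketch uses the common convention of eroded mask = certain foreground and dilated-minus-eroded = unknown band, which differs slightly from the patent's rule of keeping the original mask as foreground; the structuring-element radius r stands in for the adaptive dilation/erosion parameters:

```python
import numpy as np

def dilate(mask, r):
    """Binary dilation with a (2r+1)x(2r+1) square structuring element."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            src = mask[max(0, -di): h - max(0, di), max(0, -dj): w - max(0, dj)]
            out[max(0, di): h - max(0, -di), max(0, dj): w - max(0, -dj)] |= src
    return out

def erode(mask, r):
    # Erosion of the mask is dilation of its complement.
    return 1 - dilate(1 - mask, r)

def make_trimap(mask, r):
    fg = erode(mask, r)                 # certain foreground (255)
    unknown = dilate(mask, r) - fg      # uncertain band around the boundary (128)
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[unknown == 1] = 128
    trimap[fg == 1] = 255
    return trimap                       # background stays 0
```

生成的三分图取值为 {0, 128, 255},分别对应背景、不确定区域和前景。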
可选地,迭代过程中二值化处理规则如下:Optionally, the binarization processing rules during the iteration process are as follows:
其中,m为二值化后的人像掩码,α为(微调后的)不透明度图。由该人像掩码经膨胀和腐蚀操作生成三分图,自适应参数采用S102中获取的身体区域膨胀参数和腐蚀参数。Where m is the binarized portrait mask and α is the (fine-tuned) opacity map. A trimap is generated from this portrait mask via dilation and erosion operations, using the body-region dilation parameter and the erosion parameter obtained in S102 as the adaptive parameters.
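二值化步骤可以用一个简单的阈值函数示意。The binarization step can be sketched as a simple threshold; the threshold value tau = 0.5 is an assumption for illustration, as the patent's binarization rule is not reproduced here:

```python
import numpy as np

def binarize(alpha, tau=0.5):
    """Threshold the (fine-tuned) opacity map into a binary portrait mask m.
    tau = 0.5 is an assumed threshold, not a value given by the patent."""
    return (alpha > tau).astype(np.uint8)

alpha = np.array([[0.1, 0.6],
                  [0.9, 0.4]])
m = binarize(alpha)
```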
下面通过一个具体的示例对本申请实施例提供的技术方案进一步进行说明。The technical solution provided in the embodiment of the present application is further illustrated below through a specific example.
图3是本申请实施例提供的人像抠图过程中不透明度图的获取方法的流程示意图之三,如图3所示,该方法至少包括:FIG. 3 is a third flow chart of a method for obtaining an opacity map during a portrait cutout process provided in an embodiment of the present application. As shown in FIG. 3 , the method at least includes:
步骤a、获取待标注的图像;Step a, obtaining the image to be labeled;
步骤b、对图像分别进行人像分割和头发分割处理,获得人像和头发的二值掩码;Step b: Perform portrait segmentation and hair segmentation on the image respectively to obtain binary masks of the portrait and the hair;
步骤c、对两个分割输出做形态学处理,生成人像前景的三分图;Step c: Perform morphological processing on the two segmentation outputs to generate a trimap of the portrait foreground;
步骤d、标注人员对三分图进行适当微调(本步可省略);Step d: The annotator fine-tunes the trimap as appropriate (this step can be omitted);
步骤e、将三分图与原始图像一起输入人像抠图网络,获得算法预测的不透明度图;Step e: Input the trimap together with the original image into the portrait matting network to obtain the opacity map predicted by the algorithm;
步骤f、标注人员对不透明度图进行微调(本步可省略);Step f: The annotator fine-tunes the opacity map (this step can be omitted);
步骤g、将微调后的不透明度图做二值化,获得对应的二值掩码,进而生成对应的三分图;重复步骤e至步骤g,直至不透明度图满足基准真实数据精度,则标注完成。Step g: Binarize the fine-tuned opacity map to obtain the corresponding binary mask, and then generate the corresponding trimap; repeat steps e to g until the opacity map meets the ground-truth data accuracy, at which point the labeling is complete.
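步骤a至步骤g的整体标注循环可以用如下草图表示。The overall annotation loop of steps a to g can be sketched with hypothetical stub functions; every function name below is a placeholder standing in for the components described above, not the patent's API, and the stopping criterion is simulated:

```python
# Hypothetical stubs for the components of the annotation pipeline.
def segment(image):                    # steps b-c: segmentation + morphology -> initial trimap
    return "trimap0"

def matting_net(image, trimap):        # step e: matting network predicts an opacity map
    return f"alpha({trimap})"

def refine(alpha):                     # step f: optional manual fine-tuning (identity here)
    return alpha

def regenerate_trimap(alpha):          # step g: binarize + adaptive morphology
    return f"trimap({alpha})"

def meets_accuracy(alpha, iteration):  # placeholder for the ground-truth accuracy check
    return iteration >= 2              # pretend the result converges after three rounds

def annotate(image, max_iters=10):
    """Iterative labeling loop: repeat steps e-g until the accuracy
    criterion is met or the maximum number of iterations is reached."""
    trimap = segment(image)
    alpha = None
    for it in range(max_iters):
        alpha = refine(matting_net(image, trimap))
        if meets_accuracy(alpha, it):
            break
        trimap = regenerate_trimap(alpha)
    return alpha
```

该循环与装置实施例中迭代单元404和不透明度图输出单元405的职责划分一一对应。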
在本示例中,训练人像抠图网络模型所使用的训练数据集为人像图片样本。每个原始的人像图片可以包含一个或多个人像前景。获取原始图像后,对其进行人像分割和头发分割处理,得到人像二值掩码及头发二值掩码;接下来,对两个二值掩码进行自适应形态学处理,即自适应的膨胀和腐蚀,获得人像前景的三分图;标注人员可以对该三分图进行适当的微调,改变像素的语义信息,并将其输入人像抠图网络预测不透明度图;标注人员同样可以微调不透明度图,或直接将预测的不透明度图作为先验信息重新输入人像抠图网络进行预测,重复执行步骤e至步骤g直至满足基准真实数据的精度要求。In this example, the training data set used to train the portrait matting network model consists of portrait image samples. Each original portrait image may contain one or more portrait foregrounds. After the original image is obtained, portrait segmentation and hair segmentation are performed on it to obtain a portrait binary mask and a hair binary mask; next, the two binary masks are subjected to adaptive morphological processing, i.e., adaptive dilation and erosion, to obtain a trimap of the portrait foreground; the annotator can fine-tune the trimap as appropriate, changing the semantic information of the pixels, and input it into the portrait matting network to predict an opacity map; the annotator can likewise fine-tune the opacity map, or directly feed the predicted opacity map back into the portrait matting network as prior information for prediction, repeating steps e to g until the accuracy requirements of the ground-truth data are met.
图4是本申请实施例提供的人像抠图过程中不透明度图的获取装置的结构示意图,如图4所示,该装置至少包括:FIG4 is a schematic diagram of the structure of a device for obtaining an opacity map in a portrait cutout process provided by an embodiment of the present application. As shown in FIG4 , the device at least includes:
原始图像获取单元401,用于获取待抠图的原始图像;The original image acquisition unit 401 is used to acquire the original image to be cut out;
分割和二值化处理单元402,用于对原始图像进行人像分割和二值化处理,获取第一二值掩码;对原始图像进行头发分割和二值化处理,获取第二二值掩码;The segmentation and binarization processing unit 402 is used to perform portrait segmentation and binarization processing on the original image to obtain a first binary mask; perform hair segmentation and binarization processing on the original image to obtain a second binary mask;
形态学处理单元403,用于对第一二值掩码和第二二值掩码进行自适应形态学处理,获取人像前景的初始三分图;A morphological processing unit 403 is used to perform adaptive morphological processing on the first binary mask and the second binary mask to obtain an initial trisection image of the portrait foreground;
迭代单元404,用于执行迭代过程,迭代过程包括:以初始三分图或上一次迭代过程获取的三分图作为先验信息,与原始图像共同输入至训练好的人像抠图网络模型,获取人像抠图网络模型输出的不透明度图;对不透明度图进行分割和二值化处理,获取第三二值掩码;对第三二值掩码进行自适应形态学处理,获取当前迭代过程的三分图;The iteration unit 404 is configured to perform an iterative process, which includes: using the initial trimap or the trimap obtained in the previous iteration as prior information and inputting it together with the original image into the trained portrait matting network model to obtain the opacity map output by the model; segmenting and binarizing the opacity map to obtain a third binary mask; and performing adaptive morphological processing on the third binary mask to obtain the trimap of the current iteration;
不透明度图输出单元405,用于当人像抠图网络模型输出的不透明度图满足基准真实数据标准或达到最大迭代次数时,输出最后一次迭代过程的不透明度图。The opacity map output unit 405 is configured to output the opacity map of the last iteration when the opacity map output by the portrait matting network model meets the ground-truth data standard or the maximum number of iterations is reached.
在一些实施例中,人像抠图网络模型包括:编码器模块、解码器模块、嵌入编码器模块和解码器模块之间的桥接块和传播细化模块;迭代单元404具体用于:In some embodiments, the portrait matting network model includes: an encoder module, a decoder module, a bridge block embedded between the encoder module and the decoder module, and a propagation refinement module; the iteration unit 404 is specifically configured to:
将先验信息和原始图像输入至编码器进行特征提取,获取第一输出,第一输出包括提取到的低级细节特征和高级语义特征;Input the prior information and the original image into the encoder for feature extraction to obtain a first output, where the first output includes the extracted low-level detail features and high-level semantic features;
将第一输出输入至桥接块进行编码器模块和解码器模块的跳跃连接,获取第二输出;Input the first output into the bridge block, which forms a skip connection between the encoder module and the decoder module, to obtain a second output;
将第一输出和第二输出连接后输入至解码器进行上采样,输出初始不透明度图;Concatenate the first output and the second output, input the result into the decoder for upsampling, and output an initial opacity map;
将原始图像和初始不透明度图输入至传播细化模块进行迭代细化,输出细化后的不透明度图。The original image and the initial opacity map are input into the propagation refinement module for iterative refinement, and the refined opacity map is output.
在一些实施例中,传播细化模块包括至少两个传播单元,每个传播单元包括两个ResBlock子单元和一个卷积LSTM子单元。In some embodiments, the propagation refinement module includes at least two propagation units, each propagation unit includes two ResBlock sub-units and one convolutional LSTM sub-unit.
在一些实施例中,该装置还包括用户输入接收单元,用于:In some embodiments, the device further comprises a user input receiving unit, configured to:
接收用户的第一输入,第一输入用于对作为先验信息的三分图进行微调;和/或,Receiving a first input from a user, where the first input is used to fine-tune the trimap serving as prior information; and/or,
接收用户的第二输入,第二输入用于对人像抠图网络模型输出的不透明度图进行微调。Receiving a second input from the user, where the second input is used to fine-tune the opacity map output by the portrait matting network model.
在一些实施例中,人像抠图网络模型的联合训练损失包括:In some embodiments, the joint training loss of the portrait matting network model includes:
预测损失,用于表征不确定区域真实不透明度和预测不透明度之间的绝对差;Prediction loss, which characterizes the absolute difference between the true opacity and the predicted opacity in the uncertain region;
拉普拉斯损失,用于表征不确定区域真实不透明度和预测不透明度的拉普拉斯金字塔之间的L1距离;Laplacian loss, which is used to characterize the L1 distance between the Laplacian pyramid of the true opacity and the predicted opacity in the uncertain region;
合成损失,用于表征原始图像与合成图像之间的绝对差异,合成图像基于预测不透明度、前景图像和背景图像生成。The synthesis loss is used to characterize the absolute difference between the original image and the synthesized image, which is generated based on the predicted opacity, foreground image and background image.
在一些实施例中,自适应形态学处理,包括:In some embodiments, adaptive morphological processing includes:
对目标二值掩码做距离变换获得距离图,以距离图的最大值作为人像尺寸参数;目标二值掩码为第一二值掩码、第二二值掩码或第三二值掩码;Performing distance transformation on the target binary mask to obtain a distance map, and taking the maximum value of the distance map as the portrait size parameter; the target binary mask is the first binary mask, the second binary mask or the third binary mask;
基于人像尺寸参数获取自适应参数,自适应参数包括区域膨胀参数和腐蚀参数;Acquiring adaptive parameters based on the portrait size parameter, where the adaptive parameters include regional dilation parameters and an erosion parameter;
基于自适应参数对目标二值掩码分别做膨胀和腐蚀处理,获得膨胀及腐蚀处理后的第四二值掩码;Based on the adaptive parameters, the target binary mask is dilated and eroded respectively to obtain a fourth binary mask after dilation and erosion;
基于目标二值掩码确定三分图的前景区域,基于目标二值掩码和第四二值掩码确定三分图的不确定区域,除前景区域和不确定区域之外的区域作为三分图的背景区域。The foreground region of the trimap is determined based on the target binary mask, the uncertain region of the trimap is determined based on the target binary mask and the fourth binary mask, and the remaining region is taken as the background region of the trimap.
可以理解的是,上述各个单元/模块的详细功能实现可参见前述方法实施例中的介绍,在此不做赘述。It is understandable that the detailed functional implementation of each of the above-mentioned units/modules can be found in the introduction of the aforementioned method embodiment, and will not be repeated here.
应当理解的是,上述装置用于执行上述实施例中的方法,装置中相应的程序模块,其实现原理和技术效果与上述方法中的描述类似,该装置的工作过程可参考上述方法中的对应过程,此处不再赘述。It should be understood that the above-mentioned device is used to execute the method in the above-mentioned embodiment. The implementation principle and technical effect of the corresponding program module in the device are similar to those described in the above-mentioned method. The working process of the device can refer to the corresponding process in the above-mentioned method, which will not be repeated here.
基于上述实施例中的方法,本申请实施例提供了一种电子设备。该设备可以包括:至少一个用于存储程序的存储器和至少一个用于执行存储器存储的程序的处理器。其中,当存储器存储的程序被执行时,处理器用于执行上述实施例中所描述的方法。Based on the method in the above embodiment, an embodiment of the present application provides an electronic device. The device may include: at least one memory for storing programs and at least one processor for executing the programs stored in the memory. When the program stored in the memory is executed, the processor is used to execute the method described in the above embodiment.
图5是本申请实施例提供的电子设备的结构示意图,如图5所示,该电子设备可以包括:处理器(processor)501、通信接口(Communications Interface)502、存储器(memory)503和通信总线504,其中,处理器501、通信接口502、存储器503通过通信总线504完成相互间的通信。处理器501可以调用存储器503中的软件指令,以执行上述实施例中所描述的方法。FIG. 5 is a schematic structural diagram of the electronic device provided in an embodiment of the present application. As shown in FIG. 5, the electronic device may include: a processor 501, a communications interface 502, a memory 503 and a communication bus 504, where the processor 501, the communications interface 502 and the memory 503 communicate with one another through the communication bus 504. The processor 501 may invoke software instructions in the memory 503 to execute the method described in the above embodiments.
此外,上述的存储器503中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例方法的全部或部分步骤。In addition, the logic instructions in the above-mentioned memory 503 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present application, or the part that contributes to the relevant technology or the part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes a number of instructions for a computer device (which can be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of each embodiment of the present application.
基于上述实施例中的方法,本申请实施例提供了一种计算机可读存储介质,计算机可读存储介质存储有计算机程序,当计算机程序在处理器上运行时,使得处理器执行上述实施例中的方法。Based on the method in the above embodiment, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program. When the computer program runs on a processor, the processor executes the method in the above embodiment.
基于上述实施例中的方法,本申请实施例提供了一种计算机程序产品,当计算机程序产品在处理器上运行时,使得处理器执行上述实施例中的方法。Based on the method in the above embodiment, an embodiment of the present application provides a computer program product. When the computer program product runs on a processor, the processor executes the method in the above embodiment.
可以理解的是,本申请实施例中的处理器可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件,硬件部件或者其任意组合。通用处理器可以是微处理器,也可以是任何常规的处理器。It is understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
本申请实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(Random Access Memory,RAM)、闪存、只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。The method steps in the embodiments of the present application can be implemented by hardware or by a processor executing software instructions. The software instructions can be composed of corresponding software modules, and the software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, mobile hard disks, CD-ROMs, or any other form of storage medium known in the art. An exemplary storage medium is coupled to a processor so that the processor can read information from the storage medium and can write information to the storage medium. Of course, the storage medium can also be a component of the processor. The processor and the storage medium can be located in an ASIC.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者通过计算机可读存储介质进行传输。计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。In the above embodiments, it can be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using software, it can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loading and executing computer program instructions on a computer, the process or function according to the embodiment of the present application is generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions can be stored in a computer-readable storage medium or transmitted by a computer-readable storage medium. The computer instructions can be transmitted from a website site, a computer, a server or a data center to another website site, a computer, a server or a data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) mode. The computer-readable storage medium can be any available medium that a computer can access or a data storage device such as a server, a data center, etc. that contains one or more available media integrated. The available medium can be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), etc.
可以理解的是,在本申请实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。It should be understood that the various numerical numbers involved in the embodiments of the present application are only used for the convenience of description and are not used to limit the scope of the embodiments of the present application.
本领域的技术人员容易理解,以上所述仅为本申请的较佳实施例而已,并不用以限制本申请,凡在本申请的精神和原则之内所作的任何修改、等同替换和改进等,均应包含在本申请的保护范围之内。It will be easily understood by those skilled in the art that the above description is only a preferred embodiment of the present application and is not intended to limit the present application. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application shall be included in the scope of protection of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410250379.5A CN118229724A (en) | 2024-03-05 | 2024-03-05 | Method and device for acquiring opacity graph in portrait matting process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118229724A true CN118229724A (en) | 2024-06-21 |
Family
ID=91509446
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118521602A (en) * | 2024-07-22 | 2024-08-20 | 电子科技大学中山学院 | Matting processing method, program product, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||