CN110874575A

CN110874575A - A face image processing method and related equipment

Info

Publication number: CN110874575A
Application number: CN201911061482.0A
Authority: CN
Inventors: 郭晓杰; 杨洋; 马林
Original assignee: Tianjin University; Tencent Technology Shenzhen Co Ltd
Current assignee: Tianjin University; Tencent Technology Shenzhen Co Ltd
Priority date: 2019-11-01
Filing date: 2019-11-01
Publication date: 2020-03-10

Abstract

The application provides a face image processing method and related equipment, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring a full-face feature point set of a face image to be processed; adjusting corresponding feature points in the full-face feature point set according to the face feature adjustment information to obtain an adjusted full-face feature point set; inputting the adjusted full-face feature point set and the face image to be processed into the trained image completion model, and obtaining the full-face image in which the undisplayed face region in the face image to be processed is subjected to completion processing. The method completes the face image to be processed based on the full-face feature point set, realizes more accurate face completion, and realizes more abundant and accurate face image completion by accurately adjusting the full-face feature point set.

Description

A face image processing method and related equipment

技术领域technical field

本申请涉及人工智能技术领域，尤其涉及一种脸部图像处理方法及相关设备。The present application relates to the technical field of artificial intelligence, and in particular, to a facial image processing method and related equipment.

背景技术Background technique

随着科学技术的不断发展，图像补全技术备受青睐。人脸图像补全技术是指将包含残缺区域的人脸图像进行补全，得到完整的人脸图像。通常是提取包含残缺区域的人脸图像中的特征图，再根据包含残缺区域的人脸图像中的特征图，预测残缺人脸图像，最后获得补全后的人脸图像。目前的人脸补全方法通常是提取残缺人脸图像的层级化特征图像，对残缺人脸图像进行非线性地表达和补全。With the continuous development of science and technology, image completion technology is very popular. The face image completion technology refers to the completion of the face image containing the incomplete area to obtain a complete face image. Usually, the feature map in the face image containing the incomplete area is extracted, and then the incomplete face image is predicted according to the feature map in the face image containing the incomplete area, and finally the completed face image is obtained. The current face completion method usually extracts the hierarchical feature image of the incomplete face image, and expresses and completes the incomplete face image nonlinearly.

发明内容SUMMARY OF THE INVENTION

本申请实施例提供一种脸部图像处理方法及相关设备，用于提供一种基于特征点实现脸部图像处理方法，可以根据需要对脸部特征进行调整，从而获得相应的全脸图像。Embodiments of the present application provide a facial image processing method and related equipment, which are used to provide a facial image processing method based on feature points, which can adjust facial features as needed to obtain a corresponding full-face image.

第一方面，提供一种脸部图像处理方法，包括：In a first aspect, a method for processing a face image is provided, including:

获得待处理脸部图像，所述待处理脸部图像包括部分脸部未显示区域；obtaining a face image to be processed, where the face image to be processed includes a part of the face that is not displayed;

将所述待处理脸部图像输入已训练的脸部特征点预测模型，获得所述待处理脸部图像的全脸特征点集；Inputting the facial image to be processed into a trained facial feature point prediction model to obtain a full-face feature point set of the facial image to be processed;

根据脸部特征调整信息，对所述全脸特征点集中相应的特征点进行调整，获得调整后的全脸特征点集；According to the facial feature adjustment information, adjusting the corresponding feature points in the full-face feature point set to obtain an adjusted full-face feature point set;

将调整后的全脸特征点集，以及所述待处理脸部图像输入已训练的图像补全模型，获得将所述待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。Input the adjusted full-face feature point set and the to-be-processed face image into the trained image completion model to obtain a full-face that completes the undisplayed area of the face in the to-be-processed face image image.

第二方面，提供一种脸部图像处理方法，包括：In a second aspect, a method for processing a face image is provided, including:

响应在脸部图像补全界面上的脸部图像补全输入操作，向服务器发送补全请求，以使所述服务器对根据如第一方面中任一所述的方法对待处理脸部图像和脸部特征调整信息进行处理，得到脸部图像补全后的全脸图像；其中，所述补全请求包括所述待处理脸部图像和所述脸部特征调整信息；In response to the facial image completion input operation on the facial image completion interface, a completion request is sent to the server, so that the server can respond to the facial image and face to be processed according to the method according to any one of the first aspects. The facial feature adjustment information is processed to obtain a full-face image after the facial image is completed; wherein, the completion request includes the to-be-processed facial image and the facial feature adjustment information;

获取并显示所述服务器返回的所述全脸图像。Acquire and display the full-face image returned by the server.

第三方面，提供一种脸部图像处理装置，包括：In a third aspect, a facial image processing apparatus is provided, including:

收发模块，用于获得待处理脸部图像，所述待处理脸部图像包括部分脸部未显示区域；a transceiver module, configured to obtain a face image to be processed, where the face image to be processed includes a part of the face that is not displayed;

处理模块，用于将所述待处理脸部图像输入已训练的脸部特征点预测模型，获得所述待处理脸部图像的全脸特征点集；以及，根据脸部特征调整信息，对所述全脸特征点集中相应的特征点进行调整，获得调整后的全脸特征点集；以及，将调整后的全脸特征点集，以及所述待处理脸部图像输入已训练的图像补全模型，获得将所述待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。The processing module is used for inputting the face image to be processed into the trained facial feature point prediction model to obtain the full face feature point set of the face image to be processed; The corresponding feature points in the full-face feature point set are adjusted to obtain the adjusted full-face feature point set; The model obtains a full-face image obtained by performing complement processing on the undisplayed area of the face in the to-be-processed face image.

在一种可能的实施例中，所述脸部特征点预测模型包括第一编码模块、多个第二编码模块和全连接模块，所述处理模块具体用于：In a possible embodiment, the facial feature point prediction model includes a first encoding module, a plurality of second encoding modules and a fully connected module, and the processing module is specifically used for:

通过所述第一编码模块，对所述待处理脸部图像进行多种尺度下的特征提取，获得第一特征图集；Through the first encoding module, feature extraction is performed on the face image to be processed under various scales to obtain a first feature atlas;

分别通过所述多个第二编码模块中各个编码模块，对所述第一特征图集进行卷积池化处理，获得第二特征值集；Perform convolution pooling processing on the first feature map set through each coding module in the plurality of second coding modules respectively to obtain a second feature value set;

通过所述全连接模块，对多个第二特征值集进行拼接，获得全脸特征点集。Through the full connection module, a plurality of second feature value sets are spliced to obtain a full face feature point set.

在一种可能的实施例中，所述脸部特征调整信息为表情分类标签，所述处理模块具体用于：In a possible embodiment, the facial feature adjustment information is an expression classification label, and the processing module is specifically configured to:

将所述表情分类标签，以及全脸特征点集输入脸部调整模型，获得调整后的全脸特征点集；Inputting the facial expression classification label and the full-face feature point set into the face adjustment model to obtain the adjusted full-face feature point set;

其中，所述脸部调整模型是根据样本脸部表情图像集，所述样本脸部表情图像集中每个样本脸部表情图像标注对应的表情分类标签。The face adjustment model is based on a sample facial expression image set, and each sample facial expression image in the sample facial expression image set is marked with a corresponding expression classification label.

在一种可能的实施例中，所述处理模块具体用于：In a possible embodiment, the processing module is specifically used for:

提取所述脸部特征调整信息对应的脸部轮廓特征；其中，所述脸部轮廓特征用于表示脸部关键点构成的轮廓；Extracting the facial contour feature corresponding to the facial feature adjustment information; wherein, the facial contour feature is used to represent the contour formed by the facial key points;

根据所述脸部轮廓特征，对所述全脸特征点集中对应特征点的坐标进行调整，获得调整后的全脸特征点集。According to the facial contour feature, the coordinates of the corresponding feature points in the full-face feature point set are adjusted to obtain an adjusted full-face feature point set.

在一种可能的实施例中，所述图像补全模型包括第三编码模块，空洞卷积模块和解码模块，所述处理模块具体用于：In a possible embodiment, the image completion model includes a third encoding module, a hole convolution module and a decoding module, and the processing module is specifically used for:

将所述调整后的全脸特征点集转换为脸部特征点图；converting the adjusted full-face feature point set into a facial feature point map;

通过所述第三编码模块，对所述脸部特征点图，以及所述待处理脸部图像进行不同尺度下的卷积处理，获得第三特征图集；其中，所述第三特征图集包括卷积处理后特征图集，所述卷积处理后特征图集所述第三编码模块是对所述脸部特征点图，以及所述待处理脸部图像进行不同尺度下的卷积处理过程中，输出的最后卷积特征图；Through the third encoding module, the face feature point map and the face image to be processed are subjected to convolution processing at different scales to obtain a third feature atlas; wherein, the third feature atlas Including a feature map set after convolution processing, the third encoding module of the feature map set after convolution processing is to perform convolution processing at different scales on the facial feature point map and the face image to be processed In the process, the final convolution feature map of the output;

通过所述空洞卷积模块中的残差单元，对所述卷积处理后特征图集进行空洞卷积，获得第四特征图集；Through the residual unit in the hole convolution module, perform hole convolution on the feature atlas after the convolution processing to obtain a fourth feature atlas;

通过所述空洞卷积模块中的注意力单元，提取所述第四特征图集，以及所述卷积处理后特征图集的局部特征，获得第五特征图集；Extracting the fourth feature atlas and the local features of the feature atlas after the convolution processing through the attention unit in the hole convolution module to obtain the fifth feature atlas;

通过所述解码模块对所述第五特征图集，以及所述第三特征图集进行上采样处理，获得将所述待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。Perform upsampling processing on the fifth feature atlas and the third feature atlas by the decoding module, so as to obtain a full face obtained by performing complement processing on the undisplayed area of the face in the face image to be processed image.

在一种可能的实施例中，所述第三特征图集还包括第一中间特征图集和第二中间特征图集，所述第一中间特征图集是对所述脸部特征点图，以及所述待处理脸部图像依次进行预设次卷积处理获得的中间卷积结果，所述第二中间特征图集是对所述第一中间特征图集进行预设次卷积处理获得的中间卷积结果，所述处理模块具体用于：In a possible embodiment, the third feature atlas further includes a first intermediate feature atlas and a second intermediate feature atlas, the first intermediate feature atlas is a map of the facial feature points, and the intermediate convolution results obtained by sequentially performing preset sub-convolution processing on the face images to be processed, and the second intermediate feature atlas is obtained by performing preset sub-convolution processing on the first intermediate feature atlas. Intermediate convolution results, the processing module is specifically used for:

通过所述解码模块对所述第五特征图集进行上采样处理，获得第六特征图集；Perform up-sampling processing on the fifth feature atlas by the decoding module to obtain a sixth feature atlas;

通过所述解码模块对所述第二中间特征图集，以及所述第六特征图集进行加权处理，获得第七特征图集；Perform weighting processing on the second intermediate feature atlas and the sixth feature atlas by the decoding module to obtain a seventh feature atlas;

通过所述解码模块对所述第七特征图集，以及所述第一中间特征图集进行加权和上采样处理，获得第八特征图集；Perform weighting and upsampling processing on the seventh feature atlas and the first intermediate feature atlas by the decoding module to obtain an eighth feature atlas;

通过所述解码模块对所述第八特征图集进行卷积处理，获得将所述待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。Perform convolution processing on the eighth feature atlas by the decoding module to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed.

在一种可能的实施例中，所述脸部特征点预测模型是通过如下步骤训练得到的：In a possible embodiment, the facial feature point prediction model is obtained by training through the following steps:

获取第一样本数据集；其中，所述第一样本数据集中包括所述样本待处理脸部图像集，所述与每个样本待处理脸部图像对应的真值样本全脸特征点集；Obtain a first sample data set; wherein, the first sample data set includes the sample face image set to be processed, and the true-value sample full-face feature point set corresponding to each sample face image to be processed ;

基于所述第一样本数据集，训练脸部特征点预测模型；Based on the first sample data set, train a facial feature point prediction model;

直到脸部特征点预测模型的预测损失满足第一预设条件，获得已训练的脸部特征点预测模型；其中，所述预测损失用于表示预测样本全脸特征点集和真值样本全脸特征点集之间的损失。Until the prediction loss of the facial feature point prediction model satisfies the first preset condition, the trained facial feature point prediction model is obtained; wherein, the prediction loss is used to represent the full face feature point set of the predicted sample and the full face of the true value sample loss between feature point sets.

在一种可能的实施例中，所述图像补全模型包括生成子模型和判别子模型，所述图像补全模型的损失根据逐像素损失、感知损失、风格损失、全微分损失以及对抗损失加权得到的；In a possible embodiment, the image completion model includes a generative sub-model and a discriminative sub-model, and the loss of the image completion model is weighted according to pixel-wise loss, perceptual loss, style loss, total differential loss and adversarial loss owned;

其中，所述逐像素损失用于表示补全后的样本全脸图像和样本待处理脸部图像的像素值误差；所述感知损失用于表示补全后的样本全脸图像的特征图和样本待处理脸部图像的特征图之间的误差；所述风格损失用于表示补全后的样本全脸图像加遮挡区域后的图像，和样本待处理脸部图像加所述遮挡区域后的图像的特征图的各通道信息向量化后通道间的格雷姆矩阵差值之和；全微分损失用于表示补全后的样本全脸图像横向、纵向导数之和与样本待处理脸部图像包括的像素点的总数之间的比值；所述生成损失用于表示所述判别子模型对所述生成子模型的生成结果来源的判别结果与生成结果的真实来源的损失。Wherein, the pixel-by-pixel loss is used to represent the pixel value error of the sample full-face image after completion and the sample face image to be processed; the perceptual loss is used to represent the feature map and sample of the sample full-face image after completion The error between the feature maps of the to-be-processed face image; the style loss is used to represent the completed sample full-face image plus the occluded area image, and the sample to-be-processed face image plus the occlusion area. The information of each channel of the feature map is the sum of the Gramm matrix differences between the channels after quantization; the full differential loss is used to represent the sum of the horizontal and vertical derivatives of the completed sample full face image and the sample face image to be processed. The ratio between the total number of pixel points; the generation loss is used to represent the loss of the discriminant sub-model for the source of the generation result of the generation sub-model and the loss of the real source of the generation result.

第四方面，提供一种终端设备，包括：In a fourth aspect, a terminal device is provided, including:

发送模块，用于响应在脸部图像补全界面上的脸部图像补全输入操作，向服务器发送补全请求，以使所述服务器对根据如第一方面中任一所述的方法对待处理脸部图像和脸部特征调整信息进行处理，得到脸部图像补全后的全脸图像；其中，所述补全请求包括所述待处理脸部图像和所述脸部特征调整信息；以及获取所述服务器返回的所述全脸图像；A sending module, configured to send a completion request to the server in response to a face image completion input operation on the face image completion interface, so that the server is to be processed according to the method described in any one of the first aspects The facial image and the facial feature adjustment information are processed to obtain a full-face image after the facial image is completed; wherein, the completion request includes the to-be-processed facial image and the facial feature adjustment information; and obtaining the full-face image returned by the server;

显示模块，用于显示所述全脸图像。The display module is used for displaying the full face image.

第五方面，提供一种脸部图像处理设备，包括：In a fifth aspect, a facial image processing device is provided, including:

至少一个处理器，以及at least one processor, and

与所述至少一个处理器通信连接的存储器；a memory communicatively coupled to the at least one processor;

其中，所述存储器存储有可被所述至少一个处理器执行的指令，所述至少一个处理器通过执行所述存储器存储的指令实现如第一方面或第二方面中任一项所述的方法。Wherein, the memory stores instructions executable by the at least one processor, and the at least one processor implements the method according to any one of the first aspect or the second aspect by executing the instructions stored in the memory .

第六方面，提供一种计算机可读存储介质，所述计算机可读存储介质存储有计算机指令，当所述计算机指令在计算机上运行时，使得计算机执行如第一方面或第二方面中任一项所述的方法。In a sixth aspect, a computer-readable storage medium is provided, the computer-readable storage medium stores computer instructions that, when the computer instructions are executed on a computer, cause the computer to perform any one of the first aspect or the second aspect. method described in item.

由于本申请实施例采用上述技术方案，至少具有如下技术效果：Because the embodiment of the present application adopts the above technical solution, it has at least the following technical effects:

本申请实施例中，通过获得待处理脸部图像的全脸特征点集，实现基于特征点进行的脸部补全方法，以及基于对全脸特征点集的精确修改，以获得更加准确地调整后的全脸图像。且，由于无需用户手动对全脸特征点集进行调整，相对可以提高调整全脸特征点集的效率，进而提高脸部图像处理的效率。进一步的，可以根据需要设定不同的脸部特征调整信息，进而生成不同的全脸图像，以满足不同用户的个性化需求，提升用户进行脸部补全的趣味性。通过调整脸部特征点，可以对图像的姿态、表情以及面部轮廓等各个方面进行修改，实现了一种可调性图像补全方案。In the embodiment of the present application, by obtaining the full-face feature point set of the face image to be processed, a face completion method based on feature points is implemented, and based on the accurate modification of the full-face feature point set, so as to obtain a more accurate adjustment Full face image after. Moreover, since the user does not need to manually adjust the full-face feature point set, the efficiency of adjusting the full-face feature point set can be relatively improved, thereby improving the efficiency of facial image processing. Further, different facial feature adjustment information can be set as required, and then different full-face images can be generated to meet the individual needs of different users and improve the fun of the user's face completion. By adjusting the facial feature points, the pose, expression and facial contour of the image can be modified, and an adjustable image completion scheme is realized.

附图说明Description of drawings

图1为本申请实施例提供的一种脸部图像处理设备的结构示意图；FIG. 1 is a schematic structural diagram of a facial image processing device according to an embodiment of the present application;

图2为本申请实施例提供的一种脸部图像处理方法的应用场景示意图；FIG. 2 is a schematic diagram of an application scenario of a face image processing method provided by an embodiment of the present application;

图3为本申请实施例提供的一种脸部图像处理方法的原理示意图；3 is a schematic diagram of the principle of a facial image processing method provided by an embodiment of the present application;

图4为本申请实施例提供的各个模型的分布示意图；4 is a schematic diagram of the distribution of each model provided by the embodiment of the present application;

图5为本申请实施例提供的两种添加遮挡方式所生成的样本待处理脸部图像；FIG. 5 is a sample face image to be processed generated by two occlusion methods provided by the embodiment of the present application;

图6为本申请实施例提供的一种调整全脸特征点集过程示例图；6 is an example diagram of a process for adjusting a full-face feature point set provided by an embodiment of the present application;

图7为本申请实施例提供的一种调整全脸特征点集的过程示意图；7 is a schematic diagram of a process for adjusting a full-face feature point set provided by an embodiment of the present application;

图8为本申请实施例提供的一种脸部图像处理方法的交互示意图；FIG. 8 is an interactive schematic diagram of a facial image processing method provided by an embodiment of the present application;

图9为本申请实施例提供的一种脸部图像补全界面的示意图；9 is a schematic diagram of a face image completion interface provided by an embodiment of the present application;

图10为本申请实施例提供的一种在原图上增加遮挡以生成待处理图像的过程示意图；FIG. 10 is a schematic diagram of a process of adding occlusion to an original image to generate an image to be processed according to an embodiment of the present application;

图11为本申请实施例提供的一种补全获得全脸图像过程中的示例图一；FIG. 11 is an example diagram 1 of a process of obtaining a full-face image by complementing provided by an embodiment of the application;

图12为本申请实施例提供的一种补全获得全脸图像过程中的示例图二；FIG. 12 is an example FIG. 2 in the process of obtaining a full-face image by complementing provided by an embodiment of the present application;

图13为本申请实施例提供的一种补全获得全脸图像过程中的示例图三；FIG. 13 is an example FIG. 3 in the process of obtaining a full-face image by complementing provided by an embodiment of the present application;

图14为本申请实施例提供的一种补全获得全脸图像过程中的示例图四；FIG. 14 is an example FIG. 4 in the process of obtaining a full-face image by complementing provided by an embodiment of the present application;

图15为本申请实施例提供的一种补全获得全脸图像过程中的示例图五；FIG. 15 is an example FIG. 5 in the process of obtaining a full-face image by complementing provided by an embodiment of the present application;

图16为本申请实施例提供的一种补全获得全脸图像过程中的示例图六；FIG. 16 is an example FIG. 6 in the process of obtaining a full-face image by complementing provided by an embodiment of the present application;

图17为本申请实施例提供的一种显示全脸图像的界面示意图一；17 is a schematic diagram 1 of an interface for displaying a full-face image provided by an embodiment of the present application;

图18为本申请实施例提供的一种显示全脸图像的界面示意图二；18 is a schematic diagram 2 of an interface for displaying a full-face image provided by an embodiment of the present application;

图19为本申请实施例提供的一种图像处理装置的结构示意图；FIG. 19 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the application;

图20为本申请实施例提供的一种终端设备的结构示意图一；FIG. 20 is a schematic structural diagram 1 of a terminal device according to an embodiment of the present application;

图21为本申请实施例提供的一种终端设备的结构示意图二。FIG. 21 is a second schematic structural diagram of a terminal device according to an embodiment of the application.

具体实施方式Detailed ways

为了更好的理解本申请实施例提供的技术方案，下面将结合说明书附图以及具体的实施方式进行详细的说明。In order to better understand the technical solutions provided by the embodiments of the present application, detailed descriptions will be given below with reference to the accompanying drawings and specific implementation manners.

为了便于本领域技术人员更好地理解本申请的技术方案，下面对本申请涉及的专业术语进行介绍。In order to facilitate those skilled in the art to better understand the technical solutions of the present application, the technical terms involved in the present application are introduced below.

人工智能(Artificial Intelligence，AI)：是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能，感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说，人工智能是计算机科学的一个综合技术，它企图了解智能的实质，并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法，使机器具有感知、推理与决策的功能。Artificial Intelligence (AI): It is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

人工智能技术是一门综合学科，涉及领域广泛，既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。Artificial intelligence technology is a comprehensive discipline, involving a wide range of fields, including both hardware-level technology and software-level technology. The basic technologies of artificial intelligence generally include technologies such as sensors, special artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

计算机视觉技术(Computer Vision,CV)：计算机视觉是一门研究如何使机器“看”的科学，更进一步的说，就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉，并进一步做图形处理，使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科，计算机视觉研究相关的理论和技术，试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术，还包括常见的人脸识别、指纹识别等生物特征识别技术。Computer Vision (CV): Computer vision is a science that studies how to make machines "see". Further, it refers to the use of cameras and computers instead of human eyes to identify, track and measure objects. , and further do graphic processing, so that the computer processing becomes a more suitable image for human eye observation or transmission to the instrument for detection. As a scientific discipline, computer vision studies related theories and technologies, trying to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping It also includes common biometric identification technologies such as face recognition and fingerprint recognition.

脸部图像补全：图像补全是指对不完整的脸部图像进行补全以及编辑脸部属性，生成的全脸图像既可以与原始全脸图像一样精确，也可以与完整脸部图像在内容上保持一致，以使补全的全脸图像看起来具有真实的视觉感受。编辑脸部属性例如对脸部表情、脸部姿态和脸部轮廓中的一种或几种进行编辑。脸部图像补全可以应用于用户日常的照片处理中，也可以应用于安防领域等。Face image completion: Image completion refers to the completion of incomplete face images and editing of face attributes. The generated full-face image can be as accurate as the original full-face image, or it can be matched with the complete face image. Consistent in content, so that the completed full-face image looks realistic. Editing facial attributes, for example, edits one or more of facial expressions, facial postures, and facial contours. Facial image completion can be applied to the user's daily photo processing, as well as to the security field.

待处理脸部图像：是指该脸部图像包括部分没有显示的脸部区域，例如脸部具有部分残缺，或脸部具有遮挡等等。本申请中所指的脸部包括但不限于人脸或动物脸部等。To-be-processed face image: means that the face image includes a part of the face area that is not displayed, for example, the face has partial defects, or the face has occlusions, and so on. The face referred to in this application includes, but is not limited to, a human face or an animal face.

脸部特征调整信息：用于调整全脸特征点集中各个特征点的坐标的信息，可以用于脸部的轮廓、表情和姿态等。脸部特征调整信息的格式可以是坐标点，也可以是图像，也可以是简笔画等等。Facial feature adjustment information: the information used to adjust the coordinates of each feature point in the full-face feature point set, which can be used for the outline, expression, and posture of the face. The format of the facial feature adjustment information may be a coordinate point, an image, a sketch, and the like.

需要说明的是，本申请中，脸部是指人脸或者动物脸，本申请主要以人脸为例进行详细说明，而对于动物脸部也是可以依据同样的原理实现补全处理的。It should be noted that, in this application, a face refers to a human face or an animal face. This application mainly takes a human face as an example for detailed description, and the animal face can also be complemented according to the same principle.

设计思想design thinking

在拍摄环境光线条件差，或图像受到人为或自然的损害等各种情况下，可能造成脸部图像不完整。例如在夜晚摄像头拍摄不法分子时，未拍摄到完整的脸部图像，此时可以采用脸部补全方法对该脸部图像进行处理。In various situations, such as poor lighting conditions in the shooting environment, or the image is damaged by man or nature, the face image may be incomplete. For example, when a camera shoots a criminal at night, a complete face image is not captured. At this time, a face completion method can be used to process the face image.

目前，脸部图像方法一般是通过非线性地表达及补全脸部图像，无法实现对待处理脸部图像的精确补全以及精确修改等。At present, the facial image method generally expresses and complements the facial image in a non-linear manner, and cannot achieve accurate complementing and accurate modification of the facial image to be processed.

鉴于此，本申请发明人提供一种脸部图像处理方法，先预测待处理脸部图像的全脸特征点集，根据脸部特征调整信息，对全脸特征点集进行调整，获得调整后的全脸特征点集，再基于调整后的全脸特征点集，以及待处理脸部图像，实现对待处理脸部图像的补全。由于本申请实施例中是提取全脸特征点集，也就是说获得了用于精确量化脸部的信息，有利于精确地补全脸部图像，且，便于更精确地对全脸特征点集进行修改，在修改全脸特征点集之后，补全后的全脸图像也有所不同，丰富了补全后的全脸图像，满足了不同用户对全脸补全的不同需求。In view of this, the inventor of the present application provides a facial image processing method, which first predicts the full-face feature point set of the face image to be processed, adjusts the full-face feature point set according to the facial feature adjustment information, and obtains the adjusted facial feature point set. The full-face feature point set is based on the adjusted full-face feature point set and the face image to be processed to complete the face image to be processed. Because the whole-face feature point set is extracted in the embodiment of the present application, that is to say, the information for accurately quantifying the face is obtained, which is beneficial to accurately complete the face image, and facilitates the more accurate analysis of the full-face feature point set. After modifying the full-face feature point set, the completed full-face image is also different, which enriches the completed full-face image and meets the different needs of different users for full-face completion.

本申请发明人进一步考虑，如果是手动对全脸特征点集进行调整，调整效率低，且无法保证调整后的全脸特征点集生成的全脸图像的准确性。因此本申请发明人考虑引入脸部调整模型，用户只需选择对应的表情的分类标签，脸部调整模型就能实现对全脸特征点集的精确调整。The inventor of the present application further considers that if the full-face feature point set is adjusted manually, the adjustment efficiency is low, and the accuracy of the full-face image generated by the adjusted full-face feature point set cannot be guaranteed. Therefore, the inventor of the present application considers introducing a face adjustment model, and the user only needs to select the classification label of the corresponding expression, and the face adjustment model can accurately adjust the feature point set of the whole face.

应用场景示例Application Scenario Example

请参照图1，表示执行本申请实施例中的脸部图像处理方法的脸部图像处理设备的结构示意图，该脸部图像处理设备100包括一个或多个输入设备101、一个或多个处理器102、一个或多个存储器103和一个或多个输出设备104。Please refer to FIG. 1 , which shows a schematic structural diagram of a facial image processing device that executes the facial image processing method in the embodiment of the present application. The facial image processing device 100 includes one or more input devices 101 and one or more processors. 102 , one or more memories 103 and one or more output devices 104 .

输入设备101用于提供输入接口，以获取外界设备/用户输入的待处理脸部图像等。在获得输入待处理脸部图像之后，输入设备101将该待处理脸部图像发送给处理器102，处理器102利用存储器103中存储的程序指令，实现对待处理脸部图像的补全过程，获得补全后的全脸图像。通过输出设备104输出全脸图像。The input device 101 is used to provide an input interface to obtain the facial image to be processed and the like input by the external device/user. After obtaining the input facial image to be processed, the input device 101 sends the facial image to be processed to the processor 102, and the processor 102 uses the program instructions stored in the memory 103 to implement the completion process of the facial image to be processed, obtaining Completed full face image. The full face image is output through the output device 104 .

其中，输入设备101可以包括但不限于物理键盘、功能键、轨迹球、鼠标、触摸屏、操作杆等中的一种或多种。处理器102可以是一个中央处理单元(central processing unit，CPU)，或者为数字处理单元等。存储器103可以是易失性存储器(volatile memory)，例如随机存取存储器(random-access memory，RAM)；存储器103也可以是非易失性存储器(non-volatile memory)，例如只读存储器，快闪存储器(flash memory)，硬盘(hard diskdrive，HDD)或固态硬盘(solid-state drive，SSD)、或者存储器103是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质，但不限于此。存储器103可以是上述存储器的组合。输出设备104例如显示器、扬声器和打印机等等。The input device 101 may include, but is not limited to, one or more of a physical keyboard, a function key, a trackball, a mouse, a touch screen, an operation stick, and the like. The processor 102 may be a central processing unit (central processing unit, CPU), or a digital processing unit or the like. The memory 103 may be a volatile memory (volatile memory), such as random-access memory (RAM); the memory 103 may also be a non-volatile memory (non-volatile memory), such as read-only memory, flash memory Flash memory, hard disk drive (HDD) or solid-state drive (SSD), or memory 103 is capable of carrying or storing desired program code in the form of instructions or data structures and capable of being used by a computer Access any other medium without limitation. The memory 103 may be a combination of the above-mentioned memories. Output devices 104 are, for example, displays, speakers, and printers, among others.

在可能的实施例中，脸部图像处理设备100可以是用户端设备，也可以是服务端设备。用户端设备可以是移动终端、固定终端或便携式终端，例如移动手机、站点、单元、设备、多媒体计算机、多媒体平板、互联网节点、通信器、台式计算机、膝上型计算机、笔记本计算机、上网本计算机、平板计算机、个人通信系统(PCS)设备、个人导航设备、个人数字助理(PDA)、音频/视频播放器、数码相机/摄像机、定位设备、电视接收器、无线电广播接收器、电子书设备、游戏设备或者其任意组合，包括这些设备的配件和外设或者其任意组合。还可预见到的是，脸部图像处理设备100能够支持任意类型的针对用户的接口(例如可穿戴设备)等。服务端设备可以是各种服务提供的服务器、大型计算设备等。服务器可以是一个或多个服务器。服务器也可以是实体服务器或虚拟服务器等。In a possible embodiment, the facial image processing device 100 may be a client device or a server device. The client device may be a mobile terminal, a stationary terminal or a portable terminal, such as a mobile phone, a site, a unit, a device, a multimedia computer, a multimedia tablet, an Internet node, a communicator, a desktop computer, a laptop computer, a notebook computer, a netbook computer, Tablet Computers, Personal Communication System (PCS) Devices, Personal Navigation Devices, Personal Digital Assistants (PDAs), Audio/Video Players, Digital Cameras/Camcorders, Positioning Devices, TV Receivers, Radio Broadcast Receivers, E-book Devices, Games Devices or any combination thereof, including accessories and peripherals for such devices or any combination thereof. It is also envisioned that the facial image processing device 100 can support any type of user-oriented interface (eg, wearable device) and the like. The server device may be a server for various services, a large computing device, and the like. A server can be one or more servers. The server can also be a physical server or a virtual server.

在一种可能的应用场景中，请参照图2，表示一种应用场景示例，脸部图像处理设备100通过服务器220实现。用户通过终端设备210中客户端211确定需要处理的待处理脸部图像，客户端211生成补全请求，服务器220通过对待处理脸部图像进行上述处理，获得全脸图像，并将全脸图像返回给客户端211。客户端211泛指可以实现脸部补全的客户端211，客户端211例如拍照App、图像处理APP等，本申请不限制客户端211的具体类型。In a possible application scenario, please refer to FIG. 2 , which shows an example of an application scenario. The face image processing device 100 is implemented by the server 220 . The user determines the facial image to be processed to be processed through the client 211 in the terminal device 210, the client 211 generates a completion request, and the server 220 performs the above processing on the facial image to be processed to obtain a full-face image, and returns the full-face image to client 211. The client 211 generally refers to the client 211 that can implement face completion. The client 211 is, for example, a photographing app, an image processing app, etc. The application does not limit the specific type of the client 211 .

工作原理working principle

请参照图3，表示对脸部图像进行处理过程的原理示意图，图像处理过程主要包括三部分：预测全脸特征点集部分310、调整全脸特征点集部分320和对待处理脸部图像进行补全部分330。Please refer to FIG. 3 , which shows a schematic diagram of the principle of processing a face image. The image processing process mainly includes three parts: predicting the full-face feature point set part 310, adjusting the full-face feature point set part 320, and complementing the face image to be processed. All part 330.

具体的，待处理脸部图像为I，该待处理脸部图像的未显示区域可以表示为M，因此待处理脸部图像可以表示为：Specifically, the face image to be processed is I, and the undisplayed area of the face image to be processed can be expressed as M, so the face image to be processed can be expressed as:

其中“。”表示矩阵的哈达玛乘积。where "." represents the Hadamard product of the matrix.

脸部图像处理设备100预测出全脸特征点集L，根据脸部特征调整信息对全脸特征点集L进行调整，获得L’，再将L’和待处理脸部图像I输入到图像补全模型，从而获得全脸图像

最终获得的全脸图像

可以表示如下：The facial image processing device 100 predicts the full-face feature point set L, adjusts the full-face feature point set L according to the facial feature adjustment information, obtains L', and then inputs L' and the face image I to be processed into the image complement. full model to obtain a full face image

The final full face image

It can be expressed as follows:

其中，

表示对未显示区域的补全区域。in,

Indicates the completion area for the undisplayed area.

下面对预测全脸特征点集部分310进行说明。Next, the prediction full-face feature point set section 310 will be described.

脸部图像处理设备100可以通过脸部特征点预测模型，获得待处理脸部图像的全脸特征点集。The facial image processing device 100 may obtain a full-face feature point set of the facial image to be processed through the facial feature point prediction model.

在一种可能的实施例中，请参照图4，脸部特征点预测模型410可以包括第一编码模块411、多个第二编码模块412和全连接模块413，其中：In a possible embodiment, please refer to FIG. 4 , the facial feature point prediction model 410 may include a first encoding module 411, a plurality of second encoding modules 412 and a fully connected module 413, wherein:

第一编码模块411中包括多种不同尺度的卷积层，当图像依次经过第一编码模块411中的各个层之后，各个卷积层可以分别获得对应的特征图，从而获得第一特征图集，也就是说，第一特征图集包括多个特征图。应当说明的是，本申请中的特征图集在没有明确限定的情况下，可能包括一个特征图，也可能包括多个特征图。The first encoding module 411 includes a variety of convolutional layers of different scales. After the image passes through each layer in the first encoding module 411 in sequence, each convolutional layer can obtain corresponding feature maps respectively, thereby obtaining the first feature map set. , that is, the first feature map set includes a plurality of feature maps. It should be noted that the feature map set in this application may include one feature map or multiple feature maps unless it is explicitly defined.

在获得第一特征图集之后，图4中是以三个第二编码模块412为例，实际上不限制其数量，每个第二编码模块412中包括一个或多个卷积层以及一个或多个池化层，不同第二编码模块412可以对第一特征图集进行不同卷积池化处理，每个第二编码模块412对应输出相应的特征值集，从而获得第二特征值集。卷积池化处理可以是先做卷积，再对卷积结果做平均池化处理，也可以是先做卷积，再对卷积结果进行最大池化处理等。After the first feature atlas is obtained, three second encoding modules 412 are taken as an example in FIG. 4 , the number of which is not actually limited, and each second encoding module 412 includes one or more convolutional layers and one or more convolutional layers. For multiple pooling layers, different second encoding modules 412 can perform different convolution pooling processing on the first feature map set, and each second encoding module 412 outputs a corresponding feature value set, thereby obtaining a second feature value set. Convolution pooling can be performed by convolution first, followed by average pooling on the convolution results, or by convolution first, followed by max pooling on the convolution results.

作为一种实施例，多个第二编码模型412中存在不同尺度的两个编码模块。As an embodiment, there are two encoding modules of different scales in the plurality of second encoding models 412 .

本申请实施例中，通过多个第二编码模块412分别对第一组特征图进行卷积池化处理，不同尺度的处理过程可以保留第一组特征图不同的特征，使得后续的特征点预测更加准确。In this embodiment of the present application, a plurality of second encoding modules 412 respectively perform convolution and pooling processing on the first group of feature maps, and different features of the first set of feature maps can be retained in the processing processes of different scales, so that subsequent feature point predictions can be made. more precise.

通过全连接模块413对第二特征值集进行拼接处理，获得全脸特征点集。全脸特征点集包括用于表示脸部的关键点的坐标集。拼接处理可以理解为将第二特征值集中对应值按照预设顺序组合成坐标，从而获得对应的全脸特征点集。The second feature value set is spliced through the full connection module 413 to obtain a full face feature point set. The full-face feature point set includes a set of coordinates for key points representing the face. The splicing process can be understood as combining corresponding values in the second feature value set into coordinates according to a preset order, so as to obtain a corresponding full-face feature point set.

下面对脸部特征点预测模型410的结构进行具体示例说明。一种脸部特征点预测模型410具体如下表1所示：The structure of the facial feature point prediction model 410 will be described with a specific example below. A facial feature point prediction model 410 is specifically shown in Table 1 below:

表1Table 1

表1中c表示输出通道数，s表示卷积层的跨步(stride)，t表示扩张因子(expansion factor)，n表示对应层重复处理的次数。In Table 1, c represents the number of output channels, s represents the stride of the convolutional layer, t represents the expansion factor, and n represents the number of repetitions of the corresponding layer.

其中，为了便于理解瓶颈层(bottleneck)，下面对表1涉及的瓶颈层的结构进行示例说明。表1中s＝2的瓶颈层的结构为通道数为(tk的1x1)卷积-ReLU6-(通道数为tk，跨步为2，卷积核大小为3x3)深度卷积(每个卷积核仅对一张特征图进行卷积)-ReLU6-(通道数为k的1x1)卷积。对于s＝1的瓶颈层，其结构为：(通道数为tk的1x1)卷积-ReLU6-(通道数为tk，跨步为2，卷积核大小为3x3)深度卷积-ReLU6-(通道数为k的1x1)卷积，该输出结果与瓶颈层的输入相加，实现残差连接。Among them, in order to facilitate the understanding of the bottleneck layer (bottleneck), the structure of the bottleneck layer involved in Table 1 is exemplified below. The structure of the bottleneck layer with s=2 in Table 1 is that the number of channels is (1x1 of tk) convolution-ReLU6- (the number of channels is tk, the stride is 2, the convolution kernel size is 3x3) deep convolution (each volume The product kernel only convolves one feature map)-ReLU6-(1x1 with k channels) convolution. For the bottleneck layer with s=1, its structure is: (1x1 with channel number tk) convolution-ReLU6- (channel number is tk, stride is 2, convolution kernel size is 3x3) deep convolution-ReLU6-( 1x1) convolution with the number of channels k, and the output result is added to the input of the bottleneck layer to realize the residual connection.

请参照表1所示，第一编码模块411包括一卷积层和多个瓶颈层(1x1卷积-ReLU6-跨步为2，3x3逐通道卷积(Depthwise Convolution))，获得第一特征图集C1，再将第一特征图集C1经过第二编码单元412，如表1所示，第二编码单元包括卷积核大小为1x1，通道数为1280的第一卷积层，和第一平均池化层，卷积核为1x1，通道数为128的第二卷积层，第二平均池化层，第三卷积层，以及第三平均池化层。将第一特征图集C1输入卷积核大小为1x1，通道数为1280的卷积层获得特征图集C2。对特征图集C2做全局平均池化，得到特征值集S1。在另一条分支中，使第一特征图集C1通过卷积核为1x1，通道数为128的卷积层并经过全局平均池化，得到特征值集S2。在第三条分支中，将特征图集C2通过卷积核为1x1，通道数为128的卷积层并做全局平均池化得到特征值集S3。Please refer to Table 1, the first encoding module 411 includes a convolution layer and a plurality of bottleneck layers (1x1 convolution-ReLU6-stride is 2, 3x3 channel-by-channel convolution (Depthwise Convolution)), to obtain the first feature map Set C1, and then pass the first feature atlas C1 through the second coding unit 412. As shown in Table 1, the second coding unit includes a first convolutional layer with a convolution kernel size of 1×1 and a channel number of 1280, and a first convolutional layer with a convolution kernel size of 1×1 Average pooling layer, the second convolutional layer with 1x1 convolution kernel and 128 channels, the second average pooling layer, the third convolutional layer, and the third average pooling layer. Input the first feature atlas C1 into a convolutional layer with a kernel size of 1x1 and a channel number of 1280 to obtain a feature atlas C2. Perform global average pooling on the feature atlas C2 to obtain the feature value set S1. In another branch, the first feature atlas C1 is passed through a convolutional layer with a convolution kernel of 1×1 and a channel number of 128 and global average pooling to obtain a feature value set S2. In the third branch, the feature map set C2 is passed through a convolutional layer with a convolution kernel of 1×1 and a channel number of 128, and global average pooling is performed to obtain the feature value set S3.

将各个特征值集(S1、S2和S3)输入到全连接单元413(也就是表1中的全连接单元)，最终将S1，S2，S3连接并经过全连接层，得到136个值，该136个值对应68个人脸特征点的x，y坐标，从而得到最终预测的全脸特征点坐标，也就是全脸特征点集。Input each feature value set (S1, S2 and S3) into the fully connected unit 413 (that is, the fully connected unit in Table 1), and finally connect S1, S2, S3 and pass through the fully connected layer to obtain 136 values. The 136 values correspond to the x and y coordinates of the 68 face feature points, so as to obtain the final predicted full-face feature point coordinates, that is, the full-face feature point set.

基于前文论述的脸部特征点预测模型410，下面对该脸部特征点预测模型410的训练过程进行介绍，包括获取第一样本数据集部分、基于第一样本数据集训练脸部特征点预测模型410部分，下面对两个部分分别进行介绍。Based on the facial feature point prediction model 410 discussed above, the following describes the training process of the facial feature point prediction model 410, including acquiring the first sample data set part, training facial features based on the first sample data set Part 410 of the point prediction model, two parts are introduced separately below.

获取第一样本数据集：Get the first sample dataset:

具体的，第一样本数据集包括样本待处理脸部图像集，与每个样本待处理脸部图像对应的真值样本全脸特征点集，每个样本待处理脸部图像均包括未显示区域，真值样本全脸特征点集是指该样本待处理脸部图像对应的完整脸部图像所对应的真实的全脸特征点的集。真值样本全脸特征点集可以是人工标注生成的，也可以是采用网络资源上的数据，也可以是采用其它网络模型进行标注生成的，例如可以采用FAN用于标注样本待处理脸部图像对应的完整脸部图像所对应的真值样本全脸特征点集。Specifically, the first sample data set includes a sample face image set to be processed, a true-valued sample full-face feature point set corresponding to each sample face image to be processed, and each sample face image to be processed includes not shown The full-face feature point set of the true value sample refers to the set of real full-face feature points corresponding to the complete face image corresponding to the face image to be processed in the sample. The full face feature point set of the true value sample can be generated by manual annotation, or it can be generated by using data on network resources, or it can be generated by using other network models for annotation. For example, FAN can be used to label the face image of the sample to be processed. The ground-truth sample full-face feature point set corresponding to the corresponding full-face image.

作为一种实施例，可以将样本全脸图像集输入FAN网络，获得每个样本全脸图像对应的真值样本全脸特征点集，再对样本全脸图像集中部分或全部图像添加遮挡，以获得更多数据集。As an example, the sample full-face image set can be input into the FAN network to obtain the ground-truth sample full-face feature point set corresponding to each sample full-face image, and then add occlusion to some or all of the images in the sample full-face image set to Get more datasets.

具体的，添加遮挡的方式有多种，例如添加随机位置但遮挡区域大小相同的遮挡，或者添加随机掩码的遮挡，随机掩码的遮挡是指遮挡区域的位置和大小均不固定。其中，添加的遮挡的颜色可以是任意，本申请不限制遮挡区域的具体颜色。但是在一次训练过程中，添加的遮挡的颜色都是相同的，例如可以选择白色。对于不同颜色的遮挡，可以分别进行模型训练。Specifically, there are various ways to add occlusion, such as adding a occlusion with a random position but the same size of the occlusion area, or adding an occlusion with a random mask. The occlusion of a random mask means that the position and size of the occlusion area are not fixed. The color of the added occlusion may be arbitrary, and the application does not limit the specific color of the occlusion area. But in a training process, the color of the added occlusion is the same, for example, white can be selected. For occlusions of different colors, model training can be performed separately.

例如，请参照5，图5中(a)表示一种添加随机位置但遮挡区域大小相同的遮挡之后的样本待处理脸部图像，图5中(b)表示一种添加随机掩码的遮挡之后的样本待处理脸部图像。For example, please refer to 5, in Fig. 5 (a) represents a sample face image to be processed after occlusion with random positions but the same occlusion area size is added, and (b) in Fig. 5 represents an occlusion with random masks added A sample face image to be processed.

进一步的，为了扩充数据集，还可以对添加遮挡后的图像进行翻转、模糊以及改变光度处理。光度处理例如可以是将图像的所有像素乘以预设因子，预设因子的范围例如为0.7-1.3。Further, in order to expand the dataset, the occluded images can also be flipped, blurred, and photometric changed. The photometric processing may be, for example, multiplying all pixels of the image by a predetermined factor, and the predetermined factor is in the range of 0.7-1.3, for example.

基于第一样本数据集训练脸部特征点预测模型410：The facial landmark prediction model 410 is trained based on the first sample data set:

脸部特征点预测模型410的预测损失可以表示如下：The prediction loss of the facial landmark prediction model 410 can be expressed as follows:

其中，L_gt表示真值特征点坐标，‖·‖₂表示L2范数。Among them, L _gt represents the ground-truth feature point coordinates, and ‖·‖ ₂ represents the L2 norm.

将样本待处理脸部图像输入脸部特征点预测模型，获得样本待处理脸部图像对应的预测样本全脸特征点集。Input the sample face image to be processed into the face feature point prediction model, and obtain the predicted sample full face feature point set corresponding to the sample face image to be processed.

基于预测样本全脸特征点集和真值样本全脸特征点集，确定脸部特征点预测模型410的损失，例如可以根据公式(3)确定损失。根据损失不断地更新参数，例如可以采用Adam优化器更新参数，其中Adam优化器中参数初始值可以设为：β₁＝0，β₂＝0.9。脸部特征点预测模型410的初始学习率设置为10^-4。The loss of the facial feature point prediction model 410 is determined based on the predicted sample full-face feature point set and the ground-truth sample full-face feature point set, for example, the loss can be determined according to formula (3). The parameters are continuously updated according to the loss, for example, the Adam optimizer can be used to update the parameters, wherein the initial values of the parameters in the Adam optimizer can be set as: β ₁ =0, β ₂ =0.9. The initial learning rate of the facial feature point prediction model 410 is set to 10 ⁻⁴ .

当脸部特征点预测模型410的损失满足第一预设条件时，获得已训练的脸部特征点预测模型410。第一预设条件，例如脸部特征点预测模型410的损失小于或等于某个阈值，或者脸部特征点预测模型410的学习率达到预设值。When the loss of the facial feature point prediction model 410 satisfies the first preset condition, the trained facial feature point prediction model 410 is obtained. The first preset condition, for example, the loss of the facial feature point prediction model 410 is less than or equal to a certain threshold, or the learning rate of the facial feature point prediction model 410 reaches a preset value.

在一种可能的实施例中，脸部特征点预测模型410可以获得物种的全脸特征点集，但是由于不同物种的脸部特征点分布可能有差异，因此可以采用不同物种所对应的样本数据集依次经过上述的训练过程，获得各个物种对应的参数。脸部图像处理设备100可以预先存储各种物种与其对应的参数。In a possible embodiment, the facial feature point prediction model 410 can obtain the full face feature point set of the species, but since the distribution of facial feature points of different species may be different, the sample data corresponding to different species can be used The set goes through the above training process in turn to obtain the parameters corresponding to each species. The face image processing apparatus 100 may pre-store various species and their corresponding parameters.

下面对调整全脸特征点集部分320进行说明。The adjustment full-face feature point set section 320 will be described below.

在一种可能的实施例中，脸部图像处理设备100可以通过脸部调整模型420，基于脸部特征调整信息，对全脸全脸特征点集中相应的特征点进行调整，获得调整后的全脸特征点集。In a possible embodiment, the facial image processing device 100 may adjust the corresponding feature points in the full-face full-face feature point set based on the facial feature adjustment information through the face adjustment model 420 to obtain the adjusted full-face feature point. face feature point set.

下面对脸部调整模型420进行示例说明。The face adjustment model 420 is exemplified below.

A1：A1:

当脸部特征调整信息为脸部表情目标图像，脸部调整模型420采用前文论述的脸部特征点预测模型410，通过该脸部特征点预测模型获得该脸部表情目标图像所对应的目标全脸特征点集，基于该目标全脸特征点集各个特征点之间的相对位置去调整之前待处理脸部图像的全脸特征点集中各个特征点的坐标位置，从而获得调整后的全脸特征点集。调整方式例如对待处理脸部图像的全脸特征点集中各个特征点的坐标位置进行调整。When the facial feature adjustment information is the facial expression target image, the face adjustment model 420 adopts the facial feature point prediction model 410 discussed above, and obtains the target full range corresponding to the facial expression target image through the facial feature point prediction model. face feature point set, based on the relative positions between the feature points of the target full-face feature point set to adjust the coordinate position of each feature point in the full-face feature point set of the face image to be processed before, so as to obtain the adjusted full-face feature point set. The adjustment method is, for example, adjusting the coordinate position of each feature point in the full-face feature point set of the face image to be processed.

本申请实施例中调整的方式，不仅可以调整脸部图像的表情，还可以调整脸部图像的姿态以及轮廓等。The adjustment method in the embodiment of the present application can not only adjust the expression of the face image, but also adjust the posture and outline of the face image.

例如，请参照图6，图6中A表示脸部表情目标图像对应的全脸特征点集，图6中B表示待处理脸部图像的全脸特征点集，根据脸部表情目标图像对应的全脸特征点集(例如图6中的a1、b1和c1)对待处理脸部图像的全脸特征点集(a2、b2和c1)的坐标进行调整，从而获得图6中C所示的全脸特征点集，图C相对于图B不仅在表情上有所变化，在脸部轮廓以及姿态上也对应发生变化。For example, please refer to FIG. 6. In FIG. 6, A represents the full-face feature point set corresponding to the facial expression target image, and B in FIG. 6 represents the full-face feature point set of the facial image to be processed. The coordinates of the full-face feature point set (for example, a1, b1, and c1 in Fig. 6) of the face image to be processed (a2, b2, and c1) are adjusted to obtain the full-face feature point set (a2, b2, and c1) of the face image to be processed. The facial feature point set, compared to the graph B, not only the expression has changed, but also the facial contour and posture have also changed correspondingly.

当脸部调整模型420采用前文论述的脸部特征点预测模型410时，关于该脸部调整模型420的训练过程可以参照前文论述的脸部特征点预测模型410的训练过程，此处不再赘述。When the face adjustment model 420 adopts the facial feature point prediction model 410 discussed above, the training process of the face adjustment model 420 can refer to the training process of the facial feature point prediction model 410 discussed above, which will not be repeated here. .

A2：A2:

通过将表情分类标签，以及全脸特征点集输入脸部调整模型420，获得调整后的全脸特征点集。By inputting the expression classification label and the full-face feature point set into the face adjustment model 420, the adjusted full-face feature point set is obtained.

具体的，脸部调整模型420可以根据表情分类标签或脸部表情图像对全脸特征点集中的特征点进行坐标调整，从而获得调整后的全脸特征点集。下面对脸部调整模型420的结构进行示例说明。表情分类标签可以是采用独热编码等方式进行表示。Specifically, the face adjustment model 420 may adjust the coordinates of the feature points in the full-face feature point set according to the expression classification label or the facial expression image, so as to obtain the adjusted full-face feature point set. The structure of the face adjustment model 420 is exemplified below. The expression classification labels can be represented by means of one-hot encoding or the like.

脸部调整模型420可以包括多个全连接层和多个激活层，全连接层和激活层的数量相同，且间隔分布。全连接层和激活层通过多层运算，以实现对全脸特征点集中的特征点的坐标进行迁移。The face adjustment model 420 may include multiple fully connected layers and multiple activation layers, and the number of fully connected layers and activation layers are the same and distributed at intervals. The fully connected layer and the activation layer perform multi-layer operations to realize the migration of the coordinates of the feature points in the full-face feature point set.

具体的，脸部调整模型420包括依次连接的第一全连接层、第一激活层、第二全连接层、第二激活层、第三全连接层和第三激活层。Specifically, the face adjustment model 420 includes a first fully connected layer, a first activation layer, a second fully connected layer, a second activation layer, a third fully connected layer, and a third activation layer that are connected in sequence.

下面对A2中涉及的脸部调整模型420的训练方法进行说明。The training method of the face adjustment model 420 involved in A2 will be described below.

请参照图7，获得样本脸部表情图像集，每个样本脸部表情图像标注有该图像对应的表情分类标签，表情分类标签例如微笑、大笑、哭等。Referring to FIG. 7 , a sample facial expression image set is obtained, and each sample facial expression image is marked with an expression classification label corresponding to the image, such as smiling, laughing, crying, and the like.

构建脸部调整模型420的损失，具体示例如下:Construct the loss of face adjustment model 420, a specific example is as follows:

其中，L_gt用于表示样本脸部表情图像对应的真值特征点集，

为脸部调整模型预测的表情分类标签所对应的全脸特征点集，‖·‖₂表示L2范数。Among them, L _gt is used to represent the ground-truth feature point set corresponding to the sample facial expression image,

The full-face feature point set corresponding to the facial expression classification labels predicted by the face adjustment model, ‖ _· ‖2 represents the L2 norm.

直到该脸部调整模型420的损失满足预设条件时，获得已训练的脸部调整模型420。Until the loss of the face adjustment model 420 satisfies the preset condition, the trained face adjustment model 420 is obtained.

本申请实施例中，脸部调整模型420预先学习到各种表情分类标签所对应的全脸特征点集，在使用该模型时，可以选择对应的表情分类表标签，脸部调整模型420就能基于该表情分类标签，对输入的全脸特征点集中特征点的坐标进行调整，从而获得已训练的脸部调整模型420。由于提前训练了各种表情分类标签所对应的全脸特征点集，这样可以相对提高脸部调整模型420对全脸特征点集进行调整过程中的准确性。In the embodiment of the present application, the face adjustment model 420 pre-learns the full-face feature point sets corresponding to various expression classification labels. When using this model, the corresponding expression classification table label can be selected, and the face adjustment model 420 can Based on the expression classification label, the coordinates of the feature points in the input full-face feature point set are adjusted to obtain the trained face adjustment model 420 . Since the full-face feature point sets corresponding to various expression classification labels are trained in advance, the accuracy in the process of adjusting the full-face feature point set by the face adjustment model 420 can be relatively improved.

A3：A3:

脸部图像处理设备100可以提取脸部特征调整信息对应的脸部轮廓特征，根据该脸部轮廓特征，对全脸特征点集中对应特征点的坐标进行调整，获得调整后的全脸特征点集。The facial image processing device 100 can extract the facial contour feature corresponding to the facial feature adjustment information, and adjust the coordinates of the corresponding feature points in the full-face feature point set according to the facial contour feature to obtain the adjusted full-face feature point set. .

具体的，脸部特征调整信息可以是用户手动画出的简笔画，或者可以是具有表情的人脸图像，无论是哪一种，脸部图像处理设备100均可以提取得到该脸部特征调整信息中的脸部轮廓特征，脸部图像处理设备100也可以通过预训练的轮廓提取模型等提取得到脸部轮廓特征。该脸部轮廓特征用于表示脸部部分或全部关键点构成的轮廓。Specifically, the facial feature adjustment information may be a stick figure drawn by the user's hand, or may be a face image with expressions, no matter which type it is, the facial image processing device 100 can extract the facial feature adjustment information The facial contour features in the face image processing device 100 can also be extracted by a pre-trained contour extraction model to obtain the facial contour features. The face contour feature is used to represent the contour formed by part or all of the key points of the face.

在获得脸部轮廓特征之后，可以基于该脸部轮廓，对全脸特征点集中对应特征点的坐标进行偏移，获得调整后的全脸特征点集。After the facial contour feature is obtained, the coordinates of the corresponding feature points in the full-face feature point set may be offset based on the facial contour to obtain an adjusted full-face feature point set.

本申请实施例中，可以不需预先训练对应的模型，相对可以减少脸部图像处理设备100处理量，且，可以处理简笔画等，提高用户处理图像的准确性。In the embodiment of the present application, it is not necessary to pre-train the corresponding model, which can relatively reduce the processing amount of the facial image processing device 100, and can process sketches, etc., so as to improve the accuracy of the user's processing of images.

下面对待处理脸部图像进行补全部分330进行介绍。The complementing part 330 of the face image to be processed will be described below.

可以通过图像补全模型，基于调整后的全脸特征点集，以及待处理脸部图像对待处理脸部图像进行补全，获得待处理脸部图像中脸部未显示区域进行补全处理的全脸图像。The image completion model can be used to complete the to-be-processed face image based on the adjusted full-face feature point set and the to-be-processed face image to obtain a complete image of the undisplayed face in the to-be-processed face image for completion processing. face image.

请继续参照图4，图像补全模型包括生成子模型430，生成子模型430包括第三编码模块431，空洞卷积模块432和解码模块433，将调整后的全脸特征点集以及待处理脸部图像输入至生成子模型430中。Please continue to refer to FIG. 4 , the image completion model includes a generating sub-model 430, and the generating sub-model 430 includes a third encoding module 431, a hole convolution module 432 and a decoding module 433, which combine the adjusted full-face feature point set and the face to be processed Part of the image is input into the generation sub-model 430.

具体的，先将调整后的全脸特征点集转换为脸部特征点图，该转换过程可以理解为将各个特征点以特征图形式进行表示，例如可以是将待处理脸部图像中所对应调整后的全脸特征点集对应的位置设置为1，其余设置为0，从而获得脸部特征点图。Specifically, the adjusted full-face feature point set is first converted into a facial feature point map. The conversion process can be understood as representing each feature point in the form of a feature map. The corresponding position of the adjusted full-face feature point set is set to 1, and the rest are set to 0, so as to obtain the facial feature point map.

第三编码模块431中可以包括依次连接的多个卷积层，多个卷积层分别对待处理脸部图像以及脸部特征点图进行不同尺度下卷积处理，获得各个卷积层输出的特征图，从而获得第三特征图集。第三特征图集包括第一中间特征图集、第二中间特征图集以及卷积处理后特征图集。第一中间特征图集是第三编码模块431的前部分卷积层处理得到的特征图，第二中间特征图集是第三编码模块431的中间部分卷积层处理得到的特征图，卷积处理后特征图是第三编码模块431的最后卷积层输出的特征图。The third encoding module 431 may include multiple convolutional layers connected in sequence, and the multiple convolutional layers respectively perform convolution processing at different scales on the face image to be processed and the facial feature point map to obtain the features output by each convolutional layer. to obtain a third feature atlas. The third feature atlas includes a first intermediate feature atlas, a second intermediate feature atlas, and a feature atlas after convolution processing. The first intermediate feature map set is the feature map processed by the convolutional layer in the front part of the third coding module 431, and the second intermediate feature map set is the feature map processed by the convolutional layer in the middle part of the third coding module 431. The processed feature map is the feature map output by the last convolutional layer of the third encoding module 431 .

通过空洞卷积模块432中的残差单元对卷积处理后特征图进行空洞卷积，获得第四特征图集，通过空洞卷积模块432中的注意力单元提取第四特征图集，以及卷积处理后特征图的局部特征，获得第五特征图集。The feature map after convolution processing is subjected to hole convolution by the residual unit in the hole convolution module 432 to obtain a fourth feature map set, and the fourth feature map set is extracted by the attention unit in the hole convolution module 432, and the volume The local features of the processed feature map are accumulated to obtain the fifth feature map set.

下面对注意力单元的处理机制进行介绍。The processing mechanism of the attention unit is introduced below.

具体的，注意力单元用于捕获长距离的空间上下文特征间的关联，其具体计算过程为：Specifically, the attention unit is used to capture the association between long-distance spatial context features, and its specific calculation process is as follows:

对于第四特征图集的特征f_d,首先通过卷积核大小为1x1的卷积层获得Q(f_d)。再计算f_di分量对j分量的内积s_ij，s_ij＝[Q(f_d)^T]_iQ(f_d)_j，再通过softmax函数进行归一化，获得f_di对f_dj的注意力值

之后获得f_d其他分量对f_dj的经过注意力加权后的特征和c_dj，其计算方法为

最终获得第四特征图集的特征部分结果为y_d＝γ_dc_d+f_d。其中γ_d为可训练的参数。For the feature f _d of the fourth feature atlas, Q(f _d ) is first obtained through a convolutional layer with a kernel size of 1x1. Then calculate the inner product s _ij of the f _d i component to the j component, s _ij =[Q(f _d ) ^T ] _i Q(f _d ) _j , and then normalize it by the softmax function to obtain the f _di to f _dj attention value

Then, the attention-weighted feature sum c _dj of other components of f _d on f _dj is obtained, and the calculation method is as follows:

Finally, the result of the feature part of the fourth feature atlas is obtained as y _d =γ _d c _d +f _d . where _γd is a trainable parameter.

对于卷积处理后特征图的特征f_e，其使用f_d得出的注意力值对特征进行加权求和获得c_ej,c_ej＝∑_iβ_j,if_ei。最终输出的卷积处理后特征图集的特征结果为y_e＝γ_e(1-M)c_e+Mf_e。其中γ_e为可训练的参数。For the feature f _e of the feature map after convolution processing, it uses the attention value derived from f _d to perform a weighted summation of the features to obtain c _ej , c _ej =∑ _i β _j,i f _ei . The feature result of the final output feature map set after convolution processing is y _e =γ _e (1-M)c _e + _Mfe . where γ _e is a trainable parameter.

将第五特征图集的特征结果和卷积处理后特征图集的特征结果进行连接操作即为第五特征图集。Connecting the feature results of the fifth feature atlas and the feature results of the feature atlas after convolution processing is the fifth feature atlas.

通过解码模块433对第五特征图集，以及第三特征图集进行上采样处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。The decoding module 433 performs up-sampling processing on the fifth feature atlas and the third feature atlas, so as to obtain a full-face image in which the undisplayed area of the face in the face image to be processed is complemented.

具体的，解码模块433可以对第五特征图集，以及第三特征图集中的所有特征图一起进行上采样，获得最后的全脸图像。Specifically, the decoding module 433 may perform up-sampling on the fifth feature map set and all the feature maps in the third feature map set together to obtain a final full-face image.

解码模块433也可以对第五特征图集进行上采样处理，获得第六特征图集进行上采样处理，获得第六特征图集。再对第二中间特征图集以及第六特征图集进行加权和上采样处理，获得第七特征图集。再对第七特征图以及第一特征图集进行加权，获得第八特征图集。再通过解码模块对第八特征图集合进行卷积处理，获得补全后的全脸图像。The decoding module 433 may also perform up-sampling processing on the fifth feature atlas to obtain a sixth feature atlas and perform up-sampling processing to obtain a sixth feature atlas. Then, weighting and up-sampling processing are performed on the second intermediate feature atlas and the sixth feature atlas to obtain a seventh feature atlas. The seventh feature map and the first feature map set are then weighted to obtain an eighth feature map set. The eighth feature map set is then subjected to convolution processing through the decoding module to obtain a completed full-face image.

例如，一种生成子模型430具体如下表所示：For example, a generation sub-model 430 is specifically shown in the following table:

表2Table 2

其中，k、c、s和p分别表示卷积层或解卷积层的卷积核大小、输出通道数、跨步、填充(padding)。除表1中第一行中卷积层使用的是反射填充(Reflection Padding)外，其余使用的是零填充(Zero Padding)。其中IN表示实例归一化(Instance Normalization)。Among them, k, c, s, and p represent the convolution kernel size, the number of output channels, stride, and padding of the convolutional or deconvolutional layers, respectively. Except for the convolutional layer in the first row in Table 1, which uses Reflection Padding, the rest use Zero Padding. Where IN represents instance normalization (Instance Normalization).

下面对表1所示的生成子模型430的具体处理过程进行说明。The specific process of generating the sub-model 430 shown in Table 1 will be described below.

具体的，先将脸部特征点图和待处理脸部图像经过卷积核大小为7×7，通道数为3，填充为3的卷积层，获得第一中间特征图集E1。再将第一中间特征图集E1输入卷积核大小为4×4，通道数为128，跨步为2，填充为1的卷积层，获得第二中间特征图集E2。之后将E2通过卷积核大小为4x4，通道数为256，跨步为2，填充为1的卷积层获得卷积处理后特征图集E3。Specifically, the facial feature point map and the face image to be processed are first passed through a convolutional layer with a convolution kernel size of 7×7, a channel number of 3, and a filling of 3 to obtain a first intermediate feature map set E1. Then, input the first intermediate feature atlas E1 into a convolutional layer with a kernel size of 4 × 4, a number of channels of 128, a stride of 2, and a padding of 1 to obtain the second intermediate feature atlas E2. After that, E2 is passed through the convolutional layer with the convolution kernel size of 4x4, the number of channels is 256, the stride is 2, and the padding is 1 to obtain the feature atlas E3 after convolution processing.

将E3经过7个第一次卷积操作为空洞卷积的的残差块后得到第四特征图集合R7。将R7和E3共同输入到长短期注意力模块中得到第五特征图集合，并使第五特征图集合通过卷积核大小为4x4，通道数为128，跨步为2，填充为1的转置卷积层做上采样，得到第六特征图集D1。将E2和D1共同输入到通道数为256的1x1卷积层，获得经过加权后的特征图集。之后加权后的特征图集输入到卷积核大小为4x4，通道数为256，跨步为2，填充为1的转置卷积层做上采样得到第七特征图集合D2。将D2和E1共同输入128通道的1x1卷积层中进行得到加权的第八特征图集合。将该第八特征图集合输入卷积核大小为7，通道数为3，填充为3的卷积，最终经过tanh层获得输出的全脸图像。The fourth feature map set R7 is obtained after E3 undergoes seven first convolution operations as residual blocks of hole convolution. Input R7 and E3 together into the long-term and short-term attention module to obtain the fifth feature map set, and make the fifth feature map set pass through the convolution kernel size of 4x4, the number of channels is 128, the stride is 2, and the padding is 1. Set the convolutional layer for upsampling to obtain the sixth feature atlas D1. E2 and D1 are jointly input to a 1x1 convolutional layer with 256 channels to obtain a weighted feature atlas. After that, the weighted feature map set is input to the convolution kernel with a size of 4x4, the number of channels is 256, the stride is 2, and the transposed convolution layer filled with 1 is up-sampled to obtain the seventh feature map set D2. D2 and E1 are jointly input into a 128-channel 1x1 convolutional layer to obtain a weighted eighth feature map set. The eighth feature map set is input into a convolution kernel with a size of 7, a channel number of 3, and a convolution of 3 padding, and finally the output full-face image is obtained through the tanh layer.

本申请实施例中图像补全模型还包括判别子模型440，用于确定生成子模块430的生成结果是来源于真实来源还是生成子模型420，以及用于确定生成结果与调整后的真值全脸特征点集是否匹配。In this embodiment of the present application, the image completion model further includes a discriminant sub-model 440, which is used to determine whether the generation result of the generation sub-module 430 comes from the real source or the generation sub-model 420, and is used to determine whether the generation result is the same as the adjusted true value. Whether the face feature point set matches.

判别子模型440可以采用简单的二分类模型，也可以采用Patch-GAN的鉴别器结构，例如70×70Patch-GAN，下面对判别子模型440的结构进行示例说明。The discriminant sub-model 440 may adopt a simple binary classification model, or may adopt the discriminator structure of Patch-GAN, such as 70×70 Patch-GAN, and the structure of the discriminant sub-model 440 is exemplified below.

表3table 3

其中，每行代表由列出的层组成的序列。K、c、s和p分别表示卷积层或解卷积层的卷积核大小、输出通道数、跨步、填充。SN代表谱归一化(Spectral Normalization)，LReLU代表泄露修正线性单元(Leaky ReLU)，斜率(slope)＝0.2，relu表示x大于等于0输出x，小于等于0则输出slope乘以x。表1中的判别子模型440包括注意力层，用于自适应地处理相应的特征。where each row represents a sequence consisting of the listed layers. K, c, s, and p denote the convolution kernel size, number of output channels, stride, and padding of the convolutional or deconvolutional layers, respectively. SN stands for Spectral Normalization, LReLU stands for Leaky ReLU, slope=0.2, relu means that x is greater than or equal to 0 and output x, and less than or equal to 0, the output slope is multiplied by x. The discriminative sub-model 440 in Table 1 includes an attention layer for adaptively processing the corresponding features.

下面对图像补全模型的训练过程进行示例说明：The following is an example of the training process of the image completion model:

获得第二样本数据集，第二样本数据集包括样本待处理脸部图像集，每个样本待处理脸部图像对应的真值样本全脸特征点集，以及样本待处理脸部图所对应的样本全脸图像。真值样本全脸特征点集的理解可以参照前文论述的内容，此处不再赘述。Obtain a second sample data set, the second sample data set includes the sample face image set to be processed, the true value sample full face feature point set corresponding to each sample face image to be processed, and the sample face image corresponding to the processing. Sample full face image. For the understanding of the full-face feature point set of the true value sample, refer to the content discussed above, which will not be repeated here.

作为一种实施例，先获得大量的样本全脸图像，将样本全脸图像集中部分或全部样本全脸图像进行遮挡。遮挡方式可以参照前文论述的内容，此处不再赘述。As an embodiment, a large number of sample full-face images are obtained first, and part or all of the sample full-face images in the sample full-face image set are occluded. For the occlusion method, reference may be made to the content discussed above, which will not be repeated here.

根据第二样本数据集，训练图像补全模型，直到图像补全模型的损失满足第二预设条件，获得已训练的图像补全模型。According to the second sample data set, the image completion model is trained until the loss of the image completion model satisfies the second preset condition, and the trained image completion model is obtained.

图像补全模型的损失可以有多种表示方式，本申请实施例中图像补全模型的损失根据逐像素损失、感知损失、风格损失、全微分损失以及生成损失加权得到的。The loss of the image completion model can be represented in various manners. In this embodiment of the present application, the loss of the image completion model is weighted according to pixel-by-pixel loss, perceptual loss, style loss, full differential loss, and generative loss.

具体的，逐像素损失用于表示补全后的样本全脸图像和样本待处理脸部图像的像素值误差；感知损失用于表示补全后的样本全脸图像的特征图和样本待处理脸部图像的特征图之间的误差；风格损失用于表示补全后的样本全脸图像加遮挡区域后的图像，和样本待处理脸部图像加遮挡区域后的图像的特征图的各通道信息向量化后通道间的格雷姆矩阵差值之和；全微分损失用于表示补全后的样本全脸图像横向、纵向导数之和与样本待处理脸部图像包括的像素点的总数之间的比值；生成损失用于表示判别子模型对生成子模型的生成结果来源的判别结果与生成结果的真实来源的损失。Specifically, the pixel-by-pixel loss is used to represent the pixel value error of the sample full face image after completion and the sample face image to be processed; the perceptual loss is used to represent the feature map of the sample full face image after completion and the sample face to be processed. The error between the feature maps of the partial images; the style loss is used to represent each channel information of the feature map of the sample full face image after completion plus the occlusion area and the sample face image to be processed plus the occlusion area. The sum of the Gramm matrix differences between the vectorized channels; the total differential loss is used to represent the difference between the sum of the horizontal and vertical derivatives of the completed sample full face image and the total number of pixels included in the sample face image to be processed. Ratio; generation loss is used to represent the loss of the discriminative sub-model for the source of the generated results of the generated sub-model and the true source of the generated results.

下面对各个损失的具体公式进行示例说明。The specific formulas of each loss are illustrated below.

一种逐像素损失的计算公式如下：A formula for calculating pixel-wise loss is as follows:

其中，‖·‖₁表示L1范数，N_m表示被遮挡的像素个数。Among them, ‖· _‖1 represents the L1 norm, and N _m represents the number of occluded pixels.

一种感知损失的计算公式如下：A formula for calculating the perceptual loss is as follows:

其中，φ_p(·)表示从预训练的网络模型(例如VGG19)中提取的第p层的N_p个尺寸为H_p×W_p的特征图，

用于表示补全后的样本全脸图像的特征图，φ_p(I)用于表示样本待处理脸部图像的特征图。where φ _p ( ) represents N _p feature maps of size H _p × W _p of the p-th layer extracted from a pre-trained network model (such as VGG19),

It is used to represent the feature map of the sample full face image after completion, and φ _p (I) is used to represent the feature map of the sample face image to be processed.

一种风格损失的计算公式如下：A formula for calculating style loss is as follows:

其中，G_p(x)＝φ_p(x)^Tφ_p(x)表示关于φ_p(x)的格雷姆矩阵。

表示补全后的样本全脸图像加遮挡区域后的图像，

表示样本待处理脸部图像加遮挡区域后的图像。where G _p (x)=φ _p (x) ^T φ _p (x) represents the Gramm matrix with respect to φ _p (x).

Represents the image after the completed sample full face image plus the occlusion area,

Represents the image of the sample face image to be processed plus the occlusion area.

一种全微分损失的计算公式如下：A formula for calculating the total differential loss is as follows:

其中，其中N_I表示I的像素总数，

表示求一阶导，包括

(水平方向)和

(竖直方向)。where NI represents the total number of pixels of _I ,

Represents the first derivative, including

(horizontal direction) and

(Vertically).

一种生成损失的计算公式如下：A formula for calculating the generation loss is as follows:

其中，G表示生成子模型430，D表示判别子模型440，生成损失L_adv为公式(9)和公式(10)之和。Among them, G represents the generation sub-model 430, D represents the discriminant sub-model 440, and the generation loss _{La adv} is the sum of the formula (9) and the formula (10).

图像补全模型的损失的计算公式表示如下：The calculation formula of the loss of the image completion model is expressed as follows:

L_inp＝L_pixel+λ_percL_perc+λ_styleL_style+λ_tvL_tv+λ_advL_adv (11)L _inp =L _pixel +λ _perc L _perc +λ _style L _style +λ _tv L _tv +λ _adv L _adv (11)

采用第二样本数据集，基于每次训练结果调整参数，直到图像补全模型的损失满足第二预设条件，从而获得图像补全模型的参数。Using the second sample data set, the parameters are adjusted based on each training result until the loss of the image completion model satisfies the second preset condition, so as to obtain the parameters of the image completion model.

作为一种实施例，λ_perc＝0.1，λ_style＝250，λ_tv＝0.1，λ_adv＝0.01的超参数设置。As an example, hyperparameter settings of λ _perc =0.1, λ _style =250, λ _tv =0.1, λ _adv =0.01.

作为一种实施例，整个训练过程交替使用L_inp优化生成器，使用

优化判别子模型440直到模型收敛。As an example, the entire training process alternately uses _Linp to optimize the generator, using

The discriminant submodel 440 is optimized until the model converges.

作为一种实施例，可以使用Adam优化器调整参数，其中Adam优化器的参数值设置：β₁＝0，β₂＝0.9，生成子模型430的初始学习率为10^-4，判别子模型440的初始学习率为10^-5，批处理大小(batch size)设为4，批处理大小可以理解为训练一次所使用的样本数量。As an embodiment, the parameters can be adjusted using the Adam optimizer, wherein the parameter values of the Adam optimizer are set: β ₁ =0, β ₂ =0.9, the initial learning rate of the generation sub-model 430 is 10 ⁻⁴ , and the discriminant sub-model 440 The initial learning rate is 10 ^-5 , and the batch size is set to 4. The batch size can be understood as the number of samples used in one training session.

示例过程Example process

基于图2论述的应用场景，下面对脸部图像处理方法进行介绍，请参照图8，该脸部图像处理方法包括：Based on the application scenario discussed in FIG. 2 , the following describes the facial image processing method, please refer to FIG. 8 , the facial image processing method includes:

S810，客户端211响应在脸部图像补全界面上的脸部图像补全输入操作，生成补全请求。S810, the client 211 generates a completion request in response to a face image completion input operation on the face image completion interface.

具体的，如前文论述的内容，客户端211支持脸部图像处理，用户在需要进行脸部图像进行补全的时候，可以打开客户端211，在脸部图像补全界面上进行输入操作，客户端211响应于该输入操作，生成补全请求。用户可以是自定义输入待处理脸部图像以及脸部特征调整信息，或者选择客户端211推荐的待处理脸部图像和脸部特征调整信息，例如客户端211从用户之前相册中确定出显示不完整的图像，以推荐给用户进行补全操作，用户在输入或者选择待处理脸部图像之后，客户端211相当于接收到用户进行的补全指令，从而生成补全请求。该补全请求中携带有待处理脸部图像和脸部特征调整信息，或者是待处理脸部图像对应的索引以及脸部特征调整信息的索引。Specifically, as discussed above, the client 211 supports facial image processing. When the user needs to complete the facial image, he can open the client 211 and perform input operations on the facial image completion interface. The terminal 211 generates a completion request in response to the input operation. The user can input the facial image to be processed and the facial feature adjustment information by custom, or select the facial image to be processed and the facial feature adjustment information recommended by the client 211, for example, the client 211 determines from the user's previous album that the display is not The complete image is recommended to the user for a completion operation. After the user inputs or selects the face image to be processed, the client 211 is equivalent to receiving a completion instruction from the user, thereby generating a completion request. The completion request carries the facial image to be processed and the facial feature adjustment information, or the index corresponding to the facial image to be processed and the index of the facial feature adjustment information.

作为一种实施例，客户端211可以接收用户自定义的脸部特征调整信息，例如用户自定义的简笔画等。As an embodiment, the client 211 may receive user-defined facial feature adjustment information, such as user-defined stick figures and the like.

例如，请参照图9，表示一种脸部图像补全界面的示意图，用户在选择待处理图像之后，客户端211可以推荐脸部特征调整信息，例如图9中的“笑”、“平常”和“不满”等，用户可以选择客户端211推荐的脸部特征调整信息。当用户选择点击操作控件930之后，相当于用户进行了输入操作。或者用户可以点击图9中所示的自定义操作控件910，自定义选择对应的脸部特征调整信息。当用户点击确定控件920之后，相当于确定输入脸部特征调整信息。For example, please refer to FIG. 9 , which shows a schematic diagram of a facial image completion interface. After the user selects an image to be processed, the client 211 can recommend facial feature adjustment information, such as “smile” and “normal” in FIG. 9 . and “dissatisfaction”, etc., the user can select the facial feature adjustment information recommended by the client 211 . After the user selects the click operation control 930, it is equivalent to the user performing an input operation. Alternatively, the user may click the custom operation control 910 shown in FIG. 9 to customize and select corresponding facial feature adjustment information. After the user clicks the confirmation control 920, it is equivalent to confirming the input of facial feature adjustment information.

或者例如，请参照图10，表示一种用户添加遮挡的示意图，用户在选择待处理图像之后，客户端211显示图10中A所示的界面，用户也可以点击图10中的添加遮挡控件1001，输入自己定义的脸部特征调整信息，例如添加图10中B所示的涂鸦，用户点击该操作控件之后，相当于用户进行了输入操作。Or, for example, please refer to FIG. 10 , which shows a schematic diagram of a user adding occlusion. After the user selects an image to be processed, the client 211 displays the interface shown in A in FIG. 10 , and the user can also click the add occlusion control 1001 in FIG. 10 . , input self-defined facial feature adjustment information, such as adding the graffiti shown in B in FIG. 10 , after the user clicks the operation control, it is equivalent to the user performing an input operation.

S820，客户端211向服务器220发送补全请求。S820, the client 211 sends a completion request to the server 220.

具体的，客户端211在生成补全请求之后，将补全请求发送给服务器220。Specifically, after generating the completion request, the client 211 sends the completion request to the server 220 .

S830，服务器220获得待处理脸部图像。S830, the server 220 obtains the face image to be processed.

具体的，服务器220在接收补全请求之后，通过补全请求中携带的信息，解析获得待处理脸部图像。同时，服务器220还可以获得脸部特征调整信息。Specifically, after receiving the completion request, the server 220 parses and obtains the face image to be processed through the information carried in the completion request. At the same time, the server 220 can also obtain facial feature adjustment information.

S840，服务器220将待处理脸部图像输入到已训练的脸部特征点预测模型410，获得待处理脸部图像的全脸特征点集。S840, the server 220 inputs the face image to be processed into the trained face feature point prediction model 410, and obtains a full face feature point set of the face image to be processed.

在一种可能的实施例中，脸部特征点预测模型410包括第一编码模块411、多个第二编码模块412和全连接模块413，具体获得全脸特征点集的具体过程示例如下：In a possible embodiment, the facial feature point prediction model 410 includes a first encoding module 411, a plurality of second encoding modules 412 and a full connection module 413, and an example of a specific process for obtaining a full face feature point set is as follows:

通过第一编码模块411，对待处理脸部图像进行多种尺度下的特征提取，获得第一特征图集合；Through the first encoding module 411, feature extraction under various scales is performed on the face image to be processed to obtain a first feature map set;

分别通过多个第二编码模块412中各个编码模块，对第一特征图集进行卷积池化处理，获得第二特征值集合；Perform convolution pooling processing on the first feature map set through each coding module in the plurality of second coding modules 412 respectively to obtain a second feature value set;

通过全连接模块413，对多个第二特征值集合进行拼接，获得全脸特征点集合。Through the full connection module 413, a plurality of second feature value sets are spliced to obtain a full face feature point set.

脸部特征点预测模型410、第一编码模块411、多个第二编码模块412和全连接模块413可以参照前文论述的内容，此处不再赘述。The facial feature point prediction model 410 , the first encoding module 411 , the plurality of second encoding modules 412 and the fully connected module 413 may refer to the content discussed above, and will not be repeated here.

S850，根据脸部特征调整信息，对全脸特征点集中相应的特征点进行调整，获得调整后的全脸特征点集。S850: Adjust corresponding feature points in the full-face feature point set according to the face feature adjustment information, to obtain an adjusted full-face feature point set.

具体调整方式示例如下：Examples of specific adjustment methods are as follows:

方式一：method one:

将脸部特征调整信息输入脸部特征点预测模型410，获得与该脸部特征调整信息对应的全脸特征点集，再根据该脸部特征调整信息对应的全脸特征点集对待处理脸部图像的全脸特征点集进行调整，从而获得调整后的全脸特征点集。Input the facial feature adjustment information into the facial feature point prediction model 410, obtain a full-face feature point set corresponding to the facial feature adjustment information, and then according to the full-face feature point set corresponding to the facial feature adjustment information to treat the face The full-face feature point set of the image is adjusted to obtain the adjusted full-face feature point set.

该方式的处理原理可以参照前文论述的内容，此处不再赘述。该方式适用于脸部特征调整信息为图像的情况。For the processing principle of this manner, reference may be made to the content discussed above, which will not be repeated here. This method is suitable for the case where the facial feature adjustment information is an image.

方式二：Method two:

将脸部特征调整信息，以及全脸特征点集合输入脸部调整模型420，获得调整后的全脸特征点集合。The facial feature adjustment information and the full-face feature point set are input into the face adjustment model 420 to obtain an adjusted full-face feature point set.

具体的，脸部调整模型420可以参照前文论述的内容，此处不再赘述。该方式适用于脸部特征调整信息为表情分类标签的情况。Specifically, for the face adjustment model 420, reference may be made to the content discussed above, which will not be repeated here. This method is suitable for the case where the facial feature adjustment information is the expression classification label.

方式三：Method three:

提取脸部特征调整信息对应的脸部轮廓特征；其中，脸部轮廓特征用于表示脸部关键点构成的轮廓；Extracting the facial contour feature corresponding to the facial feature adjustment information; wherein, the facial contour feature is used to represent the contour formed by the facial key points;

根据脸部轮廓特征，对全脸特征点集合中对应特征点的坐标进行调整，获得调整后的全脸特征点集合。According to the facial contour feature, the coordinates of the corresponding feature points in the full-face feature point set are adjusted to obtain the adjusted full-face feature point set.

S860，服务器220将调整后的全脸特征点集合集，以及待处理脸部图像输入图像补全模型，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。S860, the server 220 inputs the adjusted full-face feature point set and the to-be-processed face image into the image completion model, and obtains a full-face image that completes the undisplayed area of the face in the to-be-processed face image.

具体的，图像补全模型可以参照前文论述的内容，此处不再赘述。图像补全模型包括生成子模型430，生成子模型430包括第三编码模块431，空洞卷积模块432和解码模块433，下面对通过生成子模型430获得全脸图像的方式进行说明。Specifically, for the image completion model, reference may be made to the content discussed above, which will not be repeated here. The image completion model includes a generation sub-model 430 , which includes a third encoding module 431 , a hole convolution module 432 and a decoding module 433 . The following describes how to obtain a full-face image by generating the sub-model 430 .

将所述调整后的全脸特征点集合转换为脸部特征点图；Converting the adjusted full-face feature point set into a facial feature point map;

通过第三编码模块431，对脸部特征点图，以及待处理脸部图像进行不同尺度下的卷积处理，获得第三特征图集合；其中，第三特征图集包括卷积处理后特征图，卷积处理后特征图第三编码模块是对脸部特征点图，以及待处理脸部图像进行不同尺度下的卷积处理过程中，输出的最后卷积特征图；Through the third encoding module 431, the facial feature point map and the face image to be processed are subjected to convolution processing at different scales to obtain a third feature map set; wherein, the third feature map set includes the feature maps after convolution processing , the third encoding module of the feature map after convolution processing is the final convolution feature map output during the process of convolution processing at different scales on the facial feature point map and the face image to be processed;

通过空洞卷积模块432中的残差单元，对卷积处理后特征图进行空洞卷积，获得第四特征图集合；Through the residual unit in the hole convolution module 432, a hole convolution is performed on the feature map after the convolution process to obtain a fourth feature map set;

通过空洞卷积模块432中的注意力单元，提取第四特征图集合，以及卷积处理后特征图的局部特征，获得第五特征图集合；Through the attention unit in the hole convolution module 432, the fourth feature map set is extracted, and the local features of the feature map after convolution processing are obtained, and the fifth feature map set is obtained;

通过解码模块433对第五特征图集合，以及第三特征图集合进行上采样处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。The decoding module 433 performs up-sampling processing on the fifth feature map set and the third feature map set to obtain a full-face image in which the undisplayed area of the face in the face image to be processed is complemented.

在一种可能的实施例中，第三特征图集合还包括第一中间特征图和第二中间特征图，第一中间特征图是对脸部特征点图，以及待处理脸部图像依次进行预设次卷积处理获得的中间卷积结果，第二中间特征图是对第一中间特征图进行预设次卷积处理获得的中间卷积结果。In a possible embodiment, the third feature map set further includes a first intermediate feature map and a second intermediate feature map, and the first intermediate feature map is a sequence of pre-processing the facial feature point map and the facial image to be processed. It is assumed that the intermediate convolution result obtained by the sub-convolution process, and the second intermediate feature map is an intermediate convolution result obtained by performing a preset sub-convolution process on the first intermediate feature map.

下面对通过解码模块433对第五特征图集，以及第三特征图集进行上采样处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像进行示例说明。The following is an example description of performing up-sampling processing on the fifth feature atlas and the third feature atlas by the decoding module 433 to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed. .

通过解码模块433对第五特征图集进行上采样处理，获得第六特征图集；Perform upsampling processing on the fifth feature atlas by the decoding module 433 to obtain the sixth feature atlas;

通过解码模块433对第二中间特征图，以及第六特征图集合进行加权上采样处理，获得第七特征图集；Perform weighted upsampling processing on the second intermediate feature map and the sixth feature map set by the decoding module 433 to obtain the seventh feature map set;

通过解码模块433对第七特征图集合，以及第一中间特征图进行加权，获得第八特征图集合；The seventh feature map set and the first intermediate feature map are weighted by the decoding module 433 to obtain the eighth feature map set;

通过解码模块433对第八特征图集合进行卷积处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。The decoding module 433 performs convolution processing on the eighth feature map set to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed.

生成子模型430，生成子模型430包括第三编码模块431，空洞卷积模块432和解码模块433的内容可以参照前文论述的内容，此处不再赘述。The generation sub-model 430 includes a third encoding module 431 , and the content of the atrous convolution module 432 and the decoding module 433 can be referred to the content discussed above, which will not be repeated here.

服务器220对待处理脸部图像进行处理之后，就可以获得全脸图像。After the server 220 processes the face image to be processed, a full face image can be obtained.

在一种可能的实施例中，服务器220存储有不同物种与不同模型(包括前文论述的脸部特征点预测模型、图像补全模型以及脸部调整模型)的参数之间的对应关系，服务器220在获得待处理脸部图像之后，先判断该待处理脸部图像的物种分类，再选取与该物种对应的参数，以提高脸部图像处理的准确性。In a possible embodiment, the server 220 stores the correspondence between different species and parameters of different models (including the facial feature point prediction model, the image completion model and the face adjustment model discussed above), the server 220 After obtaining the facial image to be processed, first determine the species classification of the facial image to be processed, and then select parameters corresponding to the species, so as to improve the accuracy of facial image processing.

例如，请参照图11，图11中的A、B、C、D、E分别表示为真值全脸图像，待处理脸部图像、预测的人脸特征点集所对应的图像(图11中C中包括各个特征点)、生成子模型430直接根据全脸特征点集补全的人脸图像、根据全脸特征点集补全的人脸图像、待处理脸部图像拼接之后的全脸图像。For example, please refer to FIG. 11 , A, B, C, D, and E in FIG. 11 are respectively the true-valued full-face image, the face image to be processed, and the image corresponding to the predicted face feature point set (in FIG. 11 ). C includes each feature point), the generation sub-model 430 directly completes the face image according to the full-face feature point set, the face image completed according to the full-face feature point set, and the full-face image after splicing the face images to be processed .

例如，请参照图12，图12中的A、B、C、D、E分别表示真值全脸图像、被遮挡的人脸、预测的人脸特征点集所对应的图像(图13中C中包括各个特征点)、生成子模型430直接根据全脸特征点集补全的人脸图像、根据全脸特征点集补全的人脸图像和待处理脸部图像拼接之后的全脸图像。For example, please refer to FIG. 12 , A, B, C, D, and E in FIG. 12 respectively represent the ground-truth full-face image, the occluded face, and the image corresponding to the predicted face feature point set (C in FIG. 13 ) (including each feature point), the sub-model 430 generates a face image that is directly completed according to the full-face feature point set, a face image that is completed according to the full-face feature point set, and a full-face image after splicing the face image to be processed.

例如，请参照图13，图13中的A、B、C、D分别表示遮挡的人脸图片，基于A和表情分类标签(平常)补全后的全脸图像、基于A和表情分类标签(不满)补全后的全脸图像、基于A和表情分类标签(笑)补全后的全脸图像。For example, please refer to FIG. 13 , A, B, C, and D in FIG. 13 respectively represent the occluded face pictures, the full-face image after completion based on A and the expression classification label (usually), based on A and the expression classification label ( Full-face image after completion of dissatisfaction), and full-face image after completion based on A and expression classification labels (laughs).

例如，请参照图14，图14中的A、B、C、D分别表示遮挡的人脸图片，基于A和表情分类标签(笑)补全后的全脸图像、基于A和表情分类标签(平常)补全后的全脸图像、基于A和表情分类标签(不满)补全后的全脸图像。For example, please refer to FIG. 14 , A, B, C, and D in FIG. 14 represent the occluded face pictures, respectively, based on the full-face image completed by A and the expression classification label (laugh), based on A and the expression classification label ( Normal) full-face image after completion, and full-face image after completion based on A and expression classification label (dissatisfaction).

例如，请参照图15，图15中的A、B、C、D、E分别表示真值原脸部图、用户手动添加的遮挡后的脸部图像，预测的人脸特征点集所对应的图像、根据全脸特征点集补全的人脸图像和待处理脸部图像拼接之后的全脸图像。For example, please refer to Fig. 15, A, B, C, D, and E in Fig. 15 respectively represent the original face image of the true value, the face image after occlusion added manually by the user, and the predicted facial feature point set corresponding to The image, the face image completed according to the full-face feature point set, and the full-face image after splicing the face image to be processed.

例如，请参照图16，图16中的A、B、C、D、E分别表示真值原脸部图、用户手动添加的遮挡后的脸部图像，预测的人脸特征点集所对应的图像、根据全脸特征点集补全的人脸图像和待处理脸部图像拼接之后的全脸图像。For example, please refer to FIG. 16 , A, B, C, D, and E in FIG. 16 respectively represent the original face image of the true value, the face image after occlusion added manually by the user, and the predicted facial feature point set corresponding to The image, the face image completed according to the full-face feature point set, and the full-face image after splicing the face image to be processed.

S870，服务器220将全脸图像发送给客户端211。S870 , the server 220 sends the full-face image to the client 211 .

S880，客户端211获取并显示全脸图像。S880, the client 211 acquires and displays the full-face image.

具体的，客户端211显示全脸图像，以供用户查看。客户端211还可以保存待处理脸部图像，用户可以点击取消补全，客户端211显示待处理脸部图像。在获得全脸图像之后，用户还可以重新添加遮挡，继续进行补全等操作。Specifically, the client 211 displays a full-face image for the user to view. The client 211 can also save the facial image to be processed, the user can click to cancel the completion, and the client 211 displays the facial image to be processed. After obtaining the full-face image, the user can also re-add occlusion and continue to perform operations such as completion.

例如，以用户选择图9中所示的待处理脸部图像，以及选择“笑”为例，服务器220根据图9所示的待处理脸部图像以及“笑”，获得如图17所示的全脸图像。For example, taking the user selecting the face image to be processed shown in FIG. 9 and selecting “smile” as an example, the server 220 obtains the image shown in FIG. 17 according to the face image to be processed and “smile” shown in FIG. 9 . Full face image.

例如，以用户选择图10中所示的原脸部图像，并对该原脸部图像调整遮挡，以及选择“笑”为例，服务器220根据图10所示的待处理脸部图像以及“笑”，获得如图18所示的全脸图像。For example, taking the user selecting the original face image shown in ” to obtain a full-face image as shown in Figure 18.

在一种可能的实施例中，服务器220中的各个模型是其它设备训练得到的，服务器220直接将训练好的各个模型进行使用，以实现图8中的脸部图像处理方法。In a possible embodiment, each model in the server 220 is obtained by training other devices, and the server 220 directly uses each trained model to implement the face image processing method in FIG. 8 .

应当说明的是，图8中是以图像处理设备100为服务器220为例进行说明，实际上图像处理设备100可能有很多种，具体可以参照前文论述的内容，也就是说，任何一种图像处理设备100均可以实现图8中论述的内容。It should be noted that the image processing device 100 as the server 220 is used as an example for illustration in FIG. 8. In fact, there may be many types of image processing devices 100. For details, please refer to the content discussed above, that is, any image processing device The devices 100 can each implement what is discussed in FIG. 8 .

基于同一发明构思，本申请实施例提供一种图像处理装置，请参照图19，该图像处理装置1900相当于设置在前文论述的图像处理设备100中，包括收发模块1901和处理模块1902，其中：Based on the same inventive concept, an embodiment of the present application provides an image processing apparatus. Please refer to FIG. 19. The image processing apparatus 1900 is equivalent to being set in the image processing apparatus 100 discussed above, and includes a transceiver module 1901 and a processing module 1902, wherein:

在一种可能的实施例中，脸部特征点预测模型包括第一编码模块、多个第二编码模块和全连接模块，处理模块1902具体用于：In a possible embodiment, the facial feature point prediction model includes a first encoding module, a plurality of second encoding modules and a fully connected module, and the processing module 1902 is specifically configured to:

通过第一编码模块，对待处理脸部图像进行多种尺度下的特征提取，获得第一特征图集；Through the first encoding module, the facial image to be processed is subjected to feature extraction under various scales to obtain a first feature atlas;

分别通过多个第二编码模块中各个编码模块，对第一特征图集进行卷积池化处理，获得第二特征值集；Perform convolution pooling processing on the first feature atlas through each encoding module in the plurality of second encoding modules to obtain a second feature value set;

通过全连接模块，对多个第二特征值集进行拼接，获得全脸特征点集。Through the full connection module, multiple second feature value sets are spliced to obtain a full face feature point set.

在一种可能的实施例中，脸部特征调整信息为表情分类标签，处理模块1902具体用于：In a possible embodiment, the facial feature adjustment information is an expression classification label, and the processing module 1902 is specifically used for:

将表情分类标签，以及全脸特征点集输入脸部调整模型，获得调整后的全脸特征点集；Input the expression classification label and the full-face feature point set into the face adjustment model to obtain the adjusted full-face feature point set;

其中，脸部调整模型是根据样本脸部表情图像集，样本脸部表情图像集中每个样本脸部表情图像标注对应的表情分类标签。The face adjustment model is based on a sample facial expression image set, and each sample facial expression image in the sample facial expression image set is marked with a corresponding expression classification label.

在一种可能的实施例中，处理模块1902具体用于：In a possible embodiment, the processing module 1902 is specifically configured to:

根据脸部轮廓特征，对全脸特征点集中对应特征点的坐标进行调整，获得调整后的全脸特征点集。According to the facial contour features, the coordinates of the corresponding feature points in the full-face feature point set are adjusted to obtain the adjusted full-face feature point set.

在一种可能的实施例中，图像补全模型包括第三编码模块，空洞卷积模块和解码模块，处理模块1902具体用于：In a possible embodiment, the image completion model includes a third encoding module, a hole convolution module and a decoding module, and the processing module 1902 is specifically used for:

将调整后的全脸特征点集转换为脸部特征点图；Convert the adjusted full-face feature point set into a face feature point map;

通过第三编码模块，对脸部特征点图，以及待处理脸部图像进行不同尺度下的卷积处理，获得第三特征图集；其中，第三特征图集包括卷积处理后特征图集，卷积处理后特征图集第三编码模块是对脸部特征点图，以及待处理脸部图像进行不同尺度下的卷积处理过程中，输出的最后卷积特征图；Through the third encoding module, the facial feature point map and the face image to be processed are subjected to convolution processing at different scales to obtain a third feature atlas; wherein, the third feature atlas includes the feature atlas after convolution processing. , the third encoding module of the feature map set after convolution processing is the final convolution feature map output during the process of convolution processing at different scales on the facial feature point map and the face image to be processed;

通过空洞卷积模块中的残差单元，对卷积处理后特征图集进行空洞卷积，获得第四特征图集；Through the residual unit in the hole convolution module, the hole convolution is performed on the feature atlas after the convolution processing, and the fourth feature atlas is obtained;

通过空洞卷积模块中的注意力单元，提取第四特征图集，以及卷积处理后特征图集的局部特征，获得第五特征图集；Through the attention unit in the atrous convolution module, the fourth feature atlas is extracted, and the local features of the feature atlas after convolution processing are used to obtain the fifth feature atlas;

通过解码模块对第五特征图集，以及第三特征图集进行上采样处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。The decoding module performs up-sampling processing on the fifth feature atlas and the third feature atlas, so as to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed.

在一种可能的实施例中，第三特征图集还包括第一中间特征图集和第二中间特征图集，第一中间特征图集是对脸部特征点图，以及待处理脸部图像依次进行预设次卷积处理获得的中间卷积结果，第二中间特征图集是对第一中间特征图集进行预设次卷积处理获得的中间卷积结果，处理模块1902具体用于：In a possible embodiment, the third feature atlas further includes a first intermediate feature atlas and a second intermediate feature atlas, and the first intermediate feature atlas is a facial feature point map and a face image to be processed The intermediate convolution results obtained by performing the preset sub-convolution processing in sequence, the second intermediate feature atlas is the intermediate convolution results obtained by performing the preset sub-convolution processing on the first intermediate feature atlas, and the processing module 1902 is specifically used for:

通过解码模块对第五特征图集进行上采样处理，获得第六特征图集；Perform upsampling processing on the fifth feature atlas by the decoding module to obtain the sixth feature atlas;

通过解码模块对第二中间特征图集，以及第六特征图集进行加权处理，获得第七特征图集；Perform weighting processing on the second intermediate feature atlas and the sixth feature atlas by the decoding module to obtain the seventh feature atlas;

通过解码模块对第七特征图集，以及第一中间特征图集进行加权和上采样处理，获得第八特征图集；Perform weighting and upsampling processing on the seventh feature atlas and the first intermediate feature atlas by the decoding module to obtain the eighth feature atlas;

通过解码模块对第八特征图集进行卷积处理，获得将待处理脸部图像中的脸部未显示区域进行补全处理的全脸图像。Convolution processing is performed on the eighth feature atlas by the decoding module to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed.

在一种可能的实施例中，脸部特征点预测模型是通过如下步骤训练得到的：In a possible embodiment, the facial feature point prediction model is trained through the following steps:

获取第一样本数据集；其中，第一样本数据集中包括样本待处理脸部图像集，与每个样本待处理脸部图像对应的真值样本全脸特征点集；Acquiring a first sample data set; wherein, the first sample data set includes a sample face image set to be processed, and a true-value sample full-face feature point set corresponding to each sample face image to be processed;

基于第一样本数据集，训练脸部特征点预测模型；Based on the first sample data set, train a facial feature point prediction model;

直到脸部特征点预测模型的预测损失满足第一预设条件，获得已训练的脸部特征点预测模型；其中，预测损失用于表示预测样本全脸特征点集和真值样本全脸特征点集之间的损失。Until the prediction loss of the facial feature point prediction model satisfies the first preset condition, the trained facial feature point prediction model is obtained; wherein, the prediction loss is used to represent the full face feature point set of the predicted sample and the full face feature point of the true value sample loss between sets.

在一种可能的实施例中，图像补全模型包括生成子模型和判别子模型，图像补全模型的损失根据逐像素损失、感知损失、风格损失、全微分损失以及对抗损失加权得到的；In a possible embodiment, the image completion model includes a generation sub-model and a discriminant sub-model, and the loss of the image completion model is weighted according to pixel-by-pixel loss, perceptual loss, style loss, total differential loss and adversarial loss;

其中，逐像素损失用于表示补全后的样本全脸图像和样本待处理脸部图像的像素值误差；感知损失用于表示补全后的样本全脸图像的特征图和样本待处理脸部图像的特征图之间的误差；风格损失用于表示补全后的样本全脸图像加遮挡区域后的图像，和样本待处理脸部图像加遮挡区域后的图像的特征图的各通道信息向量化后通道间的格雷姆矩阵差值之和；全微分损失用于表示补全后的样本全脸图像横向、纵向导数之和与样本待处理脸部图像包括的像素点的总数之间的比值；生成损失用于表示判别子模型对生成子模型的生成结果来源的判别结果与生成结果的真实来源的损失。Among them, the pixel-by-pixel loss is used to represent the pixel value error of the completed sample full face image and the sample face image to be processed; the perceptual loss is used to represent the feature map of the completed sample full face image and the sample face to be processed. The error between the feature maps of the images; the style loss is used to represent each channel information vector of the feature map of the image after the completion of the sample full face image plus the occlusion area, and the sample face image to be processed plus the occlusion area. The sum of the Gramm matrix differences between the channels after the transformation; the total differential loss is used to represent the ratio between the sum of the horizontal and vertical derivatives of the sample full face image after completion and the total number of pixels included in the sample face image to be processed ;Generation loss is used to represent the loss of the discriminant sub-model to the source of the generated results of the generated sub-model and the real source of the generated results.

作为一种实施例，图19中的图像处理装置1900可以实现图8中服务器220的功能，或实现前文论述的任意一种脸部图像处理方法。As an embodiment, the image processing apparatus 1900 in FIG. 19 can implement the function of the server 220 in FIG. 8 , or implement any one of the facial image processing methods discussed above.

基于同一发明构思，本申请实施例提供一种终端设备，请参照图20，该终端设备210包括收发模块2001和显示模块2002，其中：Based on the same inventive concept, an embodiment of the present application provides a terminal device. Please refer to FIG. 20. The terminal device 210 includes a transceiver module 2001 and a display module 2002, wherein:

收发模块2001，用于响应在脸部图像补全界面上的脸部图像补全输入操作，向服务器发送补全请求，以使服务器对根据如前文论述的脸部图像处理方法对待处理脸部图像和脸部特征调整信息进行处理，得到脸部图像补全后的全脸图像；其中，补全请求包括待处理脸部图像和脸部特征调整信息；以及获取服务器返回的全脸图像；The transceiver module 2001 is used to respond to the facial image completion input operation on the facial image completion interface, and send a completion request to the server, so that the server can process the facial image to be processed according to the facial image processing method as discussed above. Process with the facial feature adjustment information to obtain a full-face image after the facial image is completed; wherein, the completion request includes the to-be-processed facial image and the facial feature adjustment information; and obtain the full-face image returned by the server;

显示模块2002，用于显示全脸图像。The display module 2002 is used for displaying a full face image.

应当说明的是，终端设备210还包括一些其他部件，以辅助实现相应的功能，本申请不对其他部件进行列举。It should be noted that the terminal device 210 further includes some other components to assist in implementing corresponding functions, and this application does not list other components.

基于同一发明构思，本申请实施例提供一种脸部图像处理设备，请继续参照图1，该脸部图像处理设备100包括处理器102和存储器103，处理器102用于调用存储器103中的程序指令，以实现前文论述的任意一种脸部图像处理方法。Based on the same inventive concept, an embodiment of the present application provides a facial image processing device, please continue to refer to FIG. 1 , the facial image processing device 100 includes a processor 102 and a memory 103 , and the processor 102 is used to call a program in the memory 103 instructions to implement any of the facial image processing methods discussed above.

作为一种实施例，图1中的处理器102可以用于实现图19中的收发模块1901和处理模块1902的功能。As an embodiment, the processor 102 in FIG. 1 may be used to implement the functions of the transceiver module 1901 and the processing module 1902 in FIG. 19 .

基于同一发明构思，本申请实施例提供一种终端设备，请继续参照图21，该终端设备210包括处理器2101和存储器2102，处理器2101用于调用存储器2102中的程序指令，以实现前文论述的任意一种脸部图像处理方法，或实现图8中客户端211的功能。Based on the same inventive concept, an embodiment of the present application provides a terminal device. Please continue to refer to FIG. 21 . The terminal device 210 includes a processor 2101 and a memory 2102. The processor 2101 is used to call program instructions in the memory 2102 to implement the foregoing discussion. Any one of the facial image processing methods, or realize the function of the client 211 in FIG. 8 .

作为一种实施例，图21中的处理器2101可以用于实现图20中的收发模块2001和显示模块2002的功能。As an embodiment, the processor 2101 in FIG. 21 may be used to implement the functions of the transceiver module 2001 and the display module 2002 in FIG. 20 .

基于同一发明构思，本申请实施例提供一种计算机可读存储介质，所述计算机可读存储介质存储有计算机指令，当所述计算机指令在计算机上运行时，使得计算机执行前文论述的脸部图像处理方法。Based on the same inventive concept, an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on a computer, the computer executes the facial image discussed above. Approach.

本领域内的技术人员应明白，本申请的实施例可提供为方法、系统、或计算机程序产品。因此，本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

显然，本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样，倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内，则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.

Claims

1. a face image processing method, is characterized in that, comprises:

obtaining a face image to be processed, where the face image to be processed includes a part of the face that is not displayed;

Inputting the facial image to be processed into a trained facial feature point prediction model to obtain a full-face feature point set of the facial image to be processed;

According to the facial feature adjustment information, adjusting the corresponding feature points in the full-face feature point set to obtain an adjusted full-face feature point set;

Input the adjusted full-face feature point set and the to-be-processed face image into the trained image completion model to obtain a full-face that completes the undisplayed area of the face in the to-be-processed face image image.

2. The method according to claim 1, wherein the facial feature point prediction model comprises a first encoding module, a plurality of second encoding modules and a fully connected module, and the face image to be processed is input into a previously The trained facial feature point prediction model obtains the full-face feature point set of the to-be-processed facial image, including:

Through the first encoding module, feature extraction is performed on the face image to be processed under various scales to obtain a first feature atlas;

Perform convolution pooling processing on the first feature map set through each coding module in the plurality of second coding modules respectively to obtain a second feature value set;

Through the full connection module, a plurality of second feature value sets are spliced to obtain a full face feature point set.

3. The method of claim 1, wherein the facial feature adjustment information is an expression classification label, and according to the facial feature adjustment information, the corresponding feature points in the full-face feature point set are adjusted to obtain The adjusted full-face feature point set, including:

Inputting the facial expression classification label and the full-face feature point set into the face adjustment model to obtain the adjusted full-face feature point set;

The face adjustment model is based on a sample facial expression image set, and each sample facial expression image in the sample facial expression image set is marked with a corresponding expression classification label.

4. The method according to claim 1, wherein, according to the facial feature adjustment information, the full-face feature point set is adjusted to obtain an adjusted full-face feature point set, comprising:

Extracting the facial contour feature corresponding to the facial feature adjustment information; wherein, the facial contour feature is used to represent the contour formed by the facial key points;

According to the facial contour feature, the coordinates of the corresponding feature points in the full-face feature point set are adjusted to obtain an adjusted full-face feature point set.

5. The method of claim 1, wherein the image completion model comprises a third encoding module, a hole convolution module and a decoding module, the adjusted full-face feature point set, and the to-be-processed The face image is input into the trained image completion model to obtain a full-face image that completes the undisplayed area of the face in the face image to be processed, including:

converting the adjusted full-face feature point set into a facial feature point map;

Through the third encoding module, the face feature point map and the face image to be processed are subjected to convolution processing at different scales to obtain a third feature atlas; wherein, the third feature atlas Including a feature map set after convolution processing, the third encoding module of the feature map set after convolution processing is to perform convolution processing at different scales on the facial feature point map and the face image to be processed In the process, the final convolution feature map of the output;

Through the residual unit in the hole convolution module, perform hole convolution on the feature atlas after the convolution processing to obtain a fourth feature atlas;

Extracting the fourth feature atlas and the local features of the feature atlas after the convolution processing through the attention unit in the hole convolution module to obtain the fifth feature atlas;

Perform upsampling processing on the fifth feature atlas and the third feature atlas by the decoding module, so as to obtain a full face obtained by performing complement processing on the undisplayed area of the face in the face image to be processed image.

6. The method of claim 5, wherein the third feature atlas further comprises a first intermediate feature atlas and a second intermediate feature atlas, the first intermediate The facial feature point map, and the intermediate convolution results obtained by sequentially performing preset sub-convolution processing on the face image to be processed, and the second intermediate feature atlas is preset for the first intermediate feature atlas. The intermediate convolution results obtained by the secondary convolution processing are subjected to up-sampling processing on the fifth feature atlas and the third feature atlas by the up-sampling module to obtain the facial image to be processed. The full-face image with the completion processing of the undisplayed area of the face, including:

Perform up-sampling processing on the fifth feature atlas by the decoding module to obtain a sixth feature atlas;

Perform weighting processing on the second intermediate feature atlas and the sixth feature atlas by the decoding module to obtain a seventh feature atlas;

Perform weighting and upsampling processing on the seventh feature atlas and the first intermediate feature atlas by the decoding module to obtain an eighth feature atlas;

Perform convolution processing on the eighth feature atlas by the decoding module to obtain a full-face image obtained by performing complement processing on the undisplayed area of the face in the face image to be processed.

7. The method according to any one of claims 1-6, wherein the facial feature point prediction model is obtained by training through the following steps:

Obtain a first sample data set; wherein, the first sample data set includes the sample face image set to be processed, and the true-value sample full-face feature point set corresponding to each sample face image to be processed ;

Based on the first sample data set, train a facial feature point prediction model;

Until the prediction loss of the facial feature point prediction model satisfies the first preset condition, the trained facial feature point prediction model is obtained; wherein, the prediction loss is used to represent the full face feature point set of the predicted sample and the full face of the true value sample loss between feature point sets.

8. The method according to any one of claims 1-6, wherein the image completion model is obtained by training through the following steps:

Obtain a second sample data set; wherein, the second sample data set includes the sample face image set to be processed, the ground-truth sample full-face feature point set corresponding to each sample face image to be processed, and the a sample full-face image corresponding to each sample to-be-processed face image in the sample to-be-processed face image set;

training the image completion module based on the second sample data set;

Until the value of the loss corresponding to the image completion model satisfies the second preset condition, the trained image completion model is obtained.

9. The method of claim 8, wherein the image completion model comprises a generative sub-model and a discriminative sub-model, and the loss of the image completion model is based on pixel-wise loss, perceptual loss, style loss, full Differential loss and weighted against adversarial loss;

Wherein, the pixel-by-pixel loss is used to represent the pixel value error of the sample full-face image after completion and the sample face image to be processed; the perceptual loss is used to represent the feature map and sample of the sample full-face image after completion The error between the feature maps of the to-be-processed face image; the style loss is used to represent the completed sample full-face image plus the occluded area image, and the sample to-be-processed face image plus the occlusion area. The information of each channel of the feature map is the sum of the Gramm matrix differences between the channels after quantization; the full differential loss is used to represent the sum of the horizontal and vertical derivatives of the completed sample full face image and the sample face image to be processed. The ratio between the total number of pixel points; the generation loss is used to represent the loss of the discriminant sub-model for the source of the generation result of the generation sub-model and the loss of the real source of the generation result.

10. A face image processing method, comprising:

In response to a face image completion input operation on the face image completion interface, a completion request is sent to the server, so that the server responds to the facial image to be processed according to the method according to any one of claims 1-9. Process with the facial feature adjustment information to obtain a full-face image after the facial image is completed; wherein, the completion request includes the to-be-processed facial image and the facial feature adjustment information;

Acquire and display the full-face image returned by the server.

11. A face image processing device, comprising:

a transceiver module, configured to obtain a face image to be processed, where the face image to be processed includes a part of the face that is not displayed;

The processing module is used for inputting the face image to be processed into the trained facial feature point prediction model to obtain the full face feature point set of the face image to be processed; The corresponding feature points in the full-face feature point set are adjusted to obtain the adjusted full-face feature point set; The model obtains a full-face image obtained by performing complement processing on the undisplayed area of the face in the to-be-processed face image.

12. A terminal device, comprising:

The sending module is used to send a completion request to the server in response to a face image completion input operation on the face image completion interface, so that the server can respond to the method according to any one of claims 1-9. Process the facial image to be processed and the facial feature adjustment information to obtain a full-face image after the facial image is completed; wherein, the completion request includes the to-be-processed facial image and the facial feature adjustment information; and obtaining the full-face image returned by the server;

The display module is used for displaying the full face image.

13. A face image processing device, comprising:

at least one processor, and

a memory communicatively coupled to the at least one processor;

Wherein, the memory stores instructions executable by the at least one processor, and the at least one processor implements the method according to any one of claims 1-9 or 10 by executing the instructions stored in the memory .

14. A computer-readable storage medium, characterized in that, the computer-readable storage medium stores computer instructions that, when the computer instructions are executed on a computer, cause the computer to perform any one of claims 1-9 or 10. one of the methods described.