
CN114581293A - Perspective transformation method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN114581293A
CN114581293A (application CN202210216834.0A; granted as CN114581293B)
Authority
CN
China
Prior art keywords
face
coordinate information
image
information
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210216834.0A
Other languages
Chinese (zh)
Other versions
CN114581293B (en)
Inventor
杨云
Current Assignee
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd
Priority to CN202210216834.0A
Publication of CN114581293A
Application granted
Publication of CN114581293B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a perspective transformation method and apparatus, an electronic device, and a readable storage medium. A face region image containing a plurality of face key points is cropped from a face image captured by a photographing device. A virtual perspective camera is constructed, and its pose parameters are obtained from the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation. Finally, based on the pose parameters, the 3D coordinate information of each face key point is transformed into virtual 3D coordinate information in the virtual perspective camera space. By jointly considering the reprojection error between the 3D and 2D points and the rigid transformation information during perspective transformation, the scheme achieves good results in avoiding face deformation and improving the projection accuracy of the key points.

Description

Perspective transformation method, apparatus, electronic device and readable storage medium

Technical Field

The present application relates to the technical field of image processing, and in particular to a perspective transformation method, apparatus, electronic device, and readable storage medium.

Background

In recent years, 3D face key point detection has been widely applied in fields such as face tracking, emotion recognition, and face VR special effects. In 3D face key point detection, the captured original image is usually cropped or otherwise processed, which changes the intrinsic and extrinsic parameters of the camera. Therefore, to accurately restore the key point position information in the processed image, a virtual camera is generally constructed and the three-dimensional face information in the original world coordinate system is perspective-transformed into the virtual perspective camera space.

However, in the prior art, perspective transformation is generally performed by fitting methods such as least squares, which are prone to problems such as inaccurate solutions and distortion of the face.

Summary of the Invention

The objectives of the present application include, for example, providing a perspective transformation method, apparatus, electronic device, and readable storage medium that can avoid face deformation during virtual perspective transformation and improve the accuracy of key point projection.

The embodiments of the present application may be implemented as follows:

In a first aspect, the present application provides a perspective transformation method, the method comprising:

acquiring a face image captured by a photographing device, and cropping the face image to obtain a face region image, where the face region image contains a plurality of face key points;

constructing a virtual perspective camera, and obtaining pose parameters of the virtual perspective camera according to the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation; and transforming, based on the pose parameters, the 3D coordinate information of each face key point into virtual 3D coordinate information in the virtual perspective camera space.

In an optional embodiment, the step of cropping the face image to obtain the face region image includes:

determining a plurality of face key points that form the face contour in the face image;

obtaining the minimum bounding box of the plurality of face key points;

cropping the face image based on the minimum bounding box to obtain the face region image.
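The three sub-steps above can be sketched in Python with NumPy; the contour keypoints below are hypothetical stand-ins, not data from the patent:

```python
import numpy as np

def min_bounding_box(keypoints_2d):
    """Axis-aligned minimum bounding box of 2D face-contour key points.

    keypoints_2d: (N, 2) array of (x, y) pixel coordinates.
    Returns (x_min, y_min, x_max, y_max) as plain floats.
    """
    x_min, y_min = keypoints_2d.min(axis=0)
    x_max, y_max = keypoints_2d.max(axis=0)
    return float(x_min), float(y_min), float(x_max), float(y_max)

# Hypothetical contour key points for illustration.
contour = np.array([[40, 60], [120, 55], [150, 180], [35, 170]], dtype=float)
box = min_bounding_box(contour)
print(box)  # (35.0, 55.0, 150.0, 180.0)
```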

In an optional embodiment, the step of cropping the face image based on the minimum bounding box to obtain the face region image includes:

determining the center position of the minimum bounding box;

calculating a crop length based on the position information of the corner points of the minimum bounding box;

determining a crop range according to the center position and the crop length, and cropping the face image based on the crop range to obtain the face region image.
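A minimal sketch of these sub-steps follows. The text does not give the exact crop-length formula, so taking the longer side of the bounding box (which yields a square crop) is an assumption made here for illustration:

```python
import numpy as np

def crop_square(image, box):
    """Crop a square face region: center = box center, side = crop length.

    The crop length is assumed here to be the longer side of the minimum
    bounding box; the patent text leaves the exact formula unspecified.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half = max(x_max - x_min, y_max - y_min) / 2.0
    x0, x1 = int(round(cx - half)), int(round(cx + half))
    y0, y1 = int(round(cy - half)), int(round(cy + half))
    h, w = image.shape[:2]
    # Rows are y, columns are x; clamp the crop range to the image bounds.
    return image[max(y0, 0):min(y1, h), max(x0, 0):min(x1, w)]

img = np.zeros((240, 320), dtype=np.uint8)   # hypothetical source image
face = crop_square(img, (35.0, 55.0, 150.0, 180.0))
print(face.shape)  # (125, 125)
```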

In an optional embodiment, the method further includes a step of obtaining the 2D coordinate information of each face key point in the face region image, which includes:

obtaining the 3D coordinate information of each face key point in the world coordinate system;

projecting, according to the device parameters of the photographing device, the 3D coordinate information of each face key point into 2D coordinate information in a first image coordinate system, where the first image coordinate system is constructed from the face image;

updating the 2D coordinate information based on the cropping information used when cropping the face region image, to obtain 2D coordinate information in a second image coordinate system constructed from the face region image.
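A sketch of the projection step, assuming the "device parameters" are the usual pinhole intrinsics K and extrinsics R, t; all numeric values below are hypothetical:

```python
import numpy as np

def project_points(P_w, K, R, t):
    """Project world-space 3D points into the first image coordinate system.

    P_w: (N, 3) world coordinates; K: (3, 3) intrinsics;
    R, t: world-to-camera extrinsics. Standard pinhole camera model.
    """
    P_c = P_w @ R.T + t              # world space -> camera space
    uvw = P_c @ K.T                  # camera space -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> (u, v)

# Hypothetical camera parameters and key points.
K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
P_w = np.array([[0.0, 0.0, 2.0], [0.1, -0.1, 2.0]])
print(project_points(P_w, K, R, t))
```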

In an optional embodiment, the cropping information includes a crop length and a crop center position;

the step of updating the 2D coordinate information based on the cropping information used when cropping the face region image includes:

subtracting the abscissa of the crop center position from the abscissa in the 2D coordinate information, and adding half of the crop length;

subtracting the ordinate of the crop center position from the ordinate in the 2D coordinate information, and adding half of the crop length.
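The two update steps amount to translating full-image coordinates into the coordinate system of a square crop of side equal to the crop length, centered at the crop center; a minimal sketch with hypothetical values:

```python
def to_crop_coords(u, v, crop_cx, crop_cy, crop_len):
    """Map 2D coordinates from the full-image system to the crop system.

    Implements the update in the text: subtract the crop-center coordinate,
    then add half the crop length, for both abscissa and ordinate.
    """
    return u - crop_cx + crop_len / 2.0, v - crop_cy + crop_len / 2.0

# A point at the crop center maps to the center of the cropped image.
print(to_crop_coords(92.5, 117.5, 92.5, 117.5, 125.0))  # (62.5, 62.5)
# The top-left corner of the crop range maps to the origin.
print(to_crop_coords(30.0, 55.0, 92.5, 117.5, 125.0))   # (0.0, 0.0)
```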

In an optional embodiment, the step of obtaining the pose parameters of the virtual perspective camera according to the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation, includes:

constructing a first optimization term consisting of the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and constructing a second optimization term consisting of the rigid transformation information of the face during perspective transformation;

weighting the second optimization term with a set weight coefficient;

adding the first optimization term and the weighted second optimization term to construct an optimization model, and minimizing the optimization model to obtain the pose parameters of the virtual perspective camera.
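Putting the two terms together, the optimization model can be sketched as below. The choice of squared norms and all synthetic data are assumptions for illustration; the actual minimization (e.g. by a nonlinear least-squares solver over R and t) is omitted:

```python
import numpy as np

def objective(R, t, P_w, p_2d, K, weight):
    """E(R, t) = sum ||p_2d - proj(K (R P + t))||^2 + weight * ||I - R R^T||_F^2.

    First term: reprojection error of the key points under pose (R, t).
    Second term: rigidity of R, weighted by a set coefficient; it is zero
    exactly when R is orthogonal.
    """
    P_c = P_w @ R.T + t
    uvw = P_c @ K.T
    proj = uvw[:, :2] / uvw[:, 2:3]
    reproj = np.sum((p_2d - proj) ** 2)
    rigid = np.sum((np.eye(3) - R @ R.T) ** 2)
    return reproj + weight * rigid

# Hypothetical data: 2D observations generated from a known true pose.
K = np.array([[500.0, 0.0, 128.0], [0.0, 500.0, 128.0], [0.0, 0.0, 1.0]])
P_w = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 2.0], [0.0, 0.1, 2.1]])
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 0.5])
uvw = (P_w @ R_true.T + t_true) @ K.T
p_2d = uvw[:, :2] / uvw[:, 2:3]

print(objective(R_true, t_true, P_w, p_2d, K, weight=10.0))   # 0.0 at the true pose
print(objective(1.1 * R_true, t_true, P_w, p_2d, K, 10.0) > 0)  # True: non-rigid pose is penalized
```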

In an optional embodiment, the step of constructing a first optimization term consisting of the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image includes:

constructing a projection term from the 3D coordinate information of each face key point in the face region image and the pose parameters;

subtracting the projection term from the 2D coordinate information of each face key point in the face region image to obtain the first optimization term.

In an optional embodiment, the pose parameters include a rotation matrix, and the step of constructing a second optimization term consisting of the rigid transformation information of the face during perspective transformation includes:

multiplying the rotation matrix by its transpose to obtain rigid feature information;

subtracting the rigid feature information from a constructed identity matrix to obtain the second optimization term.
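A minimal sketch of this second term; the residual I - R Rᵀ vanishes exactly when R is orthogonal, i.e. when the transformation is rigid:

```python
import numpy as np

def rigidity_term(R):
    """Second optimization term: I - R R^T (zero for an orthogonal R)."""
    return np.eye(3) - R @ R.T

# An exact rotation leaves no residual...
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
print(np.allclose(rigidity_term(Rz), 0.0))        # True
# ...while a non-rigid (scaled) matrix does not.
print(np.allclose(rigidity_term(1.2 * Rz), 0.0))  # False
```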

In an optional embodiment, the method further includes:

constructing a face mesh based on the plurality of face key points;

obtaining mesh information of the face mesh according to the virtual 3D coordinate information of each face key point.

In a second aspect, the present application provides a perspective transformation apparatus, the apparatus comprising:

an acquisition module, configured to acquire a face image captured by a photographing device and crop it to obtain a face region image, where the face image contains a plurality of face key points;

a determination module, configured to construct a virtual perspective camera and obtain pose parameters of the virtual perspective camera according to the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation;

a transformation module, configured to transform, based on the pose parameters, the 3D coordinate information of each face key point into virtual 3D coordinate information in the virtual perspective camera space.

In a third aspect, the present application provides an electronic device, comprising one or more storage media and one or more processors in communication with the storage media, where the one or more storage media store machine-executable instructions executable by the processors; when the electronic device runs, the processors execute the machine-executable instructions to perform the method steps described in any one of the foregoing embodiments.

In a fourth aspect, the present application provides a computer-readable storage medium storing machine-executable instructions that, when executed, implement the method steps described in any one of the foregoing embodiments.

The beneficial effects of the embodiments of the present application include, for example:

The present application provides a perspective transformation method and apparatus, an electronic device, and a readable storage medium. From a face image captured by a photographing device, a face region image containing a plurality of face key points is cropped. A virtual perspective camera is constructed, and its pose parameters are obtained from the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation. Finally, based on the pose parameters, the 3D coordinate information of each face key point is transformed into virtual 3D coordinate information in the virtual perspective camera space. In this scheme, the perspective transformation jointly considers the reprojection error between the 3D and 2D points and the rigid transformation information, achieving good results in avoiding face deformation and improving the projection accuracy of the key points.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting the scope; those of ordinary skill in the art may derive other related drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application scenario of the perspective transformation method provided by an embodiment of the present application;

Fig. 2 is a flowchart of the perspective transformation method provided by an embodiment of the present application;

Fig. 3 is a schematic diagram of face region image cropping provided by an embodiment of the present application;

Fig. 4 is a schematic diagram of a face mesh provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of the face mesh from another viewing angle provided by an embodiment of the present application;

Fig. 6 is a flowchart of a method for obtaining 2D coordinate information provided by an embodiment of the present application;

Fig. 7 is a flowchart of the sub-steps included in step S101 in Fig. 2;

Fig. 8 is a schematic diagram of the minimum bounding box determined in an embodiment of the present application;

Fig. 9 is a flowchart of the sub-steps included in step S1015 in Fig. 7;

Fig. 10 is a flowchart of the sub-steps included in step S103 in Fig. 2;

Fig. 11 is a flowchart of a method for obtaining mesh information provided by an embodiment of the present application;

Fig. 12 is a structural block diagram of the electronic device provided by an embodiment of the present application;

Fig. 13 is a block diagram of the functional modules of the perspective transformation apparatus provided by an embodiment of the present application.

Reference numerals: 110 - storage medium; 120 - processor; 130 - perspective transformation apparatus; 131 - acquisition module; 132 - determination module; 133 - transformation module; 140 - communication interface.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.

Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the claimed scope of the application, but merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings.

In the description of the present application, it should be noted that terms such as "first" and "second", if present, are used only to distinguish between descriptions and should not be understood as indicating or implying relative importance.

It should be noted that, provided there is no conflict, the features in the embodiments of the present application may be combined with one another.

Please refer to Fig. 1, a schematic diagram of an application scenario of the perspective transformation method provided by an embodiment of the present application. The scenario includes a server and a photographing device communicatively connected to the server. There may be one or more photographing devices, which may include devices for capturing two-dimensional images, such as cameras, as well as devices for capturing depth images, such as depth cameras. In this embodiment, the photographing device may send the captured image or video information to the server, which analyzes and processes the received information.

Taking a live-streaming scenario as an example, the photographing device may be set on the streaming side or on the viewing side. It may capture face images of the streamer, or of viewers watching the live stream, and send the face images to the server for analysis and processing.

It should be noted that the perspective transformation method provided by this embodiment may also be applied in other scenarios that require analyzing the three-dimensional information of face key points and using that information for processing such as facial emotion recognition or facial special effects.

With reference to Fig. 2, an embodiment of the present application provides a perspective transformation method applicable to an electronic device, which may be the above-mentioned server. The method steps defined by the flow of the perspective transformation method may be implemented by the electronic device. The specific flow shown in Fig. 2 is described in detail below.

S101: Acquire a face image captured by a photographing device, and crop the face image to obtain a face region image, where the face region image contains a plurality of face key points.

S103: Construct a virtual perspective camera, and obtain pose parameters of the virtual perspective camera according to the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation.

S105: Transform, based on the pose parameters, the 3D coordinate information of each face key point into virtual 3D coordinate information in the virtual perspective camera space.

Taking the live-streaming scenario as an example, when a streamer is live, the photographing device on the streamer's side may capture face images containing the streamer's face, which may be transmitted to the server together with the live video stream. The captured face images may include multiple images of the streamer at different angles and with different expressions.

Besides the main face region, a captured face image may also contain an image of the surrounding environment, i.e. a background region. To facilitate focused analysis of the face information in the image, the face region image is often cropped out of the face image, as shown in Fig. 3.

After the face image is cropped to extract the face region image it contains, the coordinate information of each pixel of the face region image changes relative to the coordinate system of the original face image. That is, the image coordinate system of the face region image differs from that of the face image.

Moreover, because the image coordinate system changes, the viewpoint of each pixel in the face region image also changes, which means each pixel needs to be transformed into the camera space under the changed viewpoint. Therefore, in this embodiment, a virtual perspective camera needs to be constructed, whose viewpoint is established based on the changed viewpoint; that is, the virtual perspective camera is a perspective camera corresponding to the three-dimensional camera space after the viewpoint change. The coordinate information of the pixels in the face region image must be perspective-transformed into the virtual perspective camera space; that is, the coordinate information of the pixels, originally in world coordinates, must be transformed into the constructed virtual perspective camera space.

In this embodiment, the pixels that mainly need perspective transformation are the face key points in the face region image. Face key points may be points reflecting face contour information, facial feature information, and so on, for example, points forming the face contour, the shape of the mouth, or the shape of the eyebrows.

To construct the virtual perspective camera accurately, its pose parameters must be determined. The pose parameters of the virtual perspective camera include a rotation matrix and a translation vector. Once the pose parameters are determined, the virtual perspective camera is fixed, and the perspective transformation of the face key points' coordinates can be performed based on the determined pose parameters. Since the pose parameters affect the quality of the transformation, pose parameters that give the face key points a good result after perspective transformation must be obtained.

The coordinate information of each face key point in the face region image includes 3D coordinate information and 2D coordinate information. The 3D coordinate information is in the world coordinate system and can be understood as the coordinate information in the three-dimensional camera space before the viewpoint change, while the 2D coordinate information is in the image coordinate system constructed from the face region image. After the perspective transformation, the 3D and 2D coordinate information of each face key point should project onto each other accurately; that is, the reprojection error should be as small as possible. The reprojection error represents the error between a point's 2D coordinates and the position obtained by projecting its 3D coordinates under the currently estimated pose parameters. Under a virtual perspective camera with given pose parameters, the smaller the reprojection error between the 3D and 2D coordinate information of the face key points, the better the alignment between them, and the better the effect of the subsequent perspective transformation based on those pose parameters.

此外,本实施例中,还考虑到在透视变换后人脸容易出现扭曲。因此,本实施例中,结合人脸关键点的3D坐标信息和2D坐标信息之间的重投影误差,以及透视变换时人脸的刚性变换信息两个维度,来进行虚拟透视相机的位姿参数的优化。从而保障在重投影误差和人脸的刚性变换上均能有较佳的效果。In addition, in this embodiment, it is also considered that the human face is easily distorted after perspective transformation. Therefore, in this embodiment, the pose parameters of the virtual perspective camera are determined by combining the reprojection error between the 3D coordinate information and the 2D coordinate information of the key points of the face, and the rigid transformation information of the face during perspective transformation. Optimization. Therefore, it is guaranteed to have a better effect on the reprojection error and the rigid transformation of the face.

其中,人脸的刚性变换信息体现了人脸关键点变换前后之间的如位置、朝向、形状变换等信息。其中,人脸的刚性变换信息可以由位姿参数本身的数据特征所控制,若位姿参数本身的数据特征呈现的刚性特征越强,则表明变换后人脸的刚性特征越强。而位姿参数本身的刚性特征的体现,例如若位姿参数中旋转矩阵越接近正交矩阵,则表明位姿参数的刚性特性越强,相应地人脸的刚性变换信息的刚性特征越强。Among them, the rigid transformation information of the face reflects the information such as position, orientation, and shape transformation between the key points of the face before and after transformation. Among them, the rigid transformation information of the face can be controlled by the data features of the pose parameters themselves. If the rigid features presented by the data features of the pose parameters themselves are stronger, it indicates that the transformed face has stronger rigid features. And the embodiment of the rigid features of the pose parameters themselves, for example, if the rotation matrix in the pose parameters is closer to the orthogonal matrix, it indicates that the rigid features of the pose parameters are stronger, and the rigid features of the rigid transformation information of the face are correspondingly stronger.

在重投影误差和人脸的刚性变换信息两个指标下进行位姿参数的寻优,在得到虚拟透视相机的位姿参数的情况下,基于位姿参数将各个人脸关键点的3D坐标信息变换为虚拟透视相机空间内的虚拟3D坐标信息。基于得到的各个人脸关键点的虚拟3D坐标信息,则可以进行人脸的三维信息的构建,从而可以实现如面部跟踪、情感识别等应用处理。The optimization of the pose parameters is carried out under the two indicators of the reprojection error and the rigid transformation information of the face. When the pose parameters of the virtual perspective camera are obtained, the 3D coordinate information of each face key point is calculated based on the pose parameters. Transform to virtual 3D coordinate information in virtual perspective camera space. Based on the obtained virtual 3D coordinate information of key points of each face, the 3D information of the face can be constructed, so that application processing such as face tracking and emotion recognition can be realized.

本实施例所提供的透视变换方法，在进行透视变换时综合考虑了人脸关键点的3D坐标信息和2D坐标信息之间的重投影误差，以及人脸关键点变换时所体现的人脸刚性变换信息，从而可避免变换后人脸变形而出现扭曲，并且可以提高人脸关键点投影的精准性。The perspective transformation method provided in this embodiment comprehensively considers, when performing the perspective transformation, the reprojection error between the 3D coordinate information and the 2D coordinate information of the face key points, as well as the rigid transformation information of the face reflected in the key-point transformation. This avoids deformation and distortion of the face after transformation and improves the accuracy of the face key-point projection.

本实施例中，对于拍摄设备采集并截取到的人脸区域图像，对于其中的各个人脸关键点可以预先按统一规则设置对应编号，例如，可以利用人脸部分的1220个人脸关键点来综合体现人脸信息。则多个人脸关键点中，各个人脸关键点可以分别对应具有编号1-1220中的其中一个编号。而该1220个人脸关键点可构成人脸网格，该人脸网格可以由多个三角形切片所构成。每个三角形切片由相邻的三个人脸关键点作为顶点，并结合其相互之间的连线所构成，如此，可以构成2304个三角形切片，如图4、5中所示。In this embodiment, for the face region image collected by the photographing device and clipped from the captured image, corresponding numbers can be set in advance for the face key points according to a unified rule. For example, 1220 face key points in the face region can be used to collectively represent the face information. Among the multiple face key points, each key point may then correspond to one of the numbers 1-1220. The 1220 face key points can form a face mesh, which can be composed of multiple triangle slices. Each triangle slice takes three adjacent face key points as its vertices, connected by the lines between them; in this way, 2304 triangle slices can be formed, as shown in Figures 4 and 5.

各个人脸关键点在世界坐标系下的3D坐标信息可以记为(x,y,z),而各个三角形切片可以由其三个顶点分别对应的编号来记录,例如(i,j,k)。The 3D coordinate information of each face key point in the world coordinate system can be recorded as (x, y, z), and each triangle slice can be recorded by the numbers corresponding to its three vertices, such as (i, j, k) .

人脸区域图像中包含的多个人脸关键点的3D坐标信息则可以利用一个N×3的矩阵Pw来表示，其中，N表示人脸关键点的数量。The 3D coordinate information of the multiple face key points contained in the face region image can be represented by an N×3 matrix P_w, where N is the number of face key points.

由上述可知，在透视变换时需要利用到人脸区域图像下的2D坐标信息，因此，本实施例所提供的方法，还包括获得人脸区域图像中各个人脸关键点的2D坐标信息的步骤，请参阅图6，该步骤可以通过以下方式实现：As can be seen from the above, the 2D coordinate information in the face region image is needed during perspective transformation. Therefore, the method provided by this embodiment also includes a step of obtaining the 2D coordinate information of each face key point in the face region image. Referring to Figure 6, this step can be implemented as follows:

S1021,获得各所述人脸关键点在世界坐标系下的3D坐标信息。S1021, obtain 3D coordinate information of each of the face key points in the world coordinate system.

S1023，根据所述拍摄设备的设备参数，将各所述人脸关键点的3D坐标信息投影为第一图像坐标系下的2D坐标信息，所述第一图像坐标系以所述人脸图像所构建。S1023: According to the device parameters of the photographing device, project the 3D coordinate information of each face key point into 2D coordinate information in a first image coordinate system, where the first image coordinate system is constructed based on the face image.

S1025,基于截取所述人脸区域图像时的截取信息,对所述2D坐标信息进行更新,得到在以所述人脸区域图像所构建的第二图像坐标系下的2D坐标信息。S1025 , based on the interception information when intercepting the face area image, update the 2D coordinate information to obtain 2D coordinate information in the second image coordinate system constructed with the face area image.

本实施例中,在拍摄设备的设备参数确定的情况下,基于拍摄设备所拍摄得到的人脸图像中的各个人脸关键点在世界坐标系下的3D坐标信息可确定。In this embodiment, when the device parameters of the photographing device are determined, the 3D coordinate information of each face key point in the world coordinate system in the face image obtained by the photographing device can be determined.

对于人脸图像下的各个人脸关键点，其3D坐标信息与2D坐标信息之间具有投影关系，可以基于拍摄设备的设备参数，得到各个人脸关键点的3D坐标信息投影下的2D坐标信息。需要说明的是，此处得到的是以人脸图像所构建的第一图像坐标下的2D坐标信息。得到的多个人脸关键点的2D坐标信息可以用一个N×2的矩阵Puv来表示。For each face key point in the face image, there is a projection relationship between its 3D coordinate information and its 2D coordinate information. Based on the device parameters of the photographing device, the 2D coordinate information given by projecting the 3D coordinate information of each face key point can be obtained. It should be noted that what is obtained here is the 2D coordinate information in the first image coordinate system constructed from the face image. The obtained 2D coordinate information of the multiple face key points can be represented by an N×2 matrix P_uv.
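The projection relationship above can be illustrated with a pinhole-camera sketch. The following is a simplified numpy example that assumes the points are already in the camera's coordinate frame and uses a hypothetical intrinsic matrix K; it is not the device parameters of the embodiment.

```python
import numpy as np

# Illustrative pinhole projection of 3D points (assumed to already be in the
# camera's coordinate frame) to 2D pixel coordinates. K and the points are
# hypothetical values.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def project_points(pts, K):
    """Project an N x 3 array of camera-space points to an N x 2 pixel array."""
    homo = pts @ K.T                 # rows are (u*z, v*z, z)
    return homo[:, :2] / homo[:, 2:3]

pts = np.array([[0.1, -0.2, 2.0],
                [0.0, 0.0, 1.0]])
P_uv = project_points(pts, K)        # analogous to the N x 2 matrix P_uv
```

A point on the optical axis (the second row) lands on the principal point (320, 240), which is a quick sanity check for such a projection.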

在从人脸图像中截取出人脸区域图像中,图像坐标系发生了改变,从人脸图像下的第一图像坐标系变为人脸区域图像下的第二图像坐标系。相应地,各个人脸关键点的2D坐标信息也将发生改变,需要变更为第二图像坐标系下的坐标信息。When the face area image is cut out from the face image, the image coordinate system is changed from the first image coordinate system under the face image to the second image coordinate system under the face area image. Correspondingly, the 2D coordinate information of each face key point will also be changed, and needs to be changed to the coordinate information in the second image coordinate system.

本实施例中，可以基于从人脸图像中截取人脸区域图像时的截取信息对各个人脸关键点的2D坐标信息进行更新修订，从而转换为人脸区域图像为基础的第二图像坐标系下的2D坐标信息。其中，截取信息主要是人脸区域图像相对于人脸图像的位置信息，例如人脸区域图像的截取长度、人脸区域图像在第一图像坐标系下的中心点位置等。In this embodiment, the 2D coordinate information of each face key point can be updated and revised based on the clipping information used when the face region image was clipped from the face image, so as to convert it into 2D coordinate information in the second image coordinate system based on the face region image. The clipping information is mainly the position information of the face region image relative to the face image, such as the clipping length of the face region image and the position of the center point of the face region image in the first image coordinate system.

本实施例中，在从人脸图像中截取出人脸区域图像，并基于人脸区域图像进行后续的透视转换处理时，对人脸区域图像下的2D坐标信息进行了修订，从而保障了2D坐标信息的准确性。In this embodiment, when the face region image is clipped from the face image and the subsequent perspective transformation processing is performed based on the face region image, the 2D coordinate information in the face region image is revised, thereby ensuring the accuracy of the 2D coordinate information.

请参阅图7,本实施例中,在从人脸图像中截取人脸区域图像时,可以通过以下方式实现:Referring to FIG. 7 , in this embodiment, when the face region image is intercepted from the face image, the following methods can be used:

S1011,确定所述人脸图像中构成人脸轮廓的多个人脸关键点。S1011: Determine a plurality of face key points that constitute a face contour in the face image.

S1013,获得所述多个人脸关键点的最小外接框。S1013: Obtain the minimum bounding frame of the multiple face key points.

S1015,基于所述最小外接框对所述人脸图像进行截取,获得人脸区域图像。S1015: Intercept the face image based on the minimum bounding frame to obtain a face region image.

本实施例中，确定出的构成人脸轮廓的多个人脸关键点为图像中处于人脸外围的多个关键点，例如图8中所示的，可以确定出编号1至编号68的68个人脸关键点。而基于该多个人脸关键点可以确定一个最小外接框，该最小外接框为最小的可以将该多个人脸关键点包围在内的外接框。最小外接框的形状可以是长方形、正方形等，主要以人脸关键点构成的人脸轮廓而定。In this embodiment, the determined face key points constituting the face contour are key points located on the periphery of the face in the image; for example, as shown in Figure 8, 68 face key points numbered 1 to 68 can be determined. A minimum bounding box can be determined based on these face key points; it is the smallest bounding box that encloses all of them. The shape of the minimum bounding box can be a rectangle, a square, etc., depending mainly on the face contour formed by the face key points.

基于所确定出的最小外接框,可以从人脸图像中截取出该最小外接框所框定的人脸区域,以得到人脸区域图像。Based on the determined minimum bounding frame, the face area framed by the minimum bounding frame may be cut out from the face image to obtain a face area image.
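As a minimal numpy sketch of step S1013, the minimum axis-aligned bounding box (u_tl, v_tl, u_br, v_br) can be obtained from the per-axis extrema of the contour key points; the four points below are hypothetical.

```python
import numpy as np

# Sketch of S1013: the minimum axis-aligned bounding box enclosing the
# contour key points. The four points are hypothetical values.
contour = np.array([[110.0, 90.0],
                    [200.0, 85.0],
                    [240.0, 180.0],
                    [120.0, 200.0]])
u_tl, v_tl = contour.min(axis=0)     # top-left corner
u_br, v_br = contour.max(axis=0)     # bottom-right corner
bbox = (u_tl, v_tl, u_br, v_br)
```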

请参阅图9,本实施例中,在基于最小外接框对人脸图像进行截取时,可以采用以下方式实现:Referring to FIG. 9, in this embodiment, when the face image is intercepted based on the minimum bounding frame, the following methods can be used:

S10151,确定所述最小外接框的中心位置。S10151: Determine the center position of the minimum bounding frame.

S10153,基于所述最小外接框的多个角点的位置信息计算得到截取长度。S10153: Calculate the interception length based on the position information of multiple corner points of the minimum bounding frame.

S10155,根据所述中心位置和截取长度确定截取范围,并基于所述截取范围对所述人脸图像进行截取,获得人脸区域图像。S10155: Determine an interception range according to the center position and the interception length, and intercept the face image based on the interception range to obtain a face region image.

本实施例中，最小外接框可以为矩形框，该最小外接框可以利用一个坐标向量(utl, vtl, ubr, vbr)来表示，其中，(utl, vtl)为最小外接框的左上角的角点在人脸图像的图像坐标系下的坐标值，(ubr, vbr)为最小外接框的右下角的角点在人脸图像的图像坐标系下的坐标值。In this embodiment, the minimum bounding box may be a rectangular box represented by a coordinate vector (u_tl, v_tl, u_br, v_br), where (u_tl, v_tl) is the coordinate value of its upper-left corner in the image coordinate system of the face image, and (u_br, v_br) is the coordinate value of its lower-right corner in the same coordinate system.

基于最小外接框的角点的位置信息，例如上述的左上角的角点和右下角的角点的坐标值，可得到最小外接框的中心位置，可记为(uc, vc)。其中，中心位置的横坐标可为左上角的横坐标和右下角的横坐标之和的一半：uc = 0.5(utl + ubr)。中心位置的纵坐标可为左上角的纵坐标和右下角的纵坐标之和的一半：vc = 0.5(vtl + vbr)。Based on the position information of the corner points of the minimum bounding box, such as the coordinate values of the upper-left and lower-right corners above, the center position of the minimum bounding box can be obtained and recorded as (u_c, v_c). The abscissa of the center position is half the sum of the abscissas of the upper-left and lower-right corners: u_c = 0.5(u_tl + u_br); the ordinate is half the sum of their ordinates: v_c = 0.5(v_tl + v_br).

并且,基于最小外接框的角点的位置信息可计算得到截取长度L,计算公式可如下:In addition, the interception length L can be calculated based on the position information of the corner points of the minimum bounding frame, and the calculation formula can be as follows:

(The formula for the clipping length L appears in the original as a formula image, BDA0003535252240000131; it computes L from the corner coordinates (u_tl, v_tl) and (u_br, v_br) of the minimum bounding box.)

在得到最小外接框的中心位置以及截取长度后可确定截取范围，该截取范围同样为一矩形框。该截取范围的左上角的坐标值和右下角的坐标值分别为 (uc − L/2, vc − L/2) 和 (uc + L/2, vc + L/2)，可以将截取范围记为一个向量 (uc − L/2, vc − L/2, uc + L/2, vc + L/2)。基于确定出的截取范围可对人脸图像进行截取，获得人脸区域图像。After obtaining the center position of the minimum bounding box and the clipping length, the clipping range can be determined; the clipping range is likewise a rectangular box. The coordinate values of its upper-left corner and lower-right corner are (u_c − L/2, v_c − L/2) and (u_c + L/2, v_c + L/2), respectively, and the clipping range can be recorded as the vector (u_c − L/2, v_c − L/2, u_c + L/2, v_c + L/2). Based on the determined clipping range, the face image can be clipped to obtain a face region image.

本实施例中，通过上述的确定构成人脸轮廓的最小外接框，并在最小外接框的基础上进行一定变形确定截取范围。可在保障截取出人脸图像中完整的人脸区域的基础上，避免截取过多的背景区域，从而对后续人脸信息处理造成干扰。In this embodiment, the minimum bounding box enclosing the face contour is determined as described above, and the clipping range is obtained by slightly deforming this box. This ensures that the complete face region is clipped from the face image while avoiding clipping too much background, which would otherwise interfere with subsequent face information processing.
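The clipping steps S10151-S10155 can be sketched as follows with hypothetical corner values. The center follows the formulas in the text; the clipping length L is given in the source only as a formula image, so the longer side of the bounding box is used here as a stand-in assumption.

```python
# Sketch of S10151-S10155 with hypothetical corner values. The center follows
# the formulas in the text; the exact clipping-length formula is a figure in
# the source, so the longer box side is used as a stand-in assumption.
u_tl, v_tl, u_br, v_br = 110.0, 85.0, 240.0, 200.0

u_c = 0.5 * (u_tl + u_br)                # center abscissa
v_c = 0.5 * (v_tl + v_br)                # center ordinate
L = max(u_br - u_tl, v_br - v_tl)        # stand-in clipping length

# A square clipping range of side L centered on (u_c, v_c):
crop = (u_c - L / 2, v_c - L / 2, u_c + L / 2, v_c + L / 2)
```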

在按照上述方式截取人脸区域图像的情况下，在对截取后的人脸区域图像中人脸关键点的2D坐标信息进行修订更新时，可以基于最小外接框的截取中心位置以及截取长度进行修订。When the face region image is clipped in the above manner, the 2D coordinate information of the face key points in the clipped face region image can be revised and updated based on the clipping center position and the clipping length of the minimum bounding box.

在一种可能的实现方式中,可将人脸关键点的2D坐标信息中的横坐标减去截取中心位置的横坐标,并加上截取长度的一半。In a possible implementation manner, the abscissa in the 2D coordinate information of the face key points may be subtracted from the abscissa of the interception center position, and half of the interception length may be added.

此外,可将人脸关键点的2D坐标信息中的纵坐标减去截取中心位置的纵坐标,并加上截取长度的一半。In addition, the ordinate in the 2D coordinate information of the face key point may be subtracted from the ordinate of the clipping center position, and half of the clipping length may be added.

本实施例中，为了便于对人脸区域图像的处理，可以将截取出的人脸区域图像的分辨率统一为设定分辨率，例如PxP，其中P可为120或160等不限。In this embodiment, to facilitate processing of the face region image, the resolution of the clipped face region image can be unified to a set resolution, for example P×P, where P may be 120, 160, etc., without limitation.

相应地,在上述对人脸区域图像中的人脸关键点进行2D坐标信息的修订更新后,可以将2D坐标信息中的横坐标和纵坐标均乘以P/L。Correspondingly, after revising and updating the 2D coordinate information of the face key points in the face region image, the abscissa and the ordinate in the 2D coordinate information may be multiplied by P/L.
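The 2D-coordinate revision described above (subtract the clipping center, add half the clipping length, then scale to the unified P×P resolution) can be sketched as follows; all numeric values are hypothetical.

```python
# Sketch of the 2D-coordinate revision described above: subtract the clipping
# center, add half the clipping length, then scale by P/L to the unified
# P x P resolution. All numeric values are hypothetical.
def to_crop_coords(u, v, u_c, v_c, L, P):
    u2 = (u - u_c + L / 2) * P / L
    v2 = (v - v_c + L / 2) * P / L
    return u2, v2

# A key point at the clipping center maps to the center of the P x P image:
u2, v2 = to_crop_coords(175.0, 142.5, 175.0, 142.5, 130.0, 160)
```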

在通过以上处理后可获得截取后的人脸区域图像下，各个人脸关键点的3D坐标信息和2D坐标信息。为了将3D坐标信息变化为虚拟透视相机下的虚拟3D坐标信息，本实施例中，采用了3D坐标信息和2D坐标信息之间的重投影误差，以及透视变换时人脸的刚性变换信息作为指标进行变换。Through the above processing, the 3D coordinate information and the 2D coordinate information of each face key point in the clipped face region image can be obtained. In order to transform the 3D coordinate information into virtual 3D coordinate information under the virtual perspective camera, this embodiment uses the reprojection error between the 3D coordinate information and the 2D coordinate information, together with the rigid transformation information of the face during perspective transformation, as the indicators for the transformation.

请参阅图10,本实施例中,上述在确定虚拟透视相机的位姿参数时,可以通过以下方式实现:Referring to FIG. 10 , in this embodiment, the above-mentioned determination of the pose parameters of the virtual perspective camera can be implemented in the following manner:

S1031，构建由所述人脸区域图像中各所述人脸关键点的3D坐标信息和2D坐标信息的重投影误差构成的第一优化项，并构建由透视变换时人脸的刚性变换信息构成的第二优化项。S1031: Construct a first optimization term composed of the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and construct a second optimization term composed of the rigid transformation information of the face during perspective transformation.

S1033,基于设置的权重系数对所述第二优化项赋权值。S1033, assign a weight value to the second optimization item based on the set weight coefficient.

S1035，将所述第一优化项和赋权值后的第二优化项进行相加，构建得到所述优化模型，对所述优化模型进行最小化处理以得到所述虚拟透视相机的位姿参数。S1035: Add the first optimization term and the weighted second optimization term to construct the optimization model, and minimize the optimization model to obtain the pose parameters of the virtual perspective camera.

本实施例中，在进行虚拟透视相机的位姿参数的优化计算时，考虑了3D坐标信息和2D坐标信息之间的重投影误差，以及透视变换时人脸的刚性变换信息。若3D坐标信息和2D坐标信息之间的重投影误差越小，表明变换后2D关键点投影越精准，若透视变换时人脸呈现的刚性特征越强，则表明人脸关键点之间的相对关系越能保持与变换前一致，也即人脸不会出现变形扭曲的现象。In this embodiment, the reprojection error between the 3D coordinate information and the 2D coordinate information, as well as the rigid transformation information of the face during perspective transformation, are considered when optimizing the pose parameters of the virtual perspective camera. The smaller the reprojection error between the 3D and 2D coordinate information, the more accurate the projection of the transformed 2D key points; the stronger the rigid characteristics exhibited by the face during perspective transformation, the better the relative relationships between the face key points are preserved from before the transformation, i.e., the face will not be deformed or distorted.

而在综合上述两者进行透视变换处理时，为了控制在刚性变换上的倾向度，因此，可以设置一个权重系数对由刚性变换信息构成的第二优化赋权值。从而可以在保障重投影误差的情况下，基于对权重系数的调节来调整在刚性特征上的关注程度。When the above two factors are combined in the perspective transformation processing, in order to control the tendency toward rigid transformation, a weight coefficient can be set to weight the second optimization term composed of the rigid transformation information. Thus, while the reprojection error is kept under control, the degree of attention paid to the rigid characteristics can be adjusted by tuning the weight coefficient.

本实施例中,在基于重投影误差构建第一优化项时,可以通过以下方式实现:In this embodiment, when constructing the first optimization term based on the reprojection error, it can be implemented in the following manner:

构建由所述人脸区域图像中的各人脸关键点的3D坐标信息与所述位姿参数得到的投影项，利用所述人脸区域图像中的各人脸关键点的2D坐标信息减去所述投影项，得到所述第一优化项。A projection term is constructed from the 3D coordinate information of each face key point in the face region image and the pose parameters; the first optimization term is obtained by subtracting this projection term from the 2D coordinate information of each face key point in the face region image.

其中,虚拟透视相机的位姿参数包括虚拟透视相机的旋转矩阵和平移向量,旋转矩阵可为一个3×3的矩阵,平移向量可为一个维度为3的向量。The pose parameters of the virtual perspective camera include a rotation matrix and a translation vector of the virtual perspective camera, the rotation matrix may be a 3×3 matrix, and the translation vector may be a vector with a dimension of 3.

在构建人脸关键点的3D坐标信息与位姿参数得到的投影项时,相当于将人脸关键点的3D坐标信息转换为虚拟透视相机空间内,再投影为2D坐标信息。利用人脸区域图像中的人脸关键点的2D坐标信息减去该投影的2D坐标信息,即可得到重投影之间的误差。When constructing the projection item obtained from the 3D coordinate information of the face key point and the pose parameters, it is equivalent to converting the 3D coordinate information of the face key point into the virtual perspective camera space, and then projecting it into 2D coordinate information. The error between reprojections can be obtained by subtracting the 2D coordinate information of the projection from the 2D coordinate information of the face key points in the face region image.

此外,在构建由透视变换时人脸的刚性变换信息构成的第二优化项时,可以通过以下方式实现:In addition, when constructing the second optimization term composed of the rigid transformation information of the face during perspective transformation, it can be achieved in the following ways:

将所述旋转矩阵与旋转矩阵的转置矩阵相乘,得到刚性特征信息,利用构建的单位矩阵减去所述相乘后的刚性特征信息,得到所述第二优化项。Multiplying the rotation matrix and the transposed matrix of the rotation matrix to obtain rigid feature information, and subtracting the multiplied rigid feature information from the constructed identity matrix to obtain the second optimization term.

其中,旋转矩阵为一3×3的矩阵,若旋转矩阵越接近于正交矩阵,则基于旋转矩阵执行的变换越呈现出刚性特性。而旋转矩阵的正交特性可以由旋转矩阵与旋转矩阵的转置之间的乘积来体现。因此,可以基于旋转矩阵与旋转矩阵的转置之间的乘积,得到刚性特征信息。The rotation matrix is a 3×3 matrix, and if the rotation matrix is closer to an orthogonal matrix, the transformation performed based on the rotation matrix exhibits more rigid characteristics. The orthogonality of the rotation matrix can be represented by the product between the rotation matrix and the transpose of the rotation matrix. Therefore, rigid feature information can be obtained based on the product between the rotation matrix and the transpose of the rotation matrix.

在旋转矩阵为正交矩阵的情况下，则旋转矩阵与旋转矩阵的转置相乘，可以得到一单位矩阵。因此，可以再利用构建的单位矩阵减去相乘后的结果。若相减的结果越接近于0，则表明旋转矩阵越接近于正交矩阵，则基于旋转矩阵实现的变换其刚性特性越好。When the rotation matrix is an orthogonal matrix, multiplying the rotation matrix by its transpose yields an identity matrix. Therefore, the multiplied result can then be subtracted from a constructed identity matrix. The closer the subtraction result is to 0, the closer the rotation matrix is to an orthogonal matrix, and the better the rigid characteristics of the transformation implemented based on the rotation matrix.
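The rigidity measure described above can be sketched numerically: the residual I − R·Rᵀ is (near) zero for an orthogonal rotation matrix and grows once orthogonality is broken. The matrices here are hypothetical examples.

```python
import numpy as np

# Sketch of the rigidity measure: the residual I - R @ R.T is (near) zero for
# an orthogonal rotation matrix and grows once orthogonality is broken.
def rigidity_residual(R):
    return np.linalg.norm(np.eye(3) - R @ R.T)

theta = 0.3
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])

shear = rot.copy()
shear[0, 1] += 0.2   # perturb one entry, breaking orthogonality
```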

本实施例中,在一种可能的实现方式中,构建的优化项可如下所示:In this embodiment, in a possible implementation manner, the constructed optimization item may be as follows:

E(R, t, Z) = || P_uv − [(R·P_w^T + t)^T·K^T / Z]_{0:2} ||_2 + w·|| I − R·R^T ||_2

其中，|| P_uv − [(R·P_w^T + t)^T·K^T / Z]_{0:2} ||_2 为第一优化项，w·|| I − R·R^T ||_2 为第二优化项。Puv表示人脸区域图像中的人脸关键点的2D坐标信息，Pw表示3D坐标信息，R表示旋转矩阵，t表示平移向量，Z表示各个人脸关键点在虚拟透视相机空间中z轴的坐标信息，为一个1220维的向量。K表示虚拟透视相机的内参矩阵。|| ||2表示取L2泛函数，0:2表示取矩阵的前2列元素，w表示权重系数，例如可取值为0.2。I表示单位矩阵，维度为3×3。Here, || P_uv − [(R·P_w^T + t)^T·K^T / Z]_{0:2} ||_2 is the first optimization term and w·|| I − R·R^T ||_2 is the second optimization term. P_uv denotes the 2D coordinate information of the face key points in the face region image, P_w the 3D coordinate information, R the rotation matrix, t the translation vector, and Z the z-axis coordinates of the face key points in the virtual perspective camera space, a 1220-dimensional vector. K denotes the intrinsic matrix of the virtual perspective camera. || ||_2 denotes taking the L2 norm, 0:2 denotes taking the first 2 columns of the matrix, and w denotes the weight coefficient, which may take a value of 0.2, for example. I denotes the identity matrix, with dimensions 3×3.

在基于优化模型进行最小化处理,以确定虚拟透视变换相机的位姿参数时,可以采用梯度下降法、BFGS方法或者其他优化方法进行优化处理,本实施例对此不作限制。When performing the minimization process based on the optimization model to determine the pose parameters of the virtual perspective transformation camera, the gradient descent method, the BFGS method, or other optimization methods may be used for the optimization process, which is not limited in this embodiment.
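Before such a minimizer (gradient descent, BFGS, etc.) is applied, the objective itself can be sketched as follows. This is an illustrative numpy sketch under assumed matrix conventions (row-wise N×3 points); the toy values of K, P_w, P_uv, and Z are hypothetical, not from the embodiment.

```python
import numpy as np

# Minimal sketch of the combined objective: reprojection term plus weighted
# rigidity term, with w = 0.2 as in the text. The matrix layout and the toy
# values of K, P_w, P_uv, Z are assumptions made for illustration.
def objective(R, t, Z, P_w, P_uv, K, w=0.2):
    cam = P_w @ R.T + t                 # key points in camera space (N x 3)
    proj = cam @ K.T                    # homogeneous pixel coordinates (N x 3)
    uv = proj[:, :2] / Z[:, None]       # divide by each point's depth Z
    reproj = np.linalg.norm(P_uv - uv)  # reprojection error (first term)
    rigid = np.linalg.norm(np.eye(3) - R @ R.T)  # rigidity (second term)
    return reproj + w * rigid

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
P_w = np.array([[0.0, 0.0, 2.0]])
P_uv = np.array([[50.0, 50.0]])
Z = np.array([2.0])

# With an exact pose (identity rotation, zero translation) both terms vanish:
loss = objective(np.eye(3), np.zeros(3), Z, P_w, P_uv, K)
```

An off-the-shelf minimizer would then search over R, t, and Z to drive this value down.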

在获得虚拟透视变换相机的位姿参数的基础上,可以基于位姿参数将人脸关键点的3D坐标信息变换为虚拟透视相机空间内的虚拟3D坐标信息。On the basis of obtaining the pose parameters of the virtual perspective transformation camera, the 3D coordinate information of the face key points can be transformed into virtual 3D coordinate information in the virtual perspective camera space based on the pose parameters.

本实施例中,可以利用3D坐标信息乘以得到的旋转矩阵,再加上平移向量,进而实现到虚拟3D坐标信息的变换。本实施例中,可以通过如下所示的计算公式实现变换:In this embodiment, the obtained rotation matrix can be multiplied by the 3D coordinate information, and then the translation vector can be added, so as to realize the transformation to the virtual 3D coordinate information. In this embodiment, the transformation can be realized by the following calculation formula:

Pcam = R·Pw + t

其中,Pcam表示变换后的虚拟3D坐标信息,同样地,可以为一个1220×3维的矩阵,矩阵中的每一行表示一个人脸关键点的3维坐标信息。Among them, P cam represents the transformed virtual 3D coordinate information, and similarly, it can be a 1220×3-dimensional matrix, and each row in the matrix represents the 3-dimensional coordinate information of a face key point.
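Applied row-wise to an N×3 matrix of key points, the transformation can be sketched as follows; R, t, and the points are hypothetical values.

```python
import numpy as np

# Sketch of P_cam = R * P_w + t, applied row-wise to an N x 3 matrix of
# key-point coordinates. R, t, and the points are hypothetical values.
R = np.eye(3)                       # rotation matrix from the optimization
t = np.array([0.0, 0.0, 1.5])       # translation vector
P_w = np.array([[0.1, 0.2, 0.0],
                [-0.1, 0.0, 0.1]])
P_cam = P_w @ R.T + t               # virtual 3D coordinates, one row per point
```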

在基于人脸的3维信息进行如人脸跟踪、人脸特效处理时,通常需要基于人脸网格相关信息实现。因此,在获得人脸关键点在虚拟透视变换相机空间内的虚拟3D坐标信息的情况下,还可以具有以下步骤,请结合参阅图11:When performing face tracking and face special effects processing based on the 3-dimensional information of the face, it usually needs to be implemented based on the relevant information of the face mesh. Therefore, in the case of obtaining the virtual 3D coordinate information of the face key points in the virtual perspective transformation camera space, the following steps may also be performed, please refer to FIG. 11 in conjunction:

S107,基于所述多个人脸关键点构建得到人脸网格。S107, constructing a face grid based on the plurality of face key points.

S109,根据各所述人脸关键点的虚拟3D坐标信息得到所述人脸网格的网格信息。S109, obtaining grid information of the face grid according to the virtual 3D coordinate information of each of the face key points.

本实施例中，构建的人脸网格为如上述的由多个人脸关键点，以及由各个人脸关键点作为顶点形成的三角形切片所构成的人脸网格，如图4、5中所示。In this embodiment, the constructed face mesh is the face mesh described above, composed of multiple face key points and the triangle slices formed with the face key points as vertices, as shown in Figures 4 and 5.

基于各个人脸关键点的虚拟3D坐标信息,即可得到人脸网格的网格信息。从而可以基于人脸网格的网格信息,实现人脸跟踪、人脸特效处理等。Based on the virtual 3D coordinate information of each face key point, the mesh information of the face mesh can be obtained. Therefore, based on the grid information of the face grid, face tracking, face special effect processing, etc. can be realized.
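The mesh information can be sketched as vertices (the key points' virtual 3D coordinates) plus triangle slices recorded as vertex-index triples (i, j, k), as introduced earlier. This 4-point, 2-triangle mesh is hypothetical, not the embodiment's 1220-point / 2304-slice mesh.

```python
import numpy as np

# Sketch: a face mesh stored as vertices (virtual 3D coordinates of the key
# points) plus triangle slices recorded as index triples (i, j, k). This tiny
# mesh is hypothetical, not the embodiment's 1220-point / 2304-slice mesh.
vertices = np.array([[0.0, 0.0, 1.0],
                     [1.0, 0.0, 1.0],
                     [0.0, 1.0, 1.0],
                     [1.0, 1.0, 1.1]])
faces = [(0, 1, 2), (1, 3, 2)]
mesh = {"vertices": vertices, "faces": faces}
```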

本实施例所提供的透视变换方法，综合考虑3D坐标信息与2D坐标信息之间的重投影误差，以及变换时人脸的刚性变换信息，通过最小化的优化处理的方式，使得透视变换后可以达到降低重投影误差以及提高人脸的刚性变换特征的效果。可以避免人脸变换变形，并且可以提高关键点投影精准性。The perspective transformation method provided in this embodiment comprehensively considers the reprojection error between the 3D coordinate information and the 2D coordinate information, as well as the rigid transformation information of the face during transformation. Through minimization-based optimization processing, the perspective transformation can achieve the effect of reducing the reprojection error and strengthening the rigid transformation characteristics of the face. This avoids deformation of the face during transformation and improves the accuracy of the key-point projection.

请参阅图12,为本申请实施例提供的电子设备的示例性组件示意图,该电子设备可为图1中所示的服务器。该电子设备可包括存储介质110、处理器120、透视变换装置130及通信接口140。本实施例中,存储介质110与处理器120均位于电子设备中且二者分离设置。然而,应当理解的是,存储介质110也可以是独立于电子设备之外,且可以由处理器120通过总线接口来访问。可替换地,存储介质110也可以集成到处理器120中,例如,可以是高速缓存和/或通用寄存器。Please refer to FIG. 12 , which is a schematic diagram of an exemplary component of an electronic device provided in an embodiment of the present application, and the electronic device may be the server shown in FIG. 1 . The electronic device may include a storage medium 110 , a processor 120 , a perspective transforming device 130 and a communication interface 140 . In this embodiment, the storage medium 110 and the processor 120 are both located in the electronic device and are provided separately. However, it should be understood that the storage medium 110 may also be independent of the electronic device, and may be accessed by the processor 120 through a bus interface. Alternatively, the storage medium 110 may also be integrated into the processor 120, for example, may be a cache and/or a general purpose register.

透视变换装置130可以理解为上述电子设备,或电子设备的处理器120,也可以理解为独立于上述电子设备或处理器120之外的在电子设备控制下实现上述透视变换方法的软件功能模块。The perspective transformation device 130 can be understood as the above electronic device or the processor 120 of the electronic device, and can also be understood as a software function module independent of the above electronic device or processor 120 that implements the above perspective transformation method under the control of the electronic device.

如图13所示,上述透视变换装置130可以包括获取模块131、确定模块132和变换模块133。下面分别对该透视变换装置130的各个功能模块的功能进行详细阐述。As shown in FIG. 13 , the above perspective transformation apparatus 130 may include an acquisition module 131 , a determination module 132 and a transformation module 133 . The functions of each functional module of the perspective transformation device 130 will be described in detail below.

获取模块131,获取拍摄设备所采集的人脸图像,并截取获得所述人脸图像中的人脸区域图像,所述人脸图像中包含多个人脸关键点。The acquiring module 131 acquires a face image collected by the photographing device, and intercepts and obtains a face region image in the face image, where the face image includes a plurality of face key points.

可以理解,该获取模块131可以用于执行上述步骤S101,关于该获取模块131的详细实现方式可以参照上述对步骤S101有关的内容。It can be understood that the obtaining module 131 may be configured to execute the above-mentioned step S101, and for the detailed implementation of the obtaining module 131, reference may be made to the above-mentioned content related to the step S101.

确定模块132,用于构建虚拟透视相机,根据所述人脸区域图像中各所述人脸关键点的3D坐标信息和2D坐标信息之间的重投影误差,以及透视变换时人脸的刚性变换信息,得到所述虚拟透视相机的位姿参数。The determination module 132 is used to construct a virtual perspective camera, according to the reprojection error between the 3D coordinate information and the 2D coordinate information of each of the face key points in the face area image, and the rigid transformation of the face during perspective transformation information to obtain the pose parameters of the virtual perspective camera.

可以理解,该确定模块132可以用于执行上述步骤S103,关于该确定模块132的详细实现方式可以参照上述对步骤S103有关的内容。It can be understood that the determining module 132 may be configured to execute the above-mentioned step S103, and for the detailed implementation of the determining module 132, reference may be made to the above-mentioned content related to the step S103.

变换模块133,用于基于所述位姿参数将各所述人脸关键点的3D坐标信息变换为虚拟透视相机空间内的虚拟3D坐标信息。The transformation module 133 is configured to transform the 3D coordinate information of each of the face key points into virtual 3D coordinate information in the virtual perspective camera space based on the pose parameter.

可以理解,该变换模块133可以用于执行上述步骤S105,关于该变换模块133的详细实现方式可以参照上述对步骤S105有关的内容。It can be understood that the transformation module 133 can be used to execute the above-mentioned step S105, and for the detailed implementation of the transformation module 133, please refer to the above-mentioned content related to the step S105.

在一种可能的实施方式中,上述获取模块131可以用于:In a possible implementation manner, the above acquisition module 131 may be used for:

确定所述人脸图像中构成人脸轮廓的多个人脸关键点;Determine a plurality of face key points that constitute the face contour in the face image;

获得所述多个人脸关键点的最小外接框;obtaining the minimum bounding frame of the multiple face key points;

基于所述最小外接框对所述人脸图像进行截取,获得人脸区域图像。The face image is intercepted based on the minimum bounding frame to obtain a face region image.

在一种可能的实施方式中,上述获取模块131可以用于:In a possible implementation manner, the above acquisition module 131 may be used for:

确定所述最小外接框的中心位置;determining the center position of the minimum bounding box;

基于所述最小外接框的多个角点的位置信息计算得到截取长度;The interception length is calculated based on the position information of multiple corner points of the minimum bounding frame;

根据所述中心位置和截取长度确定截取范围,并基于所述截取范围对所述人脸图像进行截取,获得人脸区域图像。An interception range is determined according to the center position and the interception length, and the face image is intercepted based on the interception range to obtain a face area image.

在一种可能的实施方式中，所述透视变换装置130还包括获得所述人脸区域图像中各所述人脸关键点的2D坐标信息的获得模块，该获得模块可以用于：In a possible implementation, the perspective transformation device 130 further includes an obtaining module for obtaining the 2D coordinate information of each face key point in the face region image, and the obtaining module can be used for:

获得各所述人脸关键点在世界坐标系下的3D坐标信息;Obtain the 3D coordinate information of each of the face key points in the world coordinate system;

根据所述拍摄设备的设备参数,将各所述人脸关键点的3D坐标信息投影为第一图像坐标系下的2D坐标信息,所述第一图像坐标系以所述人脸图像所构建;According to the device parameters of the photographing device, project the 3D coordinate information of each of the face key points as 2D coordinate information under a first image coordinate system, and the first image coordinate system is constructed with the face image;

基于截取所述人脸区域图像时的截取信息,对所述2D坐标信息进行更新,得到在以所述人脸区域图像所构建的第二图像坐标系下的2D坐标信息。The 2D coordinate information is updated based on the interception information when the face area image is intercepted to obtain 2D coordinate information in the second image coordinate system constructed with the face area image.

在一种可能的实施方式中,所述截取信息包括截取长度和截取中心位置,上述获得模块可以用于:In a possible implementation manner, the clipping information includes clipping length and clipping center position, and the above obtaining module can be used for:

将所述2D坐标信息中的横坐标减去所述截取中心位置的横坐标,并加上所述截取长度的一半;Subtract the abscissa of the interception center position from the abscissa in the 2D coordinate information, and add half of the interception length;

将所述2D坐标信息中的纵坐标减去所述截取中心位置的纵坐标,并加上所述截取长度的一半。The ordinate of the clipping center position is subtracted from the ordinate in the 2D coordinate information, and the half of the clipping length is added.

在一种可能的实施方式中,上述变换模块133可以用于:In a possible implementation manner, the above-mentioned transformation module 133 may be used for:

构建由所述人脸区域图像中各所述人脸关键点的3D坐标信息和2D坐标信息之间的重投影误差构成的第一优化项，并构建由透视变换时人脸的刚性变换信息构成的第二优化项；Construct a first optimization term composed of the reprojection error between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and construct a second optimization term composed of the rigid transformation information of the face during perspective transformation;

基于设置的权重系数对所述第二优化项赋权值;assigning a weight to the second optimization term based on the set weight coefficient;

将所述第一优化项和赋权值后的第二优化项进行相加,构建得到优化模型,对所述优化模型进行最小化处理以得到所述虚拟透视相机的位姿参数。The first optimization term and the weighted second optimization term are added to construct an optimized model, and the optimized model is minimized to obtain the pose parameters of the virtual perspective camera.

在一种可能的实施方式中,上述变换模块133可以用于:In a possible implementation manner, the above-mentioned transformation module 133 may be used for:

构建由所述人脸区域图像中的各人脸关键点的3D坐标信息与所述位姿参数得到的投影项;constructing a projection item obtained from the 3D coordinate information of each face key point in the face region image and the pose parameter;

利用所述人脸区域图像中的各人脸关键点的2D坐标信息减去所述投影项,得到所述第一优化项。The first optimization term is obtained by subtracting the projection item from the 2D coordinate information of each face key point in the face region image.

在一种可能的实施方式中,上述变换模块133可以用于:In a possible implementation manner, the above-mentioned transformation module 133 may be used for:

将所述旋转矩阵与旋转矩阵的转置矩阵相乘,得到刚性特征信息;Multiplying the rotation matrix by the transpose matrix of the rotation matrix to obtain rigid feature information;

利用构建的单位矩阵减去所述刚性特征信息,得到所述第二优化项。The rigid feature information is subtracted from the constructed identity matrix to obtain the second optimization term.
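The second optimization term is then simply the identity matrix minus R·Rᵀ, which vanishes exactly when the rotation matrix is orthogonal, i.e. when the transform is rigid. A minimal sketch:

```python
import numpy as np

def second_term(R):
    """Rigidity term: I - R @ R.T, zero exactly when R is orthogonal."""
    return np.eye(3) - R @ R.T

a = 0.3
R_rot = np.array([[np.cos(a), -np.sin(a), 0.],
                  [np.sin(a),  np.cos(a), 0.],
                  [0., 0., 1.]])  # a proper rotation: penalty is zero
```

Any scaling or shearing mixed into R makes R·Rᵀ deviate from the identity, so this term penalizes non-rigid deformation of the face during the transform.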

在一种可能的实施方式中,所述透视变换装置还包括构建模块,该构建模块可以用于:In a possible implementation, the perspective transformation device further includes a building module, which can be used for:

基于所述多个人脸关键点构建得到人脸网格;Constructing a face grid based on the plurality of face key points;

根据各所述人脸关键点的虚拟3D坐标信息得到所述人脸网格的网格信息。The grid information of the face grid is obtained according to the virtual 3D coordinate information of each of the face key points.
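The patent does not name a particular meshing scheme; as one common choice, a Delaunay triangulation over the keypoints' 2D layout can define the triangles, with the virtual 3D coordinates carried along as the per-vertex mesh information. A sketch under that assumption (all names are illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay

def build_face_mesh(keypoints_2d, keypoints_3d):
    """Triangulate the keypoints' 2D layout and attach the virtual 3D
    coordinates as the per-vertex mesh information."""
    tri = Delaunay(np.asarray(keypoints_2d, dtype=float))
    return {
        "triangles": tri.simplices,                         # vertex index triples
        "vertices": np.asarray(keypoints_3d, dtype=float),  # virtual 3D coords
    }

# Four keypoints forming a convex quad triangulate into two triangles.
pts2d = [(0., 0.), (2., 0.), (0., 2.), (2.1, 1.9)]
pts3d = [(0., 0., 1.), (2., 0., 1.), (0., 2., 1.), (2.1, 1.9, 1.)]
mesh = build_face_mesh(pts2d, pts3d)
```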

关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。For the description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be described in detail here.

进一步地，本申请实施例还提供一种计算机可读存储介质，计算机可读存储介质存储有机器可执行指令，机器可执行指令被执行时实现上述实施例提供的透视变换方法。Further, an embodiment of the present application provides a computer-readable storage medium storing machine-executable instructions; when the machine-executable instructions are executed, the perspective transformation method provided by the foregoing embodiments is implemented.

具体地，该计算机可读存储介质能够为通用的存储介质，如移动磁盘、硬盘等，该计算机可读存储介质上的计算机程序被运行时，能够执行上述透视变换方法。关于计算机可读存储介质中的机器可执行指令被运行时，所涉及的过程，可以参照上述方法实施例中的相关说明，这里不再详述。Specifically, the computer-readable storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk; when the computer program on the computer-readable storage medium is run, the above perspective transformation method can be executed. For the processes involved when the machine-executable instructions in the computer-readable storage medium are run, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be detailed here.

综上所述，本申请实施例提供的透视变换方法、装置、电子设备和可读存储介质，针对拍摄设备所采集的人脸图像，截取获得人脸图像中的人脸区域图像，人脸区域图像中包含多个人脸关键点。构建虚拟透视相机，根据人脸区域图像中各人脸关键点的3D坐标信息和2D坐标信息之间的重投影误差，以及透视变换时人脸的刚性变换信息，得到虚拟透视相机的位姿参数。最后基于位姿参数将各人脸关键点的3D坐标信息变换为虚拟透视相机空间内的虚拟3D坐标信息。该方案中，在透视变换中综合考虑了3D点和2D点之间的重投影误差和刚性变换信息，可在避免人脸变换变形以及提高关键点投影精准性上达到良好效果。To sum up, in the perspective transformation method, apparatus, electronic device, and readable storage medium provided by the embodiments of the present application, a face region image containing multiple face key points is cropped from the face image collected by the photographing device. A virtual perspective camera is constructed, and its pose parameters are obtained from the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, together with the rigid transformation information of the face during perspective transformation. Finally, based on the pose parameters, the 3D coordinate information of each face key point is transformed into virtual 3D coordinate information in the virtual perspective camera space. By jointly considering the reprojection error between 3D and 2D points and the rigid transformation information during perspective transformation, this scheme achieves good results in avoiding face deformation and improving the accuracy of key point projection.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，也可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的，例如，附图中的流程图和框图显示了根据本申请的多个实施例的装置、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分，所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现方式中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个连续的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或动作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of apparatuses, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

另外,在本申请各个实施例中的各功能模块可以集成在一起形成一个独立的部分,也可以是各个模块单独存在,也可以两个或两个以上模块集成形成一个独立的部分。In addition, each functional module in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.

所述功能如果以软件功能模块的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备（可以是个人计算机，服务器，或者网络设备等）执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器（ROM，Read-Only Memory）、随机存取存储器（RAM，Random Access Memory）、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, the part that contributes to the prior art, or a part of the technical solution may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

以上所述仅为本申请的优选实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。The above descriptions are only preferred embodiments of the present application, and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of this application shall be included within the protection scope of this application.

Claims (12)

1. A perspective transformation method, characterized in that the method comprises: acquiring a face image collected by a photographing device, and cropping the face image to obtain a face region image, the face region image containing a plurality of face key points; constructing a virtual perspective camera, and obtaining pose parameters of the virtual perspective camera according to the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and the rigid transformation information of the face during perspective transformation; and transforming, based on the pose parameters, the 3D coordinate information of each face key point into virtual 3D coordinate information in the virtual perspective camera space.

2. The perspective transformation method according to claim 1, characterized in that the step of cropping to obtain the face region image in the face image comprises: determining a plurality of face key points that constitute the face contour in the face image; obtaining the minimum bounding box of the plurality of face key points; and cropping the face image based on the minimum bounding box to obtain the face region image.

3. The perspective transformation method according to claim 2, characterized in that the step of cropping the face image based on the minimum bounding box to obtain the face region image comprises: determining the center position of the minimum bounding box; calculating a crop length based on the position information of a plurality of corner points of the minimum bounding box; and determining a crop range according to the center position and the crop length, and cropping the face image based on the crop range to obtain the face region image.

4. The perspective transformation method according to claim 2, characterized in that the method further comprises a step of obtaining the 2D coordinate information of each face key point in the face region image, the step comprising: obtaining the 3D coordinate information of each face key point in the world coordinate system; projecting, according to the device parameters of the photographing device, the 3D coordinate information of each face key point into 2D coordinate information in a first image coordinate system, the first image coordinate system being constructed from the face image; and updating the 2D coordinate information based on the crop information used when cropping the face region image, to obtain 2D coordinate information in a second image coordinate system constructed from the face region image.

5. The perspective transformation method according to claim 4, characterized in that the crop information comprises a crop length and a crop center position; and the step of updating the 2D coordinate information based on the crop information used when cropping the face region image comprises: subtracting the abscissa of the crop center position from the abscissa in the 2D coordinate information, and adding half of the crop length; and subtracting the ordinate of the crop center position from the ordinate in the 2D coordinate information, and adding half of the crop length.

6. The perspective transformation method according to claim 1, characterized in that the step of obtaining the pose parameters of the virtual perspective camera according to the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and the rigid transformation information of the face during perspective transformation, comprises: constructing a first optimization term consisting of the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and constructing a second optimization term consisting of the rigid transformation information of the face during perspective transformation; weighting the second optimization term based on a set weight coefficient; and adding the first optimization term and the weighted second optimization term to construct an optimization model, and minimizing the optimization model to obtain the pose parameters of the virtual perspective camera.

7. The perspective transformation method according to claim 6, characterized in that the step of constructing the first optimization term consisting of the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image comprises: constructing a projection term from the 3D coordinate information of each face key point in the face region image and the pose parameters; and subtracting the projection term from the 2D coordinate information of each face key point in the face region image to obtain the first optimization term.

8. The perspective transformation method according to claim 6, characterized in that the pose parameters comprise a rotation matrix, and the step of constructing the second optimization term consisting of the rigid transformation information of the face during perspective transformation comprises: multiplying the rotation matrix by the transpose of the rotation matrix to obtain rigid feature information; and subtracting the rigid feature information from a constructed identity matrix to obtain the second optimization term.

9. The perspective transformation method according to any one of claims 1-8, characterized in that the method further comprises: constructing a face mesh based on the plurality of face key points; and obtaining mesh information of the face mesh according to the virtual 3D coordinate information of each face key point.

10. A perspective transformation apparatus, characterized in that the apparatus comprises: an acquisition module, configured to acquire a face image collected by a photographing device and crop the face image to obtain a face region image, the face image containing a plurality of face key points; a determination module, configured to construct a virtual perspective camera and obtain pose parameters of the virtual perspective camera according to the reprojection errors between the 3D coordinate information and the 2D coordinate information of each face key point in the face region image, and the rigid transformation information of the face during perspective transformation; and a transformation module, configured to transform, based on the pose parameters, the 3D coordinate information of each face key point into virtual 3D coordinate information in the virtual perspective camera space.

11. An electronic device, characterized by comprising one or more storage media and one or more processors in communication with the storage media, the one or more storage media storing machine-executable instructions executable by the processors; when the electronic device runs, the processors execute the machine-executable instructions to perform the method steps of any one of claims 1-9.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores machine-executable instructions which, when executed, implement the method steps of any one of claims 1-9.
CN202210216834.0A 2022-03-07 2022-03-07 Perspective transformation method, device, electronic device and readable storage medium Active CN114581293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210216834.0A CN114581293B (en) 2022-03-07 2022-03-07 Perspective transformation method, device, electronic device and readable storage medium


Publications (2)

Publication Number Publication Date
CN114581293A true CN114581293A (en) 2022-06-03
CN114581293B CN114581293B (en) 2025-03-28

Family

ID=81777836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210216834.0A Active CN114581293B (en) 2022-03-07 2022-03-07 Perspective transformation method, device, electronic device and readable storage medium

Country Status (1)

Country Link
CN (1) CN114581293B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292269A (en) * 2017-06-23 2017-10-24 中国科学院自动化研究所 Facial image false distinguishing method, storage, processing equipment based on perspective distortion characteristic
CN109615664A (en) * 2018-12-12 2019-04-12 亮风台(上海)信息科技有限公司 A kind of scaling method and equipment for optical perspective augmented reality display
CN111586360A (en) * 2020-05-14 2020-08-25 佳都新太科技股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN114004901A (en) * 2022-01-04 2022-02-01 南昌虚拟现实研究院股份有限公司 Multi-camera calibration method and device, terminal equipment and readable storage medium


Also Published As

Publication number Publication date
CN114581293B (en) 2025-03-28

Similar Documents

Publication Publication Date Title
CN111465962B (en) Depth of motion for augmented reality of handheld user device
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
KR102319177B1 (en) Method and apparatus, equipment, and storage medium for determining object pose in an image
JP2023540917A (en) 3D reconstruction and related interactions, measurement methods and related devices and equipment
CN112581629A (en) Augmented reality display method and device, electronic equipment and storage medium
US10957062B2 (en) Structure depth-aware weighting in bundle adjustment
CN107194959A (en) The method and apparatus that image registration is carried out based on section
CN109685873B (en) Face reconstruction method, device, equipment and storage medium
CN113837952B (en) Three-dimensional point cloud denoising method, device, computer-readable storage medium and electronic device based on normal vector
CN116958492A (en) VR editing application based on NeRf reconstruction three-dimensional base scene rendering
CN117671031A (en) Binocular camera calibration method, device, equipment and storage medium
CN108655571A (en) A kind of digital-control laser engraving machine, control system and control method, computer
CN113920282B (en) Image processing method and device, computer readable storage medium, and electronic device
CN116091871B (en) Physical countermeasure sample generation method and device for target detection model
CN117115358B (en) Automatic digital person modeling method and device
CN112766348A (en) Method and device for generating sample data based on antagonistic neural network
JP7135517B2 (en) 3D geometric model generation device, 3D model generation method and program
CN114581293A (en) Perspective transformation method and device, electronic equipment and readable storage medium
CN117635838A (en) Three-dimensional face reconstruction method, equipment, storage medium and device
CN111429568B (en) Point cloud processing method and device, electronic equipment and storage medium
CN116452715A (en) Dynamic hand rendering method, device and storage medium
JP6967150B2 (en) Learning device, image generator, learning method, image generation method and program
CN117788576B (en) A camera spatial positioning method based on a single image
Setyati et al. Face tracking implementation with pose estimation algorithm in augmented reality technology
US20230177722A1 (en) Apparatus and method with object posture estimating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant