
CN115457171A - Efficient expression migration method adopting base expression space transformation - Google Patents

Efficient expression migration method adopting base expression space transformation

Info

Publication number
CN115457171A
CN115457171A (application CN202210990361.XA)
Authority
CN
China
Prior art keywords
expression
model
expressions
facial
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210990361.XA
Other languages
Chinese (zh)
Inventor
翁冬冬
杜秋欣
涂子奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202210990361.XA
Publication of CN115457171A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides an efficient expression migration method adopting base expression space transformation, which comprises a preprocessing stage and a facial expression migration stage. The preprocessing stage comprises: acquiring a source model O, where the source model O is a base expression model containing expression meshes under a plurality of different expressions; performing facial expression reconstruction on the source model O and the target model T to obtain the expression description parameters of both; and performing similarity estimation according to the expression description parameters to obtain an expression parameter conversion matrix and a similarity matrix. The facial expression migration stage comprises: inputting a character image into the base expression model, computing the expression description parameters of the target expression model from the expression description parameters of the base expression model and the similarity matrix, and thereby completing the expression migration.

Description

An efficient expression transfer method using base expression space transformation

Technical Field

The invention relates to computer vision, and in particular to an efficient expression transfer method using base expression space transformation.

Background Art

Traditional methods transfer facial expressions from an actor in a source video to an actor in a target video in real time, thereby achieving temporary control over the target actor's facial expressions. For example, Paper 1, "Real-time expression transfer for facial reenactment", transfers facial deformation and fine detail and photo-realistically re-renders the result into the target video, so that the newly synthesized expressions are almost indistinguishable from the real video. To achieve this, a commodity RGB-D sensor is used to accurately capture the facial performance of the source and target subjects in real time. For each frame, a parametric model of identity, expression, and skin reflectance is combined with the input color and depth data, and the scene illumination is reconstructed. For expression transfer, the difference between the source and target expressions is computed in parameter space, and the target parameters are modified to match the source expression. A major challenge is to convincingly re-render the synthesized target face into the corresponding video stream; this requires careful consideration of lighting and shading, both of which must correspond to the real-world environment. The authors demonstrate the method in a live setup that modifies a video-conference feed so that it matches the facial expressions of a different person (for example, a translator) in real time.

This method has the following drawbacks: it computes the difference between two expressions in parameter space and then modifies the target's expression parameters to match the source expression, and it assumes Lambertian surface reflectance and smoothly varying illumination parameterized by spherical harmonics, which can cause artifacts in general environments (for example, strong subsurface scattering, high-frequency lighting changes, or self-shadowing). In addition, its tracker uses dense depth and color information, which allows a tight fit but also leads to large residuals.

Paper 2, "Face2Face: Real-time Face Capture and Reenactment of RGB Videos", proposes a method for the real-time reenactment of a monocular target video sequence (for example, a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. The goal of that work is to animate the facial expressions of the target video with a source actor and to re-render the manipulated output video in a photo-realistic fashion. To this end, the under-constrained problem of recovering the facial identity from monocular video is first addressed by non-rigid model-based bundling. At run time, the facial expressions of both the source and the target video are tracked using dense photometric consistency measurements. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, the synthesized target face is convincingly re-rendered on top of the corresponding video stream so that it blends seamlessly with the real-world illumination.

This method has the following drawbacks: it is challenging for scenes in which the face is occluded by long hair or a beard. Furthermore, it reconstructs and tracks only a low-dimensional blendshape model (76 expression coefficients), which ignores fine-scale static and transient surface detail; when the sequence is too short, or when the target remains still, specific mouth behaviors cannot be learned. In such cases temporal aliasing can be observed, because the target space of the retrieved mouth samples is too sparse.

In the process of expression transfer, the core problem is how to establish a mapping between two sets of expressions, that is, how to evaluate expression similarity. Most current similarity estimation methods rely on the spatial positions and orientations of the vertices or triangles of a mesh model, which requires finding the corresponding vertices or triangles of the models. Existing similarity estimation methods are mainly based on feature points in the image, on model vertices, or on triangular faces. Methods based on feature points or model vertices essentially compare the offset directions and magnitudes of points to estimate similarity. Triangle-based methods generate similar expressions by finding the corresponding triangles of the models and transforming the orientations of those triangles, but this approach does not work well when the shapes of the target face and the source face differ considerably.

Therefore, how to realize an efficient, highly realistic, and generally applicable facial expression transfer method is the problem that the present invention seeks to solve.

Summary of the Invention

In view of this, the present invention provides an efficient expression transfer method using base expression space transformation, comprising a preprocessing stage and a facial expression transfer stage. The preprocessing stage includes:

obtaining a source model O, where the source model O is a base expression model that contains expression meshes under a plurality of different expressions;

performing facial expression reconstruction on the source model O and on the target model T to obtain the expression description parameters of both; and performing similarity estimation according to the expression description parameters to obtain an expression parameter conversion matrix and a similarity matrix.

The facial expression transfer stage includes: inputting a character image into the base expression model, computing the expression description parameters of the target expression model from the expression description parameters of the base expression model and the similarity matrix, and thereby completing the expression transfer.

In particular, reconstructing the facial expression includes: defining a set of facial feature points and finding the corresponding vertex indices in the three-dimensional model; for any input facial expression photograph, detecting the two-dimensional coordinates of the defined feature points and establishing an optimization equation so that the distance between the projection of each model feature point onto the image plane and the corresponding 2D point in the image is minimized; the optimization equation is expressed as follows:

E = Σ_v ||P(x_v) - p_v||^2

where P is the projection matrix, x_v is the three-dimensional position of the v-th facial feature point, obtained by solving the optimization equation, and p_v is the projected coordinate of the corresponding v-th feature point on the two-dimensional plane.

In particular, for the source model O an expressionless mesh model is selected as the neutral reference; the offsets of the remaining expressions relative to the neutral reference constitute the superposition effect of different expressions, and these offsets constitute the expression description parameters of the different facial expressions.

In particular, for the i-th patch, the offset of the feature points relative to the neutral expression under different expressions is δ_i, which serves as the description of the current expression; the expression gap between the source model O and the target model T is then expressed by the following formulas:

[Equation image: E_Ci]

[Equation image: E_Di]

where E_Ci constrains the direction of the feature point offsets between the source model O and the target model T, and E_Di constrains the distance of the feature point offsets between the source model O and the target model T;

δ_Oi is the expression description of the i-th patch of the source model O, and δ_Ti is the expression description of the i-th patch of the model T; δ_Di,v is the offset of the v-th vertex of the i-th patch after deformation transfer, and V_i is the set of all vertices of the i-th patch.

In particular, the optimization equation for the expression similarity estimation of each patch is as follows:

E_Si = λ_M*E_Di*E_Ci + λ_D*E_Di + λ_C*E_Ci

where E_Ci is the expression gap between the source model O and the target model T, E_Di is the constraint obtained by adding a reference shape as a second term so as to obtain an accurate facial similarity estimate, and λ_M, λ_D, λ_C are weight values.

By solving the above equation for all patches, a corresponding similar expression of the target model T can be obtained for every expression in the source model O. All of these similar expressions form a set of equivalent blendshapes and an expression parameter conversion matrix; the expression parameters of the target model T corresponding to this equivalent blendshape conversion, i.e. the blendshape parameters of each patch, form a similarity matrix S. Through this similarity matrix, the expression parameters α_O of any expression of the source model O can be converted into the expression parameters α_T = α_O*S of the corresponding similar expression of the model T.

In particular, after optimizing the smooth transition between adjacent patches, the optimization equation for expression similarity estimation is obtained as follows:

E_E = E_R + E_O

E_R = λ_R Σ_i Σ_{v∈L_i} ||P_i(x_{v,i}) - p_{v,i}||^2

E_O = λ_O Σ_{v∈S} Σ_{(i,j)∈I, i>j} ||x_{v,i} - x_{v,j}||^2

where E_R is the reprojection error term, L_i is the set of all feature points on the i-th patch, x_{v,i} is the v-th feature point on the i-th patch, defined by the formula of the local blendshape model above, p_{v,i} is the two-dimensional coordinate in the image corresponding to that feature point, P_i is the projection matrix of the i-th patch, and λ_R is the weight of E_R;

E_O is the term that constrains the shapes of adjacent patches, where v ∈ S are the vertices of the adjacent overlapping region; E_O constrains the points of the overlapping part between the i-th patch and the adjacent j-th patch to remain as close as possible.

Beneficial Effects:

1. First, through the selection of base expressions, the present invention effectively shortens the time required for the subsequent similarity estimation; it is efficient, highly realistic, and generally applicable;

2. Second, the parameter conversion matrix preserves the dynamic realism of the target model's face, providing high fidelity;

3. Third, the similarity estimation and the parameter conversion matrix make the whole process highly general; users of different ages, genders, and ethnicities can all achieve highly realistic expression transfer in real time;

4. Finally, the present invention optimizes the smooth transition between patches, making the expression transfer more realistic.

Brief Description of the Drawings

Fig. 1 is a flow chart of the efficient expression transfer method using base expression space transformation according to the present invention;

Fig. 2 is a schematic diagram of the face model segmentation and of the feature points used for similarity estimation in the present invention.

Detailed Description

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

The present invention provides an efficient expression transfer method using base expression space transformation.

The flow of the present invention is shown in Fig. 1. The proposed method for transferring expressions from 2D photographs to 3D models is realized in two parts. The first is the preprocessing part, shown in the lower half of Fig. 1, which includes: Step 1: obtain a source model O, where the source model O is a base expression model containing expression meshes under a plurality of different expressions;

The base expression model used for the source model in the present invention, for example a blendshape model, is obtained by acquiring different facial expression models and building them into a series of mesh models with the same topology; weight interpolation is then used to blend the different expressions according to their weights, so that the face model can morph among several different expressions. In general, each blendshape expresses the deformation of one region of the face; during blending, a blendshape does not interfere with the deformation of the other blendshapes, and obtaining the weight of each blendshape only requires attending to the changes of the corresponding local region. The current mainstream way of building blendshapes is to follow the FACS system and model its 46 expressions as the base expressions of the blendshape model.

e = Σ_k α_k*b_k    (1)

Blendshapes generally take two forms. One is the absolute blendshape model, which adds all expressions with certain weights and morphs between different expressions by adjusting those weights, as described by formula (1): assuming there are K expressions besides the neutral expression, α_k is the weight of the k-th expression, b_k is the shape of the k-th base expression, and e is the blended expression shape. However, this form can only handle the blending of whole expressions and cannot produce the correct result when several expressions need to be present at the same time. The other is the offset (delta) blendshape model, which selects an expressionless mesh model as the neutral shape; the offsets (differences) of the remaining expressions relative to the neutral shape constitute the blendshapes. The delta blendshape intuitively reflects the superposition of different expressions and is described by formula (2), where u = b_0 is the chosen neutral expression and d_k = b_k - b_0 is the offset of the k-th expression relative to the neutral expression. This form is widely used, and it is the one adopted in the present invention.

e = u + Σ_k α_k*d_k    (2)
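
The delta blendshape evaluation described above can be written as a short array computation. The sketch below is illustrative only; the NumPy implementation and the array shapes are assumptions made for this example, not part of the patent.

```python
import numpy as np

def delta_blendshape(neutral, bases, alphas):
    """Evaluate e = u + sum_k alpha_k * d_k with u = b_0 and d_k = b_k - b_0.

    neutral: (V, 3) neutral mesh b_0
    bases:   (K, V, 3) base expression meshes b_1 .. b_K
    alphas:  (K,) expression weights alpha_k
    """
    deltas = bases - neutral[None, :, :]               # d_k = b_k - b_0
    return neutral + np.einsum("k,kvd->vd", alphas, deltas)
```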

Step 2: reconstruct facial expressions according to the source model O, and obtain the expression description parameters of a plurality of different facial expressions under the source model O;

The present invention adopts a single-view facial expression reconstruction method to perform facial expression reconstruction, which is also an important link in realizing facial expression transfer. Fig. 2 is a schematic diagram of the face model segmentation and of the feature points subsequently used for similarity estimation. The feature points are distributed symmetrically and evenly, guaranteeing that each patch contains several feature points; the displacement directions of the feature points under different expressions serve as the description of the current expression. First, a set of facial feature points must be defined, distributed symmetrically and evenly so that each patch contains several feature points, and the corresponding vertex indices are found in the three-dimensional model. For any input facial expression photograph, the two-dimensional coordinates of the defined landmark points are detected, and an optimization equation is established so that the distance between the projection of each model feature point onto the image plane and the corresponding 2D point in the image is minimized. The optimization formula (energy) is as follows:

E = Σ_v ||P(x_v) - p_v||^2    (3)

where P is the projection matrix, x_v is the three-dimensional position of the v-th facial feature point, obtained by solving formula (3), and p_v is the projected coordinate of the corresponding v-th feature point on the two-dimensional plane.
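
A minimal sketch of this single-view fitting step is given below. The orthographic 2x3 projection matrix, the precomputed landmark-to-vertex index map, and the use of SciPy's bounded least-squares solver are illustrative assumptions rather than the patent's actual implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_expression_weights(neutral, bases, landmark_idx, landmarks_2d, P):
    """Minimise E = sum_v ||P(x_v) - p_v||^2 (formula (3)) over the weights.

    neutral:      (V, 3) neutral mesh
    bases:        (K, V, 3) base expression meshes
    landmark_idx: (L,) vertex indices of the defined facial feature points
    landmarks_2d: (L, 2) detected 2D feature points p_v
    P:            (2, 3) orthographic projection matrix
    """
    deltas = bases - neutral[None, :, :]

    def residuals(alpha):
        mesh = neutral + np.einsum("k,kvd->vd", alpha, deltas)
        proj = mesh[landmark_idx] @ P.T                # projected feature points P(x_v)
        return (proj - landmarks_2d).ravel()

    k = bases.shape[0]
    result = least_squares(residuals, x0=np.zeros(k), bounds=(0.0, 1.0))
    return result.x                                    # fitted expression weights
```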

Step 3: perform facial expression reconstruction on the target model T,

to obtain the expression description parameters of both models; perform similarity estimation according to the expression description parameters to obtain an expression parameter conversion matrix and a similarity matrix.

Facial expression transfer means transferring the expression of the source face onto the target face, so that the target face makes the same expression as the source face. A common 3D expression transfer approach is to transfer the blendshape parameters that describe the expression directly, but this requires the source and target models to build their blendshapes in the same way: the base expressions used must have the same meaning (semantics) and correspond one to one. This requires producing a set of semantically consistent base expressions when the models are created, and the final transfer quality is also closely tied to how similar the base expressions are. Such an approach not only demands a great deal of work from artists but also imposes many restrictions on the models. Other methods achieve transfer by finding a linear mapping space between the target and source models, or by finding similar expressions within a sequence. All of these methods need to find the corresponding expressions between source and target, that is, to perform similarity estimation. Existing similarity estimation methods are mainly based on feature points in the image, on model vertices, or on triangular faces. Methods based on feature points or model vertices essentially compare the offset directions and magnitudes of points. Triangle-based methods generate similar expressions by finding the corresponding triangles of the models and transforming their orientations, but this does not work well when the shapes of the target face and the source face differ considerably.

In this step, suppose that for the i-th patch the offset of the feature points relative to the neutral expression under different expressions is δ_i, serving as the description of the current expression; the expression gap between the source model O and the target model T is then expressed by formula (4).

[Equation image: formula (4), the direction term E_Ci]

where δ_Oi is the expression description of the i-th patch of model O and δ_Ti is the expression description of the i-th patch of model T. However, E_Ci only constrains the direction of the feature point offsets and does not constrain their distance; with this term alone an accurate facial similarity estimate cannot be obtained, so the present invention adds a reference shape as a second constraint, formula (5).

[Equation image: formula (5), the distance term E_Di]

For every expression in model O, the present invention obtains a corresponding expression of model T; after aligning each expression with the neutral expression using Procrustes analysis, the offsets of the model vertices relative to the neutral expression are computed. δ_Di,v is then the offset of the v-th vertex of the i-th patch after deformation transfer, and V_i is the set of all vertices of the i-th patch.

The final optimization equation for the expression similarity estimation of each patch is formula (6).

E_Si = λ_M*E_Di*E_Ci + λ_D*E_Di + λ_C*E_Ci    (6)
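
The exact expressions for E_Ci and E_Di appear only as equation images (formulas (4) and (5)) in the publication, so the sketch below substitutes a cosine-style direction term and a squared-distance term as plausible stand-ins; only the way the two terms are combined follows formula (6) literally.

```python
import numpy as np

def direction_term(delta_o, delta_t):
    """Assumed form of E_Ci: penalise mismatched offset directions between the
    source and target feature-point offsets of one patch; delta_*: (L, 3)."""
    a, b = delta_o.ravel(), delta_t.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-8
    return 1.0 - np.dot(a, b) / denom

def distance_term(delta_t_verts, delta_ref_verts):
    """Assumed form of E_Di: penalise deviation of the target vertex offsets
    from the deformation-transfer reference offsets delta_Di,v over V_i."""
    return np.sum((delta_t_verts - delta_ref_verts) ** 2)

def patch_similarity_energy(delta_o, delta_t, delta_t_verts, delta_ref_verts,
                            lam_m=1.0, lam_d=1.0, lam_c=1.0):
    """Combine the terms as in formula (6):
    E_Si = lam_m * E_Di * E_Ci + lam_d * E_Di + lam_c * E_Ci."""
    e_c = direction_term(delta_o, delta_t)
    e_d = distance_term(delta_t_verts, delta_ref_verts)
    return lam_m * e_d * e_c + lam_d * e_d + lam_c * e_c
```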

By solving the above equation for all patches, a corresponding similar expression of model T can be obtained for every expression in model O. All of these similar expressions form a set of equivalent blendshapes; the expression parameters of model T corresponding to this equivalent blendshape set, i.e. the blendshape parameters of each patch, form a similarity matrix S. Through the similarity matrix S, the expression parameters α_O of any expression of model O can be converted into the expression parameters α_T = α_O*S of the corresponding similar expression of model T.
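
Once the similarity matrix S has been assembled offline, the runtime conversion of expression parameters is a single matrix product. The sketch below assumes a (K_O,) weight vector and a (K_O, K_T) matrix; these shapes are illustrative assumptions.

```python
import numpy as np

def convert_weights(alpha_o, S):
    """alpha_T = alpha_O * S: map the source expression weights onto the
    weights of the equivalent target blendshapes."""
    return np.asarray(alpha_o) @ np.asarray(S)

# Example use: alpha_t = convert_weights(alpha_o, S); the target mesh then
# follows from the delta blendshape evaluation sketched earlier.
```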

The optimization equation used for single-view facial expression reconstruction also needs to be adjusted for the base expressions: not only should the reprojection distance of the feature points within each patch be minimized, but the smooth transition between patches must also be taken into account. The adjusted optimization equation contains two parts, as shown in formula (7).

E_E = E_R + E_O    (7)

E_R = λ_R Σ_i Σ_{v∈L_i} ||P_i(x_{v,i}) - p_{v,i}||^2    (8)

E_O = λ_O Σ_{v∈S} Σ_{(i,j)∈I, i>j} ||x_{v,i} - x_{v,j}||^2    (9)

where E_R is the reprojection error term (energy), L_i is the set of all feature points on the i-th patch, x_{v,i} is the v-th feature point on the i-th patch, defined by the formula of the local blendshape model above, p_{v,i} is the two-dimensional coordinate in the image corresponding to that feature point, P_i is the projection matrix of the i-th patch, and λ_R is the weight of E_R.

E_O is the term (energy) that constrains the shapes of adjacent patches, where v ∈ S are the vertices of the adjacent overlapping region; E_O constrains the points of the overlapping part between the i-th patch and the adjacent j-th patch to remain as close as possible.
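
A sketch of the patch-wise fitting energy E_E = E_R + E_O of formulas (7) to (9) is given below. Per-patch orthographic projections and precomputed lists of overlapping vertex indices are illustrative assumptions about the data layout.

```python
import numpy as np

def fitting_energy(patch_meshes, patch_landmark_idx, patch_landmarks_2d,
                   patch_P, overlap_pairs, lam_r=1.0, lam_o=1.0):
    """patch_meshes:       list of (V_i, 3) deformed vertices per patch
    patch_landmark_idx: list of (L_i,) feature-point vertex indices per patch
    patch_landmarks_2d: list of (L_i, 2) detected 2D feature points per patch
    patch_P:            list of (2, 3) projection matrices P_i
    overlap_pairs:      list of (i, j, idx_i, idx_j) shared-vertex index arrays
    """
    # E_R: reprojection error of the feature points of every patch (formula (8))
    e_r = 0.0
    for mesh, idx, pts, P in zip(patch_meshes, patch_landmark_idx,
                                 patch_landmarks_2d, patch_P):
        proj = mesh[idx] @ P.T
        e_r += np.sum((proj - pts) ** 2)
    # E_O: overlapping vertices of adjacent patches should coincide (formula (9))
    e_o = 0.0
    for i, j, idx_i, idx_j in overlap_pairs:
        e_o += np.sum((patch_meshes[i][idx_i] - patch_meshes[j][idx_j]) ** 2)
    return lam_r * e_r + lam_o * e_o                   # E_E = E_R + E_O (formula (7))
```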

The other part is the facial expression transfer process, shown in the upper half of Fig. 1, which includes: Step 4: using an input face photograph of user O, the local blendshape model of user O, and the equivalent blendshape model of T, perform single-view facial expression reconstruction to obtain a set of blendshape parameters, and then carry out the facial expression transfer through the constructed similarity matrix.

Specifically, in the process of transferring an expression from a two-dimensional image to a three-dimensional model, a picture of the character corresponding to model O is input and its feature points are detected. Following formula (8), E_R is computed from the feature points of model O and the detected feature points; the equivalent blendshapes of model T are segmented in the same way and substituted into E_O of formula (9) as a smoothness constraint. By solving E_E, the expression parameters α_O are obtained and converted through the similarity matrix S into α_T; fusing the separated patches then yields the similar expression of the transferred model T.
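
The runtime stage just described can be summarized in a short driver that chains the earlier sketches. Here detect_landmarks stands for any 2D facial landmark detector, and the src_rig / tgt_rig dictionaries are an assumed packaging of the neutral mesh, base expressions, landmark indices, and projection matrix; none of these names come from the patent.

```python
def transfer_from_photo(image, src_rig, tgt_rig, S, detect_landmarks):
    """Transfer the expression in a 2D photo of character O onto 3D model T.

    src_rig / tgt_rig: dicts with 'neutral' (V, 3), 'bases' (K, V, 3),
    'landmark_idx' (L,) and 'P' (2, 3) entries (assumed packaging).
    Relies on fit_expression_weights, convert_weights and delta_blendshape
    from the sketches above.
    """
    landmarks_2d = detect_landmarks(image)                       # (L, 2) detected points
    alpha_o = fit_expression_weights(src_rig["neutral"], src_rig["bases"],
                                     src_rig["landmark_idx"], landmarks_2d,
                                     src_rig["P"])               # solve the fitting energy
    alpha_t = convert_weights(alpha_o, S)                        # alpha_T = alpha_O * S
    return delta_blendshape(tgt_rig["neutral"], tgt_rig["bases"], alpha_t)
```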

The present invention proposes a facial expression transfer method using base expression space transformation. The facial region is segmented according to the muscle distribution of the face to select the base expressions; facial expression similarity is then estimated from the offset directions of the vertices, the similarity of the base expressions between different models is estimated, and a set of equivalent blendshapes together with an expression parameter conversion matrix is obtained. The transfer process of the present invention is carried out on the basis of facial expression reconstruction: facial feature points are detected and the reprojection error is computed to achieve facial expression reconstruction, the segmented facial model is constrained by the equivalent blendshapes so that it has a complete facial expression, and finally the blendshape parameters of the target model are obtained through the parameter conversion matrix to generate the transferred target expression.

Compared with existing facial expression transfer methods, the present invention has three main advantages: efficiency, realism, and generality. First, the selection of base expressions effectively shortens the time required for the subsequent similarity estimation. Second, the parameter conversion matrix preserves the dynamic realism of the target model's face, providing high fidelity. Finally, the similarity estimation and the parameter conversion matrix make the whole process highly general; users of different ages, genders, and ethnicities can all achieve highly realistic expression transfer in real time.

In summary, this method effectively reduces the time and economic cost of transferring expressions from 2D images to 3D face models, improves the realism of virtual-human facial animation, and has wide application potential.

In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

It is obvious to those skilled in the art that the embodiments of the present invention are not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from the spirit or essential features of the embodiments of the present invention. Therefore, the embodiments should in all respects be regarded as exemplary and non-restrictive, and the scope of the embodiments of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and range of equivalents of the claims be embraced by the embodiments of the present invention. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other elements or steps, and the singular does not exclude the plural. Several units, modules, or devices recited in a system, device, or terminal claim may also be implemented by the same unit, module, or device through software or hardware. Words such as first and second are used to denote names and do not denote any particular order.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the embodiments of the present invention. Although the embodiments of the present invention have been described in detail with reference to the above preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solutions of the embodiments of the present invention without departing from the spirit and scope of those technical solutions.

Claims (6)

1. An efficient expression migration method adopting base expression space transformation, characterized by comprising: a preprocessing stage and a facial expression migration stage; the preprocessing stage comprises:
acquiring a source model O, wherein the source model O is a base expression model and comprises expression meshes under various different expressions;
performing facial expression reconstruction on the source model O and the target model T to obtain expression description parameters of the source model O and the target model T; performing similarity estimation according to the expression description parameters to obtain an expression parameter conversion matrix and a similarity matrix;
the facial expression migration stage comprises: inputting a character image into the base expression model, calculating the expression description parameters of the target expression model according to the expression description parameters in the base expression model and the similarity matrix, and completing the expression migration.
2. The method of claim 1, wherein reconstructing the facial expression comprises: defining a group of facial feature points and finding the corresponding vertex indices in the three-dimensional model; for any input facial expression photo, detecting the two-dimensional coordinates of the defined feature points and establishing an optimization equation so that the distance between the position of each model feature point projected onto the plane and the position of the corresponding 2D point in the image is minimal; wherein the optimization equation is represented as follows:
E = Σ_v ||P(x_v) - p_v||^2
where P is the projection matrix, x_v is the three-dimensional position of the v-th facial feature point, which can be obtained from the optimization equation, and p_v is the projection coordinate of the corresponding v-th feature point on the two-dimensional plane.
3. The method for efficient expression migration using base expression space transformation according to claim 1 or 2, wherein for the source model O a mesh model without expressions is selected as a neutral reference, the offsets of the remaining expressions relative to the neutral reference constitute the superposition effect of different expressions, and the offsets constitute the expression description parameters of different facial expressions.
4. The method of claim 3, wherein for the i-th patch, under different expressions, the offset of the feature points relative to the neutral expression is δ_i, serving as a description of the current expression, and the expression gap between the source model O and the target model T is represented by the following formulas:
[Equation image: E_Ci]
[Equation image: E_Di]
wherein E_Ci constrains the direction of the feature point offsets between the source model O and the target model T, and E_Di constrains the distance of the feature point offsets between the source model O and the target model T;
δ Oi is an expressive description, δ, of the ith patch of the source model O Ti Expression description of the ith patch of the model T; delta Di,v Then the offset of the V-th vertex, V, in the ith patch after the modified pass is performed i All vertices of the ith patch.
5. The method for efficient expression migration using base expression space transformation according to claim 4, wherein the optimization equation for the expression similarity estimation of each patch is as follows:
E_Si = λ_M*E_Di*E_Ci + λ_D*E_Di + λ_C*E_Ci
wherein E_Ci is the expression gap between the source model O and the target model T, E_Di is obtained by adding a reference shape as a second constraint to obtain an accurate facial similarity estimate, and λ_M, λ_D, λ_C are weight values;
by solving the above equation for all patches, for each expression in the source model O a corresponding similar expression of the target model T can be obtained; all of the similar expressions form a set of equivalent blendshapes and an expression parameter conversion matrix; the expression parameters of the target model T corresponding to this equivalent blendshape conversion, i.e. the blendshape parameters of each patch, form a similarity matrix S, by which the expression parameters α_O of any expression of the source model O can be converted into the expression parameters α_T = α_O*S of the corresponding similar expression of the model T.
6. The method for efficient expression migration using base expression space transformation according to claim 5, wherein after optimizing the smooth transitions between patches, the optimization equation for expression similarity estimation is obtained as follows:
E_E = E_R + E_O
E_R = λ_R Σ_i Σ_{v∈L_i} ||P_i(x_{v,i}) - p_{v,i}||^2
E_O = λ_O Σ_{v∈S} Σ_{(i,j)∈I, i>j} ||x_{v,i} - x_{v,j}||^2
wherein E_R is the reprojection error term, L_i is the set of all feature points on the i-th patch, x_{v,i} is the v-th feature point on the i-th patch, defined by the formula of the local blendshape model above, p_{v,i} is the two-dimensional coordinate in the image corresponding to that feature point, P_i is the projection matrix of the i-th patch, and λ_R is the weight of E_R;
E_O constrains the shapes of adjacent patches, where v ∈ S are the vertices of the adjacent overlapping region; E_O constrains the points of the overlapping part between the i-th patch and the adjacent j-th patch to remain as close as possible.
CN202210990361.XA 2022-08-18 2022-08-18 Efficient expression migration method adopting base expression space transformation Pending CN115457171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210990361.XA CN115457171A (en) 2022-08-18 2022-08-18 Efficient expression migration method adopting base expression space transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210990361.XA CN115457171A (en) 2022-08-18 2022-08-18 Efficient expression migration method adopting base expression space transformation

Publications (1)

Publication Number Publication Date
CN115457171A true CN115457171A (en) 2022-12-09

Family

ID=84299732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210990361.XA Pending CN115457171A (en) 2022-08-18 2022-08-18 Efficient expression migration method adopting base expression space transformation

Country Status (1)

Country Link
CN (1) CN115457171A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524165A (en) * 2023-05-29 2023-08-01 北京百度网讯科技有限公司 Migration method, migration device, migration equipment and migration storage medium for three-dimensional expression model
CN116524165B (en) * 2023-05-29 2024-01-19 北京百度网讯科技有限公司 Migration method, migration device, migration equipment and migration storage medium for three-dimensional expression model

Similar Documents

Publication Publication Date Title
Huang et al. Leveraging motion capture and 3D scanning for high-fidelity facial performance acquisition
Cao et al. 3D shape regression for real-time facial animation
Thies et al. Real-time expression transfer for facial reenactment.
Hornung et al. Character animation from 2d pictures and 3d motion data
CN103093490B (en) Based on the real-time face animation method of single video camera
US9036898B1 (en) High-quality passive performance capture using anchor frames
CN114863035B (en) Implicit representation-based three-dimensional human motion capturing and generating method
WO2021063271A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
CN103854306A (en) High-reality dynamic expression modeling method
CN115951784B (en) A motion capture and generation method for clothed human body based on dual neural radiation field
CN111640172A (en) Attitude migration method based on generation of countermeasure network
Song et al. A generic framework for efficient 2-D and 3-D facial expression analogy
Chuanyu et al. Generating animatable 3D cartoon faces from single portraits
Eisert et al. Volumetric video–acquisition, interaction, streaming and rendering
Li et al. Spa: Sparse photorealistic animation using a single rgb-d camera
Ren et al. Make-a-character: High quality text-to-3d character generation within minutes
CN115457171A (en) Efficient expression migration method adopting base expression space transformation
CN118644808A (en) Self-supervised learning method for human shape and pose estimation based on skeleton constraints
Ekmen et al. From 2D to 3D real-time expression transfer for facial animation
Yu et al. A framework for automatic and perceptually valid facial expression generation
Zell et al. Volumetric video-acquisition, compression, interaction and perception
CN115375848A (en) Method, system and storage medium for single-image 3D human body reconstruction based on graph representation
Shen et al. Illumidiff: indoor illumination estimation from a single image with diffusion model
Qinran et al. Video‐Driven 2D Character Animation
Huang et al. Detail-preserving controllable deformation from sparse examples

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination