CN112215050A

CN112215050A - Nonlinear 3DMM face reconstruction and pose normalization method, device, medium and equipment

Info

Publication number: CN112215050A
Application number: CN201910820065.3A
Authority: CN
Inventors: 周军; 刘利朋; 江武明; 丁松
Original assignee: Beijing Eyes Intelligent Technology Co ltd; Beijing Eyecool Technology Co Ltd
Current assignee: Beijing Eyes Intelligent Technology Co ltd; Beijing Eyecool Technology Co Ltd
Priority date: 2019-06-24
Filing date: 2019-08-31
Publication date: 2021-01-12

Abstract

The invention discloses a nonlinear 3DMM face reconstruction and attitude normalization method, device, medium and equipment, which belong to the field of computer vision. The method includes: training a model, inputting a 2D face image into the model, and obtaining a 3D face. The model includes a CNN encoder, a shape decoder, a texture decoder and a rendering layer; the CNN encoder estimates the camera projection parameters, shape parameters and texture parameters from the 2D face image samples, and the shape decoder and texture decoder combine the shape parameters and texture parameters. Parameters are decoded into 3D shapes and 3D textures. During training, the rendering layer gets the rendered image and trains the model through the loss function. During prediction, the rendering layer performs 3D rendering to obtain a 3D face. The present invention has higher representation ability than linear 3DMM, training and prediction are carried out end-to-end, 2D images can be used for network training without 3D face scanning, and the reconstructed 3D face has high recognition accuracy after normalization.

Description

Nonlinear 3DMM face reconstruction and posture normalization method, device, medium and equipment

Technical Field

The invention relates to the field of computer vision, in particular to a nonlinear 3DMM face reconstruction method, a nonlinear 3DMM face reconstruction device, a computer readable storage medium and computer readable storage equipment, and a face posture normalization method, a nonlinear 3DMM face reconstruction device, a computer readable storage medium and computer readable storage equipment based on the nonlinear 3DMM face reconstruction method.

Background

In the face image recognition technology, the pose of a face is an important factor influencing the face recognition rate, the face image recognition in the prior art is mainly used for recognizing a front face image or a small-pose (angle) face image, the recognition result of a large-pose face image is not ideal, and in order to improve the recognition accuracy rate, the pose normalization of the face image (especially the large-pose face image) is required.

The face image, the small-pose face image and the large-pose face image are all 2D face images. The human face posture normalization method based on 3D reconstruction is a method for performing 3D reconstruction on the 2D human face image to obtain a 3D human face, performing posture correction (normalization) on the 3D human face, and then projecting the 3D human face into a 2D human face image again to finish human face posture normalization.

The core of the human face posture normalization method based on 3D reconstruction is to perform 3D reconstruction on a 2D human face image to be normalized, generally, 3D is most applied to 3DMM type methods in reconstruction, and the 3DMM method is mainly divided into the following aspects.

(1) Linear 3DMM parameter estimation method

The 3D deformation Model (3D Mobile Model, 3DMM) is a method for constructing a 3D face Model based on statistical principles. The linear 3DMM parameter estimation method is roughly thought of constructing an average (specifically, a characteristic face: an average face + a characteristic vector group corresponds to a coefficient, note that the coefficient is not a characteristic value and needs to be finally solved reversely) face deformation model by using a face database, matching and combining the 2D face image and the average face deformation model after a new 2D face image is given, adjusting corresponding parameters of the average face deformation model, deforming the average face deformation model until the difference between the average face deformation model and the face image is reduced to the minimum, and optimizing and adjusting textures at this time to complete 3D face modeling.

After the 3D modeling of the face is completed through the steps, the 3D face is subjected to posture rotation through a 3D face rotation method, and finally the corrected 3D face is projected on a two-dimensional image plane to complete the normalization of the 2D face posture.

Linear 3DMM face reconstruction is based on a face scan to obtain a training set and performing Principal Component Analysis (PCA) based on the training set to supervise the 3 DMM. Principal Component Analysis (PCA) is a statistical method. A group of variables which are possibly correlated are converted into a group of linearly uncorrelated variables through orthogonal transformation, and the group of converted variables are called principal components. To simulate highly variable 3D face shapes, a large number of high quality 3D face scans are required. However, this requirement is expensive to implement. A widely used Basel Face Model (BFM) was constructed with only 200 subjects in the neutral expression. The missing expression data is compensated using the FaceWarehouse expression dataset. But almost all models use less than 300 training scans. Such a small training set is far from sufficient to describe the complete change of a face.

Second, under well-controlled conditions, a texture model of linear 3DMM is typically constructed with 2D face images collectively captured by a small number of 3D scans. Therefore, such models can only learn to perform facial textures under similar conditions, but do not perform well under other conditions (e.g., in a field environment). This greatly limits the application scenarios of 3 DMM.

Finally, the representation capability of linear 3DMM is not only limited by the size of the training set, but also formulated by it. Facial changes are nonlinear in nature. For example, the changes in different facial expressions or poses are inherently non-linear, which violates the linear assumption of PCA-based models. Therefore, the linear 3DMM model does not account well for facial changes.

(2) Improved linear 3DMM based method

The standard linear 3DMM is based on PCA and the statistical distribution is a single-mode gaussian distribution. Koppen et al believe that single mode gaussians do not represent real-world distributions. They proposed a gaussian mixture 3DMM that models the global population as a mixture of gaussian subgroups, each with its own mean, but sharing covariance. By modeling the 3D face through the method, the modeling precision is obviously improved. However, the method is still based on statistical PCA. Duong et al solved the linearity problem in face modeling by using a deep boltzmann machine. However, they only use 2D faces and sparse ground truth and therefore do not handle large posture change faces well.

Disclosure of Invention

In order to solve the technical problem, the invention provides a nonlinear 3d mm face reconstruction method, a nonlinear 3d mm face reconstruction device, a computer-readable storage medium and a computer-readable device, and a face pose normalization method, a nonlinear 3d mm face reconstruction device, a computer-readable storage medium and a computer-readable device based on the nonlinear 3d mm face reconstruction method. The nonlinear 3DMM face reconstruction method has higher representation capability than the traditional linear 3DMM, the training and the prediction are carried out end to end, and the network training can be carried out by using the unconstrained 2D image without collecting 3D face scanning. The identification accuracy of the reconstructed 3D face is high after normalization.

The technical scheme provided by the invention is as follows:

in a first aspect, the present invention provides a nonlinear 3DMM face reconstruction method, including:

training a nonlinear 3DMM model by using a training set;

the training set comprises a plurality of 2D face image samples, and the nonlinear 3DMM model comprises a CNN encoder, a multilayer perceptual shape decoder, a CNN texture decoder and a rendering layer;

during training, 2D face image samples input into the nonlinear 3DMM model are estimated by the CNN encoder to obtain camera projection parameters, shape parameters and texture parameters, the multilayer perception shape decoder decodes the shape parameters into 3D shapes, the CNN texture decoder decodes the texture parameters into 3D textures, and the rendering layer obtains rendering images according to the camera projection parameters, the 3D shapes and the 3D textures; training parameters of the CNN encoder, the multilayer perceptual shape decoder and the CNN texture decoder through a loss function;

inputting the acquired 2D face image into a trained nonlinear 3DMM model to obtain a 3D face;

the 2D face image is estimated by the CNN encoder to obtain shape parameters and texture parameters, the multilayer perception shape decoder decodes the shape parameters into a 3D shape, the CNN texture decoder decodes the texture parameters into a 3D texture, and the rendering layer performs 3D rendering according to the 3D shape, the 3D texture and a predefined 2D texture map to obtain a 3D face.

Further, the training of the nonlinear 3DMM model includes a pre-training stage and a fine-tuning stage, which are performed in sequence, wherein:

loss function L of the pre-training phase₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

L₁is a loss of key point, L₂For 3D shape loss, L₃For 3D texture loss, L₄Is the projection parameter loss;

the loss function L of the fine tuning stage is:

L＝L₆+λ₅L₅+λ₁L₁

L₅to fight against damageIn the countermeasure loss, the generator is the nonlinear 3DMM model, and the discriminator is a discriminator of patchGAN; l is₆To reconstruct losses;

x (i, j) is the value of the rendered image at coordinate (i, j), Y (i, j) is the value of the 2D face image at coordinate (i, j), H, W is the height and width of the 2D face image, respectively;

λ₁～λ₅are defined coefficients.

Further, the CNN encoder includes 14 convolutional layers, an AvgPool layer, and a full connection layer, which are connected in sequence, and the CNN encoder outputs a shape parameter and a texture parameter at the AvgPool layer; the CNN texture decoder comprises a full connection layer and 14 convolution layers which are connected in sequence, and the CNN texture decoder outputs a 3D texture at the last convolution layer; the multi-layer perceptual shape decoder includes two fully connected layers.

Further, the rendering layer performs 3D rendering according to the 3D shape, the 3D texture, and a predefined 2D texture map to obtain a 3D face, including:

predefining a 2D texture map, each pixel point of the 2D texture map corresponding to a vertex of a 3D shape;

and determining the texture value of each vertex in the 3D shape according to the texture value of each pixel point in the 2D texture map.

In a second aspect, the present invention provides a nonlinear 3DMM face reconstruction apparatus corresponding to the nonlinear 3DMM face reconstruction method described in the first aspect, where the apparatus includes:

the training module is used for training the nonlinear 3DMM model by using a training set;

the prediction module is used for inputting the acquired 2D face image into the trained nonlinear 3DMM model to obtain a 3D face;

Further, the training module includes a pre-training unit and a fine-tuning unit, wherein:

loss function L of the pre-training unit₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

the loss function L of the fine tuning unit is:

L＝L₆+λ₅L₅+λ₁L₁

L₅for resisting loss, in the resisting loss, a generator is the nonlinear 3DMM model, and a discriminator is a discriminator of patchGAN; l is₆To reconstruct losses;

λ₁～λ₅are defined coefficients.

Further, in the prediction module, the rendering layer performs 3D rendering according to the 3D shape, the 3D texture, and a predefined 2D texture map to obtain a 3D face, including:

a pre-defining unit for pre-defining a 2D texture map, each pixel point of the 2D texture map corresponding to a vertex of a 3D shape;

and the rendering unit is used for determining the texture value of each vertex in the 3D shape through the texture value of each pixel point in the 2D texture map.

In a third aspect, the present invention provides a computer-readable storage medium for nonlinear 3DMM face reconstruction corresponding to the nonlinear 3DMM face reconstruction method described in the first aspect, comprising a memory for storing processor-executable instructions, which when executed by the processor, implement the steps comprising the nonlinear 3DMM face reconstruction method described in the first aspect.

In a fourth aspect, the present invention provides an apparatus for nonlinear 3DMM face reconstruction corresponding to the nonlinear 3DMM face reconstruction method described in the first aspect, which includes at least one processor and a memory storing computer-executable instructions, and when the processor executes the instructions, the steps of the nonlinear 3DMM face reconstruction method described in the first aspect are implemented.

In a fifth aspect, the present invention provides a method for normalizing a face pose based on nonlinear 3DMM reconstruction, the method comprising:

3D reconstruction is carried out on the 2D face image by using the nonlinear 3DMM face reconstruction method in the first aspect to obtain a 3D face;

carrying out posture normalization on the 3D face;

and projecting the 3D face after the posture normalization onto a two-dimensional plane to obtain a 2D face image after the posture normalization.

Further, the posture normalization of the 3D face includes:

predefining a standard 3D pose face, wherein the standard 3D pose face and the 3D face have the same point cloud number;

performing matrixing storage on a standard 3D posture face and the 3D face and performing parameter fitting to obtain a conversion matrix;

performing matrix product on the conversion matrix and the 3D face matrix to finish the posture normalization of the 3D face;

the projecting the 3D face after the posture normalization onto a two-dimensional plane comprises the following steps:

dividing the 3D face into 3D meshes according to the vertexes of the 3D face, and coloring the 3D meshes through bilinear interpolation;

and rendering by using a Z cache renderer, and projecting the 3D face onto a two-dimensional plane.

In a sixth aspect, the present invention provides a face pose normalization apparatus based on nonlinear 3DMM reconstruction corresponding to the face pose normalization method based on nonlinear 3DMM reconstruction in the fifth aspect, the apparatus includes:

a 3D reconstruction module, configured to perform 3D reconstruction on the 2D face image by using the nonlinear 3DMM face reconstruction apparatus according to the second aspect, so as to obtain a 3D face;

the 3D face normalization module is used for carrying out posture normalization on the 3D face;

and the projection module is used for projecting the 3D face after the posture normalization onto a two-dimensional plane to obtain a 2D face image after the posture normalization.

Further, the 3D face normalization module includes:

the pre-defining unit is used for pre-defining a standard 3D pose human face, and the standard 3D pose human face and the 3D human face have the same point cloud number;

the parameter fitting unit is used for matrixing and storing the standard 3D posture face and the 3D face and performing parameter fitting to obtain a conversion matrix;

the normalization unit is used for performing matrix product on the conversion matrix and the 3D face matrix to finish the posture normalization of the 3D face;

the projection module includes:

the coloring unit is used for dividing the 3D face into 3D meshes according to the vertexes of the 3D face and coloring the 3D meshes through bilinear interpolation;

and the rendering unit is used for rendering by using the Z cache renderer and projecting the 3D face onto the two-dimensional plane.

In a seventh aspect, the present invention provides a computer-readable storage medium for face pose normalization corresponding to the nonlinear 3DMM reconstruction based face pose normalization method of the fifth aspect, comprising a memory for storing processor-executable instructions, which when executed by the processor, implement the steps of the nonlinear 3DMM reconstruction based face pose normalization method of the fifth aspect.

In an eighth aspect, the present invention provides an apparatus for face pose normalization corresponding to the nonlinear 3DMM reconstruction based face pose normalization method of the fifth aspect, comprising at least one processor and a memory storing computer-executable instructions, wherein the processor implements the steps of the nonlinear 3DMM reconstruction based face pose normalization method of the fifth aspect when executing the instructions.

The invention has the following beneficial effects:

in view of the obstacles of the linear 3D mm in the prior art on its data, supervised and linear basis, the present invention learns a non-linear 3D mm model of facial shape and texture from a set of unconstrained 2D face images for 3D reconstruction by innovating the learning paradigm of 3D mm, which has the following outstanding advantages:

1. the present invention learns a non-linear 3d mm model that has a higher representation capability than a conventional linear 3d mm model.

2. The coding-decoding network structure and the rendering layer provided by the invention can enable the training and prediction of the 3D reconstruction task to be carried out end to end.

3. The invention can use the unconstrained 2D image to perform network training without collecting 3D face scan.

Drawings

FIG. 1 is a flow chart of a nonlinear 3DMM face reconstruction method of the present invention;

FIG. 2 is a flow chart of a nonlinear 3DMM model training phase;

FIG. 3 is a flow chart of a prediction phase of a nonlinear 3DMM model;

FIG. 4 is a parameter diagram of a non-linear 3DMM model of the present invention;

FIG. 5 is a schematic diagram of a nonlinear 3DMM face reconstruction apparatus according to the present invention;

FIG. 6 is a flowchart of a method for normalizing a face pose based on nonlinear 3DMM reconstruction according to the present invention;

FIG. 7 is a flow chart of a nonlinear 3DMM model prediction stage and a subsequent face pose normalization method;

fig. 8 is a schematic diagram of a human face pose normalization device based on nonlinear 3DMM reconstruction according to the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Example 1:

the embodiment of the invention provides a nonlinear 3DMM face reconstruction method, as shown in FIG. 1, the method comprises the following steps:

step S100: the non-linear 3DMM model is trained using a training set.

The training set comprises a plurality of 2D face image samples, and the nonlinear 3DMM model comprises a CNN encoder, a multilayer perceptual shape decoder, a CNN texture decoder and a rendering layer.

The nonlinear 3DMM model comprises an encoder and two decoders, wherein the encoder is a Convolutional Neural Network (CNN), and the CNN is a deep learning method. One of the two decoders is a shape decoder, the other is a texture decoder, the shape decoder is a deep Convolutional Neural Network (CNN), the texture decoder is a Multi-Layer Perceptron (MLP), which is a relatively simple artificial neural network that maps input vectors to output vectors.

The nonlinear 3DMM face reconstruction method of the invention is divided into a training stage and a prediction stage on the whole, wherein the step S100 is the training stage.

During training, a 2D face image sample input into a nonlinear 3DMM model is estimated by a CNN encoder to obtain a camera projection parameter, a shape parameter and a texture parameter, a multi-layer perception shape decoder decodes the shape parameter into a 3D shape, the CNN texture decoder decodes the texture parameter into a 3D texture, and a rendering layer obtains a rendering image according to the camera projection parameter, the 3D shape and the 3D texture; the parameters of the CNN encoder, the multi-layer perceptual shape decoder, and the CNN texture decoder are trained by the loss function.

FIG. 2 is a flow chart of the nonlinear 3DMM model training phase. During training, errors between the output of a CNN encoder, a multi-layer perceptual shape decoder, a CNN texture decoder and a rendering layer and the input 2D face image sample (including the marking information of the 2D face image sample) are calculated, and a loss function comprises one or more errors.

Step S200: and inputting the acquired 2D face image into the trained nonlinear 3DMM model to obtain the 3D face.

After the nonlinear 3DMM model is trained, 3D reconstruction can be performed on the acquired 2D face image, and a 3D face is predicted, where step S200 is a prediction stage. The 2D face image is estimated through a CNN encoder to obtain shape parameters and texture parameters, a multi-layer perception shape decoder decodes the shape parameters into a 3D shape, a CNN texture decoder decodes the texture parameters into a 3D texture, and a rendering layer performs 3D rendering according to the 3D shape, the 3D texture and a predefined 2D texture map to obtain a 3D face. FIG. 3 is a flow chart of the prediction phase of the nonlinear 3DMM model.

The nonlinear 3DMM model provided by the invention firstly takes a 2D face image as input through a CNN encoder and generates shape parameters, texture parameters and camera projection parameters, and estimates a 3D shape and a 3D texture from the shape parameters and the texture parameters through two decoders. The rendering layer of the present invention then generates a reconstructed face (i.e., the rendered image described above) by fusing the 3D face, the 3D texture, and the camera projection parameters.

The invention designs different decoding networks for shape parameters and texture parameters: multi-layer perceptrons (MLP) for shape parameters and deep Convolutional Neural Networks (CNN) for texture parameters. The deep convolutional neural network decodes the texture parameters and outputs nonlinear 3D textures, the multi-layer perceptron decodes the shape parameters and outputs nonlinear 3D shapes, and the two decoders are nonlinear 3DMM in nature. With full learning of the 3D dm by the fitting algorithm, the 3D shape and 3D texture will perfectly reconstruct the input face.

The 3d mm model of the present invention is a coding-decoding framework structure, and the shape parameters, texture parameters, and camera projection parameters are estimated by a CNN encoder network, so that the framework can perform end-to-end training, which is essentially the fitting algorithm of the 3d mm of the present invention. In an end-to-end training scheme, the encoder and the two decoders are jointly learned to minimize the difference between the reconstructed face and the input face. By jointly learning the 3D dm model encoding and decoding, network training can be performed with a large number of unconstrained 2D images without relying on 3D scanning.

In summary, in view of the obstacles of the linear 3D mm in the prior art on its data, supervision and linear basis, the present invention learns a non-linear 3D mm model of face shape and texture from a set of unconstrained 2D face images for 3D reconstruction by innovating the learning paradigm of 3D mm, which has the following outstanding advantages:

In the invention, in order to enable the proposed encoding-decoding network to be trained end-to-end, the 3D shape and texture are represented in a 2D manner. For 3D shape representation, the same representation method as linear 3DMM is used. Namely, it is

Where Q is all vertices representing a 3D shape. By concatenating vertices into a vector as a 3D shape representation and as an output representation of the fully-connected layer in the shape decoding network. For 3D texture representation, the method employs UV parameterized textures. In the prior art, a one-dimensional vector method is generally adopted for 3D texture representation. Since the vertex textures in the one-dimensional vector representation are parameterized as vectors, this not only loses the spatial information of the vertices, but also causes inconvenience for deploying the network using CNN. The invention adopts the expanded 2D UV texture map as the 3D texture representation, completely solves the problem of one-dimensional vector representation, and is proved to be effective in the training and prediction of the coding-decoding network.

As an improvement of the embodiment of the present invention, the whole training of the nonlinear 3d mm model may include a pre-training stage and a fine-tuning stage, which are performed sequentially, and are described below according to the loss functions used in the two different stages.

Pre-training of the non-linear 3DMM model is performed on a 300W dataset that provides a 3DMM shape that fits well based on 2D face images

(i.e., the aforementioned 3D shape) and camera projection parameters

Then the texture map is rendered in UV space by the above two parameters

Thereby obtaining an input 2D image. Since the input 2D image is known, a pseudo-texture map can be found

In addition, the data set also provides a key point L corresponding to the 2D face image_L. The preliminary pre-training of the nonlinear 3DMM model is based on

And L_LBy carrying out the following

And L_LThat is, the label information of the 2D face image sample is the information provided by the training data set. In particular, the loss function L of the pre-training phase₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

L₁is a loss of key point, L₂For 3D shape loss, L₃For 3D texture loss, L₄For projection parameter loss, as shown in fig. 2.

Keypoint loss refers to the keypoint L provided by the dataset_LWith model predictionThe error of the key point L;

is a 3DMM shape provided by a data set

Error from the model predicted 3D shape S;

is a pseudo-texture map computed from a data set

Error from the model predicted 3D texture T;

is camera projection parameters provided by a data set

Error from the camera projection parameters predicted by the model.

And after the nonlinear 3DMM model completes the preliminary pre-training, fine tuning the pre-trained model in a semi-supervised mode. With end-to-end 2D texture reconstruction and fully unsupervised training against network constraints, and again introducing supervised keypoint loss into this phase training.

The loss function L at the fine tuning stage is:

L＝L₆+λ₅L₅+λ₁L₁

as shown at L in FIG. 2₅Shown as L₅To combat the loss, the method applies the combat loss to the training of the network. In the countermeasure loss, the generator is a nonlinear 3DMM model, and the discriminator is that of PatchGAN. The purpose of the counter-damage is to bring the image generated by the generator closer to the true facial texture, further targeting by the closer to true facial texture.

L₆For reconstruction of losses, by reconstruction of losses L₆Optimizing parameters of a texture decoder. In particular, the reconstruction loss of the present invention is trained by constructing the Mean Square Error (MSE) between the textures of the input 2D face image and the generated rendered image as a target. The specific reconstruction loss formula is as follows:

x (i, j) is the value of the rendered image at coordinate (i, j), Y (i, j) is the value of the 2D face image at coordinate (i, j), H, W is the height and width of the 2D face image, respectively, and is also the height and width of the rendered image generated.

The aforementioned λ₁～λ₅Are defined coefficients.

As another improvement of the embodiment of the present invention, the CNN encoder of the present invention includes 14 convolutional layers, AvgPool layers, and fully-connected layers that are sequentially connected, the CNN texture decoder includes a fully-connected layer, 14 convolutional layers that are sequentially connected, and the multi-layer perceptual shape decoder includes two fully-connected layers.

The parameters of the non-linear 3DMM model of the present invention are shown in FIG. 4, where E denotes the CNN encoder, D_TRepresenting a CNN texture decoder, and further a D consisting of two fully-connected layers_SIn part (multi-layer perceptual shape decoder), the entire model is trained end-to-end to perform 3D reconstruction of the input 2D face image.

Taking the first convolution layer Conv11 of the CNN encoder as an example, the convolution layer performs convolution operation in which the convolution kernel (Filter is a Filter, which is a general name of Filter kernels for various operations, and Filter is a convolution kernel in the convolution operation herein) is 3 × 3 and the step Size (Stride) is 1 on the input 2D face image, and the Size (Output Size) of the obtained Output image is 96 × 32; and so on; CNN encoder E outputs shape parameter l at AvgPool layer_SAnd a texture parameter l_T。

Taking the first convolution layer FConv52 of the CNN texture decoder as an example, the convolution layer performs a convolution operation with a convolution kernel of 3 × 3 and a step size of 1 on the image output by the previous fully-connected layer, and the size of the obtained output image is 8 × 160; and so on; CNN textureDecoder D_TOutputting the non-linear 3D texture at the last convolution layer FConv11, multilayer perceptual shape decoder D_SOutputting a non-linear 3D shape. The 3D face processed by the deep neural network has high nonlinear characteristics, and has better discriminability compared with linear 3 DMM.

In the present invention, the rendering layer in step S200 performs 3D rendering according to the 3D shape, the 3D texture, and the predefined 2D texture map to obtain a 3D face, which specifically includes:

step S210: the 2D texture map is predefined, and each pixel point of the 2D texture map corresponds to a vertex of the 3D shape.

Step S220: and determining the texture value of each vertex in the 3D shape according to the texture value of each pixel point in the 2D texture map.

And outputting a human face parameter model which can be subjected to 3D representation and has the human face posture consistent with the posture of the input 2D human face image after the input 2D human face image is subjected to nonlinear 3 DMM. The human face parameter model comprises shape parameters, texture parameters and projection parameters, and the method renders texture information to a 3D space through a 3D texture rendering layer. According to the invention, the 2D texture map is predefined, and as each pixel point in the predefined 2D texture map has a corresponding 3D vertex and is associated with the corresponding 3D vertex, the pixel points at different positions in the 2D texture map simultaneously represent the different 3D vertices. The rendering process of the present invention determines the texture value for each vertex in the 3D shape from a predefined 2D texture map.

In summary, the nonlinear 3DMM face reconstruction method of the present invention is divided into two stages, namely, a training stage and a prediction stage.

The training phase is shown in fig. 2, and the nonlinear 3d mm model is trained in a semi-supervised manner. The method uses two depth networks of a multi-layer perception shape decoder and a CNN texture decoder to decode the shape and the texture parameters into a 3D shape and a 3D texture respectively. To enable the framework to be end-to-end trainable, the shape, texture parameters are estimated by a CNN encoder, which is essentially our 3DMM fitting algorithm. Three depths of CNN encoder, multi-layer perceptual shape decoder and CNN texture decoder with the help of a geometry-based rendering layerAnd combining the degree networks to reconstruct the final target of the input 2D face image. Formally, given a set of two-dimensional face images, the learning of the invention by a CNN encoder E estimates the projection parameters m, and the shape parameters l_SAnd a texture parameter l_TThe multi-layer perceptual three-dimensional shape decoder decodes the shape parameters to map to the 3D shape S, and the CNN texture decoder decodes the texture parameters to a realistic texture

The goal is that the rendered image with m, S and T can closely approximate the input 2D face image.

In the prediction stage, as shown in fig. 3, firstly, the input 2D face image passes through a trained coding-decoding network to obtain 3D shape and texture expression, and then passes through a rendering layer to obtain 3D face expression, that is, a 3D face.

The invention provides a nonlinear 3DMM model realized by a deep neural network, which has stronger 3D reconstruction expressive force than that of a linear 3DMM model. The two-stage network model training method based on semi-supervised learning enables the model to be optimized more easily. And the invention can utilize unconstrained 2D images for network training without collecting 3D face scans.

Example 2:

an embodiment of the present invention provides a nonlinear 3DMM face reconstruction device corresponding to the nonlinear 3DMM face reconstruction method described in embodiment 1, as shown in fig. 5, the device includes:

and the training module 10 is used for training the nonlinear 3DMM model by using a training set.

During training, a 2D face image sample input into a nonlinear 3DMM model is estimated by a CNN encoder to obtain a camera projection parameter, a shape parameter and a texture parameter, a multi-layer perception shape decoder decodes the shape parameter into a 3D shape, the CNN texture decoder decodes the texture parameter into a 3D texture, and a rendering layer obtains a rendering image according to the camera projection parameter, the 3D shape and the 3D texture; training parameters of a CNN encoder, a multi-layer perceptual shape decoder and a CNN texture decoder through a loss function;

and the prediction module 20 is configured to input the acquired 2D face image into the trained nonlinear 3DMM model to obtain a 3D face.

The 2D face image is estimated through a CNN encoder to obtain shape parameters and texture parameters, a multi-layer perception shape decoder decodes the shape parameters into a 3D shape, a CNN texture decoder decodes the texture parameters into a 3D texture, and a rendering layer performs 3D rendering according to the 3D shape, the 3D texture and a predefined 2D texture map to obtain a 3D face.

In view of the obstacles of the linear 3D mm in the prior art on its data, supervision and linear basis, the present invention learns a non-linear 3D mm model of facial shape and texture from a set of unconstrained 2D face images for 3D reconstruction by innovating the learning paradigm of 3D mm, and has the following outstanding advantages:

As an improvement of the present invention, the training module includes a pre-training unit and a fine-tuning unit, which are performed in sequence, wherein:

loss function L of pre-training unit₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

L₁is a loss of key point, L₂For 3D shape loss, L₃For 3D texture loss, L₄Is the projection parameter loss.

The loss function L of the trim unit is:

L＝L₆+λ₅L₅+λ₁L₁

L₅for the countermeasure loss, the generator is a nonlinear 3DMM model, and the discriminator is a discriminator of patchGAN; l is₆To reconstruct losses.

X (i, j) is the value of the rendered image at coordinate (i, j), Y (i, j) is the value of the 2D face image at coordinate (i, j), H, W is the height and width of the 2D face image, respectively.

λ₁～λ₅Are defined coefficients.

As another improvement of the present invention, the CNN encoder includes 14 convolutional layers, an AvgPool layer, and a full-link layer, which are connected in sequence, and the CNN encoder outputs a shape parameter and a texture parameter at the AvgPool layer; the CNN texture decoder comprises a full connection layer and 14 convolution layers which are connected in sequence, and the CNN texture decoder outputs 3D textures at the last convolution layer; the multi-layer perceptual shape decoder includes two fully connected layers.

In the prediction module of the present invention, the rendering layer performs 3D rendering according to a 3D shape, a 3D texture, and a predefined 2D texture map to obtain a 3D face, including:

a pre-defining unit for pre-defining a 2D texture map, each pixel point of the 2D texture map corresponding to a vertex of the 3D shape;

The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the method embodiments, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiments without reference to the device embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Example 3:

the method provided by the embodiment of the present specification can implement the service logic through a computer program and record the service logic on a storage medium, and the storage medium can be read and executed by a computer, so as to implement the effect of the solution described in embodiment 1 of the present specification. Accordingly, the present invention also provides a computer-readable storage medium for nonlinear 3DMM face reconstruction corresponding to the nonlinear 3DMM face reconstruction method of embodiment 1, comprising a memory for storing processor-executable instructions, which when executed by the processor, implement the steps comprising the nonlinear 3DMM face reconstruction method of embodiment 1.

The storage medium may include a physical device for storing information, and typically, the information is digitized and then stored using an electrical, magnetic, or optical media. The storage medium may include: devices that store information using electrical energy, such as various types of memory, e.g., RAM, ROM, etc.; devices that store information using magnetic energy, such as hard disks, floppy disks, tapes, core memories, bubble memories, and usb disks; devices that store information optically, such as CDs or DVDs. Of course, there are other ways of storing media that can be read, such as quantum memory, graphene memory, and so forth.

The above description of the apparatus according to the method embodiment may also include other embodiments. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.

Example 4:

the invention also provides a device for nonlinear 3DMM face reconstruction, which can be a single computer, and can also comprise an actual operation device and the like using one or more methods or one or more embodiment devices of the specification. The apparatus for nonlinear 3d mm face reconstruction may include at least one processor and a memory storing computer-executable instructions, which when executed by the processor implement the steps of the nonlinear 3d mm face reconstruction method described in embodiment 1 above.

The above description of the device according to the method or apparatus embodiment may also include other embodiments, and specific implementation may refer to the description of the related method embodiment, which is not described herein in detail.

Example 5:

the embodiment of the invention provides a human face posture normalization method based on nonlinear 3DMM reconstruction, as shown in FIGS. 6 and 7, the method comprises the following steps:

step S100': and 3D reconstruction is carried out on the 2D face image by using the nonlinear 3DMM face reconstruction method in the embodiment 1 to obtain a 3D face.

This step is equivalent to steps S100 to S200 of embodiment 1, and the specific implementation method and beneficial effects thereof are described in embodiment 1, which is not described again in this embodiment.

Step S200': and carrying out posture normalization on the 3D face.

The 3D face pose normalization can accurately correct the face pose in a three-dimensional space.

Step S300': and projecting the 3D face after the posture normalization onto a two-dimensional plane to obtain a 2D face image after the posture normalization.

The purpose of the 3D face projection is to project the 3D face with normalized posture onto a two-dimensional plane, and further obtain a normalized face on the two-dimensional plane.

The human face posture normalization method based on nonlinear 3DMM reconstruction can accurately normalize the human face posture and effectively solve the problem of low human face recognition precision under the large-angle human face posture: on the basis of the nonlinear 3DMM face reconstruction in the embodiment 1, the invention further performs posture normalization on the 3D face in a three-dimensional space; and finally, projecting the normalized 3D human face onto a two-dimensional image plane in a projection mode, thereby completing the posture normalization of the human faces with different angles on the two-dimensional image. The invention can accurately normalize the face pose, can also accurately normalize the face under the condition of large face pose change, effectively solves the problem of reduction of face identification accuracy under the condition of large-pose face, and can accurately process the face under the large pose. Through experimental tests, the method improves the 3DMM reconstruction performance, and the accuracy of the normalized face based on the method on a face recognition test set reaches 99.93%.

As a modification of the present invention, step S200' includes:

step S210': a standard 3D pose face is predefined, and the standard 3D pose face and the 3D face have the same point cloud number.

Step S220': and performing matrixing storage on the standard 3D posture face and the 3D face and performing parameter fitting to obtain a conversion matrix.

Step S230': and performing matrix product on the conversion matrix and the 3D face matrix, and performing posture rotation on the 3D face to finish the posture normalization of the 3D face.

The 3D face pose normalization is used as an important step in 2D image face pose normalization, and aims to solve a conversion matrix between the 3D face pose normalization and a predefined standard 3D pose face, and perform pose rotation on the 3D face through the conversion matrix so as to finish 3D pose normalization. Specifically, for convenience of calculation and 3D rotation, the number of point clouds of the predefined standard 3D pose face is the same as the number of point clouds of the 3D face estimated in the present invention. When the conversion matrix is calculated, the predefined standard 3D pose face and the estimated 3D face data are respectively subjected to matrixing storage, and the affine matrix of the two matrixes is solved to be the conversion matrix. And then performing matrix product on the conversion matrix and the estimated 3D face matrix to finish the normalization of the 3D posture.

After the 3D posture normalization processing is completed, the posture normalization of the final 2D face image can be completed only by projecting the 3D texture and the 3D shape of the 3D face to the 2D image plane. Specifically, step S300' includes:

step S310': and dividing the 3D face into 3D meshes according to the vertexes of the 3D face, and coloring the 3D meshes through bilinear interpolation.

When the method is used for projection, the 3D face needs to be subjected to mesh division, the 3D face is divided into triangular or quadrangular 3D meshes according to the vertexes of the 3D face, the method is preferably divided into the triangular meshes, and the obtained triangular meshes are also called triangular patches.

Since the texture value of each vertex of the 3D face is determined by the predefined position in the 2D texture map during projection, the 3D mesh needs to be rendered by bilinear down-sampling when the vertices are subdivided into meshes to ensure high accuracy of projection.

Step S320': and rendering by using a Z cache renderer, and projecting the 3D face onto a two-dimensional plane.

The invention uses a Z Buffer renderer for rendering. Z buffer rendering is performed according to the distance (i.e., Z value) between each spatial triangle patch and the viewer. If the Z value in the Z Buffer is larger, the triangular patch is closer to the observer, and the triangular patch should be rendered at the corresponding position by the color of the patch, and the Z value in the Z Buffer is updated. On the contrary, if the Z value in the Z Buffer is small, it indicates that the current triangular patch is relatively far away, and the current triangular patch is covered by the preceding triangular patch, so that rendering is not needed, and the Z value does not need to be updated. The Z Buffer carries out hidden surface elimination, and when the rendering is carried out, structures behind other objects are blanked to enable the structures not to be displayed, so that the visible part of an observer can be obtained through the Z Buffer, the invisible part of the observer is blanked, the final rendering result is that the visible part of the observer is rendered, and the 3D face is projected onto a two-dimensional plane.

The method provided by the embodiment 1 is used for 3D reconstruction, then the 3D reconstructed face pose is regularized by solving an affine matrix between the 3D reconstructed face and a predefined pose regularization face (standard 3D pose face), and finally the pose regularized 3D face is projected to a 2D image plane to complete face pose normalization. The method can accurately normalize the face posture while improving the 3DMM reconstruction performance.

In the face pose normalization method based on nonlinear 3DMM reconstruction provided in the embodiment of the present invention, the 3D reconstruction method is the nonlinear 3DMM face reconstruction method described in embodiment 1, the implementation principle and the generated technical effect are the same as those of embodiment 1, and for brief description, corresponding contents in embodiment 1 may be referred to where this embodiment does not refer to.

Example 6:

the embodiment of the present invention provides a human face posture normalization device based on nonlinear 3DMM reconstruction, corresponding to the human face posture normalization method based on nonlinear 3DMM reconstruction of embodiment 5, as shown in fig. 8, the device includes:

and a 3D reconstruction module 10' configured to perform 3D reconstruction on the 2D face image by using the nonlinear 3D dm face reconstruction apparatus described in embodiment 2, so as to obtain a 3D face.

And the 3D face normalization module 20' is used for performing pose normalization on the 3D face.

And the projection module 30' is used for projecting the 3D face after the posture normalization onto a two-dimensional plane to obtain a 2D face image after the posture normalization.

The human face posture normalization device based on nonlinear 3DMM reconstruction can accurately normalize the human face posture and effectively solve the problem of low human face recognition precision under the large-angle human face posture: on the basis of the nonlinear 3DMM face reconstruction in the embodiment 2, the invention further performs posture normalization on the 3D face in a three-dimensional space; and finally, projecting the normalized 3D human face onto a two-dimensional image plane in a projection mode, thereby completing the posture normalization of the human faces with different angles on the two-dimensional image. The invention can accurately normalize the face pose, can also accurately normalize the face under the condition of large face pose change, effectively solves the problem of reduction of face identification accuracy under the condition of large-pose face, and can accurately process the face under the large pose. Through experimental tests, the method improves the 3DMM reconstruction performance, and the accuracy of the normalized face based on the method on a face recognition test set reaches 99.93%.

As an improvement of the present invention, the 3D face normalization module includes:

and the predefining unit is used for predefining a standard 3D pose face, and the standard 3D pose face and the 3D face have the same point cloud number.

And the parameter fitting unit is used for matrixing and storing the standard 3D posture face and the 3D face and performing parameter fitting to obtain a conversion matrix.

And the normalization unit is used for performing matrix product on the conversion matrix and the 3D face matrix to finish the posture normalization of the 3D face.

The projection module includes:

and the coloring unit is used for dividing the 3D face into 3D meshes according to the vertexes of the 3D face and coloring the 3D meshes through bilinear interpolation.

Example 7:

the method provided by the present specification and described in the foregoing embodiment may implement the service logic through a computer program and record the service logic on a storage medium, where the storage medium may be read and executed by a computer, so as to implement the effect of the solution described in embodiment 5 of the present specification. Accordingly, the present invention also provides a computer-readable storage medium for face pose normalization corresponding to the non-linear 3DMM reconstruction based face pose normalization method of embodiment 5, comprising a memory for storing processor-executable instructions, which when executed by the processor, implement the steps comprising the non-linear 3DMM reconstruction based face pose normalization method of embodiment 5.

The human face posture normalization device based on nonlinear 3DMM reconstruction can accurately normalize the human face posture and effectively solve the problem of low human face recognition precision under the large-angle human face posture: on the basis of the nonlinear 3DMM face reconstruction in the embodiment 1, the invention further performs posture normalization on the 3D face in a three-dimensional space; and finally, projecting the normalized 3D human face onto a two-dimensional image plane in a projection mode, thereby completing the posture normalization of the human faces with different angles on the two-dimensional image. The invention can accurately normalize the face pose, can also accurately normalize the face under the condition of large face pose change, effectively solves the problem of reduction of face identification accuracy under the condition of large-pose face, and can accurately process the face under the large pose. Through experimental tests, the method improves the 3DMM reconstruction performance, and the accuracy of the normalized face based on the method on a face recognition test set reaches 99.93%.

Example 8:

the invention also provides a device for normalizing the human face pose, which can be a single computer, and can also comprise an actual operation device and the like using one or more methods or one or more embodiment devices of the specification. The apparatus for face pose normalization may comprise at least one processor and a memory storing computer executable instructions, which when executed by the processor, implement the steps of the method for face pose normalization based on nonlinear 3d mm reconstruction described in embodiment 5 above.

It should be noted that, the above-mentioned apparatus or system in this specification may also include other implementation manners according to the description of the related method embodiment, and a specific implementation manner may refer to the description of the method embodiment, which is not described herein in detail. The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class, storage medium + program embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element.

As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the present invention in its spirit and scope. Are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A nonlinear 3DMM face reconstruction method, the method comprising:

training a nonlinear 3DMM model by using a training set;

2. The nonlinear 3DMM face reconstruction method according to claim 1, wherein the training of the nonlinear 3DMM model includes a pre-training phase and a fine-tuning phase performed in sequence, wherein:

loss function L of the pre-training phase₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

the loss function L of the fine tuning stage is:

L＝L₆+λ₅L₅+λ₁L₁

λ₁～λ₅are defined coefficients.

3. The nonlinear 3DMM face reconstruction method according to claim 1 or 2, wherein the CNN encoder includes 14 convolutional layers, an AvgPool layer and a fully connected layer which are connected in sequence, and the CNN encoder outputs shape parameters and texture parameters at the AvgPool layer; the CNN texture decoder comprises a full connection layer and 14 convolution layers which are connected in sequence, and the CNN texture decoder outputs a 3D texture at the last convolution layer; the multi-layer perceptual shape decoder includes two fully connected layers.

4. The nonlinear 3DMM face reconstruction method of claim 3, wherein the rendering layer performs 3D rendering according to the 3D shape, 3D texture and predefined 2D texture map to obtain 3D face, comprising:

5. A non-linear 3DMM face reconstruction apparatus, comprising:

6. The nonlinear 3DMM face reconstruction apparatus of claim 5, wherein the training module comprises a pre-training unit and a fine-tuning unit that are performed sequentially, wherein:

loss function L of the pre-training unit₀Comprises the following steps:

L₀＝λ₁L₁+L₂+λ₃L₃+λ₄L₄

the loss function L of the fine tuning unit is:

L＝L₆+λ₅L₅+λ₁L₁

λ₁～λ₅are defined coefficients.

7. The nonlinear 3DMM face reconstruction apparatus according to claim 5 or 6, wherein the CNN encoder includes 14 convolutional layers, an AvgPool layer and a fully connected layer which are connected in sequence, and the CNN encoder outputs shape parameters and texture parameters at the AvgPool layer; the CNN texture decoder comprises a full connection layer and 14 convolution layers which are connected in sequence, and the CNN texture decoder outputs a 3D texture at the last convolution layer; the multi-layer perceptual shape decoder includes two fully connected layers.

8. The apparatus of claim 7, wherein in the prediction module, the rendering layer performs 3D rendering according to the 3D shape, 3D texture and predefined 2D texture map to obtain a 3D face, and the apparatus comprises:

9. A computer readable storage medium for non-linear 3DMM face reconstruction, comprising a memory for storing processor executable instructions which, when executed by the processor, perform steps comprising the non-linear 3DMM face reconstruction method of any of claims 1-4.

10. An apparatus for nonlinear 3d dm face reconstruction, comprising at least one processor and a memory storing computer-executable instructions, which when executed by the processor, implement the steps of the nonlinear 3d dm face reconstruction method of any of claims 1-4.

11. A method for normalizing human face pose based on nonlinear 3DMM reconstruction is characterized by comprising the following steps:

3D reconstruction is carried out on the 2D face image by using the nonlinear 3DMM face reconstruction method of any one of claims 1 to 4 to obtain a 3D face;

carrying out posture normalization on the 3D face;

12. The method of claim 11, wherein the pose normalization of the 3D face comprises: