CN119206101B - Editable facial three-dimensional reconstruction method, system and storage medium
- Publication number
- CN119206101B (application CN202411742013.6A)
- Authority
- CN
- China
- Prior art keywords
- expression
- model
- expressive
- image data
- point cloud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Abstract
The invention belongs to the technical field of image data processing, and in particular relates to an editable facial three-dimensional reconstruction method, system and storage medium. The method mainly comprises the following steps: obtaining expressionless image data and expressive image data of a modeler; obtaining a neutral model and a real expressive model; training a deformation field, a color field and a super-resolution neural network; training an expression encoder to construct an expression base vector; forming an expressive facial three-dimensional model of the modeler from the expression base vector and the finally generated model; and outputting a rendered image. The invention further provides a system for implementing the method. The invention performs facial three-dimensional reconstruction based on neural radiance field technology, can generate an accurate, natural, attractive and realistic facial three-dimensional model, allows flexible and convenient personalized adjustment, and has good application prospects in the medical and healthcare fields.
Description
Technical Field
The invention belongs to the technical field of image data processing, and particularly relates to an editable face three-dimensional reconstruction method, an editable face three-dimensional reconstruction system and a storage medium.
Background
Facial scanning is a technique for generating a three-dimensional model of a human face from facial images or video, and it plays an important role in the medical field. It provides doctors and patients with high-quality facial morphology information, helping to diagnose, treat and evaluate various face-related problems such as plastic surgery, oral restoration and maxillofacial deformity. Facial scanning thus assists diagnosis, treatment and evaluation, enhances patient confidence and satisfaction, and can also provide large amounts of facial data for medical research, promoting the development and innovation of medical knowledge.
Existing facial scanning mainly follows these schemes: direct anthropometric measurement, radiology-based facial scanning, dedicated facial scanners, deep-learning-based facial three-dimensional reconstruction, and facial three-dimensional reconstruction based on neural radiance field technology.
Deep-learning-based facial three-dimensional reconstruction starts from facial images, marks facial feature points and generates a three-dimensional model through a model carrying prior knowledge. Its advantages are that large amounts of facial data can be used for training, reconstruction efficiency and stability are improved, and certain changes in occlusion, expression, pose and the like can be handled. Its disadvantages are that the accuracy and detail of the reconstruction are limited by the complexity and expressive power of the model, that individual differences and subtle features are difficult to capture, and that pre-trained face shape and texture models, or deep-learning-based networks, are required as prior input information.
Facial three-dimensional reconstruction based on neural radiance field technology photographs the face from multiple viewing angles and learns the three-dimensional geometry and texture of the face from the images, the camera poses and a neural network. The face can be scanned with an ordinary camera or mobile phone, without special scanning equipment, which reduces hardware cost and difficulty of use and improves the accessibility and convenience of facial scanning. High-quality three-dimensional facial images can be rendered from any viewing angle, unaffected by factors such as occlusion, expression and pose, improving the accuracy and detail of facial scanning.
As can be seen, there are many methods and studies for facial scanning in the prior art. However, these prior art techniques still have the following problems:
1. Dependence on a pre-trained face model or network. Such models and networks often carry limitations and biases, adapt poorly to changes in face shape, texture, expression, pose and the like, and struggle to capture individual differences and fine features, so the edited face model appears ill-fitting, uncoordinated or unrealistic in different scenes, which harms the flexibility and naturalness of face editing.
2. A lack of face editing technology based on the 3D model itself: attributes such as the shape, color, expression and teeth of the patient's face cannot be modified and adjusted according to the requirements of the patient or doctor to generate a new three-dimensional image, so personalized facial design and preview cannot be realized and the personalized requirements of the medical field cannot be met.
Accordingly, there is a need in the art to develop new techniques for constructing editable three-dimensional models of faces from facial scan data.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an editable face three-dimensional reconstruction method, an editable face three-dimensional reconstruction system and a storage medium.
An editable face three-dimensional reconstruction method comprising the steps of:
obtaining expressionless image data and expressive image data of a modeler photographed at a plurality of angles;
generating a head point cloud from the expressionless image data to obtain a neutral model;
generating a head point cloud from the expressive image data to obtain a real expressive model;
training a deformation field and a color field with the real expressive model;
passing the neutral model through the color field and the deformation field to obtain a finally output model;
obtaining a rendered image from the finally output model, and training a super-resolution neural network with the rendered image and the expressive image data;
training an expression encoder with the expressionless image data and the expressive image data;
after the facial feature points of the patient in the expressionless image data are dragged, sending them to the expression encoder, which encodes the dragged feature point coordinates into an expression base vector;
passing the neutral model through the expression base vector and the finally output model to form an expressive facial three-dimensional model of the modeler;
and obtaining a high-definition rendering of the expressive facial three-dimensional model through the super-resolution network.
Preferably, the non-expressive image data is selected from at least one of a picture or a video, and the expressive image data is selected from at least one of a picture or a video.
Preferably, the construction of the neutral model comprises the following steps:
step a, preprocessing the expressionless image data;
step b, reconstructing camera poses with COLMAP, performing feature extraction and feature matching, and generating a sparse point cloud;
step c, training with the sparse point cloud as the initial point cloud of a 3D Gaussian Splatting algorithm to generate a head point cloud;
and step d, inputting the head point cloud into DMTet to generate a 3D model, namely the neutral model (a minimal scripting sketch of steps b to d is given below).
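The following Python sketch illustrates how step b could be scripted with the standard COLMAP command-line interface; the directory layout, the choice of exhaustive matching and the hand-off to the 3D Gaussian Splatting and DMTet stages are assumptions made for illustration, not details fixed by this disclosure.

```python
import subprocess
from pathlib import Path

def reconstruct_sparse_point_cloud(image_dir: str, work_dir: str) -> Path:
    """Step b: COLMAP feature extraction, feature matching and mapping to recover
    camera poses and a sparse point cloud from the expressionless images."""
    work = Path(work_dir)
    database = work / "database.db"
    sparse_dir = work / "sparse"
    sparse_dir.mkdir(parents=True, exist_ok=True)

    # Feature extraction and feature matching over the multi-angle images.
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", str(database), "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", str(database)], check=True)

    # Incremental mapping: recovers camera poses and writes the sparse point cloud.
    subprocess.run(["colmap", "mapper",
                    "--database_path", str(database), "--image_path", image_dir,
                    "--output_path", str(sparse_dir)], check=True)

    # The sparse model under sparse/0 then initialises 3D Gaussian Splatting training
    # (step c); the densified head point cloud is afterwards fed to DMTet (step d).
    # Those two stages depend on the chosen implementations and are not sketched here.
    return sparse_dir / "0"
```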
Preferably, the training process of the expression encoder is as follows:
inputting the expressionless image data and the facial feature points of the expressive image data, and obtaining an expression base vector through the expression encoder;
generating an expressive image and expressive facial feature points from the expression base vector, and computing losses against the real expressive image data and the facial feature points of the real expressive image data respectively, so that the expression encoder is continuously optimized;
the specific process of constructing the expression basis vector is expressed as follows:
Z_exp = E_exp(I_in, P_0)
wherein Z_exp is the expression base vector, E_exp is the expression encoder, I_in is the expressionless image data, and P_0 is the 68 facial feature points of the expressive image data;
in the process of training the expression encoder, the following loss function is adopted:
L_exp = L_gen + L_f1 + α·L_f2
wherein L_exp is the loss function for constructing the expression base vector, L_gen is the mean-square-error loss between each pixel of the newly generated image and the original image, L_f1 is the MSE loss between the facial feature points after feature recovery and the original facial feature points, L_f2 is the loss between the feature points of the generated expressive image and the original feature points, and α is a switch that gates the expressive-feature loss.
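A minimal PyTorch sketch of one possible E_exp is shown below. Only the interface is taken from this disclosure (an expressionless image plus 68 feature points in, an expression base vector out; the embodiment later mentions a 1024-dimensional vector); the convolutional backbone, layer sizes, image resolution and the assumption of 2D feature-point coordinates are illustrative choices.

```python
import torch
import torch.nn as nn

class ExpressionEncoder(nn.Module):
    """Illustrative E_exp: maps an expressionless image I_in and 68 feature points P_0
    to an expression base vector Z_exp. Layer choices are assumptions, not the patent's."""

    def __init__(self, z_dim: int = 1024):
        super().__init__()
        # Small CNN over the expressionless image (3 x 256 x 256 assumed).
        self.image_net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP over the 68 feature-point coordinates (2D image coordinates assumed).
        self.point_net = nn.Sequential(nn.Flatten(), nn.Linear(68 * 2, 256), nn.ReLU())
        self.fuse = nn.Linear(128 + 256, z_dim)

    def forward(self, image: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W); points: (B, 68, 2) expressive or dragged feature points.
        features = torch.cat([self.image_net(image), self.point_net(points)], dim=-1)
        return self.fuse(features)  # Z_exp

# Example: z_exp = ExpressionEncoder()(torch.rand(1, 3, 256, 256), torch.rand(1, 68, 2))
```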
Preferably, the training process of the deformation field, the color field and the super-resolution network is as follows:
The neutral model and the expression base vector are passed through the deformation field and the color field to obtain an expressive sparse point cloud;
The low-resolution image rendered from the expressive sparse point cloud is passed through the super-resolution neural network to obtain a corresponding high-resolution image;
The neutral model comprises a neutral point cloud position vector and a neutral point cloud feature-point color vector, wherein the neutral point cloud position vector is a vector formed by the position coordinates of the expressive feature points whose positions change;
In the deformation field, the neutral point cloud position vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the position offsets of the corresponding points relative to the neutral model;
in the color field, the neutral point cloud position vector, the neutral point cloud feature-point color vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the color offsets of the corresponding points relative to the neutral model.
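A PyTorch sketch of the deformation field described above follows. The 4-channel layout (three position channels plus one channel obtained by projecting the expression base vector onto the grid), the 64 × 64 point grid, the ResNet-18 backbone and the widths of the three fully connected layers are all assumptions used to make the structure concrete; the color field would be analogous, additionally taking the feature-point color vector and emitting per-point color offsets.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DeformationField(nn.Module):
    """Illustrative f_def: neutral point positions and the expression base vector are
    reshaped into a 4-channel map, passed through a ResNet, and three fully connected
    layers output per-point offsets relative to the neutral model."""

    def __init__(self, grid: int = 64, z_dim: int = 1024):
        super().__init__()
        self.grid = grid
        # Assumed: the expression base vector is projected to one grid-sized channel.
        self.z_to_map = nn.Linear(z_dim, grid * grid)
        # ResNet backbone adapted to the 4-channel input described in the text.
        backbone = resnet18(weights=None)
        backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Three fully connected layers mapping features to per-point xyz offsets.
        self.head = nn.Sequential(
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, grid * grid * 3),
        )

    def forward(self, neutral_xyz: torch.Tensor, z_exp: torch.Tensor) -> torch.Tensor:
        # neutral_xyz: (B, grid*grid, 3) neutral point-cloud positions; z_exp: (B, z_dim).
        b = neutral_xyz.shape[0]
        pos_map = neutral_xyz.transpose(1, 2).reshape(b, 3, self.grid, self.grid)
        exp_map = self.z_to_map(z_exp).reshape(b, 1, self.grid, self.grid)
        features = self.backbone(torch.cat([pos_map, exp_map], dim=1))  # 4 channels in
        offsets = self.head(features).reshape(b, -1, 3)
        return neutral_xyz + offsets  # P = P_0 + f_def(P_0, Z_exp)
```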
Preferably, the following three loss functions are used in training the deformation field and the color field:
L_RGB = ||I_rgb − I_gt||_1
L_sil = 1 − IOU(M, M_gt)
L_def = ||P − P_gt||_2
wherein L_RGB is the RGB loss, L_sil is the contour loss, L_def is the facial feature point loss, I_rgb is the RGB value output by the color field, I_gt is the RGB value of the corresponding point of the real expressive model, M is the facial contour of the point cloud, M_gt is the facial contour of the real expressive model, IOU is the intersection-over-union of the two contours, P_gt is the position of a point of the real expressive model, P is the position of the corresponding point of the point cloud, the subscript 1 in the formula for L_RGB denotes the L1 loss function, and the subscript 2 in the formula for L_def denotes the L2 loss function;
P = P_0 + f_def(P_0, Z_exp)
wherein P_0 is the position of the corresponding point of the neutral point cloud, Z_exp is the expression base vector, and f_def is the deformation field;
in the process of training the deformation field and the color field, a prior constraint L_offset is introduced; L_offset penalizes all non-zero displacements:
L_offset = λ_1 · counter(f_def(P_0, Z_exp) = 0)
wherein λ_1 is a weight that scales the loss, and counter(f_def(P_0, Z_exp) = 0) denotes counting, over the feature points, those whose offset is zero;
in the process of training the super-resolution neural network, the following loss function is adopted:
L = λ_2·||I_hr − I_gt||_1 + (1 − λ_2)·||I_lr − I_gt||_1
wherein I_hr is the high-definition image after resolution improvement by the super-resolution neural network, I_lr is the low-resolution image rendered from the point cloud, I_gt is the real expressive image, and λ_2 is a weight.
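The four losses above can be transcribed almost directly; the sketch below assumes soft silhouette masks for the IOU term and reads the L_offset prior as a soft penalty on non-zero displacements (its stated intent), with λ_1 left as a tunable weight.

```python
import torch

def iou(mask: torch.Tensor, mask_gt: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Intersection-over-union of two (soft) facial silhouette masks."""
    inter = (mask * mask_gt).sum()
    union = mask.sum() + mask_gt.sum() - inter
    return inter / (union + eps)

def field_losses(i_rgb, i_gt, mask, mask_gt, p, p_gt, offsets, lam1=0.01):
    """Combined training loss for the deformation and color fields (weights assumed)."""
    l_rgb = (i_rgb - i_gt).abs().mean()        # L_RGB: L1 loss on rendered colors
    l_sil = 1.0 - iou(mask, mask_gt)           # L_sil: 1 - IOU of the facial contours
    l_def = ((p - p_gt) ** 2).mean()           # L_def: L2 loss on facial feature points
    l_offset = lam1 * offsets.abs().mean()     # L_offset: soft penalty on non-zero offsets
    return l_rgb + l_sil + l_def + l_offset
```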
The invention also provides a system for implementing the above editable facial three-dimensional reconstruction method, which comprises:
An input module configured to acquire a series of non-expressive image data and expressive image data of a modeler;
a neutral model construction module configured to generate a head point cloud using the expressionless image data to obtain a neutral model;
the true expressive model construction module is configured to generate head point clouds by using the expressive image data to obtain a true expressive model;
A deformation field and color field training module configured to train the deformation field and color field using the true expressive model;
a final-output-model training module configured to pass the neutral model through the color field and the deformation field to obtain the finally output model;
a super-resolution neural network module configured to obtain a rendered image from the finally output model and to train a super-resolution neural network with the rendered image and the expressive image data;
an expression base vector construction module configured to send the facial feature points of the patient in the expressionless image data, after they have been dragged, to the expression encoder, which encodes the dragged feature point coordinates into an expression base vector;
A facial three-dimensional model construction module configured to construct a modeler's expressive facial three-dimensional model by passing a neutral model through the expression basis vector and the final output model;
and the high-definition rendering module is configured to obtain a high-definition rendering chart from the facial three-dimensional model with the expression through a super-resolution network.
Preferably, the method further comprises:
and the dynamic adjustment module is configured to change the positions of the characteristic points, adjust the facial three-dimensional model and generate a new facial three-dimensional model.
Preferably, in the dynamic adjustment module, the process of generating the new face three-dimensional model comprises generating a new expression base according to the new feature point position, and generating the new face three-dimensional model by using a deformation field and a color field;
or, in the dynamic adjustment module, generating feature points corresponding to at least one expression or facial form through a GAN model, wherein the feature points corresponding to the at least one expression or facial form are integrated into at least one option;
Or, in the dynamic adjustment module, generating a sequence of feature point changes through a Transformer model, and generating the corresponding sequence of facial three-dimensional model changes according to the sequence of feature point changes.
The present invention also provides a computer-readable storage medium having stored thereon a computer program for implementing the above-described editable face three-dimensional reconstruction method, or a computer program for implementing the above-described system.
The invention performs three-dimensional reconstruction of the human face based on neural radiance field technology and, by editing the facial feature points and using a color field, a deformation field and a super-resolution neural network, can output an edited high-definition three-dimensional model of the face. The technical scheme of the invention achieves the following beneficial technical effects:
1. By moving the facial feature points and similar operations, the facial three-dimensional model can be modified, so that the modeler's face can be finely edited while the edited model still retains the modeler's basic facial features and identity features, meeting the personalized requirements of the medical field. In the preferred scheme, feature points corresponding to common expressions and face shapes can be made into options; clicking an option generates the corresponding facial three-dimensional model without complex manual operation, improving the flexibility and convenience of adjusting the facial three-dimensional model.
2. The invention utilizes the combined action of a plurality of multi-layer perceptron (MLP) neural networks (comprising an expression encoder, a deformation field and a color field) to generate an edited face three-dimensional model, further adopts a super-resolution network to improve the resolution of the picture, and further generates an expression base feature vector according to feature points. Therefore, the three-dimensional model of the face ensures the naturalness and the aesthetic property of the face shaping, and improves the authenticity and the individuation.
In conclusion, the facial three-dimensional model generated by the method is accurate, natural, attractive and realistic, can be adjusted flexibly and conveniently for personalization, and has good application prospects in the medical and healthcare fields.
It should be apparent that, in light of the foregoing, various modifications, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
The above-described aspects of the present invention will be described in further detail below with reference to specific embodiments in the form of examples. It should not be understood that the scope of the above subject matter of the present invention is limited to the following examples only. All techniques implemented based on the above description of the invention are within the scope of the invention.
Drawings
FIG. 1 is an exemplary diagram of a dataset used to construct an expression basis vector;
FIG. 2 is a diagram of a network structure for constructing expression basis vectors;
FIG. 3 is a schematic diagram of a process for training a deformation field, a color field, and a super-resolution neural network.
Fig. 4 is a schematic diagram of specific structures of deformation fields and color fields.
Detailed Description
It should be noted that, in the embodiments, algorithms of steps such as data acquisition, transmission, storage, and processing, which are not specifically described, and hardware structures, circuit connections, and the like, which are not specifically described may be implemented through the disclosure of the prior art.
Example 1 editable face three-dimensional reconstruction method and System
The present embodiment provides a system for three-dimensional reconstruction of a face, specifically including:
An input module configured to acquire a series of non-expressive image data and expressive image data of a modeler;
a neutral model construction module configured to generate a head point cloud using the expressionless image data to obtain a neutral model;
the true expressive model construction module is configured to generate head point clouds by using the expressive image data to obtain a true expressive model;
A deformation field and color field training module configured to train the deformation field and color field using the true expressive model;
The final output model training module is configured to obtain a final output model through a color field and a deformation field by the neutral model;
A super-resolution neural network module configured to obtain a rendered image using the finally output model, train a super-resolution neural network using the rendered image and the expressive image data;
The expression base vector construction module is configured to drag the facial feature points of the patient in the non-expression image data, and then send the facial feature points into the expression encoder, and the expression encoder encodes the dragged feature point coordinates into an expression base vector;
A facial three-dimensional model construction module configured to construct a modeler's expressive facial three-dimensional model by passing a neutral model through the expression basis vector and the final output model;
the high-definition rendering module is configured to obtain a high-definition rendering diagram of the facial three-dimensional model with the expression through a super-resolution network;
and the dynamic adjustment module is configured to change the positions of the characteristic points, adjust the facial three-dimensional model and generate a new facial three-dimensional model.
In the dynamic adjustment module, the process of generating a new facial three-dimensional model comprises generating a new expression base according to the new feature point positions and generating the new facial three-dimensional model with the deformation field and the color field. To facilitate quick selection by the user, feature points corresponding to at least one type of expression or face shape are generated through a GAN model and integrated into at least one option. To dynamically demonstrate the facial three-dimensional model, a sequence of feature point changes is generated through a Transformer model, and the corresponding sequence of facial three-dimensional model changes is generated from it.
The method for carrying out the three-dimensional reconstruction of the face by adopting the system specifically comprises the following steps:
Step 1, acquiring expressionless image data and expressive image data of a modeler photographed at a plurality of angles, wherein the expressionless image data is selected from at least one of pictures and videos, and the expressive image data is selected from at least one of pictures and videos. The pictures or videos can be acquired with an ordinary mobile phone or camera. As a preferred approach, the video is acquired by recording 360° around the modeler's head or in a spiral around the patient's face. The collected expressive image (video) data covers expressions and motions such as nodding, shaking the head, blinking, smiling and opening the mouth.
Step 2, generating head point clouds by using the expression-free image data to obtain a neutral model; generating head point cloud by using the expressive image data to obtain a real expressive model;
Step 3, training a deformation field and a color field with the real expressive model, wherein the neutral model is passed through the color field and the deformation field to obtain the finally output model;
Step 4, obtaining a rendered image from the finally output model, and training the super-resolution neural network with the rendered image and the expressive image data of the modeler photographed at a plurality of angles. During training, image data output by the super-resolution neural network can additionally be added to the training data.
Step 5, training an expression encoder by using the non-expression image data and the expression image data;
Step 6, dragging the facial feature points of the patient in the non-expression image data, and then sending the facial feature points to the expression encoder, wherein the expression encoder encodes the coordinates of the dragged feature points into expression base vectors;
Step 7, the neutral model passes through the expression base vector and the finally output model to form a facial three-dimensional model with expression of a modeler;
And 8, obtaining a high-definition rendering chart by the facial three-dimensional model with the expression through a super-resolution network.
In step 2, the construction of the neutral model includes the following steps:
Step 2.1, preprocessing the expressionless image data (taking video data as an example);
Step 2.2, reconstructing camera poses with COLMAP, performing feature extraction and feature matching, and generating a sparse point cloud;
Step 2.3, training with the sparse point cloud as the initial point cloud of a 3D Gaussian Splatting algorithm to generate a head point cloud;
Step 2.4, inputting the head point cloud into DMTet to generate a 3D model, namely the neutral model.
In step 5, a data set is first constructed from the expressive video. As shown in fig. 1, each data item includes an expressionless face picture, an expressive face picture and the expressive facial feature points. In order to control expressions more finely, feature points are used to control the 3D model; an encoder is used to extract the expression features and construct the expression base vector. The training process of the expression encoder is shown in fig. 2 and specifically comprises the following steps:
inputting the expressionless image data and the facial feature points of the expressive image data, and obtaining an expression base vector through the expression encoder;
generating an expressive image and expressive facial feature points from the expression base vector, and computing losses against the real expressive image data and the facial feature points of the real expressive image data respectively, so that the expression encoder is continuously optimized.
The process of constructing the expression basis vector is expressed as:
Z_exp = E_exp(I_in, P_0)
wherein Z_exp is the expression base vector, E_exp is the expression encoder, I_in is the expressionless image data, and P_0 is the 68 facial feature points of the expressive image data. E_exp compresses its inputs into a 1024-dimensional feature vector.
In the process of training the expression encoder, the following loss function is adopted:
L_exp = L_gen + L_f1 + α·L_f2
wherein L_exp is the loss function for constructing the expression base vector, L_gen is the mean-square-error loss between each pixel of the newly generated image and the original image, L_f1 is the MSE loss between the facial feature points after feature recovery and the original facial feature points, L_f2 is the loss between the feature points of the generated expressive image and the original feature points, and α is a switch that gates the expressive-feature loss.
L_exp = ||I_in − I_out||_2 + ||P_0 − P_1||_2 + α·||P_0 − P_2||_2
α = 0.1 · I(L_gen < 0.1)
Here P_1 is the facial feature points generated through the feature recovery network (the "expressive feature points" in fig. 2), I_out is the result after decoding by the decoder, and P_2 is the facial feature points after the expression has been generated (the "expression + feature points" in fig. 2). It should be noted that, in the initial stage of network training, feature recognition cannot yet be performed on the generated expressive image, so α is set as a switch for the expressive-feature loss: I is an indicator function, meaning that the L_f2 loss is only enabled once L_gen falls below a certain threshold.
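A direct transcription of this training loss is sketched below; the only assumptions are that the squared norms are realized as mean-squared errors over pixels and feature points and that the switch is evaluated on a detached L_gen.

```python
import torch

def expression_encoder_loss(i_in, i_out, p0, p1, p2):
    """L_exp = L_gen + L_f1 + alpha * L_f2, with alpha = 0.1 * I(L_gen < 0.1):
    the expressive-feature term only contributes once the image reconstruction
    loss has fallen below the 0.1 threshold, as stated in the embodiment."""
    l_gen = torch.mean((i_in - i_out) ** 2)       # per-pixel MSE of the reconstructed image
    l_f1 = torch.mean((p0 - p1) ** 2)             # MSE between recovered and original feature points
    l_f2 = torch.mean((p0 - p2) ** 2)             # MSE between expressive and original feature points
    alpha = 0.1 * (l_gen.detach() < 0.1).float()  # indicator switch on the L_f2 term
    return l_gen + l_f1 + alpha * l_f2
```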
In step 4, the training process of the deformation field, the color field and the super-resolution network is shown in fig. 3, and the specific steps are as follows:
The neutral model and the expression base vector are passed through the deformation field and the color field to obtain an expressive sparse point cloud;
The low-resolution image rendered from the expressive sparse point cloud is passed through the super-resolution neural network to obtain a corresponding high-resolution image;
The neutral model comprises a neutral point cloud position vector and a neutral point cloud feature-point color vector, wherein the neutral point cloud position vector is a vector formed by the position coordinates of the expressive feature points whose positions change, and the neutral point cloud feature-point color vector is a vector formed by the color values of the expressive feature points whose positions change.
The structure of the deformation field and the color field is shown in fig. 4.
In the deformation field, the neutral point cloud position vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the position offsets of the corresponding points relative to the neutral model;
in the color field, the neutral point cloud position vector, the neutral point cloud feature-point color vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the color offsets of the corresponding points relative to the neutral model.
In training the deformation field and the color field, the following three loss functions are adopted:
L_RGB = ||I_rgb − I_gt||_1
L_sil = 1 − IOU(M, M_gt)
L_def = ||P − P_gt||_2
wherein L_RGB is the RGB loss, L_sil is the contour loss, L_def is the facial feature point loss, I_rgb is the RGB value output by the color field (i.e. the RGB values of the "high-resolution image" in fig. 3), I_gt is the RGB value of the corresponding point of the real expressive model (i.e. the RGB values of the "real expressive image" in fig. 3), M is the facial contour of the expressive sparse point cloud, M_gt is the facial contour of the real expressive model, IOU is the intersection-over-union of the two contours, P_gt is the position of a point of the real expressive model, P is the position of the corresponding point of the expressive sparse point cloud, the subscript 1 in the formula for L_RGB denotes the L1 loss function, and the subscript 2 in the formula for L_def denotes the L2 loss function;
P = P_0 + f_def(P_0, Z_exp)
wherein P_0 is the position of the corresponding point of the neutral point cloud, Z_exp is the expression base vector, and f_def is the deformation field;
in the process of training the deformation field and the color field, a prior constraint L_offset is introduced; L_offset penalizes all non-zero displacements:
L_offset = λ_1 · counter(f_def(P_0, Z_exp) = 0)
wherein λ_1 is a weight that scales the loss, and counter(f_def(P_0, Z_exp) = 0) denotes counting, over the feature points, those whose offset is zero.
In the process of training the super-resolution neural network, the following loss function is adopted:
L = λ_2·||I_hr − I_gt||_1 + (1 − λ_2)·||I_lr − I_gt||_1
wherein I_hr is the high-definition image after resolution improvement by the super-resolution neural network, I_lr is the low-resolution image rendered from the point cloud, I_gt is the real expressive image, and λ_2 is a weight.
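A sketch of this loss is shown below. λ_2 = 0.8 is only an assumed value, and the real expressive image is bilinearly downsampled for the low-resolution term, which is one way of making the two L1 norms comparable; neither choice is specified by the disclosure.

```python
import torch
import torch.nn.functional as F

def super_resolution_loss(i_hr, i_lr, i_gt, lam2=0.8):
    """L = lam2 * ||I_hr - I_gt||_1 + (1 - lam2) * ||I_lr - I_gt||_1."""
    # Downsample the real expressive image to the low-resolution size (assumption).
    i_gt_lr = F.interpolate(i_gt, size=i_lr.shape[-2:], mode="bilinear", align_corners=False)
    return lam2 * (i_hr - i_gt).abs().mean() + (1 - lam2) * (i_lr - i_gt_lr).abs().mean()
```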
The model constructed through the above process is used via the dynamic adjustment module as follows:
The user can drag the feature points on the expressionless picture of the modeler of the facial three-dimensional model, and the dynamic adjustment module generates a new expression base according to the new feature point positions, so that a new facial three-dimensional model is generated under the control of the deformation field and the color field.
As a preferred mode, for convenience of use, the dynamic adjustment module can train and integrate a GAN to generate feature points for specified expressions or face shapes and turn them into option buttons; the user clicks a button to generate the corresponding feature points and can then drag and fine-tune them with the mouse, thereby performing quick facial editing.
Preferably, in order to show feature point changes in a dynamic process, the dynamic adjustment module can integrate a Transformer, which is used to generate sequences of feature point changes, so that the dynamic facial three-dimensional model can be observed directly.
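The sketch below shows how these pieces could be chained in an editing loop: dragged (or GAN/Transformer-generated) feature points are encoded into a new expression base, the deformation and color fields update the neutral point cloud, and the rendered low-resolution image is upscaled by the super-resolution network. All callables here are assumed interfaces used for illustration, not APIs defined by this disclosure.

```python
import torch

@torch.no_grad()
def edit_face(encoder, deform_field, color_field, renderer, sr_net,
              neutral_image, neutral_xyz, neutral_colors, dragged_points):
    """Illustrative dynamic-adjustment loop (all module interfaces are assumptions)."""
    z_exp = encoder(neutral_image, dragged_points)             # new expression base vector
    new_xyz = deform_field(neutral_xyz, z_exp)                 # P = P_0 + f_def(P_0, Z_exp)
    new_rgb = color_field(neutral_xyz, neutral_colors, z_exp)  # colors for the new expression
    low_res = renderer(new_xyz, new_rgb)                       # render the expressive point cloud
    return sr_net(low_res)                                     # high-definition output image
```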
According to this embodiment, the facial three-dimensional reconstruction method based on neural radiance field technology can generate an accurate, natural, attractive and realistic facial three-dimensional model, can be adjusted flexibly and conveniently for personalization, and has good application prospects in the medical and healthcare fields.
Claims (7)
1. An editable face three-dimensional reconstruction method is characterized by comprising the following steps:
Obtaining modeler non-expression image data and expression image data shot at a plurality of angles;
Generating a head point cloud by using the expressionless image data to obtain a neutral model;
generating head point cloud by using the expressive image data to obtain a real expressive model;
training a deformation field and a color field by using a true expression model;
the neutral model obtains a model which is finally output through a color field and a deformation field;
Obtaining a rendered image by utilizing the finally output model, and training a super-resolution neural network by using the rendered image and the expressive image data;
training an expression encoder using the non-expressive image data and the expressive image data;
After the facial feature points of the patient in the non-expression image data are dragged, the facial feature points are sent to the expression encoder, and the coordinate of the dragged feature points is encoded into an expression base vector by the expression encoder;
The expression base vector and the finally output model form a facial three-dimensional model with expression of a modeler;
the facial three-dimensional model with the expression obtains a high-definition rendering chart through a super-resolution network;
the training process of the super-resolution network comprises the following steps:
The neutral model and the expression base vector obtain an expressed sparse point cloud through a deformation field and a color field;
The low-resolution image rendered by the expression sparse point cloud obtains a corresponding high-resolution image through a super-resolution neural network;
the training process of the deformation field and the color field is as follows:
The neutral model comprises a neutral point cloud position vector and a neutral point cloud characteristic point color vector, wherein the neutral point cloud position vector is a vector formed by position coordinates of expressive characteristic points with position changes;
In the deformation field, the neutral point cloud position vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the position offsets of the corresponding points relative to the neutral model;
in the color field, the neutral point cloud position vector, the neutral point cloud feature-point color vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the color offsets of the corresponding points relative to the neutral model;
in the training of deformation fields and color fields, the following three loss functions are adopted:
L_RGB = ||I_rgb − I_gt||_1
L_sil = 1 − IOU(M, M_gt)
L_def = ||P − P_gt||_2
wherein L_RGB is the RGB loss, L_sil is the contour loss, L_def is the facial feature point loss, I_rgb is the RGB value output by the color field, I_gt is the RGB value of the corresponding point of the real expressive model, M is the facial contour of the point cloud, M_gt is the facial contour of the real expressive model, IOU is the intersection-over-union of the two contours, P_gt is the position of a point of the real expressive model, P is the position of the corresponding point of the point cloud, the subscript 1 in the formula for L_RGB denotes the L1 loss function, and the subscript 2 in the formula for L_def denotes the L2 loss function;
P = P_0 + f_def(P_0, Z_exp)
wherein P_0 is the position of the corresponding point of the neutral point cloud, Z_exp is the expression base vector, and f_def is the deformation field;
in the process of training the deformation field and the color field, a prior constraint L_offset is introduced; L_offset penalizes all non-zero displacements:
L_offset = λ_1 · counter(f_def(P_0, Z_exp) = 0)
wherein λ_1 is a weight that scales the loss, and counter(f_def(P_0, Z_exp) = 0) denotes counting, over the feature points, those whose offset is zero;
in the process of training the super-resolution neural network, the following loss function is adopted:
L = λ_2·||I_hr − I_gt||_1 + (1 − λ_2)·||I_lr − I_gt||_1
wherein I_hr is the high-definition image after resolution improvement by the super-resolution neural network, I_lr is the low-resolution image rendered from the point cloud, I_gt is the real expressive image, and λ_2 is a weight;
the specific process of constructing the expression base vector is expressed as follows:
Z_exp = E_exp(I_in, P_0)
wherein Z_exp is the expression base vector, E_exp is the expression encoder, I_in is the expressionless image data, and P_0 is the 68 facial feature points of the expressive image data.
2. The method of three-dimensional reconstruction of an editable face of claim 1, wherein the non-expressive image data is selected from at least one of a picture or a video, and wherein the expressive image data is selected from at least one of a picture or a video.
3. The method for three-dimensional reconstruction of an editable face as defined in claim 1, wherein:
The training process of the expression encoder is as follows:
Inputting facial feature points in the non-expression image data and the expression image data, and obtaining expression base vectors through an expression encoder;
Generating an expressive image and expressive facial feature points through the expression base vectors, and carrying out loss calculation on the expressive image and the expressive facial feature points and the facial feature points of the true expressive image data and the true expressive image data respectively, so that the expression encoder is optimized continuously;
in the process of training the expression encoder, the following loss function is adopted:
L_exp = L_gen + L_f1 + α·L_f2
wherein L_exp is the loss function for constructing the expression base vector, L_gen is the mean-square-error loss between each pixel of the newly generated image and the original image, L_f1 is the MSE loss between the facial feature points after feature recovery and the original facial feature points, L_f2 is the loss between the feature points of the generated expressive image and the original feature points, and α is a switch that gates the expressive-feature loss.
4. A system for implementing the editable face three-dimensional reconstruction method of any one of claims 1 to 3, comprising:
An input module configured to acquire a series of non-expressive image data and expressive image data of a modeler;
a neutral model construction module configured to generate a head point cloud using the expressionless image data to obtain a neutral model;
the true expressive model construction module is configured to generate head point clouds by using the expressive image data to obtain a true expressive model;
A deformation field and color field training module configured to train the deformation field and color field using the true expressive model;
The final output model training module is configured to obtain a final output model through a color field and a deformation field by the neutral model;
A super-resolution neural network module configured to obtain a rendered image using the finally output model, train a super-resolution neural network using the rendered image and the expressive image data;
The expression base vector construction module is configured to drag the facial feature points of the patient in the non-expression image data, and then send the facial feature points into the expression encoder, and the expression encoder encodes the dragged feature point coordinates into an expression base vector;
A facial three-dimensional model construction module configured such that the expression base vector and the finally output model constitute a modeler's expressive facial three-dimensional model;
the high-definition rendering module is configured to obtain a high-definition rendering diagram of the facial three-dimensional model with the expression through a super-resolution network;
the training process of the deformation field, the color field and the super-resolution network is as follows:
The neutral model and the expression base vector obtain an expressed sparse point cloud through a deformation field and a color field;
The low-resolution image rendered by the expression sparse point cloud obtains a corresponding high-resolution image through a super-resolution neural network;
the training process of the deformation field and the color field is as follows:
The neutral model comprises a neutral point cloud position vector and a neutral point cloud characteristic point color vector, wherein the neutral point cloud position vector is a vector formed by position coordinates of expressive characteristic points with position changes;
In the deformation field, the neutral point cloud position vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the position offsets of the corresponding points relative to the neutral model;
in the color field, the neutral point cloud position vector, the neutral point cloud feature-point color vector and the expression base vector are reshaped into matrices and concatenated along the channel dimension to form a tensor with 4 channels; the tensor undergoes feature extraction and fusion through a ResNet, and finally three fully connected layers map out the color offsets of the corresponding points relative to the neutral model;
in the training of deformation fields and color fields, the following three loss functions are adopted:
L_RGB = ||I_rgb − I_gt||_1
L_sil = 1 − IOU(M, M_gt)
L_def = ||P − P_gt||_2
wherein L_RGB is the RGB loss, L_sil is the contour loss, L_def is the facial feature point loss, I_rgb is the RGB value output by the color field, I_gt is the RGB value of the corresponding point of the real expressive model, M is the facial contour of the point cloud, M_gt is the facial contour of the real expressive model, IOU is the intersection-over-union of the two contours, P_gt is the position of a point of the real expressive model, P is the position of the corresponding point of the point cloud, the subscript 1 in the formula for L_RGB denotes the L1 loss function, and the subscript 2 in the formula for L_def denotes the L2 loss function;
P = P_0 + f_def(P_0, Z_exp)
wherein P_0 is the position of the corresponding point of the neutral point cloud, Z_exp is the expression base vector, and f_def is the deformation field;
in the process of training the deformation field and the color field, a prior constraint L_offset is introduced; L_offset penalizes all non-zero displacements:
L_offset = λ_1 · counter(f_def(P_0, Z_exp) = 0)
wherein λ_1 is a weight that scales the loss, and counter(f_def(P_0, Z_exp) = 0) denotes counting, over the feature points, those whose offset is zero;
in the process of training the super-resolution neural network, the following loss function is adopted:
L = λ_2·||I_hr − I_gt||_1 + (1 − λ_2)·||I_lr − I_gt||_1
wherein I_hr is the high-definition image after resolution improvement by the super-resolution neural network, I_lr is the low-resolution image rendered from the point cloud, I_gt is the real expressive image, and λ_2 is a weight;
the specific process of constructing the expression base vector is expressed as follows:
Z_exp = E_exp(I_in, P_0)
wherein Z_exp is the expression base vector, E_exp is the expression encoder, I_in is the expressionless image data, and P_0 is the 68 facial feature points of the expressive image data.
5. The system as recited in claim 4, further comprising:
and the dynamic adjustment module is configured to change the positions of the characteristic points, adjust the facial three-dimensional model and generate a new facial three-dimensional model.
6. The system of claim 5, wherein the process of generating a new three-dimensional model of the face in the dynamic adjustment module includes generating a new expression base based on the new feature point locations, generating a new three-dimensional model of the face using the deformation field and the color field;
or, in the dynamic adjustment module, generating feature points corresponding to at least one expression or facial form through a GAN model, wherein the feature points corresponding to the at least one expression or facial form are integrated into at least one option;
Or, in the dynamic adjustment module, generating a sequence of feature point changes through a Transformer model, and generating the corresponding sequence of facial three-dimensional model changes according to the sequence of feature point changes.
7. A computer-readable storage medium, having stored thereon a computer program for implementing the editable face three-dimensional reconstruction method according to any one of claims 1 to 3, or a computer program for implementing the system according to any one of claims 4 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202411742013.6A (CN119206101B) | 2024-11-29 | 2024-11-29 | Editable facial three-dimensional reconstruction method, system and storage medium
Publications (2)
Publication Number | Publication Date
---|---
CN119206101A | 2024-12-27
CN119206101B | 2025-03-25
Family
- ID: 94061597
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant