JP2024522287A

JP2024522287A - 3D human body reconstruction method, apparatus, device and storage medium

Info

Publication number: JP2024522287A
Application number: JP2023574335A
Authority: JP
Inventors: 勃宇宋; 又▲銘▼ ▲デン▼; 文▲タオ▼ ▲劉▼; 晨 ▲錢▼
Original assignee: Shenzhen TetrasAI Technology Co Ltd
Current assignee: Shenzhen TetrasAI Technology Co Ltd
Priority date: 2021-03-31
Filing date: 2021-08-27
Publication date: 2024-06-13
Also published as: CN113012282A; CN113012282B; WO2022205760A1

Abstract

The embodiments of the present invention provide a 3D human body reconstruction method, apparatus, device and storage medium. The method may include: performing human body geometric reconstruction based on a human body image of a target human body to obtain a 3D mesh model of the target human body; performing local geometric reconstruction on a local part of the target human body based on the human body image to obtain a 3D mesh model of the local part; fusing the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model; and performing human body texture reconstruction based on the initial 3D model and the human body image to obtain a 3D human body model of the target human body. According to the embodiments of the present invention, the local part in the 3D mesh model of the target human body becomes clearer and more accurate, which improves the reconstruction quality of the local part.

Description

[Cross-reference to related applications]
This application claims priority to Chinese patent application No. 202110352199.4, filed on March 31, 2021 and entitled "Three-dimensional human body reconstruction method, apparatus, device and storage medium", the contents of which are incorporated herein by reference.

The present invention relates to image processing technology, and more specifically to a three-dimensional human body reconstruction method, apparatus, device and storage medium.

3D human body reconstruction is an important problem in computer vision and computer graphics. Reconstructed digital human body models have important applications in many fields, such as anthropometry, virtual fitting, virtual live streamers, custom design of game characters, and virtual reality social networking. A key question is how to project a real-world human body into the virtual world to obtain a 3D digital human body model. However, digital 3D human body reconstruction is complicated. The operator must scan the subject continuously from multiple angles without blind spots, and even then the result often lacks fine detail in local regions.

In view of this, embodiments of the present invention provide at least a three-dimensional human body reconstruction method, apparatus, device and storage medium.

A first aspect provides a three-dimensional human body reconstruction method, the method comprising:
performing human body geometric reconstruction based on a human body image of a target human body to obtain a 3D mesh model of the target human body;
performing local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a 3D mesh model of the local part;
fusing the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model; and
reconstructing a human body texture of the target human body based on the initial 3D model and the human body image to obtain a 3D human body model of the target human body.

In one example, performing human body geometric reconstruction based on the human body image of the target human body to obtain the 3D mesh model of the target human body includes the following steps. First, 3D reconstruction is performed on the human body image via a first deep neural network branch to obtain a first human body model. Second, 3D reconstruction is performed on a local image within the human body image via a second deep neural network branch to obtain a second human body model; the local image contains a local region of the target human body. Third, the first human body model and the second human body model are fused to obtain a fused human body model. Finally, the fused human body model is meshed to obtain the 3D mesh model of the target human body.
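The text does not say how the first and second human body models are represented, fused, or meshed. The following is a minimal sketch that assumes, hypothetically, that both models are occupancy-probability grids on the same voxel lattice, in the spirit of PIFu-style implicit models. Inside the region covered by the local image, the fine local prediction overwrites the coarse one. The fused grid is then meshed by emitting every boundary face of the occupied voxels. A production system would more likely use marching cubes; the simpler cube-face meshing here only keeps the example free of dependencies beyond NumPy. `fuse_occupancy` and `voxel_surface_mesh` are illustrative names.

```python
import numpy as np

# For each of the 6 axis-aligned neighbour directions, the 4 corners (relative
# to the voxel's minimum corner) of the voxel face shared with that neighbour.
_FACE_CORNERS = {
    (-1, 0, 0): [(0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)],
    (1, 0, 0): [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    (0, -1, 0): [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],
    (0, 1, 0): [(0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)],
    (0, 0, -1): [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    (0, 0, 1): [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],
}

def fuse_occupancy(coarse, fine, local_mask):
    """Keep the coarse whole-body occupancy, but use the fine local prediction
    wherever local_mask is True. All arrays share one (D, H, W) voxel lattice."""
    fused = coarse.astype(float)
    fused[local_mask] = fine[local_mask]
    return fused

def voxel_surface_mesh(occupancy, threshold=0.5):
    """Mesh the occupied region by emitting every voxel face that borders an
    empty voxel, split into two triangles. Triangle winding is not made
    consistent. Returns (vertices (V, 3), faces (F, 3))."""
    occ = occupancy >= threshold
    padded = np.pad(occ, 1, constant_values=False)  # empty border simplifies lookups
    index_of, vertices, faces = {}, [], []

    def vertex_id(corner):
        # Deduplicate corners shared by neighbouring faces.
        if corner not in index_of:
            index_of[corner] = len(vertices)
            vertices.append(corner)
        return index_of[corner]

    for i, j, k in zip(*np.nonzero(occ)):
        for (di, dj, dk), corners in _FACE_CORNERS.items():
            if padded[i + 1 + di, j + 1 + dj, k + 1 + dk]:
                continue  # neighbour is occupied, so this face is interior
            q = [vertex_id((int(i) + a, int(j) + b, int(k) + c)) for a, b, c in corners]
            faces += [(q[0], q[1], q[2]), (q[0], q[2], q[3])]
    return (np.array(vertices, dtype=float).reshape(-1, 3),
            np.array(faces, dtype=int).reshape(-1, 3))
```

For example, a single occupied voxel yields a cube with 8 vertices and 12 triangles.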

In one example, the first deep neural network branch includes a global feature sub-network and a first fitting sub-network, and the second deep neural network branch includes a local feature sub-network and a second fitting sub-network. Obtaining the first human body model via the first branch includes two steps: extracting features from the human body image via the global feature sub-network to obtain a first image feature, and obtaining the first human body model from the first image feature via the first fitting sub-network. Obtaining the second human body model via the second branch also includes two steps: extracting features from the local image via the local feature sub-network to obtain a second image feature, and obtaining the second human body model via the second fitting sub-network. The second fitting sub-network uses both the second image feature and an intermediate feature output by the first fitting sub-network.
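The patent leaves the sub-network internals unspecified. The NumPy sketch below is loosely modeled on coarse-to-fine implicit-function designs such as PIFuHD; that is an assumption, not the disclosed architecture. It shows only the data flow. The first fitting sub-network maps the global image feature at each 3D query point to an occupancy value and exposes its last hidden layer as the "intermediate feature". The second fitting sub-network concatenates that intermediate feature with the local image feature. Weights are random, feature extraction and training are omitted, and all names are hypothetical.

```python
import numpy as np

def mlp_init(sizes, rng):
    """Random weights for a fully connected network with the given layer sizes."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """Run the MLP; return (sigmoid occupancy, activation of the last hidden layer)."""
    hidden = x
    for W, b in params[:-1]:
        hidden = np.maximum(hidden @ W + b, 0.0)  # ReLU
    W, b = params[-1]
    occupancy = 1.0 / (1.0 + np.exp(-(hidden @ W + b)))
    return occupancy, hidden

def two_branch_occupancy(first_feat, second_feat, depth, first_fit, second_fit):
    """first_feat:  (N, Cg) first image feature sampled at N 3D query points
    second_feat: (N, Cl) second image feature sampled at the same points
    depth:       (N, 1)  depth of each query point along the camera axis
    """
    # First fitting sub-network: coarse occupancy plus an intermediate feature.
    coarse, intermediate = mlp_forward(
        first_fit, np.concatenate([first_feat, depth], axis=1))
    # Second fitting sub-network: local feature concatenated with the intermediate feature.
    fine, _ = mlp_forward(
        second_fit, np.concatenate([second_feat, intermediate], axis=1))
    return coarse, fine

# Toy sizes; the last hidden width of first_fit (16) is part of second_fit's input width.
rng = np.random.default_rng(0)
N, Cg, Cl = 5, 8, 4
first_fit = mlp_init([Cg + 1, 32, 16, 1], rng)
second_fit = mlp_init([Cl + 16, 32, 1], rng)
coarse, fine = two_branch_occupancy(
    rng.random((N, Cg)), rng.random((N, Cl)), rng.random((N, 1)), first_fit, second_fit)
```

Reusing the coarse branch's hidden activations lets the fine branch focus on local detail while inheriting the global shape context.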

In one example, performing local geometric reconstruction on the local part of the target human body based on the human body image to obtain the 3D mesh model of the local part includes two steps. First, features are extracted from the human body image of the target human body to obtain a third image feature. Second, the 3D mesh model of the local part is determined based on the third image feature and a 3D topology template of the local part.
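One plausible reading of "third image feature plus a 3D topology template" is a morphable-model-style deformation. The sketch below assumes, hypothetically, that image features are regressed linearly to a few shape coefficients, which then displace the template's vertices along a fixed basis. The key property it illustrates is that the output mesh reuses the template's faces, so the local part's topology stays fixed and known. The regressor `W, b` and the `basis` are illustrative stand-ins for learned components.

```python
import numpy as np

def reconstruct_local_mesh(feature, template_vertices, template_faces, basis, W, b):
    """Deform a fixed-topology template using coefficients regressed from image features.

    feature:           (C,)      third image feature of the local part
    template_vertices: (V, 3)    mean shape of the topology template
    template_faces:    (F, 3)    triangle indices, reused unchanged
    basis:             (K, V, 3) per-vertex deformation basis
    W, b:              (C, K), (K,) hypothetical linear regressor
    """
    coeffs = feature @ W + b                       # (K,) shape coefficients
    offsets = np.tensordot(coeffs, basis, axes=1)  # (V, 3) per-vertex displacement
    return template_vertices + offsets, template_faces
```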

In one example, fusing the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain the initial 3D model includes three steps. First, a plurality of keypoints of the local part are obtained based on the human body image of the target human body. Second, two sets of information are determined: information of first model keypoints, which correspond to the plurality of keypoints in the 3D mesh model of the target human body, and information of second model keypoints, which correspond to the plurality of keypoints in the 3D mesh model of the local part. Third, the 3D mesh model of the local part is fused with the 3D mesh model of the target human body based on the information of the first and second model keypoints to obtain the initial 3D model.
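The text does not say how image keypoints are mapped to model keypoints. A simple assumption is to project every mesh vertex into the image with a weak-perspective camera and take the vertex whose projection lies nearest each detected 2D keypoint. Applying this once to the body mesh gives the first model keypoints, and once to the local-part mesh gives the second, each with its own camera parameters. This hypothetical helper ignores visibility, so an occluded vertex could be chosen. For a fixed-topology face template, the landmark vertex indices could instead be fixed in advance.

```python
import numpy as np

def find_model_keypoints(vertices, keypoints_2d, scale=1.0, translation=(0.0, 0.0)):
    """Return indices and 3D positions of the mesh vertices whose projections
    lie closest to the given 2D keypoints.

    vertices:     (V, 3) mesh vertices
    keypoints_2d: (K, 2) keypoints detected in the human body image
    scale, translation: assumed weak-perspective camera, uv = scale * (x, y) + translation
    """
    projected = scale * vertices[:, :2] + np.asarray(translation, dtype=float)
    # (K, V) image-plane distance between every keypoint and every projected vertex
    dist = np.linalg.norm(keypoints_2d[:, None, :] - projected[None, :, :], axis=2)
    idx = dist.argmin(axis=1)
    return idx, vertices[idx]
```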

In one example, fusing the two 3D mesh models based on the information of the first and second model keypoints to obtain the initial 3D model includes three steps. First, a coordinate transformation relationship between the 3D mesh model of the target human body and the 3D mesh model of the local part is determined based on the information of the first and second model keypoints. Second, the 3D mesh model of the local part is transformed into the coordinate system of the 3D mesh model of the target human body according to that relationship. Third, the two 3D mesh models are fused in that common coordinate system to obtain the initial 3D model.
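The form of the "coordinate transformation relationship" is not specified. The sketch below assumes a similarity transform (scale s, rotation R, translation t) and estimates it in closed form with Umeyama's least-squares method. The source points are the second model keypoints on the local-part mesh, and the destination points are the first model keypoints on the body mesh. It covers only the alignment; how the aligned meshes are then stitched is not shown.

```python
import numpy as np

def similarity_transform(src, dst):
    """Least-squares s, R, t such that dst ≈ s * R @ src_i + t (Umeyama's method).

    src, dst: (K, 3) corresponding keypoints, K >= 3 and not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                # cross-covariance of dst and src
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                        # avoid returning a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)        # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def transform_vertices(vertices, s, R, t):
    """Map (V, 3) local-part vertices into the body mesh's coordinate system."""
    return s * vertices @ R.T + t
```

Once `s, R, t` are estimated from the keypoints, the same transform is applied to every vertex of the local-part mesh.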

In one example, the human body image includes a front texture of the target human body and a background image. Reconstructing the human body texture of the target human body based on the initial 3D model and the human body image to obtain the 3D human body model includes three steps. First, human body segmentation is performed on the human body image to obtain a first segmentation mask, a second segmentation mask, and the front texture of the target human body. Second, the front texture, the first segmentation mask, and the second segmentation mask are input into a texture generation network to obtain a back texture of the target human body. Third, a textured 3D human body model corresponding to the target human body is obtained based on the back texture and the front texture. The first segmentation mask corresponds to the mask region of the front texture, and the second segmentation mask corresponds to the mask region of the back texture of the target human body.
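Only the network's inputs and outputs are described here, so the following sketch rests on three assumptions. The network input is taken to be the front texture stacked channel-wise with the two masks. The generated back texture is assumed to be an image of the back as seen from behind, aligned with the front image up to a left-right mirror. The "textured model" is represented by per-vertex colors: camera-facing vertices sample the front texture, and all others sample the back texture. Projection uses an orthographic camera whose vertex x, y coordinates are already in pixels. The texture generation network itself is omitted, and both function names are hypothetical.

```python
import numpy as np

def texture_net_input(front_texture, front_mask, back_mask):
    """Stack the front texture (H, W, 3) and both masks (H, W) into an (H, W, 5) input."""
    return np.concatenate(
        [front_texture, front_mask[..., None], back_mask[..., None]], axis=-1
    ).astype(np.float32)

def color_vertices(vertices, normals, front_texture, back_texture):
    """Per-vertex colors from the front and generated back textures.

    Assumes the camera looks along -z, so normals with n_z > 0 face the camera,
    and that back_texture is mirrored left-right relative to front_texture.
    """
    H, W, _ = front_texture.shape
    cols = np.clip(np.round(vertices[:, 0]).astype(int), 0, W - 1)
    rows = np.clip(np.round(vertices[:, 1]).astype(int), 0, H - 1)
    front = front_texture[rows, cols]
    back = back_texture[rows, W - 1 - cols]   # undo the left-right mirror
    facing = normals[:, 2] > 0
    return np.where(facing[:, None], front, back)
```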

In one example, training the texture generation network includes three processes. First, human body segmentation is performed on images of human body samples in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a front texture of each human body sample. Second, an assisting texture generation network is trained based on a third sample segmentation mask, a fourth sample segmentation mask, and the front texture of the human body in an assisting human body image, which is obtained by reducing the resolution of the human body sample image. Third, after training of the assisting texture generation network is complete, the texture generation network is trained based on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask. The four masks correspond to the following mask regions:
the first sample segmentation mask corresponds to the front texture of the human body sample;
the second sample segmentation mask corresponds to the back texture of the human body sample;
the third sample segmentation mask corresponds to the front texture of the human body in the assisting human body image;
the fourth sample segmentation mask corresponds to the back texture of the human body in the assisting human body image.
The network parameters of the texture generation network include at least some of the network parameters of the trained assisting texture generation network.
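The training description implies a coarse-to-fine schedule. An assisting network is first trained on reduced-resolution images, and its trained parameters then seed the full-resolution texture generation network. The sketch below assumes 2x average pooling for the resolution reduction and represents each network's parameters as a dict of NumPy arrays keyed by layer name. Parameters are reused wherever name and shape match, which holds for layers whose weights do not depend on image size, such as convolution kernels. The names and the pooling choice are hypothetical, and the training loops are omitted.

```python
import numpy as np

def downsample(image, factor=2):
    """Reduce resolution by average pooling over factor x factor blocks.
    Works for (H, W) masks and (H, W, C) images; any remainder rows or columns are cropped."""
    H2, W2 = image.shape[0] // factor, image.shape[1] // factor
    img = image[:H2 * factor, :W2 * factor]
    return img.reshape(H2, factor, W2, factor, *image.shape[2:]).mean(axis=(1, 3))

def init_from_assist(target_params, assist_params):
    """Copy every trained assisting-network parameter whose name and shape match
    the target network; keep the target's own initial values otherwise."""
    new_params, copied = {}, []
    for name, value in target_params.items():
        src = assist_params.get(name)
        if src is not None and src.shape == value.shape:
            new_params[name] = src.copy()
            copied.append(name)
        else:
            new_params[name] = value.copy()
    return new_params, copied
```

If the low-resolution masks are derived by pooling rather than by re-segmenting the assisting image, they can be re-binarized, for example `downsample(mask.astype(float)) >= 0.5`.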

In one example, the local part of the target human body is the face of the target human body, and/or the human body image is an RGB image.

In one example, the 3D human body reconstruction method further includes two steps. First, a skeletal structure of the target human body is obtained while performing human body geometric reconstruction based on the human body image. Second, after the 3D human body model of the target human body is obtained, skinning weights for driving the 3D human body model are determined based on the 3D human body model and the skeletal structure.
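The text does not say how the skinning weights are computed. As a stand-in, not the patent's method, the sketch below assigns each vertex normalized inverse-distance weights to the bones of the skeleton. Each bone is treated as a line segment between two joints. The mesh is then driven with standard linear blend skinning. Learned or diffusion-based weights are common in practice, and `power` and `eps` are arbitrary illustrative parameters.

```python
import numpy as np

def point_segment_distance(points, a, b):
    """Distance from each of (V, 3) points to the segment a-b."""
    ab = b - a
    t = np.clip(((points - a) @ ab) / (ab @ ab), 0.0, 1.0)
    closest = a + t[:, None] * ab
    return np.linalg.norm(points - closest, axis=1)

def skinning_weights(vertices, joints, bones, power=4.0, eps=1e-8):
    """(V, B) weights that sum to 1 per vertex; bones is a list of (parent, child) joint indices."""
    d = np.stack([point_segment_distance(vertices, joints[p], joints[c])
                  for p, c in bones], axis=1)
    w = 1.0 / (d + eps) ** power          # nearer bones get much larger weights
    return w / w.sum(axis=1, keepdims=True)

def linear_blend_skinning(vertices, weights, transforms):
    """Pose vertices with per-bone (B, 4, 4) rigid transforms blended by the weights."""
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)
    per_bone = np.einsum('bij,vj->vbi', transforms, homo)                   # (V, B, 4)
    blended = (weights[:, :, None] * per_bone).sum(axis=1)
    return blended[:, :3]
```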

A second aspect provides a three-dimensional human body reconstruction apparatus, comprising:
a global reconstruction module configured to perform human body geometric reconstruction based on a human body image of a target human body to obtain a 3D mesh model of the target human body;
a local reconstruction module configured to perform local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a 3D mesh model of the local part;
a fusion processing module configured to fuse the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model; and
a texture reconstruction module configured to reconstruct a human body texture of the target human body based on the initial 3D model and the human body image to obtain a 3D human body model of the target human body.
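The four modules map directly onto the four method steps. Below is a minimal sketch of that wiring, with each module injected as a callable. The class and field names are hypothetical, and the module implementations are left to the embodiments. The image is passed to the fusion module because the keypoints used for fusion are obtained from the human body image.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class HumanBodyReconstructor:
    global_reconstruction: Callable[[Any], Any]        # image -> body mesh
    local_reconstruction: Callable[[Any], Any]         # image -> local-part mesh
    fusion: Callable[[Any, Any, Any], Any]             # (local mesh, body mesh, image) -> initial model
    texture_reconstruction: Callable[[Any, Any], Any]  # (initial model, image) -> textured model

    def __call__(self, image: Any) -> Any:
        body_mesh = self.global_reconstruction(image)
        local_mesh = self.local_reconstruction(image)
        initial_model = self.fusion(local_mesh, body_mesh, image)
        return self.texture_reconstruction(initial_model, image)
```

Injecting the modules keeps each stage independently replaceable, for example when swapping the local part from the face to a hand.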

In one example, the global reconstruction module is specifically configured to do the following: perform 3D reconstruction on the human body image of the target human body via a first deep neural network branch to obtain a first human body model; perform 3D reconstruction on a local image within the human body image via a second deep neural network branch to obtain a second human body model; fuse the first and second human body models to obtain a fused human body model; and mesh the fused human body model to obtain the 3D mesh model of the target human body. The local image contains a local region of the target human body.

In one example, the local reconstruction module is specifically configured to extract features from the human body image of the target human body to obtain a third image feature, and to determine the 3D mesh model of the local part based on the third image feature and a 3D topology template of the local part.

In one example, the fusion processing module is specifically configured to do the following: obtain a plurality of keypoints of the local part based on the human body image of the target human body; determine information of first model keypoints corresponding to the plurality of keypoints in the 3D mesh model of the target human body, and information of second model keypoints corresponding to the plurality of keypoints in the 3D mesh model of the local part; and fuse the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first and second model keypoints to obtain the initial 3D model.

In one example, when fusing the two 3D mesh models based on the information of the first and second model keypoints, the fusion processing module is specifically configured to do the following: determine a coordinate transformation relationship between the 3D mesh model of the target human body and the 3D mesh model of the local part based on that information; transform the 3D mesh model of the local part into the coordinate system of the 3D mesh model of the target human body according to that relationship; and fuse the two 3D mesh models in that common coordinate system to obtain the initial 3D model.

In one example, the texture reconstruction module is specifically configured to do the following: perform human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask, and the front texture of the target human body; input the front texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain a back texture of the target human body; and obtain a textured 3D human body model corresponding to the target human body based on the back texture and the front texture. The first segmentation mask corresponds to the mask region of the front texture, and the second segmentation mask corresponds to the mask region of the back texture of the target human body.

In one example, the 3D human body reconstruction apparatus further includes a model training module for training the texture generation network. The model training module is specifically configured to do the following:
perform human body segmentation on images of human body samples in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a front texture of each human body sample;
train an assisting texture generation network based on a third sample segmentation mask, a fourth sample segmentation mask, and the front texture of the human body in an assisting human body image, which is obtained by reducing the resolution of the human body sample image; and
after training of the assisting texture generation network is complete, train the texture generation network based on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask.
The first and second sample segmentation masks correspond to the mask regions of the front and back textures of the human body sample, respectively. The third and fourth sample segmentation masks correspond to the mask regions of the front and back textures of the human body in the assisting human body image, respectively. The network parameters of the texture generation network include at least some of the network parameters of the trained assisting texture generation network.

A third aspect provides an electronic device comprising a memory and a processor. The memory is configured to store computer-readable instructions, and the processor is configured to invoke those instructions to perform the method according to any embodiment of the present invention.

A fourth aspect provides a computer-readable storage medium storing a computer program that, when executed by a processor, performs the method according to any embodiment of the present invention.

A fifth aspect provides a computer program product comprising a computer program that, when executed by a processor, performs the method according to any embodiment of the present invention.

The embodiments of the present invention provide a three-dimensional human body reconstruction method, apparatus, device and storage medium. Local geometric reconstruction is performed on a local part of the target human body, and the resulting 3D mesh model of the local part is fused with the 3D mesh model of the target human body. As a result, the local part in the 3D mesh model of the target human body becomes clearer, more detailed, and more accurate, which improves its reconstruction quality. In addition, because reconstruction can be performed from a single human body image of the target human body, the user's cooperation procedure is simplified, making 3D human body reconstruction more convenient.

To describe the technical solutions in one or more embodiments of the present invention or in the related art more clearly, the drawings needed for that description are briefly introduced below. These drawings show only some of the embodiments described in one or more embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 shows a flowchart of a 3D human body reconstruction method according to at least one embodiment of the present invention.
FIG. 2 shows a schematic diagram of a scheme for obtaining a 3D mesh model from a single human body image according to at least one embodiment of the present invention.
FIG. 3 shows a schematic diagram of a procedure for obtaining an initial 3D model according to at least one embodiment of the present invention.
FIG. 4 shows a schematic diagram of a texture reconstruction procedure according to at least one embodiment of the present invention.
FIG. 5 shows a schematic diagram of a procedure for determining skinning weights according to at least one embodiment of the present invention.
FIG. 6 shows a schematic diagram of a scheme for obtaining a 3D mesh model from a single human body image according to at least one embodiment of the present invention.
FIG. 7 shows a schematic diagram of the principle of texture generation according to at least one embodiment of the present invention.
FIG. 8 shows a schematic diagram of a training procedure for a texture generation network according to at least one embodiment of the present invention.
FIG. 9 shows a schematic diagram of a human body image according to at least one embodiment of the present invention.
FIG. 10 shows a block diagram of a 3D human body reconstruction apparatus according to at least one embodiment of the present invention.
FIG. 11 shows a block diagram of a 3D human body reconstruction apparatus according to at least one embodiment of the present invention.

To help those skilled in the art better understand the technical solutions in one or more embodiments of the present invention, these solutions are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on one or more embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

3D human body reconstruction has important applications in many fields, including but not limited to the following application scenarios.

For example, 3D human body reconstruction can enhance the realism of virtual reality applications such as virtual fitting, virtual cloud meetings, and virtual classes.

As another example, a 3D human body model obtained by 3D human body reconstruction may be imported into game data to create a personalized character.

As a further example, producing science fiction films currently requires technologies such as green screens and motion capture. The hardware is expensive, and the overall workflow is time-consuming and cumbersome. Obtaining a virtual 3D human body model through 3D human body reconstruction can simplify the workflow and save resources.

Regardless of the application scenario, 3D human body reconstruction faces two requirements. First, the user's cooperation procedure should be as simple as possible. Requiring the user to take part in multi-angle scanning, for example, demands considerable effort and results in a poor user experience. Second, the 3D human body model should be as accurate as possible. In virtual cloud meetings or AR virtual interaction scenarios, for example, the reconstructed model must deliver greater realism and immersion.

To solve the above problems, an embodiment of the present invention provides a 3D human body reconstruction method. Its core idea is to reconstruct a user's 3D human body from a single photo of that user, which simplifies the user's cooperation process while achieving high-precision reconstruction.

FIG. 1 is a flowchart of a 3D human body reconstruction method according to at least one embodiment of the present invention. As shown in FIG. 1, the method may include steps 100 to 106.

In step 100, human body geometric reconstruction is performed based on a single human body image of a target human body to obtain a 3D mesh model of the target human body.

The target human body is the user on whom the 3D human body reconstruction is based. For example, when 3D human body reconstruction is performed for a user named Mr. Zhang, Mr. Zhang may be referred to as the target human body. The reconstructed 3D human body model is derived from Mr. Zhang's body and closely resembles his posture, appearance, clothing, hairstyle, and so on.

The single human body image is one image of the target human body. The embodiments of the present invention impose no special requirements on how the image is captured or on its format. In one example, the single human body image may be a full-body frontal photograph of the target human body. As another example, the single human body image may be an RGB color image, which is inexpensive to obtain: no costly device such as a depth camera is needed, and the image can be captured with an ordinary camera.

In this step, human body geometric reconstruction may be performed based on the single human body image of the target human body to obtain a 3D mesh model. The 3D mesh model is a 3D mesh representing the geometry of the human body and comprises a number of vertices and faces.

In one example, this embodiment may further perform pose and body-shape alignment fitting between the reconstructed 3D mesh and a pre-stored parameterized human body model. Specifically, the parameterized human body model comprises a human body surface mesh and a set of skeletal structures, both controlled by a set of pose and body-shape parameters, so that the skeletal positions and surface shape of the body change as the parameter values change. After the 3D mesh reconstructed in step 100 is geometrically aligned with the parameterized human body model, a skeletal structure corresponding to that 3D mesh is obtained. This skeletal structure is used to calculate skinning weights in a later step.

FIG. 2 illustrates a method of obtaining a 3D mesh model by reconstruction from a single human body image. As shown in FIG. 2, a single human body image 21 of the target human body may be input into a first deep neural network branch 22 for 3D reconstruction. In one exemplary embodiment, the first deep neural network branch 22 may include a global feature sub-network 221 and a first fitting sub-network 222.

Feature extraction may be performed on the single human body image 21 through the global feature sub-network 221 to obtain high-level image features of the image, which may be referred to as first image features. For example, the global feature sub-network 221 may be an HourGlass convolutional network. The first image features are input into the first fitting sub-network 222, which may predict, based on the first image features, whether each voxel block in 3D space lies inside the target human body. For example, the first fitting sub-network 222 may be a multi-layer perceptron. The first fitting sub-network 222 outputs a first human body model, which comprises the 3D voxel blocks located inside the target human body.
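The occupancy prediction described above can be illustrated with a minimal NumPy sketch. It assumes, for simplicity, that a single global feature vector is concatenated with each voxel-center coordinate (the description does not specify how image features are associated with query points), and it uses random weights in place of a trained first fitting sub-network 222. The function names are hypothetical.

```python
import numpy as np

def mlp_occupancy(point_features, weights, biases):
    """Hypothetical stand-in for the first fitting sub-network: an MLP that
    maps per-point features to an inside/outside probability."""
    h = point_features
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)            # ReLU hidden layers
    logits = h @ weights[-1] + biases[-1]          # one logit per point
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))     # sigmoid -> occupancy in (0, 1)

def query_voxel_grid(image_feature, resolution, weights, biases):
    """Evaluate occupancy at every voxel center of a cubic grid over [-1, 1]^3."""
    axis = (np.arange(resolution) + 0.5) / resolution * 2.0 - 1.0
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    centers = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    # Assumption: the same global image feature is attached to every query point.
    feats = np.concatenate(
        [centers, np.broadcast_to(image_feature, (len(centers), image_feature.size))], axis=1)
    occ = mlp_occupancy(feats, weights, biases)
    return occ.reshape(resolution, resolution, resolution) > 0.5   # voxels inside the body

# Usage with random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
feat = rng.standard_normal(8)
dims = [3 + 8, 16, 16, 1]
weights = [rng.standard_normal((a, b)) * 0.5 for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]
grid = query_voxel_grid(feat, 8, weights, biases)
```

The boolean grid plays the role of the first human body model: the set of voxel blocks predicted to lie inside the body.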

Next, a meshing process may be performed on the first human body model. For example, the meshing process may apply the Marching Cubes algorithm to the first human body model in voxel space to obtain the 3D mesh model of the target human body.
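The sketch below is not Marching Cubes itself. It shows a simpler way to turn a voxel occupancy grid into a surface: emit one square face for every side of an occupied voxel that borders an empty voxel or the grid boundary. Marching Cubes instead interpolates a smooth triangle surface from a case table, which is too long to reproduce here; a library implementation such as scikit-image's would normally be used. Face winding is not made consistent in this sketch.

```python
import numpy as np

def boundary_faces(occupancy):
    """Return an (F, 4, 3) array of quad corners, one quad per exposed voxel face."""
    occ = np.pad(np.asarray(occupancy).astype(bool), 1)   # pad with empty voxels
    quads = []
    for axis in range(3):
        u, v = [a for a in range(3) if a != axis]
        for step in (-1, 1):
            # neighbor[i] = occ[i + step]; padding guarantees the wrap-around cells are empty
            neighbor = np.roll(occ, -step, axis=axis)
            exposed = occ & ~neighbor
            for idx in np.argwhere(exposed):
                base = idx - 1                             # back to unpadded voxel index
                plane = base[axis] + (1 if step == 1 else 0)
                corners = []
                for du, dv in ((0, 0), (1, 0), (1, 1), (0, 1)):
                    p = np.zeros(3)
                    p[axis] = plane
                    p[u] = base[u] + du
                    p[v] = base[v] + dv
                    corners.append(p)
                quads.append(corners)
    return np.array(quads, dtype=float).reshape(-1, 4, 3)
```

A single occupied voxel yields 6 faces; two face-adjacent voxels yield 10, because their shared faces are interior.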

In step 102, local high-definition geometric reconstruction is performed on a local part of the target human body based on the single human body image of the target human body to obtain a 3D mesh model of the local part.

The 3D mesh model of the target human body reconstructed in step 100 may be blurred at local parts of the target human body. The local part may be, for example, the face, or another part whose fine details need to be represented, such as the hands. Because the 3D mesh model is blurred in the facial details of the target human body, and the face is generally the region users pay most attention to, geometric reconstruction may be performed separately for the local part of the target human body in this step.

Take the face as an example of the local part. The face may be reconstructed by fine reconstruction with a fixed topology: based on image features extracted from the single human body image of the target human body, the positions of the vertices of a 3D face topology template are fitted to obtain a 3D mesh model of the face. Specifically, because human faces share a consistent semantic structure, a 3D face with a fixed topology may be used as a template, which may be referred to as the 3D face topology template. The template has multiple vertices, each fixed to a particular facial semantic; for example, one vertex represents the tip of the nose and another represents the corner of an eye. During face reconstruction, the position of each vertex of the 3D face topology template may be regressed by a deep neural network.

For example, the deep neural network may include a deep convolutional network and a graph convolutional network. The single human body image of the target human body may be input into the deep convolutional network to extract image features, which may be referred to as third image features. The third image features and the 3D face topology template may then be input into the graph convolutional network, which finally outputs a 3D mesh model of the face that approximates the face of the target human body. Optionally, the input to the deep convolutional network may be a cropped region of the single human body image that contains the face.
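As an illustration of how a graph convolutional network can regress template vertex positions, here is a minimal sketch of a single graph-convolution layer using the common symmetric-normalized propagation rule. Assumptions not in the description: the image feature is a global vector tiled onto every template vertex, the layer predicts per-vertex offsets added to the template, and the weight matrix `W` is hypothetical (untrained). A real network would stack several layers with nonlinearities.

```python
import numpy as np

def normalized_adjacency(num_vertices, edges):
    """A_hat = D^-1/2 (A + I) D^-1/2 for the template's mesh edges."""
    A = np.eye(num_vertices)                      # self-loops
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def gcn_refine(template_vertices, edges, image_feature, W):
    """One linear graph-convolution layer predicting per-vertex offsets.

    H has one row per template vertex: its xyz plus the (tiled) image feature.
    W has shape (3 + feature_dim, 3)."""
    A_hat = normalized_adjacency(len(template_vertices), edges)
    H = np.concatenate(
        [template_vertices, np.tile(image_feature, (len(template_vertices), 1))], axis=1)
    offsets = A_hat @ H @ W                       # (V, 3)
    return template_vertices + offsets            # fitted vertex positions, same topology
```

Because only vertex positions change, every output vertex keeps its fixed semantic meaning (nose tip, eye corner, and so on).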

In step 104, the 3D mesh model of the local part is fused with the 3D mesh model of the target human body to obtain an initial 3D model.

The 3D mesh model of the target human body reconstructed in step 100 may be somewhat blurred at local parts of the body. Again taking the face as the local part, step 102 obtains a 3D mesh model of the face by separate geometric reconstruction. In this step, the 3D face mesh model may replace the corresponding part of the 3D mesh model of the target human body from step 100. This retains information such as head shape, body shape, and posture from the 3D mesh model of the target human body while making the facial features finer and more accurate, thereby achieving a better reconstruction result. The face is used here only as an example; in practice, other local parts may likewise be reconstructed separately to make them sharper.

Specifically, the single human body image of the target human body may be input into a pre-trained keypoint detection model, which identifies multiple keypoints of the local part of the target human body in the image. Referring to FIG. 3 and again taking the face as the local part, after multiple facial keypoints 31 are obtained, the corresponding model keypoints in the 3D mesh model of the target human body and in the 3D face mesh model may be identified from the coordinates of the keypoints 31 on the face. Specifically, information on multiple first model keypoints, which correspond to the facial keypoints in the 3D mesh model of the target human body, may be determined; this information may include, for example, a keypoint identifier and a keypoint position for each first model keypoint. Likewise, information on second model keypoints, which correspond to the facial keypoints in the 3D face mesh model, may be determined; this information may include, for example, a keypoint identifier and a keypoint position for each second model keypoint.

After the first model keypoint information and the second model keypoint information are obtained, the 3D face mesh model may be fused with the 3D mesh model of the target human body based on this information to obtain the initial 3D model.

In an embodiment of the present invention, fusing the 3D face mesh model with the 3D mesh model of the target human body may include: determining, based on the first and second model keypoint information and using the camera extrinsic parameters of the two models, a coordinate transformation between the 3D mesh model of the target human body and the 3D face mesh model; transforming the 3D face mesh model into the coordinate system of the 3D mesh model of the target human body according to that transformation; and fusing the two models in that common coordinate system. For example, the facial geometry on the 3D mesh model of the target human body may be removed and replaced with the 3D face mesh model, and the two meshes may be merged into a single whole by Poisson reconstruction. The resulting model may be referred to as the initial 3D model. The initial 3D model has sharp facial features together with head shape, posture, and other information resembling the target human body, and is therefore highly accurate.
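The description determines the coordinate transformation from the keypoint correspondences together with the two models' camera extrinsics. As a swapped-in illustration that uses only the keypoint correspondences, the sketch below estimates a least-squares similarity transform (scale, rotation, translation) with Umeyama's closed-form method and applies it to the face mesh vertices. This is one common technique, not necessarily the one used in the embodiment.

```python
import numpy as np

def similarity_from_keypoints(src, dst):
    """Least-squares s, R, t such that dst ~= s * R @ src + t (Umeyama's method).

    src: (N, 3) second model keypoints (face mesh).
    dst: (N, 3) corresponding first model keypoints (body mesh)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                    # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid returning a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def apply_similarity(points, s, R, t):
    """Map face-mesh vertices into the body mesh's coordinate system."""
    return s * points @ R.T + t
```

After this alignment, the facial region of the body mesh can be cut out and the transformed face mesh stitched in, for example by Poisson reconstruction as described above.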

In step 106, human body texture reconstruction is performed for the target human body based on the initial 3D model and the single human body image to obtain a color-textured 3D human body model of the target human body.

Because this embodiment performs 3D human body reconstruction from a single human body image of the target human body, some body regions are not visible. For example, when reconstruction uses a frontal image of the target human body, the back of the body is not visible, which leads to missing texture. Therefore, in this step, the texture of the invisible regions of the target human body may be predicted and completed based on the initial 3D model and the single human body image, and then merged with the texture in the single human body image to produce a 3D human body model with complete texture.

As shown in FIG. 4, taking a frontal image as the single human body image of the target human body, a deep learning network may be used to predict the back texture 41 of the body, and the back texture 41 together with the front texture 42 in the single human body image may be used to texture-map the initial 3D model, that is, to perform texture reconstruction on it. In the 3D model 43 of FIG. 4, the back and front textures have already been mapped onto the initial 3D model. The initial 3D model obtained in step 104 is a mesh of the human body geometry; this step adds body texture to that mesh. In addition, for the remaining invisible body regions, an interpolation technique may be used to fill gaps in the model with texture, completing the texture of the initial 3D model and yielding the 3D human body model 44 of the target human body.
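The description does not name the interpolation technique used to fill texture gaps. As a hedged illustration only, the sketch below fills unknown texels in a texture image by repeatedly assigning each unknown texel that borders the known region the mean of its known 4-neighbors, growing the known region inward layer by layer. Working in a 2D texture image (rather than directly on the mesh) is an assumption.

```python
import numpy as np

def fill_texture_gaps(texture, known, iterations=200):
    """Fill texels where `known` is False using the mean of known 4-neighbors.

    texture: (H, W, C) array; known: (H, W) boolean mask of valid texels."""
    tex = texture.astype(float).copy()
    known = known.astype(bool).copy()
    H, W = known.shape
    for _ in range(iterations):
        if known.all():
            break
        acc = np.zeros_like(tex)
        cnt = np.zeros((H, W))
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            # Shift without wrap-around: dst[y, x] takes src[y - dy, x - dx].
            ys_dst, ys_src = slice(max(dy, 0), H + min(dy, 0)), slice(max(-dy, 0), H + min(-dy, 0))
            xs_dst, xs_src = slice(max(dx, 0), W + min(dx, 0)), slice(max(-dx, 0), W + min(-dx, 0))
            shifted_tex = np.zeros_like(tex)
            shifted_known = np.zeros((H, W), bool)
            shifted_tex[ys_dst, xs_dst] = tex[ys_src, xs_src]
            shifted_known[ys_dst, xs_dst] = known[ys_src, xs_src]
            acc += shifted_tex * shifted_known[..., None]   # only known neighbors contribute
            cnt += shifted_known
        newly = (~known) & (cnt > 0)                        # unknown texels touching the known region
        tex[newly] = acc[newly] / cnt[newly][:, None]
        known |= newly
    return tex, known
```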

In the 3D human body reconstruction method of this embodiment, local geometric reconstruction is performed on a local part of the target human body, and the resulting 3D mesh model of the local part is fused with the 3D mesh model of the target human body. As a result, the local part in the initial 3D model of the target human body becomes sharper, finer, and more accurate, improving the reconstruction of that part. Moreover, because reconstruction is based on a single human body image of the target human body, the user's cooperation is simplified and 3D human body reconstruction becomes more convenient.

After the 3D human body model is obtained, skinning weights for driving the 3D human body model may be determined based on the model and the skeletal structure of the target human body. For example, to drive various motions of the 3D human body model, the model must be bound to the human skeletal structure; binding a model to a skeleton in this way is called skinning. The model can then be moved by moving the skeleton. A skinning weight expresses how strongly a skeletal joint point influences a model vertex. The skinning weights therefore control how much each vertex of the 3D human body model is influenced by each skeletal joint point, which allows better control of the model's motion.
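The description does not state how the skinning weights are consumed when the skeleton moves. Linear blend skinning is the standard scheme, and the sketch below assumes it: each deformed vertex is the weighted sum of that vertex transformed by every joint's 4x4 transform, v'_i = sum_j w_ij T_j [v_i; 1].

```python
import numpy as np

def linear_blend_skinning(vertices, weights, joint_transforms):
    """Deform vertices by blending per-joint rigid transforms (assumed LBS).

    vertices: (V, 3); weights: (V, J), each row summing to 1;
    joint_transforms: (J, 4, 4) homogeneous transforms for the current pose."""
    V = len(vertices)
    homo = np.concatenate([vertices, np.ones((V, 1))], axis=1)       # (V, 4)
    per_joint = np.einsum("jab,vb->vja", joint_transforms, homo)      # T_j @ v_i, shape (V, J, 4)
    blended = np.einsum("vj,vja->va", weights, per_joint)             # weighted sum over joints
    return blended[:, :3]
```

For example, a vertex weighted 0.5/0.5 between a fixed joint and a joint translated by +2 along x moves by +1 along x.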

Specifically, calculating the skinning weights of the 3D human body model may proceed as follows. In step 100, a human skeletal structure was obtained based on the single human body image of the target human body. In this step, that skeletal structure and the obtained 3D human body model may be input into a deep learning network, which automatically outputs the skinning weights of the model.

Referring to the example of FIG. 5, attribute features for each vertex of the 3D human body model 51 may first be generated from the 3D human body model 51 and the human skeletal structure 52. The attribute features may be constructed from the spatial relationship between each vertex and the skeletal structure. For example, for a given vertex, its attribute features may include the following four features:
1) The position coordinates of the vertex.
2) The position coordinates of the K skeletal joint points nearest to the vertex.
3) The geodesic distance from the vertex to each of the K skeletal joint points.
4) For each of the K skeletal joint points, the angle between the vector that starts at that joint point and points to the vertex, and the bone on which that joint point lies.
Here, K is a positive integer.
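The four per-vertex attribute features listed above can be assembled as in the following sketch. Two assumptions are made and labeled: Euclidean distance stands in for geodesic distance (a true geodesic would be computed over the mesh surface), and each joint's bone direction is supplied as a hypothetical `bone_dirs` input, since the description does not define how the bone of a joint is chosen.

```python
import numpy as np

def vertex_attribute_features(vertices, joints, bone_dirs, k):
    """Build per-vertex features: xyz, K nearest joint positions, distances, angles.

    vertices: (V, 3); joints: (J, 3); bone_dirs: (J, 3) hypothetical bone direction per joint.
    Returns features of shape (V, 3 + 5K) and the indices of the K nearest joints."""
    d = np.linalg.norm(vertices[:, None, :] - joints[None, :, :], axis=-1)   # (V, J)
    nearest = np.argsort(d, axis=1)[:, :k]                                    # (V, K)
    near_pos = joints[nearest]                                                # (V, K, 3)
    # Assumption: Euclidean distance as a stand-in for geodesic distance.
    near_dist = np.take_along_axis(d, nearest, axis=1)                       # (V, K)
    vec = vertices[:, None, :] - near_pos                                     # joint -> vertex
    bones = bone_dirs[nearest]                                                # (V, K, 3)
    denom = np.maximum(np.linalg.norm(vec, axis=-1) * np.linalg.norm(bones, axis=-1), 1e-12)
    cos = (vec * bones).sum(axis=-1) / denom
    angles = np.arccos(np.clip(cos, -1.0, 1.0))                               # (V, K)
    feats = np.concatenate(
        [vertices, near_pos.reshape(len(vertices), -1), near_dist, angles], axis=1)
    return feats, nearest
```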

Still referring to FIG. 5, after the attribute features of each vertex are obtained, the attribute features of each vertex and the adjacency features between vertices may be input into a spatial graph convolutional attention network of the deep learning network. Before being input into that network, the features may be converted into hidden-layer features by a multi-layer perceptron. Based on the hidden-layer features, the spatial graph convolutional attention network may predict the weight with which each vertex is influenced by each of its K skeletal joint points. A subsequent multi-layer perceptron in the deep learning network may normalize these weights so that, for each vertex, the weights of influence of all skeletal joint points sum to 1. The resulting weights of influence of the skeletal joint points on each vertex of the 3D human body model are that vertex's skinning weights.
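The normalization step only requires that each vertex's weights over its K joints sum to 1. The sketch below assumes a softmax for this (a common choice; the description only says a multi-layer perceptron performs the normalization) and scatters the K weights into a dense per-vertex matrix over all joints, with zero weight for joints outside the K nearest.

```python
import numpy as np

def normalize_skinning_weights(scores, nearest, num_joints):
    """Turn raw scores over each vertex's K nearest joints into skinning weights.

    scores: (V, K) raw network outputs; nearest: (V, K) joint indices.
    Returns a (V, num_joints) matrix whose rows sum to 1."""
    z = scores - scores.max(axis=1, keepdims=True)      # numerically stable softmax
    w = np.exp(z)
    w /= w.sum(axis=1, keepdims=True)
    dense = np.zeros((scores.shape[0], num_joints))
    np.put_along_axis(dense, nearest, w, axis=1)         # joints outside the K nearest get 0
    return dense
```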

In the 3D human body reconstruction method of this embodiment, the human skeletal structure is obtained from the single human body image of the target human body, and the skinning weights can be calculated automatically from that skeletal structure and the reconstructed 3D human body model. This not only guarantees a consistent semantic skeletal structure across different input images, but also allows appropriate skinning weights to be generated quickly for different clothing and garment shapes. Semantic consistency of the skeleton makes it easier to register (apply) the generated model and skeleton to an existing motion library. A motion library may pre-store motion sequences of people, such as dancing or boxing, as a series of moving skeletons whose semantics and structure are consistent with one another. If the generated skeleton were random (that is, if the meaning of its joints were undetermined), applying motions from the library to the generated model would be difficult. By guaranteeing that the generated skeleton has a consistent semantic structure, this embodiment makes registration with the motion library easier. In addition, skinning weights computed for each specific body shape make the motion of different human body models look more natural.

Another embodiment of the present invention provides a 3D human body reconstruction method. Compared with the embodiment of FIG. 1, the difference lies in an improved flow for the human body geometric reconstruction from the single human body image in step 100, which increases the geometric accuracy of the reconstructed 3D mesh model of the target human body. Processing steps identical to those of the embodiment of FIG. 1 are not described in detail here; only the differences are described.

As shown in FIG. 6, a second deep neural network branch 61 is added to the network structure shown in FIG. 2. The second deep neural network branch 61 may include a local feature sub-network 611 and a second fitting sub-network 612. A local image 62 may be obtained by cropping a local region from the single human body image 21 of the target human body. The second deep neural network branch 61 performs 3D reconstruction on the local image 62.

Note that the body region of the target human body contained in this local image need not be exactly the same as the local part reconstructed in step 102. For example, the local image here may cover the region from the shoulders of the target human body upward, whereas the local part reconstructed in step 102 may be the face. Reconstructing the region from the shoulders upward in FIG. 6 is merely an example; refined geometric reconstruction may be performed on other body regions of the target human body.

Specifically, still referring to FIG. 6, the first human body model is obtained by reconstruction through the first deep neural network branch 22. The local image 62 is input into the second deep neural network branch 61, where the local feature sub-network 611 extracts features from the local image to obtain second image features. The second fitting sub-network 612 then obtains a second human body model based on the second image features and intermediate features output by the first fitting sub-network 222. The intermediate features may be the output of part of the network structure of the first fitting sub-network 222. For example, if the first fitting sub-network 222 includes a number of fully connected layers, the output of some of those layers may be input into the second fitting sub-network 612 as the intermediate features.
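The feature hand-off between the two fitting sub-networks can be sketched with two small NumPy MLPs. Assumptions: both fitting sub-networks are plain fully connected ReLU networks, the intermediate feature is the output of the first hidden layer of the first fitting sub-network, and it is combined with the second image features by concatenation. Layer sizes and random weights are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def first_fitting(x, layers):
    """Global fitting MLP; returns final logits and every hidden-layer output."""
    hidden = []
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:          # ReLU on all but the last layer
            h = relu(h)
            hidden.append(h)
    return h, hidden

def second_fitting(local_feat, intermediate, layers):
    """Local fitting MLP fed with [second image features, intermediate features]."""
    h = np.concatenate([local_feat, intermediate], axis=1)
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = relu(h)
    return h

rng = np.random.default_rng(0)

def make_layers(dims):
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b)) for a, b in zip(dims[:-1], dims[1:])]

n = 5                                            # number of query points
global_layers = make_layers([8, 16, 16, 1])
local_layers = make_layers([4 + 16, 8, 1])       # 4-dim local feature + 16-dim intermediate
g_logits, hidden = first_fitting(rng.standard_normal((n, 8)), global_layers)
l_logits = second_fitting(rng.standard_normal((n, 4)), hidden[0], local_layers)
```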

For example, the second deep neural network branch 61 may have essentially the same structure as the first deep neural network branch 22. For instance, the global feature sub-network 221 of the first deep neural network branch 22 may include four blocks, each containing a number of feature extraction layers such as convolution and pooling layers, while the local feature sub-network 611 of the second deep neural network branch 61 may include one such block. After the first and second human body models are obtained, they may be fused to obtain a fused human body model, and meshing is then performed on the fused human body model to obtain the 3D mesh model of the target human body.

In the 3D human body reconstruction method of this embodiment, local geometric reconstruction is performed on a local part of the target human body to improve its reconstruction, and reconstruction is based on a single human body image of the target human body to simplify the user's cooperation. In addition, the local image is reconstructed through the second deep neural network branch, which improves reconstruction of the corresponding local body region of the target human body.

Yet another embodiment of the present invention provides a 3D human body reconstruction method. Compared with the embodiment of FIG. 1, this embodiment provides a specific deep learning network for predicting the back texture of the human body. Processing steps identical to those of the embodiment of FIG. 1 are not described in detail here; only the differences are described.

As shown in FIG. 7, the single human body image of the target human body may contain both background and the front texture of the human body. In this case, image segmentation may first be performed to extract the front texture of the body, and the back texture may then be predicted from it. For example, human body segmentation may be performed on a frontal image 71 of the target human body to obtain a first segmentation mask 72 and the segmented front texture 73 of the target human body. The first segmentation mask 72 may be flipped horizontally to obtain a second segmentation mask 74. The front texture 73, the first segmentation mask 72, and the second segmentation mask 74 are input into a texture generation network 75, which outputs the back texture of the target human body.
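The input preparation for the texture generation network 75 can be sketched as follows. The horizontal flip follows the description; stacking the front texture and both masks as channels of a single array is an assumption, since the description only says the three are input to the network. The function name is hypothetical.

```python
import numpy as np

def texture_net_input(front_image, body_mask):
    """Build an (H, W, 5) input: masked front texture (3 ch) + first mask + flipped mask.

    front_image: (H, W, 3) frontal photo; body_mask: (H, W) person segmentation in {0, 1}."""
    mask1 = body_mask.astype(np.float32)                                 # first segmentation mask
    front_texture = front_image.astype(np.float32) * mask1[..., None]    # background removed
    mask2 = mask1[:, ::-1]                                               # horizontal flip -> second mask
    # Assumption: channel-wise concatenation as the network input.
    return np.concatenate([front_texture, mask1[..., None], mask2[..., None]], axis=-1)
```

The flipped mask approximates the silhouette of the back view as seen from behind, which gives the network the region in which it should generate back texture.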

FIG. 7 shows an example in which the second segmentation mask 74 is obtained by horizontally flipping the first segmentation mask 72, but actual implementations are not limited to this. For example, the front image of the target human body may be input to a single pre-trained neural network that directly outputs both the first and second segmentation masks. Once the front and back textures of the target human body are obtained, they may be mapped onto the initial 3D model of the human body to obtain the 3D human body model of the target human body.

The texture generation network 75 may be trained as follows. Referring to FIG. 8, an auxiliary texture generation network 76 may be used. The auxiliary texture generation network 76 may contain part of the network structure of the texture generation network 75; for example, the texture generation network 75 may be obtained by adding a certain number of convolutional layers to the auxiliary texture generation network 76.

During training, the auxiliary texture generation network may first be trained on the auxiliary human body image, the third sample segmentation mask, and the fourth sample segmentation mask in the training sample image set. After that training is completed, at least some of the network parameters of the auxiliary texture generation network may be used to initialize part of the network parameters of the texture generation network, and the texture generation network may then be trained on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask. Here, the auxiliary human body image is obtained by reducing the resolution of a single image of the human body sample. The first sample segmentation mask corresponds to the mask region of the front texture of the human body sample, the second sample segmentation mask corresponds to the mask region of the back texture of the human body sample, the third sample segmentation mask corresponds to the mask region of the front texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask region of the back texture of the human body in the auxiliary human body image.
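The resolution reduction that produces the auxiliary human body image and its masks could look like the sketch below. Block averaging by an integer factor, the 0.5 majority threshold for the downsampled mask, deriving the fourth mask by a horizontal flip (mirroring the FIG. 7 example), and the name `make_auxiliary_sample` are all assumptions; the document only states that the resolution is reduced.

```python
import numpy as np

def make_auxiliary_sample(image: np.ndarray, mask: np.ndarray, factor: int = 2):
    """Downsample a training sample to build auxiliary (low-resolution) inputs.

    image: (H, W, 3) RGB image of the human body sample.
    mask:  (H, W) binary body mask at full resolution.
    Returns (front_texture, third_mask, fourth_mask) at reduced resolution.
    """
    h2, w2 = mask.shape[0] // factor, mask.shape[1] // factor
    # Crop so both spatial dimensions are divisible by the factor.
    img = image[: h2 * factor, : w2 * factor].astype(np.float32)
    msk = mask[: h2 * factor, : w2 * factor].astype(np.float32)
    # Average each factor x factor block of pixels.
    low_img = img.reshape(h2, factor, w2, factor, -1).mean(axis=(1, 3))
    # A low-resolution mask pixel is foreground if at least half of its block is.
    block_cover = msk.reshape(h2, factor, w2, factor).mean(axis=(1, 3))
    third_mask = (block_cover >= 0.5).astype(np.float32)
    # Back-texture region approximated by the horizontally flipped front mask.
    fourth_mask = third_mask[:, ::-1].copy()
    front_texture = low_img * third_mask[..., None]
    return front_texture, third_mask, fourth_mask
```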

Still referring to FIG. 8, image segmentation may be performed on an auxiliary human body image 81 to obtain the front texture 82 of the human body in the auxiliary human body image 81, a third sample segmentation mask 83, and a fourth sample segmentation mask 84. These are input to the auxiliary texture generation network 76 to obtain a first predicted value of the back texture of the human body in the auxiliary human body image 81, and the network parameters of the auxiliary texture generation network 76 are adjusted based on the first predicted value and the first ground-truth value of that back texture. Repeating this process over multiple iterations yields a trained auxiliary texture generation network 76. The training supervision of the auxiliary texture generation network may include, in addition to the loss computed from the first predicted value and the first ground-truth value, other losses based on the first predicted value, such as a feature loss computed from the texture features of the auxiliary human body image and of the first predicted value. The auxiliary human body image may be obtained by reducing the resolution of the human body front image 71 in FIG. 7, so the front texture 82 in the auxiliary human body image 81 also has a lower resolution than the front texture 73 in FIG. 7. The third sample segmentation mask corresponds to the mask region of the front texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask region of the back texture of the human body in the auxiliary human body image.
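The supervision described above, a reconstruction loss between the first predicted value and the first ground-truth value plus a feature loss, can be sketched as follows. The L1 form of the pixel loss, the use of per-channel color mean and standard deviation as the "texture feature", the weight `w_feat`, and all function names are assumptions for illustration; the document defines neither loss. In an actual training loop these would be framework tensors with automatic differentiation; the sketch only shows the arithmetic.

```python
import numpy as np

def masked_l1(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute error over the pixels inside the (back-texture) mask."""
    m = mask[..., None].astype(np.float64)
    denom = max(float(m.sum()) * pred.shape[-1], 1.0)
    return float((np.abs(pred - target) * m).sum() / denom)

def color_stats(texture: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in 'texture feature': per-channel mean and std of the masked pixels."""
    pixels = texture[mask > 0]  # (N, C); assumes the mask is not empty
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

def auxiliary_loss(pred_back, true_back, back_mask, front_tex, front_mask, w_feat=0.1):
    """Pixel loss against ground truth plus a feature loss against the input front texture."""
    pixel_loss = masked_l1(pred_back, true_back, back_mask)
    feature_loss = float(np.abs(color_stats(pred_back, back_mask)
                                - color_stats(front_tex, front_mask)).mean())
    return pixel_loss + w_feat * feature_loss
```

Comparing color statistics of the predicted back texture with those of the visible front texture is one plausible way to encourage the uniform, realistic colors the embodiment aims for.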

After the auxiliary texture generation network has been trained, its network parameters may be used to initialize part of the network parameters of the texture generation network. In other words, the network parameters of the texture generation network include at least some of the network parameters of the trained auxiliary texture generation network, so the two networks share part of their weights. The human body front texture, the first sample segmentation mask, and the second sample segmentation mask from the training sample image set for the texture generation network are then input to the texture generation network to obtain a second predicted value of the back texture of the human body sample, and the network parameters of the texture generation network are adjusted based on the second predicted value and the second ground-truth value of the back texture. The second ground-truth value has a higher resolution than the first ground-truth value; that is, the back texture output by the texture generation network has a somewhat higher resolution than that output by the auxiliary texture generation network.
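The partial weight sharing can be pictured with network parameters held as a name-to-array dictionary. The layer names, the rule "copy when both name and shape match", and the function `init_from_auxiliary` are assumptions; layers that exist only in the texture generation network, such as the additional convolutional layers mentioned for FIG. 8, keep their fresh initialization.

```python
import numpy as np

def init_from_auxiliary(texture_params: dict, aux_params: dict) -> list:
    """Copy trained auxiliary weights into the texture network where name and shape match.

    texture_params is modified in place; returns the names of the copied parameters.
    """
    copied = []
    for name, value in aux_params.items():
        target = texture_params.get(name)
        if target is not None and target.shape == value.shape:
            texture_params[name] = value.copy()
            copied.append(name)
    return copied
```

In PyTorch, a comparable effect is commonly achieved by loading a filtered `state_dict` with `load_state_dict(..., strict=False)`; the document does not say which framework or mechanism the original work uses.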

In the 3D human body reconstruction method of this embodiment, local geometric reconstruction is performed on local parts of the target human body to improve their reconstruction quality, and reconstruction is performed from a single human body image of the target human body, which reduces the cooperation required from the user. In addition, predicting the texture automatically with a neural network improves the quality of the generated texture, for example by making the texture over the whole body more uniform and the colors more realistic. Moreover, training the auxiliary texture generation network before the texture generation network makes the training of the texture generation network more stable and more likely to converge.

In another embodiment, to improve reconstruction quality, multiple images of the target human body taken from different angles may be acquired and used jointly for 3D reconstruction. Taking three images captured from different angles as an example and referring to FIG. 2, each of the three images may be input to the global feature sub-network 221 to obtain one first image feature per image. The three first image features are then fused, and the fused image feature is passed to the first fitting sub-network 222 for further processing.
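One simple way to fuse the three first image features is element-wise pooling across views, sketched below. The document does not name the fusion operator, so mean (and the optional max) pooling, the (C, H, W) feature layout, and the function name are assumptions.

```python
import numpy as np

def fuse_view_features(features, mode: str = "mean") -> np.ndarray:
    """Fuse per-view feature maps of identical shape into a single map."""
    stacked = np.stack(features, axis=0)  # (V, C, H, W): one entry per view
    if mode == "mean":
        return stacked.mean(axis=0)
    if mode == "max":
        return stacked.max(axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")
```

Pooling keeps the fused feature the same shape as a single-view feature, so a fitting sub-network designed for one image could consume it unchanged; the same function could serve for the second image features of the local branch.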

When the 3D human body reconstruction uses the network structure shown in FIG. 6, besides inputting each of the three images to the global feature sub-network 221, a local region may be extracted from each of the three images to obtain three local images. Each local image is input to the local feature sub-network 611 to obtain a corresponding second image feature; the three second image features are then fused, and the fused image feature is passed to the second fitting sub-network 612 for further processing.

By acquiring multiple images of the target human body from different angles and using them jointly for 3D human body reconstruction in this way, a more detailed 3D human body model of the target human body can be obtained.

It should also be noted that each neural network model used in the steps of the 3D human body reconstruction method described in any embodiment of the present invention may be trained separately. For example, the first deep neural network branch and the texture generation network may be trained independently of each other.

An example of a 3D human body reconstruction flow is described below. Processing identical to that of the method embodiments above is only summarized here; refer to those embodiments for details.

In this example, assume that a 3D human body model of user U1 is to be constructed from a single human body image of U1. The single human body image may be a front image of U1 that contains U1's front texture and a background. Referring to FIG. 9, the single human body image 91 of user U1 includes the user's front texture 92 and a background image 93.

First, two kinds of reconstruction may be performed on the single human body image 91 of user U1.

The first is human body geometric reconstruction from the single human body image 91, which yields a 3D mesh model and a human skeletal structure of U1. For example, the single human body image 91 may be processed by the network shown in FIG. 6: the global feature sub-network and the first fitting sub-network in the first deep neural network branch process the single human body image 91 to obtain a first human body model, and the local feature sub-network and the second fitting sub-network in the second deep neural network branch process the image region above the shoulders in the single human body image 91 to obtain a second human body model. The first and second human body models are fused into a fused human body model, and a meshing process is applied to the fused human body model to obtain the 3D mesh model (mesh) of user U1.
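The fusion of the first and second human body models can be sketched if, as an assumption not stated in the document, both branches predict occupancy values on a shared voxel grid: inside the region above the shoulders the local-branch prediction is kept, and elsewhere the global one. The grid layout (y on axis 1, increasing upward), the hard switch at `shoulder_index`, and the function names are hypothetical.

```python
import numpy as np

def region_above(grid_shape, shoulder_index: int) -> np.ndarray:
    """Boolean mask of voxels at or above the shoulder row (axis 1 assumed to point up)."""
    mask = np.zeros(grid_shape, dtype=bool)
    mask[:, shoulder_index:, :] = True
    return mask

def fuse_occupancy(global_occ: np.ndarray, local_occ: np.ndarray,
                   region: np.ndarray) -> np.ndarray:
    """Use the local-branch occupancy inside the region and the global one elsewhere."""
    return np.where(region, local_occ, global_occ)
```

The meshing step could then extract the 0.5 iso-surface of the fused grid, for example with `skimage.measure.marching_cubes(fused, level=0.5)`; a soft blend near the shoulder line, rather than a hard switch, would reduce a visible seam.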

The second is local geometric reconstruction of the face of user U1 from the single human body image 91, which yields a 3D mesh model of the face. Specifically, features may be extracted from the single human body image 91, and the extracted image features together with a 3D facial topology template may be input to a graph convolutional neural network to obtain the face mesh of user U1.
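A graph convolutional network over the facial topology template can be sketched as below. The two-layer design, the symmetric normalized adjacency D^-1/2 (A + I) D^-1/2 (the Kipf and Welling formulation), attaching the global image feature to every vertex, predicting per-vertex offsets from the template, and all names and weight shapes are assumptions; the document only says that image features and the template are input to a graph convolutional neural network.

```python
import numpy as np

def normalized_adjacency(num_vertices: int, edges) -> np.ndarray:
    """Symmetric normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_vertices)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def face_mesh_from_features(template, edges, image_feature, w1, w2):
    """Hypothetical 2-layer GCN that deforms a face template.

    template:      (N, 3) vertex positions of the facial topology template.
    image_feature: (F,) global feature extracted from the body image.
    w1: (3 + F, H) hidden-layer weights; w2: (H, 3) output weights.
    """
    a_hat = normalized_adjacency(template.shape[0], edges)
    # Every vertex sees its own template position plus the shared image feature.
    h = np.concatenate([template, np.tile(image_feature, (template.shape[0], 1))], axis=1)
    h = np.maximum(a_hat @ h @ w1, 0.0)   # graph convolution + ReLU
    offsets = a_hat @ h @ w2              # linear output layer: per-vertex 3D offsets
    return template + offsets
```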

Next, the face mesh (the 3D mesh model of the face) obtained above may be combined and fused with the body mesh of user U1 (the 3D mesh model of U1's body) to obtain an initial 3D model of U1.

Specifically, following the schematic flow of FIG. 3, facial keypoints may be used: for each keypoint, the identifier and position of the corresponding model keypoint are determined in both the face mesh and the body mesh, and a coordinate transformation between the two models is determined from these identifiers and positions together with parameters such as the camera extrinsic parameters of the models. Using this coordinate transformation, the face mesh is transformed into the coordinate system of the body mesh, the face in the body mesh is replaced with the face mesh, and the two meshes are fused by Poisson reconstruction to obtain the initial 3D model of user U1.
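The coordinate transformation between the face mesh and the body mesh can be estimated from corresponding model keypoints with a least-squares similarity transform (scale, rotation, translation), sketched below with the Umeyama method. This is a standard technique swapped in for illustration, not necessarily the one used in the original work: the document also mentions camera extrinsic parameters, which this sketch ignores, the function names are hypothetical, and the subsequent Poisson fusion is not shown.

```python
import numpy as np

def similarity_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src, dst: (K, 3) corresponding keypoints (K >= 3, not collinear).
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                 # 3x3 cross-covariance
    u, sig, vt = np.linalg.svd(cov)
    d = np.ones(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        d[-1] = -1.0                           # avoid returning a reflection
    r = u @ np.diag(d) @ vt
    s = float((sig * d).sum() / ((xs ** 2).sum() / len(src)))
    t = mu_d - s * r @ mu_s
    return s, r, t

def apply_similarity(vertices: np.ndarray, s: float, r: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (N, 3) vertices, e.g. the face mesh, into the body-mesh coordinate system."""
    return s * vertices @ r.T + t
```

Here `src` would hold the face-mesh model keypoints and `dst` the matching body-mesh model keypoints, paired by identifier.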

Then, U1's body texture is reconstructed from the initial 3D model and the single human body image 91 of user U1. Because the single human body image 91 shows only U1's front texture, U1's back texture may be predicted from that front texture.

Specifically, human body segmentation may be performed on the single human body image 91 to obtain the human body front texture with the background removed and a first segmentation mask representing the front texture region; the first segmentation mask may then be flipped horizontally to obtain a second segmentation mask representing the back texture region. Next, the front texture, the first segmentation mask, and the second segmentation mask may be input to the pre-trained texture generation network to obtain the back texture of user U1. Finally, texture mapping may be performed on the initial 3D model using the front and back textures, and gap regions of the model may be filled in with texture, yielding a textured 3D human body model of U1.
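The texture-mapping step can be approximated by a per-vertex color lookup, sketched below under several assumptions not stated in the document: an orthographic camera aligned with the image, vertex x and y normalized to [-1, 1] with y pointing up, camera-facing vertices identified by a positive z normal, and a back texture stored mirrored (as seen from behind), consistent with the horizontally flipped second mask. Gap filling is not shown, and the function name is hypothetical.

```python
import numpy as np

def sample_vertex_colors(vertices, normals, front_tex, back_tex):
    """Assign each vertex a color from the front or back texture (orthographic assumption).

    vertices: (N, 3) with x, y in [-1, 1] and y pointing up; camera looks along -z.
    normals:  (N, 3) vertex normals; normals[:, 2] > 0 means facing the camera.
    front_tex, back_tex: (H, W, C) textures of identical shape.
    """
    h, w, c = front_tex.shape
    # Map x in [-1, 1] to columns [0, w-1] and y in [-1, 1] to rows [h-1, 0].
    cols = np.clip(np.round((vertices[:, 0] + 1) / 2 * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round((1 - vertices[:, 1]) / 2 * (h - 1)).astype(int), 0, h - 1)
    facing = normals[:, 2] > 0
    colors = np.empty((len(vertices), c), dtype=front_tex.dtype)
    colors[facing] = front_tex[rows[facing], cols[facing]]
    # The back texture is assumed to be mirrored horizontally (viewed from behind).
    back = ~facing
    colors[back] = back_tex[rows[back], w - 1 - cols[back]]
    return colors
```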

To make it convenient to drive the constructed 3D human body model, skinning weights for the model may further be computed from the reconstructed 3D human body model of U1 and the human skeletal structure obtained when reconstructing U1's 3D mesh model. The skinning weights can then be used to drive the model to perform actions.
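Computing skinning weights and driving the model can be sketched as follows. Normalized inverse-distance weights to the skeleton joints are a deliberately simple stand-in (production pipelines often use heat-diffusion or geodesic methods), and the deformation uses standard linear blend skinning. The exponent `power`, the (J, 4, 4) joint-transform layout, and the function names are assumptions, since the document does not say how the skinning weights are computed.

```python
import numpy as np

def inverse_distance_weights(vertices, joints, power=4.0, eps=1e-8):
    """Heuristic skinning weights: normalized inverse distance from each vertex to each joint."""
    d = np.linalg.norm(vertices[:, None, :] - joints[None, :, :], axis=-1)  # (N, J)
    w = 1.0 / (d + eps) ** power
    return w / w.sum(axis=1, keepdims=True)                                 # rows sum to 1

def linear_blend_skinning(vertices, weights, transforms):
    """Linear blend skinning: v' = sum_j w_j * (T_j @ [v, 1]).

    vertices: (N, 3); weights: (N, J); transforms: (J, 4, 4) joint transforms.
    """
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (N, 4)
    per_joint = np.einsum("jab,nb->nja", transforms, homo)                  # (N, J, 4)
    blended = (weights[..., None] * per_joint).sum(axis=1)                  # (N, 4)
    return blended[:, :3]
```

Because each row of weights sums to 1, applying the same rigid transform to every joint moves the whole model rigidly, which is a quick sanity check for any skinning-weight scheme.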

FIG. 10 is a schematic structural diagram of a 3D human body reconstruction apparatus. As shown in FIG. 10, the apparatus may include an overall reconstruction module 1001, a local reconstruction module 1002, a fusion processing module 1003, and a texture reconstruction module 1004.

The overall reconstruction module 1001 is configured to perform human body geometric reconstruction from a single human body image of a target human body to obtain a 3D mesh model of the target human body.

The local reconstruction module 1002 is configured to perform local geometric reconstruction of a local part of the target human body from the single human body image of the target human body to obtain a 3D mesh model of the local part.

The fusion processing module 1003 is configured to fuse the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model.

The texture reconstruction module 1004 is configured to reconstruct the body texture of the target human body from the initial 3D model and the single human body image to obtain a 3D human body model of the target human body.
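How the four modules of FIG. 10 compose can be summarized with a small wrapper. The class, field names, and call signatures are hypothetical; each callable stands for one module, and the fusion step also receives the image because the fusion processing module 1003 may obtain keypoints of the local part from the single human body image.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class HumanReconstructionPipeline:
    """Hypothetical composition of modules 1001 to 1004; each field is a placeholder callable."""
    overall_reconstruct: Callable[[Any], Any]           # module 1001: body mesh
    local_reconstruct: Callable[[Any], Any]             # module 1002: local-part mesh
    fuse: Callable[[Any, Any, Any], Any]                # module 1003: initial 3D model
    reconstruct_texture: Callable[[Any, Any], Any]      # module 1004: textured model

    def run(self, image: Any) -> Any:
        body_mesh = self.overall_reconstruct(image)
        part_mesh = self.local_reconstruct(image)
        initial_model = self.fuse(part_mesh, body_mesh, image)
        return self.reconstruct_texture(initial_model, image)
```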

In one example, to obtain the 3D mesh model of the target human body, the overall reconstruction module 1001 is specifically configured to: perform 3D reconstruction on the single human body image of the target human body through a first deep neural network branch to obtain a first human body model; perform 3D reconstruction on a local image within the single human body image through a second deep neural network branch to obtain a second human body model, the local image containing a local region of the target human body; fuse the first and second human body models to obtain a fused human body model; and apply a meshing process to the fused human body model to obtain the 3D mesh model of the target human body.

In one example, the local reconstruction module 1002 is specifically configured to extract features from the single human body image of the target human body to obtain a third image feature, and to determine the 3D mesh model of the local part based on the third image feature and a 3D topology template of the local part.

In one example, the fusion processing module 1003 is specifically configured to: obtain a plurality of keypoints of the local part from the single human body image of the target human body; determine information on first model keypoints corresponding to those keypoints in the 3D mesh model of the target human body, and information on second model keypoints corresponding to those keypoints in the 3D mesh model of the local part; and fuse the 3D mesh model of the local part with the 3D mesh model of the target human body based on the first and second model keypoint information to obtain the initial 3D model.

In one example, when fusing the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first and second model keypoints to obtain the initial 3D model, the fusion processing module 1003 is specifically configured to: determine a coordinate transformation between the 3D mesh model of the target human body and the 3D mesh model of the local part based on that keypoint information; transform the 3D mesh model of the local part into the coordinate system of the 3D mesh model of the target human body according to the coordinate transformation; and fuse the two 3D mesh models in that coordinate system to obtain the initial 3D model.

In one example, the texture reconstruction module 1004 is specifically configured to: perform human body segmentation on the single human body image to obtain a first segmentation mask, a second segmentation mask, and the front texture of the target human body, where the first segmentation mask corresponds to the mask region of the front texture and the second segmentation mask corresponds to the mask region of the back texture of the target human body; input the front texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain the back texture of the target human body; and obtain a textured 3D human body model of the target human body based on the back texture and the front texture.

In one example, as shown in FIG. 11, the apparatus may further include a model training module 1005.

The model training module 1005 is configured to train the texture generation network. Specifically, it is configured to: perform human body segmentation on a single image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and the front texture of the human body sample; train an auxiliary texture generation network based on a third sample segmentation mask, a fourth sample segmentation mask, and the front texture of the human body in an auxiliary human body image obtained by reducing the resolution of the single image of the human body sample; and, after the auxiliary texture generation network has been trained, train the texture generation network based on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask. The first sample segmentation mask corresponds to the mask region of the front texture of the human body sample, the second sample segmentation mask corresponds to the mask region of the back texture of the human body sample, the third sample segmentation mask corresponds to the mask region of the front texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask region of the back texture of the human body in the auxiliary human body image. The network parameters of the texture generation network include at least some of the network parameters of the trained auxiliary texture generation network.

In some embodiments, the apparatus may perform any of the methods described above; for brevity, they are not repeated here.

An embodiment of the present invention further provides an electronic device including a memory and a processor. The memory stores computer-readable instructions, and the processor invokes those instructions to implement the method of any embodiment of this specification.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of any embodiment of this specification.

As will be appreciated by those skilled in the art, one or more embodiments of the present invention may be provided as a method, a system, or a computer program product. The computer program product includes a computer program that, when executed by a processor, implements the method of any embodiment of this specification. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.

In the embodiments of the present invention, "and/or" means at least one of the two. For example, "A and/or B" covers three cases: A alone, B alone, and both A and B.

The embodiments of the present invention are described progressively: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. In particular, the data processing device embodiment is essentially similar to the method embodiment, so its description is relatively brief; refer to the corresponding parts of the method embodiment for details.

Specific embodiments of the present invention have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. Likewise, the processes depicted in the figures do not necessarily require the particular or sequential order shown to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

Embodiments of the subject matter and functional operations described in this disclosure may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed herein and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.

The processes and logic flows described herein may be performed by one or more programmable computers executing one or more computer programs that perform the corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as special-purpose logic circuitry.

Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit receives instructions and data from a read-only memory and/or a random access memory. The essential components of a computer are a central processing unit for performing and executing instructions and one or more memory devices for storing instructions and data. Generally, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few examples.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.

Although this specification contains many specific implementation details, these should not be construed as limiting the scope of the disclosure or of what is claimed, but rather as descriptions of features of particular disclosed embodiments. Certain features described in separate embodiments of the present invention may also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment may also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some cases be excised from that combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, although operations are depicted in the figures in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and units in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program units and systems can generally be integrated together in a single software product or packaged into multiple software products.

Thus, particular embodiments of the subject matter have been described. Other embodiments fall within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.

The foregoing describes merely preferred embodiments of one or more embodiments of the present invention and is not intended to limit the one or more embodiments of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of one or more embodiments of the present invention shall fall within the protection scope of one or more embodiments of the present invention.

21 Human body image
22 First deep neural network branch
31 Key point
41 Human body back texture
42 Human body front texture
43 Three-dimensional model
44 Three-dimensional human body model
51 Three-dimensional human body model
52 Human body skeletal structure
61 Second deep neural network branch
62 Local image
71 Front image
72 First segmentation mask
73 Front texture
74 Second segmentation mask
75 Texture generation network
81 Auxiliary human body image
82 Front texture
83 Third sample segmentation mask
84 Fourth sample segmentation mask
91 Human body image
92 Front texture
93 Background image
221 Global feature sub-network
222 First fitting sub-network
611 Local feature sub-network
612 Second fitting sub-network
1001 Global reconstruction module
1002 Local reconstruction module
1003 Fusion processing module
1004 Texture reconstruction module
1005 Model training module

A fifth aspect provides a computer program comprising computer-readable instructions which, when executed by a processor, carry out the method according to any embodiment of the present invention.

Claims

1. A 3D human body reconstruction method, comprising:
performing human body geometric reconstruction based on a human body image of a target human body to obtain a 3D mesh model of the target human body;
performing local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a 3D mesh model of the local part;
fusing the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model; and
performing human body texture reconstruction for the target human body based on the initial 3D model and the human body image to obtain a 3D human body model of the target human body.

2. The method according to claim 1, wherein performing human body geometric reconstruction based on the human body image of the target human body to obtain the 3D mesh model of the target human body comprises:
performing 3D reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model;
performing 3D reconstruction on a local image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the local image includes the local part of the target human body;
fusing the first human body model and the second human body model to obtain a fused human body model; and
performing meshing on the fused human body model to obtain the 3D mesh model of the target human body.
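Claims 2 and 3 do not state what form the first and second human body models take. The sketch below assumes, hypothetically, that each branch predicts an occupancy value in [0, 1] for sampled 3D points (as in pixel-aligned implicit-function reconstruction), and that fusion lets the second (local) branch override or blend with the first (global) branch for points that project into the local image. The names `fuse_occupancy` and `inside_surface` are illustrative only, and the meshing step itself (for example, marching cubes) is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float, float]


def fuse_occupancy(
    points: Sequence[Point],
    global_occ: Sequence[float],
    local_occ: Sequence[float],
    in_local_region: Callable[[Point], bool],
    alpha: float = 1.0,
) -> List[float]:
    """Hypothetical fusion of the first (global) and second (local) body models.

    global_occ / local_occ: occupancy in [0, 1] predicted for each query point
    by the first and second deep neural network branch, respectively.
    in_local_region: True if the point projects into the local image.
    alpha: weight of the local branch inside the local region (1.0 = override).
    """
    fused = []
    for p, g, l in zip(points, global_occ, local_occ):
        # Inside the local region, blend toward the local prediction;
        # elsewhere keep the global prediction unchanged.
        fused.append(alpha * l + (1.0 - alpha) * g if in_local_region(p) else g)
    return fused


def inside_surface(occupancy: Sequence[float], level: float = 0.5) -> List[bool]:
    """Inside/outside labels; a mesher such as marching cubes would extract this level set."""
    return [o >= level for o in occupancy]
```

With `alpha=1.0` the local prediction simply replaces the global one inside the local region; smaller values blend the two, which may soften seams at the region boundary.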

3. The method according to claim 2, wherein the first deep neural network branch includes a global feature sub-network and a first fitting sub-network, and the second deep neural network branch includes a local feature sub-network and a second fitting sub-network;
performing 3D reconstruction on the human body image of the target human body through the first deep neural network branch to obtain the first human body model comprises: performing feature extraction on the human body image through the global feature sub-network to obtain a first image feature; and obtaining the first human body model through the first fitting sub-network based on the first image feature; and
performing 3D reconstruction on the local image in the human body image through the second deep neural network branch to obtain the second human body model comprises: performing feature extraction on the local image through the local feature sub-network to obtain a second image feature; and obtaining the second human body model through the second fitting sub-network based on the second image feature and an intermediate feature output by the first fitting sub-network.

4. The method according to any one of claims 1 to 3, wherein performing local geometric reconstruction on the local part of the target human body based on the human body image of the target human body to obtain the 3D mesh model of the local part comprises:
performing feature extraction on the human body image of the target human body to obtain a third image feature; and
determining the 3D mesh model of the local part based on the third image feature and a 3D topological template of the local part.
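Claim 4 does not say how the third image feature and the 3D topological template are combined. One common option, assumed here rather than taken from the source, is a linear morphable model: a regressor (not shown) maps the image feature to K coefficients, and the local part's vertices are the template's mean vertices plus a coefficient-weighted sum of per-vertex offsets. Because only vertex positions change, the template's faces, and hence its topology, are reused as-is. `mesh_from_template` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_from_template(
    mean_vertices: Sequence[Vec3],
    basis: Sequence[Sequence[Vec3]],
    coeffs: Sequence[float],
    faces: Sequence[Face],
) -> Tuple[List[Vec3], List[Face]]:
    """Deform a fixed-topology template with a linear basis (3DMM-style assumption).

    basis[k][i] is the offset of vertex i in component k; coeffs[k] would be
    regressed from the third image feature by a network that is not shown.
    """
    if len(basis) != len(coeffs):
        raise ValueError("need exactly one coefficient per basis component")
    vertices = []
    for i, v in enumerate(mean_vertices):
        # Weighted sum of this vertex's offsets over all basis components.
        offset = [sum(c * comp[i][d] for c, comp in zip(coeffs, basis)) for d in range(3)]
        vertices.append(tuple(v[d] + offset[d] for d in range(3)))
    return vertices, [tuple(f) for f in faces]  # connectivity copied unchanged
```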

5. The method according to any one of claims 1 to 4, wherein fusing the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain the initial 3D model comprises:
obtaining a plurality of key points of the local part based on the human body image of the target human body;
determining information of first model key points corresponding to the plurality of key points in the 3D mesh model of the target human body, and determining information of second model key points corresponding to the plurality of key points in the 3D mesh model of the local part; and
fusing the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial 3D model.

6. The method according to claim 5, wherein fusing the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial 3D model comprises:
determining a coordinate transformation relationship between the 3D mesh model of the target human body and the 3D mesh model of the local part based on the information of the first model key points and the information of the second model key points;
transforming the 3D mesh model of the local part into the coordinate system of the 3D mesh model of the target human body based on the coordinate transformation relationship; and
fusing the transformed 3D mesh model of the local part with the 3D mesh model of the target human body in that coordinate system to obtain the initial 3D model.
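Claims 5 and 6 leave the form of the coordinate transformation open. Assuming it is a similarity transform (uniform scale s, rotation R, translation t) estimated from the corresponding model key points, one standard closed-form choice, substituted here for illustration, is Horn's quaternion method: the optimal rotation is the eigenvector for the largest eigenvalue of a 4 x 4 matrix built from the key point cross-covariance, after which scale and translation follow by least squares. The function names are hypothetical, at least three non-collinear key point pairs are required, and a small Jacobi eigen-solver is included so the sketch needs only the standard library.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Matrix = List[List[float]]


def _jacobi_eigen(a: Matrix, sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns the eigenvalues and a matrix whose columns are the eigenvectors.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-20:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q), zeroing a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def estimate_similarity(src: Sequence[Vec3], dst: Sequence[Vec3]) -> Tuple[float, Matrix, List[float]]:
    """Horn's quaternion method: s, R, t minimising sum ||s R src_i + t - dst_i||^2."""
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]
    cd = [sum(p[i] for p in dst) / n for i in range(3)]
    a = [[p[i] - cs[i] for i in range(3)] for p in src]
    b = [[p[i] - cd[i] for i in range(3)] for p in dst]
    # Cross-covariance S[i][j] = sum_k a_k[i] * b_k[j]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(3)] for i in range(3)
    ]
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(N)
    best = max(range(4), key=lambda i: vals[i])  # largest eigenvalue -> optimal rotation
    w, x, y, z = (vecs[r][best] for r in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    R = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]
    ra = [[sum(R[i][j] * p[j] for j in range(3)) for i in range(3)] for p in a]
    s = sum(ra[k][i] * b[k][i] for k in range(n) for i in range(3)) / sum(
        p[i] ** 2 for p in a for i in range(3)
    )
    t = [cd[i] - s * sum(R[i][j] * cs[j] for j in range(3)) for i in range(3)]
    return s, R, t


def apply_similarity(s: float, R: Matrix, t: List[float], vertices: Sequence[Vec3]) -> List[Vec3]:
    """Map vertices of the local-part mesh into the body mesh's coordinate system."""
    return [
        tuple(s * sum(R[i][j] * v[j] for j in range(3)) + t[i] for i in range(3))
        for v in vertices
    ]
```

Passing every vertex of the local part's mesh to `apply_similarity` corresponds to the transformation step; the subsequent fusion (for example, replacing the body mesh's face region and stitching the boundary) is not shown.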

7. The method according to claim 1, wherein the human body image includes a front texture of the target human body and a background image, and performing human body texture reconstruction based on the initial 3D model and the human body image to obtain the 3D human body model of the target human body comprises:
performing human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask, and the front texture of the target human body;
inputting the front texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain a back texture of the target human body; and
obtaining a textured 3D human body model corresponding to the target human body based on the back texture and the front texture,
wherein the first segmentation mask corresponds to the mask region of the front texture, and the second segmentation mask corresponds to the mask region of the back texture of the target human body.
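Claim 7 does not specify how the front texture and the two segmentation masks are fed to the texture generation network. A plausible layout, assumed here, is to cut the front texture out of the image with the first mask and stack it with both masks along the channel axis, giving a 5-channel (RGB plus two masks) input per pixel. Images are nested Python lists only to keep the sketch dependency-free; `build_texture_net_input` is a hypothetical name.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Pixel5 = Tuple[float, float, float, float, float]


def build_texture_net_input(
    image: List[List[RGB]],
    front_mask: List[List[int]],
    back_mask: List[List[int]],
) -> List[List[Pixel5]]:
    """Return an H x W grid of 5-channel vectors: masked RGB, front mask, back mask.

    front_mask / back_mask: 0/1 grids (first / second segmentation mask).
    Pixels outside the front mask (the background) are zeroed, so only the
    front texture of the target human body survives in the RGB channels.
    """
    h, w = len(image), len(image[0])
    for mask in (front_mask, back_mask):
        if len(mask) != h or any(len(row) != w for row in mask):
            raise ValueError("image and masks must have the same size")
    net_input = []
    for y in range(h):
        row = []
        for x in range(w):
            m = front_mask[y][x]
            r, g, b = image[y][x]
            row.append((r * m, g * m, b * m, float(m), float(back_mask[y][x])))
        net_input.append(row)
    return net_input
```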

8. The method according to claim 7, wherein the texture generation network is trained by:
performing human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a front texture of the human body sample;
training an auxiliary texture generation network based on a front texture of the human body in an auxiliary human body image, a third sample segmentation mask, and a fourth sample segmentation mask, the auxiliary human body image being obtained by reducing the resolution of the image of the human body sample; and
after training of the auxiliary texture generation network is completed, training the texture generation network based on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask;
wherein the first sample segmentation mask corresponds to the mask region of the front texture of the human body sample, and the second sample segmentation mask corresponds to the mask region of the back texture of the human body sample;
the third sample segmentation mask corresponds to the mask region of the front texture of the human body in the auxiliary human body image, and the fourth sample segmentation mask corresponds to the mask region of the back texture of the human body in the auxiliary human body image; and
the network parameters of the texture generation network include at least some of the network parameters of the trained auxiliary texture generation network.
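Claim 8 trains an auxiliary texture generation network on reduced-resolution images first and then gives the texture generation network at least some of its trained parameters. The sketch below illustrates those two data-handling steps under stated assumptions: resolution is halved by 2 x 2 average pooling (the source does not name a downsampling method), and parameters are modeled as a hypothetical name-to-flat-list mapping that is copied wherever names and sizes match. The training loops themselves are omitted.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]
Params = Dict[str, List[float]]


def downsample_2x(image: Grid) -> Grid:
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks.

    One channel only, for brevity; an odd last row or column is dropped.
    """
    h = len(image) // 2 * 2
    w = len(image[0]) // 2 * 2
    return [
        [
            (image[y][x] + image[y][x + 1] + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def init_from_auxiliary(main_params: Params, aux_params: Params) -> Tuple[Params, List[str]]:
    """Copy trained auxiliary parameters into the main network where name and size match."""
    new_params = {name: list(v) for name, v in main_params.items()}
    copied = []
    for name, value in aux_params.items():
        if name in new_params and len(new_params[name]) == len(value):
            new_params[name] = list(value)
            copied.append(name)
    return new_params, copied
```

Layers whose shapes depend on resolution (here `head.w` in the usage below) keep their own initialization, which is one way to read "at least some of the network parameters".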

9. The method according to any one of claims 1 to 8, wherein the local part of the target human body is the face of the target human body, and/or the human body image is an RGB image.

10. The method according to any one of claims 1 to 9, further comprising:
obtaining a human body skeletal structure of the target human body when performing human body geometric reconstruction based on the human body image of the target human body; and
after the 3D human body model of the target human body is obtained, determining skinning weights for driving the 3D human body model based on the 3D human body model and the human body skeletal structure.
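Claim 10 does not describe how skinning weights are derived from the 3D human body model and the skeletal structure. As a simple stand-in, not the patent's method, the sketch gives each vertex weights proportional to an inverse power of its distance to each bone segment and normalizes them to sum to 1; production pipelines more often use geodesic, heat-diffusion, or learned weights. Bones are given in a hypothetical (parent joint, child joint) coordinate-pair layout.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Bone = Tuple[Vec3, Vec3]  # hypothetical layout: (parent joint, child joint)


def point_to_bone_distance(p: Vec3, bone: Bone) -> float:
    """Euclidean distance from point p to the bone's line segment."""
    a, b = bone
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0 else sum(ap[i] * ab[i] for i in range(3)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def skinning_weights(
    vertices: Sequence[Vec3], bones: Sequence[Bone], power: float = 4.0, eps: float = 1e-8
) -> List[List[float]]:
    """Inverse-distance weights w_ij proportional to 1 / d(v_i, bone_j)**power, normalised per vertex."""
    weights = []
    for v in vertices:
        raw = [1.0 / (point_to_bone_distance(v, bone) + eps) ** power for bone in bones]
        total = sum(raw)
        weights.append([r / total for r in raw])
    return weights
```

The resulting weights could then drive the model with linear blend skinning, where each deformed vertex is the weighted average of that vertex transformed by every bone.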

11. A 3D human body reconstruction apparatus, comprising:
a global reconstruction module configured to perform human body geometric reconstruction based on a human body image of a target human body to obtain a 3D mesh model of the target human body;
a local reconstruction module configured to perform local geometric reconstruction on a local part of the target human body based on the human body image of the target human body to obtain a 3D mesh model of the local part;
a fusion processing module configured to fuse the 3D mesh model of the local part with the 3D mesh model of the target human body to obtain an initial 3D model; and
a texture reconstruction module configured to perform human body texture reconstruction for the target human body based on the initial 3D model and the human body image to obtain a 3D human body model of the target human body.

12. The apparatus according to claim 11, wherein, to obtain the 3D mesh model of the target human body, the global reconstruction module is specifically configured to:
perform 3D reconstruction on the human body image of the target human body through a first deep neural network branch to obtain a first human body model;
perform 3D reconstruction on a local image in the human body image through a second deep neural network branch to obtain a second human body model, wherein the local image includes the local part of the target human body;
fuse the first human body model and the second human body model to obtain a fused human body model; and
perform meshing on the fused human body model to obtain the 3D mesh model of the target human body.

13. The apparatus according to claim 11 or 12, wherein the local reconstruction module is specifically configured to:
perform feature extraction on the human body image of the target human body to obtain a third image feature; and
determine the 3D mesh model of the local part based on the third image feature and a 3D topological template of the local part.

14. The apparatus according to any one of claims 11 to 13, wherein the fusion processing module is specifically configured to:
obtain a plurality of key points of the local part based on the human body image of the target human body;
determine information of first model key points corresponding to the plurality of key points in the 3D mesh model of the target human body, and determine information of second model key points corresponding to the plurality of key points in the 3D mesh model of the local part; and
fuse the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial 3D model.

15. The apparatus according to claim 14, wherein, when fusing the 3D mesh model of the local part with the 3D mesh model of the target human body based on the information of the first model key points and the information of the second model key points to obtain the initial 3D model, the fusion processing module is specifically configured to:
determine a coordinate transformation relationship between the 3D mesh model of the target human body and the 3D mesh model of the local part based on the information of the first model key points and the information of the second model key points;
transform the 3D mesh model of the local part into the coordinate system of the 3D mesh model of the target human body based on the coordinate transformation relationship; and
fuse the transformed 3D mesh model of the local part with the 3D mesh model of the target human body in that coordinate system to obtain the initial 3D model.

16. The apparatus according to claim 11, wherein the texture reconstruction module is specifically configured to:
perform human body segmentation on the human body image to obtain a first segmentation mask, a second segmentation mask, and a front texture of the target human body;
input the front texture, the first segmentation mask, and the second segmentation mask into a texture generation network to obtain a back texture of the target human body; and
obtain a textured 3D human body model corresponding to the target human body based on the back texture and the front texture,
wherein the first segmentation mask corresponds to the mask region of the front texture, and the second segmentation mask corresponds to the mask region of the back texture of the target human body.

17. The apparatus according to claim 16, further comprising a model training module configured to train the texture generation network, wherein the model training module is specifically configured to:
perform human body segmentation on an image of a human body sample in a training sample image set to obtain a first sample segmentation mask, a second sample segmentation mask, and a front texture of the human body sample;
train an auxiliary texture generation network based on a front texture of the human body in an auxiliary human body image, a third sample segmentation mask, and a fourth sample segmentation mask, the auxiliary human body image being obtained by reducing the resolution of the image of the human body sample; and
after training of the auxiliary texture generation network is completed, train the texture generation network based on the front texture of the human body sample, the first sample segmentation mask, and the second sample segmentation mask;
wherein the first sample segmentation mask corresponds to the mask region of the front texture of the human body sample, the second sample segmentation mask corresponds to the mask region of the back texture of the human body sample, the third sample segmentation mask corresponds to the mask region of the front texture of the human body in the auxiliary human body image, the fourth sample segmentation mask corresponds to the mask region of the back texture of the human body in the auxiliary human body image, and the network parameters of the texture generation network include at least some of the network parameters of the trained auxiliary texture generation network.

18. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer-readable instructions, and the processor is configured to implement the method according to any one of claims 1 to 10 by invoking the computer-readable instructions.

19. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, performs the method according to any one of claims 1 to 10.

20. A computer program product comprising a computer program, wherein the method according to any one of claims 1 to 10 is performed when the computer program is executed by a processor.