
CN115375847A - Material recovery method, three-dimensional model generation method and model training method - Google Patents

Info

Publication number: CN115375847A (application CN202211029148.9A; granted as CN115375847B)
Authority: CN (China)
Prior art keywords: target object, gradient, model, image, pixel
Legal status: Granted; Active (the listed legal status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 吴进波, 刘星, 赵晨, 丁二锐, 吴甜, 王海峰
Assignee (current and original): Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G: Physics; G06: Computing; Calculating or Counting; G06T: Image data processing or generation, in general
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 15/04: 3D image rendering; texture mapping
    • G06T 15/205: 3D image rendering; geometric effects; perspective computation; image-based rendering
    • G06T 7/80: Image analysis; analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/20081: Indexing scheme for image analysis or image enhancement; special algorithmic details; training; learning
    • G06T 2207/20084: Indexing scheme for image analysis or image enhancement; special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30244: Indexing scheme for image analysis or image enhancement; subject or context of image processing; camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Image Generation (AREA)

Abstract

The disclosure provides a material recovery method, a three-dimensional model generation method, a model training method, an apparatus, a device and a medium, relating to the field of artificial intelligence, in particular to the technical fields of augmented reality, virtual reality, computer vision and deep learning, and applicable to scenes such as the metaverse. A specific implementation of the material recovery method is as follows: generating a mesh model of the target object from voxel data for the target object; determining, for each mesh in the mesh model, the pixel position of the corresponding pixel point on a material map for the target object; and inputting the mesh model and the pixel positions into a material estimation network to obtain a material map for the target object.

Description

Material recovery method, three-dimensional model generation method and model training method
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the fields of augmented reality, virtual reality, computer vision, deep learning, and the like, and can be applied to scenes such as the metaverse.
Background
With the development of computer technology and network technology, virtual reality and augmented reality technologies and the like have been rapidly developed. In virtual reality and augmented reality techniques, reconstruction of objects is often required. Reconstruction of the object requires material information relating to the object. Typically, the reconstructed three-dimensional model is manually populated with material information according to artistic design.
Disclosure of Invention
The present disclosure is directed to a material recovery method based on deep learning, a three-dimensional model generation method, and a material recovery model training method, apparatus, device, and medium, so as to generate a three-dimensional model based on recovered material information, and enable the generated three-dimensional model to be applied to a conventional rendering engine.
According to an aspect of the present disclosure, there is provided a material recovery method, including: generating a mesh model of a target object from voxel data for the target object; determining, for each mesh in the mesh model, the pixel position of the corresponding pixel point on a material map for the target object; and inputting the mesh model and the pixel positions into a material estimation network to obtain a material map for the target object.
According to another aspect of the present disclosure, there is provided a method of generating a three-dimensional model, including: generating a material map for a target object according to voxel data for the target object; and generating a three-dimensional model of the target object according to the material map and a mesh model of the target object, wherein the material map is obtained by the material recovery method provided by the present disclosure, and the mesh model is generated from the voxel data.
According to another aspect of the present disclosure, there is provided a training method of a material recovery model, wherein the material recovery model includes a material estimation network; the training method comprises the following steps: generating a mesh model of the target object from an original image including the target object and a camera pose for the original image; determining, for each mesh in the mesh model, the pixel position of the corresponding pixel point on the original image; inputting the mesh model and the pixel positions into the material estimation network to obtain a material map for the target object; rendering a target image including the target object according to the material map, the mesh model and the camera pose; and training the material estimation network according to the difference between the target image and the original image.
According to an aspect of the present disclosure, there is provided a material recovery apparatus, including: a model generation module for generating a mesh model of a target object from voxel data for the target object; a pixel position determination module for determining, for each mesh in the mesh model, the pixel position of the corresponding pixel point on a material map for the target object; and a map obtaining module for inputting the mesh model and the pixel positions into a material estimation network to obtain a material map for the target object.
According to another aspect of the present disclosure, there is provided an apparatus for generating a three-dimensional model, including: a map generation module for generating a material map for a target object according to voxel data for the target object; and a model generation module for generating a three-dimensional model of the target object according to the material map and a mesh model of the target object, wherein the material map is obtained with the material recovery apparatus provided by the present disclosure, and the mesh model is generated from the voxel data.
According to another aspect of the present disclosure, there is provided a training apparatus for a material recovery model, wherein the material recovery model includes a material estimation network, the training apparatus including: a model generation module for generating a mesh model of a target object from an original image including the target object and a camera pose for the original image; a pixel position determination module for determining, for each mesh in the mesh model, the pixel position of the corresponding pixel point on the original image; a map obtaining module for inputting the mesh model and the pixel positions into the material estimation network to obtain a material map for the target object; a first image rendering module for rendering a target image including the target object according to the material map, the mesh model and the camera pose; and a training module for training the material estimation network according to the difference between the target image and the original image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement at least one of the following methods provided by the present disclosure: a material recovery method, a three-dimensional model generation method and a material recovery model training method.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of a material recovery method, a three-dimensional model generation method, and a material recovery model training method and apparatus according to an embodiment of the disclosure;
FIG. 2 is a schematic flow chart diagram illustrating a texture restoration method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a principle of generating a mesh model of a target object from voxel data, according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of a method of generating a three-dimensional model according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating a method for training a texture restoration model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training material recovery model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a training material recovery model according to another embodiment of the present disclosure;
FIG. 8 is a block diagram of a texture restoration apparatus according to an embodiment of the present disclosure;
fig. 9 is a block diagram of a three-dimensional model generation apparatus according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a training apparatus for a texture restoration model according to an embodiment of the present disclosure; and
FIG. 11 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the construction of three-dimensional models, material recovery is an essential step. Material information is often added manually to each mesh in a three-dimensional model, relying on artistic design. This way of adding material information is time-consuming and labor-intensive.
For example, a neural rendering based approach may also be employed to recover the material information. However, the neural rendering-based method generally generates an image at a predetermined viewing angle, and cannot output a mesh model and a texture map, so that a conventional three-dimensional rendering engine cannot be used to render the image. Thus, the use scenario of the method is greatly limited.
Based on this, the present disclosure aims to provide a material recovery method, a three-dimensional model generation method, and a material recovery model training method that can generate a material map for a three-dimensional model so that the three-dimensional model can be applied to a conventional rendering engine.
The following first explains the terms related to the present disclosure:
the three-dimensional rendering engine is a set of algorithms for abstracting various objects in reality in the form of various curves or polygons and outputting final images through a computer.
Neural rendering is a general term for various methods of synthesizing images through a deep network; its aim is to realize all or part of the modeling and rendering functions of image rendering.
Image rendering is the process of converting three-dimensional light-energy transfer into a two-dimensional image. The work to be completed in image rendering includes performing geometric transformation, projection transformation, perspective transformation and window clipping on the three-dimensional model, and generating an image according to the acquired material and shadow information.
The signed distance field (Signed Distance Function, SDF), also known as the oriented distance function, determines, over a limited region in space, the distance from a point to the region boundary and defines the sign of that distance. If a point lies outside the region boundary, the signed distance is positive; if the point lies inside the region boundary, the signed distance is negative; and if the point lies on the region boundary, the signed distance is 0. For example, the distance between any point in space and the boundary of a region can be represented by f(x), and the values of f(x) constitute the field, i.e., the signed distance field.
Rasterization is the process of converting a primitive into a two-dimensional image. Each point on the two-dimensional image contains color, depth and texture data. The purpose of rasterization is to find the pixels covered by a geometric element, such as a triangle. Rasterization determines, from the positions of the triangle's vertices, how many pixel points are needed to form the triangle, and the information each pixel point should receive, such as UV coordinates, is obtained by interpolating the vertex data. In other words, rasterization is the process of rendering geometric data on a display device through a series of transformations, ultimately converting it to pixels. Each three-dimensional model is defined by vertices and the triangular faces formed by those vertices. When a three-dimensional model is drawn on the screen, the process of filling each pixel covered by each triangular face according to the three vertices of that face is called rasterization.
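As an illustration (not taken from the patent; all names are hypothetical), the following Python sketch shows the two core steps of rasterization described above: testing which pixels a triangle covers and interpolating its per-vertex UV coordinates with barycentric weights.

    import numpy as np

    def rasterize_triangle(v0, v1, v2, uv0, uv1, uv2):
        """Find pixels covered by a 2D triangle and barycentrically interpolate UVs.

        v0..v2 are (x, y) screen-space vertices; uv0..uv2 are their UV coordinates.
        Returns a list of ((x, y), (u, v)) tuples for the covered pixels.
        """
        def edge(a, b, p):  # signed twice-area of triangle (a, b, p)
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        area = edge(v0, v1, v2)
        if area == 0:       # degenerate triangle covers no pixels
            return []
        xs = range(int(min(v0[0], v1[0], v2[0])), int(max(v0[0], v1[0], v2[0])) + 1)
        ys = range(int(min(v0[1], v1[1], v2[1])), int(max(v0[1], v1[1], v2[1])) + 1)
        covered = []
        for y in ys:
            for x in xs:
                p = (x + 0.5, y + 0.5)  # sample at the pixel center
                w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
                inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)
                if inside:
                    w0, w1, w2 = w0 / area, w1 / area, w2 / area  # barycentric weights
                    uv = w0 * np.asarray(uv0) + w1 * np.asarray(uv1) + w2 * np.asarray(uv2)
                    covered.append(((x, y), tuple(uv)))
        return covered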
In UV unwrapping, every image is represented on a two-dimensional plane, with U the horizontal direction and V the vertical direction. Through a two-dimensional UV coordinate on this plane, any pixel on the image can be located. UV coordinates are short for the U and V texture-mapping coordinates; they define the position of each pixel point on the image, and each pixel point is associated with the three-dimensional model so as to determine the position, in the texture map, of the surface point corresponding to that pixel point. Each pixel point on the image is mapped precisely onto the surface of the object's three-dimensional model, and the software performs smooth image interpolation over the gaps between the positions on the model surface corresponding to two adjacent pixel points, which yields the UV map. The process of creating a UV map is called UV unwrapping.
A voxel is short for volume element (volume pixel); a solid containing voxels can be represented by volume rendering or by extracting a polygonal iso-surface at a given threshold contour. Voxel data is stored in a three-dimensional array, which may also be regarded as a three-dimensional texture.
Mesh refers to a polygonal mesh, a data structure used in computer graphics for modeling various irregular objects. Among the patches of a polygonal mesh, the triangular patch is the smallest unit of subdivision. Because triangular patches are relatively simple and flexible to represent and convenient for describing topology, they are widely used, and Mesh often refers specifically to triangular patches.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a material recovery method, a three-dimensional model generation method, and a material recovery model training method and device according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.
The electronic device 110 may, for example, process the acquired image 120 to obtain voxel data representing the target object in the image 120. The electronic device 110 may further generate a texture map of the target object from the obtained voxel data, and combine the texture map and the mesh model of the target object to obtain a three-dimensional model 130 of the target object including texture information.
The electronic device 110 may process the image 120 using a deep learning based voxelization model, for example, to obtain voxel data of the target object. The voxel data may be pre-acquired and stored in a database local to the electronic device 110 or coupled to the electronic device 110 to be retrieved from the local or database when the three-dimensional model 130 of the target object needs to be generated.
The mesh model of the target object may be, for example, a triangular patch model, that is, the mesh model is formed by splicing a plurality of triangular patches, and each mesh in the mesh model is a triangular patch. It is to be understood that, in addition to the triangular patch model, the mesh model may also be a patch model including any number of corners, such as a four-corner patch model, where each mesh is a patch, and this disclosure does not limit this.
In one embodiment, the electronic device 110 may generate a material map of the target object using a material recovery model, for example. After the texture map and the grid model are obtained, an image of the target object at any view angle can be obtained through a rendering engine.
In an embodiment, as shown in fig. 1, the application scenario 100 may further include a server 150, and the server 150 may be, for example, a background management server supporting the running of the client application in the electronic device 110. The electronic device 110 may be communicatively coupled to the server 150 via a network, which may include wired or wireless communication links.
For example, the server 150 may train the material recovery model based on the plurality of images and send the trained material recovery model 140 to the electronic device 110 in response to a request from the electronic device 110, so that the electronic device 110 generates material information of the target object according to the material recovery model 140. It is to be understood that the plurality of images used in training the material recovery model are images that include the target object. When multiple target objects exist, then a material recovery model may be trained for each target object.
In an embodiment, the electronic device 110 may also send the acquired voxel data to the server 150, and the server 150 processes the voxel data by using the trained material recovery model, so as to obtain a material map of the target object, and generate a three-dimensional model of the target object based on the material map and the mesh model.
The material recovery method and the three-dimensional model generation method provided by the present disclosure may be executed by the electronic device 110, or may be executed by the server 150. Accordingly, the material recovery apparatus and the three-dimensional model generation apparatus provided in the present disclosure may be provided in the electronic device 110, or may be provided in the server 150. The training method of the material recovery model provided by the present disclosure may be performed by the server 150. Accordingly, the training device of the material recovery model provided by the present disclosure may be disposed in the server 150.
It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150, as desired for an implementation.
The material recovery method provided by the present disclosure will be described in detail below with reference to fig. 2 to 3.
As shown in FIG. 2, the texture restoration method 200 of this embodiment may include operations S210-S230.
In operation S210, a mesh model of the target object is generated according to the voxel data for the target object.
According to an embodiment of the present disclosure, the voxel data for the target object may represent, for example, a cubic structure of a predetermined size. For example, the voxel data for the target object may constitute a tensor of N × N × N to represent a cubic structure of size N × N × N. Each element in the tensor represents one voxel. It is understood that N is a positive integer.
According to embodiments of the present disclosure, the mesh model may be derived using the iso-surface-based surface rendering idea. First, each item of voxel data is processed in turn, and the iso-surface patch contained in that voxel is determined. The patches of all voxels are then stitched together to form the surface of the entire target object. Here, an iso-surface means the following: if the voxel data is regarded as a sampling set of a certain physical property over a certain spatial region, and values at non-sampling points are estimated by interpolating adjacent sampling points, then the set of all points in the region sharing a certain property defines one or more curved surfaces, which are called iso-surfaces. Specifically, for example, the six faces of the boundary voxels in the voxel data may be used to fit an iso-surface, or a marching cubes (MC) iso-surface extraction algorithm, a marching tetrahedra (MT) algorithm, or the like may be used to generate the mesh model from the voxel data.
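As an illustration of the iso-surface extraction step, the following sketch uses scikit-image's marching-cubes implementation on a toy signed-distance volume; the sphere SDF and the grid resolution are assumptions for the example only, not values from the patent.

    import numpy as np
    from skimage import measure

    def mesh_from_voxels(volume, level=0.0):
        """Extract an iso-surface mesh from a dense voxel grid.

        `volume` is an (N, N, N) array of scalar values (e.g. signed distances);
        `level` selects the iso-surface, 0.0 for the zero level set of an SDF.
        """
        verts, faces, normals, _ = measure.marching_cubes(volume, level=level)
        return verts, faces, normals

    # Toy example: a sphere of radius 0.3 centred in a unit cube sampled at 64^3.
    n = 64
    coords = np.linspace(-0.5, 0.5, n)
    x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - 0.3
    verts, faces, normals = mesh_from_voxels(sdf, level=0.0)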
In operation S220, a pixel position of a pixel point corresponding to each grid in the grid model on the texture map for the target object is determined.
According to the embodiment of the disclosure, the pixel positions, on the material map of the target object, of the pixel points corresponding to each mesh in the mesh model can be obtained by UV-unwrapping the mesh model. For example, any one of the following UV unwrapping algorithms or tools may be employed to unwrap the mesh model: the Blender tool, the RizomUV tool, the MVS-Texturing algorithm, etc.
It can be understood that, in operation S220, for example, the three-dimensional coordinates of the vertices of each mesh may be converted into two-dimensional coordinates in the image coordinate system according to a preset virtual camera position and viewing direction, based on the transformation relationships between the world coordinate system and the camera coordinate system and between the camera coordinate system and the image coordinate system. The pixel points corresponding to the three two-dimensional coordinates obtained from the three vertices of each mesh are connected to form a triangular region, and the pixel points contained in the triangular region are taken as the pixel points corresponding to that mesh.
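A minimal numpy sketch of the coordinate-transformation chain just described; the intrinsic matrix K and the world-to-camera matrix w2c are assumed inputs, not values given by the patent.

    import numpy as np

    def project_vertices(vertices_world, K, w2c):
        """Project 3D mesh vertices into pixel coordinates with a pinhole camera.

        vertices_world: (V, 3) vertex positions in the world frame.
        K: (3, 3) camera intrinsics; w2c: (4, 4) world-to-camera extrinsic matrix.
        Returns (V, 2) pixel coordinates (u, v).
        """
        v_h = np.concatenate([vertices_world, np.ones((len(vertices_world), 1))], axis=1)
        v_cam = (w2c @ v_h.T).T[:, :3]      # world frame -> camera frame
        uvw = (K @ v_cam.T).T               # camera frame -> image plane
        return uvw[:, :2] / uvw[:, 2:3]     # perspective divide -> pixel coordinates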
In operation S230, the mesh model and the pixel position are input into a material estimation network, and a material map for the target object is obtained.
According to an embodiment of the present disclosure, the material map may be represented by a two-dimensional matrix, for example, where each element in the matrix represents the material information of one pixel. The material information may include, for example, at least one of diffuse reflection color information (Diffuse), roughness information (Roughness), and metallic information (Metallic). If the material information includes all three kinds of information, each element in the two-dimensional matrix may be represented by a quintuple: the diffuse reflectance of the three color channels R, G and B, the roughness, and the metallic degree. It will be appreciated that the material map is output by the material estimation network.
According to an embodiment of the present disclosure, the material estimation network may be constituted by a Multi-Layer Perceptron (MLP), for example. It is understood that the material estimation network may also be any other deep neural network. Through training, the material estimation network learns the mapping relationship between each position point of the target object and the material information. In this embodiment, the mesh model and the pixel positions of the pixel points corresponding to the meshes are input into the material estimation network, so that the material estimation network can derive the material information of each mesh in the mesh model from the learned mapping relationship and associate that material information with the corresponding pixel points according to their pixel positions, thereby obtaining the material map.
According to the embodiments of the present disclosure, generating the material map with a material estimation network built on deep learning technology can improve the accuracy of the obtained material information. Furthermore, because both the material map and the mesh model are generated, the method of the embodiments of the present disclosure can be applied to scenes in which a conventional rendering engine is used to render images of the target object, which improves the robustness of the material recovery method provided by the present disclosure.
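The patent does not fix the internal layout of the material estimation network; the following is a hypothetical PyTorch sketch of an MLP mapping a mesh surface point plus its pixel/UV position to the five material channels described above (diffuse RGB, roughness, metallic). Layer sizes are illustrative only.

    import torch
    from torch import nn

    class MaterialMLP(nn.Module):
        """Toy material estimation network: surface point + UV -> 5 material channels."""
        def __init__(self, in_dim=5, hidden=256, out_dim=5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim), nn.Sigmoid(),  # squash all channels to [0, 1]
            )

        def forward(self, xyz, uv):
            # xyz: (P, 3) surface points from the mesh; uv: (P, 2) pixel/UV positions.
            return self.net(torch.cat([xyz, uv], dim=-1))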
The implementation of operation S210 described above will be further expanded and defined below in conjunction with fig. 3.
FIG. 3 is a schematic diagram of a principle of generating a mesh model of a target object from voxel data according to an embodiment of the disclosure.
According to the embodiment of the present disclosure, a mesh model of a target object may be generated from voxel data based on a deep learning method, thereby improving the accuracy of the generated mesh model.
For example, the embodiment may set an area surrounded by the surface mesh of the mesh model as a limited area in space, and learn a symbolic distance between each sampling point in the space and an area boundary of the limited area based on the deep neural network. Thus, the embodiment may determine the plurality of spatial sampling points according to the voxel data, and then based on the deep neural network, may estimate a symbolic distance of the plurality of spatial sampling points with respect to the target object, that is, a symbolic distance between the plurality of spatial sampling points and a boundary of an area surrounded by the mesh model.
Specifically, as shown in fig. 3, the embodiment 300 may take voxel data 310 of a target object as an input of a deep neural network 320, and output a symbolic distance 330 of each of a plurality of spatial sampling points for the target object by the deep neural network 320. Wherein, the central point of the volume element represented by each voxel data can be used as the spatial sampling point determined by each voxel data.
The deep neural network 320 may be formed of MLP, for example. Trained, the deep neural network 320 can learn the symbolic distance field between a spatial sample point and a target object. Alternatively, the deep neural network 320 may be a fully connected network, which is not limited by this disclosure.
In one embodiment, when training the deep neural network 320, the deep neural network 320 is used to estimate, for each sampling point on a ray cast from the virtual camera center (each ray corresponding to a pixel in the sample image), the signed distance of that sampling point as well as a feature of that sampling point. The signed distance and the feature of each sampling point are then input into a color-estimation MLP, which estimates the color information of each sampling point. The deep neural network 320 is optimized via the L1 loss between this color information and the true color of the corresponding pixel in the sample image. To make the network more robust in weakly textured regions, the deep neural network 320 can also be optimized by adding prior constraints on the geometry.
According to an embodiment of the present disclosure, after obtaining the symbolic distances 330 of the plurality of spatial sampling points determined by the voxel data, which are respectively specific to the target object, the embodiment 300 may generate the mesh model 360 of the target object by using the iso-surface extraction algorithm 350 according to the plurality of spatial sampling points and the symbolic distances 330.
For example, the embodiment may first determine, as the target sampling point 340, a sampling point having a symbol distance of 0 among the plurality of sampling points determined from the voxel data, according to the symbol distance 330. The target sampling points 340 are surface points of the mesh model. Then, according to the target sampling points 340, an iso-surface extraction algorithm 350 is used to generate a three-dimensional patch model, which is a mesh model 360 of the target object.
Based on the material recovery method provided by the present disclosure, the present disclosure further provides a method for generating a three-dimensional model, which will be described in detail below with reference to fig. 4.
Fig. 4 is a flow chart diagram of a method of generating a three-dimensional model according to an embodiment of the disclosure.
As shown in fig. 4, the generation method 400 of the three-dimensional model of this embodiment may include operations S410 to S420.
In operation S410, a material map for the target object is generated according to the voxel data for the target object.
The operation S410 may employ the material recovery method described above to generate a material map according to the voxel data, which is not described herein again. It is understood that in the process of generating the texture map, a mesh model of the target object, i.e., a three-dimensional patch model of the target object, may be obtained.
In operation S420, a three-dimensional model of the target object is generated according to the texture map and the mesh model of the target object.
In this embodiment, for each mesh in the mesh model of the target object, the material information at the pixel position of its corresponding pixel point on the material map is assigned to that mesh, so as to obtain the three-dimensional model of the target object. In addition to the three-dimensional surface of the target object, the three-dimensional model can therefore represent the material information of each surface point on that surface.
To facilitate implementation of the above-described material recovery method, the present disclosure also provides a training method of a material recovery model, which includes the above-described material estimation network. The training method of the texture restoration model will be described in detail below with reference to fig. 5.
FIG. 5 is a flowchart illustrating a method for training a texture restoration model according to an embodiment of the disclosure.
As shown in FIG. 5, the training method 500 of the texture restoration model of this embodiment includes operations S510-S550.
In operation S510, a mesh model of a target object is generated from an original image including the target object and a camera pose for the original image.
According to an embodiment of the present disclosure, the original image may be any image acquired in advance as a training sample, which is not limited in this disclosure.
According to the embodiment of the disclosure, the camera pose for the original image can be calculated from the original image by using, for example, a simultaneous localization and mapping (SLAM) algorithm. For example, the original image may be a plurality of images; this embodiment may take an image captured in a predetermined camera pose among the plurality of images as a reference image, and pair each of the other images with the reference image. Feature points are then extracted from each image in an image pair, a matching relationship is established between the feature points of the two images, and the relative camera pose between the two images is calculated from this matching relationship. The camera pose of each image other than the reference image is then obtained from the predetermined camera pose and the relative camera pose.
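The patent does not prescribe a particular SLAM implementation; the sketch below only illustrates the feature-matching and relative-pose step with off-the-shelf OpenCV primitives (ORB features plus an essential-matrix decomposition), assuming the intrinsic matrix K is known.

    import cv2
    import numpy as np

    def relative_camera_pose(img_ref, img_other, K):
        """Estimate the relative camera pose between a reference image and another view."""
        orb = cv2.ORB_create(2000)
        kp1, des1 = orb.detectAndCompute(img_ref, None)    # feature points, reference image
        kp2, des2 = orb.detectAndCompute(img_other, None)  # feature points, other view
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
        E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
        return R, t  # rotation and unit-scale translation of the other view w.r.t. the reference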
In this embodiment, according to the camera pose of the original image, a plurality of virtual rays passing through the pixel points in the original image are cast from the virtual camera position, and the virtual rays are sampled to obtain a plurality of spatial sampling points. Taking the spatial sampling points as the center points of a plurality of volume elements yields a plurality of items of voxel data. Subsequently, a mesh model of the target object may be generated from the voxel data, using a principle similar to operation S210 described above.
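A minimal sketch of this ray-sampling step, assuming a pinhole intrinsic matrix K, a camera-to-world matrix c2w, and hypothetical near/far sampling bounds.

    import numpy as np

    def sample_points_along_rays(K, c2w, pixels, n_samples=64, near=0.5, far=4.0):
        """Cast one ray per pixel from the camera center and sample points along it.

        pixels: (R, 2) array of (u, v) pixel coordinates in the original image.
        Returns an (R, n_samples, 3) array of spatial sampling points in world space.
        """
        u, v = pixels[:, 0], pixels[:, 1]
        # Back-project pixels to camera-space ray directions (pinhole model).
        dirs_cam = np.stack([(u - K[0, 2]) / K[0, 0],
                             (v - K[1, 2]) / K[1, 1],
                             np.ones_like(u, dtype=float)], axis=-1)
        dirs_world = dirs_cam @ c2w[:3, :3].T      # rotate directions into the world frame
        origin = c2w[:3, 3]                        # virtual camera (optical) center
        t = np.linspace(near, far, n_samples)      # depths along each ray
        return origin + dirs_world[:, None, :] * t[None, :, None]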
In operation S520, a pixel position of a pixel point corresponding to each mesh in the mesh model on the original image is determined.
This embodiment may employ a principle similar to that described above in operation S220 to obtain a correspondence between each mesh in the mesh model and a pixel point on the original image, thereby obtaining a pixel position.
In an embodiment, the grid model may be rasterized to obtain pixel positions of each grid in the grid model in the pixel coordinate system.
In operation S530, the mesh model and the pixel positions are input into a material estimation network, and a material map for the target object is obtained.
According to an embodiment of the present disclosure, the implementation principle of operation S530 is similar to the implementation principle of operation S230 described above, and is not described herein again.
In operation S540, a target image including the target object is rendered according to the material map, the mesh model, and the camera pose.
According to an embodiment of the present disclosure, a Physically Based Rendering (PBR) technique may be employed to render the target image. The attributes used by PBR may include diffuse, roughness, metallic, normal, and the like, where normal is the normal of each mesh in the mesh model. It will be appreciated that the above rendering technique is merely an example to facilitate understanding of the present disclosure, and other rendering techniques may also be employed to render the target image. Because the camera pose is taken into account when generating the target image, the viewing angle of the target object in the generated target image is consistent with that in the original image, which reduces the number of variables when determining the difference and improves training precision and efficiency.
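The patent names the PBR attributes but not a specific shading model; the following numpy sketch evaluates a single-light Cook-Torrance BRDF (GGX distribution, Schlick Fresnel, Smith geometry) only to illustrate how diffuse, roughness, metallic and the normal enter a physically based shading computation. It is not the patent's renderer.

    import numpy as np

    def pbr_shade(albedo, roughness, metallic, n, v, l, light_rgb):
        """Shade one surface point with a single light using a Cook-Torrance BRDF.

        albedo: (3,) diffuse colour; roughness/metallic: scalars in [0, 1];
        n, v, l: unit normal, view and light directions; light_rgb: (3,) light intensity.
        """
        h = (v + l) / np.linalg.norm(v + l)                      # half vector
        nl = max(np.dot(n, l), 1e-4)
        nv = max(np.dot(n, v), 1e-4)
        nh = max(np.dot(n, h), 0.0)
        vh = max(np.dot(v, h), 0.0)

        a2 = max(roughness, 1e-3) ** 4                           # GGX alpha^2, alpha = roughness^2
        d = a2 / (np.pi * (nh * nh * (a2 - 1.0) + 1.0) ** 2)     # normal distribution term D
        k = (roughness + 1.0) ** 2 / 8.0                         # Schlick-GGX geometry constant
        g = (nl / (nl * (1 - k) + k)) * (nv / (nv * (1 - k) + k))  # geometry term G
        f0 = 0.04 * (1 - metallic) + np.asarray(albedo) * metallic  # base reflectance
        f = f0 + (1.0 - f0) * (1.0 - vh) ** 5                    # Fresnel term F (Schlick)

        specular = d * g * f / (4.0 * nl * nv)
        k_d = (1.0 - f) * (1.0 - metallic)                       # energy left for diffuse
        return (k_d * np.asarray(albedo) / np.pi + specular) * np.asarray(light_rgb) * nl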
In operation S550, a texture estimation network is trained according to a difference between the target image and the original image.
In the embodiment, pixel-wise loss between a target image and an original image is calculated, and network parameters in a material estimation network are adjusted with the aim of minimizing the pixel-wise loss, so that the training of the material estimation network is realized.
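A hedged PyTorch sketch of one such training step; `renderer` stands for a differentiable renderer, and all function names and signatures are hypothetical rather than the patent's API.

    import torch
    import torch.nn.functional as F

    def train_step(material_net, renderer, mesh, pixel_pos, camera_pose, original, optimizer):
        """One optimisation step: render with predicted materials, compare to the original image."""
        optimizer.zero_grad()
        material_map = material_net(mesh, pixel_pos)         # predicted material map
        target = renderer(material_map, mesh, camera_pose)   # differentiably rendered target image
        loss = F.l1_loss(target, original)                   # pixel-wise photometric loss
        loss.backward()                                      # gradients flow back through the renderer
        optimizer.step()
        return loss.item()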
In the embodiments of the present disclosure, training the material estimation network according to the difference between images can improve the accuracy with which the material estimation network predicts the material map. On this basis, the fidelity of the generated three-dimensional model of the target object can be improved, which enhances the realism of a virtual reality or augmented reality scene and improves the user experience.
The principles of training the texture restoration model will be further extended and defined below in conjunction with FIG. 6.
FIG. 6 is a schematic diagram of a training texture restoration model according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, as shown in fig. 6, in this embodiment 600, the material recovery model further includes a deep neural network 610 for estimating a symbol distance in addition to the material estimation network 620. The mesh model of the target object may be generated using a deep neural network 610 for estimating symbol distances.
Thus, when training the material recovery model, for example, a plurality of spatial sampling points 603 corresponding to pixel points in the original image may be obtained by sampling according to the original image 601 and the camera pose 602 for the original image; that is, spatial points are collected along rays cast from the optical center of the virtual camera through the pixel points in the original image.
Subsequently, the voxel data obtained based on the plurality of spatial sampling points is processed by using the deep neural network 610, so as to obtain the symbolic distance 604 of the plurality of spatial sampling points to the target object. The principle of obtaining the symbol distance 604 is similar to the principle of obtaining the symbol distance in the above description of fig. 3, and is not described herein again.
An iso-surface extraction algorithm may then be employed to generate a mesh model 605 of the target object based on the spatial sample points and the symbolic distances. The principle of generating the mesh model 605 is similar to that of generating the mesh model described above with respect to fig. 3, and is not described herein again.
After obtaining the mesh model, this embodiment may obtain, through operation S520 described above, the pixel positions 606 on the original image of the pixel points corresponding to each mesh in the mesh model 605. Subsequently, the pixel positions 606 and the mesh model 605 are input into the material estimation network 620, and a material map 607 for the target object may be output by the material estimation network 620. After the material map 607 is obtained, a target image 608 may be rendered according to the material map 607, the mesh model 605, and the camera pose 602.
After obtaining the target image 608, the embodiment may calculate inter-pixel loss between the target image 608 and the original image 601, and train the texture estimation network 620 according to the loss.
In an embodiment, after mesh model 605 is obtained, a reference image 609 including the target object may also be rendered, for example, based on mesh model 605 and camera pose 602. After the reference image 609 is obtained, the deep neural network 610 may be trained based on the difference between the reference image 609 and the original image 601.
Therein, the mesh model 605 may be projected onto an image plane, e.g. according to the camera pose 602, resulting in a reference image 609. It is understood that a rendering engine may also be employed to render an image of the mesh model 605 for the camera pose 602 as the reference image 609, and the present disclosure is not limited to the method of generating the reference image 609. Therein, the reference image 609 may also be rendered in combination with the environment map for the original image.
It will be appreciated that the difference between the reference image 609 and the original image 601 may be represented, for example, by inter-pixel loss. This embodiment may train the deep neural network 610 with the goal of minimizing the difference between the reference image 609 and the original image 601.
In one embodiment, the deep neural network 610 may be trained using the original image as a training sample, and the texture estimation network 620 may then be trained using the original image once the loss of the deep neural network 610 converges. This is due to the generally low accuracy requirements on the deep neural network 610 that generates the mesh model. Alternatively, the deep neural network 610 and the texture estimation network 620 may be trained simultaneously.
The embodiment of the disclosure trains the deep neural network for estimating the symbol distance, so that the grid model can be generated based on a deep learning method, thereby generating a high-quality grid model.
Operation S550 described above will be further expanded and defined below in conjunction with fig. 7 to 8.
FIG. 7 is a schematic diagram illustrating training a texture estimation network according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, when training the material estimation network, for example, a mask image for the target object may be generated first, and the original image and the target image may be processed based on the mask image to compare only for an image portion of the target object when determining a difference between the original image and the target image. Therefore, the influence of the environmental information on the training of the material estimation network can be avoided, the precision of the material estimation network is improved, and the training efficiency is improved. This is because the material estimation network estimates only the material information of the target object.
As shown in fig. 7, after the mesh model 701 is obtained by the operation S510 described above, the embodiment 700 may render a reference image 703 including the target object according to the mesh model 701 and the camera pose 702. The principle of rendering the reference image 703 is similar to that described above, and is not described herein again.
The embodiment 700 may generate a mask image 704 for the target object according to the position of the target object in the reference image 703. The mask image 704 may be a binary image, and the mask value of the region corresponding to the target object is 1, and the mask values of the other regions are 0.
After obtaining the mask image 704, the embodiment 700 can use the mask image to mask the target image 705 to obtain a masked first image 707, and use the mask image to mask the original image 706 to obtain a masked second image 708. For example, the masking process may be implemented by multiplying the mask value of each pixel in the mask image by the pixel value of the corresponding pixel in the target image 705 or the original image 706.
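A minimal numpy sketch of this masking step; the image sizes and contents are placeholders standing in for the rendered target image, the original image and the binary mask.

    import numpy as np

    def apply_mask(image, mask):
        """Zero out everything outside the target object.

        image: (H, W, 3) float array; mask: (H, W) binary array (1 on the object).
        """
        return image * mask[..., None]   # broadcast the mask over the colour channels

    # Toy usage with placeholder data.
    target_image = np.random.rand(256, 256, 3)
    original_image = np.random.rand(256, 256, 3)
    mask = np.zeros((256, 256)); mask[64:192, 64:192] = 1.0
    first_image = apply_mask(target_image, mask)     # masked rendered image
    second_image = apply_mask(original_image, mask)  # masked original image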
After obtaining the first image 707 and the second image 708, the embodiment may train the texture estimation network 710 according to the difference between the two images.
In an embodiment, the color gradient of each pixel point in the original image may also be used as a supervision signal to train the material estimation network. This is because the color gradient and the material gradient of each pixel point generally tend to be the same, and the difference of the materials can be reflected by the color generally. Therefore, the embodiment can train the material estimation network from multiple dimensions, and is beneficial to improving the training efficiency and precision of the material estimation network, thereby reducing the material seams in the generated material mapping.
For example, when the material recovery model is trained, the Color Gradient (Color Gradient) of each pixel point in the original image may be determined in real time, or the Color Gradient may be obtained from a storage space in which the Color Gradient is stored in advance. After the texture map is obtained through the processing of the texture estimation network, the texture gradient of each pixel point in the texture map can be determined. It will be appreciated that the color gradient and the material gradient may be determined, for example, by invoking a preset gradient interface.
After the color gradient and the material gradient are obtained, the loss of the material estimation network can be determined according to the corresponding relation between the pixel points in the original image and the pixel points in the material map and the relative difference between the color gradient of the first pixel point in the original image and the material gradient of the second pixel point corresponding to the first pixel point in the material map. The texture estimation network is then trained with the goal of minimizing the loss.
In an embodiment, when determining the color gradient and the material gradient, in order to facilitate calculation, the color gradient of each pixel point in the width direction and the height direction of the original image may be determined respectively, so as to obtain a color gradient in the horizontal direction and a color gradient in the vertical direction. Similarly, the material gradient in the horizontal direction and the material gradient in the vertical direction of each pixel point in the material map can be determined. And taking the weighted sum of a first relative difference between the horizontal color gradient of the first pixel point and the horizontal material gradient of the second pixel point and a second relative difference between the vertical color gradient of the first pixel point and the vertical material gradient of the second pixel point as the loss of the material estimation network. The weight used in calculating the weighted sum may be set according to actual requirements, which is not limited in this disclosure. The present disclosure may improve the accuracy and computational efficiency of the determined loss by decomposing the gradient into two directions to determine the loss.
In one embodiment, the horizontal direction may be taken as the first direction and the vertical direction as the second direction. The color gradient of the first pixel point in the first direction is denoted as the first gradient, the color gradient of the first pixel point in the second direction as the second gradient, the material gradient of the second pixel point in the first direction as the third gradient, and the material gradient of the second pixel point in the second direction as the fourth gradient. When training the material estimation network, for example, a first gradient weight for the first direction may be determined according to the first gradient of the first pixel point, and a second gradient weight for the second direction may be determined according to the second gradient of the first pixel point. For example, the first gradient weight may be negatively correlated with the first gradient, and the second gradient weight negatively correlated with the second gradient. This embodiment may then compute, using the first gradient weight and the second gradient weight, a weighted sum of the third gradient and the fourth gradient of the second pixel point, and train the material estimation network using this weighted sum as the loss of the material estimation network. In this way, when the material gradient is large while the color gradient is small, the loss of the material estimation network is large; training the network according to this loss drives the variation trends of the material gradient and the color gradient to become consistent, which improves the precision of the trained material estimation network and the accuracy of the obtained material map.
In one embodiment, the relationship between the gradient weight and the gradient can be expressed by an exponential function. Setting the first gradient to I_dx, the first gradient weight weight_dx for the first direction can be expressed by the following formula (1); similarly, setting the second gradient to I_dy, the second gradient weight weight_dy for the second direction can be expressed by the following formula (2):

weight_dx = e^(-λ·I_dx)    formula (1);

weight_dy = e^(-λ·I_dy)    formula (2).

where λ is a hyper-parameter that can be set according to actual requirements.
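A hedged PyTorch sketch of the gradient-weighted loss built from formulas (1) and (2), assuming the predicted material map is pixel-aligned with the original image; names and shapes are illustrative only.

    import torch

    def gradient_weighted_loss(material_map, color_image, lam=1.0):
        """Edge-aware gradient loss: penalise material gradients where the colour is flat.

        material_map: (C, H, W) predicted material channels; color_image: (3, H, W).
        Weights follow weight_dx = exp(-lam * I_dx) and weight_dy = exp(-lam * I_dy).
        """
        # Horizontal (dx) and vertical (dy) gradients as absolute finite differences.
        m_dx = (material_map[:, :, 1:] - material_map[:, :, :-1]).abs()
        m_dy = (material_map[:, 1:, :] - material_map[:, :-1, :]).abs()
        i_dx = (color_image[:, :, 1:] - color_image[:, :, :-1]).abs().mean(0, keepdim=True)
        i_dy = (color_image[:, 1:, :] - color_image[:, :-1, :]).abs().mean(0, keepdim=True)

        w_dx = torch.exp(-lam * i_dx)    # large colour gradient -> small weight
        w_dy = torch.exp(-lam * i_dy)
        return (w_dx * m_dx).mean() + (w_dy * m_dy).mean()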
In one embodiment, the texture information may include at least two of the following information: diffuse reflectance color information, roughness information, and metal degree information. When determining the material gradient, the embodiment may calculate the gradient for each material information, resulting in at least two gradients for at least two information. Subsequently, the embodiment may determine the material gradient of the second pixel in the material map according to the at least two gradients of the second pixel. For example, the weighted sum of at least two gradients may be used as the material gradient of the second pixel, and the weight during weighting may be set according to actual requirements. It can be understood that the above-mentioned manner of determining the weighted sum of at least two gradients as the material gradient is only used as an example to facilitate understanding of the disclosure, and the disclosure may also use, for example, an average value of at least two gradients, etc. as the material gradient of the second pixel point, which is not limited by the disclosure.
In an embodiment, a loss value of the material estimation network may be calculated by the above-described method according to the material gradient for each material information, and a sum of at least two loss values calculated from at least two material information may be used to represent the loss of the material estimation network. It is to be understood that the material gradient of each material information may include the above-described gradient in the first direction and the above-described gradient in the second direction.
Based on the material recovery method provided by the present disclosure, the present disclosure also provides a material recovery device, which will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a texture restoration apparatus according to an embodiment of the present disclosure.
As shown in FIG. 8, the texture restoration apparatus 800 of this embodiment may include a model generation module 810, a pixel position determination module 820, and a map obtaining module 830.
The model generation module 810 may be used to generate a mesh model of a target object from voxel data for the target object. In an embodiment, the model generating module 810 may be configured to perform the operation S210 described above, which is not described herein again.
The pixel position determining module 820 is configured to determine pixel positions of pixel points corresponding to each grid in the grid model on the texture map for the target object. In an embodiment, the pixel position determining module 820 may be configured to perform the operation S220 described above, which is not described herein again.
The map obtaining module 830 is configured to input the mesh model and the pixel position into a material estimation network, and obtain a material map for the target object. In an embodiment, the map obtaining module 830 may be configured to perform the operation S230 described above, which is not described herein again.
According to an embodiment of the present disclosure, the model generating module 810 includes: the symbolic distance obtaining sub-module is used for processing the voxel data by adopting a deep neural network to obtain symbolic distances of a plurality of spatial sampling points aiming at the target object respectively, wherein the plurality of spatial sampling points are determined according to the voxel data; and the model generation submodule is used for generating a grid model of the target object by adopting an isosurface extraction algorithm according to the plurality of spatial sampling points and the symbolic distance.
Based on the method for generating the three-dimensional model provided by the present disclosure, the present disclosure also provides a device for generating the three-dimensional model, which will be described in detail below with reference to fig. 9.
Fig. 9 is a block diagram of a three-dimensional model generation apparatus according to an embodiment of the present disclosure.
As shown in FIG. 9, the apparatus 900 for generating a three-dimensional model of this embodiment may include a map generation module 910 and a model generation module 920.
The map generation module 910 is configured to generate a material map for the target object according to the voxel data for the target object, where the material map is obtained by using the material recovery apparatus provided by the present disclosure. In an embodiment, the map generation module 910 may be configured to perform the operation S410 described above, which is not described herein again.
The model generation module 920 is configured to generate a three-dimensional model of the target object according to the material map and the mesh model of the target object, where the mesh model is generated from the voxel data. In an embodiment, the model generation module 920 may be configured to perform the operation S420 described above, which is not described herein again.
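As one illustration of how the material map and the mesh model could be combined into a three-dimensional model, the sketch below attaches the recovered maps to the mesh as a PBR material and exports the result. The use of trimesh and PIL, the glTF output format, and the texture file names are assumptions, not the disclosure's implementation.

```python
import trimesh
from PIL import Image

def assemble_textured_model(vertices, faces, uv, base_color_png, metallic_roughness_png,
                            out_path="model.glb"):
    """Attach the recovered material maps to the mesh model as a PBR material and export."""
    material = trimesh.visual.material.PBRMaterial(
        baseColorTexture=Image.open(base_color_png),
        metallicRoughnessTexture=Image.open(metallic_roughness_png),
    )
    visual = trimesh.visual.TextureVisuals(uv=uv, material=material)
    mesh = trimesh.Trimesh(vertices=vertices, faces=faces, visual=visual, process=False)
    mesh.export(out_path)   # a glTF binary keeps geometry and material together
    return mesh
```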
Based on the training method of the material recovery model provided by the present disclosure, the present disclosure also provides a training apparatus for the material recovery model, which will be described in detail below with reference to FIG. 10.
FIG. 10 is a block diagram of a training apparatus for a material recovery model according to an embodiment of the present disclosure.
As shown in FIG. 10, the training apparatus 1000 for a material recovery model of this embodiment may include a model generation module 1010, a pixel position determining module 1020, a map obtaining module 1030, a first image rendering module 1040, and a training module 1050. The material recovery model includes the material estimation network.
The model generation module 1010 is configured to generate a mesh model of the target object from an original image including the target object and a camera pose for the original image. In an embodiment, the model generation module 1010 may be configured to perform the operation S510 described above, which is not described herein again.
The pixel position determining module 1020 is configured to determine, on the original image, pixel positions of pixel points corresponding to each mesh in the mesh model. In an embodiment, the pixel position determining module 1020 may be configured to perform the operation S520 described above, which is not described herein again.
The map obtaining module 1030 is configured to input the mesh model and the pixel positions into the material estimation network to obtain a material map for the target object. In an embodiment, the map obtaining module 1030 may be configured to perform the operation S530 described above, which is not described herein again.
The first image rendering module 1040 is configured to render a target image including the target object according to the material map, the mesh model, and the camera pose. In an embodiment, the first image rendering module 1040 may be configured to perform the operation S540 described above, which is not described herein again.
The training module 1050 is configured to train the material estimation network according to a difference between the target image and the original image. In an embodiment, the training module 1050 may be configured to perform the operation S550 described above, which is not described herein again.
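A minimal sketch of one training iteration is shown below, assuming a differentiable renderer `renderer` and a mean-squared photometric loss; both are stand-ins, since the disclosure only requires training according to the difference between the target image and the original image.

```python
import torch.nn.functional as F

def train_step(material_net, renderer, mesh, pixel_positions, original_image, camera_pose, optimizer):
    """One optimization step: predict the material map, render the target image,
    and back-propagate the photometric difference into the material estimation network."""
    optimizer.zero_grad()
    material_map = material_net(mesh, pixel_positions)         # material map for the target object
    target_image = renderer(mesh, material_map, camera_pose)   # rendered image containing the object
    loss = F.mse_loss(target_image, original_image)            # difference between target and original
    loss.backward()
    optimizer.step()
    return loss.item()
```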
According to an embodiment of the present disclosure, the apparatus 1000 may further include: a second image rendering module, configured to render a reference image including the target object according to the mesh model and the camera pose; and a mask image generation module, configured to generate a mask image for the target object according to the position of the target object in the reference image. The training module 1050 may include: a mask processing sub-module, configured to perform mask processing on the target image and the original image respectively by using the mask image, to obtain a first image and a second image after the mask processing; and a training sub-module, configured to train the material estimation network according to the difference between the first image and the second image.
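For illustration, the masked comparison could look like the sketch below, assuming the mask image is a tensor broadcastable to the image shape with ones inside the target object and zeros elsewhere; the normalization by the number of foreground pixels is an added assumption.

```python
import torch
import torch.nn.functional as F

def masked_photometric_loss(target_image, original_image, mask):
    """Mask both images with the object mask before comparing them, so the loss only
    measures differences inside the region covered by the target object."""
    first_image = target_image * mask        # masked rendering
    second_image = original_image * mask     # masked original image
    # Normalize by the number of foreground pixels so the loss does not depend on
    # how much of the frame the target object occupies (an added assumption).
    return F.mse_loss(first_image, second_image, reduction="sum") / mask.sum().clamp(min=1.0)
```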
According to an embodiment of the present disclosure, the apparatus 1000 may further include a material gradient determining module, configured to determine the material gradient of each pixel point in the material map. The training module 1050 may also be configured to train the material estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point, corresponding to the first pixel point, in the material map.
According to an embodiment of the present disclosure, the color gradient includes a first gradient in a first direction and a second gradient in a second direction; the material gradient includes a third gradient in the first direction and a fourth gradient in the second direction; and the first direction and the second direction are perpendicular to each other. The training module 1050 may include: a gradient weight determining sub-module, configured to determine a first gradient weight for the first direction and a second gradient weight for the second direction, respectively, according to the first gradient and the second gradient of the first pixel point; a weighting sub-module, configured to determine a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and a training sub-module, configured to train the material estimation network according to the weighted sum.
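One plausible realization of such gradient weighting is an edge-aware smoothness term, sketched below. The exponential form of the first and second gradient weights is an assumption; the disclosure only requires that the weights be determined from the first and second color gradients of the first pixel point.

```python
import torch

def edge_aware_material_smoothness(material_grad_x, material_grad_y, color_grad_x, color_grad_y):
    """Weight the material gradients by the color gradients of the original image: where the
    image is flat the material map is encouraged to be flat, while material changes across
    strong color edges are penalized less. All four inputs are assumed to share one shape."""
    weight_x = torch.exp(-color_grad_x.abs())   # first gradient weight (first direction)
    weight_y = torch.exp(-color_grad_y.abs())   # second gradient weight (second direction)
    weighted_sum = weight_x * material_grad_x.abs() + weight_y * material_grad_y.abs()
    return weighted_sum.mean()
```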
According to an embodiment of the present disclosure, each pixel point in the material map includes at least two pieces of the following material information: diffuse reflection color information, roughness information, and metalness information. The material gradient determining module may include: a first determining sub-module, configured to determine the gradient of each of the at least two pieces of information of each pixel point in the material map, to obtain at least two gradients; and a second determining sub-module, configured to determine the material gradient of each pixel point in the material map according to the at least two gradients.
According to an embodiment of the present disclosure, the material recovery model further includes a deep neural network for estimating a signed distance. The model generation module 1010 may include: a sampling sub-module, configured to sample, according to the original image and the camera pose, a plurality of spatial sampling points corresponding to pixel points in the original image; a signed distance determining sub-module, configured to process voxel data obtained based on the plurality of spatial sampling points by using the deep neural network, to obtain signed distances of the plurality of spatial sampling points with respect to the target object; and a model generation sub-module, configured to generate the mesh model of the target object by using an isosurface extraction algorithm according to the spatial sampling points and the signed distances.
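A minimal sketch of the sampling sub-module is given below, assuming a pinhole camera with intrinsics (fx, fy, cx, cy), a 4x4 camera-to-world pose matrix, and uniform depth sampling along each ray; these modeling choices are assumptions and are not fixed by the disclosure.

```python
import torch

def sample_points_along_rays(pixel_coords, camera_pose, intrinsics, n_samples=64, near=0.5, far=4.0):
    """Back-project image pixels through the camera and sample points along each ray; these
    spatial sampling points are what the signed-distance network is evaluated on."""
    fx, fy, cx, cy = intrinsics
    x = (pixel_coords[:, 0] - cx) / fx
    y = (pixel_coords[:, 1] - cy) / fy
    directions = torch.stack([x, y, torch.ones_like(x)], dim=-1)            # rays in camera space
    rotation, translation = camera_pose[:3, :3], camera_pose[:3, 3]
    directions = directions @ rotation.T                                    # rotate rays into world space
    depths = torch.linspace(near, far, n_samples)                           # uniform samples along each ray
    points = translation + directions[:, None, :] * depths[None, :, None]   # (N_rays, n_samples, 3)
    return points
```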
According to an embodiment of the present disclosure, the apparatus 1000 may further include a third image rendering module, configured to render a reference image including the target object according to the mesh model and the camera pose. The training module 1050 may also be configured to train the deep neural network according to the difference between the reference image and the original image.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solutions of the present disclosure, the authorization or consent of the user is obtained before the user's personal information is obtained or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement the methods of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 11, the device 1100 includes a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard or a mouse; an output unit 1107 such as various types of displays and speakers; a storage unit 1108 such as a magnetic disk or an optical disk; and a communication unit 1109 such as a network card, a modem, or a wireless communication transceiver. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1101 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the methods and processes described above, such as at least one of the material recovery method, the method of generating a three-dimensional model, and the training method of the material recovery model. For example, in some embodiments, at least one of these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of at least one of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured, by any other suitable means (for example, by means of firmware), to perform at least one of the material recovery method, the method of generating a three-dimensional model, and the training method of the material recovery model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also known as a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method of material recovery comprising:
generating a mesh model of a target object from voxel data for the target object;
determining pixel positions of pixel points corresponding to each mesh in the mesh model on the material map for the target object; and
inputting the mesh model and the pixel positions into a material estimation network to obtain a material map for the target object.
2. The method of claim 1, wherein the generating a mesh model of a target object from voxel data for the target object comprises:
processing the voxel data by using a deep neural network to obtain signed distances of a plurality of spatial sampling points with respect to the target object, wherein the plurality of spatial sampling points are determined from the voxel data; and
generating the mesh model of the target object by using an isosurface extraction algorithm according to the plurality of spatial sampling points and the signed distances.
3. A method of generating a three-dimensional model, comprising:
generating a material map for a target object according to voxel data for the target object; and
generating a three-dimensional model of the target object according to the material map and the mesh model of the target object,
wherein the material map is obtained by the method according to any one of claims 1 to 2, and the mesh model is generated from the voxel data.
4. A training method of a material recovery model, wherein the material recovery model comprises a material estimation network, the method comprising:
generating a mesh model of a target object from an original image comprising the target object and a camera pose for the original image;
determining pixel positions of pixel points corresponding to each mesh in the mesh model on the original image;
inputting the mesh model and the pixel positions into the material estimation network to obtain a material map for the target object;
rendering a target image comprising the target object according to the material map, the mesh model, and the camera pose; and
training the material estimation network according to the difference between the target image and the original image.
5. The method of claim 4, further comprising:
rendering a reference image comprising the target object according to the mesh model and the camera pose; and
generating a mask image for the target object according to the position of the target object in the reference image;
wherein the training the material estimation network according to the difference between the target image and the original image comprises:
respectively performing mask processing on the target image and the original image by using the mask image to obtain a first image and a second image after the mask processing; and
training the material estimation network according to the difference between the first image and the second image.
6. The method of claim 4, further comprising:
determining the material gradient of each pixel point in the material map; and
training the material estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point, corresponding to the first pixel point, in the material map.
7. The method of claim 6, wherein the color gradient comprises a first gradient in a first direction and a second gradient in a second direction; the material gradient comprises a third gradient in the first direction and a fourth gradient in the second direction; the first direction and the second direction are perpendicular to each other; the training the material estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point corresponding to the first pixel point in the material map comprises:
determining, according to the first gradient and the second gradient of the first pixel point, a first gradient weight for the first direction and a second gradient weight for the second direction, respectively;
determining a weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and
training the material estimation network according to the weighted sum.
8. The method of claim 6, wherein each pixel point in the material map comprises at least two pieces of the following material information: diffuse reflection color information, roughness information, and metalness information; the determining the material gradient of each pixel point in the material map comprises:
determining the gradient of each piece of the at least two pieces of information of each pixel point in the material map, to obtain at least two gradients; and
determining the material gradient of each pixel point in the material map according to the at least two gradients.
9. The method of claim 4, wherein the material recovery model further comprises a deep neural network for estimating a signed distance; the generating a mesh model of a target object from an original image comprising the target object and the camera pose for the original image comprises:
sampling according to the original image and the camera pose to obtain a plurality of spatial sampling points corresponding to pixel points in the original image;
processing voxel data obtained based on the plurality of spatial sampling points by using the deep neural network to obtain signed distances of the plurality of spatial sampling points with respect to the target object; and
generating the mesh model of the target object by using an isosurface extraction algorithm according to the plurality of spatial sampling points and the signed distances.
10. The method of claim 9, further comprising:
rendering a reference image comprising the target object according to the mesh model and the camera pose; and
training the deep neural network according to the difference between the reference image and the original image.
11. A material recovery apparatus, comprising:
a model generation module for generating a mesh model of a target object from voxel data for the target object;
a pixel position determining module, configured to determine pixel positions of pixel points corresponding to each mesh in the mesh model on the material map for the target object; and
a map obtaining module, configured to input the mesh model and the pixel positions into a material estimation network to obtain a material map for the target object.
12. The apparatus of claim 11, wherein the model generation module comprises:
a signed distance obtaining sub-module, configured to process the voxel data by using a deep neural network for estimating a signed distance, to obtain signed distances of a plurality of spatial sampling points with respect to the target object, wherein the plurality of spatial sampling points are determined from the voxel data; and
a model generation sub-module, configured to generate the mesh model of the target object by using an isosurface extraction algorithm according to the plurality of spatial sampling points and the signed distances.
13. An apparatus for generating a three-dimensional model, comprising:
a map generation module, configured to generate a material map for a target object according to voxel data for the target object; and
a model generation module, configured to generate a three-dimensional model of the target object according to the material map and the mesh model of the target object,
wherein the material map is obtained by using the apparatus of any one of claims 11 to 12, and the mesh model is generated from the voxel data.
14. A training apparatus of a material recovery model, wherein the material recovery model comprises a material estimation network; the apparatus comprises:
a model generation module to generate a mesh model of a target object from an original image including the target object and a camera pose for the original image;
a pixel position determining module, configured to determine pixel positions of pixel points corresponding to each mesh in the mesh model on the original image;
a map obtaining module, configured to input the mesh model and the pixel positions into the material estimation network to obtain a material map for the target object;
a first image rendering module, configured to render a target image including the target object according to the material map, the mesh model, and the camera pose; and
a training module, configured to train the material estimation network according to the difference between the target image and the original image.
15. The apparatus of claim 14, further comprising:
a second image rendering module, configured to render a reference image including the target object according to the mesh model and the camera pose; and
a mask image generation module, configured to generate a mask image for the target object according to the position of the target object in the reference image;
wherein the training module comprises:
a mask processing sub-module, configured to perform mask processing on the target image and the original image respectively by using the mask image, to obtain a first image and a second image after the mask processing; and
a training sub-module, configured to train the material estimation network according to the difference between the first image and the second image.
16. The apparatus of claim 14, further comprising:
a material gradient determining module, configured to determine the material gradient of each pixel point in the material map;
wherein the training module is further configured to train the material estimation network according to the color gradient of a first pixel point in the original image and the material gradient of a second pixel point, corresponding to the first pixel point, in the material map.
17. The apparatus of claim 16, wherein the color gradient comprises a first gradient in a first direction and a second gradient in a second direction; the material gradient comprises a third gradient in the first direction and a fourth gradient in the second direction; the first direction and the second direction are perpendicular to each other; the training module comprises:
a gradient weight determining submodule, configured to determine, according to the first gradient and the second gradient of the first pixel point, a first gradient weight for the first direction and a second gradient weight for the second direction, respectively;
a weighting sub-module, configured to determine the weighted sum of the third gradient and the fourth gradient of the second pixel point according to the first gradient weight and the second gradient weight; and
a training sub-module, configured to train the material estimation network according to the weighted sum.
18. The apparatus of claim 16, wherein each pixel point in the material map comprises at least two pieces of the following material information: diffuse reflection color information, roughness information, and metalness information; the material gradient determining module comprises:
a first determining sub-module, configured to determine the gradient of each piece of the at least two pieces of information of each pixel point in the material map, to obtain at least two gradients; and
a second determining sub-module, configured to determine the material gradient of each pixel point in the material map according to the at least two gradients.
19. The apparatus of claim 14, wherein the material recovery model further comprises a deep neural network for estimating a signed distance; the model generation module comprises:
a sampling sub-module, configured to sample, according to the original image and the camera pose, a plurality of spatial sampling points corresponding to pixel points in the original image;
a signed distance determining sub-module, configured to process voxel data obtained based on the plurality of spatial sampling points by using the deep neural network, to obtain signed distances of the plurality of spatial sampling points with respect to the target object; and
a model generation sub-module, configured to generate the mesh model of the target object by using an isosurface extraction algorithm according to the plurality of spatial sampling points and the signed distances.
20. The apparatus of claim 19, further comprising:
a third image rendering module, configured to render a reference image comprising the target object according to the mesh model and the camera pose,
wherein the training module is further configured to train the deep neural network according to the difference between the reference image and the original image.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of claims 1 to 10.
23. A computer program product comprising a computer program/instructions stored on at least one of a readable storage medium and an electronic device, wherein the computer program/instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 10.
CN202211029148.9A 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method Active CN115375847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211029148.9A CN115375847B (en) 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method

Publications (2)

Publication Number Publication Date
CN115375847A true CN115375847A (en) 2022-11-22
CN115375847B CN115375847B (en) 2023-08-29

Family

ID=84067827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211029148.9A Active CN115375847B (en) 2022-08-25 2022-08-25 Material recovery method, three-dimensional model generation method and model training method

Country Status (1)

Country Link
CN (1) CN115375847B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138789B1 (en) * 2019-06-25 2021-10-05 A9.Com, Inc. Enhanced point cloud for three-dimensional models
CN110570503A (en) * 2019-09-03 2019-12-13 浙江大学 Method for acquiring normal vector, geometry and material of three-dimensional object based on neural network
CN112184899A (en) * 2020-11-06 2021-01-05 中山大学 Three-dimensional reconstruction method based on symbolic distance function
CN112330654A (en) * 2020-11-16 2021-02-05 北京理工大学 A device and method for acquiring surface material of an object based on a self-supervised learning model
CN112634156A (en) * 2020-12-22 2021-04-09 浙江大学 Method for estimating material reflection parameter based on portable equipment collected image
CN114882313A (en) * 2022-05-17 2022-08-09 阿波罗智能技术(北京)有限公司 Method and device for generating image annotation information, electronic equipment and storage medium
CN114842121A (en) * 2022-06-30 2022-08-02 北京百度网讯科技有限公司 Method, device, equipment and medium for generating mapping model training and mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, MIAO: "Research on Texture Mapping Methods for Three-Dimensional Models Based on Multi-Angle Images", China Excellent Doctoral and Master's Dissertations Full-text Database (Electronic Journal), Information Science and Technology, pages 7 - 54 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704098A (en) * 2023-06-08 2023-09-05 网易(杭州)网络有限公司 Directed distance field generation method, device, electronic device and storage medium
CN116704098B (en) * 2023-06-08 2025-03-04 网易(杭州)网络有限公司 Method, device, electronic device and storage medium for generating directed distance field
CN117036569A (en) * 2023-10-08 2023-11-10 北京渲光科技有限公司 Three-dimensional model color generation network training method, color generation method and device
CN117036569B (en) * 2023-10-08 2024-01-30 北京渲光科技有限公司 Three-dimensional model color generation network training method, color generation method and device

Also Published As

Publication number Publication date
CN115375847B (en) 2023-08-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant