CN114677292A - High-resolution material recovery method based on two-image inverse rendering neural network - Google Patents
- Publication number
- CN114677292A (application CN202210217527.4A)
- Authority
- CN
- China
- Prior art keywords
- guide
- flash
- map
- master
- tile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T5/00 — Image enhancement or restoration
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06N5/04 — Knowledge-based models; inference or reasoning models
- G06T3/4046 — Scaling of whole images or parts thereof using neural networks
- G06T3/4053 — Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T7/11 — Image analysis; region-based segmentation
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
Embodiments of the present disclosure provide a high-resolution material recovery method based on a two-image inverse rendering neural network. One implementation of the method includes: capturing an initial flash map and an initial guide map; segmenting the initial flash map and the initial guide map to obtain a flash map and a guide map; selecting matching image regions from the flash map and the guide map as a flash main tile and a guide main tile; segmenting the flash map and the guide map to obtain a flash tile set and a guide tile set; screening the flash tile set and the guide tile set to obtain a flash sub-tile set and a guide sub-tile set; rearranging the flash sub-tile set to obtain a relit main tile set; generating a main material map group; obtaining a set of sub-material map groups based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile; and stitching the sub-material map groups into a high-resolution material map group. This implementation reduces video memory consumption.
Description
Technical Field

Embodiments of the present disclosure relate to the technical field of high-resolution material recovery, and in particular to a high-resolution material recovery method based on a two-image inverse rendering neural network.

Background

Photorealistic rendering is widely used in fields such as video games and film. Its main goal is to exploit the computing power of modern hardware to algorithmically simulate the interaction between lighting and a scene in the real world and ultimately produce highly realistic images; the material is what expresses this interaction. Material appearance modeling represents the complex lighting effects arising from the interplay of light, material, and geometry, an interaction that physics describes with a high-dimensional mathematical function. For opaque objects, this process is defined by the six-dimensional spatially varying bidirectional reflectance distribution function (SVBRDF). The high dimensionality of this function makes reconstructing realistic appearance with handheld devices a pressing technical challenge.

Existing material recovery methods reconstruct poor appearance, so the rendered planar material surfaces exhibit many artifacts. When a high-resolution material is reconstructed, the rendered result is extremely poor and cannot reach lighting effects acceptable to the human eye. In addition, existing methods require capturing more than two images, which increases capture complexity and places high demands on video memory.
Summary

This summary introduces concepts in a simplified form that are described in detail in the detailed description that follows. It is not intended to identify key or essential features of the claimed technical solution, nor to limit the scope of the claimed technical solution.

Some embodiments of the present disclosure propose a high-resolution material recovery method based on a two-image inverse rendering neural network to solve one or more of the technical problems mentioned in the Background section above.

The method includes: controlling a mobile terminal to capture images, obtaining an initial flash map and an initial guide map, where the initial flash map is an image captured by the mobile terminal with the flash on and the initial guide map is an image captured under ambient light with the flash off; segmenting the initial flash map and the initial guide map to obtain a flash map and a guide map; selecting image regions from the flash map and the guide map as a flash main tile and a guide main tile, respectively, where the positions of the flash main tile and the guide main tile within the image regions of the flash map and the guide map match; segmenting the flash map and the guide map to obtain a flash tile set and a guide tile set; randomly screening flash tiles and guide tiles from the flash tile set and the guide tile set as flash sub-tiles and guide sub-tiles, obtaining a flash sub-tile set and a guide sub-tile set; rearranging the flash sub-tile set based on the guide sub-tile set to obtain a relit main tile set; generating a main material map group based on the flash map, the relit main tile set, and the flash main tile; generating a sub-material map group based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile, obtaining a set of sub-material map groups; and stitching the sub-material map groups in the set to obtain a high-resolution material map group.

Most real-world objects are composed of materials with a stationary spatial distribution (stationary materials), which exhibit the same or similar material and texture at different locations — the stationary property. Examples include wood, some metals, fabric, leather, and man-made materials. The method focuses on sufficiently self-similar stationary material samples, which have highly similar small neighborhoods at different locations. Because the same material is observed under different incident light directions and viewpoint directions at different spatial positions, these varying reflections provide clues for missing features and ambiguities, and combining them improves the fidelity of surface appearance reconstruction.

Although some early methods recover realistic surface appearance by dense sampling with controllable light sources and viewpoints, this recovery process requires extreme acquisition conditions, long capture times, and expensive equipment. Other methods determine the material as accurately as possible with a consumer-grade camera, but they still require capturing many images, which complicates both acquisition and reconstruction. In recent years, deep learning has advanced material recovery from a single image. For stationary materials, however, these methods do not exploit the property that different locations share the same material attributes, so they recover poor materials and the rendered objects exhibit many artifacts. They also target low-resolution materials: when reconstructing high-resolution materials, the small receptive field causes the trained models to produce even worse results, with rendering quality unacceptable to the human eye. Increasing the number of input images provides the reconstruction with more clues for resolving missing or ambiguous features, but these methods do not exploit the stationary property, so they cannot recover plausible materials from fewer images, and acquisition becomes more complex. Moreover, when such methods are applied directly to high-resolution reconstruction, the growth in image count and resolution greatly increases video memory consumption, putting the task beyond ordinary users with commodity computers.

The present disclosure proposes a method that, for stationary materials, exploits material self-similarity to determine a high-resolution material from two images. To reduce acquisition complexity, only two images are captured; the fidelity of surface appearance reconstruction is improved by extracting more information from the two images, enabling ordinary users to recover high-resolution materials. Because only two images are required, acquisition is simple, and because the network processes low-resolution tiles, high video memory is not needed. The method therefore suits users with ordinary equipment and determines a more plausible material from two images, reconstructing a realistic surface appearance.
Brief Description of the Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent in conjunction with the accompanying drawings and the following detailed description. Throughout the drawings, the same or similar reference numbers denote the same or similar elements. The drawings are schematic, and elements are not necessarily drawn to scale.

Fig. 1 is a flowchart of some embodiments of the high-resolution material recovery method based on a two-image inverse rendering neural network according to the present disclosure;

Fig. 2 is a schematic diagram of one application scenario of the method according to some embodiments of the present disclosure;

Fig. 3 is a schematic diagram of another application scenario of the method according to some embodiments of the present disclosure;

Fig. 4 is a schematic diagram of yet another application scenario of the method according to some embodiments of the present disclosure.
Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments are shown in the drawings, the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a thorough and complete understanding of the disclosure. The drawings and embodiments are for exemplary purposes only and do not limit the protection scope of the disclosure.

It should also be noted that, for convenience of description, only the parts related to the invention are shown in the drawings. Embodiments of this disclosure and features of the embodiments may be combined with one another where no conflict arises.

Note that concepts such as "first" and "second" mentioned in this disclosure only distinguish different devices, modules, or units; they do not limit the order of the functions performed by these devices, modules, or units or their interdependence.

Note that the modifiers "a" and "a plurality of" in this disclosure are illustrative rather than restrictive; unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only and do not limit the scope of such messages or information.

The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with embodiments.
Fig. 1 is a flowchart of some embodiments of the high-resolution material recovery method based on a two-image inverse rendering neural network according to the present disclosure. The method includes the following steps:

Step 101: Control a mobile terminal to capture images, obtaining an initial flash map and an initial guide map.

In some embodiments, the executing body of the method may control the mobile terminal through a wired or wireless connection to capture images, obtaining an initial flash map and an initial guide map. The initial flash map is an image captured by the mobile terminal with the flash on, and the initial guide map is an image captured under ambient light with the flash off.

Step 102: Segment the initial flash map and the initial guide map to obtain a flash map and a guide map.

In some embodiments, the executing body may segment the initial flash map and the initial guide map to obtain a flash map and a guide map.
Step 103: Select image regions from the flash map and the guide map as a flash main tile and a guide main tile, respectively.

In some embodiments, the executing body may select image regions from the flash map and the guide map as the flash main tile and the guide main tile, respectively, where the positions of the two tiles within the image regions of the flash map and the guide map match. For example, the flash main tile and the guide main tile may have identical position coordinates in the two maps.
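A minimal sketch of this selection, assuming the 2048×2048 inputs and 256×256 main tile used later in the Fig. 2 scenario (the function and array names are illustrative, not from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_main_tiles(flash_map, guide_map, tile=256):
    """Crop the SAME randomly chosen tile-sized region from both maps,
    so the flash main tile and guide main tile positions match."""
    h, w = flash_map.shape[:2]
    y = rng.integers(0, h - tile + 1)
    x = rng.integers(0, w - tile + 1)
    flash_main = flash_map[y:y + tile, x:x + tile]
    guide_main = guide_map[y:y + tile, x:x + tile]
    return flash_main, guide_main, (y, x)

# Dummy arrays standing in for the two photographs.
flash_map = rng.random((2048, 2048, 3))
guide_map = rng.random((2048, 2048, 3))
flash_main, guide_main, pos = crop_main_tiles(flash_map, guide_map)
```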
Step 104: Segment the flash map and the guide map to obtain a flash tile set and a guide tile set.

In some embodiments, the executing body may segment the flash map and the guide map to obtain a flash tile set and a guide tile set.

Step 105: Randomly screen flash tiles and guide tiles from the flash tile set and the guide tile set as flash sub-tiles and guide sub-tiles, obtaining a flash sub-tile set and a guide sub-tile set.

In some embodiments, the executing body may randomly screen flash tiles and guide tiles from the flash tile set and the guide tile set as flash sub-tiles and guide sub-tiles, obtaining a flash sub-tile set and a guide sub-tile set.
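Steps 104 and 105 can be sketched together, assuming the 64-tile split and 32 sub-tiles described later for Fig. 2; picking the same indices for both maps (so each flash sub-tile stays paired with its guide sub-tile) is an assumption here:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_tiles(img, tile=256):
    """Split an (H, W, C) image into non-overlapping tile×tile blocks
    (an 8×8 grid of 64 tiles for a 2048×2048 input)."""
    h, w, c = img.shape
    rows, cols = h // tile, w // tile
    return (img[:rows * tile, :cols * tile]
            .reshape(rows, tile, cols, tile, c)
            .swapaxes(1, 2)
            .reshape(rows * cols, tile, tile, c))

flash_map = rng.random((2048, 2048, 3))  # dummy stand-ins for the photos
guide_map = rng.random((2048, 2048, 3))
flash_tiles = split_into_tiles(flash_map)
guide_tiles = split_into_tiles(guide_map)

# Same 32 random indices for both tile sets (assumed pairing).
idx = rng.choice(len(flash_tiles), size=32, replace=False)
flash_sub_tiles = flash_tiles[idx]
guide_sub_tiles = guide_tiles[idx]
```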
Step 106: Rearrange the flash sub-tile set based on the guide sub-tile set to obtain a relit main tile set.

In some embodiments, the executing body may rearrange the flash sub-tile set based on the guide sub-tile set to obtain a relit main tile set.

Step 107: Generate a main material map group based on the flash map, the relit main tile set, and the flash main tile.

In some embodiments, the executing body may generate a main material map group based on the flash map, the relit main tile set, and the flash main tile. Here, a material refers to the lighting effect produced by the interaction of light and an object; the material is a group of texture-repeating material maps, and the texture structure may be random or repeating. A material can also be expressed as a six-dimensional spatially varying bidirectional reflectance distribution function, whose parametric form is a set of maps (a diffuse map, a normal map, a roughness map, and a specular map). Putting these maps, together with a viewpoint and a light source, into the rendering equation renders an image.
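The map-group representation and its use in rendering can be sketched as below — a deliberately simplified distant-point-light shading model (Lambertian diffuse plus a Blinn-Phong-style specular lobe driven by the roughness map), not the renderer the disclosure actually uses:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class MaterialMaps:
    diffuse: np.ndarray    # (H, W, 3) albedo
    normal: np.ndarray     # (H, W, 3) unit normals in [-1, 1]
    roughness: np.ndarray  # (H, W, 1)
    specular: np.ndarray   # (H, W, 3)

def render(maps: MaterialMaps, light_dir, view_dir):
    """Shade every texel under one distant light: a toy stand-in for
    the rendering R(s, L) referenced in the losses below."""
    l = np.asarray(light_dir, float); l /= np.linalg.norm(l)
    v = np.asarray(view_dir, float); v /= np.linalg.norm(v)
    h = (l + v) / np.linalg.norm(l + v)                   # half vector
    n_dot_l = np.clip((maps.normal * l).sum(-1, keepdims=True), 0, 1)
    n_dot_h = np.clip((maps.normal * h).sum(-1, keepdims=True), 0, 1)
    shininess = 2.0 / np.maximum(maps.roughness, 1e-3) ** 2
    spec = maps.specular * n_dot_h ** shininess
    return maps.diffuse * n_dot_l + spec * n_dot_l
```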
In some optional implementations of some embodiments, generating the main material map group based on the flash map, the relit main tile set, and the flash main tile may include the following steps:
First, input the flash map into a pre-trained material generation model to obtain an initial high-resolution material map group.

Second, segment the initial high-resolution material map group to obtain an initial main material map group; the segmentation may be an image segmentation process.

Third, obtain the main material map group based on the initial main material map group, the relit main tile set, the flash main tile, and a pre-trained autoencoder.

Optionally, the pre-trained autoencoder includes an encoder and a decoder, and obtaining the main material map group based on the initial main material map group, the relit main tile set, the flash main tile, and the pre-trained autoencoder may include the following steps:

First, input the initial main material map group into the encoder and the decoder module to obtain an initial decoded main material map group, where the decoder module includes a parameter optimization module and the decoder.
Second, based on the initial decoded main material map group and the decoder module, perform the following inverse rendering steps:

Generate a joint loss value based on the initial decoded main material map group, the relit main tile set, and the flash main tile. In response to determining that the joint loss value converges to a predetermined threshold, take the initial decoded main material map group as the main material map group. In response to determining that the joint loss value does not converge to the predetermined threshold, adjust the latent vector parameters in the decoder module, take the adjusted decoder module as the decoder module, input the adjusted latent vector parameters into the decoder module to obtain a new initial decoded main material map group, and perform the inverse rendering steps again. Here, to obtain a latent space and decode output material maps from it, the technique models this latent space with a fully convolutional autoencoder consisting of an encoder E() and a decoder D(): the encoder converts a material into the corresponding latent code, and the decoder converts the latent code back into a material, i.e., z = E(s), s = D(z), where s denotes the material and z the corresponding latent code. The loss function used to train the autoencoder consists of two parts:

L_train = L_map + L_render,

where L_train is the autoencoder training loss, L_map is the loss on the material maps, and L_render is the loss computed on the logarithms of nine images rendered from the determined material.
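A minimal PyTorch sketch of this inverse rendering loop — the encoder/decoder, the renderer, the light parameters, and all tensor shapes are illustrative stand-ins, not the disclosure's architecture; only the latent code z is optimized, mirroring the "adjust the latent vector parameters" step:

```python
import torch
import torch.nn as nn

# Stand-in fully convolutional encoder/decoder over a 10-channel
# material stack (3 diffuse + 3 normal + 1 roughness + 3 specular).
encoder = nn.Sequential(nn.Conv2d(10, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 8, 3, padding=1))
decoder = nn.Sequential(nn.Conv2d(8, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 10, 3, padding=1))

def render(maps, light):          # placeholder differentiable renderer
    return maps[:, :3] * light    # NOT the real rendering equation

def joint_loss(maps, relit_tiles, relit_lights, flash_tile, flash_light,
               alpha=6.0, beta=1.0):   # alpha/beta per Section 2 below
    l_relit = sum((render(maps, L) - I).abs().mean()
                  for I, L in zip(relit_tiles, relit_lights))
    l_orig = (render(maps, flash_light) - flash_tile).abs().mean()
    return alpha * l_relit + beta * l_orig

s_init = torch.rand(1, 10, 256, 256)         # initial main material maps
relit_tiles = [torch.rand(1, 3, 256, 256) for _ in range(4)]
relit_lights, flash_light = [1.0] * 4, 1.0   # dummy lighting conditions
flash_tile = torch.rand(1, 3, 256, 256)

z = encoder(s_init).detach().requires_grad_(True)   # latent init
opt = torch.optim.Adam([z], lr=1e-2)                # updates z only
for step in range(200):                             # until convergence
    opt.zero_grad()
    maps = decoder(z)
    loss = joint_loss(maps, relit_tiles, relit_lights,
                      flash_tile, flash_light)
    loss.backward()
    opt.step()
main_material = decoder(z).detach()
```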
Optionally, generating the joint loss value based on the initial decoded main material map group, the relit main tile set, and the flash main tile may include the following steps:

First, render the initial decoded main material map group under the same lighting conditions as each relit main tile in the relit main tile set, obtaining a first rendering set.

Second, generate a first rendering loss value based on the first rendering set and the relit main tile set.

Third, render the initial decoded main material map group under the same lighting conditions as the flash main tile, obtaining a second rendering.

Fourth, generate a second rendering loss value based on the second rendering and the flash main tile.

Fifth, obtain the joint loss value based on the first rendering loss value, the second rendering loss value, a first preset parameter, and a second preset parameter. The process of generating the joint loss value is illustrated in Fig. 3.
Optionally, based on the first rendering loss value, the second rendering loss value, the first preset parameter, and the second preset parameter, the joint loss value may be obtained by the following formula:

L = α · Σ_i ‖R1_i − I1_i‖ + β · ‖R2 − I2‖,

where L is the joint loss value; α is the first preset parameter; R1 denotes a first rendering in the first rendering set; i is an index; R1_i is the i-th first rendering in the first rendering set; I1 denotes a relit main tile in the relit main tile set; I1_i is the i-th relit main tile; β is the second preset parameter; R2 is the second rendering; I2 is the flash main tile; and ‖·‖ denotes the norm. ‖R1_i − I1_i‖ is computed by subtracting, at corresponding positions, the pixel values of the i-th relit main tile from those of the i-th first rendering, summing the absolute values, and dividing by the number of pixels. ‖R2 − I2‖ is computed likewise from the second rendering and the flash main tile.
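The norm in this formula is thus a per-pixel mean absolute difference; a literal numpy transcription (array names are illustrative):

```python
import numpy as np

def mean_abs(a, b):
    """‖a − b‖ as defined above: sum of absolute per-pixel differences
    divided by the number of pixels."""
    return np.abs(a - b).mean()

def joint_loss(renders1, relit_tiles, render2, flash_tile,
               alpha=6.0, beta=1.0):
    l_relit = sum(mean_abs(r, i) for r, i in zip(renders1, relit_tiles))
    return alpha * l_relit + beta * mean_abs(render2, flash_tile)
```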
Step 108: Generate a sub-material map group based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile, obtaining a set of sub-material map groups.

In some embodiments, the executing body may generate a sub-material map group based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile, obtaining a set of sub-material map groups.
In some optional implementations of some embodiments, generating the sub-material map group based on each guide sub-tile in the guide sub-tile set, the main material map group, and the guide main tile may include the following steps:

First, generate a descriptor based on each pixel in the guide sub-tile, obtaining a descriptor set.

Second, generate a reference descriptor based on each reference pixel in the guide main tile, obtaining a reference descriptor set.

Third, for each descriptor in the descriptor set, determine the reference descriptor corresponding to that descriptor in the reference descriptor set as a matching descriptor, obtaining a matching descriptor set.

Fourth, generate a rearrangement mapping table based on the position information of each pixel in the guide sub-tile and the position information of the reference pixels corresponding to the matching descriptor set.

Fifth, generate the sub-material map group from the rearrangement mapping table and the guide sub-tile.
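Once the mapping table exists, the fifth step is a pure gather operation; a minimal numpy sketch (the mapping array here is random dummy data):

```python
import numpy as np

def apply_remap(main_maps, remap):
    """main_maps: (H, W, C) main material maps; remap: (h, w, 2) array
    holding, for each sub-tile pixel, the (row, col) of its best-matching
    reference pixel in the guide main tile. Returns the (h, w, C)
    sub-material maps assembled by copying the matched texels."""
    return main_maps[remap[..., 0], remap[..., 1]]

rng = np.random.default_rng(0)
main_maps = rng.random((256, 256, 10))           # stacked material maps
remap = rng.integers(0, 256, size=(256, 256, 2)) # dummy mapping table
sub_maps = apply_remap(main_maps, remap)
```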
Optionally, generating the descriptor based on each pixel in the guide sub-tile may include the following steps:

First, Gaussian-smooth the guide sub-tile using Gaussian kernels with window size 16 and standard deviation 4 and with window size 32 and standard deviation 8, obtaining a first matching sub-tile and a second matching sub-tile.

Second, select 33×33, 65×65, and 5×5 red, green, and blue pixel neighborhoods from the first matching sub-tile, the second matching sub-tile, and the guide sub-tile, respectively, and use the binary robust independent elementary features descriptor method to select 128, 96, and 32 point pairs from the 33×33, 65×65, and 5×5 neighborhoods, respectively. Here, the binary robust independent elementary features descriptor method may select the point pairs using BRIEF G II (binary robust independent elementary features, variant G II).

Third, determine 128-bit, 96-bit, and 32-bit descriptors based on the selected 128, 96, and 32 point pairs.

Fourth, concatenate the determined 128-bit, 96-bit, and 32-bit descriptors (computed for each of the three color channels) to obtain a 768-bit descriptor.
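A sketch of this descriptor construction, with two flagged assumptions: OpenCV's GaussianBlur needs odd kernel sizes, so windows 16/32 are approximated as 17/33, and the point-pair patterns are fixed random offsets shared by all pixels, as in BRIEF; the layout and names are illustrative:

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

# One fixed point-pair pattern per scale, reused for every pixel and
# channel: offsets within 33×33, 65×65 and 5×5 windows.
PATTERNS = [rng.integers(-16, 17, (128, 2, 2)),   # 33×33 window
            rng.integers(-32, 33, (96, 2, 2)),    # 65×65 window
            rng.integers(-2, 3, (32, 2, 2))]      # 5×5 window

def make_descriptor_fn(img):
    """Returns a function computing a 768-bit BRIEF-style descriptor
    for pixel (y, x): (128 + 96 + 32) comparisons × 3 channels."""
    smooth1 = cv2.GaussianBlur(img, (17, 17), 4)  # ~window 16, sigma 4
    smooth2 = cv2.GaussianBlur(img, (33, 33), 8)  # ~window 32, sigma 8
    pad = 32
    # Pad once so neighborhood lookups never leave the image.
    srcs = [np.pad(s, ((pad, pad), (pad, pad), (0, 0)), mode='reflect')
            for s in (smooth1, smooth2, img)]
    def describe(y, x):
        bits = []
        for c in range(3):                        # R, G, B channels
            for pairs, src in zip(PATTERNS, srcs):
                ch = src[..., c]
                for (dy1, dx1), (dy2, dx2) in pairs:
                    bits.append(ch[y + pad + dy1, x + pad + dx1]
                                < ch[y + pad + dy2, x + pad + dx2])
        return np.packbits(np.asarray(bits, dtype=np.uint8))  # 96 bytes
    return describe

tile = rng.random((256, 256, 3)).astype(np.float32)
describe = make_descriptor_fn(tile)
d = describe(0, 0)   # descriptor of the top-left pixel
```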
Optionally, generating the reference descriptor based on each reference pixel in the guide main tile may include the following steps:

First, Gaussian-smooth the guide main tile using Gaussian kernels with window size 16 and standard deviation 4 and with window size 32 and standard deviation 8, obtaining a first matching main tile and a second matching main tile.

Second, select 33×33, 65×65, and 5×5 red, green, and blue pixel neighborhoods from the first matching main tile, the second matching main tile, and the guide main tile, respectively, and use the binary robust independent elementary features descriptor method to select 128, 96, and 32 point pairs from those neighborhoods.

Third, determine 128-bit, 96-bit, and 32-bit descriptors based on the selected 128, 96, and 32 point pairs.

Fourth, concatenate the determined 128-bit, 96-bit, and 32-bit descriptors (computed for each of the three color channels) to obtain a 768-bit reference descriptor.
Optionally, for each descriptor in the descriptor set, determining the reference descriptor corresponding to that descriptor in the reference descriptor set as a matching descriptor may include the following steps:

First, determine the Hamming distance between the descriptor and each reference descriptor in the reference descriptor set, obtaining a Hamming distance set.

Second, sort the Hamming distances in the set in ascending order, obtaining a Hamming distance sequence.

Third, determine the reference descriptor corresponding to the smallest Hamming distance in the sequence as the matching descriptor.
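With descriptors packed into byte arrays as in the sketch above, finding the smallest Hamming distance is an XOR-and-popcount scan; a minimal numpy version:

```python
import numpy as np

def best_match(desc, ref_descs):
    """desc: (96,) uint8 packed 768-bit descriptor; ref_descs: (N, 96)
    uint8 reference descriptors. Returns the index of the reference
    descriptor with the smallest Hamming distance."""
    xor = np.bitwise_xor(ref_descs, desc)           # differing bits
    dists = np.unpackbits(xor, axis=1).sum(axis=1)  # popcount per row
    return int(np.argmin(dists))
```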
Step 109: Stitch the sub-material map groups in the set of sub-material map groups to obtain a high-resolution material map group.

In some embodiments, the executing body may stitch the sub-material map groups in the set to obtain a high-resolution material map group.

Optionally, the executing body may further send the high-resolution material map group to a three-dimensional modeling device for constructing a three-dimensional model.

Fig. 2 is a schematic diagram of one application scenario of the high-resolution material recovery method based on a two-image inverse rendering neural network according to some embodiments of the present disclosure.
The goal is to support material estimation for high-resolution images while lowering video memory consumption and capture complexity, reconstructing a high-quality material from two images by exploiting material properties and the learning power of deep networks. The technique determines a high-resolution material on top of a deep inverse rendering framework from two images: one may be taken with the phone flash on (the initial flash map) and one under ambient light (the initial guide map). First, to exploit structurally repeating or randomly repeating textures, a reflection sample transfer method is introduced to combine multiple observations into a single main tile. Specifically, the two images are cropped to 2048×2048 resolution, and a 256×256-pixel region at the same random position is cropped from both as the main tiles of the flash map and the guide map. Meanwhile, the flash map and the guide map are each split into 64 tiles of 256×256 pixels, of which 32 tiles are selected as sub-tiles. The materials of the main tile and the sub-tiles are defined as the main material and the sub-materials. The reflection sample transfer method converts a sub-tile into a rough main tile, which is akin to re-rendering the main tile under the same incident and outgoing light directions as the sub-tile, yielding a relit main tile. Second, deep inverse rendering recovers the material of the main tile. To initialize the latent vector of the autoencoder, a convolutional neural network taking a single image as input produces an initial high-resolution material map group from the flash map, and the initial main material map group is cropped from it as the autoencoder input. In addition, using the relit main tile set and the flash main tile as inputs to the loss function reconstructs a more plausible main tile material. A refinement post-processing step then reintroduces detail into the main material. Finally, when the inverse reflection sample transfer method generates all sub-materials from the main material and these sub-materials are stitched into the high-resolution material, a skip rearrangement method maps the main material to the sub-materials during the inverse transfer, in order to remove random artifacts and edge artifacts around the main tile region in the high-resolution material. The high-resolution material is then refined once more.
The following explains in detail why the whole pipeline can reconstruct a plausible high-resolution material from two images while reducing video memory consumption, covering three aspects: the reconstruction itself, the loss function, and skip rearrangement.

1. Reconstruction of high-resolution materials

The complete framework lowers both video memory consumption and capture complexity when determining materials for high-resolution images. To this end, texture similarity and the learning power of deep networks are used to reconstruct a high-quality material from two images. This subsection explains why the framework addresses these problems effectively.

(1) A fully convolutional network trained on low-resolution image datasets can also process high-resolution images, but its relatively small receptive field yields determined high-resolution materials of poor quality. For materials, the main tile can be mapped to the sub-tiles with the inverse transfer method; to determine a high-resolution material from two images, the determined main material is likewise transferred to the sub-materials, which are then stitched into a high-resolution material. Finally, refinement produces a more plausible high-resolution material. Determining a plausible main material is therefore the key to recovering the high-resolution material.

(2) A single inference network manages to determine plausible materials from a single image or a fixed number of images. To support more input images, however, such methods must greatly increase video memory consumption during training. Moreover, relit main tiles contain irregular noise and artifacts, so no relit main tile dataset is available for training, and large datasets with texture self-similarity are lacking; these problems prevent training an inference network to determine the main material. In addition, a slight movement of the flash changes the local lighting conditions and thus significantly changes tile brightness, which introduces artifacts into the results of an inference network. To determine a plausible main material while lowering video memory consumption, a deep inverse rendering method is used instead: a single inference network determines the initial high-resolution material from a single flash map only, and the low-resolution relit main tiles and the original main tile determine the main tile material, so video memory consumption does not rise with image resolution. Specifically, for two images of 2560×2560 resolution, the network may require about 7 GB of video memory.
2. Loss function

A more plausible main material should be determined as far as possible, because the main material affects the quality of the determined high-resolution material. The intuitive approach is to implement a differentiable rendering loss L_relit directly: it renders the predicted material maps s under the same lighting conditions as the relit main tiles and computes the L1 difference between these renderings R(s, L_i) and the relit main tile set I_i:

L_relit = ‖R(s, L_i) − I_i‖.

However, the imperfect matching between points of the main tile and pixels of the sub-tiles introduces noise and artifacts into the relit main tiles, and these rough relit main tiles in turn bring noise into the determined main material. Although the latent space regularizes the optimization, it is not enough to suppress the interference of noise and artifacts. To determine a more plausible main material, another rendering loss L_orig is added, which computes the L1 difference between the flash main tile I_orig and the tile R(s, L_orig) rendered under the same lighting condition L_orig:

L_orig = ‖R(s, L_orig) − I_orig‖.

Although the L_relit term produces a main tile material with noise and artifacts, it recovers coarse-scale structural information. Building on this rough result, the L_orig term models detailed texture constraints to suppress the noise and artifacts, and the best reconstruction is obtained with the joint loss function:

L = α × L_relit + β × L_orig,

where L is the joint loss value, α the first preset parameter, β the second preset parameter, L_relit the L1 difference between the renderings R(s, L_i) and the relit main tile set, and L_orig the L1 difference between the flash main tile I_orig and the tile R(s, L_orig) rendered under the same lighting condition L_orig.

The full loss structure is shown in Fig. 3. The prerequisite for suppressing noise and artifacts to determine plausible fine-scale information is to recover the coarse-scale structure first, so the value of α needs to be set larger than β to perform coarse-to-fine optimization. Empirically, α is set to 6 and β to 1 to balance the magnitude of each loss. To refine the main tile material, a loss function of the same form as L may also be used, but with α and β both set to 1.
3. Skip rearrangement

When the main material is transferred to the high-resolution material, the inverse transfer method introduces noticeable edge artifacts. The edge pixel values of the recovered main tile material maps are observed to be lower or higher than the average pixel value of the whole tile; this difference is very noticeable in the diffuse and specular maps and occasionally appears in the roughness map. Specifically, when the average pixel value at the edge of the specular map is slightly lower, the network raises the average edge pixel value of the generated diffuse map to keep the rendered image identical to the input image. Such lower or higher pixels are defined as high-low artifacts. The rearrangement-based inverse transfer method maps the main material to the sub-materials and stitches them into a high-resolution material, but plain rearrangement introduces artifacts into the sub-materials. Specifically, suppose a sub-tile contains part of the main tile: in that region, the rearranged sub-tile effectively copies part of the main tile, so the edge pixel values of the main tile region in the generated high-resolution material are either all higher or all lower than the surrounding area, and high-low artifacts clearly appear along the edges of the main tile region.

To remove these high-low artifacts (e.g., square artifacts and random artifacts), the technique proposes a skip rearrangement method to generate the rearrangement map from the main tile to the sub-tiles. The skip rearrangement process is shown in Fig. 4. First, the central region of the guide map is cropped as the guide main tile (the cropped region may be 216×216 pixels). For each pixel A of a guide sub-tile, the pixel B with the most similar material is found in the guide main tile, and the position information of the two points is recorded; binary robust independent elementary features descriptors (BRIEF descriptors, i.e., feature point descriptors) are used to judge the similarity of the two points A and B. This is implemented as follows. Considering similarity at multiple scales, Gaussian kernels with window sizes 16 and 32 and standard deviations 4 and 8 smooth the guide main tile to reduce noise interference; together with the original guide main tile, this yields three processed guide main tiles P1, P2, P3. For each of P1, P2, P3, for one pixel, 33×33, 65×65, and 5×5 pixel neighborhoods are used with BRIEF G II to select 128, 96, and 32 point pairs, from which 128-bit, 96-bit, and 32-bit descriptors are computed; the operation is repeated over the three color channels and the descriptors are concatenated into a 768-bit descriptor. The same is done for each pixel of the guide sub-tile to obtain its descriptor. For one pixel of the guide sub-tile, all pixels of the guide main tile are traversed and their descriptors computed; the Hamming distance between each pair of descriptors is then computed, and the two pixels with the smallest Hamming distance form the best match, so the position information of those two points is stored. In this way, all pixels of the guide sub-tile are traversed, the corresponding best matching points are found in the guide main tile, and the positions are recorded to generate the rearrangement mapping table. The main material is then rearranged and mapped to the sub-material using this table. Fig. 4 also shows the relit main tile after rearrangement, comparing the result of plain rearrangement with that of skip rearrangement.
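Composing the descriptor and matching sketches above, mapping-table generation might look like the following brute-force loop (quadratic in pixel count — a real implementation would vectorize or index the search; the helper signatures and the edge-skipping note are assumptions):

```python
import numpy as np

def build_remap(guide_sub, guide_main, make_descriptor_fn, best_match):
    """For every pixel of guide_sub, store the (row, col) of its best
    match in guide_main — the rearrangement mapping table. Skipping
    candidates near the main-tile edge, as skip rearrangement does,
    could be added by restricting the rows/cols stacked below."""
    mh, mw = guide_main.shape[:2]
    sh, sw = guide_sub.shape[:2]
    desc_main = make_descriptor_fn(guide_main)
    desc_sub = make_descriptor_fn(guide_sub)
    # Precompute every reference descriptor of the guide main tile once.
    refs = np.stack([desc_main(y, x)
                     for y in range(mh) for x in range(mw)])
    remap = np.zeros((sh, sw, 2), dtype=np.int64)
    for y in range(sh):
        for x in range(sw):
            j = best_match(desc_sub(y, x), refs)
            remap[y, x] = divmod(j, mw)          # back to (row, col)
    return remap

# The sub-material is then gathered with apply_remap(main_maps, remap)
# from the earlier sketch; a window median filter may be applied at the
# seams when the sub-materials are stitched, as described below.
```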
For all sub-tiles, the same method is used to rearrange and map the main material to the sub-materials. Finally, when stitching all the sub-materials, a window median filter is applied to clean up the boundaries between the stitched regions. Because the skip rearrangement method does not select matching points from the edges of the main tile, it reduces the square artifacts and random artifacts of the generated high-resolution material.

The invention lowers the complexity of image acquisition and automatically reconstructs plausible or accurate material maps. When the maps are highly accurate, they can replace materials made by professionals; when the quality is slightly lower, they still provide baselines and target cues for artists creating material maps, assisting professionals and ultimately reducing the labor cost of material production. In addition, since only two images need to be captured, acquisition complexity drops greatly; the method determines high-resolution materials with very low video memory consumption, so it suits users with ordinary equipment. The material quality determined from two images is better and the reconstructed surface appearance is more realistic, which further helps professionals produce materials.

The above description covers only some preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, solutions formed by substituting the above features with technical features of similar functions disclosed in (but not limited to) the embodiments of the present disclosure.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210217527.4A CN114677292B (en) | 2022-03-07 | 2022-03-07 | High-resolution material restoration method based on two-image inverse rendering neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210217527.4A CN114677292B (en) | 2022-03-07 | 2022-03-07 | High-resolution material restoration method based on two-image inverse rendering neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114677292A true CN114677292A (en) | 2022-06-28 |
CN114677292B CN114677292B (en) | 2022-11-01 |
Family
ID=82072008
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210217527.4A Active CN114677292B (en) | 2022-03-07 | 2022-03-07 | High-resolution material restoration method based on two-image inverse rendering neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677292B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120140102A (en) * | 2011-06-20 | 2012-12-28 | 경희대학교 산학협력단 | Method and apparatus for reconstructing high-resolution tomosynthesis |
GB201916127D0 (en) * | 2017-01-12 | 2019-12-18 | Imagination Tech Ltd | Graphics processing units and methods for controlling rendering complexity using cost indications for sets of tiles of a rendering space |
KR101973985B1 (en) * | 2018-10-10 | 2019-04-30 | 주식회사 누리콘 | System and method of image rendering through distributed parallel processing for high resolution display |
WO2021160980A1 (en) * | 2020-02-14 | 2021-08-19 | Total E&P Uk Ltd | An improved micro-ct image acquisition method |
WO2021223134A1 (en) * | 2020-05-07 | 2021-11-11 | 浙江大学 | Micro-renderer-based method for acquiring reflection material of human face from single image |
CN112183637A (en) * | 2020-09-29 | 2021-01-05 | 中科方寸知微(南京)科技有限公司 | A method and system for single light source scene illumination re-rendering based on neural network |
CN112634156A (en) * | 2020-12-22 | 2021-04-09 | 浙江大学 | Method for estimating material reflection parameter based on portable equipment collected image |
CN113313828A (en) * | 2021-05-19 | 2021-08-27 | 华南理工大学 | Three-dimensional reconstruction method and system based on single-picture intrinsic image decomposition |
CN113298936A (en) * | 2021-06-01 | 2021-08-24 | 浙江大学 | Multi-RGB-D full-face material recovery method based on deep learning |
Non-Patent Citations (2)
Title |
---|
WU Chenhuan, LI Zhiqiang, et al.: "An efficient image super-resolution reconstruction algorithm based on neural architecture search", Journal of Yunnan Minzu University (Natural Sciences Edition) *
WANG Bing: "Research on 3D reconstruction and rendering methods based on deep learning", China Master's Theses Full-text Database (Information Science and Technology) *
Also Published As
Publication number | Publication date |
---|---|
CN114677292B (en) | 2022-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7446457B2 (en) | Image optimization method and device, computer storage medium, computer program, and electronic equipment | |
Ma et al. | Learning to jointly generate and separate reflections | |
CN111882627A (en) | Image processing method, video processing method, device, equipment and storage medium | |
KR102353556B1 (en) | Apparatus for Generating Facial expressions and Poses Reappearance Avatar based in User Face | |
Liu et al. | WSDS-GAN: A weak-strong dual supervised learning method for underwater image enhancement | |
US11948252B2 (en) | Three-dimensional mesh generator based on two-dimensional image | |
Lei et al. | Robust reflection removal with flash-only cues in the wild | |
CN117557721A (en) | A single image detail three-dimensional face reconstruction method, system, equipment and medium | |
Shrivastava et al. | Video decomposition prior: Editing videos layer by layer | |
CN117576755A (en) | A hyperspectral face fusion and recognition method, electronic device and storage medium | |
Verhoeven et al. | UVDoc: neural grid-based document unwarping | |
CN112927160B (en) | Single low-light image enhancement method based on depth Retinex | |
Boss et al. | Single image brdf parameter estimation with a conditional adversarial network | |
Bai et al. | Local-to-global panorama inpainting for locale-aware indoor lighting prediction | |
Song et al. | Half‐body Portrait Relighting with Overcomplete Lighting Representation | |
Han et al. | Learning residual color for novel view synthesis | |
CN114677292B (en) | High-resolution material restoration method based on two-image inverse rendering neural network | |
Kubiak et al. | Silt: Self-supervised lighting transfer using implicit image decomposition | |
CN115049560B (en) | A model training method and system, a face image de-lighting method and system | |
Zhu et al. | From darkness to clarity: A comprehensive review of contemporary image shadow removal research (2017–2023) | |
Hu et al. | NeurSF: Neural shading field for image harmonization | |
Zhang et al. | Facial Image Shadow Removal via Graph‐based Feature Fusion | |
CN116188720A (en) | Digital person generation method, device, electronic equipment and storage medium | |
Shi et al. | A semi-supervised underexposed image enhancement network with supervised context attention and multi-exposure fusion | |
CN115496843A (en) | Local realistic-writing cartoon style migration system and method based on GAN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |