CN111292264B

CN111292264B - A high dynamic range image reconstruction method based on deep learning

Info

Publication number: CN111292264B
Application number: CN202010072803.3A
Authority: CN
Inventors: 肖春霞; 刘文焘
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2020-01-21
Filing date: 2020-01-21
Publication date: 2023-04-21
Anticipated expiration: 2040-01-21
Also published as: CN111292264A

Abstract

The invention discloses an image high dynamic range reconstruction method based on deep learning, and belongs to the field of computational photography and digital image processing. The invention establishes a mapping network from a single LDR image to an HDR image by adopting a method based on deep learning. The method first sequentially generates LDR training data, HDR sample labels with aligned brightness units and mask images of high brightness areas from the collected HDR data set. The neural network is then constructed and trained to obtain a network model with an LDR to HDR mapping relationship. And finally, directly inputting the LDR image into the network model by utilizing the generated network model obtained by training, and outputting the reconstructed HDR image. The method can effectively reconstruct the dynamic range of the real scene from a single common digital image, and can be used for HDR simulation effect display of the common digital image or providing more realistic rendering effect for the image illumination technology.

Description

A method for image high dynamic range reconstruction based on deep learning

技术领域Technical Field

本发明属于计算摄影学和数字图像处理领域，涉及图像的高动态范围重建方法，尤其是基于深度学习的图像高动态范围重建方法。The present invention belongs to the fields of computational photography and digital image processing, and relates to a method for reconstructing a high dynamic range of an image, in particular to a method for reconstructing a high dynamic range of an image based on deep learning.

背景技术Background Art

高动态范围成像(High Dynamic Range Imaging,HDRI)技术是用来实现比普通数字图像更大曝光范围的一种图像表示方法，高动态范围(High Dynamic Range,HDR)图像可以提供比普通数字图像更大的亮度变化范围和更多的明暗细节，这使得HDR图像能够呈现更加接近真实场景的亮度变化信息。近年来，随着显示设备的不断进化和基于物理渲染的需求提高，高动态范围成像技术在实际应用中变得越来越重要。然而，目前直接获取HDR图像的方法需要较高的专业技能，成本较高且耗时。针对从单幅普通数字图像中重建HDR的方法，传统方法只能通过增加约束的方法来尽可能减少问题的非适定性，这使得它们只能针对某些特定应用场景有效。一些学者也基于深度学习做了一些卓有成效的工作，但他们未能考虑到诸如HDR图片间的亮度等级不变性等因素导致重建效果有局限性。该发明可以从单幅普通数字图像中有效地重建出真实场景的动态范围，可用于普通数字图像的HDR模拟效果显示或为基于图像照明技术提供更逼真的渲染效果。High Dynamic Range Imaging (HDRI) technology is an image representation method used to achieve a larger exposure range than ordinary digital images. High Dynamic Range (HDR) images can provide a larger brightness range and more light and dark details than ordinary digital images, which enables HDR images to present brightness change information closer to the real scene. In recent years, with the continuous evolution of display devices and the increasing demand for physical rendering, high dynamic range imaging technology has become increasingly important in practical applications. However, the current method of directly obtaining HDR images requires high professional skills, is costly and time-consuming. For methods of reconstructing HDR from a single ordinary digital image, traditional methods can only reduce the non-adaptability of the problem as much as possible by adding constraints, which makes them only effective for certain specific application scenarios. Some scholars have also done some fruitful work based on deep learning, but they have failed to consider factors such as the invariance of brightness levels between HDR images, which leads to limitations in the reconstruction effect. The invention can effectively reconstruct the dynamic range of a real scene from a single ordinary digital image, which can be used for HDR simulation effect display of ordinary digital images or provide more realistic rendering effects based on image lighting technology.

发明内容Summary of the invention

本发明的目的是从单张普通数字图像中尽可能恢复原场景的高动态范围图像。这里普通数字图像是指以8位颜色深度、256色阶保存的低动态范围(Low Dynamic Range,LDR)图像，高动态范围图像指以“.EXR”或“.HDR”等格式保存的接近真实场景明暗变化的高动态范围图像。The purpose of the present invention is to restore the high dynamic range image of the original scene from a single ordinary digital image as much as possible. Here, the ordinary digital image refers to a low dynamic range (LDR) image saved with 8-bit color depth and 256 color levels, and the high dynamic range image refers to a high dynamic range image saved in the format of ".EXR" or ".HDR" that is close to the light and dark changes of the real scene.

为了达到上述目的，本发明采用基于深度学习的方法建立了一个从LDR图像到HDR图像的映射网络，通过训练数据对网络进行学习训练，使其建立一个端到端的LDR图像到HDR图像的映射关系，整体框架图如附图1所示。算法分为数据预处理和深度神经网络训练两个部分。数据预处理部分包括训练样本对的生成、HDR图像亮度单位的对齐和图像高光区域掩码的生成三个部分。神经网络结构分为基本HDR重建网络和训练优化网络，如附图2所示。其损失函数包括三项，分别为HDR重建图像的尺度不变损失、高光区域分类的交叉熵损失和生成对抗损失。In order to achieve the above-mentioned purpose, the present invention adopts a deep learning-based method to establish a mapping network from LDR images to HDR images, and trains the network through training data to establish an end-to-end mapping relationship from LDR images to HDR images. The overall framework diagram is shown in Figure 1. The algorithm is divided into two parts: data preprocessing and deep neural network training. The data preprocessing part includes three parts: the generation of training sample pairs, the alignment of HDR image brightness units, and the generation of image highlight area masks. The neural network structure is divided into a basic HDR reconstruction network and a training optimization network, as shown in Figure 2. Its loss function includes three items, namely, the scale-invariant loss of the HDR reconstructed image, the cross entropy loss of the highlight area classification, and the generative adversarial loss.

该方法具体包括以下内容和步骤：The method specifically includes the following contents and steps:

一、数据预处理1. Data Preprocessing

1)生成LDR训练样本输入1) Generate LDR training sample input

在使用深度神经网络进行有监督训练之前，需要获取对应网络输入输出的训练数据集。训练数据集包含若干LDR-HDR图像对，其中HDR图像数据可使用现有可用的HDR图片，该数据作为训练样本的标签，用于监督网络的训练；LDR图像数据作为HDR图像对应的样本输入，需要从原HDR图像中生成，其生成方法有两中途径，一是使用色调映射算法完成从HDR图片到LDR图片的生成，二是通过构建虚拟相机的方式以HDR图像作为模拟场景对齐进行模拟拍摄从而得到LDR图片。Before using a deep neural network for supervised training, it is necessary to obtain a training dataset corresponding to the network input and output. The training dataset contains several LDR-HDR image pairs, where the HDR image data can use existing available HDR images, which are used as labels for training samples to supervise the training of the network; the LDR image data, as the sample input corresponding to the HDR image, needs to be generated from the original HDR image. There are two ways to generate it: one is to use a tone mapping algorithm to complete the generation from HDR images to LDR images, and the other is to construct a virtual camera to align the HDR image as a simulated scene for simulated shooting to obtain an LDR image.

使用色调映射算法生成LDR图像：选择一种适当的色调映射算法，直接将HDR图像作为算法输入即可得到对应的LDR图像输出。Generate LDR image using tone mapping algorithm: Select an appropriate tone mapping algorithm and directly use the HDR image as the algorithm input to get the corresponding LDR image output.

通过构建虚拟相机来获取LDR图像：首先基于常用的数码单反相机确定虚拟相机动态范围的取值范围，对于每一次获取LDR图像都随机选取范围内的一个值作为该次模拟拍摄的相机的动态范围；然后虚拟相机根据输入的HDR图像进行自动曝光，对亮度超出虚拟相机动态范围的像素取边界值，再将其线性映射至LDR图像的低动态范围；最后，将所得到的图像从线性空间通过随机选择的近似相机响应函数映射为普通数字图像，即得到所需要的LDR图像。The LDR image is obtained by constructing a virtual camera: first, the value range of the virtual camera's dynamic range is determined based on the commonly used digital SLR camera, and each time an LDR image is obtained, a value within the range is randomly selected as the dynamic range of the camera for the simulated shooting; then the virtual camera automatically exposes according to the input HDR image, takes the boundary value of the pixels whose brightness exceeds the dynamic range of the virtual camera, and then linearly maps it to the low dynamic range of the LDR image; finally, the obtained image is mapped from the linear space to an ordinary digital image through a randomly selected approximate camera response function, that is, the required LDR image is obtained.

2)对齐HDR样本标签2) Align HDR sample labels

对于保存在相对亮度域的HDR图像，在将其作为训练样本标签前，需对齐它们的亮度单位。设原始HDR图像为H，LDR图像转换到线性空间并归一化到[0,1]，设为L，H_l,c,L_l,c分别为图像在位置l，通道c处的像素值，其对齐方法为：For HDR images stored in the relative brightness domain, their brightness units need to be aligned before they are used as training sample labels. Let the original HDR image be H, the LDR image be converted to linear space and normalized to [0,1], let L, H _l,c , L _l,c be the pixel values of the image at position l and channel c respectively, and the alignment method is:

其中

为对齐后的HDR图像，m_l,c定义为in

is the aligned HDR image, m _l,c is defined as

其中τ为[0,1]的常数。对齐后的HDR图像与其对应的LDR图像组成供神经网络训练的训练样本对。Where τ is a constant in [0,1]. The aligned HDR image and its corresponding LDR image form a training sample pair for neural network training.

3)生成高光掩码图像3) Generate highlight mask image

得到对齐的HDR图像后，可以通过二值化的方式获取图像中高亮度区域的掩码图像。该掩码图像中值为1的区域代表场景中拥有较高亮度的物体或表面，如光源、强光反射面等。这些高光区域在LDR图像中往往会由于过曝光而被裁剪亮度，通过从HDR图像生成的高光掩码图像可作为网络优化训练部分的样本标签，用于优化神经网络的训练过程。After obtaining the aligned HDR image, the mask image of the high brightness area in the image can be obtained by binarization. The area with a value of 1 in the mask image represents objects or surfaces with higher brightness in the scene, such as light sources, strong light reflective surfaces, etc. These highlight areas are often clipped in brightness due to overexposure in LDR images. The highlight mask image generated from the HDR image can be used as a sample label for the network optimization training part to optimize the training process of the neural network.

二、神经网络的训练2. Training of Neural Networks

1)神经网络结构1) Neural network structure

网络结构主要分为生成网络和判别网络两个部分，结构示意图如附图3所示。其中生成网络为一个U-Net结构，网络接受一张LDR图像作为输入，经过由ResNet50模型构成的编码网络和6层“上采样+卷积层”模块构成解码网络后，分别输出一张HDR图像和高光掩码图像。网络输出的HDR图像即为根据LDR输入图像得到的HDR重建图像，而高光掩码图像作为网络对LDR图像中高光区域的预测，可作为优化网络训练的数据。判别器网络为一个由4层卷积层构成的全卷积网络，该判别器接受一张HDR图像与高光掩码图像作为输入，输出一张表征所输入的HDR图像为真实HDR图像或者是网络生成的虚假HDR图像的概率的特征图，该特征图可作为训练神经网络的数据。The network structure is mainly divided into two parts: the generating network and the discriminating network. The structural diagram is shown in Figure 3. The generating network is a U-Net structure. The network accepts an LDR image as input. After the encoding network composed of the ResNet50 model and the decoding network composed of the 6-layer "upsampling + convolution layer" module, an HDR image and a highlight mask image are output respectively. The HDR image output by the network is the HDR reconstructed image obtained according to the LDR input image, and the highlight mask image is the network's prediction of the highlight area in the LDR image, which can be used as data for optimizing network training. The discriminator network is a fully convolutional network composed of 4 convolutional layers. The discriminator accepts an HDR image and a highlight mask image as input, and outputs a feature map representing the probability that the input HDR image is a real HDR image or a false HDR image generated by the network. The feature map can be used as data for training the neural network.

2)神经网络的训练方法2) Neural network training method

该发明采用有监督学习的方式对上述网络进行训练。训练采用Adam优化器分别逐次对生成网络和判别网络进行反向传播优化，其中生成网络共包含三组优化函数，损失函数定义如下：The invention uses supervised learning to train the above network. The training uses the Adam optimizer to perform back propagation optimization on the generative network and the discriminative network one by one, where the generative network contains three sets of optimization functions, and the loss function is defined as follows:

L_G＝α₁L_hdr+α₂L_mask+α₃L_gan L _G ＝α ₁ L _hdr +α ₂ L _mask +α ₃ L _gan

该损失分别由三个损失函数控制，包括HDR重建图像的尺度不变损失、高光区域分类的交叉熵损失和生成对抗损失。The loss is controlled by three loss functions, including the scale-invariant loss of HDR reconstructed images, the cross-entropy loss for highlight area classification, and the generative adversarial loss.

HDR重建图像的尺度不变损失：该项损失函数依据HDR图像在相对亮度域中具有尺度不变性提出，该项损失控制网络输出的HDR图像使其尽可能与HDR标签值接近。其定义如下：Scale invariance loss of HDR reconstructed images: This loss function is proposed based on the scale invariance of HDR images in the relative brightness domain. This loss controls the HDR image output by the network to make it as close to the HDR label value as possible. Its definition is as follows:

其中y表示网络输出的HDR图像，

为目标图像，下标l，c分别表示像素位置和颜色通道，

表示在对数域中网络输出与样本标签的差值，∈为一个防止对数计算为零的微小值。该损失第一项即为普通的L₂损失，引入第二项后，该项损失在计算时仅由预测值与样本标签的差值所影响，与预测值或样本标签的实际大小无关。Where y represents the HDR image output by the network,

is the target image, subscripts l and c represent pixel position and color channel respectively.

It represents the difference between the network output and the sample label in the logarithmic domain, and ∈ is a small value that prevents the logarithm from being calculated to zero. The first term of this loss is the ordinary _L2 loss. After the introduction of the second term, this loss is only affected by the difference between the predicted value and the sample label during calculation, and has nothing to do with the actual size of the predicted value or the sample label.

高光区域分类的交叉熵损失：该项损失控制网络对图片中高亮度区域的检测。网络输出的高光掩码图像为对输入LDR图像中高光区域或非高光区域的分类结果，该结果应与预处理步骤时生成的高光掩码图像尽可能接近。该损失函数定义如下：Cross entropy loss for highlight area classification: This loss controls the network's detection of high brightness areas in the image. The highlight mask image output by the network is the classification result of the highlight area or non-highlight area in the input LDR image. The result should be as close as possible to the highlight mask image generated in the preprocessing step. The loss function is defined as follows:

其中m,

分别为网络预测值和标签值。Where m,

are the network prediction value and label value respectively.

生成对抗损失：该项损失能使网络预测的HDR图像在分布上尽可能接近真实的HDR图像，防止预测结果与标签结果仅在数值差异上减小而忽视整体分布的差异。该损失函数定义如下：Generate adversarial loss: This loss can make the HDR image predicted by the network as close to the real HDR image as possible in distribution, preventing the predicted result from being reduced in numerical difference from the label result while ignoring the difference in overall distribution. The loss function is defined as follows:

其中D(y)为判别网络以生成网络输出为输入的计算结果。Where D(y) is the calculation result of the discriminant network with the output of the generative network as input.

判别网络的损失函数为标准WGAN-GP损失，该项控制判别网络使其尽可能准确判断输入判别网络的图像是否为真实HDR图像。其定义如下：The loss function of the discriminant network is the standard WGAN-GP loss, which controls the discriminant network to make it judge as accurately as possible whether the image input to the discriminant network is a real HDR image. It is defined as follows:

其中∈为[0,1]的随机数。Where ∈ is a random number in [0,1].

根据上述训练方法对网络进行训练，当损失函数收敛后，生成网络便建立了一个从单张LDR图像到HDR图像的映射关系。利用该训练好的生成网络，即可完成从单幅普通数字图像尽可能重建真实的高动态范围图像的目标。The network is trained according to the above training method. When the loss function converges, the generative network establishes a mapping relationship from a single LDR image to an HDR image. Using this trained generative network, the goal of reconstructing a realistic high dynamic range image as much as possible from a single ordinary digital image can be achieved.

与现有技术相比，本发明有如下优点：Compared with the prior art, the present invention has the following advantages:

1.本发明构建了一个端到端的神经网络，仅需一张图片就能重建出足够真实的HDR图像，且无需人工交互的步骤；1. The present invention constructs an end-to-end neural network that can reconstruct a sufficiently realistic HDR image with only one picture, without the need for manual interaction;

2.本发明基于尺度不变性对HDR数据进行对齐和训练网络，能达到更好的重建效果；2. The present invention aligns HDR data and trains the network based on scale invariance, which can achieve better reconstruction effect;

3.本发明通过生成高光掩码图像对网络训练进行优化，能获得高亮度区域更好的重建效果。3. The present invention optimizes network training by generating highlight mask images, which can achieve better reconstruction effects in high-brightness areas.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明的总框架图；Fig. 1 is a general framework diagram of the present invention;

图2是本发明的深度神经网络结构图；FIG2 is a diagram of a deep neural network structure of the present invention;

图3是本发明的效果示意图。FIG. 3 is a schematic diagram showing the effect of the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合附图和实施例对本发明的技术方案作进一步说明。The technical solution of the present invention is further described below in conjunction with the accompanying drawings and embodiments.

如图1所示，本发明为一种基于深度学习的图像高动态范围重建方法，包括以下步骤：As shown in FIG1 , the present invention is a method for reconstructing an image with high dynamic range based on deep learning, comprising the following steps:

步骤1，预处理HDR数据集以构建神经网络的训练数据。首先从收集到的HDR数据集中生成LDR数据，然后使用LDR-HDR数据来对齐HDR数据，再利用对齐后的HDR图像生成高光掩码图像，最后将这三者整合作为神经网络的训练数据。其中LDR数据为数据输入，对齐的HDR数据和高光掩码图像为标签数据。Step 1: Preprocess the HDR dataset to construct training data for the neural network. First, generate LDR data from the collected HDR dataset, then use the LDR-HDR data to align the HDR data, and then use the aligned HDR image to generate a highlight mask image. Finally, integrate the three as training data for the neural network. The LDR data is the data input, and the aligned HDR data and highlight mask image are the label data.

步骤2，构建并训练神经网络以得到具有LDR到HDR映射关系的网络模型。根据训练策略，逐次轮替地对生成网络和判别网络进行反向传播训练，直到损失函数收敛。此时，生成网络部分即为最终用于重建高动态范围图像的网络模型。Step 2: Build and train a neural network to obtain a network model with a mapping relationship from LDR to HDR. According to the training strategy, the generator network and the discriminator network are back-propagated and trained alternately until the loss function converges. At this point, the generator network is the network model that is ultimately used to reconstruct high dynamic range images.

步骤3，根据步骤2训练得到的生成网络模型，直接将待重建的LDR图像输入至网络模型，即可输出其重建的HDR图像。Step 3: According to the generated network model trained in step 2, the LDR image to be reconstructed is directly input into the network model to output the reconstructed HDR image.

下面结合实例对该方法进行详细说明。The method is described in detail below with reference to examples.

一、数据预处理1. Data Preprocessing

HDR数据集选取现有的公共HDR数据集构成，通过对收集数据进行裁剪、缩放等操作组成一组尺寸、类型相同的数据集。根据此整合的数据集，分别应用显示适应的色调映射(Display Adaptive Tone Mapping)算法和前述的虚拟相机拍摄的方式获取LDR数据。关于色调映射，根据实际应用时图片的类型选取，比如如果应用时的大多图片是未经后期处理拍摄的图片，就选择本实施例这个的方法。如果是针对某处理的图片，就选取一种与后处理效果相近的方式。具体地，对每一张HDR图像，应用显示适应的色调映射算法获取一张LDR图像，再应用随机参数的虚拟相机进行一次拍摄获取一张LDR图像，即每张HDR图像生成两张不同方法获取的LDR图像。生成的LDR图像转换至线性空间，并从范围[0,255]的整数值归一化至[0,1]的小数值。The HDR data set is composed of an existing public HDR data set, and a set of data sets of the same size and type are formed by cropping, scaling and other operations on the collected data. According to this integrated data set, the display adaptive tone mapping algorithm and the aforementioned virtual camera shooting method are respectively applied to obtain LDR data. Regarding tone mapping, it is selected according to the type of picture in actual application. For example, if most of the pictures in the application are pictures taken without post-processing, the method of this embodiment is selected. If it is a picture for a certain processing, a method similar to the post-processing effect is selected. Specifically, for each HDR image, a display adaptive tone mapping algorithm is applied to obtain an LDR image, and then a virtual camera with random parameters is used to shoot once to obtain an LDR image, that is, each HDR image generates two LDR images obtained by different methods. The generated LDR image is converted to a linear space and normalized from an integer value in the range [0,255] to a decimal value [0,1].

然后基于每一对LDR-HDR图像对其中的HDR图像进行亮度对齐。对齐方法参照说明内容中的公式进行：Then, based on each pair of LDR-HDR images, the HDR images are brightness aligned. The alignment method is based on the formula in the description:

其中

为对齐后的HDR图像，m_l,c定义为in

is the aligned HDR image, m _l,c is defined as

其中τ可为[0,1]的常数，这里取τ＝0.08。将该公式应用到每一对LDR-HDR图像后，得到对齐的HDR图像数据，该数据将作为训练网络时的样本标签，而LDR图像数据作为样本输入。Where τ can be a constant in [0,1], and here τ = 0.08 is taken. After applying this formula to each pair of LDR-HDR images, the aligned HDR image data is obtained, which will be used as the sample label when training the network, and the LDR image data is used as the sample input.

最后，基于对齐的HDR数据，由如下公式计算高光掩码图像：Finally, based on the aligned HDR data, the highlight mask image is calculated by the following formula:

其中

为对齐后的HDR图像的通道均值图像，t为常数，这里取t＝e^0.1。该掩码图像将作为训练网络时的另一项样本标签，同HDR图像一起监督网络的学习过程。in

is the channel mean image of the aligned HDR image, t is a constant, here we take t = e ^0.1 . The mask image will be used as another sample label when training the network, and supervise the learning process of the network together with the HDR image.

二、神经网络训练2. Neural Network Training

神经网络的结构依照附图2所示进行搭建。具体地，对于ResNet50模型，这里使用现有的基于ImageNet数据集的分类任务中已训练好的模型作为该部分网络权重的初始化值；其余每一个网络块都代表一个由卷积层和实例规范化操作以及relu激活函数组成的卷积块；生成网络中decoder部分的输入由上一层卷积块的输出和encoder对称位置的输出拼接而成；对于判别器网络部分，其输入由生成网络的输出组成，在经过四层卷积块后，输出一张判断输入HDR图像是否为真实图像的概率特征图。The structure of the neural network is constructed as shown in Figure 2. Specifically, for the ResNet50 model, the trained model in the existing classification task based on the ImageNet dataset is used as the initialization value of the weight of this part of the network; each of the remaining network blocks represents a convolutional block consisting of a convolutional layer, an instance normalization operation, and a relu activation function; the input of the decoder part of the generated network is composed of the output of the previous convolutional block and the output of the encoder at a symmetrical position; for the discriminator network part, its input is composed of the output of the generated network, and after passing through four layers of convolutional blocks, a probability feature map is output to determine whether the input HDR image is a real image.

其中，生成网络的输出与对应输入的样本标签数据，以及判别网络根据该输出数据计算的到的概率特征图，一同根据如下公式计算生成器损失：Among them, the output of the generator network and the corresponding input sample label data, as well as the probability feature map calculated by the discriminant network based on the output data, are used together to calculate the generator loss according to the following formula:

其中y表示网络输出的HDR图像，

为目标图像，下标l，c分别表示像素位置和颜色通道，

其中m,

分别为网络预测值和标签值。Where m,

are the network prediction value and label value respectively.

其中∈为[0,1]的随机数。Where ∈ is a random number in [0,1].

其中α₁,α₂,α₃分别取1,0.1,0.02，每一项损失的具体计算公式如发明内容中描述。根据计算得到的损失值，使用Adam优化算法对网络进行反向传播并更新权重。另外，每次更新完生成网络的权重，需要继续计算判别网络的损失并更新判别网络的权重，其更新算法同样使用Adam算法，具体公式如发明内容中描述。Wherein α ₁ , α ₂ , α ₃ are 1, 0.1, and 0.02 respectively, and the specific calculation formula of each loss is described in the content of the invention. According to the calculated loss value, the network is back-propagated and the weight is updated using the Adam optimization algorithm. In addition, each time the weight of the generating network is updated, it is necessary to continue to calculate the loss of the discriminant network and update the weight of the discriminant network. The update algorithm also uses the Adam algorithm, and the specific formula is described in the content of the invention.

依照上述训练方法，每次交付一对或多对训练数据供网络训练，对整个训练数据集依次循环迭代，直至损失函数收敛，网络即训练完成。According to the above training method, one or more pairs of training data are delivered to the network for training each time, and the entire training data set is iterated in sequence until the loss function converges, and the network training is completed.

(三)网络模型应用(III) Network model application

在网络训练完成后，将生成网络部分的网络及其权重参数提取出来即得到了最终的高动态范围图像的重建模型。利用该模型，每次仅需要一张LDR图片作为输入，即可得到近似真实的HDR图片。附图3展示了一个应用的示例，其中生成网络模型即通过前述方法训练好的神经网络中的生成网络部分，该模型可接受任意尺寸的LDR图片作为输入，并直接输出同尺寸下的HDR重建图像。After the network training is completed, the network and its weight parameters of the generative network part are extracted to obtain the final reconstruction model of the high dynamic range image. Using this model, only one LDR picture is needed as input each time to obtain an approximately realistic HDR picture. Figure 3 shows an example of an application, in which the generative network model is the generative network part of the neural network trained by the above method. The model can accept LDR pictures of any size as input and directly output HDR reconstructed images of the same size.

本发明基于深度学习提出了一个单张普通数字图像重建高动态范围的方法，可对一般场景的图像进行有效、逼真的高动态范围重建。本发明应用广泛，根据对不同数据集进行训练可以适应不同需求的场景，且对于同一训练集仅需训练一次便可多次应用。The present invention proposes a method for reconstructing high dynamic range from a single ordinary digital image based on deep learning, which can effectively and realistically reconstruct high dynamic range for images of general scenes. The present invention has a wide range of applications. It can adapt to scenes with different requirements by training different data sets, and the same training set only needs to be trained once and can be applied multiple times.

Claims

1. The image high dynamic range reconstruction method based on deep learning is characterized by comprising the following steps of:

step 1, a neural network is established based on a deep learning method, and comprises a generation network from a low dynamic range image to a high dynamic range image and a discrimination network for discriminating whether the high dynamic network image is real;

step 2, preprocessing an HDR data set to form training data, wherein the data preprocessing is divided into three parts of generation of LDR data, alignment of HDR image brightness units and generation of image highlight region masks, the LDR data obtained after preprocessing is used as training input data for generating a network and is output as an HDR image and a highlight mask image, the aligned HDR data and the highlight mask image after the data preprocessing are used as sample label data for training, the network is judged to accept one HDR image and the highlight mask image as input, and a feature map representing the probability that the input HDR image is a real HDR image or a false HDR image generated by the network is output;

step 3, training the neural network based on three loss functions, training the neural network in a supervised learning mode, performing back propagation optimization on a generation network and a discrimination network successively by adopting an Adam optimizer, wherein the three loss functions are respectively a scale invariant loss function of an HDR reconstructed image, a cross entropy loss function of highlight region classification and a generation antagonism loss function, and the loss functions are defined as follows:

L _G ＝α ₁ L _hdr +α ₂ L _mask +α ₃ L _gan

the scale invariant loss function of the HDR reconstructed image is defined as follows:

where y represents the HDR image output by the network,

for an aligned HDR image, +.>

Representing the difference between the network output and the sample label in the logarithmic domain, e being a small value preventing logarithmic calculation to zero, subscript l, c representing pixel position and color channel, respectively;

a cross entropy loss function for highlight region classification, defined as follows:

wherein the method comprises the steps of

Respectively a network predicted value and a label value;

the challenge loss function is generated as defined below:

wherein D (y) is a calculation result for discriminating the network to generate network output as input;

training the network according to the loss function, and extracting and generating a network model as a final algorithm model after the loss function converges.

2. The method according to claim 1, characterized in that: the low dynamic range image refers to a low dynamic range image stored at 8-bit color depth, 256-tone scale, and the high dynamic range image refers to a high dynamic range image stored in ". EXR" or ". HDR" format that approximates the change in the brightness of a real scene.

3. The method according to claim 1, characterized in that: the neural network described in the step 1 comprises a generating network and a judging network, wherein the generating network is of a U-Net structure, the network receives an LDR image as input, and after a decoding network is formed by a coding network formed by a ResNet50 model and a 6-layer 'up-sampling+convolution layer' module, an HDR image and a highlight mask image are respectively output; the judging network is a full convolution network formed by 4 layers of convolution layers, receives an HDR image and a high light mask image as inputs, and outputs a feature map representing the probability that the input HDR image is a real HDR image or a false HDR image generated by the network.

4. The method according to claim 1, characterized in that: the data preprocessing described in the step 2 comprises the following specific processes:

step 2.1, generating LDR data, generating LDR training sample input, respectively shooting each HDR image by using a tone mapping algorithm and a virtual camera to obtain an LDR image, selecting a proper tone mapping algorithm, and directly inputting the HDR image as the algorithm to obtain a corresponding LDR image output; meanwhile, obtaining an LDR image by constructing a virtual camera, firstly determining a value range of a dynamic range of the virtual camera based on a common digital single-phase inverter, randomly selecting a value in the range as the dynamic range of the camera for simulating shooting every time the LDR image is obtained, then automatically exposing the virtual camera according to an input HDR image, taking a boundary value for a pixel with brightness exceeding the dynamic range of the virtual camera, linearly mapping the pixel to a low dynamic range of the LDR image, and finally mapping the obtained image from a linear space to a common digital image through a randomly selected approximate camera response function to obtain the required LDR image;

step 2.2, alignment of HDR image luminance units for preservingHDR images with relative luminance domains exist, aligning their luminance units before they are labeled as training samples; let the original HDR image be H, the LDR image is converted to linear space and normalized to [0,1]]Let L, H _l，c ，L _l，c The pixel values of the image at the position l and the channel c are aligned by the following methods:

wherein the method comprises the steps of

For an aligned HDR image, m _l，c Is defined as

Wherein τ is a constant of [0,1], and the aligned HDR image and the corresponding LDR image form a training sample pair for training of the neural network;

step 2.3, generating an image highlight region mask, obtaining an aligned HDR image brightness unit, and obtaining a mask image of a highlight region in the image in a binarization mode, wherein the formula is as follows:

wherein the method comprises the steps of

For the channel mean image of the aligned HDR image, t is a constant, and the region with the median value of 1 in the mask image represents an object or surface with higher brightness in the scene, including a light source and a strong light reflecting surface. />