
CN116547696A - Image enhancement method and device - Google Patents

Image enhancement method and device

Info

Publication number: CN116547696A
Application number: CN202180079713.XA
Authority: CN (China)
Prior art keywords: image, enhanced, network, bottleneck, output
Legal status: Pending (assumed status; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 沈枫易, 奥纳伊·优厄法利欧格路
Original assignee: Huawei Technologies Co Ltd
Current assignee: Shenzhen Yinwang Intelligent Technology Co Ltd (listed assignee; accuracy not verified)
Application filed by: Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G06F 18/2185 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor, the supervisor being an automated module, e.g. an intelligent oracle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V 10/993 Evaluation of the quality of the acquired pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/254 Fusion techniques of classification results, e.g. of results related to same input data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An image enhancement method includes: generating an input image by concatenating an original input image with a depth map; generating a bottleneck feature by encoding the input image using an encoder; and injecting a perturbation vector into the bottleneck feature. The bottleneck feature with the injected perturbation vector is fed to an image generator. The method further includes generating, at the image generator, an enhanced image according to the bottleneck feature and the perturbation vector. The method further includes receiving, at a discriminator, the enhanced image and a clear image randomly selected from a clear image dataset, and determining an image enhancement score according to a comparison between the enhanced image and the randomly selected clear image. The method can provide multiple output images, reduce processing complexity, and improve the quality of the output images.

Description

Image enhancement method and device

Technical Field

The present invention relates generally to the fields of computer vision and machine learning, and more particularly to a method and apparatus for multimodal image enhancement in an unsupervised manner.

Background

Currently, processing images captured under different weather and lighting conditions (for example, clear, rainy, foggy, dark, blurry, or snowy conditions) for image enhancement is a prominent technical challenge. For example, in autonomous driving, images need to be captured under different weather and lighting conditions and enhanced in real time or near real time to facilitate safe autonomous driving, which is a great challenge. The reason is that, owing to the different weather and lighting conditions, many features of such images that are helpful for perception often do not appear.

Certain methods have been proposed to process such images. For example, a conventional method uses paired data of input images and their corresponding clear output images to train a conventional model that maps an input image to its corresponding clear output image. However, such supervised learning of a conventional model using labeled datasets (for example, input images and their corresponding clear output images) requires significant effort. Another conventional image dehazing method is based on a conventional atmospheric scattering model (that is, a physics-based model) rather than the paired data used in supervised learning. However, conventional image dehazing methods are task-specific and difficult to generalize to other image enhancement tasks, such as image deblurring and low-light enhancement. In addition, conventional image enhancement methods obtain only one output image from an input image, which is less informative and less efficient, and therefore of limited use. Because existing methods are complex, inefficient, and task-specific, with limited utility and unsuitable for holistic image enhancement, there is a technical problem of how to sufficiently and holistically enhance images captured under different weather and lighting conditions for practical applications, such as facilitating safe autonomous driving.

Therefore, in light of the above discussion, there is a need to overcome the aforementioned drawbacks associated with conventional image enhancement methods.

Summary of the Invention

The present invention provides a multimodal image enhancement method and apparatus to facilitate safe autonomous driving. Because existing methods are complex, inefficient, and task-specific, with limited utility and unsuitable for holistic image enhancement, the present invention provides a solution to the existing problem of how to sufficiently and holistically enhance images captured under different weather and lighting conditions. An aim of the present invention is to provide a solution that at least partially overcomes the problems encountered in the prior art, and to provide a multimodal image enhancement method and apparatus capable of sufficiently and holistically enhancing images captured under different weather and lighting conditions for various practical applications, such as facilitating safe autonomous driving.

The objects of the invention are achieved by the solutions provided in the appended independent claims. Advantageous implementations of the invention are further defined in the dependent claims.

In one aspect, the present invention provides an image enhancement method, where the method includes: generating an input image by concatenating an original input image with a sharpening attention map or a depth map. The method further includes generating a bottleneck feature by encoding the input image using an encoder. The method further includes injecting a perturbation vector into the bottleneck feature. The method further includes feeding the bottleneck feature with the injected perturbation vector to an image generator. The method further includes generating, at the image generator, an enhanced image according to the bottleneck feature and the perturbation vector. The method further includes receiving, at a discriminator, the enhanced image and a clear image randomly selected from a clear image dataset, and determining an image enhancement score according to a comparison between the enhanced image and the randomly selected clear image.

The disclosed method generates a controllable solution space (a set of multiple output images) for image enhancement, rather than only one non-optimal output image. By accounting for the uncertainty in the original input image, the controllable solution space can be created more reasonably and allows searching for an optimal solution (for example, an improved and enhanced output image). The disclosed method therefore provides multiple output images and enables selection of an optimal output image, exhibiting higher reliability and efficiency. The disclosed method uses the sharpening attention map or the depth map, and therefore provides better guidance for generating sharper, brighter output images. In addition, the disclosed method provides a unified approach for multiple image enhancement tasks (for example, dehazing, deblurring, and low-light enhancement), processing images captured under different weather and lighting conditions (for example, foggy, clear, rainy, dark, or snowy environments) for unsupervised, controllable, holistic image enhancement. For example, in one exemplary practical application, the method makes clearly visible the various image features of such enhanced images that are helpful for perception, even under different weather and lighting conditions, to facilitate safe autonomous driving.

In one implementation, the method further includes feeding the discrimination score back to the encoder and the image generator.

By feeding the discrimination score back to the encoder and the image generator, image quality is gradually improved. In addition, a perceptual loss between the original input image and the enhanced image is computed based on a pre-trained convolutional neural network (CNN), such as a VGG network, to preserve structure at the feature level.

In another implementation, the perturbation vector is sampled from a Gaussian distribution.

Sampling the perturbation vector from a Gaussian distribution produces multimodal outputs (multiple output images), creating a solution space of output images for improved image enhancement.

In another implementation, the perturbation vector is updated according to pre-trained network weights.

The perturbation vector is used to determine an optimal output image (that is, an improved output) from the controllable solution space of output images. During training, the perturbation vector has no predetermined network weights and acts as a random vector sampled from a Gaussian distribution at each training step. After training is complete, however, the perturbation vector is updated according to the pre-trained (that is, fixed) network weights, and is then adjusted (or fine-tuned) to search for an optimal solution (that is, an improved output image).

In another implementation, the perturbation vector is adjusted using gradient descent by reducing an adversarial loss, a Frechet Inception Distance score, or a Structural Similarity Index score.

The perturbation vector is randomly initialized and then adjusted using gradient descent by reducing the adversarial loss, the Frechet Inception Distance score, or the Structural Similarity Index score, where the adjusted (or fine-tuned) perturbation vector leads to an improved, most visually pleasing image.

In another implementation, the discriminator is a gradient-based multi-patch discriminator.

Using the gradient-based multi-patch discriminator further improves the quality of the output image.

In another implementation, the discriminator includes at least the following three network branches: a first network branch for capturing a Gaussian-blurred version of the generator output; a second network branch for capturing an identity image; and a third network branch for capturing a Laplacian-of-Gaussian-blurred version of the generator output, where the result of the discriminator is obtained after summing the outputs produced by the three branches after each convolutional layer.

Using the three network branches of the discriminator provides improved illumination control and sharp-edge information in the enhanced image.

In another aspect, the present invention provides an image enhancement apparatus, where the apparatus is configured to generate an input image by concatenating an original input image with a sharpening attention map or a depth map. The apparatus is further configured to generate a bottleneck feature by encoding the input image using an encoder. The apparatus is further configured to inject a perturbation vector into the bottleneck feature. The apparatus is further configured to feed the bottleneck feature with the injected perturbation vector to an image generator. The apparatus is further configured to generate, at the image generator, an enhanced image according to the bottleneck feature and the perturbation vector. The apparatus is further configured to receive, at a discriminator, the enhanced image and a clear image randomly selected from a clear image dataset, and to determine an image enhancement score according to the difference between the enhanced image and the randomly selected clear image.

The apparatus of the present invention achieves all the advantages and effects of the method.

In yet another aspect, the present invention provides a computer program comprising program code that, when executed by a computer, causes the computer to perform the method.

When executing the method of the present invention, the computer achieves all the advantages and effects of the method.

In yet another aspect, the present invention provides an electronic component mounted on a vehicle, the electronic component being operable to perform the method.

When performing the method of the present invention, the electronic component achieves all the advantages and effects of the method.

It should be understood that all of the above implementations can be combined.

It should be noted that all devices, elements, circuits, units, and components described in this application can be implemented by software or hardware elements or any combination thereof. All steps described in this application as being performed by the various entities, and the functions described as being performed by the various entities, are intended to indicate that the corresponding entities are configured to perform the corresponding steps and functions. Although in the following description of specific embodiments the specific functions or steps performed by an external entity are not reflected in the description of the specifically detailed elements of the entity that performs them, it should be clear to a person skilled in the art that these methods and functions can be implemented by corresponding hardware or software elements or any combination thereof. It should be understood that features of the present invention can be combined in various ways without departing from the scope of the present invention as defined by the appended claims.

Additional aspects, advantages, features, and objects of the present invention will become apparent from the accompanying drawings and the detailed description of illustrative implementations construed in conjunction with the appended claims.

Brief Description of the Drawings

The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the accompanying drawings. To illustrate the present invention, exemplary structures of the invention are shown in the drawings. However, the invention is not limited to the specific methods and tools disclosed herein. Furthermore, a person skilled in the art will appreciate that the drawings are not drawn to scale. Where possible, like elements are denoted by like numerals.

Embodiments of the present invention are now described, by way of example only, with reference to the following drawings, in which:

FIG. 1 is a flowchart of an image enhancement method according to an embodiment of the present invention;

FIG. 2 is a block diagram of various exemplary components of an apparatus according to an embodiment of the present invention;

FIG. 3A is a graphical representation of learning (or training) an image dehazing model according to an embodiment of the present invention;

FIG. 3B is a graphical representation of an encoder according to an embodiment of the present invention;

FIG. 3C is a graphical representation of a densely connected block according to an embodiment of the present invention;

FIG. 3D is a graphical representation of an encoder-decoder structure with a Gaussian perturbation vector according to an embodiment of the present invention;

FIG. 3E is a graphical representation of a discriminator according to an embodiment of the present invention;

FIG. 3F is a graphical representation of fine-tuning to obtain an optimal output image according to an embodiment of the present invention;

FIG. 4 is an illustration of an exemplary implementation scenario of the image enhancement method and apparatus according to an embodiment of the present invention.

In the accompanying drawings, an underlined number is used to denote an item over which the underlined number is positioned or an item adjacent to the underlined number. A non-underlined number refers to an item identified by a line connecting the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify the general item to which the arrow points.

Detailed Description

The following detailed description illustrates embodiments of the present invention and the ways in which these embodiments can be implemented. Although some modes of carrying out the invention have been disclosed, a person skilled in the art will recognize that other embodiments for carrying out or practicing the invention are also possible.

FIG. 1 is a flowchart of an image enhancement method according to an embodiment of the present invention. Referring to FIG. 1, an image enhancement method 100 is shown. The method 100 includes steps 102 to 112. The method 100 is performed by an apparatus such as the one described in detail in FIG. 2.

The present invention provides the image enhancement method 100, where the method 100 includes:

(i) generating an input image by concatenating an original input image with a sharpening attention map or a depth map;

(ii) generating a bottleneck feature by encoding the input image using an encoder;

(iii) injecting a perturbation vector into the bottleneck feature;

(iv) feeding the bottleneck feature with the injected perturbation vector to an image generator;

(v) generating, at the image generator, an enhanced image according to the bottleneck feature and the perturbation vector;

(vi) receiving, at a discriminator, the enhanced image and a clear image randomly selected from a clear image dataset, and determining an image enhancement score according to a difference between the enhanced image and the randomly selected clear image.

The present invention provides the image enhancement method 100. The method 100 is based on a generative adversarial network (GAN). The GAN can be used to generate multiple output images with respect to an input image by using an image generator. The generated output images exhibit higher visual quality and provide the useful image features needed for perception in real time or near real time. The GAN also includes a discriminator that pushes the image generator to produce realistic outputs.

In step 102, the method 100 includes generating an input image by concatenating the original input image with a sharpening attention map or a depth map. The original input image corresponds to an image captured under one of the following conditions: hazy, foggy, clear, rainy, dark, or snowy environments, or other adverse weather or lighting conditions. The original input image may also be called a degraded image, because it does not reveal useful image features suitable for a practical application or use. For example, the original input image may be captured by a camera mounted on an autonomous vehicle and may not reveal the features required for safe autonomous driving under different weather and lighting conditions. In another example, the original input image includes one or more objects and may be captured by a handheld device (such as a smartphone) in a low-light, rainy environment, in which the captured objects may not be clear. Therefore, to obtain useful features (such as object shapes and edges) from the original input image, the original input image is processed using the sharpening attention map or the depth map to generate the input image, which serves as the input to the encoder in the next operation. Compared with the original input image, the input image exhibits improved characteristics, such as improved visual quality, object shapes, and edge details. The sharpening attention map can be defined as a scalar matrix representing the relative importance of layer activations at different two-dimensional (2D) spatial locations with respect to the target task (for example, outputting a clear image). The depth map can be defined as an image or image channel that provides information about the distance between the surfaces of scene objects and the viewpoint. In one implementation, the sharpening attention map or the depth map can be obtained from a pre-trained model, or can be jointly trained in the network in an unsupervised manner.
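
As an illustration of this concatenation step, the following is a minimal PyTorch sketch (PyTorch is an assumption; the patent names no framework). The function name build_network_input, the single-channel guidance map, and the tensor shapes are hypothetical choices for the example.

```python
import torch

def build_network_input(original: torch.Tensor, guidance: torch.Tensor) -> torch.Tensor:
    """Concatenate an RGB image with a sharpening attention map or depth map.

    original: (N, 3, H, W) RGB image; guidance: (N, 1, H, W) map.
    Returns an (N, 4, H, W) tensor that is fed to the encoder.
    """
    assert original.shape[-2:] == guidance.shape[-2:], "maps must match spatially"
    return torch.cat([original, guidance], dim=1)  # channel-wise concatenation

# Example: a 256x256 degraded image with a depth map from any monocular estimator.
x_input = build_network_input(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
```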

In step 104, the method 100 further includes generating a bottleneck feature by encoding the input image using an encoder. The input image is sent to the encoder, which generates the bottleneck feature. The bottleneck feature refers to an encoded feature map extracted from the input image, which has a smaller spatial size but more channels than the input image. The encoder is described in detail in, for example, FIG. 2.

In step 106, the method 100 further includes injecting a perturbation vector into the bottleneck feature. By injecting the perturbation vector into the bottleneck feature, a multimodal image output can be generated. The multimodal image output is generated by varying the appearance of the output image.

According to one embodiment, the perturbation vector is sampled from a Gaussian distribution. In one implementation, the perturbation vector corresponds to a six-dimensional perturbation vector sampled from the Gaussian distribution. In general, a Gaussian distribution can be defined as a bell-shaped normal distribution with an equal number of measurements above and below the mean. The perturbation vector is upsampled using a multi-layer perceptron (MLP) network. The MLP network is described in detail in, for example, FIG. 3A. The upsampled perturbation vector is injected into the bottleneck feature using an adaptive instance normalization (AdaIN) method. The AdaIN method influences the image generation stage and produces multimodal output images. The AdaIN method is described in detail in, for example, FIG. 3A.
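
The following is a minimal, non-authoritative PyTorch sketch of the Gaussian sampling, MLP upsampling, and AdaIN injection described above. The class name AdaInInjector, the hidden-layer width, and the bottleneck channel count are assumptions; only the six-dimensional Gaussian vector, the MLP, and the AdaIN mechanism come from the description.

```python
import torch
import torch.nn as nn

class AdaInInjector(nn.Module):
    """Inject a Gaussian perturbation vector into bottleneck features via AdaIN."""
    def __init__(self, z_dim: int = 6, channels: int = 256):
        super().__init__()
        # The MLP upsamples z into per-channel scale (gamma) and shift (beta).
        self.mlp = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 2 * channels),
        )
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, c: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.mlp(z).chunk(2, dim=1)   # (N, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (N, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(c) + beta    # AdaIN modulation

inject = AdaInInjector()
c = torch.rand(1, 256, 32, 32)   # bottleneck feature from the encoder
z = torch.randn(1, 6)            # perturbation sampled from N(0, I)
c_styled = inject(c, z)
```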

According to one embodiment, the perturbation vector is updated according to pre-trained network weights. During training, the perturbation vector has no predetermined network weights and acts as a random vector sampled from a Gaussian distribution at each training step. After training is complete, however, the perturbation vector is updated according to the pre-trained (that is, fixed) network weights, and is then adjusted (or fine-tuned) to search for an optimal solution (that is, an improved output image). The pre-trained network weights are those of the pre-trained encoder, image generator, and discriminator networks. After training is complete, these network weights are fixed, and the fine-tuning is then performed on the perturbation vector.

According to one embodiment, the perturbation vector is adjusted using gradient descent by reducing an adversarial loss, a Frechet Inception Distance score, or a Structural Similarity Index score. The perturbation vector is randomly initialized and then adjusted using gradient descent by minimizing one of various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score, or the Structural Similarity Index (SSI) score. In general, the adversarial loss can be defined as the difference between ground-truth data (or original or source data) and data generated by a generative adversarial network (GAN). The FID score can be defined as a metric that computes, with the aid of a pre-trained Inception network, the distance between feature vectors computed for real images and for generated images. The SSI score may also be called the structural similarity index measure (SSIM). The SSIM can be defined as a method for predicting the perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos, and can be used to measure the similarity between two images. The updated perturbation vector eventually converges to a point at which it cooperates with the decoder (or image generator) to generate a visually pleasing output image. The updated perturbation vector can then be applied to one or more images (for example, test images).
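
A minimal sketch of this test-time fine-tuning, assuming frozen, pre-trained generator and discriminator callables and a fixed bottleneck feature c (hypothetical names from the earlier sketch); the optimizer choice, learning rate, and step count are illustrative, and an FID or SSIM objective could replace the adversarial term.

```python
import torch

# z is the only trainable quantity; the network weights stay frozen.
z = torch.randn(1, 6, requires_grad=True)   # random initialization
optimizer = torch.optim.Adam([z], lr=1e-2)

for step in range(200):                      # illustrative number of steps
    optimizer.zero_grad()
    enhanced = generator(inject(c, z))       # image produced for the current z
    loss = -discriminator(enhanced).mean()   # adversarial objective: look "real"
    # An FID score or a (1 - SSIM) term could be minimized here instead.
    loss.backward()
    optimizer.step()
```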

In step 108, the method 100 further includes feeding the bottleneck feature with the injected perturbation vector to an image generator. The image generator can be used to generate multiple output images that improve visual quality and provide the useful image features needed for perception in real time or near real time. In step 110, the method 100 further includes generating, at the image generator, an enhanced image according to the bottleneck feature and the perturbation vector. In one implementation, the image generator uses the bottleneck feature with the injected perturbation vector to generate the enhanced image (that is, the output image) in each iteration. Using a different perturbation vector in each iteration causes variations in aspects such as the illumination or brightness of the enhanced image. In this way, a Gaussian solution space for varying the appearance of the enhanced image is created, which makes the image enhancement output controllable at test time. In addition, the enhanced image can be sent back to the encoder to force the encoder to produce the same encoded features as for the original input image (that is, another, similar bottleneck feature). An L1-norm feature reconstruction loss is therefore applied between the two encoded features (that is, the two bottleneck features).
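
Continuing the sketches above, the following shows one generation pass with the L1-norm feature reconstruction constraint. The names encoder, generator, inject, build_network_input, and guidance_map refer to the hypothetical helpers introduced in the previous examples, not to an API from the patent.

```python
import torch
import torch.nn.functional as F

z = torch.randn(1, 6)                    # fresh perturbation per iteration
c = encoder(x_input)                     # bottleneck feature of the input
enhanced = generator(inject(c, z))       # one sample of the multimodal output

# Feed the enhanced image back: its bottleneck feature C' should match C.
c_prime = encoder(build_network_input(enhanced, guidance_map))
recon_loss = F.l1_loss(c_prime, c)       # L1-norm feature reconstruction loss
```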

In step 112, the method 100 further includes receiving, at a discriminator, the enhanced image and a clear image randomly selected from the clear image dataset, and determining an image enhancement score according to a comparison between the enhanced image and the randomly selected clear image. The comparison is made in terms of the realism of the enhanced image relative to the randomly selected clear image. Based on the received enhanced image and the clear image randomly selected from the clear image dataset, the discriminator determines whether the enhanced image is a fake image or a real clear image.

According to one embodiment, the discriminator is a gradient-based multi-patch discriminator. The gradient-based multi-patch discriminator includes multiple network branches and therefore improves output image quality.

According to one embodiment, the method 100 further includes feeding the discrimination score back to the encoder and the image generator. By feeding the discrimination score back to the encoder and the image generator, image quality is gradually improved. In addition, a perceptual loss between the original input image and the enhanced image is computed based on a pre-trained convolutional neural network (CNN), such as a VGG network, to preserve structure at the feature level.
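
A hedged sketch of such a feature-level perceptual loss using a frozen pre-trained VGG-16 from torchvision; the truncation depth (the first 16 feature layers) and the L1 distance are illustrative choices, not values specified by the patent.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen feature extractor: gradients flow to the images, not to VGG.
vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(original: torch.Tensor, enhanced: torch.Tensor) -> torch.Tensor:
    """Compare original and enhanced images in VGG feature space."""
    return F.l1_loss(vgg_features(enhanced), vgg_features(original))
```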

According to one embodiment, the discriminator includes at least the following three network branches:

(a) a first network branch for capturing a Gaussian-blurred version of the generator output;

(b) a second network branch for capturing an identity image;

(c) a third network branch for capturing a Laplacian-of-Gaussian-blurred version of the generator output,

where the result of the discriminator is obtained after summing the outputs produced by the three branches after each convolutional layer.

The Gaussian-blurred generator output is responsible for introducing, in the output image, an illumination distribution closer to that of the target data (that is, the enhanced images). The identity image captured by the second network branch is similar to the image processed by a standard discriminator. The Laplacian-of-Gaussian (LoG) blurred generator output is responsible for producing, in the output image, sharper edges closer to the distribution of the target data (that is, the enhanced images). The LoG can be defined as a two-dimensional isotropic measure of the second spatial derivative of an image; in other words, the LoG highlights regions of an image where the intensity changes rapidly, and it is commonly used for edge detection. The sum of the outputs produced by the three branches after each convolutional layer is taken in the discriminator to make real/fake predictions for different patches of the output image. In this way, the discriminator improves output image quality. A sketch of one plausible reading of this design is given below.
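
The following PyTorch sketch illustrates one plausible reading of the three-branch, per-layer-summing design, assuming the branch outputs are fused after every convolutional layer and shared downstream. The layer widths, kernel sizes, and blur/LoG parameters are assumptions; only the branch inputs and the per-layer summation come from the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms.functional import gaussian_blur

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def log_filter(img: torch.Tensor) -> torch.Tensor:
    """Laplacian of Gaussian: Gaussian blur followed by a per-channel Laplacian."""
    blurred = gaussian_blur(img, kernel_size=5)
    weight = LAPLACIAN.repeat(img.shape[1], 1, 1, 1)  # one kernel per channel
    return F.conv2d(blurred, weight, padding=1, groups=img.shape[1])

class MultiPatchDiscriminator(nn.Module):
    def __init__(self, in_ch: int = 3):
        super().__init__()
        def branch() -> nn.ModuleList:
            return nn.ModuleList([
                nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
                nn.Conv2d(64, 128, 4, stride=2, padding=1),
                nn.Conv2d(128, 1, 4, padding=1),  # per-patch real/fake map
            ])
        self.blur_branch, self.id_branch, self.log_branch = branch(), branch(), branch()

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # Branch inputs: Gaussian-blurred image, identity image, LoG-filtered image.
        feats = [gaussian_blur(img, kernel_size=5), img, log_filter(img)]
        n_layers = len(self.blur_branch)
        for i, layers in enumerate(zip(self.blur_branch, self.id_branch, self.log_branch)):
            outs = [conv(f) for conv, f in zip(layers, feats)]
            fused = sum(outs)                     # sum branch outputs per layer
            if i < n_layers - 1:
                fused = F.leaky_relu(fused, 0.2)
            feats = [fused, fused, fused]         # branches share the fused map
        return fused                              # patch-wise real/fake scores
```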

Thus, the method 100 generates a controllable solution space (a set of multiple output images) for image enhancement, rather than only one non-optimal output image. By accounting for the uncertainty in the original input image, the controllable solution space can be created more reasonably and allows searching for an optimal solution (for example, an improved and enhanced output image). The method 100 therefore provides multiple output images and enables selection of an optimal output image, exhibiting higher reliability and efficiency. The method 100 uses the sharpening attention map or the depth map, and therefore provides better guidance for generating sharper, brighter output images. In addition, the method 100 performs multiple image enhancement tasks (such as image dehazing, image deblurring, and/or low-light enhancement) to holistically enhance images captured under different weather and lighting conditions (for example, foggy, clear, rainy, dark, or snowy environments). For example, in one exemplary practical application, the method 100 makes clearly visible the various features of such enhanced images that are helpful for perception, even under different weather and lighting conditions, to facilitate safe autonomous driving. Furthermore, the method 100 reduces processing complexity and improves the quality of the output images.

Steps 102 to 112 are merely illustrative; other alternatives may also be provided in which one or more steps are added, one or more steps are removed, or one or more steps are provided in a different order without departing from the scope of the claims herein.

FIG. 2 is a block diagram of various exemplary components of an apparatus according to an embodiment of the present invention. Referring to FIG. 2, a block diagram 200 of an apparatus 202 is shown. The apparatus 202 includes a sharpening attention map or depth map 204, an encoder 206, an image generator 208, a memory 210, a discriminator 212, and a processor 214.

The apparatus 202 includes suitable logic, circuitry, interfaces, and/or code configured to enhance images. The apparatus 202 is configured to perform the method 100 (of FIG. 1). Examples of the apparatus 202 include, but are not limited to, a handheld device or an electronic device or component that can be mounted on a vehicle (for example, an autonomous or semi-autonomous vehicle), a mobile device, a portable device, or the like, operable to perform the method 100.

The sharpening attention map or the depth map 204 is used to provide better guidance so as to generate output images of higher visual quality than conventional scattering models. The sharpening attention map or the depth map 204 may be a software program, a mathematical expression, or an application that can be installed in the apparatus 202.

The encoder 206 (also denoted Enc) includes suitable logic, circuitry, interfaces, and/or code that can be defined as a network (for example, a convolutional neural network (CNN) or a recurrent neural network (RNN)) that takes input data (for example, an image) and provides output data based on a feature map, vector, or tensor representing latent information of the input data. Examples of the encoder 206 include, but are not limited to, recurrent neural networks, feedforward neural networks, deep belief networks and convolutional deep belief networks, self-organizing maps, deep Boltzmann machines, and stacked denoising autoencoders.

The image generator 208 (also denoted G) includes suitable logic, circuitry, interfaces, and/or code configured to generate one or more enhanced clear output images close to the real data distribution. In one implementation, the image generator 208 can also be defined as a network configured to reconstruct the input data (that is, the input image) from the feature map, or to transform the feature map into a different but related representation. The image generator 208 may also be called a decoder.

The memory 210 includes suitable logic, circuitry, or interfaces configured to store instructions executable by the processor 214. The memory 210 can also be used to store the clear image dataset. Implementation examples of the memory 210 may include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), flash memory, Solid-State Drive (SSD), and/or CPU cache memory. The memory 210 may store an operating system or another program product (including one or more operating algorithms) to operate the apparatus 202.

The discriminator 212 (also denoted D) includes suitable logic, circuitry, interfaces, and/or code configured to determine an image enhancement score according to a comparison between the one or more enhanced clear output images received from the image generator 208 and a clear image randomly selected from the clear image dataset stored in the memory 210. In other words, the realism of the one or more enhanced clear output images is checked, or compared, against the randomly selected clear image.

The processor 214 includes suitable logic, circuitry, interfaces, and/or code configured to execute the instructions stored in the memory 210. In one example, the processor 214 may be a general-purpose processor. Other examples of the processor 214 may include, but are not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a central processing unit (CPU), a state machine, a data processing unit, and other processors or control circuits. Furthermore, the processor 214 may refer to one or more individual processors, processing devices, or processing units that are part of a machine, such as the apparatus 202.

In operation, the apparatus 202 for image enhancement is provided, where the apparatus 202 is configured to generate an input image by concatenating an original input image with the sharpening attention map or the depth map 204. The apparatus 202 is further configured to generate a bottleneck feature by encoding the input image using the encoder 206. The apparatus 202 is further configured to inject a perturbation vector into the bottleneck feature. The apparatus 202 is further configured to feed the bottleneck feature with the injected perturbation vector to the image generator 208. The apparatus 202 is further configured to generate, at the image generator 208, an enhanced image according to the bottleneck feature and the perturbation vector. The apparatus 202 is further configured to receive, at the discriminator 212, the enhanced image and a clear image randomly selected from the clear image dataset, and to determine an image enhancement score according to the difference between the enhanced image and the randomly selected clear image.

According to one embodiment, the perturbation vector is sampled from a Gaussian distribution. In one implementation, the perturbation vector is a six-dimensional perturbation vector sampled from the Gaussian distribution.

According to one embodiment, the perturbation vector is updated according to pre-trained network weights. In another implementation, the perturbation vector is used together with the pre-trained network weights of the encoder 206 or the image generator 208. In this implementation, the apparatus 202 is configured to start processing from a random perturbation vector.

According to one embodiment, the perturbation vector is adjusted using gradient descent by reducing an adversarial loss, a Frechet Inception Distance score, or a Structural Similarity Index score. In the case of the random perturbation vector, its value is updated using gradient descent by minimizing one of various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score, or the Structural Similarity Index (SSI) score.

According to one embodiment, the discriminator 212 is a gradient-based multi-patch discriminator. The discriminator 212 (or gradient-based multi-patch discriminator) includes multiple network branches and therefore improves output image quality.

According to one embodiment, the discriminator 212 includes three network branches:

(a) a first network branch for capturing a Gaussian-blurred version of the generator output;

(b) a second network branch for capturing an identity image;

(c) a third network branch for capturing the Laplacian of Gaussian of the generated output, where the result of the discriminator 212 is obtained after summing the outputs produced by the three branches after each convolutional layer. By using the first, second, and third network branches, the discriminator 212 makes real/fake predictions for different patches of the output image.

According to one embodiment, a computer program is provided, comprising program code that, when executed by a computer, causes the computer to perform the method 100. Examples of the computer include, but are not limited to, a laptop computer, an electronic control unit (ECU) or on-board computer of a vehicle, a desktop computer, a mainframe computer, a handheld computer, the processor 214, and other computing devices.

Thus, the apparatus 202 generates a controllable solution space (a set of multiple output images) for image enhancement, rather than only one non-optimal output image. By accounting for the uncertainty in the original input image, the controllable solution space can be created more reasonably and allows searching for an optimal solution (for example, an improved and enhanced output image). The apparatus 202 therefore provides multiple output images and enables selection of an optimal output image, exhibiting higher reliability and efficiency. The apparatus 202 uses the sharpening attention map or the depth map, and therefore provides better guidance for generating sharper, brighter output images. In addition, the apparatus 202 performs multiple image enhancement tasks (such as image dehazing, image deblurring, and/or low-light enhancement) to holistically enhance images captured under different weather and lighting conditions (for example, foggy, clear, rainy, dark, or snowy environments). For example, in one exemplary practical application, the apparatus 202 makes clearly visible the various features of such enhanced images that are helpful for perception, even under different weather and lighting conditions, to facilitate safe autonomous driving.

FIG. 3A is a graphical representation of learning (or training) an image dehazing model according to an embodiment of the present invention. FIG. 3A is described in conjunction with elements of FIG. 1 and FIG. 2. Referring to FIG. 3A, a graphical representation 300A is shown that includes an original input image 302, a densely connected block 304, a bottleneck feature 306, a perturbation vector 308, a multi-layer perceptron (MLP) network 310, a residual block 312, an output image (that is, an enhanced clear output image 314), another bottleneck feature 316, and a convolutional network 318.

The original input image 302 (also denoted X_input) corresponds to an image captured in a hazy environment during autonomous driving. The original input image 302 (that is, X_input) may therefore be called a degraded image, which does not reveal the parameters or features used to guide safe autonomous driving. In the graphical representation 300A, enhancement of the original input image 302 (for example, a hazy input image) is considered. However, the graphical representation 300A applies equally to blurred images or to images captured in foggy, clear, rainy, dark, or snowy environments.

The sharpening attention map or the depth map 204 corresponds to a sharpening attention or depth estimation model that processes the original input image 302 (that is, X_input) and provides an input image (for example, a sharpened image). The input image (that is, the sharpened image) exhibits high visual quality and therefore provides better object shape and edge information for the content present in the image. Thereafter, the input image (that is, the sharpened image) is encoded using the encoder 206 (that is, Enc).

The densely connected block 304 (also denoted DenseBlk) may refer to one or more convolutional layers in a convolutional neural network (CNN) trained for image classification using the bottleneck feature received from the encoder 206 (that is, Enc). The one or more convolutional layers are connected (or convolved) with one another in a feed-forward manner via multiplication or dot products. The densely connected block 304 (that is, DenseBlk) is a fully connected block, typically used in image classification tasks.
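
A minimal PyTorch sketch of a densely connected block of this kind, where every layer receives the concatenation of all earlier feature maps; the growth rate, depth, and kernel size are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected block: each layer sees all earlier feature maps."""
    def __init__(self, in_ch: int, growth: int = 32, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # feed-forward dense connectivity
        return torch.cat(feats, dim=1)
```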

The bottleneck feature 306 (also denoted C) refers to the encoded feature map of the input image (encoded by the encoder 206); compared with the input image, the encoded feature map has a smaller spatial size but more channels. The bottleneck feature 306 (that is, C) may also be referred to as a bottleneck content feature, which represents the content features of the input image in encoded form.

The perturbation vector 308 refers to a six-dimensional perturbation vector. In the graphical representation 300A, the perturbation vector 308 is sampled according to a Gaussian distribution. The perturbation vector 308 is upsampled by using the multi-layer perceptron (MLP) network 310. The MLP network 310 is defined as a class of feed-forward artificial neural network (ANN). The MLP network 310 includes multiple neural network layers of nonlinear activation nodes (for example, an input layer, an output layer and one or more hidden layers). The MLP network 310 is a fully connected network; therefore, each node in one layer is connected with a certain weight to every other node in the next layer. The perturbation vector 308 (that is, the upsampled perturbation vector) is injected into the bottleneck feature by using an adaptive instance normalization (AdaIN) method. The AdaIN method influences the image generation stage and produces multimodal output images. The AdaIN method is commonly used in image style transfer and image generation tasks to change the appearance of the original input image 302 (that is, X_input).
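
The following is a minimal, non-limiting sketch of how such an MLP-upsampled perturbation vector could be injected into a bottleneck feature via AdaIN; the layer widths and the two-layer MLP are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

class AdaInInjection(nn.Module):
    """Upsamples a perturbation vector with an MLP and injects it into a
    bottleneck feature map via adaptive instance normalization (AdaIN)."""
    def __init__(self, z_dim=6, channels=256):
        super().__init__()
        # The MLP maps the 6-D perturbation to a per-channel scale and shift.
        self.mlp = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * channels),
        )

    def forward(self, content, z):
        gamma, beta = self.mlp(z).chunk(2, dim=1)   # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        # Instance-normalize the content feature, then rescale and shift.
        mean = content.mean(dim=(2, 3), keepdim=True)
        var = ((content - mean) ** 2).mean(dim=(2, 3), keepdim=True)
        return gamma * (content - mean) / (var + 1e-5).sqrt() + beta

z = torch.randn(1, 6)                     # Gaussian-sampled perturbation vector
bottleneck = torch.randn(1, 256, 32, 32)  # stand-in bottleneck feature C
injected = AdaInInjection()(bottleneck, z)
```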

The residual block 312 (also denoted ResBlk) includes two convolutional layers (for example, an input layer and an output layer) with a skip connection between them that allows identity mapping. The residual block 312 (that is, ResBlk) is used to prevent vanishing gradients and to provide a better feature representation of the bottleneck features.
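
A hedged sketch of such a residual block, assuming 256-channel feature maps and ReLU activations, is:

```python
import torch
import torch.nn as nn

class ResBlk(nn.Module):
    """Two convolutional layers with a skip connection (identity mapping)."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # The skip connection keeps gradients flowing and preserves identity.
        return x + self.body(x)

out = ResBlk()(torch.randn(1, 256, 32, 32))
```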

Thereafter, the bottleneck feature into which the perturbation vector 308 (that is, the upsampled perturbation vector) has been injected is fed to the image generator 208 (that is, G). The image generator 208 (that is, G) is configured to generate, in each training iteration, an enhanced clear output image 314 (also denoted X_clear) from the bottleneck feature injected with the upsampled perturbation vector 308, and to add the enhanced clear output image 314 (that is, X_clear) to the clear image dataset stored in the memory 210.

The other bottleneck feature 316 (also denoted C') is obtained by processing the enhanced clear output image 314 (that is, X_clear) with the sharpening attention map or the depth map 204 and encoding it by using the encoder 206 (that is, Enc). The other bottleneck feature 316 (that is, C') is kept consistent with the bottleneck feature 306 (that is, C) through an L1-norm loss.
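
Expressed as a short sketch (with randomly initialized stand-in tensors), the L1-norm consistency between C and C' is simply:

```python
import torch
import torch.nn.functional as F

c = torch.randn(1, 256, 32, 32)        # bottleneck feature C of the input image
c_prime = torch.randn(1, 256, 32, 32)  # bottleneck feature C' of X_clear

# L1-norm loss that keeps C' consistent with C during training.
recon_loss = F.l1_loss(c_prime, c)
```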

In addition, the discriminator 212 (that is, D) is configured to receive the enhanced clear output image 314 (that is, X_clear) from the image generator 208 (that is, G), together with a clear image randomly selected from the clear image dataset stored in the memory 210. The discriminator 212 (that is, D) is further configured to determine an image enhancement score based on a comparison between the enhanced clear output image 314 (that is, X_clear) and the randomly selected clear image.

The convolutional network 318 corresponds to a VGG network (for example, VGG 16 or VGG 19), which is used for image classification and image feature detection. Furthermore, by using the pre-trained convolutional network, the original input image 302 (that is, X_input) and the enhanced clear output image 314 (that is, X_clear) are configured to be kept consistent with respect to a perceptual loss (that is, L_percep), so that the image structure is preserved at the feature level. As shown, the convolutional network 318 is independent of the overall framework and is only used to add an additional loss to the overall framework.
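
A possible sketch of such a perceptual loss with a frozen, pre-trained VGG16 from torchvision is given below; the choice of the feature slice (the first 16 layers, up to relu3_3) and the L1 distance are illustrative assumptions, and the ImageNet weights are downloaded on first use.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen, pre-trained VGG16 used only to compute an extra perceptual loss.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(x_input, x_clear):
    # Compare feature-level representations so image structure is preserved.
    return F.l1_loss(vgg(x_input), vgg(x_clear))
```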

In operation, the original input image 302 (that is, X_input) is concatenated with the sharpening attention map or the depth map 204 and passed to the encoder 206 (that is, Enc) and the densely connected block 304 (that is, DenseBlk) to obtain the bottleneck feature 306 (that is, C). The perturbation vector 308 is upsampled by the MLP network 310 and integrated with the bottleneck feature 306 (that is, C) through the AdaIN method used in the MLP network 310. After the perturbation vector 308 has been integrated with the bottleneck feature 306 (that is, C), the bottleneck feature 306 (that is, C) is fed to the image generator 208 (that is, G) through the residual block 312 (that is, ResBlk). The image generator 208 (that is, G) is configured to generate the enhanced clear output image 314 (that is, X_clear) in each iteration. In each training (or learning) iteration of the image dehazing model, some parameters, such as the illumination or brightness contrast of the enhanced clear output image 314 (that is, X_clear), are changed based on different perturbation vectors 308 sampled according to the Gaussian distribution. The different perturbation vectors 308 lead to the creation of a Gaussian solution space of multiple output images for image enhancement, rather than only one output image, and enable an optimal output image (such as the enhanced clear output image 314) to be obtained. In each training iteration, the enhanced clear output image 314 (that is, X_clear) is fed to the discriminator 212 (that is, D) together with a clear image randomly selected from the clear image dataset, in order to determine whether the enhanced clear output image 314 is a fake image or a real clear image. Thereafter, the image generator 208 (that is, G) improves itself by minimizing the adversarial loss of the discriminator 212 (that is, D). Furthermore, the enhanced clear output image 314 (that is, X_clear) is concatenated with the sharpening attention map or the depth map 204 and encoded by using the encoder 206 (that is, Enc) and the densely connected block 304 (that is, DenseBlk) to obtain the other bottleneck feature 316 (that is, C'). The other bottleneck feature 316 (that is, C') is kept consistent with the bottleneck feature 306 (that is, C) through the L1-norm loss (that is, C_recon). The original input image 302 (that is, X_input) and the enhanced clear output image 314 (that is, X_clear) are forced to be consistent with respect to the perceptual loss (for example, L_percep) generated by the convolutional network 318 (that is, VGG), so that the features of the original input image 302 (that is, X_input) are preserved in the enhanced clear output image 314 (that is, X_clear).
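
Putting the pieces together, one training iteration might be sketched as below; the module interfaces, the WGAN-style generator loss and the loss weights lambda_recon and lambda_percep are assumptions made for this sketch, and perceptual_loss refers to the VGG sketch given above.

```python
import torch

def train_step(enc, dense_blk, mlp_adain, res_blk, generator, discriminator,
               x_input, attn_map, lambda_recon=1.0, lambda_percep=1.0):
    """Hedged sketch of one generator-side iteration mirroring Fig. 3A;
    all module names are hypothetical stand-ins, not the patented API."""
    z = torch.randn(x_input.size(0), 6)                    # Gaussian perturbation

    c = dense_blk(enc(torch.cat([x_input, attn_map], 1)))  # bottleneck feature C
    x_clear = generator(res_blk(mlp_adain(c, z)))          # enhanced output X_clear

    # Adversarial term: the generator improves by minimizing this loss.
    adv_loss = -discriminator(x_clear).mean()

    # Re-encode X_clear to obtain C' and keep it consistent with C (L1 norm).
    c_prime = dense_blk(enc(torch.cat([x_clear, attn_map], 1)))
    recon_loss = (c_prime - c).abs().mean()

    # Feature-level consistency between X_input and X_clear (VGG sketch above).
    percep_loss = perceptual_loss(x_input, x_clear)

    return adv_loss + lambda_recon * recon_loss + lambda_percep * percep_loss
```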

After the image dehazing model has been trained by performing multiple iterations, a controllable solution space of output images (or enhanced images) is generated, and an optimal output image is searched for within this controllable solution space. To fine-tune the optimal output image, two methods are used, which are described in detail with reference to, for example, Fig. 3F.

In this embodiment, the graphical representation 300A is used to learn (or train) the image dehazing model. In another embodiment, the graphical representation 300A may also be used for image deblurring or low-light enhancement, or for enhancing images captured in foggy, clear, rainy, dark or snowy environments.

Fig. 3B shows a graphical representation of an encoder according to an embodiment of the present invention. Fig. 3B is described in conjunction with elements of Figs. 1, 2 and 3A. Referring to Fig. 3B, there is shown a graphical representation 300B of the encoder 206 (of Fig. 2). The encoder 206 includes an Inception residual block 320. The Inception residual block 320 includes multiple 1×1 convolutional blocks 320A and multiple 3×3 convolutional blocks 320B.

The Inception residual block 320 may be defined as a convolutional block that combines multiple convolutional branches, which are able to capture various features of an image at different tile sizes (such as the bottleneck features used in the graphical representation 300A).

In contrast to a conventional residual block, the encoder 206 includes the Inception residual block 320 and therefore provides an improved encoded feature map that combines local image content and global image content. The improved encoded feature map is obtained because the Inception residual block 320 of the encoder 206 includes the multiple 1×1 convolutional blocks 320A and the multiple 3×3 convolutional blocks 320B.
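
As a non-authoritative sketch, an Inception residual block combining 1×1 and 3×3 branches could look like the following; an even channel count is assumed so that the two half-width branches concatenate back to the input width.

```python
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    """Residual block whose body combines a 1x1 branch and a 3x3 branch,
    capturing image content at different receptive fields."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.Conv2d(channels // 2, channels // 2, kernel_size=3, padding=1),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        merged = torch.cat([self.branch1(x), self.branch3(x)], dim=1)
        return self.act(x + self.fuse(merged))   # residual skip connection

y = InceptionResBlock()(torch.randn(1, 64, 128, 128))
```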

Fig. 3C shows a graphical representation of a densely connected block according to an embodiment of the present invention. Fig. 3C is described in conjunction with elements of Figs. 1, 2, 3A and 3B. Referring to Fig. 3C, there is shown a graphical representation 300C of the densely connected block 304 (of Fig. 3A). The densely connected block 304 includes the multiple 3×3 convolutional blocks 320B.

To obtain the bottleneck feature 306 (that is, C) with a more meaningful and more robust feature representation of the bottleneck features received from the encoder 206 (that is, Enc), the conventional residual block is replaced with the densely connected block 304 (that is, DenseBlk). In the densely connected block 304 (that is, DenseBlk), each of the multiple 3×3 convolutional blocks 320B is connected to all of the others, and an improved feature representation of the bottleneck features is thereby provided.
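
A hedged, DenseNet-style sketch of such a densely connected block is shown below; the growth rate, the layer count and the final 1×1 projection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenseBlk(nn.Module):
    """Densely connected block: each 3x3 convolution receives the
    concatenation of all preceding feature maps."""
    def __init__(self, channels=64, growth=32, layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = channels
        for _ in range(layers):
            self.convs.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, kernel_size=3, padding=1), nn.ReLU()))
            in_ch += growth
        # Project back to the original channel count for the bottleneck feature.
        self.out = nn.Conv2d(in_ch, channels, kernel_size=1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))  # dense connectivity
        return self.out(torch.cat(feats, dim=1))

c = DenseBlk()(torch.randn(1, 64, 32, 32))
```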

Fig. 3D shows a graphical representation of an encoder-decoder structure with a Gaussian perturbation vector according to an embodiment of the present invention. Fig. 3D is described in conjunction with elements of Figs. 1, 2, 3A, 3B and 3C. Referring to Fig. 3D, there is shown a graphical representation 300D of an encoder-decoder structure with a Gaussian perturbation vector 322. The encoder-decoder structure with the Gaussian perturbation vector 322 includes the encoder 206 and the image generator 208 (also referred to as a decoder). The encoder-decoder structure with the Gaussian perturbation vector 322 further includes the densely connected block 304, the bottleneck feature 306, the perturbation vector 308, the MLP network 310 and the residual block 312.

A conventional encoder-decoder structure includes the conventional residual block and provides only one non-optimal output image; by comparison, the conventional encoder-decoder structure is therefore not the preferred structure. The encoder-decoder structure with the Gaussian perturbation vector 322, however, includes the densely connected block 304 (that is, DenseBlk), the perturbation vector 308 and the MLP network 310, and provides a Gaussian solution space of multiple output images (or multimodal output images) for image enhancement, rather than only one output image, thereby enabling an optimal output image (such as the enhanced clear output image 314) to be obtained.

Fig. 3E shows a graphical representation of a discriminator according to an embodiment of the present invention. Fig. 3E is described in conjunction with elements of Figs. 1, 2, 3A, 3B, 3C and 3D. Referring to Fig. 3E, there is shown a graphical representation of the discriminator 212 (of Fig. 2). The discriminator 212 (that is, D) includes a first network branch 212A, a second network branch 212B and a third network branch 212C. The discriminator 212 (that is, D) further includes a first convolutional layer 324A, a second convolutional layer 324B, a third convolutional layer 324C and an output image 328.

Each of the first convolutional layer 324A, the second convolutional layer 324B and the third convolutional layer 324C may also be referred to as a convolutional neural network (CNN). Generally, a convolutional neural network (CNN) may be defined as a network of highly interconnected processing elements. Optionally, each element is associated with a local memory (that is, the memory 210) and is used for image recognition and processing (for example, for processing the enhanced clear output image 314).

The first network branch 212A is used to obtain a Gaussian blur generator output 326A, which is responsible for introducing, in the enhanced clear output image 314, an illumination distribution that is closer to the target data distribution.

The second network branch 212B is used to obtain an identity image 326B, which is similar to an image generated by a standard discriminator.

The third network branch 212C is used to obtain a Laplacian-of-Gaussian (LoG) blur generator output 326C, which is responsible for generating, in the enhanced clear output image 314, sharper edges that are closer to the target data distribution.

Thereafter, the output image 328 is obtained from the discriminator 212 (that is, D) after summing the outputs (that is, 326A, 326B and 326C) generated by the three network branches (that is, the first network branch 212A, the second network branch 212B and the third network branch 212C) after each convolutional layer (that is, the first convolutional layer 324A, the second convolutional layer 324B and the third convolutional layer 324C). Based on the different tiles formed in the enhanced clear output image 314 and the output image 328, it is perceived whether the output image 328 is a real image or a fake image. A real image refers to an image that is similar, in terms of all image features, to an input image such as the enhanced clear output image 314 (which may also be referred to as an enhanced image). A fake image may be described as an image that can be generated artificially using software tools. In this way, the discriminator 212 can improve the output image quality compared with a conventional discriminator, which uses only one network branch to generate the output image and therefore exhibits lower image quality.
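
The following sketch captures the three-branch idea with fixed Gaussian and Laplacian filters; for brevity it sums the branch outputs only once at the end, whereas the discriminator described above sums after each convolutional layer, and the kernel sizes and channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def _gaussian_kernel(size=5, sigma=1.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

class MultiBranchDiscriminator(nn.Module):
    """Three-branch tile discriminator: an identity branch, a Gaussian-blur
    branch (illumination) and a Laplacian-of-Gaussian branch (edges)."""
    def __init__(self, in_ch=3):
        super().__init__()
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("gauss", _gaussian_kernel().repeat(in_ch, 1, 1, 1))
        self.register_buffer("lap", lap.view(1, 1, 3, 3).repeat(in_ch, 1, 1, 1))

        def conv_stage():
            return nn.Sequential(
                nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 4, stride=2, padding=1))

        self.identity_branch = conv_stage()
        self.blur_branch = conv_stage()
        self.log_branch = conv_stage()

    def forward(self, x):
        # Depthwise fixed filters: Gaussian blur, then Laplacian of Gaussian.
        blurred = F.conv2d(x, self.gauss, padding=2, groups=x.size(1))
        log = F.conv2d(blurred, self.lap, padding=1, groups=x.size(1))
        # Summing the three branch outputs gives a per-tile real/fake map.
        return (self.identity_branch(x) + self.blur_branch(blurred)
                + self.log_branch(log))

score_map = MultiBranchDiscriminator()(torch.randn(1, 3, 256, 256))
```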

Fig. 3F shows a graphical representation of fine-tuning to obtain an optimal output image according to an embodiment of the present invention. Fig. 3F is described in conjunction with elements of Figs. 1, 2, 3A, 3B, 3C, 3D and 3E. Referring to Fig. 3F, there is shown a graphical representation 300F of fine-tuning to obtain an optimal output image from a set of multiple clear output images.

In the graphical representation 300F, two methods may be used to obtain the optimal output image from the set of multiple clear output images, which may be generated by using the graphical representation 300A.

In the first method, the perturbation vector 308 is sampled according to the Gaussian distribution. In this method, a grid search is adopted by interpolating the values of every two dimensions of the Gaussian perturbation, and the image with the best visual quality is checked. The corresponding perturbation vector is then applied to all test images.
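
A sketch of this grid search is given below; the generate and quality callables, the grid range and the step count are hypothetical placeholders, not the patented procedure.

```python
import torch

def grid_search_perturbation(generate, quality, x_input, steps=5, lo=-2.0, hi=2.0):
    """Interpolate values over every pair of dimensions of the 6-D Gaussian
    perturbation and keep the vector whose generated image scores best under
    some (assumed) visual-quality metric."""
    best_z, best_q = None, float("-inf")
    grid = torch.linspace(lo, hi, steps)
    for i in range(6):
        for j in range(i + 1, 6):          # every pair of dimensions
            for vi in grid:
                for vj in grid:
                    z = torch.zeros(1, 6)
                    z[0, i], z[0, j] = vi, vj
                    q = quality(generate(x_input, z))
                    if q > best_q:
                        best_z, best_q = z.clone(), q
    return best_z  # afterwards applied to all test images
```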

In the second method, the perturbation vector 308 is updated according to pre-trained network weights. In this method, the weights associated with the encoder 206, the image generator 208, the densely connected block 304, the bottleneck feature 306, the residual block 312 and the discriminator 212 have fixed values. Thereafter, processing starts from a random perturbation vector, and the value of the random perturbation vector is updated using gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score or the Structural Similarity Index (SSI) score. The updated perturbation vector (or the resulting perturbation vector) eventually converges to a point at which it can cooperate with the image generator 208 to generate a visually pleasing output image. The updated perturbation vector (or the resulting perturbation vector) may also be applied to one or more test images.
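
A sketch of this second method is given below; generate and loss_fn are hypothetical callables, and plain SGD stands in for the gradient descent described above.

```python
import torch

def refine_perturbation(generate, loss_fn, x_input, steps=200, lr=0.05):
    """All network weights are assumed frozen; only the perturbation vector
    is optimized, minimizing a loss such as the adversarial loss or an
    FID/SSI-based score."""
    z = torch.randn(1, 6, requires_grad=True)   # start from a random vector
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(generate(x_input, z))    # e.g. adversarial loss
        loss.backward()                          # gradients flow only into z
        opt.step()
    return z.detach()                            # converged perturbation vector
```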

Fig. 4 shows an illustration of an exemplary implementation scenario of the image enhancement method and apparatus according to an embodiment of the present invention. Fig. 4 is described in conjunction with elements of Figs. 1, 2 and 3A to 3F. Referring to Fig. 4, there is shown an exemplary scenario 400 describing a practical application of the disclosed method and apparatus (Figs. 1 and 2) in the field of autonomous driving. In the exemplary scenario 400, a vehicle 402 moving along a road section is shown. Also shown are an electronic component 404 and one or more image capture devices, such as an image capture device 406, mounted on the vehicle 402. It should be understood that the vehicle 402 may include many other known components commonly used in autonomous vehicles, which are omitted here for the sake of brevity. For example, the vehicle 402 may include a battery for powering the image capture device 406 and the electronic component 404.

In the exemplary scenario 400, the vehicle 402 may be an autonomous vehicle or a semi-autonomous vehicle. The electronic component 404 may include suitable logic, circuitry, interfaces and/or code for performing multimodal image enhancement that fully and holistically enhances images captured under different weather and lighting conditions. For example, the electronic component 404 is configured to process and enhance, in real time or near real time, the images captured by the image capture device 406 under different weather and lighting conditions (such as foggy, clear, rainy, dark or snowy environments) by performing various image enhancement tasks (such as image dehazing, image deblurring and holistic low-light enhancement) in an integrated manner, in order to facilitate safe autonomous driving. In other words, the electronic component 404 makes many features of such enhanced images clearly discernible, and these features facilitate perception of the real environment surrounding the vehicle 402, even under different weather and lighting conditions. This in turn enables the vehicle 402 to drive autonomously and safely. Examples of the electronic component 404 include, but are not limited to, an electronic control unit (ECU) of the vehicle 402, an in-vehicle device, an on-board computer or other electronic components. The electronic component 404 may correspond to the apparatus 202 (of Fig. 2), wherein the electronic component 404 is configured to perform the method 100 (of Fig. 1).

The electronic component 404 may be used to perform various image enhancement tasks while the vehicle 402 is driving. The electronic component 404 may be configured to use the perturbation vector 308 sampled according to the Gaussian distribution; the perturbation vector 308 further leads to the generation of multiple clear output images with respect to one original input image (that is, a degraded image), so that an optimal output image (for example, an enhanced and improved image) can be selected from the multiple clear output images. The electronic component 404 therefore provides the optimal output image (that is, the enhanced and improved image) for perception, and enables the vehicle 402 to drive reliably and safely. Furthermore, the electronic component 404 may be configured to use the discriminator 212 (that is, the gradient-based multi-tile discriminator), which can further improve the visual quality of the multiple clear output images.

In another implementation scenario, the apparatus 202 may be implemented as a handheld device operable to perform the method 100 (of Fig. 1). In one example, the handheld device may be a smartphone capable of using the method 100 to fully process one or more images captured under different weather and lighting conditions in order to generate enhanced images, that is, clear output images perceived by the human eye as high-quality, realistic and photo-like. The method 100 enables the handheld device to predict the "realness" or "fakeness" of different tiles in the output image, thereby improving output image quality.

Modifications may be made to the embodiments of the invention described above without departing from the scope of the invention as defined in the appended claims. Expressions such as "comprising", "incorporating", "having", "is" and the like, used to describe and claim the present invention, are intended to be interpreted in a non-exclusive manner, that is, to allow items, components or elements that are not explicitly described to also be present. Reference to the singular should also be construed as relating to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments, and/or as excluding the combination of features of other embodiments. The word "optionally" is used herein to mean "provided in some embodiments and not provided in other embodiments". It should be appreciated that some features of the invention that are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention that are, for brevity, described in the context of a single embodiment may also be provided separately, in any suitable combination, or as in any other described embodiment of the invention.

Claims (15)

1. An image enhancement method (100), the method (100) comprising:
(i) generating an input image by concatenating an original input image (302) with a sharpening attention map or depth map (204);
(ii) generating a bottleneck feature by encoding the input image using an encoder (206);
(iii) injecting a perturbation vector (308) into the bottleneck feature;
(iv) feeding the bottleneck feature injected with the perturbation vector (308) to an image generator (208);
(v) generating, at the image generator (208), an enhanced image from the bottleneck feature and the perturbation vector (308);
(vi) receiving, at a discriminator (212), the enhanced image and a clear image randomly selected from a clear image dataset, and determining an image enhancement score from a difference between the enhanced image and the randomly selected clear image.
2. The method (100) of claim 1, wherein the method (100) further comprises: feeding the discrimination score back to the encoder (206) and the image generator (208).
3. The method (100) of claim 1, wherein the perturbation vector (308) is sampled according to a Gaussian distribution.
4. The method (100) of claim 1, wherein the perturbation vector (308) is updated according to pre-trained network weights.
5. The method (100) of claim 4, wherein the perturbation vector (308) is adjusted using gradient descent by reducing an adversarial loss, a Frechet Inception Distance score, or a Structural Similarity Index score.
6. The method (100) of claim 1, wherein the discriminator (212) is a gradient-based multi-tile discriminator.
7. The method (100) according to claim 3, wherein the discriminator (212) comprises at least three network branches:
(a) a first network branch (212A) for obtaining a Gaussian blur generator output (326A);
(b) a second network branch (212B) for obtaining an identity image (326B);
(c) a third network branch (212C) for obtaining a Laplacian-of-Gaussian blur generator output (326C),
wherein the result of the discriminator (212) is obtained after summing the outputs generated by the three branches after each convolutional layer.
8. An image enhancement apparatus (202), characterized in that the apparatus (202) is configured to:
(i) generate an input image by concatenating an original input image (302) with a sharpening attention map or depth map (204);
(ii) generate a bottleneck feature by encoding the input image using an encoder (206);
(iii) inject a perturbation vector (308) into the bottleneck feature;
(iv) feed the bottleneck feature injected with the perturbation vector (308) to an image generator (208);
(v) generate, at the image generator (208), an enhanced image from the bottleneck feature and the perturbation vector (308);
(vi) receive, at a discriminator (212), the enhanced image and a clear image randomly selected from a clear image dataset, and determine an image enhancement score from a comparison between the enhanced image and the randomly selected clear image.
9. The apparatus (202) of claim 8, wherein the perturbation vector (308) is sampled according to a Gaussian distribution.
10. The apparatus (202) of claim 8, wherein the perturbation vector (308) is updated according to pre-trained network weights.
11. The apparatus (202) of claim 10, wherein the perturbation vector (308) is adjusted using gradient descent by reducing an adversarial loss, a Frechet Inception Distance score, or a Structural Similarity Index score.
12. The apparatus (202) of claim 8, wherein the discriminator (212) is a gradient-based multi-tile discriminator.
13. The apparatus (202) of claim 8, wherein the discriminator (212) comprises three network branches:
(a) a first network branch (212A) for obtaining a Gaussian blur generator output (326A);
(b) a second network branch (212B) for obtaining an identity image (326B);
(c) a third network branch (212C) for obtaining a Laplacian-of-Gaussian blur generator output (326C),
wherein the result of the discriminator (212) is obtained after summing the outputs generated by the three branches after each convolutional layer.
14. A computer program characterized by comprising program code which, when executed by a computer, causes the computer to perform the method (100) according to claim 1.
15. An electronic component (404) mounted on a vehicle (402), characterized in that the electronic component (404) is operable to perform the method (100) according to claim 1.
CN202180079713.XA 2021-02-19 2021-02-19 Image enhancement method and device Pending CN116547696A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/054088 WO2022174908A1 (en) 2021-02-19 2021-02-19 Method and apparatus for image enhancement

Publications (1)

Publication Number Publication Date
CN116547696A true CN116547696A (en) 2023-08-04

Family

ID=74673209

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180079713.XA Pending CN116547696A (en) 2021-02-19 2021-02-19 Image enhancement method and device

Country Status (3)

Country Link
EP (1) EP4232944A1 (en)
CN (1) CN116547696A (en)
WO (1) WO2022174908A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102023114868A1 (en) 2023-06-06 2024-12-12 Connaught Electronics Ltd. Preprocessing of input data for use by a computer vision algorithm and at least partially automatic driving of a motor vehicle
CN117037050B (en) * 2023-07-18 2025-06-27 中国人民解放军63729部队 Target detection method in severe weather scenes based on degradation consistency
CN117893413B (en) * 2024-03-15 2024-06-11 博创联动科技股份有限公司 Vehicle-mounted terminal man-machine interaction method based on image enhancement

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1804657A (en) * 2006-01-23 2006-07-19 武汉大学 Small target super resolution reconstruction method for remote sensing image
US20100067574A1 (en) * 2007-10-15 2010-03-18 Florian Knicker Video decoding method and video encoding method
CN111783545A (en) * 2020-06-02 2020-10-16 山西潞安环保能源开发股份有限公司五阳煤矿 An image enhancement method of coal mine UAV based on generative adversarial network
CN111968193A (en) * 2020-07-28 2020-11-20 西安工程大学 Text image generation method based on StackGAN network

Also Published As

Publication number Publication date
WO2022174908A1 (en) 2022-08-25
EP4232944A1 (en) 2023-08-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20241111

Address after: 518129 Huawei Headquarters Office Building 101, Wankecheng Community, Bantian Street, Longgang District, Shenzhen, Guangdong

Applicant after: Shenzhen Yinwang Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

Country or region before: China
