WO2022174908A1 - Method and apparatus for image enhancement - Google Patents
Method and apparatus for image enhancement
- Publication number
- WO2022174908A1 (PCT/EP2021/054088; EP2021054088W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- perturbation vector
- discriminator
- network
- enhanced
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
- G06F18/2178—Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
- G06F18/2185—Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor the supervisor being an automated module, e.g. intelligent oracle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
- G06V10/993—Evaluation of the quality of the acquired pattern
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
Definitions
- the present disclosure relates generally to the field of computer vision and machine learning; and more specifically, to a method and an apparatus for multimodal image enhancement in an unsupervised manner.
- a conventional method uses paired data of input images and their corresponding output clear images to train a conventional model, which is configured to map the input images to their corresponding output clear images.
- image dehazing is based on a conventional atmospheric scattering model (i.e., a physics-based model) instead of the paired data used in the supervised learning.
- the conventional method of image dehazing is task-specific and difficult to generalize to other image enhancement tasks, such as image deblurring, low-light enhancement, etc.
- the present disclosure provides a method and an apparatus for a multimodal image enhancement in order to promote a safe autonomous driving.
- the present disclosure provides a solution to the existing problem of how to adequately and holistically enhance images captured under different weather and illumination conditions, given that the current methods are complex, inefficient, task-specific, of limited use, and not suited for holistic image enhancement.
- An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provides a method and an apparatus for a multimodal image enhancement that adequately and holistically enhances images captured under different weather and illumination conditions for various practical applications, such as to promote safe autonomous driving.
- the present disclosure provides a method for enhancing an image, wherein the method comprises generating an input image by concatenating a raw input image with a sharpened attention map or with a depth map.
- the method further comprises generating a bottleneck feature by encoding the input image using an encoder.
- the method further comprises injecting a perturbation vector into the bottleneck feature.
- the method further comprises feeding the bottleneck feature, with the perturbation vector injected, to an image generator.
- the method further comprises generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator.
- the method further comprises at a discriminator, receiving the enhanced image and a randomly chosen clear image from the dataset of clear images, and determining a score of image enhancement based on a comparison between the enhanced image and the randomly chosen clear image.
- the disclosed method generates a controllable solution space (i.e., a set of multiple output images) for image enhancement instead of producing only one output image with no indication of its optimality.
- the disclosed method provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency.
- the disclosed method uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images.
- the disclosed method provides a unified approach for multiple image enhancement tasks (e.g., dehazing, deblurring, low-light enhancement, and the like), dealing with images captured in different weather and illumination conditions, such as in foggy, sunny, rainy, dark or snowy environment, in order to perform unsupervised, controllable, and holistic image enhancement of such images.
- the method reveals various image features of such enhanced images, which are useful for perception even in different weather and illumination conditions, in order to promote safe autonomous driving.
- the method further comprises feeding the score determined by the discriminator back to the encoder and the image generator.
- a perceptual loss between the raw input image and the enhanced image is computed based on a pretrained Convolutional Neural Network (CNN), such as a VGG neural network, to preserve structure at the feature level.
- the perturbation vector is sampled from a Gaussian distribution.
- the perturbation vector sampled from the Gaussian distribution leads to generation of multimodal outputs (or multiple output images) creating a solution space of output images for improved image enhancement.
- the perturbation vector is updated based on pretrained network weights.
- the perturbation vector is used to determine an optimal output image (i.e., an improved output) from the controllable solution space of output images.
- during training, the perturbation vector has no predetermined network weights and is a random vector sampled from the Gaussian distribution at each training step.
- the perturbation vector is updated based on the pretrained (i.e., fixed) network weights, followed by adjustment (or fine-tuning) of the perturbation vector to search for an optimal solution, i.e., an improved output image.
- the perturbation vector is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score or a Structural Similarity Index score.
- the perturbation vector is randomly initialized but then adjusted using gradient descent by reducing the adversarial loss, the Frechet Inception Distance score or the Structural Similarity Index score, where the adjusted (or fine-tuned) perturbation vector results in an improved or the most visually pleasing image.
- the discriminator is a gradient-based multi-patch discriminator.
- the discriminator comprises at least the following three network branches: a first network branch acquiring a Gaussian-blurred generator output, a second network branch acquiring an identity image, and a third network branch acquiring a Laplacian of the Gaussian-blurred generator output, wherein a result from the discriminator is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
- the use of the three network branches of the discriminator provides an improved illumination control and sharp edges information in the enhanced image.
- the present disclosure provides an apparatus for enhancing an image, wherein the apparatus is configured to generate an input image by concatenating a raw input image with a sharpened attention map or with a depth map.
- the apparatus is further configured to generate a bottleneck feature by encoding the input image using an encoder.
- the apparatus is further configured to inject a perturbation vector into the bottleneck feature.
- the apparatus is further configured to feed the bottleneck feature, with the perturbation vector injected, to an image generator.
- the apparatus is further configured to generate an enhanced image from the bottleneck feature and the perturbation vector at the image generator.
- the apparatus at a discriminator, is further configured to receive the enhanced image and a randomly chosen clear image from the dataset of clear images, and determine a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
- the apparatus of the present disclosure achieves all the advantages and effects of the method.
- the present disclosure provides a computer program comprising program code which when executed by a computer causes the computer to perform the method.
- the computer achieves all the advantages and effects of the method of the present disclosure after execution of the method.
- the present disclosure provides an electronic component to be mounted to a vehicle, the electronic component operable to perform the method.
- the electronic component achieves all the advantages and effects of the method of the present disclosure after executing the method.
- FIG. 1 is a flowchart of a method for enhancing an image, in accordance with an embodiment of the present disclosure
- FIG. 2 is a block diagram that illustrates various exemplary components of an apparatus, in accordance with an embodiment of the present disclosure
- FIG. 3A is a schematic representation of learning (or training) an image dehazing model, in accordance with an embodiment of the present disclosure
- FIG. 3B is a schematic representation of an encoder, in accordance with an embodiment of the present disclosure.
- FIG. 3C is a schematic representation of a densely connected block, in accordance with an embodiment of the present disclosure
- FIG. 3D is a schematic representation of an encoder-decoder structure with Gaussian perturbation vector, in accordance with an embodiment of the present disclosure
- FIG. 3E is a schematic representation of a discriminator, in accordance with an embodiment of the present disclosure.
- FIG. 3F is a schematic representation of fine tuning to obtain an optimal output image, in accordance with an embodiment of the present disclosure.
- FIG. 4 is an illustration of an exemplary scenario of implementation of a method and apparatus for image enhancement, in accordance with an embodiment of the present disclosure.
- an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent.
- a non-underlined number relates to an item identified by a line linking the non-underlined number to the item.
- the non-underlined number is used to identify a general item at which the arrow is pointing.
- FIG. 1 is a flowchart of a method of enhancing an image, in accordance with an embodiment of the present disclosure.
- with reference to FIG. 1, there is shown a method 100 of enhancing an image.
- the method 100 includes steps 102 to 112.
- the method 100 is executed by an apparatus, described in detail, for example, in FIG. 2.
- the present disclosure provides the method 100 for enhancing an image, wherein the method 100 comprises the steps 102 to 112 described below.
- the method 100 is based on a generative adversarial network (GAN).
- the generative adversarial network may be configured to generate multiple output images with respect to an input image by use of an image generator.
- the generated multiple output images manifest improved visual quality and provide useful image features that are required for perception in real time or near real time.
- the generative adversarial network may be further configured to include a discriminator which supports the image generator to generate realistic outputs.
- the method 100 comprises generating an input image by concatenating a raw input image with a sharpened attention map or with a depth map.
- the raw input image corresponds to an image captured in one of a hazy, foggy, sunny, rainy, dark, snowy environment or other adverse weather or illumination condition.
- the raw input image may also be referred to as a degraded image because the raw input image does not reveal useful image features suitable for a practical application or purpose.
- the raw input image may be captured by a camera mounted on an autonomous vehicle, which may not reveal features that are required to lead a safe autonomous driving in different weather and illumination conditions.
- the raw input image including one or more objects may be captured by a hand-held device, such as a smartphone, in a low-light and rainy environment, in which the captured one or more objects may be unclear. Therefore, in order to obtain useful features, such as object shape(s) and edges, etc., from the raw input image, the raw input image is processed using the sharpened attention map or the depth map, which results in the input image that is the input to the encoder in the next operation.
- the input image manifests improved features, such as improved visual quality and object shape and edge details, in comparison to the raw input image.
- the sharpened attention map may be defined as a scalar matrix that represents a relative importance of multiple layer activations at different two dimensional (2D) spatial locations with respect to a target task (e.g., an output clear image).
- the depth map may be defined as an image or image channel that provides information relating to the distance of the surface(s) of scene objects from a viewpoint.
- the sharpened attention map or the depth map may be obtained either from a pretrained model or jointly trained within a network in an unsupervised manner.
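For illustration, a minimal sketch of this input-construction step is given below, assuming PyTorch and NCHW tensors; the tensor names and sizes are illustrative and not taken from the disclosure.

```python
import torch

# Illustrative shapes: a 3-channel raw input image and a single-channel
# sharpened attention map or depth map of the same spatial size.
raw_input_image = torch.rand(1, 3, 256, 256)
guidance_map = torch.rand(1, 1, 256, 256)   # sharpened attention map or depth map

# Concatenate along the channel dimension to form the encoder input.
input_image = torch.cat([raw_input_image, guidance_map], dim=1)
print(input_image.shape)  # torch.Size([1, 4, 256, 256])
```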
- the method 100 further comprises generating a bottleneck feature by encoding the input image using an encoder.
- the input image is sent to the encoder, which results in the generation of the bottleneck feature.
- the bottleneck feature refers to an encoded feature map extracted from the input image, which has a smaller spatial size but a greater number of channels than the input image.
- the encoder is described in detail, for example, in FIG. 2.
- the method 100 further comprises injecting a perturbation vector into the bottleneck feature.
- multimodal image outputs may be generated.
- the multimodal image outputs are generated by altering an output image appearance.
- the perturbation vector is sampled from a Gaussian distribution.
- the perturbation vector corresponds to a six-dimensional perturbation vector sampled from the Gaussian distribution.
- the Gaussian distribution may be defined as a bell-shaped curve that follows a normal distribution with an equal number of measurements above and below a mean value.
- the perturbation vector is up-sampled by use of a multi-layer perceptron (MLP) network.
- the MLP network is described in detail, for example, in FIG. 3A.
- the up-sampled perturbation vector is injected into the bottleneck feature by use of an adaptive instance normalization (AdaIN) approach.
- the adaptive instance normalization (AdaIN) approach induces an impact on the image generation phase and produces multimodal output images.
- the adaptive instance normalization (AdaIN) approach is described in more detail, for example, in FIG. 3A.
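The sketch below shows one way the six-dimensional perturbation vector could be up-sampled by an MLP and injected into the bottleneck feature via AdaIN, assuming PyTorch; the layer widths, channel count, and module names are assumptions, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class AdaINInjection(nn.Module):
    """Up-sample a 6-D Gaussian perturbation vector with an MLP and inject it
    into the bottleneck feature via adaptive instance normalization (AdaIN)."""
    def __init__(self, z_dim=6, channels=256):
        super().__init__()
        # The MLP maps the perturbation vector to per-channel scale and bias.
        self.mlp = nn.Sequential(
            nn.Linear(z_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 2 * channels),
        )
        self.norm = nn.InstanceNorm2d(channels, affine=False)

    def forward(self, bottleneck, z):
        gamma, beta = self.mlp(z).chunk(2, dim=1)   # each of shape (N, C)
        gamma = gamma[:, :, None, None]             # reshape to (N, C, 1, 1)
        beta = beta[:, :, None, None]
        return (1 + gamma) * self.norm(bottleneck) + beta

# Usage: a fresh perturbation vector is sampled from a Gaussian at every step.
z = torch.randn(1, 6)                       # 6-dimensional perturbation vector
bottleneck = torch.rand(1, 256, 32, 32)     # bottleneck feature C from the encoder
injected = AdaINInjection()(bottleneck, z)
```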
- the perturbation vector is updated based on pretrained network weights.
- during training, the perturbation vector has no predetermined network weights and is a random vector sampled from the Gaussian distribution at each training step.
- the perturbation vector is updated based on the pretrained (i.e., fixed) network weights, followed by adjustment (or fine-tuning) of the perturbation vector to search for an optimal solution, i.e., an improved output image.
- the pretrained network weights are the weights of the pretrained encoder, the image generator, and the discriminator. Once the training is completed, such network weights are fixed, and then the fine-tuning of the perturbation vector is carried out.
- the perturbation vector is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score and a Structural Similarity Index score.
- the perturbation vector is randomly initialized but then adjusted using gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score.
- the adversarial loss may be defined as a difference between a ground truth data (or original data or a source data) and a generated data computed by use of a generative adversarial network (GAN).
- the FID score may be defined as a metric that calculates a distance between feature vectors calculated for real and generated images with the help of a pretrained Inception network.
- the SSI score may also be referred to as a structural similarity index measure (SSIM).
- the SSIM may be defined as a method for predicting a perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos.
- the SSIM may be used for measuring similarity between two images.
- the updated perturbation vector (or a resulting perturbation vector) finally converges to a point where the updated perturbation vector may cooperate with a decoder (or an image generator) in order to generate a visually pleasing output image.
- the updated perturbation vector (or the resulting perturbation vector) can be applied to one or more images (e.g., test images).
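A minimal, self-contained sketch of this fine-tuning step is given below, assuming PyTorch. The tiny convolutions stand in for the pretrained encoder, image generator and discriminator (they are not the disclosed networks); only the perturbation vector is optimized while all network weights stay fixed, and only an adversarial term is shown.

```python
import torch
import torch.nn as nn

# Stand-in modules for the pretrained encoder, generator and discriminator.
enc = nn.Conv2d(4, 8, 3, padding=1)
gen = nn.Conv2d(8, 3, 3, padding=1)
disc = nn.Conv2d(3, 1, 3, padding=1)
for m in (enc, gen, disc):
    m.requires_grad_(False)                        # pretrained weights stay fixed

x = torch.rand(1, 4, 64, 64)                       # raw image concatenated with a depth map
z = torch.randn(1, 8, 1, 1, requires_grad=True)    # randomly initialized perturbation
optimizer = torch.optim.Adam([z], lr=1e-2)         # only z is updated

for step in range(100):
    optimizer.zero_grad()
    enhanced = gen(enc(x) + z)                     # simplified injection into the bottleneck
    loss = -disc(enhanced).mean()                  # adversarial term; FID or (1 - SSIM)
    loss.backward()                                # terms can be added to the same objective
    optimizer.step()
```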
- the method 100 further comprises feeding the bottleneck feature, with the perturbation vector injected, to an image generator.
- the bottleneck feature with the perturbation vector injected is further fed to the image generator.
- the image generator may be configured to generate multiple output images, which provide improved visual quality and useful image features that are required for perception in real time or near real time.
- the method 100 further comprises generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator.
- the image generator may be configured to produce the enhanced image (i.e., the output image) at each iteration by use of the bottleneck feature with the perturbation vector injected.
- the use of a different perturbation vector in each iteration leads to a change in various aspects, such as the illumination or brightness control of the enhanced image.
- a Gaussian solution space for altering the appearance of the enhanced image is created, and it is made possible to control the image enhancement outputs in a testing phase.
- the enhanced image may be sent back to the encoder to force the encoder to produce the same encoded feature (e.g., another similar bottleneck feature) as for the raw input image. Therefore, there is an L1-norm feature reconstruction loss between these two encoded features (e.g., the two bottleneck features), as sketched below.
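A minimal sketch of this feature-reconstruction loss, assuming PyTorch; the tensor names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

C = torch.rand(1, 256, 32, 32)        # bottleneck feature of the raw input image
C_prime = torch.rand(1, 256, 32, 32)  # bottleneck feature of the re-encoded enhanced image

# L1-norm feature reconstruction loss forcing the two encodings to stay consistent.
recon_loss = F.l1_loss(C_prime, C)
```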
- the method 100 further comprises, at a discriminator, receiving the enhanced image and a randomly chosen clear image from the dataset of clear images, and determining a score of image enhancement based on a comparison of the enhanced image with the randomly chosen clear image.
- the comparison is in terms of realism of the enhanced image with respect to the randomly chosen clear image.
- the discriminator is configured to determine whether the enhanced image is a fake image or a real clear image.
- the discriminator is a gradient-based multi-patch discriminator.
- the gradient-based multi-patch discriminator includes multiple network branches and hence, provides an improved output image quality.
- the method 100 further comprises feeding the score determined by the discriminator back to the encoder and the image generator.
- the image quality gradually improves.
- a perceptual loss between the raw input image and the enhanced image is computed based on a pretrained Convolutional Neural Network (CNN), such as a VGG neural network, to preserve structure at the feature level, as sketched below.
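A sketch of such a perceptual loss, assuming PyTorch and torchvision's pretrained VGG16; the chosen feature depth (up to the third block) and the omission of ImageNet input normalization are simplifications of this sketch, not details from the disclosure.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# The VGG network is fixed; it only contributes a loss term to the framework.
vgg_features = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(raw_input, enhanced):
    # Compare deep feature maps instead of raw pixels to preserve structure
    # at the feature level.
    return F.l1_loss(vgg_features(enhanced), vgg_features(raw_input))

loss = perceptual_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```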
- the discriminator comprises at least the following three network branches: a first network branch acquiring a Gaussian-blurred generator output, a second network branch acquiring an identity image, and a third network branch acquiring a Laplacian of the Gaussian-blurred generator output, wherein a result from the discriminator is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
- the Gaussian-blurred generator output is responsible for bringing the illumination distribution in the output image closer to the target data distribution.
- the identity image acquired by the second network branch is similar to an image generated by a standard discriminator (e.g., Naive discriminator).
- the Laplacian of Gaussian (LoG) blurred generator output is responsible for generating sharper edges in the output image that is closer to the target data (i.e., the enhanced image) distribution.
- the LoG may be defined as a two-dimensional isotropic measure of the second spatial derivative of an image. Alternatively stated, the LoG highlights regions of rapid intensity change in an image and is often adopted for edge detection.
- the summation of the generated outputs from the three network branches after each convolutional layer is obtained at the discriminator in order to make a prediction of real or fake for different patches in the output image. In this way, the discriminator provides improved output image quality, as sketched below.
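A simplified sketch of such a gradient-based multi-patch discriminator, assuming PyTorch; the kernel sizes, channel counts, and the exact way the per-layer sums are scored are assumptions for illustration, not the disclosed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_blur(x, k=5, sigma=1.5):
    # Depthwise Gaussian blur used to build the blurred-branch inputs.
    coords = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    kernel = (g[:, None] * g[None, :]).expand(x.size(1), 1, k, k)
    return F.conv2d(x, kernel.to(x), padding=k // 2, groups=x.size(1))

def laplacian(x):
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
    kernel = lap.expand(x.size(1), 1, 3, 3)
    return F.conv2d(x, kernel.to(x), padding=1, groups=x.size(1))

class MultiPatchDiscriminator(nn.Module):
    """One branch sees a Gaussian-blurred copy of the image, one the identity
    image, and one the Laplacian of the Gaussian-blurred copy; the branch
    outputs are summed after each convolutional layer and scored patch-wise."""
    def __init__(self, channels=32, layers=3):
        super().__init__()
        def branch():
            convs, c_in = nn.ModuleList(), 3
            for _ in range(layers):
                convs.append(nn.Conv2d(c_in, channels, 4, stride=2, padding=1))
                c_in = channels
            return convs
        self.b1, self.b2, self.b3 = branch(), branch(), branch()
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, 1, 3, padding=1) for _ in range(layers)])

    def forward(self, img):
        f1, f2, f3 = gaussian_blur(img), img, laplacian(gaussian_blur(img))
        patch_maps = []
        for c1, c2, c3, head in zip(self.b1, self.b2, self.b3, self.heads):
            f1 = F.leaky_relu(c1(f1), 0.2)
            f2 = F.leaky_relu(c2(f2), 0.2)
            f3 = F.leaky_relu(c3(f3), 0.2)
            # Summation of the three branch outputs after each convolutional
            # layer, turned into a patch-wise real/fake map at that scale.
            patch_maps.append(head(f1 + f2 + f3))
        return patch_maps

maps = MultiPatchDiscriminator()(torch.rand(1, 3, 128, 128))
```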
- the method 100 generates a controllable solution space (i.e., a set of multiple output images) for image enhancement instead of producing only one output image with no indication of its optimality.
- the method 100 provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency.
- the method 100 uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images.
- the method 100 performs multiple image enhancement tasks, such as image dehazing, image deblurring, and/or low-light enhancement, for holistic and overall enhancement of an image captured in different weather and illumination conditions, such as in foggy, sunny, rainy, dark or snowy environments.
- the method 100 reveals various features of such enhanced images, which are useful for perception even in different weather and illumination conditions, in order to promote safe autonomous driving.
- the method 100 reduces processing complexity and improves quality of the output images.
- the steps 102 to 112 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
- FIG. 2 is a block diagram that illustrates various exemplary components of an apparatus, in accordance with an embodiment of the present disclosure.
- the apparatus 202 comprises a sharpened attention map or depth map 204, an encoder 206, an image generator 208, a memory 210, a discriminator 212 and a processor 214.
- the apparatus 202 includes suitable logic, circuitry, interfaces and/or code that is configured for enhancing an image.
- the apparatus 202 is configured to execute the method 100 (of FIG. 1).
- Examples of the apparatus 202 include, but are not limited to, a hand-held device or an electronic device or component which can be mounted on a vehicle (e.g., an autonomous vehicle or a semi-autonomous vehicle), a mobile device, a portable device, and the like, which is operable to perform the method 100.
- the sharpened attention map or depth map 204 is configured to provide better guidance for producing output image(s) with higher visual quality in comparison to a conventional scattering model.
- the sharpened attention map or depth map 204 may be a software program or a mathematical expression or an application that may be installed in the apparatus 202.
- the encoder 206 (also represented as Enc) includes suitable logic, circuitry, interfaces and/or code and may be defined as a network (e.g., a convolutional neural network (CNN), a recurrent neural network (RNN), etc.) that takes input data (e.g., an image) and provides output data (e.g., an output image) in terms of a feature map, a vector or a tensor which represents latent information of the input data.
- Examples of the encoder 206 include, but are not limited to, a recursive neural network, a feed-forward neural network, a deep-belief network, a convolutional deep-belief network, a self-organizing map, a deep Boltzmann machine, and a stacked de-noising auto-encoder.
- the image generator 208 (also represented as G) includes suitable logic, circuitry, interfaces and/or code that is configured to generate one or more enhanced clear output images that are close to a real data distribution.
- the image generator 208 may also be defined as a network which is configured to either reconstruct the input data (i.e., the input image) from the feature map or change the feature map to a different but related representation.
- the image generator 208 may also be referred to as a decoder.
- the memory 210 includes suitable logic, circuitry, or interfaces that is configured to store the instructions executable by the processor 214.
- the memory 210 may also be configured to store a dataset of clear images. Examples of implementation of the memory 210 may include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory.
- the memory 210 may store an operating system or other program products (including one or more operation algorithms) to operate the apparatus 202.
- the discriminator 212 (also represented as D) includes suitable logic, circuitry, interfaces and/or code that is configured to determine a score of image enhancement based on a comparison of the one or more enhanced clear output images received from the image generator 208 with a randomly chosen clear image from the data set of clear images stored in the memory 210. In other words, a realism of the one or more enhanced clear output images is checked or compared with respect to the randomly chosen clear image.
- the processor 214 includes suitable logic, circuitry, interfaces and/or code that is configured to execute the instructions stored in the memory 210.
- the processor 214 may be a general-purpose processor.
- Other examples of the processor 214 may include, but are not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a central processing unit (CPU), a state machine, a data processing unit, and other processors or control circuitry.
- the processor 214 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the apparatus 202.
- there is provided the apparatus 202 for enhancing an image, wherein the apparatus 202 is configured to generate an input image by concatenating a raw input image with the sharpened attention map or with the depth map 204.
- the apparatus 202 is further configured to generate a bottleneck feature by encoding the input image using the encoder 206.
- the apparatus 202 is further configured to inject a perturbation vector into the bottleneck feature.
- the apparatus 202 is further configured to feed the bottleneck feature, with the perturbation vector injected, to the image generator 208.
- the apparatus 202 is further configured to generate an enhanced image from the bottleneck feature and the perturbation vector at the image generator 208.
- the apparatus 202 is further configured to, at the discriminator 212, receive the enhanced image and a randomly chosen clear image from the dataset of clear images, and determine a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
- the perturbation vector is sampled from a Gaussian distribution.
- the perturbation vector is a six-dimensional perturbation vector sampled from the Gaussian distribution.
- the perturbation vector is updated based on pretrained network weights.
- the perturbation vector works with the pretrained network weights of the encoder 206 or the image generator 208.
- the apparatus 202 is configured to start processing with a random perturbation vector.
- the perturbation vector is adjusted using gradient descent by reducing the adversarial loss, the Frechet Inception Distance score or the Structural Similarity Index score.
- values of the random perturbation vector are updated using the gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score.
- the discriminator 212 is a gradient-based multi-patch discriminator.
- the discriminator 212 (or gradient-based multi-patch discriminator) includes multiple network branches and hence, provides an improved output image quality.
- the discriminator 212 comprises three network branches:
- a third network branch acquiring the Laplacian of Gaussian of the generated output, wherein a result from the discriminator 212 is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
- the discriminator 212 is configured to make a prediction of real or fake for different patches in the output image.
- a computer program comprising program code which when executed by a computer causes the computer to perform the method 100.
- Examples of the computer include, but are not limited to, a laptop computer, an electronic control unit (ECU) in a vehicle, an onboard computer of a vehicle, a desktop computer, a mainframe computer, a hand-held computer, the processor 214, and other computing devices.
- the apparatus 202 generates a controllable solution space (i.e., a set of multiple output images) for image enhancement instead of producing only one output image with no indication of its optimality.
- the apparatus 202 provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency.
- the apparatus 202 uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images.
- the apparatus 202 performs multiple image enhancement tasks, such as image dehazing, image deblurring, and/or low-light enhancement, for holistic and overall enhancement of an image captured in different weather and illumination conditions, such as in foggy, sunny, rainy, dark or snowy environments.
- the apparatus 202 reveals various features of such enhanced images, which are useful for perception even in different weather and illumination conditions, in order to promote safe autonomous driving.
- FIG. 3A is a schematic representation of learning (or training) an image dehazing model, in accordance with an embodiment of the present disclosure.
- FIG. 3A is described in conjunction with elements from FIGs. 1 and 2.
- a schematic representation 300A that includes a raw input image 302, a densely connected block 304, a bottleneck feature 306, a perturbation vector 308, a multi-layer perceptron (MLP) network 310, a residual block 312, an output image (i.e., an enhanced clear output image 314), another bottleneck feature 316, and a convolutional network 318.
- the raw input image 302 (also represented as Xinput) corresponds to an image captured in a hazy environment during autonomous driving. Therefore, the raw input image 302 (i.e., Xinput) may be referred to as a degraded image that does not reveal a number of parameters or features which are useful for safe autonomous driving.
- the schematic representation 300A is equally applicable to an image that is either blurry or captured in foggy, sunny, rainy, dark, or snowy environment.
- the sharpened attention map or the depth map 204 corresponds to a sharpened attention or depth estimation model which is configured to process the raw input image 302 (i.e., Xinput) and provide an input image (e.g., a sharpened image).
- the input image (i.e., the sharpened image) is then communicated to the encoder 206 (i.e., Enc).
- the densely connected block 304 may be referred to as one or more convolutional layers in a convolutional neural network (CNN), which utilizes the bottleneck feature received from the encoder 206 (i.e., Enc) for training for image classification.
- the one or more convolutional layers are connected (or convolved) with each other in a feed-forward fashion either through multiplication or by dot product.
- the bottleneck feature 306 (also represented as C) refers to an encoded feature map of the input image (encoded by the encoder 206), which has a smaller spatial size but a greater number of channels than the input image.
- the bottleneck feature 306 (i.e., C) may also be referred to as a bottleneck content feature that represents the content feature of the input image in encoded form.
- the perturbation vector 308 refers to a six-dimensional perturbation vector.
- the perturbation vector 308 is sampled from Gaussian distribution.
- the perturbation vector 308 is up-sampled by use of a multi-layer perceptron (MLP) network 310.
- the MLP network 310 is defined as a class of feed-forward artificial neural network (ANN).
- the MLP network 310 includes multiple neural network layers (e.g., an input layer, an output layer and one or more hidden layers) of non-linearly activating nodes.
- the MLP network 310 is a fully connected network; therefore, each node in one layer is connected with a certain weight to every node in the adjacent layer.
- the perturbation vector 308 (i.e., the up-sampled perturbation vector) is injected into the bottleneck feature by use of an adaptive instance normalization (AdaIN) approach.
- the AdaIN approach induces an impact on the image generation phase and produces multimodal output images.
- the AdaIN approach is generally used for image style transfer and image generation tasks to alter the appearance of the raw input image 302 (i.e., Xinput).
- the residual block 312 (also represented as ResBlk) includes two convolutional layers (e.g., an input layer and an output layer) with a skip connection between the two convolutional layers, which allows an identity mapping.
- the residual block 312 (i.e., ResBlk) is configured to prevent gradient vanishing and provide better feature representations of the bottleneck feature, as sketched below.
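A minimal sketch of such a residual block, assuming PyTorch; the channel count is illustrative.

```python
import torch
import torch.nn as nn

class ResBlk(nn.Module):
    """Two convolutional layers with a skip connection (identity mapping)."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        # The skip connection lets gradients bypass the two convolutions,
        # which helps prevent vanishing gradients.
        return x + self.body(x)

out = ResBlk()(torch.rand(1, 256, 32, 32))
```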
- the bottleneck feature, with the up-sampled perturbation vector 308 injected, is fed to the image generator 208 (i.e., G) via the residual block 312.
- the image generator 208 is configured to generate an enhanced clear output image 314 (also represented as Xclear) from the bottleneck feature with the up-sampled perturbation vector 308 injected, in each training iteration, and to add the enhanced clear output image 314 (i.e., Xclear) to a dataset of clear images stored in the memory 210.
- the other bottleneck feature 316 (also represented as C') is obtained by processing the enhanced clear output image 314 (i.e., Xclear) through the sharpened attention map or the depth map 204 and encoding it by use of the encoder 206 (i.e., Enc).
- the other bottleneck feature 316 (i.e., C') is configured to be consistent with the bottleneck feature 306 (i.e., C) by an L1 norm loss.
- the discriminator 212 (i.e., D) is configured to receive the enhanced clear output image 314 (i.e., Xclear) from the image generator 208 (i.e., G) and a randomly chosen clear image from the dataset of clear images stored in the memory 210.
- the discriminator 212 is further configured to determine a score of image enhancement based on a comparison between the enhanced clear output image 314 (i.e., Xclear) and the randomly chosen clear image.
- the convolutional network 318 corresponds to a VGG network (e.g., a VGG16 or a VGG19) which is used for image classification and detection of image features.
- a perceptual loss (i.e., Lpercep) is computed between the raw input image 302 (i.e., Xinput) and the enhanced clear output image 314 (i.e., Xclear) by use of the convolutional network 318.
- the convolutional network 318 is independent from the whole framework, and is only used to add an extra loss to the whole framework.
- the raw input image 302 (i.e., Xinput) is concatenated with the sharpened attention map or the depth map 204 and communicated to the encoder 206 (i.e., Enc) and the densely connected block 304 (i.e., DenseBlk) to obtain the bottleneck feature 306 (i.e., C).
- the perturbation vector 308 is up-sampled with the MLP network 310 and integrated with the bottleneck feature 306 (i.e., C) via the AdaIN approach, which is used at the MLP network 310.
- the bottleneck feature 306 (i.e., C) is fed to the image generator 208 (i.e., G) via the residual block 312 (i.e., ResBlk).
- the image generator 208 is configured to generate the enhanced clear output image 314 (i.e., Xclear) at each iteration.
- a few parameters, such as the illumination or brightness contrast of the enhanced clear output image 314 (i.e., Xclear), will change according to the different perturbation vector 308 sampled from the Gaussian distribution.
- the different perturbation vector 308 leads to the creation of a Gaussian solution space of multiple output images for image enhancement instead of only one output image and makes it possible to work with an optimal output image (such as the enhanced clear output image 314).
- the enhanced clear output image 314 (i.e., Xclear) is then passed to the discriminator 212 (i.e., D), and the resulting score is fed back to the image generator 208 (i.e., G).
- the image generator 208 is configured to improve itself by minimizing adversarial losses of the discriminator 212 (i.e., D).
- the enhanced clear output image 314 (i.e., Xclear) is further concatenated with the sharpened attention map or the depth map 204 and encoded by use of the encoder 206 (i.e., Enc) and the densely connected block 304 (i.e., DenseBlk) to obtain the other bottleneck feature 316 (i.e., C').
- the other bottleneck feature 316 (i.e., C') is configured to be consistent with the bottleneck feature 306 (i.e., C) by an L1 norm loss (e.g., Crecon).
- additionally, a perceptual loss (e.g., Lpercep) is computed between the raw input image 302 (i.e., Xinput) and the enhanced clear output image 314 (i.e., Xclear) by use of the convolutional network 318 (i.e., the VGG network); one training iteration combining these losses is sketched below.
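A compact sketch of one such training iteration, assuming PyTorch; `enc`, `gen`, `disc` and `perceptual_loss` are placeholders for the modules and losses sketched earlier, the hinge GAN loss is an illustrative choice, and the 4-channel input layout (RGB plus guidance map) is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def training_step(x_input, x_raw, x_clear_real, enc, gen, disc, opt_g, opt_d, perceptual_loss):
    z = torch.randn(x_input.size(0), 6)            # fresh Gaussian perturbation each step
    c = enc(x_input)                               # bottleneck feature C
    x_clear = gen(c, z)                            # enhanced clear output image

    # Discriminator step: randomly chosen real clear image vs. generated image.
    d_loss = (F.relu(1 - disc(x_clear_real)).mean()
              + F.relu(1 + disc(x_clear.detach())).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator / encoder step: adversarial + L1 feature reconstruction (Crecon)
    # + perceptual (Lpercep) terms.
    c_prime = enc(torch.cat([x_clear, x_input[:, 3:]], dim=1))   # re-encode enhanced image
    g_loss = (-disc(x_clear).mean()
              + F.l1_loss(c_prime, c)
              + perceptual_loss(x_raw, x_clear))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return x_clear.detach()
```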
- a controllable solution space of output images (or enhanced images) is generated, and an optimal output image is searched for within the controllable solution space of output images (or enhanced images).
- two approaches are used, which are described in detail, for example, in FIG. 3F.
- the schematic representation 300A is used for learning (or training) of the image dehazing model.
- the schematic representation 300A may also be used for image deblurring or low light enhancement or for enhancement of an image captured in foggy, sunny, rainy, dark or snowy environments.
- FIG. 3B is a schematic representation of an encoder, in accordance with an embodiment of the present disclosure.
- FIG. 3B is described in conjunction with elements from FIGs. 1, 2, and 3A.
- a schematic representation 300B of the encoder 206 (of FIG. 2).
- the encoder 206 includes an inception residual block 320.
- the inception residual block 320 includes a plurality of 1x1 convolutional blocks 320A and a plurality of 3x3 convolutional blocks 320B.
- the inception residual block 320 may be defined as a convolutional block that combines multiple convolutional branches, being able to capture various features at different patch sizes of an image (such as the bottleneck feature used in the schematic representation 300A).
- the encoder 206 includes the inception residual block 320 in contrast to a conventional residual block and hence, provides an improved encoded feature map which incorporates both local and global image contents.
- the improved encoded feature map is obtained due to the plurality of 1x1 convolutional blocks 320A and the plurality of 3x3 convolutional blocks 320B comprised by the inception residual block 320 of the encoder 206, as sketched below.
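A sketch of an inception-style residual block with parallel 1x1 and 3x3 branches, assuming PyTorch; the channel split and fusion convolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class InceptionResBlk(nn.Module):
    """Parallel 1x1 and 3x3 convolutional branches capture features at different
    patch sizes; their outputs are fused and a skip connection preserves the input."""
    def __init__(self, ch=64):
        super().__init__()
        self.branch1 = nn.Conv2d(ch, ch // 2, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(ch, ch // 2, kernel_size=1),
            nn.Conv2d(ch // 2, ch // 2, kernel_size=3, padding=1),
        )
        self.fuse = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, x):
        mixed = torch.cat([self.branch1(x), self.branch3(x)], dim=1)
        return x + self.fuse(mixed)     # residual connection around both branches

out = InceptionResBlk()(torch.rand(1, 64, 64, 64))
```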
- FIG. 3C is a schematic representation of a densely connected block, in accordance with an embodiment of the present disclosure.
- FIG. 3C is described in conjunction with elements from FIGs. 1, 2, 3A, and 3B.
- with reference to FIG. 3C, there is shown a schematic representation 300C of the densely connected block 304 (of FIG. 3A).
- the densely connected block 304 includes the plurality of 3x3 convolutional blocks 320B.
- the conventional residual block is replaced with the densely connected block 304 (i.e., DenseBlk).
- each of the plurality of 3x3 convolutional blocks 320B is connected with the others and hence provides improved feature representations of the bottleneck feature, as sketched below.
- FIG. 3D is a schematic representation of an encoder-decoder structure with Gaussian perturbation vector, in accordance with an embodiment of the present disclosure.
- FIG. 3D is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, and 3C.
- the encoder-decoder structure with Gaussian perturbation vector 322 includes the encoder 206 and the image generator 208 (also known as decoder).
- the encoder-decoder structure with Gaussian perturbation vector 322 further includes the densely connected block 304, the bottleneck feature 306, the perturbation vector 308, the MLP network 310, and the residual block 312.
- the encoder-decoder structure with Gaussian perturbation vector 322 includes the densely connected block 304 (i.e., DenseBlk), the perturbation vector 308, and the MLP network 310, and provides a Gaussian solution space of multiple output images (or multimodal output images) for image enhancement instead of only one output image, which makes it possible to work with an optimal output image (such as the enhanced clear output image 314).
- FIG. 3E is a schematic representation of a discriminator, in accordance with an embodiment of the present disclosure.
- FIG. 3E is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, 3C, and 3D.
- the discriminator 212 includes a first network branch 212A, a second network branch 212B and a third network branch 212C.
- each of the first convolutional layer 324A, the second convolutional layer 324B, and the third convolutional layer 324C may also be referred to as a convolutional neural network (CNN).
- the convolutional neural network (CNN) may be defined as a highly interconnected network of processing elements. Each element is optionally associated with a local memory (i.e., the memory 210) and used for image recognition and processing (such as for processing of the enhanced clear output image 314).
- the first network branch 212A is configured to acquire a Gaussian blurred generator output 326A which is responsible for bringing an illumination distribution in the enhanced clear output image 314 that is closer to a target data distribution.
- the second network branch 212B is configured to acquire an identity image 326B that is similar to an image generated by a standard discriminator (e.g., Naive discriminator).
- the third network branch 212C is configured to acquire a Laplacian of Gaussian (LoG) blurred generator output 326C that is responsible for generating sharper edges in the enhanced clear output image 314 that is closer to the target data distribution.
- the output image 328 is obtained from the discriminator 212 (i.e., D) after a summation of the generated outputs (i.e., 326A, 326B, and 326C) from the three network branches (i.e., the first network branch 212A, the second network branch 212B and the third network branch 212C) after each convolutional layer (i.e., the first convolutional layer 324A, the second convolutional layer 324B, and the third convolutional layer 324C).
- the output image 328 is perceived either as a real image or a fake image based on different patches formed in the enhanced clear output image 314 and the output image 328.
- the real image means an image that resembles, in all image features, an input image such as the enhanced clear output image 314 (which may also be referred to as the enhanced image).
- a fake image may be described as an image which may be generated artificially by use of a software tool.
- the discriminator 212 provides improved output image quality in comparison to a conventional discriminator that uses only one network branch to generate an output image and hence manifests reduced image quality.
- FIG. 3F is a schematic representation of fine tuning to obtain an optimal output image, in accordance with an embodiment of the present disclosure.
- FIG. 3F is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, 3C, 3D, and 3E. With reference to FIG. 3F, there is shown a schematic representation 300F of fine tuning to obtain an optimal output image from a set of multiple clear output images.
- the perturbation vector 308 is sampled from the Gaussian distribution.
- a grid search is adopted by interpolating the values of every two dimensions in the Gaussian perturbation and checking for the image with the best visual quality, as sketched below.
- the corresponding perturbation vector is applied to all test images.
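A minimal sketch of this grid-search variant, assuming PyTorch; `enhance` and `quality` below are trivial stand-ins for the frozen enhancement pipeline and a visual-quality score (e.g., a discriminator output or a no-reference metric), not the disclosed networks.

```python
import itertools
import torch
import torch.nn as nn

gen = nn.Conv2d(3, 3, 3, padding=1).eval()          # stand-in generator
x = torch.rand(1, 3, 64, 64)

def enhance(img, z):
    # Stand-in: inject the first three perturbation values as per-channel biases.
    return gen(img + z[:, :3, None, None])

def quality(img):
    return -((img - 0.5) ** 2).mean().item()        # stand-in quality score

grid = torch.linspace(-2.0, 2.0, steps=9)           # candidate values per dimension
best_score, best_z = float("-inf"), None
for i, j in itertools.combinations(range(6), 2):    # every pair of the 6 dimensions
    for vi, vj in itertools.product(grid, grid):
        z = torch.zeros(1, 6)
        z[0, i], z[0, j] = vi, vj
        with torch.no_grad():
            score = quality(enhance(x, z))
        if score > best_score:
            best_score, best_z = score, z.clone()
# best_z, the visually best perturbation, is then applied to all test images.
```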
- the perturbation vector 308 is updated based on pretrained network weights.
- the weights associated with the encoder 206, the image generator 208, the densely connected block 304, the bottleneck feature 306, the residual block 312 and the discriminator 212 have fixed values.
- processing starts with a random perturbation vector and values of the random perturbation vector are updated using the gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score.
- the updated perturbation vector (or a resulting perturbation vector) finally converges to a point where the updated perturbation vector may cooperate with the image generator 208 in order to generate a visually pleasing output image.
- the updated perturbation vector (or the resulting perturbation vector) can further be applied to one or more test images.
- FIG. 4 is an illustration of an exemplary scenario of implementation of a method and apparatus for image enhancement, in accordance with an embodiment of the present disclosure.
- FIG. 4 is described in conjunction with elements from FIGs. 1, 2, and 3A-3F.
- with reference to FIG. 4, there is shown an exemplary scenario 400 that describes a practical application of the disclosed method and apparatus (FIGs. 1 and 2) in autonomous driving.
- there is shown a vehicle 402 moving along a road portion.
- the vehicle 402 may include many other known components typically used in an autonomous vehicle, which are omitted here for the sake of brevity.
- the vehicle 402 may include a battery to power the image-capturing device 406 and the electronic component 404.
- the vehicle 402 may be an autonomous or semi-autonomous vehicle.
- the electronic component 404 may include suitable logic, circuitry, interfaces, and/or code configured to perform multimodal image enhancement that adequately and holistically enhances images captured under different weather and illumination conditions.
- the electronic component 404 is configured to process and enhance the images in real-time or near real-time captured by the image-capturing device 406 in different weather and illumination conditions, such as in foggy, sunny, rainy, dark or snowy environment, by performing various image enhancement tasks, such as image dehazing, image deblurring, and low light enhancement in a holistic manner to promote safe autonomous driving.
- the electronic component 404 accurately reveals a number of features in such enhanced images, which are useful to make an accurate perception of the real-world environment around the vehicle 402, even in different weather and illumination conditions. This in turn leads to safe autonomous driving of the vehicle 402.
- Examples of the electronic component 404 include, but are not limited to, an electronic control unit (ECU), an in-vehicle device, onboard computer, or other electronic component in the vehicle 402.
- the electronic component 404 may correspond to the apparatus 202 (of FIG. 2), where the electronic component 404 is configured to execute the method 100 (of FIG. 1).
- the electronic component 404 may be configured to perform various image enhancement tasks, simultaneously, during driving by the vehicle 402.
- the electronic component 404 may be configured to use the perturbation vector 308 sampled from the Gaussian distribution, which further leads to the generation of multiple clear output images with respect to one raw input image (i.e., a degraded image), and hence an optimal output image (e.g., an enhanced and improved image) can be selected from the multiple clear output images. Therefore, the electronic component 404 provides the optimal output image (i.e., the enhanced and improved image) for perception, which results in reliable and safe driving by the vehicle 402. Additionally, the electronic component 404 may be configured to use the discriminator 212 (i.e., the gradient-based multi-patch discriminator), which provides the multiple clear output images with further improved visual quality.
- the apparatus 202 may be implemented as a handheld device which is operable to execute the method 100 (of FIG. 1).
- the handheld device may be a smartphone, which can adequately process one or more images captured under different weather and illumination conditions using the method 100 to generate enhanced images, such as clear output images which are perceived as high-quality, real, and photo-like images by a human eye.
- the method 100 gives the handheld device the ability to make a prediction of "real" or "fake" for different patches in the output image, thereby providing improved output image quality.
Abstract
A method for enhancing an image includes generating an input image by concatenating a raw input image with a depth map. The method includes generating a bottleneck feature by encoding the input image using an encoder and injecting a perturbation vector into the bottleneck feature. The bottleneck feature, with the perturbation vector injected, is fed to an image generator. The method further includes generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator. The method further includes, at a discriminator, receiving the enhanced image and a randomly chosen clear image from a dataset of clear images, and determining a score of image enhancement based on a comparison of the enhanced image with the randomly chosen clear image. The method provides multiple output images, reduces processing complexity, and improves quality of the output images.
Description
METHOD AND APPARATUS FOR IMAGE ENHANCEMENT
TECHNICAL FIELD
The present disclosure relates generally to the field of computer vision and machine learning; and more specifically, to a method and an apparatus for multimodal image enhancement in an unsupervised manner.
BACKGROUND
Currently, dealing with images that are captured under different weather and illumination conditions (e.g., sunny, rainy, foggy, dark, blurry, or snowy conditions) for image enhancement is a prominent technical challenge. For example, autonomous driving requires capturing images under different weather and illumination conditions and processing such images for image enhancement in real time or near real time in order to promote safe autonomous driving, which is a significant challenge. The reason is that a number of features of such images, which are useful for making a perception, usually remain unrevealed due to the different weather and illumination conditions.
Currently, certain approaches have been proposed to deal with such images. For example, a conventional method uses paired data of input images and their corresponding output clear images to train a conventional model, which is configured to map the input images to their corresponding output clear images. However, the use of a labelled data set (e.g., the input images and their corresponding output clear images) for such supervised learning of the conventional model requires extensive effort. There is another conventional method of image dehazing, which is based on a conventional atmospheric scattering model (i.e., a physics-based model) instead of the paired data used in the supervised learning. However, the conventional method of image dehazing is task-specific and difficult to generalize and apply to other image enhancement tasks, such as image deblurring, low-light enhancement, etc. Moreover, in the conventional methods of image enhancement, only one output image is obtained from an input image, which is less informative and efficient, and thus of limited use. Thus, there exists a technical problem of how to adequately and holistically enhance images captured under different weather and illumination conditions for practical applications, such as to promote safe autonomous driving, as the current methods are complex, inefficient, and
task-specific image enhancement methods of limited use and not suited for holistic image enhancement.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional methods of image enhancement.
SUMMARY
The present disclosure provides a method and an apparatus for a multimodal image enhancement in order to promote a safe autonomous driving. The present disclosure provides a solution to the existing problem of how to adequately and holistically enhance images captured under different weather and illumination conditions as the current methods are complex, inefficient, and task-specific image enhancement methods of limited use and not suited for holistic image enhancement. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provides a method and an apparatus for a multimodal image enhancement that adequately and holistically enhances images captured under different weather and illumination conditions for various practical applications, such as to promote safe autonomous driving.
One or more objects of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
In one aspect, the present disclosure provides a method for enhancing an image, wherein the method comprises generating an input image by concatenating a raw input image with a sharpened attention map or with a depth map. The method further comprises generating a bottleneck feature by encoding the input image using an encoder. The method further comprises injecting a perturbation vector into the bottleneck feature. The method further comprises feeding the bottleneck feature, with the perturbation vector injected, to an image generator. The method further comprises generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator. The method further comprises at a discriminator, receiving the enhanced image and a randomly chosen clear image from the dataset of clear images, and determining a score of image enhancement based on a comparison between the enhanced image and the randomly chosen clear image.
The disclosed method generates a controllable solution space (or a set of multiple output images) for image enhancement instead of producing only one output image without any indication of its optimality. Given the amount of uncertainty in the raw input image, it is more reasonable to create the controllable solution space and allow a search for an optimal solution (e.g., an improved and enhanced output image). Therefore, the disclosed method provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency. The disclosed method uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images. Moreover, the disclosed method provides a unified approach for multiple image enhancement tasks (e.g., dehazing, deblurring, low-light enhancement, and the like), dealing with images captured in different weather and illumination conditions, such as in a foggy, sunny, rainy, dark or snowy environment, in order to perform unsupervised, controllable, and holistic image enhancement of such images. For instance, in one exemplary practical application, the method makes various image features of such enhanced images revealing, which are useful for making a perception, even in different weather and illumination conditions, in order to promote safe autonomous driving.
In an implementation form, the method further comprises feeding the discriminated score back to the encoder and the image generator.
By virtue of feeding the discriminated score back to the encoder and the image generator, the image quality gradually improves. Besides, a perceptual loss between the raw input image and the enhanced image is computed based on a pretrained Convolutional Neural Network (CNN), such as a VGG neural network, to preserve structure at the feature level.
In a further implementation form, the perturbation vector is sampled from a Gaussian distribution.
The perturbation vector sampled from the Gaussian distribution leads to the generation of multimodal outputs (or multiple output images), creating a solution space of output images for improved image enhancement.
In a further implementation form, the perturbation vector is updated based on pretrained network weights.
The perturbation vector is used to determine an optimal output image (i.e., an improved output) from the controllable solution space of output images. During training, the perturbation vector has no predetermined network weights and is a random vector sampled from a Gaussian distribution at each training step. However, once the training is completed, the perturbation vector is updated based on the pretrained (i.e., fixed) network weights, followed by adjustment (or fine-tuning) of the perturbation vector in order to search for an optimal solution, i.e., the improved output image.
In a further implementation form, the perturbation vector is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score or a Structural Similarity Index score.
The perturbation vector is randomly initialized but then adjusted using gradient descent by reducing the adversarial loss, the Frechet Inception Distance score or the Structural Similarity Index score, where the adjusted (or fine-tuned) perturbation vector results in an improved or the most visually pleasing image.
In a further implementation form, the discriminator is a gradient-based multi-patch discriminator.
The use of the gradient-based multi-patch discriminator results in a further improvement in the quality of the output images.
In a further implementation form, the discriminator comprises at least the following three network branches: a first network branch acquiring a Gaussian blurred generator output, a second network branch acquiring an identity image, and a third network branch acquiring a Laplacian of a Gaussian blurred generator output, wherein a result from the discriminator is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
The use of the three network branches of the discriminator provides improved illumination control and sharp edge information in the enhanced image.
In another aspect, the present disclosure provides an apparatus for enhancing an image, wherein the apparatus is configured to generate an input image by concatenating a raw input image with a sharpened attention map or with a depth map. The apparatus is further
configured to generate a bottleneck feature by encoding the input image using an encoder. The apparatus is further configured to inject a perturbation vector into the bottleneck feature. The apparatus is further configured to feed the bottleneck feature, with the perturbation vector injected, to an image generator. The apparatus is further configured to generate an enhanced image from the bottleneck feature and the perturbation vector at the image generator. The apparatus, at a discriminator, is further configured to receive the enhanced image and a randomly chosen clear image from the dataset of clear images, and determine a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
The apparatus of the present disclosure achieves all the advantages and effects of the method.
In a yet another aspect, the present disclosure provides a computer program comprising program code which when executed by a computer causes the computer to perform the method.
The computer achieves all the advantages and effects of the method of the present disclosure after execution of the method.
In a yet another aspect, the present disclosure provides an electronic component to be mounted to a vehicle, the electronic component operable to perform the method.
The electronic component achieves all the advantages and effects of the method of the present disclosure after executing the method.
It is to be appreciated that all the aforementioned implementation forms can be combined.
It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and
functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a flowchart of a method for enhancing an image, in accordance with an embodiment of the present disclosure;
FIG. 2 is a block diagram that illustrates various exemplary components of an apparatus, in accordance with an embodiment of the present disclosure;
FIG. 3A is a schematic representation of learning (or training) an image dehazing model, in accordance with an embodiment of the present disclosure;
FIG. 3B is a schematic representation of an encoder, in accordance with an embodiment of the present disclosure;
FIG. 3C is a schematic representation of a densely connected block, in accordance with an embodiment of the present disclosure;
FIG. 3D is a schematic representation of an encoder-decoder structure with Gaussian perturbation vector, in accordance with an embodiment of the present disclosure;
FIG. 3E is a schematic representation of a discriminator, in accordance with an embodiment of the present disclosure;
FIG. 3F is a schematic representation of fine tuning to obtain an optimal output image, in accordance with an embodiment of the present disclosure; and
FIG. 4 is an illustration of an exemplary scenario of implementation of a method and apparatus for image enhancement, in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non- underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
FIG. 1 is a flowchart of a method of enhancing an image, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a method 100 of enhancing an image. The method 100 includes steps 102 to 112. The method 100 is executed by an apparatus, described in detail, for example, in FIG. 2.
The present disclosure provides the method 100 for enhancing an image, wherein the method 100 comprises:
(i) generating an input image by concatenating a raw input image with a sharpened attention map or with a depth map;
(ii) generating a bottleneck feature by encoding the input image using an encoder;
(iii) injecting a perturbation vector into the bottleneck feature;
(iv) feeding the bottleneck feature, with the perturbation vector injected, to an image generator;
(v) generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator; and
(vi) at a discriminator, receiving the enhanced image and a randomly chosen clear image from the dataset of clear images, and determining a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
The present disclosure provides the method 100 for enhancing an image. The method 100 is based on a generative adversarial network (GAN). The generative adversarial network may be configured to generate multiple output images with respect to an input image by use of an image generator. The generated multiple output images manifest improved visual quality and provide useful image features which are required in order to make a perception in real time or near real time. The generative adversarial network may be further configured to include a discriminator which supports the image generator to generate realistic outputs.
At step 102, the method 100 comprises generating an input image by concatenating a raw input image with a sharpened attention map or with a depth map. The raw input image corresponds to an image captured in one of a hazy, foggy, sunny, rainy, dark, or snowy environment or another adverse weather or illumination condition. The raw input image may also be referred to as a degraded image because the raw input image does not reveal useful image features suitable for a practical application or purpose. For example, the raw input image may be captured by a camera mounted on an autonomous vehicle and may not reveal features that are required for safe autonomous driving in different weather and illumination conditions. In another example, the raw input image including one or more objects may be captured by a hand-held device, such as a smartphone, in a low-light and rainy environment, in which the captured one or more objects may be unclear. Therefore, in order to obtain useful features, such as object shape(s), edges, etc., from the raw input image, the raw input image is processed using the sharpened attention map or the depth map, which results in the input image, which is the input to an encoder in the next operation. The input image manifests improved features, such as improved visual quality, object shape(s) and edge details, in comparison to the raw input image. The sharpened attention map may be defined as a scalar matrix that represents a relative importance of multiple layer activations at different two-dimensional (2D) spatial locations with respect to a target task (e.g., an output clear image). The depth map may be defined as an image or image channel that provides information relating to the distance of the surface(s) of scene objects from a viewpoint. In an implementation, the sharpened attention map or the depth map may be obtained either from a pretrained model or jointly trained within a network in an unsupervised manner.
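By way of illustration only, the channel-wise concatenation of step 102 can be sketched as follows in a deep learning framework; the tensor names and shapes below are assumptions and not part of the claimed method.

```python
import torch

# Hypothetical shapes: a 3-channel raw input image and a 1-channel depth map
# (or sharpened attention map), both of spatial size 256 x 256.
raw_input = torch.rand(1, 3, 256, 256)   # Xinput, batch of one
depth_map = torch.rand(1, 1, 256, 256)   # depth or sharpened attention map

# Concatenate along the channel dimension to form the input image that is
# fed to the encoder (a 4-channel tensor in this sketch).
input_image = torch.cat([raw_input, depth_map], dim=1)
print(input_image.shape)  # torch.Size([1, 4, 256, 256])
```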
At step 104, the method 100 further comprises generating a bottleneck feature by encoding the input image using an encoder. The input image is sent to the encoder, which results in the generation of the bottleneck feature. The bottleneck feature refers to an encoded feature map extracted from the input image, which has a smaller spatial size but a greater number of channels than the input image. The encoder is described in detail, for example, in FIG. 2.
At step 106, the method 100 further comprises injecting a perturbation vector into the bottleneck feature. By virtue of injecting the perturbation vector into the bottleneck feature, multimodal image outputs may be generated. The multimodal image outputs are generated by altering the appearance of the output image.
In accordance with an embodiment, the perturbation vector is sampled from a Gaussian distribution. In an implementation, the perturbation vector corresponds to a six-dimensional perturbation vector sampled from the Gaussian distribution. Generally, the Gaussian distribution may be defined as a bell-shaped curve that follows a normal distribution with an equal number of measurements above and below a mean value. The perturbation vector is up-sampled by use of a multi-layer perceptron (MLP) network. The MLP network is described in detail, for example, in FIG. 3A. The up-sampled perturbation vector is injected into the bottleneck feature by use of an adaptive instance normalization (AdaIN) approach. The adaptive instance normalization (AdaIN) approach induces an impact on the image generation phase and produces multimodal output images. The adaptive instance normalization (AdaIN) approach is described in more detail, for example, in FIG. 3A.
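A minimal sketch, under assumed layer sizes and module names, of how a six-dimensional perturbation vector might be up-sampled by an MLP and injected into the bottleneck feature in an AdaIN-like manner; this is illustrative only and not the claimed network.

```python
import torch
import torch.nn as nn

class AdaINInjection(nn.Module):
    """Illustrative AdaIN-style injection of a perturbation vector."""
    def __init__(self, z_dim=6, num_channels=256):
        super().__init__()
        # MLP that up-samples the 6-D perturbation vector to per-channel
        # scale and bias parameters (2 * num_channels values).
        self.mlp = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, 2 * num_channels),
        )
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)

    def forward(self, bottleneck, z):
        style = self.mlp(z)                  # (B, 2*C)
        scale, bias = style.chunk(2, dim=1)  # (B, C) each
        scale = scale[:, :, None, None]
        bias = bias[:, :, None, None]
        # AdaIN: normalize the content feature, then re-modulate it with
        # statistics predicted from the perturbation vector.
        return (1 + scale) * self.norm(bottleneck) + bias

# Usage: a bottleneck feature C and a perturbation vector sampled from N(0, I).
bottleneck = torch.rand(1, 256, 32, 32)
z = torch.randn(1, 6)
injected = AdaINInjection()(bottleneck, z)
```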
In accordance with an embodiment, the perturbation vector is updated based on pretrained network weights. During training, the perturbation vector has no predetermined network weights and is a random vector sampled from a Gaussian distribution at each training
step. However, once the training is completed, the perturbation vector is updated based on the pretrained (i.e., fixed) network weights, followed by adjustment (or fine-tuning) of the perturbation vector for searching an optimal solution, i.e., the improved output image. The pretrained network weights are based on the pretrained encoder, the image generator as well as discriminator network weights. Once the training is completed, such network weights are fixed, and then the fine-tuning of the perturbation vector is carried out.
In accordance with an embodiment, the perturbation vector is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score and a Structural Similarity Index score. The perturbation vector is randomly initialized but then adjusted using gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score. Generally, the adversarial loss may be defined as a difference between ground truth data (or original data or source data) and generated data, computed by use of a generative adversarial network (GAN). The FID score may be defined as a metric that calculates a distance between feature vectors calculated for real and generated images with the help of a pretrained Inception Network. The SSI score may also be referred to as a structural similarity index measure (SSIM). The SSIM may be defined as a method for predicting a perceived quality of digital television and cinematic pictures, as well as other kinds of digital images and videos. The SSIM may be used for measuring the similarity between two images. The updated perturbation vector (or a resulting perturbation vector) finally converges to a point where the updated perturbation vector may cooperate with a decoder (or an image generator) in order to generate a visually pleasing output image. The updated perturbation vector (or the resulting perturbation vector) can be applied to one or more images (e.g., test images).
At step 108, the method 100 further comprises feeding the bottleneck feature, with the perturbation vector injected, to an image generator. The bottleneck feature with the perturbation vector injected is further fed to the image generator. The image generator may be configured to generate multiple output images which provide improved visual quality and useful image features that are required in order to make a perception in real time or near real time. At step 110, the method 100 further comprises generating an enhanced image from the bottleneck feature and the perturbation vector at the image generator. In an implementation, the image generator may be configured to produce the enhanced image (i.e., the output image) at each iteration by use of the bottleneck feature with the perturbation vector injected. The use of a different perturbation vector in each iteration leads to a change in various aspects, such as the illumination or brightness control of the enhanced image. In this way, a Gaussian solution space for altering the appearance of the enhanced image is created, and it is made possible to control the image enhancement outputs in a testing phase. Additionally, the enhanced image may be sent back to the encoder to force the encoder to produce the same encoded feature (e.g., another similar bottleneck feature) as the raw input image. Therefore, there is an L1 norm feature reconstruction loss between those two encoded features (e.g., two bottleneck features).
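The feature reconstruction constraint mentioned above can be sketched as follows; `encoder` and `generator` are assumed stand-ins for the encoder and image generator, and the concatenation with the attention or depth map is omitted for brevity.

```python
import torch.nn.functional as F

# A minimal sketch of an L1 feature reconstruction loss of the kind described
# above. `bottleneck_c` is the original encoded feature C, `injected_bottleneck`
# is C after injection of the perturbation vector.
def feature_reconstruction_loss(encoder, generator, bottleneck_c, injected_bottleneck):
    enhanced = generator(injected_bottleneck)   # enhanced image
    c_prime = encoder(enhanced)                 # re-encoded feature C'
    # Force C' to be consistent with C at the feature level.
    return F.l1_loss(c_prime, bottleneck_c)
```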
At step 112, the method 100 further comprises, at a discriminator, receiving the enhanced image and a randomly chosen clear image from the dataset of clear images, and determining a score of image enhancement based on a comparison of the enhanced image with the randomly chosen clear image. The comparison is in terms of the realism of the enhanced image with respect to the randomly chosen clear image. Based on the received enhanced image and the randomly chosen clear image from the dataset of clear images, the discriminator is configured to determine whether the enhanced image is a fake image or a real clear image.
In accordance with an embodiment, the discriminator is a gradient-based multi-patch discriminator. The gradient-based multi-patch discriminator includes multiple network branches and hence, provides an improved output image quality.
In accordance with an embodiment, the method 100 further comprises feeding the discriminated score back to the encoder and the image generator. By virtue of feeding the discriminated score back to the encoder and the image generator, the image quality gradually improves. Besides, a perceptual loss between the raw input image and the enhanced image is computed based on a pretrained Convolutional Neural Network (CNN), such as a VGG neural network, to preserve structure at the feature level.
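A hedged sketch of such a VGG-based perceptual loss, assuming torchvision is available; the choice of VGG-16 features up to relu3_3 and the L1 distance are assumptions, not requirements of the method.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VGGPerceptualLoss(nn.Module):
    """Illustrative perceptual loss between the raw input and enhanced image,
    computed on features of a pretrained VGG-16 (layer choice assumed)."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        # Use the layers up to relu3_3 as a fixed feature extractor.
        self.features = vgg.features[:16].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, raw_input, enhanced):
        # Structure is preserved at the feature level by penalizing the L1
        # distance between VGG feature maps of the two images.
        return torch.abs(self.features(raw_input) - self.features(enhanced)).mean()
```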
In accordance with an embodiment, the discriminator comprises at least the following three network branches:
(a) a first network branch acquiring a gaussian blurred generator output,
(b) a second network branch acquiring an identity image, and
(c) a third network branch acquiring a Laplacian of a Gaussian blurred generator output,
wherein a result from the discriminator is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
The Gaussian blurred generator output is responsible for bringing an illumination distribution in an output image that is closer to a target data (i.e., the enhanced image) distribution. The identity image acquired by the second network branch is similar to an image generated by a standard discriminator (e.g., a naive discriminator). The Laplacian of Gaussian (LoG) blurred generator output is responsible for generating sharper edges in the output image that is closer to the target data (i.e., the enhanced image) distribution. The LoG may be defined as a two-dimensional isotropic measure of a second spatial derivative of an image. Alternatively stated, the LoG highlights regions of rapid intensity change in an image and is often adopted for edge detection. The summation of the generated outputs, from the three network branches after each convolutional layer, is obtained from the discriminator in order to make a prediction of real or fake for different patches in the output image. In this way, the discriminator provides an improved output image quality.
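A simplified sketch of a three-branch, patch-based discriminator along these lines; the channel widths, kernel sizes and blur parameters are assumptions, and for brevity the branch outputs are summed only at the final patch-score maps rather than after each convolutional layer as described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms.functional as TF

class GradientMultiPatchDiscriminator(nn.Module):
    """Simplified sketch of a three-branch, patch-based discriminator."""
    def __init__(self, in_channels=3, width=64):
        super().__init__()
        def patch_branch():
            return nn.Sequential(
                nn.Conv2d(in_channels, width, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(width, width * 2, 4, stride=2, padding=1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(width * 2, 1, 4, stride=1, padding=1),  # per-patch scores
            )
        self.blur_branch = patch_branch()      # Gaussian blurred input (illumination cue)
        self.identity_branch = patch_branch()  # identity (unmodified) input
        self.log_branch = patch_branch()       # Laplacian of Gaussian input (edge cue)
        # Fixed Laplacian kernel applied to the blurred image to obtain a LoG response.
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("lap", lap.view(1, 1, 3, 3).repeat(in_channels, 1, 1, 1))

    def forward(self, image):
        blurred = TF.gaussian_blur(image, kernel_size=5)
        log = F.conv2d(blurred, self.lap, padding=1, groups=image.shape[1])
        # Per-patch real/fake scores from the three branches are summed.
        return (self.blur_branch(blurred)
                + self.identity_branch(image)
                + self.log_branch(log))
```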
Thus, the method 100 generates a controllable solution space (or a set of multiple output images) for image enhancement instead of producing only one output image without any indication of its optimality. Given the amount of uncertainty in the raw input image, it is more reasonable to create the controllable solution space and allow a search for an optimal solution (e.g., an improved and enhanced output image). Therefore, the method 100 provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency. The method 100 uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images. Moreover, the method 100 performs multiple image enhancement tasks, such as image dehazing, image deblurring, and/or low-light enhancement, for holistic and overall enhancement of an image captured in different weather and illumination conditions, such as in a foggy, sunny, rainy, dark or snowy environment. For instance, in one exemplary practical application, the method 100 makes various features of such enhanced images revealing, which are useful for making a perception, even in different weather and illumination conditions, in order to promote safe autonomous driving. Further, the method 100 reduces processing complexity and improves the quality of the output images.
The steps 102 to 112 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
FIG. 2 is a block diagram that illustrates various exemplary components of an apparatus, in accordance with an embodiment of the present disclosure. With reference to FIG. 2, there is shown a block diagram 200 of an apparatus 202. The apparatus 202 comprises a sharpened attention or a depth map 204, an encoder 206, an image generator 208, a memory 210, a discriminator 212 and a processor 214.
The apparatus 202 includes suitable logic, circuitry, interfaces and/or code that is configured for enhancing an image. The apparatus 202 is configured to execute the method 100 (of FIG. 1). Examples of the apparatus 202 include, but are not limited to, a hand-held device or an electronic device or component which can be mounted on a vehicle (e.g., an autonomous vehicle or a semi-autonomous vehicle), a mobile device, a portable device, and the like, which is operable to perform the method 100.
The sharpened attention map or depth map 204 is configured to provide better guidance in order to produce output image(s) with higher visual quality in comparison to a conventional scattering model. The sharpened attention map or depth map 204 may be a software program, a mathematical expression or an application that may be installed in the apparatus 202.
The encoder 206 (also represented as Enc) includes suitable logic, circuitry, interfaces and/or code, and may be defined as a network (e.g., a convolutional neural network (CNN), a recurrent neural network (RNN), etc.) that takes input data (e.g., an image) and provides output data (e.g., an output image) in terms of a feature map, a vector or a tensor which represents latent information of the input data. Examples of the encoder 206 include, but are not limited to, a recursive neural network, a feed-forward neural network, a deep-belief network, a convolutional deep-belief network, a self-organizing map, a deep Boltzmann machine, a stacked de-noising auto-encoder, and the like.
The image generator 208 (also represented as G) includes suitable logic, circuitry, interfaces and/or code that is configured to generate one or more enhanced clear output images that are close to a real data distribution. In an implementation, the image generator 208 may also be
defined as a network which is configured to either reconstruct the input data (i.e., the input image) from the feature map or change the feature map to a different but related representation. The image generator 208 may also be referred to as a decoder.
The memory 210 includes suitable logic, circuitry, or interfaces that is configured to store the instructions executable by the processor 214. The memory 210 may also be configured to store a dataset of clear images. Examples of implementation of the memory 210 may include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory. The memory 210 may store an operating system or other program products (including one or more operation algorithms) to operate the apparatus 202.
The discriminator 212 (also represented as D) includes suitable logic, circuitry, interfaces and/or code that is configured to determine a score of image enhancement based on a comparison of the one or more enhanced clear output images received from the image generator 208 with a randomly chosen clear image from the data set of clear images stored in the memory 210. In other words, a realism of the one or more enhanced clear output images is checked or compared with respect to the randomly chosen clear image.
The processor 214 includes suitable logic, circuitry, interfaces and/or code that is configured to execute the instructions stored in the memory 210. In an example, the processor 214 may be a general-purpose processor. Other examples of the processor 214 may include, but are not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set (RISC) processor, a very long instruction word (VLIW) processor, a central processing unit (CPU), a state machine, a data processing unit, and other processors or control circuitry. Moreover, the processor 214 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the apparatus 202.
In operation, the apparatus 202 is configured to enhance an image, wherein the apparatus 202 is configured to generate an input image by concatenating a raw input image with the sharpened attention map or with the depth map 204. The apparatus 202 is further configured to generate a bottleneck feature by encoding the input image using the encoder 206. The apparatus 202 is further configured to inject a perturbation vector into the bottleneck feature. The apparatus
202 is further configured to feed the bottleneck feature, with the perturbation vector injected, to the image generator 208. The apparatus 202 is further configured to generate an enhanced image from the bottleneck feature and the perturbation vector at the image generator 208. The apparatus 202 is further configured to, at the discriminator 212, receive the enhanced image and a randomly chosen clear image from the dataset of clear images, and determine a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
In accordance with an embodiment, the perturbation vector is sampled from a Gaussian distribution. In an implementation, the perturbation vector is a six-dimensional perturbation vector sampled from the Gaussian distribution.
In accordance with an embodiment, the perturbation vector is updated based on pretrained network weights. In another implementation, the perturbation vector works with the pretrained network weights of the encoder 206 or the image generator 208. In such an implementation, the apparatus 202 is configured to start processing with a random perturbation vector.
In accordance with an embodiment, the perturbation vector is adjusted, using gradient descent by reducing adversarial loss, Frechet Inception Distance score or Structural Similarity Index score. In case of the random perturbation vector, values of the random perturbation vector are updated using the gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score.
In accordance with an embodiment, the discriminator 212 is a gradient-based multi-patch discriminator. The discriminator 212 (or gradient-based multi-patch discriminator) includes multiple network branches and hence, provides an improved output image quality.
In accordance with an embodiment, the discriminator 212 comprises three network branches:
(a) a first network branch acquiring a Gaussian blurred generator output;
(b) a second network branch acquiring an identity image; and
(c) a third network branch acquiring the Laplacian of Gaussian of the generated output, wherein a result from the discriminator 212 is obtained after a summation of the generated outputs from the three branches after each convolutional layer. By use of the first network
branch, the second network branch and the third network branch, the discriminator 212 is configured to make a prediction of real or fake for different patches in the output image.
In accordance with an embodiment, a computer program comprises program code which, when executed by a computer, causes the computer to perform the method 100. Examples of the computer include, but are not limited to, a laptop computer, an electronic control unit (ECU) in a vehicle, an onboard computer of a vehicle, a desktop computer, a mainframe computer, a hand-held computer, the processor 214, and other computing devices.
Thus, the apparatus 202 generates a controllable solution space (or a set of multiple output images) for image enhancement instead of producing only one output image without any indication of its optimality. Given the amount of uncertainty in the raw input image, it is more reasonable to create the controllable solution space and allow a search for an optimal solution (e.g., an improved and enhanced output image). Therefore, the apparatus 202 provides multiple output images and makes it possible to work with an optimal output image, and hence manifests improved reliability and efficiency. The apparatus 202 uses the sharpened attention map or the depth map and hence provides better guidance for producing clearer and brighter output images. Moreover, the apparatus 202 performs multiple image enhancement tasks, such as image dehazing, image deblurring, and/or low-light enhancement, for holistic and overall enhancement of an image captured in different weather and illumination conditions, such as in a foggy, sunny, rainy, dark or snowy environment. For instance, in one exemplary practical application, the apparatus 202 makes various features of such enhanced images revealing, which are useful for making a perception, even in different weather and illumination conditions, in order to promote safe autonomous driving.
FIG. 3A is a schematic representation of learning (or training) an image dehazing model, in accordance with an embodiment of the present disclosure. FIG. 3A is described in conjunction with elements from FIGs. 1 and 2. With reference to FIG. 3A, there is shown a schematic representation 300A that includes a raw input image 302, a densely connected block 304, a bottleneck feature 306, a perturbation vector 308, a multi-layer perceptron (MLP) network 310, a residual block 312, an output image (i.e., an enhanced clear output image 314), another bottleneck feature 316, and a convolutional network 318.
The raw input image 302 (also represented as Xinput) corresponds to an image captured in a hazy environment during autonomous driving. Therefore, the raw input image 302 (i.e.,
Xinput) may be referred to as a degraded image that does not reveal a number of parameters or features which are useful for safe autonomous driving. In the schematic representation 300A, the raw input image 302 (e.g., a hazy input image) is considered for enhancement. However, the schematic representation 300A is equally applicable to an image that is either blurry or captured in a foggy, sunny, rainy, dark, or snowy environment.
The sharpened attention map or the depth map 204 corresponds to a sharpened attention or depth estimation model which is configured to process the raw input image 302 (i.e., Xinput) and provide an input image (e.g., a sharpened image). The input image (i.e., the sharpened image) manifests a high visual quality, and hence provides better object shapes and edge(s) information present in an image. Thereafter, the input image (i.e., the sharpened image) is encoded by use of the encoder 206 (i.e., Enc).
The densely connected block 304 (also represented as DenseBlk) may be referred to as one or more convolutional layers in a convolutional neural network (CNN), which are used to utilize the bottleneck feature received from the encoder 206 (i.e., Enc) for training for image classification. The one or more convolutional layers are connected (or convolved) with each other in a feed-forward fashion, either through multiplication or by dot product. The densely connected block 304 (i.e., DenseBlk) comprises fully-connected layers usually used for image classification tasks.
The bottleneck feature 306 (also represented as C) refers to an encoded feature map of the input image (encoded by the encoder 206), which has smaller spatial size but greater number of channels than the input image. The bottleneck feature 306 (i.e., C) may also be referred to as a bottleneck content feature that represents the content feature of the input image in encoded form.
The perturbation vector 308 refers to a six-dimensional perturbation vector. In the schematic representation 300A, the perturbation vector 308 is sampled from a Gaussian distribution. The perturbation vector 308 is up-sampled by use of a multi-layer perceptron (MLP) network 310. The MLP network 310 is defined as a class of feedforward artificial neural network (ANN). The MLP network 310 includes multiple neural network layers (e.g., an input layer, an output layer and one or more hidden layers) of non-linearly activating nodes. The MLP network 310 is a fully connected network; therefore, each node in one layer is connected with a certain weight to every node in the next layer. The perturbation vector 308 (i.e., the up-sampled perturbation vector) is injected into the bottleneck feature by use of an adaptive instance normalization (AdaIN) approach. The AdaIN approach induces an impact on the image generation phase and produces multimodal output images. The AdaIN approach is generally used for image style transfer and image generation tasks to alter the appearance of the raw input image 302 (i.e., Xinput).
The residual block 312 (also represented as ResBlk) includes two convolutional layers (e.g., an input layer and an output layer) with a skip connection between the two convolutional layers which allows identity mapping. The residual block 312 (i.e., ResBlk) is configured to prevent gradient vanishing and provide better feature representations of the bottleneck feature.
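An illustrative residual block of this kind, with an assumed channel count:

```python
import torch.nn as nn

class ResBlk(nn.Module):
    """Illustrative residual block with two convolutional layers and a skip
    connection; the channel count is an assumption."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # The skip connection allows identity mapping and helps prevent
        # vanishing gradients.
        return x + self.body(x)
```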
Thereafter, the bottleneck feature with the perturbation vector 308 (i.e., the up-sampled perturbation vector) injected is fed to the image generator 208 (i.e., G). The image generator 208 (i.e., G) is configured to generate an enhanced clear output image 314 (also represented as Xclear) from the bottleneck feature with the up-sampled perturbation vector 308 injected, in each training iteration, and to add the enhanced clear output image 314 (i.e., Xclear) to a dataset of clear images stored in the memory 210.
The other bottleneck feature 316 (also represented as C') is obtained by encoding the enhanced clear output image 314 (i.e., Xclear) by use of the encoder 206 (i.e., Enc), followed by processing the enhanced clear output image 314 (i.e., Xclear) through the sharpened attention map or the depth map 204. The other bottleneck feature 316 (i.e., C') is configured to be consistent with the bottleneck feature 306 (i.e., C) by an L1 norm loss.
Further, the discriminator 212 (i.e., D) is configured to receive the enhanced clear output image 314 (i.e., Xclear) from the image generator 208 (i.e., G) and a randomly chosen clear image from the dataset of clear images stored in the memory 210. The discriminator 212 (i.e., D) is further configured to determine a score of image enhancement based on a comparison between the enhanced clear output image 314 (i.e., Xclear) and the randomly chosen clear image.
The convolutional network 318 corresponds to a VGG network (e.g., a VGG16 or a VGG19) which is used for image classification and detection of image features. Moreover, the raw input image 302 (i.e., Xinput) and the enhanced clear output image 314 (i.e., Xclear) are configured to be consistent with respect to a perceptual loss (i.e., Lpercep) by use of the pretrained convolutional network, and hence preserve the image structure at the feature level. As shown, the convolutional network 318 is independent from the whole framework, and is only used to add an extra loss to the whole framework.
In operation, the raw input image 302 (i.e., Xinput) is concatenated with the sharpened attention map or the depth map 204 and communicated to the encoder 206 (i.e., Enc) and the densely connected block 304 (i.e., DenseBlk) to obtain the bottleneck feature 306 (i.e., C). The perturbation vector 308 is up-sampled with the MLP network 310 and integrated with the bottleneck feature 306 (i.e., C) via the AdaIN approach which is used at the MLP network 310. After integrating the perturbation vector 308 with the bottleneck feature 306 (i.e., C), the bottleneck feature 306 (i.e., C) is fed to the image generator 208 (i.e., G) via the residual block 312 (i.e., ResBlk). The image generator 208 (i.e., G) is configured to generate the enhanced clear output image 314 (i.e., Xclear) at each iteration. In each iteration of training (or learning) of the image dehazing model, a few parameters, such as the illumination or brightness contrast of the enhanced clear output image 314 (i.e., Xclear), will change according to the different perturbation vector 308 sampled from the Gaussian distribution. The different perturbation vector 308 leads to the creation of a Gaussian solution space of multiple output images for image enhancement instead of only one output image and makes it possible to work with an optimal output image (such as the enhanced clear output image 314). At each iteration of training, the enhanced clear output image 314 (i.e., Xclear) is fed to the discriminator 212 (i.e., D) together with a randomly chosen clear image from a dataset of clear images to determine whether the enhanced clear output image 314 is a fake image or a real clear image. Thereafter, the image generator 208 (i.e., G) is configured to improve itself by minimizing adversarial losses of the discriminator 212 (i.e., D). Moreover, the enhanced clear output image 314 (i.e., Xclear) is further concatenated with the sharpened attention map or the depth map 204 and encoded by use of the encoder 206 (i.e., Enc) and the densely connected block 304 (i.e., DenseBlk) to obtain the other bottleneck feature 316 (i.e., C'). The other bottleneck feature 316 (i.e., C') is configured to be consistent with the bottleneck feature 306 (i.e., C) by an L1 norm loss (e.g., Crecon). The raw input image 302 (i.e., Xinput) and the enhanced clear output image 314 (i.e., Xclear) are forced to be consistent with respect to a perceptual loss (e.g., Lpercep) generated by the convolutional network 318 (i.e., VGG) in order to preserve the features of the raw input image 302 (i.e., Xinput) in the enhanced clear output image 314 (i.e., Xclear).
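For illustration, a highly simplified generator-side training step combining the adversarial, feature reconstruction and perceptual terms described above might look as follows; the module names, the non-saturating form of the adversarial term and the weighting factors are assumptions, and the concatenation with the attention or depth map before re-encoding is omitted for brevity.

```python
import torch
import torch.nn.functional as F

# `encoder`, `generator`, `discriminator`, `mlp_adain` (perturbation injection)
# and `perceptual_loss` are assumed modules matching the roles described above.
def generator_step(encoder, generator, discriminator, mlp_adain, perceptual_loss,
                   x_input, lambda_recon=10.0, lambda_percep=1.0):
    c = encoder(x_input)                                          # bottleneck feature C
    z = torch.randn(x_input.shape[0], 6, device=x_input.device)   # Gaussian perturbation
    x_clear = generator(mlp_adain(c, z))                          # enhanced output
    adv_loss = -discriminator(x_clear).mean()                     # adversarial term (assumed form)
    c_prime = encoder(x_clear)                                    # re-encoded feature C'
    recon_loss = F.l1_loss(c_prime, c)                            # L1 feature reconstruction (Crecon)
    percep_loss = perceptual_loss(x_input, x_clear)               # VGG perceptual loss (Lpercep)
    return adv_loss + lambda_recon * recon_loss + lambda_percep * percep_loss
```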
After training of the image dehazing model by executing multiple iterations, the controllable solution space of output images (or enhanced images) is generated and searched for an optimal output image. In order to fine-tune the optimal output image, two approaches are used, which are described in detail, for example, in FIG. 3F.
In this embodiment, the schematic representation 300A is used for learning (or training) of the image dehazing model. In another embodiment, the schematic representation 300A may also be used for image deblurring or low light enhancement or for enhancement of an image captured in foggy, sunny, rainy, dark or snowy environments.
FIG. 3B is a schematic representation of an encoder, in accordance with an embodiment of the present disclosure. FIG. 3B is described in conjunction with elements from FIGs. 1, 2, and 3A. With reference to FIG. 3B, there is shown a schematic representation 300B of the encoder 206 (of FIG. 2). The encoder 206 includes an inception residual block 320. The inception residual block 320 includes a plurality of 1x1 convolutional blocks 320A and a plurality of 3x3 convolutional blocks 320B.
The inception residual block 320 may be defined as a convolutional block that combines multiple convolutional branches and is able to capture various features at different patch sizes of an image (such as the bottleneck feature used in the schematic representation 300A).
The encoder 206 includes the inception residual block 320, in contrast to a conventional residual block, and hence provides an improved encoded feature map which incorporates both local and global image contents. The improved encoded feature map is obtained due to the plurality of 1x1 convolutional blocks 320A and the plurality of 3x3 convolutional blocks 320B comprised by the inception residual block 320 of the encoder 206.
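An illustrative inception-style residual block combining 1x1 and 3x3 branches; the exact branch layout and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class InceptionResBlock(nn.Module):
    """Illustrative inception-style residual block with 1x1 and 3x3 branches."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, channels // 2, kernel_size=3, padding=1),
        )
        # 1x1 projection back to the input channel count before the skip sum.
        self.project = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        # Branches with different receptive fields capture local (1x1) and
        # more global (3x3) content; their outputs are concatenated.
        merged = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return x + self.project(merged)
```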
FIG. 3C is a schematic representation of a densely connected block, in accordance with an embodiment of the present disclosure. FIG. 3C is described in conjunction with elements from FIGs. 1, 2, 3A, and 3B. With reference to FIG. 3C, there is shown a schematic
representation 300C of the densely connected block 304 (of FIG. 3A). The densely connected block 304 includes the plurality of 3x3 convolutional blocks 320B.
In order to obtain the bottleneck feature 306 (i.e., C) with more meaningful and robust feature representations of the bottleneck feature received from the encoder 206 (i.e., Enc), the conventional residual block is replaced with the densely connected block 304 (i.e., DenseBlk). In the densely connected block 304 (i.e., DenseBlk), each of the plurality of 3x3 convolutional blocks 320B is connected with the others, and hence the block provides improved feature representations of the bottleneck feature.
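A minimal sketch of a densely connected block in this spirit; the growth rate and number of layers are assumptions.

```python
import torch
import torch.nn as nn

class DenseBlk(nn.Module):
    """Illustrative densely connected block: each 3x3 convolution receives the
    concatenation of all preceding feature maps."""
    def __init__(self, in_channels=256, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            channels += growth

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Dense connectivity: concatenate every earlier output.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)
```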
FIG. 3D is a schematic representation of an encoder-decoder structure with Gaussian perturbation vector, in accordance with an embodiment of the present disclosure. FIG. 3D is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, and 3C. With reference to FIG. 3D, there is shown a schematic representation 300D of an encoder-decoder structure with Gaussian perturbation vector 322. The encoder-decoder structure with Gaussian perturbation vector 322 includes the encoder 206 and the image generator 208 (also known as decoder). The encoder-decoder structure with Gaussian perturbation vector 322 further includes the densely connected block 304, the bottleneck feature 306, the perturbation vector 308, the MLP network 310, and the residual block 312.
A conventional encoder-decoder structure, which includes the conventional residual block, provides only one output image without any optimality and is hence not preferred. In contrast, the encoder-decoder structure with Gaussian perturbation vector 322 includes the densely connected block 304 (i.e., DenseBlk), the perturbation vector 308, and the MLP network 310, and provides a Gaussian solution space of multiple output images (or multimodal output images) for image enhancement instead of only one output image, making it possible to work with an optimal output image (such as the enhanced clear output image 314).
FIG. 3E is a schematic representation of a discriminator, in accordance with an embodiment of the present disclosure. FIG. 3E is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, 3C, and 3D. With reference to FIG. 3E, there is shown a schematic representation of the discriminator 212 (of FIG. 2). The discriminator 212 (i.e., D) includes a first network branch 212A, a second network branch 212B and a third network branch 212C. The
discriminator 212 (i.e., D) further includes a first convolutional layer 324A, a second convolutional layer 324B, a third convolutional layer 324C and an output image 328.
Each of the first convolutional layer 324A, the second convolutional layer 324B, and the third convolutional layer 324C may also be referred to as a convolutional neural network (CNN). Generally, the convolutional neural network (CNN) may be defined as a highly interconnected network of processing elements. Each element is optionally associated with a local memory (i.e., the memory 210) and used for image recognition and processing (such as for processing of the enhanced clear output image 314).
The first network branch 212A is configured to acquire a Gaussian blurred generator output 326A which is responsible for bringing an illumination distribution in the enhanced clear output image 314 that is closer to a target data distribution.
The second network branch 212B is configured to acquire an identity image 326B that is similar to an image generated by a standard discriminator (e.g., Naive discriminator).
The third network branch 212C is configured to acquire a Laplacian of Gaussian (LoG) blurred generator output 326C that is responsible for generating sharper edges in the enhanced clear output image 314 that is closer to the target data distribution.
Thereafter, the output image 328 is obtained from the discriminator 212 (i.e., D) after a summation of the generated outputs (i.e., 326A, 326B, and 326C) from the three network branches (i.e., the first network branch 212A, the second network branch 212B and the third network branch 212C) after each convolutional layer (i.e., the first convolutional layer 324A, the second convolutional layer 324B, and the third convolutional layer 324C). The output image 328 is perceived either as a real image or a fake image based on different patches formed in the enhanced clear output image 314 and the output image 328. A real image means an image which resembles an input image, such as the enhanced clear output image 314 (which may also be referred to as the enhanced image), in all image features. A fake image may be described as an image which may be generated artificially by use of a software tool. In this way, the discriminator 212 provides an improved output image quality in comparison to a conventional discriminator that uses only one network branch to generate an output image and hence manifests reduced image quality.
FIG. 3F is a schematic representation of fine tuning to obtain an optimal output image, in accordance with an embodiment of the present disclosure. FIG. 3F is described in conjunction with elements from FIGs. 1, 2, 3A, 3B, 3C, 3D, and 3E. With reference to FIG. 3F, there is shown a schematic representation 300F of fine tuning to obtain an optimal output image from a set of multiple clear output images.
In the schematic representation 300F, two approaches may be used to obtain the optimal output image from the set of multiple clear output images which may be generated by use of the schematic representation 300A.
In a first approach, the perturbation vector 308 is sampled from the Gaussian distribution. In such an approach, a grid search is adopted by interpolating the values of every two dimensions in the Gaussian perturbation and checking for the image with the best visual quality. The corresponding perturbation vector is applied to all test images.
In a second approach, the perturbation vector 308 is updated based on pretrained network weights. In such an approach, the weights associated with the encoder 206, the image generator 208, the densely connected block 304, the bottleneck feature 306, the residual block 312 and the discriminator 212 have fixed values. Thereafter, processing starts with a random perturbation vector, and the values of the random perturbation vector are updated using gradient descent by minimizing various losses, such as the adversarial loss, the Frechet Inception Distance (FID) score and the Structural Similarity Index (SSI) score. The updated perturbation vector (or a resulting perturbation vector) finally converges to a point where the updated perturbation vector may cooperate with the image generator 208 in order to generate a visually pleasing output image. The updated perturbation vector (or the resulting perturbation vector) can further be applied to one or more test images.
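A sketch of this second fine-tuning approach under assumed module names; only the adversarial term is shown, whereas the description above also minimizes FID- and SSIM-based terms, and the learning rate and step count are illustrative.

```python
import torch

def fine_tune_perturbation(encoder, generator, discriminator, mlp_adain,
                           x_input, steps=200, lr=1e-2):
    # All network weights are frozen; only the perturbation vector is optimized.
    for module in (encoder, generator, discriminator, mlp_adain):
        module.requires_grad_(False)
    with torch.no_grad():
        c = encoder(x_input)                          # bottleneck feature C
    z = torch.randn(x_input.shape[0], 6, device=x_input.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        x_clear = generator(mlp_adain(c, z))
        loss = -discriminator(x_clear).mean()         # reduce the adversarial loss
        loss.backward()
        optimizer.step()
    return z.detach()                                 # reusable for test images
```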
FIG. 4 is an illustration of an exemplary scenario of implementation of a method and apparatus for image enhancement, in accordance with an embodiment of the present disclosure. FIG. 4 is described in conjunction with elements from FIGs. 1, 2, and 3A-3F. With reference to FIG. 4, there is shown an exemplary scenario 400 that describes a practical application of the disclosed method and apparatus (FIGs. 1 and 2) in autonomous driving. In the exemplary scenario 400, there is shown a vehicle 402 moving along a road portion. There is further shown an electronic component 404 and one or more image-capturing devices, such as an image-capturing device 406 mounted on the vehicle 402. It is to be understood
that the vehicle 402 may include many other known components typically used in an autonomous vehicle, which are omitted here for the sake of brevity. For example, the vehicle 402 may include a battery to power the image-capturing device 406 and the electronic component 404.
In the exemplary scenario 400, the vehicle 402 may be an autonomous or semi-autonomous vehicle. The electronic component 404 may include suitable logic, circuitry, interfaces, and/or code configured to perform multimodal image enhancement that adequately and holistically enhances images captured under different weather and illumination conditions. For instance, the electronic component 404 is configured to process and enhance, in real time or near real time, the images captured by the image-capturing device 406 in different weather and illumination conditions, such as a foggy, sunny, rainy, dark, or snowy environment, by performing various image enhancement tasks, such as image dehazing, image deblurring, and low-light enhancement, in a holistic manner to promote safe autonomous driving. Alternatively stated, the electronic component 404 accurately reveals a number of features in such enhanced images, which are useful for an accurate perception of the real-world environment around the vehicle 402, even in different weather and illumination conditions. This in turn leads to safe autonomous driving of the vehicle 402. Examples of the electronic component 404 include, but are not limited to, an electronic control unit (ECU), an in-vehicle device, an onboard computer, or another electronic component in the vehicle 402. The electronic component 404 may correspond to the apparatus 202 (of FIG. 2), where the electronic component 404 is configured to execute the method 100 (of FIG. 1).
The electronic component 404 may be configured to perform various image enhancement tasks simultaneously while the vehicle 402 is driving. The electronic component 404 may be configured to use the perturbation vector 308 sampled from the Gaussian distribution, which leads to the generation of multiple clear output images with respect to one raw input image (i.e., a degraded image), so that an optimal output image (e.g., an enhanced and improved image) can be selected from the multiple clear output images. Therefore, the electronic component 404 provides the optimal output image (i.e., the enhanced and improved image) for perception, which results in reliable and safe driving of the vehicle 402. Additionally, the electronic component 404 may be configured to use the
discriminator 212 (i.e., the gradient-based multi-patch discriminator), which provides the multiple clear output images with further improved visual quality.
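For illustration only, selecting an optimal output from several candidates generated by re-sampling the perturbation vector could look like the following sketch; `generate_fn` and `quality_fn` are the same assumed callables as in the earlier grid-search sketch.

```python
# Illustrative best-of-n selection over re-sampled perturbation vectors (assumption).
import numpy as np

def best_of_n(generate_fn, quality_fn, n=8, dim=128, seed=0):
    rng = np.random.default_rng(seed)
    candidates = [generate_fn(rng.standard_normal(dim)) for _ in range(n)]
    return max(candidates, key=quality_fn)   # optimal output image for downstream perception
```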
In another implementation scenario, the apparatus 202 may be implemented as a handheld device operable to execute the method 100 (of FIG. 1). In an example, the handheld device may be a smartphone, which can adequately process one or more images captured under different weather and illumination conditions using the method 100 to generate enhanced images, such as clear output images that are perceived by a human eye as high-quality, real, photo-like images. The method 100 enables the handheld device to predict "real" or "fake" for different patches in the output image, thereby providing improved output image quality.
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.
Claims
1. A method (100) for enhancing an image, wherein the method (100) comprises:
(i) generating an input image by concatenating a raw input image (302) with a sharpened attention map or with a depth map (204);
(ii) generating a bottleneck feature by encoding the input image using an encoder (206);
(iii) injecting a perturbation vector (308) into the bottleneck feature;
(iv) feeding the bottleneck feature, with the perturbation vector (308) injected, to an image generator (208);
(v) generating an enhanced image from the bottleneck feature and the perturbation vector (308) at the image generator (208); and
(vi) at a discriminator (212), receiving the enhanced image and a randomly chosen clear image from a dataset of clear images, and determining a score of image enhancement based on a difference between the enhanced image and the randomly chosen clear image.
2. The method (100) according to claim 1, wherein the method (100) further comprises feeding the determined score back to the encoder (206) and the image generator (208).
3. The method (100) of claim 1, wherein the perturbation vector (308) is sampled from a Gaussian distribution.
4. The method (100) of claim 1, wherein the perturbation vector (308) is updated based on pretrained network weights.
5. The method (100) of claim 4, wherein the perturbation vector (308) is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score or a Structural Similarity Index score.
6. The method (100) of claim 1, wherein the discriminator (212) is a gradient-based multi-patch discriminator.
7. The method (100) of claim 3, wherein the discriminator (212) comprises at least the following three network branches:
(a) a first network branch (212A) acquiring a Gaussian-blurred generator output (326A);
(b) a second network branch (212B) acquiring an identity image (326B); and
(c) a third network branch (212C) acquiring a Laplacian of the Gaussian-blurred generator output (326C), wherein a result from the discriminator (212) is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
8. An apparatus (202) for enhancing an image, wherein the apparatus (202) is configured to:
(i) generate an input image by concatenating a raw input image (302) with a sharpened attention map or with a depth map (204);
(ii) generate a bottleneck feature by encoding the input image using an encoder (206);
(iii) inject a perturbation vector (308) into the bottleneck feature;
(iv) feed the bottleneck feature, with the perturbation vector (308) injected, to an image generator (208);
(v) generate an enhanced image from the bottleneck feature and the perturbation vector (308) at the image generator (208); and
(vi) at a discriminator (212), receive the enhanced image and a randomly chosen clear image from a dataset of clear images, and determine a score of image enhancement based on a comparison of the enhanced image with the randomly chosen clear image.
9. The apparatus (202) of claim 8, wherein the perturbation vector (308) is sampled from a Gaussian distribution.
10. The apparatus (202) of claim 8, wherein the perturbation vector (308) is updated based on pretrained network weights.
11. The apparatus (202) of claim 10, wherein the perturbation vector (308) is adjusted, using gradient descent, by reducing an adversarial loss, a Frechet Inception Distance score or a Structural Similarity Index score.
12. The apparatus (202) of claim 8, wherein the discriminator (212) is a gradient-based multi-patch discriminator.
13. The apparatus (202) of claim 8, wherein the discriminator (212) comprises three network branches:
(a) a first network branch (212A) acquiring a Gaussian-blurred generator output (326A);
(b) a second network branch (212B) acquiring an identity image (326B); and (c) a third network branch (212C) acquiring the Laplacian of Gaussian of the generated output (326C), wherein a result from the discriminator (212) is obtained after a summation of the generated outputs from the three branches after each convolutional layer.
14. A computer program comprising program code which when executed by a computer causes the computer to perform the method (100) of claim 1.
15. An electronic component (404) to be mounted to a vehicle (402); the electronic component (404) operable to perform the method (100) of claim 1.