Generative Adversarial Network on Motion-Blur Image Restoration

Zhengdong Li
Department of Electronic and Computer Engineering, HKUST
Hong Kong S.A.R, CHINA
zlifd@connect.ust.hk
arXiv:2412.19479v1 [cs.CV] 27 Dec 2024

Abstract

In everyday life, photographs taken with a camera often suffer from motion blur due to hand vibrations or sudden movements. This phenomenon can significantly detract from the quality of the captured images, making it an interesting challenge to develop a deep learning model that utilizes the principles of adversarial networks to restore clarity to these blurred pixels. In this project, we focus on leveraging Generative Adversarial Networks (GANs) to effectively deblur images affected by motion blur. A GAN-based TensorFlow model is defined, trained, and evaluated on the GoPro dataset, which comprises paired street-view images featuring both clear and blurred versions. The adversarial training process between the Discriminator and the Generator helps to produce increasingly realistic images over time. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) are the two evaluation metrics used to provide quantitative measures of image quality, allowing us to evaluate the effectiveness of the deblurring process. A mean PSNR of 29.1644 and a mean SSIM of 0.7459, with an average deblurring time of 4.6921 seconds, are achieved in this project. The blurry pixels are sharper in the output of the GAN model, showing a good image restoration effect for real-world applications.

Figure 1. The workflow of the Generator and the Discriminator in the GAN with the GoPro dataset.
1. Introduction

The concept of the GAN was introduced by Goodfellow [1] in 2014. The architecture involves two primary components: the Generator (G) and the Discriminator (D). The Generator is responsible for creating synthetic images that mimic real ones in order to fool the Discriminator, while the Discriminator's role is to differentiate between real images and the fakes produced by the Generator. Specifically, the Discriminator outputs a value of 1 for real images and 0 for fake images. The objective is to minimize the loss of the Generator while simultaneously maximizing the performance of the Discriminator. This adversarial training process encourages the Generator to produce increasingly realistic images over time.
2. Related Work

Recently, Kupyn [4] developed a deblurring GAN model in the PyTorch framework, but it lacks details about the architectures of both the Generator and the Discriminator. The size of the model, such as the number of parameters, is also missing. Hence, this project fills in the missing details of a GAN model for deblurring motion-blurred images. In our approach, a lightweight version of the GoPro dataset was used to train the model, which was developed using Keras from the TensorFlow framework. The dataset provides pairs of clear and artificially blurred street-view images, around 500 images in total, each with a resolution of 1280x720. We will compare the sharpness of the blurry pixels between the input motion-blurred images and the output deblurred images to see the effectiveness of the trained GAN model. We will also compare the performance of our model, in terms of evaluation metrics such as PSNR and SSIM, to related works [4] [5].

3. Proposed Methods

Fig. 1 illustrates the working mechanism of image deblurring through a GAN in our proposed method. Unlike the usual GAN setup, which feeds random noise into the model [11], the Generator in our project takes blurred images from the GoPro dataset directly as input and generates a synthetic image that closely resembles a real image. Its main objective is to deceive the Discriminator into thinking that the generated image is real. Conversely, the Discriminator receives the synthetic images from the Generator and the corresponding real, clear images from the GoPro dataset, and seeks to identify them accurately. It aims to avoid being misled by the Generator and should ideally classify real images as "True" and fake ones as "False." This feedback loop provides guidance for the Generator to enhance its output, producing more realistic images that can trick the Discriminator over time.

In this adversarial setup, the Generator (G) and the Discriminator (D) have conflicting goals. The Discriminator attempts to maximize its output values, leveraging both the blurry images z and the clear images x, while the Generator works to minimize its own output values. This minimax relationship between the two components is key to improving the deblurring effectiveness. The dynamics of this min-max game can be encapsulated in Eq. 1:
$$\min_G \max_D V(D, G) = \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))] \tag{1}$$

where:

• $G$: the generator function, which generates images from the blurry images $z$ of the GoPro dataset.

• $D$: the discriminator function, which distinguishes between real and generated images.

• $V(D, G)$: the value function representing the adversarial game between $G$ and $D$.

• $\mathbb{E}_x$: the expectation over real images $x$.

• $D(x)$: the probability that the discriminator correctly identifies a real image.

• $\mathbb{E}_z$: the expectation over the blurry images $z$.

• $G(z)$: the generated image from the blurry image $z$.

• $D(G(z))$: the probability that the discriminator identifies the generated image as real.

• $\log D(x)$: the log probability of the discriminator correctly identifying a real image.

• $\log(1 - D(G(z)))$: the log probability of the discriminator incorrectly identifying a generated image as real.
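To make Eq. 1 concrete, the following is a minimal sketch of how the two log terms translate into per-network losses in Keras, under the paper's 1-for-real/0-for-fake convention. This is our illustrative reconstruction, not code from the project:

    import tensorflow as tf

    # Binary cross-entropy realizes the log terms of Eq. 1.
    bce = tf.keras.losses.BinaryCrossentropy()

    def discriminator_loss(real_output, fake_output):
        # D maximizes log D(x) + log(1 - D(G(z))): it should score
        # real (clear) images as 1 and generated images as 0.
        real_loss = bce(tf.ones_like(real_output), real_output)
        fake_loss = bce(tf.zeros_like(fake_output), fake_output)
        return real_loss + fake_loss

    def generator_loss(fake_output):
        # G minimizes log(1 - D(G(z))); equivalently, it is rewarded
        # when D scores its deblurred output as real (label 1).
        return bce(tf.ones_like(fake_output), fake_output)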
Also, to ensure that the Generator is deblurring the motion-blurred images, a perceptual loss function is applied at the output of the Generator. The perceptual loss function is defined in Eq. 2 as follows:

$$L_{\text{perceptual}}(y_{\text{true}}, y_{\text{pred}}) = \frac{1}{N} \sum_{i=1}^{N} \left\| \phi(y_{\text{true}}) - \phi(y_{\text{pred}}) \right\|^2 \tag{2}$$

where:

• $y_{\text{true}}$ is the ground-truth image.

• $y_{\text{pred}}$ is the generated image.

• $\phi$ is the feature extraction model (e.g., VGG16 [9]) applied to the images.

• $N$ is the number of features in the output of layer 'block3_conv3'.

• $\|\cdot\|$ denotes the Euclidean norm.
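A perceptual loss of this form can be sketched in Keras as below. VGG16 and the layer name 'block3_conv3' come from the definitions above; the frozen ImageNet weights and the absence of extra preprocessing are our assumptions, since the paper does not specify them:

    import tensorflow as tf
    from tensorflow.keras.applications import VGG16

    # Feature extractor phi: VGG16 truncated at 'block3_conv3'.
    vgg = VGG16(include_top=False, weights="imagenet",
                input_shape=(256, 256, 3))
    phi = tf.keras.Model(vgg.input,
                         vgg.get_layer("block3_conv3").output)
    phi.trainable = False  # phi is fixed; only the GAN is trained

    def perceptual_loss(y_true, y_pred):
        # Mean squared distance between VGG features, i.e. the
        # (1/N) * sum ||phi(y_true) - phi(y_pred)||^2 of Eq. 2.
        return tf.reduce_mean(tf.square(phi(y_true) - phi(y_pred)))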
4. Experiments

4.1. Training Process

The training process starts by segmenting each input image into a 256x256 matrix, which is then fed into the GAN model. Training progresses over 40 epochs, during which the data is organized with a batch size of 16. Both the Generator and the Discriminator are trained using a constant learning rate of 0.005. At the outset, the Generator creates a synthetic image, while the Discriminator learns to differentiate between this fake image and the real inputs. This iterative process continues to train the entire GAN model and persists until all 500 input images have been utilized. Since our working environment is macOS, which does not provide a CUDA GPU for acceleration, longer training times and a smaller dataset are inevitable. The whole training run typically spans about 4 hours on a 1.4 GHz Quad-Core Intel Core i5 machine with a single Intel Iris Plus Graphics 645 GPU.
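Putting the pieces together, the training procedure described above can be summarized by the sketch below, reusing the loss functions from Section 3. The hyperparameters (256x256 inputs, 40 epochs, batch size 16, learning rate 0.005) are those reported here; the Adam optimizer and the helpers `build_generator`, `build_discriminator`, and `load_gopro_pairs` are illustrative placeholders:

    import tensorflow as tf

    IMG_SIZE, BATCH_SIZE, EPOCHS, LR = 256, 16, 40, 0.005

    generator = build_generator()         # placeholder builders, see Sec. 4.2
    discriminator = build_discriminator()
    g_opt = tf.keras.optimizers.Adam(LR)  # optimizer choice is an assumption
    d_opt = tf.keras.optimizers.Adam(LR)

    @tf.function
    def train_step(blurred, sharp):
        with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
            fake = generator(blurred, training=True)
            real_out = discriminator(sharp, training=True)
            fake_out = discriminator(fake, training=True)
            d_loss = discriminator_loss(real_out, fake_out)
            # Adversarial term plus the perceptual term of Eq. 2.
            g_loss = generator_loss(fake_out) + perceptual_loss(sharp, fake)
        d_grads = d_tape.gradient(d_loss, discriminator.trainable_variables)
        g_grads = g_tape.gradient(g_loss, generator.trainable_variables)
        d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))
        g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))

    for epoch in range(EPOCHS):
        # (blurred, sharp) pairs of 256x256 crops from the light GoPro set
        for blurred, sharp in load_gopro_pairs(IMG_SIZE, BATCH_SIZE):
            train_step(blurred, sharp)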

4.2. GAN Architecture

Tab. 2 summarizes the size of the proposed GAN model in terms of the number of 2D convolutional layers and the number of parameters: 30 layers and 14.5M parameters, respectively. Tab. 1 summarizes the main training parameters of the GAN model and compares them to Kupyn [4].

                   this work            Kupyn [4]
    Learning Rate  0.005                0.0001
    Epochs         40                   300
    Batch Size     16                   1
    Dataset        Light GoPro (~2GB)   Full GoPro (~35GB)
    Framework      Keras                PyTorch

Table 1. Main training parameters in this project compared with Kupyn [4].

    Type           Conv2D Layers   Params
    Discriminator  6               3.10M
    Generator      24              11.40M
    Total          30              14.5M

Table 2. Structure of the GAN model, with the number of 2D convolutional layers and parameters shown.

In the Generator, 9 ResNet [2] blocks are used for sharper image production. Each ResNet block consists of a Conv2D layer followed by a Batch Normalization function and a ReLU activation function; the dropout rate is set to 0.5. ReLU and Tanh are used as the activation functions, with kernel sizes of either 3x3 or 7x7 and strides of either 1x1 or 2x2. Since both the input and the output of the Generator are images, the dimension is 256x256x3 at both the head and the tail, with multiple ResNet blocks in between, as shown in Fig. 2.

Figure 2. Graphical representation of the architecture of the Generator.
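A residual block of the kind described above, and a simplified view of how the Generator's head, 9 blocks, and Tanh tail fit together, might look as follows in Keras. The filter counts are assumptions (the paper specifies only kernel sizes, strides, and the dropout rate), and any intermediate down/upsampling stages are omitted for brevity:

    import tensorflow as tf
    from tensorflow.keras import layers

    def resnet_block(x, filters):
        # Conv2D -> BatchNorm -> ReLU with dropout 0.5, closed by
        # a second conv and a skip connection, as in Sec. 4.2.
        y = layers.Conv2D(filters, 3, strides=1, padding="same")(x)
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Dropout(0.5)(y)
        y = layers.Conv2D(filters, 3, strides=1, padding="same")(y)
        y = layers.BatchNormalization()(y)
        return layers.add([x, y])

    def build_generator():
        # 256x256x3 at both head and tail, 9 residual blocks between.
        inp = layers.Input((256, 256, 3))
        x = layers.Conv2D(64, 7, strides=1, padding="same",
                          activation="relu")(inp)
        for _ in range(9):
            x = resnet_block(x, 64)
        out = layers.Conv2D(3, 7, strides=1, padding="same",
                            activation="tanh")(x)
        return tf.keras.Model(inp, out)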

The Discriminator outputs 0 for a fake image and 1 for a real image. Its activation layers use Sigmoid, LeakyReLU, and Tanh functions, the kernel size is a constant 4x4 matrix, and the stride sizes are the same as in the Generator. The Discriminator's input is also an image, while its output is either '1' or '0'; hence, the dimension is 256x256x3 at the head and 1 at the tail, as shown in Fig. 3.

Figure 3. Graphical representation of the architecture of the Discriminator.
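Based on this description (six Conv2D layers with constant 4x4 kernels, strides of 1 or 2, and a Sigmoid output mapping a 256x256x3 image to a single score), a plausible Keras reconstruction is sketched below; the filter counts, the placement of LeakyReLU, and the pooling head are our assumptions:

    from tensorflow.keras import layers, models

    def build_discriminator():
        # Six 4x4 Conv2D layers; strides of 2 downsample, the last
        # layers use stride 1. Filter counts are illustrative.
        return models.Sequential([
            layers.Input((256, 256, 3)),
            layers.Conv2D(64, 4, strides=2, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Conv2D(128, 4, strides=2, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Conv2D(256, 4, strides=2, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Conv2D(256, 4, strides=2, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Conv2D(512, 4, strides=1, padding="same"),
            layers.LeakyReLU(0.2),
            layers.Conv2D(1, 4, strides=1, padding="same"),
            layers.GlobalAveragePooling2D(),  # single score per image
            layers.Activation("sigmoid"),     # 1 for real, 0 for fake
        ])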
4.3. Results

Two metrics, PSNR and SSIM, were utilized to assess the effectiveness of the deblurring process. PSNR (Peak Signal-to-Noise Ratio) quantifies the ratio between the maximum potential power of a signal and the power of the noise that degrades its fidelity. It is computed from the logarithm of the ratio of the maximum possible pixel value to the mean squared error (MSE) between the original and processed images. A higher PSNR value signifies a better-quality reconstruction.

SSIM (Structural Similarity Index Measure), on the other hand, evaluates the structural similarity between the original and processed images. It considers three factors: luminance, contrast, and structure. SSIM is computed by analyzing the similarity of these three components across corresponding patches of the original and processed images. SSIM values range from -1 to 1, where a value of 1 indicates perfect similarity.
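Both metrics have built-in implementations in TensorFlow, so the per-image evaluation can be sketched as follows (assuming image tensors scaled to [0, 1]; `max_val` must match the pixel range):

    import tensorflow as tf

    def evaluate_pair(sharp, deblurred):
        # PSNR: log-ratio of peak signal power to MSE; higher is better.
        psnr = tf.image.psnr(sharp, deblurred, max_val=1.0)
        # SSIM: local comparison of luminance, contrast, and structure.
        ssim = tf.image.ssim(sharp, deblurred, max_val=1.0)
        return float(psnr), float(ssim)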
This project achieves a mean PSNR of 29.1644 and a mean SSIM of 0.7459, as shown in Tab. 3. The average deblurring time per image is around 4.6921 seconds. Compared to our main reference [4] and other related works, as shown in Tab. 4, we obtain competitive results for PSNR, which is within 0.46 of [4] even though we use the lightweight version of the GoPro dataset. For SSIM, we still have room to improve, as we are numerically 0.16 to 0.208 behind. One reason might be the use of the lightweight dataset, which is only around 2GB. We believe that the evaluation metrics, PSNR and SSIM, could be further improved with a larger dataset, such as the original full GoPro dataset, which is 35GB in total.
    Metrics   Highest   Lowest    Mean
    PSNR      30.7574   26.4420   29.1644
    SSIM      0.7995    0.6499    0.7459
    Time (s)  5.51      4.1807    4.6921

Table 3. Evaluation results for PSNR, SSIM, and per-image deblurring time (in seconds) for the trained GAN model.

               PSNR ↑   SSIM ↑
    this work  29.16    0.75
    [4]        28.7     0.958
    [5]        29.55    0.934
    [7]        28.93    0.91

Table 4. Comparison of PSNR and SSIM on the GoPro dataset.

For graphical representations, visualizations of the deblurring performance of the GAN model are shown in Fig. 4, Fig. 5, and Fig. 6; in each figure, the left image is the blurry input and the right image is the deblurred output. Fig. 4 shows one of the sample images from the GoPro dataset used in the evaluation part. Since the training process also uses related images, images of this type unsurprisingly exhibit good deblurring performance; for example, the edges of the whole building are sharper after deblurring by the GAN model. Fig. 6 shows self-proposed motion-blurred images, both indoor and outdoor, which also yield good deblurring quality, as can be clearly seen in the edges of the wall in the middle part of Fig. 6. However, the deblurring effect is less significant on self-proposed blurry images that do not contain enough motion-blurred pixels, such as in Fig. 5. One reason could be that the input image is not fully motion-blurred, which makes it challenging for the Generator to produce a clear image based on its training on the GoPro dataset. This is also another reason why we do not achieve a higher SSIM in the metrics evaluation above, since not all the input images are truly motion-blurred.

Figure 4. Sample output using a blurry image from the GoPro dataset in the validation part.

Figure 5. Sample output with self-proposed blurry images.

Figure 6. Another sample output with self-proposed blurry images. Since these input images contain more blurry pixels, the output images show better deblurred quality; for instance, the edge of the wall in the middle of the figure is clearer than before.
5. Future Work

We aim to improve the deblurring performance of the GAN model, yielding higher SSIM and PSNR in the evaluation metrics and producing sharper images in the graphical representations. Hence, directions for future work may include, but are not limited to: (1) constructing a deeper neural network architecture, with more convolutional layers and parameters, so that the model can better learn features during training; (2) training with a larger dataset, such as [3] and [10], which comprise over 70 captured real-world videos, in order to generate more training images; (3) working in a CUDA GPU environment to shorten the training time; (4) exploring different architectures such as U-Net [8], which has shown promise in image-to-image translation tasks, capturing more complex features and improving the quality of the generated images; and (5) leveraging a pre-trained model as a backbone, which, having already learned useful features from a large dataset, can accelerate training and improve performance.

As a potential application, we aim to utilize a pre-trained GAN model to deblur motion-blurred images captured by smartphones. This approach seeks to address the common issue of poor-quality images that arise from hand vibrations, particularly when capturing fast-moving subjects. By leveraging the capabilities of GANs, we hope to enhance image clarity and restore details that are typically lost to motion blur. However, a significant challenge is the difficulty of obtaining authentic motion-blurred images from smartphones. Many modern devices are equipped with advanced optical image stabilization (OIS) [6] features that effectively mitigate motion blur, resulting in clearer images even in shaky conditions. This technological advancement, while beneficial for everyday photography, poses a hurdle for our research, as it limits the availability of suitable training data for our GAN model. As shown in Fig. 5, it is somewhat difficult to obtain such motion-blurred images, and bad input quality in blurry images results in bad output quality in the deblurred images as well.

To overcome this obstacle, using existing datasets that contain examples of blurred images, or simulating motion blur in controlled environments, could be a solution. By addressing these challenges, we aim to develop a robust solution that enhances the quality of smartphone photography, ultimately providing users with clearer, more vibrant images even in less-than-ideal shooting conditions.
6. Conclusion

In this project, we focused on GANs to address the challenge of motion blur in images. Using a GAN-based model implemented in Keras, we trained and evaluated our system on the GoPro dataset, which contains paired street-view images that include both clear and blurred versions. This dataset provided an ideal foundation for our adversarial training approach, which consists of a Generator and a Discriminator competing with each other. Through this dynamic interaction, the model progressively enhances the quality of the generated images, ultimately leading to more realistic and visually appealing output. Throughout our experiments, we achieved a notable mean PSNR of 29.1644 and a mean SSIM of 0.7459, indicating a significant enhancement in image clarity and structural fidelity compared to the original blurred images. Additionally, the average deblurring time of 4.6921 seconds demonstrates the efficiency of our approach in real-world applications. The results obtained from our GAN model highlight the effectiveness of adversarial training for image restoration. The output images exhibit sharper details and improved clarity, showcasing the potential of GANs in tackling real-world challenges associated with motion blur. This work not only underscores advancements in image processing techniques but also opens avenues for further research and development in the field of computer vision, where GANs can be applied to various image enhancement tasks. Overall, our findings contribute to the growing body of knowledge surrounding generative models and their practical applications in restoring image quality.

References

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[3] Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, Deva Ramanan, and Simon Lucey. Need for speed: A benchmark for higher frame rate object tracking. In Proceedings of the IEEE International Conference on Computer Vision, pages 1125–1134, 2017.

[4] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8183–8192, 2018.

[5] Orest Kupyn, Tetiana Martyniuk, Junru Wu, and Zhangyang Wang. DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8878–8887, 2019.

[6] Tzuu-Hseng S. Li, Ching-Chang Chen, and Yu-Te Su. Optical image stabilizing system using fuzzy sliding-mode controller for digital cameras. IEEE Transactions on Consumer Electronics, 58(2):237–245, 2012.

[7] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3883–3891, 2017.

[8] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241. Springer, 2015.

[9] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[10] Shuochen Su, Mauricio Delbracio, Jue Wang, Guillermo Sapiro, Wolfgang Heidrich, and Oliver Wang. Deep video deblurring for hand-held cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1279–1288, 2017.

[11] Linh Duy Tran, Son Minh Nguyen, and Masayuki Arai. GAN-based noise model for denoising real images. In Proceedings of the Asian Conference on Computer Vision, 2020.
