Project Report Draft
CERTIFICATE

This is to certify that the report entitled “…………………………..……….” submitted in partial fulfillment of the requirements for the award of the Degree of Master of Computer Science/Computer Application is a record of bonafide work done by Mr. / Ms. …………………………………. during the period from …………………………………. to ……………………….. of his / her study in the Department of Computer Application under my supervision and guidance.

Head of the Department                    Signature of the Guide

Signature of the Examiner

DECLARATION

I ………………………………………………… hereby declare that the report entitled “……………………………………………..…” submitted in partial fulfillment of the requirements for the award of the Degree of Master of Computer Science/Computer Application is a record of bonafide work done by me during the period from ………… to ……….…………….. under the supervision and guidance of ….…………………………………………….., Department of ………………………………………….

Signature of the Candidate

ACKNOWLEDGEMENT

I thank God Almighty for bestowing His blessings upon me to proceed with and complete the work. I have great pleasure in acknowledging the help given by various individuals throughout the project work. This project is itself an acknowledgement of the inspiration, drive and technical assistance contributed by many individuals.

I express my sincere and heartfelt gratitude to Dr. M V JUDY, Head of the Department, and the faculty members for being helpful and cooperative during the period of the project.

I also express my deep gratitude to my guide, Dr. ARUN K S, for his valuable guidance and timely suggestions that helped in the completion of this project in time.

A special mention for the coordinator. I extend my sincere thanks to all the teaching and non-teaching staff for providing the necessary facilities and help.

Without the support of any one of them this project would not have been a reality.

TABLE OF CONTENTS

ABSTRACT…………………………………………………………………...…..5
1. INTRODUCTION…………………………………………………………….6
2. THEORETICAL BACKGROUND…………………………………………..7
   2.1 MACHINE LEARNING…………………………………………………7
   2.2 SUPER RESOLUTION…………………………………………………..8
3. MODELS COMPARED………………………………………………………9
   3.1 ESRGAN………………………………………………………………...9
   3.2 DBPN……………………………………………………………………11
4. COMPARISON METHODOLOGY………………………………………...13
   4.1 SSIM…………………………………………………………………….13
   4.2 PSNR……………………………………………………………………14
5. COMPARISON RESULTS…………………………………………………15
6. SCOPE……………………………………………………………………..16
7. DISCUSSION AND CONCLUSIONS……………………………………...17
   7.1 CONCLUSION………………………………………………………...17
   7.2 FUTURE ENHANCEMENTS…………………………………………18
8. REFERENCES……………………………………………………………..19
APPENDIX……………………………………………………………………20

ABSTRACT
In the era of visual information processing, the demand for high-resolution images
continues to escalate across various domains, including medical imaging, satellite
imagery, and entertainment. Super-resolution strategies play a pivotal role in meeting
this demand by enhancing the quality of images, enabling finer details, and improving
overall visual perception.

This study conducts a comprehensive comparative analysis of two leading super-resolution strategies, namely Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) and Deep Back-Projection Networks (DBPN). The research explores their effectiveness in addressing the challenges posed by low-resolution images, providing insights into their capabilities, limitations, and applicability. To quantitatively assess the performance, the study employs metrics such as the Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR), providing a comprehensive evaluation of the strategies in terms of image fidelity and perceptual quality. ESRGAN leverages generative adversarial networks to generate perceptually convincing high-resolution images, while DBPN employs an iterative error-correcting feedback mechanism for guided reconstruction, considering both up- and down-projection errors. The study investigates their performance across diverse scenarios, emphasizing the handling of large scaling factors and the mutual LR-HR relationship established through connected up- and down-sampling stages.

1. INTRODUCTION

In today's context of visual data processing, the need for high-resolution images has
risen to prominence, spanning various sectors including medical imaging, satellite
surveillance, and entertainment. The need for intricate details and enhanced visual
fidelity is ever-increasing, driven by the desire for more accurate analysis, precise
diagnostics, and an immersive visual experience.

To address this demand, super-resolution strategies have emerged as crucial tools, offering solutions to enhance the quality of images and bridge the gap between the low- and high-resolution domains. These strategies play a pivotal role in transforming pixelated or degraded images into sharper, more detailed renditions. By doing so, they contribute significantly to improving overall visual perception, aiding professionals in fields where image clarity is of utmost importance.

This study delves into the realm of super-resolution strategies, focusing on a comparative analysis of two leading approaches: Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) and Deep Back-Projection Networks (DBPN). The research aims to shed light on the effectiveness of these strategies in addressing the challenges associated with low-resolution images. By providing a nuanced understanding of their capabilities and limitations, the study seeks to guide practitioners and researchers in selecting the most suitable techniques for specific application scenarios.

2. THEORETICAL BACKGROUND

2.1 MACHINE LEARNING

Machine learning, a subfield of artificial intelligence, empowers computers to learn from data without explicit programming. This is achieved through statistical and computational techniques that enable algorithms to extract patterns from data and make predictions or decisions.

There are three main types of machine learning: supervised learning, unsupervised
learning, and reinforcement learning. Supervised learning involves training a machine
learning model using labelled data. Labelled data consists of data points where the
correct answer is known, such as images with labels identifying the objects they
contain. By identifying patterns in the labelled data, the model can learn to predict
outcomes for new, unlabeled data.

Unsupervised learning involves training a machine learning model using unlabeled data. Unlike labelled data, unlabeled data does not have any associated labels, making it more challenging for the model to learn patterns. However, unsupervised learning algorithms can still be used to identify hidden patterns and relationships in the data.

Reinforcement learning involves training a machine learning model to make decisions in an environment where it receives feedback in the form of rewards or penalties. The model learns through trial and error, optimizing its actions to maximize its reward.

Machine learning has a wide range of applications across various domains, including
natural language processing (NLP), image and video recognition, fraud detection,
recommendation systems, and predictive analytics. As data collection and computing
power continue to grow, machine learning will undoubtedly play an increasingly
crucial role in shaping the future. Its ability to extract insights from vast amounts of
data has the potential to revolutionize various industries and aspects of our lives.

2.2 SUPER RESOLUTION

Super-resolution (SR) is a machine learning technique dedicated to enhancing the resolution of low-resolution (LR) images, ultimately producing high-resolution (HR) counterparts. This process involves leveraging machine learning algorithms to grasp the intricate relationship between LR and HR images. The two primary categories of SR are single-image super-resolution (SISR) and multi-image super-resolution (MISR).

SISR uses a single LR image to generate an HR counterpart, employing deep learning models to capture the LR-HR image relationship. Noteworthy algorithms in this domain include SRCNN (Super-Resolution Convolutional Neural Network), FSRCNN (Fast Super-Resolution Convolutional Neural Network), EDSR (Enhanced Deep Super-Resolution), and SRGAN (Super-Resolution Generative Adversarial Network).

Conversely, MISR uses multiple LR images, applying a registration step for alignment before employing deep learning models. Notable algorithms in this area include EM (Expectation-Maximization), BM3D (Block-Matching and 3D Filtering), RCAN (Residual Channel Attention Network), and RDN (Residual Dense Network).

The applications of SR are extensive, ranging from the medical field, where it
enhances the resolution of images like MRI and CT scans for improved disease
diagnosis, to satellite imaging, natural disaster tracking, and security threat
identification. In the realm of security, SR enhances the resolution of footage,
facilitating law enforcement agencies in criminal identification and crime resolution.

Looking ahead, SR remains a dynamic field of research with continual algorithmic developments. These advancements increasingly enhance accuracy and efficiency, pushing the boundaries to generate HR images from LR counterparts that are virtually indistinguishable from real HR images. The future of SR holds promise as it continues to evolve, meeting the escalating demands for higher image resolutions across various domains.

3. MODELS COMPARED

3.1 ESRGAN

ESRGAN, or Enhanced Super-Resolution Generative Adversarial Network, is an advanced deep learning model developed for image super-resolution. It is an improvement upon the original SRGAN, designed to produce even higher-quality and more realistic high-resolution images from low-resolution inputs.

SRGAN is a powerful deep learning technique for enhancing the resolution of low-
resolution (LR) images. It is based on the generative adversarial network (GAN)
architecture, which pits two neural networks against each other in a generative and
adversarial process. The SRGAN architecture comprises two primary components: the
generator and the discriminator.

The generator in SRGAN is a deep convolutional neural network responsible for transforming low-resolution images into high-resolution ones. It typically consists of multiple convolutional, activation, and normalization layers. Unlike simpler super-resolution methods that use bicubic interpolation, the generator in SRGAN learns complex mappings from low to high resolution through training on a dataset of paired high- and low-resolution images. One of the distinctive features of SRGAN's generator is the use of residual blocks. These blocks allow the network to learn residual features, making it easier to capture and represent intricate details during the super-resolution process.
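The residual idea can be illustrated with a minimal NumPy sketch, in which a single dense layer stands in for SRGAN's convolutional layers (all names and values here are illustrative, not the actual architecture):

```python
import numpy as np

def residual_block(x, weight):
    """A toy residual block: the layer learns only the residual F(x),
    and the input is added back through a skip connection."""
    fx = np.tanh(x @ weight)   # stand-in for conv + activation
    return x + fx              # skip connection: output = input + residual

x = np.zeros((1, 4))           # a zero input
w = np.random.randn(4, 4)
y = residual_block(x, w)
# With a zero input the residual is tanh(0) = 0, so the block passes the
# input through unchanged: the identity mapping is trivially easy to learn,
# which is why residual blocks ease the training of very deep networks.
```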

The discriminator in SRGAN is typically a convolutional neural network (CNN) designed to classify whether an input image is real (high-resolution) or fake (generated by the generator). It also consists of multiple convolutional, activation, and normalization layers, similar to the generator.

The input to the discriminator is a high-resolution image, either from the real dataset
or generated by the generator. The discriminator's architecture may include pooling
layers, convolutional layers, and fully connected layers to capture hierarchical features
at different scales. It utilizes a binary classification output, where the network outputs
a probability indicating whether the input image is real or generated.

ESRGAN improves upon SRGAN in three aspects:

1. It adopts a deeper model using Residual-in-Residual Dense Blocks (RRDB) without batch normalization layers.
2. It employs the relativistic average GAN instead of the vanilla GAN.
3. It improves the perceptual loss by using the features before activation.
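Improvement (2) can be sketched numerically. Assuming C(·) denotes the discriminator's raw (pre-sigmoid) output for a batch of images, the relativistic average discriminator loss looks roughly like the following (a simplified NumPy sketch, not ESRGAN's actual training code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ragan_d_loss(c_real, c_fake):
    """Relativistic average discriminator loss: the discriminator judges
    whether a real image is *more realistic* than the average fake (and
    vice versa) instead of making an absolute real/fake decision."""
    d_real = sigmoid(c_real - c_fake.mean())   # real relative to average fake
    d_fake = sigmoid(c_fake - c_real.mean())   # fake relative to average real
    eps = 1e-12                                # guard against log(0)
    return -(np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean())

# c_real / c_fake are raw critic outputs for a batch of images.
c_real = np.array([2.0, 1.5, 2.5])
c_fake = np.array([-1.0, -2.0, -1.5])
loss = ragan_d_loss(c_real, c_fake)   # small loss: reals clearly beat fakes
```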

The refinement of SRGAN involves a paradigm shift towards a more profound
architecture, leveraging the Residual-in-Residual Dense Blocks (RRDB) to enhance
feature extraction for nuanced super-resolution. Simultaneously, the deliberate
exclusion of batch normalization layers is aimed at mitigating potential artifacts,
thereby preserving the high-frequency details in the generated images. The adoption
of the Relativistic Average GAN (RaGAN) introduces a discerning adversarial
training strategy, enabling the discriminator to compare the relative realism of real and
generated samples for improved stability and convergence during training.
Furthermore, the enhancement of perceptual loss by utilizing features before
activation acknowledges the importance of capturing intricate details, emphasizing a
more nuanced and accurate representation of complex visual patterns.
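The benefit of taking features before activation can be seen in a small numerical sketch (the feature values below are hypothetical; in ESRGAN they would come from a VGG layer):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical pre-activation feature responses for a generated image
# and its high-resolution target.
feat_gen    = np.array([-1.0, 0.5, -2.0, 3.0])
feat_target = np.array([-3.0, 0.5, -0.5, 3.0])

loss_pre  = np.mean((feat_gen - feat_target) ** 2)              # before activation
loss_post = np.mean((relu(feat_gen) - relu(feat_target)) ** 2)  # after activation

# ReLU zeroes all negative responses, so the two images look identical
# post-activation even though their pre-activation features differ;
# computing the perceptual loss before activation preserves that signal.
assert loss_post == 0.0 and loss_pre > 0.0
```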

The adversarial training process in ESRGAN involves the generator and discriminator
networks. The generator aims to produce high-resolution images that are realistic and
visually convincing, while the discriminator is trained to distinguish between real
high-resolution images and those generated by the model.

The discriminator network typically uses a binary cross-entropy loss function during
training. This loss function measures the difference between the predicted probability
distribution (output of the discriminator) and the true distribution (real or fake label).
The use of adversarial training introduces a dynamic interplay between the generator's
perceptual loss and the discriminator's binary cross-entropy loss, leading to the
convergence of the model towards generating high-quality, realistic images.
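For reference, the binary cross-entropy loss described above can be written out directly (a minimal sketch; practical implementations usually work on logits for numerical stability):

```python
import numpy as np

def bce_loss(p, label):
    """Binary cross-entropy between the discriminator's predicted
    probability p that the image is real and the true label
    (1 = real high-resolution image, 0 = generated image)."""
    eps = 1e-12                      # guard against log(0)
    return -(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# A confident, correct prediction incurs almost no loss ...
low = bce_loss(0.99, 1)
# ... while a confident, wrong one is penalized heavily, which is the
# gradient signal that drives the adversarial game.
high = bce_loss(0.01, 1)
```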

3.2 DBPN
Deep Back-Projection Networks (DBPN) represent a novel approach to single-image
super-resolution, deviating from traditional feed-forward architectures. The core idea
involves iterative up- and downsampling layers, incorporating an error feedback
mechanism to rectify projection errors at each stage. Unlike conventional methods that
focus on a one-way mapping from low-resolution (LR) to high-resolution (HR),
DBPN establishes a mutual connection between LR and HR images. This is achieved
through alternating up- and down-sampling stages, each addressing different types of
image degradation and HR components.

In proposing an iterative error-correcting feedback mechanism for single-image super-resolution, DBPN calculates both up- and down-projection errors. This iterative process guides the reconstruction towards improved results by characterizing or constraining features in the early layers based on the projection errors. Unlike traditional feed-forward architectures that represent a one-way mapping, DBPN introduces mutually connected up- and down-sampling stages. Addressing the limitations of feed-forward approaches, especially at large scaling factors, the network focuses on generating HR features using upsampling layers and projecting them back to the LR space using downsampling layers. This mutual connection underscores the interdependence of LR and HR images.

DBPN's distinctive feature lies in its utilization of deep concatenation. The network
can represent various image degradations and HR components, facilitating the
reconstruction of the HR image. Through deep concatenation of HR feature maps
from all upsampling steps, DBPN directly employs different types of LR-to-HR
features without propagating them through sampling layers. An innovative
improvement comes with dense connections incorporated into each up- and down-
sampling stage. Inspired by the concept of dense connections in neural networks, this
enhancement aims to encourage feature reuse, fostering more effective information
flow and utilization across different network stages. The resulting dense connections
contribute significantly to the overall accuracy of DBPN.

The proposed D-DBPN can be divided into three parts: initial feature extraction, projection, and reconstruction. Here, let conv(f, n) be a convolutional layer, where f is the filter size and n is the number of filters.

1. Initial feature extraction: Initial LR feature maps L0 are constructed from the input using conv(3, n0). Then conv(1, nR) is used to reduce the dimension from n0 to nR before entering the projection step, where n0 is the number of filters used in the initial LR feature extraction and nR is the number of filters used in each projection unit.

2. Back-projection stages: Following initial feature extraction is a sequence of projection units, alternating between construction of LR and HR feature maps Ht and Lt; each unit has access to the outputs of all previous units.

3. Reconstruction: Finally, the target HR image is reconstructed as Isr = fRec([H1, H2, ..., Ht]), where fRec uses conv(3, 3) as the reconstruction layer and [H1, H2, ..., Ht] refers to the concatenation of the feature maps produced in each up-projection unit.
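Assuming simple fixed sampling operators in place of DBPN's learned (de)convolutions, the error-feedback computation inside a single up-projection unit can be sketched as follows (an illustrative simplification, not the paper's implementation):

```python
import numpy as np

def up(x, s=2):
    """Nearest-neighbour upsampling (stand-in for a learned deconvolution)."""
    return np.kron(x, np.ones((s, s)))

def down(x, s=2):
    """Average-pool downsampling (stand-in for a learned strided convolution)."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def up_projection(lr, s=2):
    """One DBPN-style up-projection unit: upsample, project back down,
    measure the LR-space error, and use it to correct the HR estimate."""
    h0 = up(lr, s)            # tentative HR feature map
    l0 = down(h0, s)          # project back to LR space
    err = lr - l0             # back-projection error in LR space
    return h0 + up(err, s)    # correct the HR map with the upsampled error

lr = np.random.rand(4, 4)
hr = up_projection(lr)
# For these simple operators the corrected HR map is exactly consistent
# with the input: projecting it back down reproduces lr.
assert np.allclose(down(hr), lr)
```

A down-projection unit mirrors this with the roles of the two spaces swapped, which is the mutual LR-HR connection described above.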

4. COMPARISON METHODOLOGY
4.1 SSIM
Structural Similarity Index (SSIM) is a metric used to assess the structural similarity
between two images. It is based on the idea that the human visual system perceives
image quality based on three main factors: luminance, contrast, and structure.
SSIM is calculated by comparing the luminance, contrast, and structure of two
images. The luminance term measures the similarity of the average brightness of the
two images. The contrast term measures the similarity of the local contrast of the two
images. The structure term measures the similarity of the textures of the two images.
The SSIM score is a value between -1 and 1, where 1 indicates that the two images are
identical and -1 indicates that the two images are completely different. A score of 0
indicates that the two images are uncorrelated.
The SSIM score is calculated as a weighted combination of the luminance, contrast,
and structure terms. The formula is as follows:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ

where:
• x and y are the two images being compared
• l(x, y) is the luminance term
• c(x, y) is the contrast term
• s(x, y) is the structure term
• α, β, and γ are exponents that control the relative importance of each term

The exponents α, β, and γ are typically all set to 1, giving the three terms equal weight. Each term also contains a small stabilizing constant, commonly C1 = (0.01L)² for luminance and C2 = (0.03L)² for contrast and structure, where L is the dynamic range of the pixel values; these constants were chosen based on empirical studies so that the metric aligns with human perception of image quality.
The SSIM score is a good measure of image quality because it is based on the human
visual system. It is also robust to noise and distortions, so it is a good measure of
image quality in real-world applications.
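As a concrete illustration, a simplified single-window SSIM can be computed directly from the formula above (the reference implementation instead averages local scores over a sliding Gaussian window, so values will differ in practice):

```python
import numpy as np

def ssim_global(x, y, L=255.0):
    """Simplified SSIM computed over the whole image as one window,
    with alpha = beta = gamma = 1 and the standard constants
    C1 = (0.01 L)^2 and C2 = (0.03 L)^2."""
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    # Combined luminance/contrast/structure form of the SSIM index.
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

x = np.random.rand(32, 32) * 255
print(ssim_global(x, x))   # identical images score exactly 1
```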
SSIM is sensitive to local variations in the image, such as noise or illumination
changes. These local variations can affect the SSIM score, even if they do not
significantly impact the perceived quality of the image.
SSIM is based on a simplified model of human visual perception, and it may not
always align perfectly with subjective human judgments of image quality. There are
certain types of image distortions or artifacts that may not significantly affect the
SSIM score but can still be perceived as detrimental to image quality.

4.2 PSNR
The PSNR is defined as the ratio between the maximum possible power of a signal
and the power of the corrupting noise that affects the signal. In the context of image
quality assessment, the signal is the original image and the noise is the difference
between the original image and the reconstructed image. A higher PSNR value
indicates a higher quality reconstructed image.

The PSNR (in decibels) is computed as

PSNR = 10 · log10(R² / MSE),   MSE = (1 / (M · N)) · Σᵢ Σⱼ [x(i, j) − y(i, j)]²

where:
• R is the maximum possible value of a pixel in the image (typically 255 for 8-bit images)
• MSE is the mean squared error between the original image and the reconstructed image
• M and N are the width and height of the image, respectively
• x is the original image
• y is the reconstructed image
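The definition above translates directly into a few lines of code (a minimal sketch for grayscale images):

```python
import numpy as np

def psnr(x, y, R=255.0):
    """Peak Signal-to-Noise Ratio in dB between an original image x
    and a reconstruction y."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")        # identical images: no corrupting noise
    return 10.0 * np.log10(R ** 2 / mse)

x = np.full((8, 8), 100.0)
y = x + 10.0                       # uniform error of 10 per pixel
# MSE = 100, so PSNR = 10 * log10(255^2 / 100) ≈ 28.13 dB
print(psnr(x, y))
```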
PSNR is versatile and applicable to various image processing tasks, including image
compression, image denoising, and image super-resolution. In image compression,
PSNR is used to evaluate the quality of compressed images by comparing them to the
original uncompressed images. A higher PSNR value for the compressed image
indicates that it has retained more of the original image's details and visual quality. In
image denoising, PSNR is used to assess the effectiveness of denoising algorithms by
comparing the denoised image to the original noisy image. A higher PSNR value for
the denoised image indicates that it has successfully removed noise while preserving
the original image's structure and details. In image super-resolution, PSNR is used to
evaluate the quality of super-resolved images by comparing them to the original high-
resolution images.
While PSNR is a widely used metric for assessing image quality, it is not always a
reliable indicator of perceived image quality. Images with different PSNR values may
have the same perceived quality due to factors like the human visual system's ability
to adapt to noise and imperfections. Additionally, PSNR's sensitivity to noise can lead
to overestimation of image quality when noise is present, even if it does not
significantly affect the perceived quality. Furthermore, the relationship between PSNR
and image quality is not always monotonic, meaning that images with higher PSNR
values may not always have better perceived quality.

5. COMPARISON RESULTS
In the comparative analysis between ESRGAN and DBPN, the evaluation metrics,
specifically SSIM and PSNR values, favored DBPN over ESRGAN. The quantitative
measures indicated that DBPN exhibited higher values for both SSIM and PSNR,
suggesting superior performance in terms of image quality and fidelity. However, it's
worth noting that subjective assessment by the human eye revealed that ESRGAN
produced results with a more sharpened appearance. This highlights the nuanced
nature of image quality evaluation, where quantitative metrics may align with certain
algorithms, while subjective perceptions by human observers can also play a crucial
role in assessing visual appeal. It is important to emphasize that the perception of
sharpness does not always equate to correctness or optimal visual fidelity. While
ESRGAN may provide a sharpened appearance, the overall judgment of image quality
involves a complex interplay of various factors, including artifacts, details, and
subjective preferences.

6. SCOPE
Based on the results obtained, ESRGAN demonstrates notable strengths in scenarios
where a visually sharpened appearance is a priority. Its subjective performance, as
perceived by the human eye, suggests a capacity for producing images with enhanced
sharpness. Therefore, ESRGAN may find particular suitability in applications where a
visually striking and sharp representation is crucial, such as in certain forms of artistic
rendering or visual content where perceptual quality is paramount.

On the other hand, DBPN emerges as a robust choice when prioritizing quantitative metrics. Its superior performance in these measures indicates a capacity for
maintaining high image quality and fidelity. This makes DBPN well-suited for
applications where precise and quantifiable image quality is essential, such as in
medical imaging or scientific analysis.

In essence, the choice between ESRGAN and DBPN can be tailored to the specific
requirements of the application, whether it prioritizes subjective visual appeal or
demands high precision and fidelity based on quantitative metrics.

7. DISCUSSION AND CONCLUSIONS
7.1 CONCLUSION
In conclusion, the comparative analysis of ESRGAN and DBPN in the realm of super-
resolution strategies has provided valuable insights into their respective strengths and
weaknesses. The quantitative evaluation, based on SSIM and PSNR metrics, indicated
that DBPN excelled in terms of image quality and fidelity. However, the subjective
assessment by human observers revealed a nuanced aspect—ESRGAN's results were
perceived to have a more sharpened appearance.

This duality underscores the multifaceted nature of image quality assessment, where
quantitative metrics may capture certain aspects of performance, but subjective human
perception plays a pivotal role in determining overall visual appeal. The choice
between ESRGAN and DBPN may depend on specific use cases and preferences,
considering factors such as ease of training and algorithm complexity.

7.2 FUTURE ENHANCEMENTS
Future enhancements in super-resolution strategies could explore the integration of
hybrid models that leverage the strengths of both ESRGAN and DBPN. Combining
the perceptual sharpness of ESRGAN with the quantitative fidelity of DBPN could
result in a more versatile and adaptive super-resolution approach. Additionally, further
research could focus on developing real-time implementations of these strategies,
addressing the computational complexity associated with high-quality image
generation.

Continued advancements in deep learning architectures and training methodologies may also contribute to the refinement of super-resolution strategies. Exploring novel loss functions, regularization techniques, or attention mechanisms could further improve the generalization and adaptability of these models across diverse datasets and applications.

As the field of super-resolution evolves, interdisciplinary collaborations between experts in computer vision, machine learning, and application-specific domains could foster innovative approaches and solutions, pushing the boundaries of what is achievable in generating high-quality, high-resolution images.

8. REFERENCES

https://github.com/xinntao/ESRGAN

https://github.com/alterzero/DBPN-Pytorch

https://alterzero.github.io/projects/DBPN.html

https://www.tensorflow.org/hub/tutorials/image_enhancing

APPENDIX
SCREENSHOTS:
Using GPU

Creating directories and loading the dataset

ESRGAN

Downloading the pretrained model

Applying the model on the test images

DBPN

Downloading the pretrained model

Applying the model on the test images

COMPARISON

COMPARING USING SSIM AND PSNR

TABULATION

