
Applied Sciences | Article

Removal of Color-Document Image Show-Through Based on Self-Supervised Learning

Mengying Ni, Zongbao Liang * and Jindong Xu

School of Computer and Control Engineering, Yantai University, Yantai 264005, China;
nimengying@ytu.edu.cn (M.N.); xujindong@ytu.edu.cn (J.X.)
* Correspondence: liangzongbao@s.ytu.edu.cn

Abstract: Show-through phenomena have always been a challenging issue in color-document image
processing, which is widely used in various fields such as finance, education, and administration.
Existing methods for processing color-document images face challenges, including dealing with
double-sided documents with show-through effects, accurately distinguishing between foreground
and show-through parts, and addressing the issue of insufficient real image data for supervised
training. To overcome these challenges, this paper proposes a self-supervised-learning-based method
for removing show-through effects in color-document images. The proposed method utilizes a two-
stage-structured show-through-removal network that incorporates a double-cycle consistency loss
and a pseudo-similarity loss to effectively constrain the process of show-through removal. Moreover,
we constructed two datasets consisting of different show-through mixing ratios and conducted
extensive experiments to verify the effectiveness of the proposed method. Experimental results
demonstrate that the proposed method achieves competitive performance compared to state-of-
the-art methods and can effectively perform show-through removal without the need for paired
datasets. Specifically, the proposed method achieves an average PSNR of 33.85 dB on our datasets,
outperforming comparable methods by a margin of 0.89 dB.

Keywords: document image restoration; show-through removal; self-supervised learning; cycle generative adversarial network
1. Introduction

In the current era of digitization, with the widespread use of electronic devices, the processing of textual data has predominantly shifted towards digital formats. To convert paper documents into a digital form, methods such as scanning or capturing photos using smartphones can be employed. Consequently, document image processing has become increasingly significant, encompassing key steps such as image acquisition, image pre-processing, layout analysis, and optical character recognition (OCR) [1]. However, during the process of acquiring document images, transparent or uneven materials may result in a show-through effect, where the content from the reverse side of the document becomes visible on the front side. This "show-through" phenomenon, also referred to as "back-to-front interference" or "bleeding", significantly impacts subsequent text recognition and analysis. Therefore, document image pre-processing and optimization have emerged as important challenges that necessitate the application of image de-bleeding methods to restore and enhance the quality and legibility of document images.

Image restoration (IR) has long been a prominent and challenging research area in computer vision that is renowned for its complexities. In a multitude of application scenarios, IR holds paramount significance in delivering exceptional visual experiences and facilitating high-level visual applications. The primary objective of IR is to recover the ideal, pristine image x from the observed image y, a process that is typically associated with the following model:

y = H(x) + n    (1)

Here, H (·) represents the transfer function, describing the transformations undergone
by the image during the acquisition and transmission processes, while n denotes the
presence of noise, representing the existence of random interference.
Due to the diverse causes of image degradation, different IR tasks are closely associated
with their corresponding transmission functions H (·) and noises n. For instance, in the task
of document image dehazing, the transfer function H (·) can represent optical properties,
color distortions, and other phenomena introduced during scanning, capturing, or other
image-acquisition methods. The noise term represents random interference introduced
during the processes of image acquisition and transmission. These transformations may
result in the occurrence of a “show-through” phenomenon in document images, making
text, graphics, colors, and other information visible through the pages or paper. Based on
the appearance and underlying mechanisms of the show-through images, the document
image show-through model can be expressed as follows:

I_S = (1 − α) · I_GT + α · ϕ(S)    (2)

In this context, I_S represents the mixed image with a show-through effect; I_GT represents the foreground image (the front side of the document image); S represents the background image (the back side of the document image); and α represents the blending ratio of the show-through layer. Additionally, ϕ(·) denotes the bleeding-attenuation function. Resolving Equation (2) without any prior knowledge leads to an infinite number of solutions due to the unknown variables S, α, and ϕ(·), making the restoration of I_GT from I_S a
non-deterministic problem. Furthermore, the structure and attributes of the show-through
layer often resemble the content of the foreground image, posing challenges in removing
unnecessary information from the show-through layer while preserving the content and
details of the foreground image with reduced redundancy.
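To make the observation model concrete, the following NumPy sketch implements Equation (2) under stated assumptions: the verso content S is taken as a horizontally flipped page, the bleeding-attenuation function ϕ(·) is approximated by a Gaussian blur, and the blending ratio α is a free parameter (the synthetic data in Section 4.2 use a 5 × 5 kernel and α between 0.1 and 0.2). The function name and default values are illustrative and are not part of the paper's code.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def synthesize_show_through(front, back, alpha=0.15, sigma=1.0):
        # Equation (2): I_S = (1 - alpha) * I_GT + alpha * phi(S)
        verso = back[:, ::-1, :]                                 # mirrored back side, as seen through the paper
        bleed = gaussian_filter(verso, sigma=(sigma, sigma, 0))  # phi(.): bleeding attenuation, assumed Gaussian blur
        return (1.0 - alpha) * front + alpha * bleed             # observed show-through image I_S

    rng = np.random.default_rng(0)
    front = rng.random((256, 256, 3))   # stand-in for the clean front page I_GT
    back = rng.random((256, 256, 3))    # stand-in for the back page S
    mixed = synthesize_show_through(front, back, alpha=0.15, sigma=1.0)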
So far, many studies have adopted methods based on generative adversarial networks
(GANs) [2] to restore document images or natural images. Souibgui et al. [3] proposed
a robust end-to-end framework called document-enhancement GAN, which effectively
restores severely degraded document images using conditional GAN [4]. However, these
methods still require the use of a large number of paired datasets to train the model for
good experimental results. It is difficult to find real, clear images that match the degraded
document images. In addition, Liu et al. [5] proposed a single-image dehazing method
based on cycle GAN (CycleGAN) [6], which uses unpaired hazy images for training and
constrains the process of haze removal through a double-cycle consistency loss. This method is suitable for removing haze from images of natural scenes, but few studies have adopted the double-cycle consistency approach for document images.
Therefore, the removal of show-through effects in document images faces several
challenges:
a. The foreground and show-through parts in document images are highly similar
because they both comprise document content. Without additional prior knowledge,
it is difficult to accurately recover the desired foreground images.
b. In real-world scenarios, there is a lack of authentic images that match the show-
through images, making supervised training infeasible. As a result, most document
image processing algorithms rely on synthetic training data. However, there ex-
ists a gap between synthetic and real-world data, which limits the performance of
these algorithms.
To address these challenges, this paper proposes a self-supervised-learning-based
method for removing show-through effects in document images. This method, called the
CycleGAN-based color document show-through-removal network (CDSR-CycleGAN),
eliminates show-through effects without requiring paired datasets. The network framework
adopts a two-stage structure and employs double-cycle consistency loss and pseudo-
similarity loss constraints to guide the process of removal of image show-through. Two

datasets, each consisting of different show-through mixing ratios, are constructed for
training the document show-through-removal task.
The remaining parts of this article are organized as follows: Section 2 presents the
related work; Section 3 introduces the proposed method; Section 4 presents the experimental
results; Section 5 discusses the findings; and Section 6 concludes the paper.

2. Related Work
2.1. Show-through-Removal Method
Show-through refers to the presence of transparency or uneven materials in document
images during the scanning or capturing process, resulting in a visible transparency effect.
In 1994, Lins et al. [7] first described this “front-to-back interference” noise as “seen-
through.” Later, researchers referred to it as “show-through” or “bleed-through.” The initial
solution to this problem was proposed by Lins et al. [7], who suggested a “mirror-filtering”
strategy that scans both sides of the document, aligns the two images, and compares the
intensities of corresponding ink pixels. Sharma [8] initially analyzed the show-through
phenomenon based on physical principles and derived a simplified mathematical model.
Subsequently, they linearized the model and designed an adaptive linear filtering scheme
to eliminate the show-through effect. Rowley-Brooke et al. [9] introduced a database of
degraded document images affected by bleed-through and demonstrated its utility by
evaluating three non-blind bleed-through removal techniques against manually created
ground-truth foreground masks. Moghaddam et al. [10] segmented images from double-
sided ancient documents using a level-set framework and eliminated the influence of bleed-
through effects. The method comprises three key steps: binarization using adaptive global
thresholds, separation of interference patterns and genuine text using reverse diffusion
forces, and smoothing boundaries using small regularization forces. However, due to
variations in document image quality, the performance of traditional methods varies,
requiring a large number of empirical parameters to construct suitable solutions. He
et al. [11] proposed a novel DL-based document-enhancement-and-binarization model
(DeepOtsu), utilizing iterative refinement and superposition to achieve visually pleasing
results. Hanif et al. [12] proposed a document-restoration method aimed at eliminating
unwanted interfering degradation patterns from ancient manuscripts containing color.

2.2. GAN
GAN, proposed by Ian Goodfellow et al. [2] in 2014, is a deep learning (DL) [13] model
consisting of a generator and a discriminator. Its basic principle is to train the generator
and discriminator in an adversarial manner, allowing the generator to gradually generate
realistic samples. GAN has been widely applied in various fields such as images, audio,
and text due to its unique generation capabilities and capacity for unsupervised learning.
Poddar et al. [14] proposed TBM-GAN, a generative-adversarial-network framework
for synthesizing realistic handwritten documents with degraded backgrounds. De et al. [15]
developed a DL-based document image-binarization model employing dual discriminator
GAN and focal loss as the generator loss. Suh et al. [16] introduced a two-stage method
for color-document image enhancement and binarization using independent GANs. Lin
et al. [17] suggested a training method using GANs to address limited data availability.
Ju et al. [18] proposed CCDWT-GAN, a color channel-based discrete wavelet transform
GAN, for efficient extraction of text information from color degraded document images.
Zou et al. [19] introduced a method called “deep adversarial decomposition” for separating
individual layers of a single superimposed image. This method combines adversarial
training, a separation-critic network, and a crossroad L1 loss function to effectively handle
both linear and non-linear mixtures. However, previously developed GAN methods require
large amounts of paired data. Therefore, in this paper, we choose to use CycleGAN,
which does not require paired images.

2.3. Cycle Consistency Loss


CycleGAN is a special variant of traditional GAN proposed by Zhu et al. [6] in 2017.
There are some problems in previous GANs, such as difficult training, unstable genera-
tion results, and the need for paired training data. To solve these problems, CycleGAN
introduces the cycle consistency loss function.
Gangeh et al. [20] proposed an end-to-end unsupervised multi-document image blind
denoising method, which provides a unified model that can remove various artifacts from
various document types without the need for paired clean images. Torbunov et al. [21]
proposed a new image-transformation model that achieves better performance while main-
taining cycle consistency constraints, solving problems in the field of unpaired image
transformation. The model uses a vision transformer and adopts necessary training and
regularization techniques to achieve better performance. Wu et al. [22] proposed an unsu-
pervised method based on cycle consistency for blind image restoration. By constraining
low-frequency content, optimizing the content of training data, and using model-averaging
techniques, the quality of image restoration is improved. Wang et al. [23] proposed the
UDoc-GAN framework to solve the problem of uncontrolled lighting affecting document
images captured on mobile devices. UDoc-GAN performs document light correction in non-
paired settings, learning the relationship between normal and abnormal lighting domains
by predicting environmental light features and redefining cycle consistency constraints.
Xu et al. [24] proposed a single-image bleed-through image-restoration method based on
self-supervised learning. By leveraging the cyclic consistency of generative adversarial net-
works and the transfer-learning capability of constraints, this method achieves model training
without paired image sets. The approach incorporates a self-supervised-learning module
designed to extract supervised information from large-scale unsupervised data, effectively
enhancing the quality of texture, edges, and other detailed information in the bleed-through
image content, thus enabling the removal of bleed-through from a single image.
The above methods have verified the effectiveness of CycleGAN and cycle consistency loss. Building on this, we also note that CycleGAN lacks accurately matched label supervision during the training process. Therefore, we designed a network to provide a pseudo-similarity loss and used a double-cycle consistency loss to constrain the training process.

3. Proposed Methods
3.1. Overall Network Framework
The overall network framework of the self-supervised-learning-based color docu-
ment show-through-removal network, CDSR-CycleGAN, is illustrated in Figures 1 and 2.
CDSR-CycleGAN transforms the unpaired image show-through-removal problem into
an image-to-image generation problem, generating non-show-through images through a
self-supervised-learning cycle-consistency network. This section provides a detailed de-
scription of the network composition, which consists of two parts: a show-through-removal
framework and a show-through-generation framework.
In the diagram, X represents the input show-through image. De represents the show-
through-removal generator, and De( X ) represents the show-through-removal image. Y
represents the clear image. Re represents the show-through generator, and Re(Y ) represents
the synthetic show-through image. S represents the similarity network; S_De and S_Re
are, respectively, used to generate similarity non-show-through images and show-through
images. Dshow−through is the adversarial discriminator for classifying between the real
show-through image and the generated show-through image. Dnon−show−through is the
adversarial discriminator for distinguishing between the real non-show-through image
and the show-through-removal image. The light blue lines in the figure represent pseudo-
similarity loss and cycle-consistency loss, respectively.
Figure 1. The framework for show-through removal.

Figure 2. The framework for show-through generation.
Since Figures 1 and 2 share a similar framework structure, we will use Figure 1 as an example to illustrate the procedural steps of our method. As shown in Figure 1, the input image X with show-through is fed into the show-through-removal generator De, resulting in the bleeding-free image De(X). Then, De(X) is passed through both the show-through generator Re and the similarity network S_De, producing the bleeding-added image Re(De(X)) and the similarity image S(De(X)), respectively. Subsequently, the cycle consistency loss_1 and the pseudo-similarity loss are computed. Both De(X) and S(De(X)) are then fed into the discriminator D_non-show-through to determine whether they are bleeding-removed images. Finally, S(De(X)) is input into the show-through generator Re to generate the bleeding-added image Re(S(De(X))), and the cycle consistency loss_2 is computed.

3.2. Generative Network
(1) Generator De

Generator De is an integral component of the CDSR-CycleGAN model, aimed at effectively removing the show-through effect from input images and restoring the original image. As illustrated in Figure 3, it consists of three sub-networks: De_remove, De_predict, and De_refine. The first stage includes De_remove and De_predict. The primary role of the De_remove network is to initially eliminate the show-through effect of the image, making it easier for subsequent processes to restore the original image. The De_predict network is used to predict and locate foreground information, providing precise guidance for further processing. Lastly, in the second stage, the De_refine network adjusts and restores the image with removed show-through, further enhancing its visual appeal and realism. Specifically, the channel numbers of De_remove and De_predict are set to 64 for consistency, while the channel numbers of De_refine increase sequentially (64, 128, 256, 128, 64, and 3), capturing image features and details more effectively. Through the coordinated efforts of these three sub-networks, generator De efficiently removes the show-through effect from input images and generates visually superior, detail-rich images.

Figure 3. The network architecture of generator De.

Both De_remove and De_predict take input images with show-through effects. Firstly, De_remove undergoes shallow feature extraction to obtain preliminary features. Next, a triple feature attention module (TFAM) with multiple skip connections is applied to achieve deeper feature attention. Subsequently, further processing is conducted using two convolutional networks and a long-skip global residual learning module. Ultimately, an image with the initial removal of the show-through effect is obtained.

The TFAM network structure, illustrated in Figure 4, consists of three feature attention modules (FAMs) and a convolutional layer. The FAM network structure, shown in Figure 5, comprises local residual learning and channel-pixel attention (CPA) modules. Local residual learning allows less-important bleeding information to bypass through multiple local residual connections, while the main network focuses on effective foreground information. CPA retains shallow information and transfers it to deep layers, allowing De_remove to focus more on recovering effective information such as show-through overlapping regions, detail textures, and color accuracy. This not only enhances the restoration accuracy of De_remove but also reduces redundancy and improves efficiency.

Figure 4. The network architecture of TFAM.

Figure 5. The network architecture of FAM.

The De_predict network is used to predict and locate foreground information. Firstly, the receptive field is expanded through four sets of dilated convolutions and standard convolution operations to enable the network to capture more contextual information. This enhances the accuracy and precision of foreground prediction. Next, the global and local features of the De_predict network are integrated through long paths to extract more robust features. Attention mechanisms are then employed to quickly capture foreground information hidden in complex backgrounds. First, a 1 × 1 convolutional layer compresses the features of layer 13 into a vector, which is used to adjust the feature representation of the previous step as weights. These adjusted weights are element-wise multiplied with the output of layer 12 to extract more significant foreground features. Finally, the extracted foreground features are reconstructed together with the original input image to obtain a clear image.

De_refine consists of an encoder and a decoder, using an encoder-decoder structure similar to that of CycleGAN. Nine context-aware blocks (CABs) are stacked between the encoder and decoder to increase the receptive field and contextual dependency, as shown in Figure 6. CAB achieves this end through feature transformation, a context-attention mechanism, and residual connections. The encoder and decoder are connected through residual connections, allowing the De_refine network to be end-to-end mapped and speeding up convergence while improving the quality of generated images. In the De_refine network, the encoder extracts features and downsamples the input image through components such as convolution, batch normalization (IN), a ReLU activation function, dropout, and max pooling. The decoder then upsamples and reconstructs the features extracted by the encoder through components such as transposed convolution, IN, the ReLU activation function, and dropout.

Figure 6. The network architecture of CAB.
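The FAM block of Figure 5 can be sketched in PyTorch as follows. The text specifies only local residual learning combined with a channel-pixel attention (CPA) module, so the layer widths, the reduction ratio, and the internal form of CPA below are assumptions for illustration rather than the authors' implementation.

    import torch
    import torch.nn as nn

    class ChannelPixelAttention(nn.Module):
        # Assumed form of CPA: channel attention followed by pixel (spatial) attention.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.channel = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
            self.pixel = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, 1, 1), nn.Sigmoid())

        def forward(self, x):
            x = x * self.channel(x)   # reweight channels
            return x * self.pixel(x)  # reweight spatial positions

    class FAM(nn.Module):
        # Two convolutions, CPA, and a local residual connection, as described for Figure 5.
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)
            self.cpa = ChannelPixelAttention(channels)

        def forward(self, x):
            out = self.conv2(self.relu(self.conv1(x)))
            return x + self.cpa(out)   # local residual path lets less useful bleeding features bypass

    feats = torch.randn(1, 64, 256, 256)
    print(FAM(64)(feats).shape)   # torch.Size([1, 64, 256, 256])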

(2) Generator Re
The generator Re (show-through-generation network) is another component of the
CDSR-CycleGAN model, and its network structure is shown in Figure 7. Its role is to
take in input images without the show-through effect and generate the corresponding
show-through images. Unlike the other generator De, which focuses on generating cycle
mappings, the generator Re specifically focuses on generating show-through mappings.
Re is the network used for show-through generation and consists of two sub-networks:
the foreground prediction network Repredict , which predicts and locates the foreground
information, and the background blur network Reblur , which performs blurring on the
background. Similar to Depredict , Repredict has a similar network structure but differs in the
way it reconstructs the image. Reblur has a network structure similar to Derefine but serves a
different purpose. Repredict combines the extracted foreground features with the original
input image to obtain a clear foreground image with known foreground information
positions. Reblur is used to blur the input image and provides a foundation for adding the
back-to-front interference effect. Subsequently, the show-through effect is simulated around
the predicted foreground information by further enhancing the realism of the image.
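A minimal sketch of how Re could combine its two sub-networks is given below. The paper does not spell out the blending rule, so the weighted combination, the placeholder sub-networks, and the alpha parameter are assumptions used only to illustrate the Re_predict / Re_blur split described above.

    import torch
    import torch.nn as nn

    class GeneratorReSketch(nn.Module):
        # Illustrative composition of Re = (Re_predict, Re_blur); the real sub-networks
        # follow the De_predict / De_refine designs and are not reproduced here.
        def __init__(self, predict_net, blur_net, alpha=0.15):
            super().__init__()
            self.predict_net = predict_net   # Re_predict: locates foreground content
            self.blur_net = blur_net         # Re_blur: blurs the clean input
            self.alpha = alpha               # assumed blending ratio for the simulated bleed

        def forward(self, clean):
            foreground = self.predict_net(clean)   # predicted foreground content
            blurred = self.blur_net(clean)         # blurred background content
            # simulate the back-to-front interference around the predicted foreground
            return (1 - self.alpha) * foreground + self.alpha * blurred

    placeholder = lambda: nn.Conv2d(3, 3, 3, padding=1)   # stand-ins for the two sub-networks
    re = GeneratorReSketch(placeholder(), placeholder())
    print(re(torch.randn(1, 3, 256, 256)).shape)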
Figure 7. The network architecture of generator Re.

(3) Similarity Network S

The similarity network S is a crucial component of the CDSR-CycleGAN model. In the absence of labeled values, this paper utilizes S to provide unsupervised similarity labels and compute pseudo-similarity loss. In practical applications, S is employed to generate similarity images for both the De generator and the Re generator. Therefore, it is referred to as the similarity-De network S_De in the De architecture and the similarity-Re network S_Re in the Re architecture.

The architecture of S is illustrated in Figure 8, with a similar application for Re as for De. Firstly, a 7 × 7 initial convolution operation is applied, followed by instance normalization and ReLU activation. Subsequently, downsampling is performed twice using 3 × 3 convolution kernels, IN, and ReLU activation, gradually increasing the number of channels. Then, feature extraction is conducted through nine CABs, as shown in Figure 6. Finally, two upsampling operations are performed using deconvolution blocks, each consisting of a 3 × 3 deconvolution operation, IN, ReLU activation, and a gradual reduction in channel numbers. Ultimately, the output of network S is generated using a 7 × 7 convolution kernel and mapped to the range (−1, 1) through the tanh activation function.

Figure 8. The network architecture of similarity network S.

Through the combined action of the De generator, the Re generator, and the similarity network S, the CDSR-CycleGAN model achieves high-quality show-through removal and show-through generation.

3.3. Discriminative Network

For the show-through-removal task, the discriminator plays a role in identifying the differences between the generated show-through images and the real non-bleeding removal images. In the show-through-removal network, two discriminators with identical network structures are employed, as shown in Figure 9. The discriminator D_show-through is used to differentiate between input show-through images and generated images with show-through, while the discriminator D_non-show-through is used to distinguish between the non-show-through images generated by the generator G and the input non-show-through images. We utilize five convolution blocks for classifying real and fake images, where each convolution layer has a kernel size of 4 × 4, a stride of 2, and channel numbers of 64, 128, 256, 512, and 1, from low to high. Within each convolution block, a convolution layer, LReLU activation function, and IN normalization layer are sequentially applied.

Figure 9. The network architecture of discriminator D.
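A PyTorch sketch of this discriminator is shown below. The kernel size, stride, channel progression, and the convolution-LReLU-IN ordering follow the description above; the padding, the LReLU slope, and the treatment of the final one-channel block are assumptions.

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch):
        # Each block per Section 3.3: 4 x 4 convolution (stride 2), LReLU, then IN.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.InstanceNorm2d(out_ch))

    class DiscriminatorSketch(nn.Module):
        # PatchGAN-style classifier with channels 64-128-256-512-1; in practice the last
        # block would usually be a bare convolution, which is an assumption here.
        def __init__(self, in_channels=3):
            super().__init__()
            channels = [in_channels, 64, 128, 256, 512, 1]
            self.blocks = nn.Sequential(*[conv_block(channels[i], channels[i + 1]) for i in range(5)])

        def forward(self, x):
            return self.blocks(x)   # patch-wise real/fake scores

    d = DiscriminatorSketch()
    print(d(torch.randn(1, 3, 256, 256)).shape)   # torch.Size([1, 1, 8, 8])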

3.4. Loss Functions

In the CycleGAN network, this paper utilizes multiple loss functions to train the network; these aim to capture different objectives between the generator and discriminator.

(1) Adversarial loss L_adv

Adversarial loss was used for adversarial learning between the generator and discriminator. The goal of the generator is to generate realistic fake images that the discriminator cannot distinguish from real images. On the other hand, the goal of the discriminator is to precisely differentiate between real and fake images.

For the generators Re and S and the discriminator D_show-through, the adversarial loss for generating show-through images can be defined as shown in Equation (3). This loss encourages the generator to generate images that are more similar to the real images, while the discriminator aims to distinguish between the generated show-through images and real images that have not been subjected to show-through removal.

$$\begin{aligned} \mathcal{L}_{re\text{-}adv}\left(Re, S, D_{show\text{-}through}, Y, X\right) =\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\log D_{show\text{-}through}(X)\right] \\ +\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\log\left(1 - D_{show\text{-}through}(Re(Y))\right)\right] \\ +\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\log\left(1 - D_{show\text{-}through}(S(Re(Y)))\right)\right] \end{aligned} \tag{3}$$

In the equation, Re represents the show-through-generation network, S( De) repre-


sents the similarity-guided show-through-generation network, and Dshow−through aims to
distinguish between the generated bleeding images Re(Y ) and the real bleeding images X.
Similarly, the adversarial loss for images that have not been subjected to show-through
removal is defined as shown in Equation (4). This loss encourages the generator to generate
images that are more similar to the real non-show-through images, while the discriminator
aims to differentiate between the generated non-show-through-removal images and the
input non-show-through images.

 
$$\begin{aligned} \mathcal{L}_{de\text{-}adv}\left(De, S, D_{non\text{-}show\text{-}through}, X, Y\right) =\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\log D_{non\text{-}show\text{-}through}(Y)\right] \\ +\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\log\left(1 - D_{non\text{-}show\text{-}through}(De(X))\right)\right] \\ +\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\log\left(1 - D_{non\text{-}show\text{-}through}(S(De(X)))\right)\right] \end{aligned} \tag{4}$$

In the aforementioned equations, De, S, and Re minimize the target values, while the
discriminators Dshow−through and Dnon−show−through attempt to maximize them. However,
when the adversarial loss is used, some artifacts may appear in the generated results.
(2) Cycle consistency loss Lcycle
To better correct the bleeding images and preserve more details, we introduced a cy-
cle consistency loss [6] function for show-through images. The goal of this function is to
minimize the difference between the show-through image X and its reconstructed bleeding im-
ages Re(De(X )) and Re(S(De(X ))), as well as the difference between the non-show-through
image Y and its reconstructed non-show-through images De(Re(Y)) and De(S(Re(X ))) (as
illustrated in Figures 1 and 2). Through this loss function, the consistency of the images can
be maintained after they have been transformed in both directions. That is, an image that
has been transformed into a show-through image should be able to be restored back to the
original domain. This loss function helps to reduce information loss and maintain the content
of the images.
The cycle consistency loss from domain X to domain Y is defined as follows:

$$\begin{aligned} \mathcal{L}_{cycle\_de}(De, S, Re) =\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\left\| Re(De(X)) - X \right\|_1\right] \\ +\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\left\| Re(S(De(X))) - X \right\|_1\right] \end{aligned} \tag{5}$$

Simultaneously, the cycle consistency loss from domain Y to domain X is defined as follows:

$$\begin{aligned} \mathcal{L}_{cycle\_re}(Re, S, De) =\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\left\| De(Re(Y)) - Y \right\|_1\right] \\ +\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\left\| De(S(Re(X))) - Y \right\|_1\right] \end{aligned} \tag{6}$$

The overall definition of the cycle consistency loss is the sum of the losses from both
directions and is shown in Equation (7), as follows:

$$\mathcal{L}_{cycle}(Re, S, De) = \mathcal{L}_{cycle\_de}(De, S, Re) + \mathcal{L}_{cycle\_re}(Re, S, De) \tag{7}$$
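A compact PyTorch sketch of this double-cycle constraint follows; De, Re, and S stand for the corresponding networks. Note that the second term of Equation (6) is printed with Re(X); the sketch assumes the symmetric form with the clean image Y.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def cycle_consistency_loss(x_show, y_clean, De, Re, S):
        # Equation (5): both the direct and the similarity-guided cycle must rebuild X
        de_x = De(x_show)
        loss_de = F.l1_loss(Re(de_x), x_show) + F.l1_loss(Re(S(de_x)), x_show)
        # Equation (6): the same constraint in the opposite direction for the clean image Y
        re_y = Re(y_clean)
        loss_re = F.l1_loss(De(re_y), y_clean) + F.l1_loss(De(S(re_y)), y_clean)
        return loss_de + loss_re    # Equation (7)

    # Toy check with placeholder networks
    De, Re, S = (nn.Conv2d(3, 3, 1) for _ in range(3))
    x = torch.randn(1, 3, 64, 64)
    y = torch.randn(1, 3, 64, 64)
    print(cycle_consistency_loss(x, y, De, Re, S))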

(3) Identity loss Lid


The identity loss function for show-through images plays a pivotal role in CycleGAN,
as it ensures color consistency between the input and output bleeding images. This prevents
the generator from arbitrarily altering the color tone of the images and ensures that the
generated results align with our expectations.
The identity loss is defined as shown in Equation (8):

$$\begin{aligned} \mathcal{L}_{id}(Re, De) =\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\left\| Re(X) - X \right\|_1\right] \\ +\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\left\| De(Y) - Y \right\|_1\right] \end{aligned} \tag{8}$$

(4) Pseudo-similarity loss L ps


Due to the lack of label supervision during the training process of CycleGAN, we
introduced a similarity network to generate pseudo-labels and calculate the similarity
loss between the generated images. This helps the generator produce more realistic non-
show-through and show-through images, thereby improving the quality and realism of the

generated results. Because the similarity loss is minimized, the generated images become
more similar to the real images, leading to better transformation outcomes.
The pseudo-similarity loss is defined as shown in Equation (9):

$$\begin{aligned} \mathcal{L}_{ps}(De, S, Re) =\; & \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\left[\left\| S(De(X)) - De(X) \right\|_1\right] \\ +\; & \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\left[\left\| S(Re(Y)) - Re(Y) \right\|_1\right] \end{aligned} \tag{9}$$

(5) Perceptual loss L perceptual


To generate images that are more semantically and visually similar, we introduced a
pre-trained VGG19 [25] perceptual loss to constrain the generator. The perceptual loss pre-
serves the structure of the original images by combining features extracted from the second
and fifth pooling layers of VGG19. Under the constraints of the generators De: X → Y and
Re: Y → X , the expression of the perceptual loss is shown in Equation (10), as follows:

$$\mathcal{L}_{perceptual} = \left\| \delta(X) - \delta(Re(De(X))) \right\|_2^2 + \left\| \delta(Y) - \delta(Re(De(Y))) \right\|_2^2 \tag{10}$$

Here, ( X, Y ) represents the non-paired image sets of show-through and non-show-


through images, respectively, and δ is the feature extractor of VGG19.
(6) Total loss Ltotal
Combining the above loss terms, the final total loss function can be represented as
follows, in Equation (11):

Ltotal = λ1 Lre_adv + λ2 Lde_adv + λ3 Lcycle + λ4 Lid + λ5 L ps + λ6 L perceptual (11)

Here, λ1 = λ2 = 1, λ3 = 10, λ4 = 5, λ5 = 0.5, and λ6 = 0.7 [24] are the weights


corresponding to each loss term. To optimize the networks in this paper, the objective is
shown as follows, in Equation (12):

$$De^{*}, Re^{*} = \arg \min_{De,\, Re,\, S} \; \max_{D_{show\text{-}through},\, D_{non\text{-}show\text{-}through}} \mathcal{L}_{total} \tag{12}$$
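As a sketch, the weighted combination of Equation (11) with the weights quoted above can be written as follows; the individual loss terms are assumed to be precomputed tensors or scalars.

    def total_loss(l_re_adv, l_de_adv, l_cycle, l_id, l_ps, l_perceptual,
                   weights=(1.0, 1.0, 10.0, 5.0, 0.5, 0.7)):
        # Equation (11): lambda_1 ... lambda_6 follow the values reported in the text
        terms = (l_re_adv, l_de_adv, l_cycle, l_id, l_ps, l_perceptual)
        return sum(w * t for w, t in zip(weights, terms))

    print(total_loss(0.9, 0.8, 0.05, 0.02, 0.01, 0.3))   # example with dummy scalar losses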

4. Experimental Results and Analysis


4.1. Implementation Details and Parameter Settings
The proposed CDSR-CycleGAN was implemented in PyTorch on a PC equipped with
an NVIDIA GeForce RTX 3090 GPU. It should be noted that all training images were
learned in a non-paired manner and that their original resolution was 256 × 256. During
the training process, the Adam optimizer [26] was used with β 1 = 0.5 and β 2 = 0.999. The
model was trained for a total of 100 epochs with a batch size of 1. The model was trained
from scratch for 50 epochs with a learning rate of 0.0001 and then trained for another
50 epochs with a linear decay of the learning rate to 0 before the training was stopped.
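The optimizer and learning-rate schedule described above can be sketched as follows; the placeholder networks and the exact form of the decay function are assumptions, chosen only to reproduce the reported settings (Adam with β1 = 0.5, β2 = 0.999, learning rate 0.0001, constant for 50 epochs and then linearly decayed to 0 over the remaining 50).

    import itertools
    import torch
    import torch.nn as nn

    # Placeholders standing in for the actual De, Re, and S networks (assumption for illustration).
    De, Re, S = (nn.Conv2d(3, 3, 3, padding=1) for _ in range(3))

    opt_g = torch.optim.Adam(itertools.chain(De.parameters(), Re.parameters(), S.parameters()),
                             lr=1e-4, betas=(0.5, 0.999))

    def linear_decay(epoch, total_epochs=100, decay_start=50):
        # Constant learning rate for the first 50 epochs, then linear decay towards 0 (Section 4.1).
        if epoch < decay_start:
            return 1.0
        return max(0.0, 1.0 - (epoch - decay_start) / float(total_epochs - decay_start))

    scheduler = torch.optim.lr_scheduler.LambdaLR(opt_g, lr_lambda=linear_decay)

    for epoch in range(100):
        # ... one pass over the unpaired training images (batch size 1) would go here ...
        opt_g.step()        # stands in for the real generator update
        scheduler.step()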

4.2. Dataset and Evaluation Metrics


(1) Synthetic dataset
Since the degree of the show-through phenomenon varies across different real-life
scenes, we simulated two different degrees of bleeding phenomena based on different
Gaussian blur levels to demonstrate the effectiveness of our proposed method for different
levels of show-through phenomena. We simulated the real bleeding phenomenon by
adjusting the Gaussian template. Ultimately, a kernel size (ksize) of 5 × 5 with a standard
deviation (sigma) of either 1 or 0.5 was chosen, as depicted in Figure 10.
Figure 10. Example of synthesized show-through image.

S-color adopts a hybrid linear show-through model, where the show-through component of S-color is achieved by flipping the image horizontally during the image-preprocessing stage. The specific operation in Equation (2) is as follows: the flipped image is subjected to Gaussian blur using the function ϕ(·), with a kernel size of 5 × 5 and a selectable standard deviation of 1 or 0.5. Additionally, a randomly selected parameter α with a value between 0.1 and 0.2 is used to simulate a real bleeding phenomenon.

The S-color1.0 and S-color0.5 datasets each consist of 1000 pairs of training images and 100 pairs of testing images. Each dataset consists of show-through images and non-show-through images, and there is no exact correspondence between these two pairs of images. This means that the model needs to learn how to transform images from one domain to another, rather than simply performing a straightforward mapping.

(2) Real dataset

We evaluated the proposed method using the Media Team Document Database (MTDB) [27], which contains various types of document images such as address lists, advertisements, articles, business cards, music sheets, and maps. In our experiments, we focused on 10 out of the 19 predefined categories in MTDB that, more than the others, contain text documents exhibiting show-through phenomena. The scans from the MTDB were performed using a Hewlett Packard ScanJet scanner and DeskScan II 2.3.1a software, with color-adjustment parameters set to distance 0, direction 0, saturation 50, normal emphasis curve, resolution of 300 dpi, true-color mode, no image sharpening, and photo style
with highlight and shadow values set to 125. We first randomly cropped images of size
256 × 256 from higher-resolution images (2000+ × 3000+ ) and then selected 200 images
with obvious show-through phenomena as our test set. The access link for MTDB is as
follows: https://github.com/liangzongbao/CDSR-CycleGAN/tree/main/data/MDTB
(25 May 2024).
The ICDAR-SROIE dataset [28] was created to support the ICDAR 2019 competition
and aims to evaluate the performance of scene text-recognition and information-extraction
algorithms on real business documents, specifically receipts and invoices. This dataset con-
sists of 1000 complete scanned receipt images, with each image containing approximately
four key text fields such as item names, unit prices, total costs, etc. The text primarily
consists of numbers and English characters, and some images may have issues like stains,
noise, and blurriness. In an approach similar to that used for MTDB, we first randomly
cropped images of size 256 × 256 from a subset of higher-resolution images, and then
selected 30 images with noticeable stains as our test set to validate the generalizability of
our proposed method in this paper.
The MS Táin Bó Cuailnge (MS) [9] is an ancient manuscript. It was created by Seosamh
Ó Longáin (1817–1880), one of the last professional Gaelic scribes. He produced numerous
Gaelic manuscripts, especially during the 19th century. We selected the first and thirteenth

pages from the book, then cropped them to a size of 256 × 256 for testing. The URL is
https://www.isos.dias.ie/AUS1/AUS_MS_Tain_Bo_Cuailgne.html (22 April 2024).
The Bleed-Through Database (BTD) [9] is designed to be a resource for people working
in the field of digital document restoration and more specifically on the problem of bleed-
through degradation. It consists of a set of 25 registered recto–verso sample grayscale
image pairs taken from larger manuscript images with varied degrees of bleed-through.
The URL is https://www.isos.dias.ie/Sigmedia/Bleed_Through_Database.html (8 May
2024). Both the MS and BTD datasets can be accessed at https://www.isos.dias.ie (8 May
2024) [29].
(3) Evaluation metrics:
To quantitatively evaluate the network’s performance in removing show-through
effects, we used PSNR and SSIM [30] as performance metrics. For both PSNR and SSIM,
larger values indicate a closer resemblance to the ground truth and better removal of
show-through effects.
Quantitative analysis was conducted on benchmark datasets with ground truth. We
calculated FgError, the probability that a pixel in the foreground text is classified as
background or translucent; BgError, the probability that a background or translucent pixel
is classified as foreground; and TotError, the average of FgError and BgError weighted by
the numbers of foreground and background pixels, which are derived from the corresponding
ground-truth images. TotError thus indicates the probability that any pixel in the image is
misclassified. According to [9], these quality metrics are defined as follows:
\[
\begin{aligned}
\mathrm{FgError} &= \frac{1}{N_{Fg}} \sum_{GT(Fg)} \lvert GT - BY \rvert \\
\mathrm{BgError} &= \frac{1}{N_{Bg}} \sum_{GT(Bg)} \lvert GT - BY \rvert \qquad\qquad (13) \\
\mathrm{TotError} &= \frac{1}{N} \sum_{GT} \lvert GT - BY \rvert
\end{aligned}
\]
where GT is the ground truth, BY is the binarized restoration result, GT(Fg) and GT(Bg) are
the foreground-only and background-only regions of the ground-truth image, and N_{Fg},
N_{Bg}, and N are the numbers of pixels in the foreground region, the background region,
and the whole image, respectively.
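A minimal NumPy sketch of these error rates is shown below, assuming both the ground truth
and the restored result have already been binarized so that 1 marks foreground text; the
weighted-average form of TotError follows the description above:

import numpy as np

def bleed_through_errors(gt, by):
    # gt, by: 2-D arrays binarized to {0, 1}, where 1 marks foreground text.
    gt = gt.astype(bool)
    by = by.astype(bool)
    wrong = gt ^ by                      # pixels where the restoration disagrees with GT
    fg_error = wrong[gt].mean()          # foreground pixels lost to background/show-through
    bg_error = wrong[~gt].mean()         # background/show-through pixels turned into foreground
    # average of the two errors weighted by region size = overall misclassification rate
    tot_error = (fg_error * gt.sum() + bg_error * (~gt).sum()) / gt.size
    return fg_error, bg_error, tot_error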
4.3. Analysis of Experimental Results on Synthesized Dataset
We compared the proposed method with several other image-restoration methods, in-
cluding DAD [19], DeepOtsu [11], Uformer [31], MPR-Net [32], VDIR [33], S-CycleGAN [24],
and YTMT [34]. For these methods, we retrained the network using the implementations
provided by the respective authors and then evaluated them. Table 1 presents the quan-
titative analysis of two synthetic datasets, while Figures 11 and 12 show examples of
sample restoration.
Table 1. Quantitative evaluation of synthesized dataset.

Dataset      Index   DAD     DeepOtsu   Uformer   MPRNet   VDIR    S-CycleGAN   YTMT    Ours
S-color0.5   PSNR    28.65   21.49      24.69     32.86    27.81   32.92        25.17   33.79
             SSIM    0.977   0.861      0.902     0.972    0.956   0.979        0.912   0.984
S-color1.0   PSNR    28.87   21.48      24.68     32.98    27.80   32.99        25.40   33.91
             SSIM    0.980   0.861      0.914     0.981    0.958   0.983        0.948   0.989
Average      PSNR    28.76   21.49      24.69     32.92    27.81   32.96        25.29   33.85
             SSIM    0.979   0.861      0.908     0.977    0.957   0.981        0.930   0.987
Figure 11. Qualitative analysis of S-color0.5 dataset.

Figure 12. Qualitative analysis of the S-color1.0 dataset.
As shown in Table 1, on the synthetic datasets, the proposed method achieved higher
average SSIM and PSNR (dB) values than the compared methods. Figures 11 and 12 respec-
tively present the restoration results of a sample image from the S-color0.5 and S-color1.0
synthetic datasets. It is observed that both the DAD and DeepOtsu methods removed the
show-through effects, but the result of the DeepOtsu method suffered from color distortion,
while the DAD method resulted in color deviation. The results of the Uformer and YTMT
methods still contained a significant number of show-through components. The MPR-Net,
VDIR, and S-CycleGAN methods all removed the show-through effects to some extent,
but some residual show-through components remained. In contrast, the proposed method
completely removed the show-through effects and achieved better restoration results.
4.4. Analysis of Experimental Results on Real Show-through Dataset

In this section, we aim to evaluate the generalizability of the proposed method using
the MTDB, MS, and ICDAR-SROIE datasets. These three real datasets were directly tested
using the model trained on the S-color0.5 dataset.
Figure 13 presents example restorations of the advertisement-category samples in the
MTDB dataset. It can be observed that only our proposed method achieved good visual
restoration results, while the results of the other methods suffered from significant residual
show-through effects. Although the DeepOtsu method completely removed the show-through
effects, it resulted in a grayscale appearance, and the color of deep-red patterns is not
accurately represented.
Figure 13. Qualitative analysis of MTDB dataset.
Figure 14 displays an example restoration of a sample with stains in the ICDAR-SROIE
dataset. It can be seen that the DAD and S-CycleGAN methods and our proposed method
yielded good restoration results, while the DeepOtsu, Uformer, MPR-Net, VDIR, and YTMT
methods all yielded images with some residual stains, failing to produce clear images.
Figure 14. Qualitative analysis of ICDAR-SROIE dataset.

Figure 15 shows the results of show-through removal on MS books. Although there
are still some remnants of show-through distortion, the use of our method has made the
image clearer and more readable. This also confirms the applicability of our method to the
digitization of ancient books.
Figure 15. Qualitative analysis of MS dataset.
4.5. Quantitative and Qualitative Analysis on the Bleed-through Database

This section utilizes the BTD dataset, which has ground truth, for both quantitative and
qualitative analysis of our proposed method and existing methods, including Hua [35],
Mog [10], and Ro2 [36]. Figure 16 displays the restoration results for the dataset named
NLI.MSG311.265/6. It can be observed that, compared to the other methods, our proposed
approach effectively removes most of the show-through while preserving the foreground. Hua
performs well in handling dark, translucent areas in isolation but often deletes foreground
text in overlapping image regions, thereby reducing readability. Mog retains foreground
information but struggles with dark, translucent regions, leaving visible show-through.
Ro2 generally preserves foreground information well but still leaves some residual
show-through in most cases. Table 2 presents the quantitative analysis. Our method performs
the best on all three metrics, indicating superior performance in show-through removal.
Because the Hua method removes a significant amount of foreground text while eliminating
show-through, it has the highest average FgError. The results of the Mog and Ro2 methods
exhibit some residual show-through.
Figure 16. Qualitative analysis of NLI.MSG311.265/6 from the BTD dataset.
Table 2. Analysis of mean error probabilities for the entire BTD dataset.

Index      Hua      Mog      Ro2      Ours
FgError    0.2308   0.0746   0.0696   0.0633
BgError    0.0012   0.0148   0.0085   0.0008
TotError   0.0413   0.0244   0.0196   0.0160
4.6. Runtime Analysis

To compare time consumption, we measured the average runtime of the different methods
on the S-color0.5 dataset. To illustrate the relationship between runtime and document size,
we conducted tests on images of different dimensions: 256 × 256, 512 × 512, and 1024 × 1024
pixels. The corresponding average runtimes of our method were 0.1303 s, 0.2155 s, and
0.4105 s, respectively; as the document size increases, so does the runtime. Subsequently, we
compared our method with the others using a 256 × 256-pixel document image as an example.
Table 3 lists the implementation frameworks used by the various methods and their average
runtimes. Among them, DeepOtsu and VDIR use the TensorFlow framework, while the other
compared methods, including our proposed method, are based on the PyTorch framework. In
terms of average runtime, our proposed method slightly outperforms the other methods.
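These runtimes depend on hardware, so the exact values will differ between machines; a
simple way to reproduce this kind of measurement for any of the PyTorch-based models is
sketched below (the warm-up count and number of repeats are arbitrary choices, not the
settings used for Table 3):

import time
import torch

@torch.no_grad()
def average_runtime(model, size=256, repeats=50, device="cuda"):
    # Average per-image inference time for a size x size RGB input, warm-up excluded.
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(5):                    # warm-up so lazy initialization is not timed
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats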
Table 3. Analysis of average runtime (seconds) using images sized 256 × 256.

Methods         Framework    Time (s)
DAD             PyTorch      0.4649
DeepOtsu        TensorFlow   1.3982
Uformer         PyTorch      0.5203
MPRNet          PyTorch      0.1363
VDIR            TensorFlow   0.5729
S-CycleGAN      PyTorch      0.2438
YTMT            PyTorch      0.2134
CDSR-CycleGAN   PyTorch      0.1303
4.7. Analysis of Ablation-Experiment Results

To evaluate the effectiveness of the method proposed in this paper, we conducted ablation
studies on the S-color0.5 dataset. Specifically, we removed one component at a time from the
overall architecture. To ensure a fair comparison, all tested models were trained under the
same settings, except for the modifications shown in Table 4. Additionally, the effectiveness
of cycle consistency loss 2 (Lcycle2) and the pseudo-similarity loss (Lps) was further
investigated.
Table 4. Analysis of ablation experiments on different components on the S-color0.5 dataset.

Index   w/o S   w/o Depredict   w/o Deremove   w/o Repredict   w/o CAB   w/o Lps   w/o Lcycle2   CDSR-CycleGAN
PSNR    32.52   33.73           29.87          33.75           32.94     30.45     33.71         33.79
SSIM    0.978   0.981           0.969          0.983           0.982     0.974     0.983         0.984
From observations of the last three columns of Table 4, it can be seen that performance
declines when Lcycle2 and Lps are ablated. This confirms their advantages in improving the
quality of the restoration results. By combining all the aforementioned components, our
proposed method achieved the best performance, with a PSNR of 33.79 dB. Therefore, each key
component considered during the ablation process makes its own contribution.
4.8. OCR Recognition Analysis

This section aims to validate the effectiveness of the proposed algorithm in removing
document show-through and to explore its potential applications in practical engineering. A
series of experiments was conducted in which the experimental result images based on the
S-color0.5 dataset were evaluated with the open-source PaddleOCR (v4), available at
https://github.com/PaddlePaddle/PaddleOCR (8 May 2024).
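A minimal usage sketch of this OCR step is given below; the image name is a placeholder, and
the constructor flags follow the commonly documented PaddleOCR Python interface rather than
the exact configuration used in our tests:

from paddleocr import PaddleOCR

ocr = PaddleOCR(lang="en", use_angle_cls=True)   # English model with text-angle classification

result = ocr.ocr("restored_sample.png", cls=True)
for box, (text, score) in result[0]:             # each line: bounding box, (text, confidence)
    print(f"{score:.2f}  {text}")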
In this section, the show-through-removal results of the proposed algorithm are presented
and compared with the results of several comparison algorithms for OCR recognition. The
OCR-recognition results are shown in Figure 17, indicating that the proposed algorithm
demonstrates significant effectiveness in removing document show-through. By applying this
algorithm to process the experimental result images, the document show-through effect can
be successfully eliminated, greatly improving the readability and accuracy of the images.
Figure 17. Example of OCR recognition comparison.
The first row of the image in Figure 17 is used as an example for evaluation of the proposed
method and comparison with the other methods. The OCR-recognition comparison results,
shown in Figure 18, consist of three rows: the first row presents the experimental result
image, the second row displays the recognition-result visualization, and the third row shows
the output text. The results of the DAD and DeepOtsu methods are missing some text content,
while YTMT incorrectly recognized the show-through content as correct text. This comparison
again confirms the effectiveness of the proposed algorithm in eliminating document
show-through. Taking the sample in Figure 16 as an example, incomplete data from the top row
and right side were not included in the evaluation. Table 5 presents a quantitative analysis
of OCR recognition based on the numbers of words and characters and the numbers of correctly
recognized words and characters. The number of words in the ground truth is 65, and the
number of characters is 303, so the closer the measured quantities are to these values, the
better the recognition performance. In terms of word count, some adjacent words are
recognized as a single word because they are printed too close together; for example,
“a bag to” is recognized as “abagto”, which reduces the word count. In terms of characters,
it was observed that “n”, “r”, and “c” are prone to recognition errors. Overall, the
proposed method demonstrates better recognition accuracy than the other compared methods.
Figure 18. Visualization of OCR recognition for different methods.
Table 5. Quantitative analysis of OCR recognition.

Index                Ground Truth   Show-through Image   DeepOtsu   DAD    YTMT   Ours
Word numbers         65             76                   46         59     71     65
Character numbers    303            353                  266        295    375    303
Correct words        65             48                   36         54     32     64
Correct characters   303            290                  244        293    295    303
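The exact matching rules behind Table 5 are not spelled out here, but a simple tally in this
spirit could be computed as follows (whitespace tokenization and the correctness checks are
simplifying assumptions rather than our evaluation protocol):

def ocr_counts(gt_text, ocr_text):
    # Word/character tallies in the spirit of Table 5.
    gt_words, ocr_words = gt_text.split(), ocr_text.split()
    gt_chars = "".join(gt_words)
    ocr_chars = "".join(ocr_words)
    correct_words = sum(1 for w in ocr_words if w in gt_words)             # naive word matching
    correct_chars = sum(1 for a, b in zip(ocr_chars, gt_chars) if a == b)  # positional match
    return {
        "words": len(ocr_words),
        "characters": len(ocr_chars),
        "correct words": correct_words,
        "correct characters": correct_chars,
    }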
This finding demonstrates the practical significance of the proposed algorithm in solving
the document show-through problem. Furthermore, the potential advantages and possibilities
of applying this algorithm in practical engineering will be explored. For example, in fields
such as digital archive management, print quality control, and image post-processing, this
algorithm can provide a reliable solution for relevant applications. Eliminating the document
show-through effect can allow clearer and more accurate images to be obtained, thereby
enhancing the efficiency and accuracy of subsequent processing tasks.
5. Discussion
One limitation of our proposed method is that CDSR-CycleGAN cannot fully restore
complex backgrounds in real images. For instance, in the MTDB dataset, although our
method successfully removes the show-through effect, it does not perfectly preserve the
original appearance of the images. This may impact the visual experience, despite being
advantageous for document image recognition. The reason lies in the inability to accurately
describe real show-through images using existing synthesized data in such scenarios. In
future research, we plan to explore new methods to address this issue and create diverse
show-through datasets to train our network and improve upon this limitation.
Another limitation is that this paper employs synthetic data as the training set; therefore,
its capability to handle severe show-through may be limited. Our method
simulates different degrees of show-through in document images by adjusting the standard
deviation of Gaussian blur. Specifically, we used Gaussian blurs with standard deviations
of 0.5 and 1, respectively. Therefore, the maximum degradation supported by our method
is that of show-through document images with a Gaussian blur standard deviation of 0.5.
In the future, we will focus on addressing documents with more severe show-through.
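As an illustration of this synthesis idea, the sketch below blends a mirrored, Gaussian-blurred
verso page into a recto page; the mixing weight alpha and the subtractive blending model are
assumptions made for illustration and are not taken from our actual data-generation pipeline:

import numpy as np
from PIL import Image, ImageFilter, ImageOps

def synthesize_show_through(recto_path, verso_path, sigma=0.5, alpha=0.3):
    # Blend a mirrored, blurred verso into the recto to imitate show-through.
    # Both pages are assumed to have the same size.
    recto = np.asarray(Image.open(recto_path).convert("RGB"), dtype=np.float32) / 255.0
    verso = ImageOps.mirror(Image.open(verso_path).convert("RGB"))  # back side appears flipped
    verso = verso.filter(ImageFilter.GaussianBlur(radius=sigma))    # sigma: e.g., 0.5 or 1.0
    verso = np.asarray(verso, dtype=np.float32) / 255.0
    ink = (1.0 - recto) + alpha * (1.0 - verso)                     # accumulate "ink" densities
    mixed = np.clip(1.0 - ink, 0.0, 1.0)
    return Image.fromarray((mixed * 255).astype(np.uint8))

degraded = synthesize_show_through("recto.png", "verso.png", sigma=0.5)
degraded.save("synthetic_show_through.png")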
6. Conclusions
In this paper, we propose a self-supervised-learning-based method named CDSR-
CycleGAN for removing the show-through effect in color document images. The method
utilizes unpaired image data for training, eliminating the need for extensive preparation of
synthesized show-through images. An innovative cyclic generative adversarial network
was introduced to oversee the mapping from the show-through domain to the non-show-
through domain. The network employs a two-stage structure and incorporates double-cycle
consistency loss and pseudo-similarity loss as constraints on the show-through-removal
process. Extensive experiments on both synthesized and real show-through images demonstrate
that the method effectively restores show-through-degraded images and outperforms several
other approaches.
Author Contributions: Conceptualization, Z.L. and J.X.; methodology, Z.L.; software, M.N.; val-
idation, Z.L., M.N. and J.X.; formal analysis, Z.L.; investigation, Z.L.; resources, M.N. and J.X.;
data curation, Z.L.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L., J.X.
and M.N.; visualization, M.N.; supervision, M.N. and J.X.; project administration, M.N.; funding
acquisition, J.X. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Natural Science Foundation of China
under Grant 62072391 and Grant 62066013.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in the study are included in the
article; further inquiries can be directed to the corresponding author, Zongbao Liang, at
liangzongbao@s.ytu.edu.cn.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chandio, A.A.; Asikuzzaman, M.; Pickering, M.R.; Leghari, M. Cursive text recognition in natural scene images using deep
convolutional recurrent neural network. IEEE Access 2022, 10, 10062–10078. [CrossRef]
2. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
nets. Adv. Neural Inf. Process. Syst. 2014, 2, 2672–2680.
3. Souibgui, M.A.; Kessentini, Y. DE-GAN: A conditional generative adversarial network for document enhancement. IEEE Trans.
Pattern Anal. Mach. Intell. 2020, 44, 1180–1191. [CrossRef] [PubMed]
4. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
5. Liu, W.; Hou, X.; Duan, J.; Qiu, G. End-to-end single image fog removal using enhanced cycle consistent adversarial networks.
IEEE Trans. Image Process. 2020, 29, 7819–7833. [CrossRef]
6. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
7. Lins, R.D.; Neto, M.G.; Neto, L.F.; Rosa, L.G. An environment for processing images of historical documents. Microprocess.
Microprogramming 1994, 40, 939–942. [CrossRef]
8. Sharma, G. Show-through cancellation in scans of duplex printed documents. IEEE Trans. Image Process. 2001, 10, 736–754.
[CrossRef] [PubMed]
9. Rowley-Brooke, R.; Pitié, F.; Kokaram, A. A ground truth bleed-through document image database. In Theory and Practice
of Digital Libraries: Second International Conference, TPDL 2012, Paphos, Cyprus, September 23–27, 2012. Proceedings 2; Springer:
Berlin/Heidelberg, Germany, 2012; pp. 185–196.
10. Moghaddam, R.F.; Cheriet, M. A variational approach to degraded document enhancement. IEEE Trans. Pattern Anal. Mach. Intell.
2009, 32, 1347–1361. [CrossRef] [PubMed]
11. He, S.; Schomaker, L. DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognit. 2019,
91, 379–390. [CrossRef]
12. Hanif, M.; Tonazzini, A.; Hussain, S.F.; Khalil, A.; Habib, U. Restoration and content analysis of ancient manuscripts via color
space based segmentation. PLoS ONE 2023, 18, e0282142. [CrossRef] [PubMed]
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
14. Poddar, A.; Dey, S.; Jawanpuria, P.; Mukhopadhyay, J.; Kumar Biswas, P. TBM-GAN: Synthetic document generation with
degraded background. In Proceedings of the International Conference on Document Analysis and Recognition, San José, CA,
USA, 21–26 August 2023; pp. 366–383.
15. De, R.; Chakraborty, A.; Sarkar, R. Document image binarization using dual discriminator generative adversarial networks. IEEE
Signal Process. Lett. 2020, 27, 1090–1094. [CrossRef]
16. Suh, S.; Kim, J.; Lukowicz, P.; Lee, Y.O. Two-stage generative adversarial networks for binarization of color document images.
Pattern Recognit. 2022, 130, 108810. [CrossRef]
17. Lin, Y.-S.; Lin, T.-Y.; Chiang, J.-S.; Chen, C.-C. Binarization of color document image based on adversarial generative network
and discrete wavelet transform. In Proceedings of the 2022 IET International Conference on Engineering Technologies and
Applications (IET-ICETA), Changhua, Taiwan, 14–16 October 2022; pp. 1–2.
18. Ju, R.-Y.; Lin, Y.-S.; Chiang, J.-S.; Chen, C.-C.; Chen, W.-H.; Chien, C.-T. CCDWT-GAN: Generative adversarial networks based on
color channel using discrete wavelet transform for document image binarization. In Proceedings of the Pacific Rim International
Conference on Artificial Intelligence (PRICAI), Jakarta, Indonesia, 15–19 November 2023; pp. 186–198.
19. Zou, Z.; Lei, S.; Shi, T.; Shi, Z.; Ye, J. Deep adversarial decomposition: A unified framework for separating superimposed images.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 12806–12816.
20. Gangeh, M.J.; Plata, M.; Nezhad, H.R.M.; Duffy, N.P. End-to-end unsupervised document image blind denoising. In Proceedings
of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 7888–7897.
21. Torbunov, D.; Huang, Y.; Yu, H.; Huang, J.; Yoo, S.; Lin, M.; Viren, B.; Ren, Y. Uvcgan: Unet vision transformer cycle-consistent
gan for unpaired image-to-image translation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 702–712.
22. Wu, S.; Dong, C.; Qiao, Y. Blind image restoration based on cycle-consistent network. IEEE Trans. Multimed. 2022, 25, 1111–1124.
[CrossRef]
23. Wang, Y.; Zhou, W.; Lu, Z.; Li, H. Udoc-gan: Unpaired document illumination correction with background light prior. In
Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 5074–5082.
24. Xu, J.D.; Ma, Y.L.; Liang, Z.B.; Ni, M.Y. Single bleed-through image restoration with self-supervised learning. Acta Autom. Sin.
2023, 49, 219–228.
25. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the
International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
26. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning
Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
27. Sauvola, J.; Kauniskangas, H. MediaTeam Document Database II, a CD-ROM Collection of Document Images; University of Oulu: Oulu,
Finland, 1999.
28. Huang, Z.; Chen, K.; He, J.; Bai, X.; Karatzas, D.; Lu, S.; Jawahar, C. Icdar2019 competition on scanned receipt ocr and information
extraction. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney,
Australia, 20–25 September 2019; pp. 1516–1520.
29. Irish Script On Screen Project (2012). Available online: www.isos.dias.ie (accessed on 8 May 2024).
30. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern
Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
31. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022;
pp. 17683–17693.
32. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Multi-stage progressive image restoration. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021;
pp. 14821–14831.
33. Soh, J.W.; Cho, N.I. Variational deep image restoration. IEEE Trans. Image Process. 2022, 31, 4363–4376. [CrossRef] [PubMed]
34. Hu, Q.; Guo, X. Trash or treasure? an interactive dual-stream strategy for single image reflection separation. Adv. Neural Inf.
Process. Syst. 2021, 34, 24683–24694.
35. Huang, Y.; Brown, M.S.; Xu, D. User assisted ink-bleed reduction. IEEE Trans. Image Process. 2010, 19, 2646–2658. [CrossRef]
[PubMed]
36. Rowley-Brooke, R.; Pitié, F.; Kokaram, A. A non-parametric framework for document bleed-through removal. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2954–2960.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.