Removal of Color-Document Image Show-Through Based on
Self-Supervised Learning
Mengying Ni, Zongbao Liang * and Jindong Xu
School of Computer and Control Engineering, Yantai University, Yantai 264005, China;
nimengying@ytu.edu.cn (M.N.); xujindong@ytu.edu.cn (J.X.)
* Correspondence: liangzongbao@s.ytu.edu.cn
Abstract: Show-through phenomena have always been a challenging issue in color-document image
processing, which is widely used in various fields such as finance, education, and administration.
Existing methods for processing color-document images face challenges, including dealing with
double-sided documents with show-through effects, accurately distinguishing between foreground
and show-through parts, and addressing the issue of insufficient real image data for supervised
training. To overcome these challenges, this paper proposes a self-supervised-learning-based method
for removing show-through effects in color-document images. The proposed method utilizes a two-
stage-structured show-through-removal network that incorporates a double-cycle consistency loss
and a pseudo-similarity loss to effectively constrain the process of show-through removal. Moreover,
we constructed two datasets consisting of different show-through mixing ratios and conducted
extensive experiments to verify the effectiveness of the proposed method. Experimental results
demonstrate that the proposed method achieves competitive performance compared to state-of-
the-art methods and can effectively perform show-through removal without the need for paired
datasets. Specifically, the proposed method achieves an average PSNR of 33.85 dB on our datasets,
outperforming comparable methods by a margin of 0.89 dB.
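The paragraph below refers back to a general image-degradation model with a transfer function H(·) and additive noise n (presumably Equation (1) of the paper). As a hedged sketch only, such a degradation model is usually written as

$$ I_{degraded} = H(I_{GT}) + n, $$

where I_degraded denotes the observed image and I_GT the latent clean image; the exact symbols used by the authors are an assumption here.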
Here, H (·) represents the transfer function, describing the transformations undergone
by the image during the acquisition and transmission processes, while n denotes the
presence of noise, representing the existence of random interference.
Due to the diverse causes of image degradation, different IR tasks are closely associated
with their corresponding transmission functions H (·) and noises n. For instance, in the task
of document image dehazing, the transfer function H (·) can represent optical properties,
color distortions, and other phenomena introduced during scanning, capturing, or other
image-acquisition methods. The noise term represents random interference introduced
during the processes of image acquisition and transmission. These transformations may
result in the occurrence of a “show-through” phenomenon in document images, making
text, graphics, colors, and other information visible through the pages or paper. Based on
the appearance and underlying mechanisms of the show-through images, the document
image show-through model can be expressed as follows:
$$ I_S = (1 - \alpha)\cdot I_{GT} + \alpha\cdot\phi(S) \tag{2} $$
In this context, I_S represents the mixed image with a show-through effect; I_GT represents the foreground image (the front side of the document image); S represents the background image (the back side of the document image); and α represents the blending ratio of the show-through layer. Additionally, φ(·) denotes the bleeding-attenuation function. Resolving Equation (2) without any prior knowledge leads to an infinite number of solutions due to the unknown variables S, α, and φ(·), making the restoration of I_GT from I_S a
non-deterministic problem. Furthermore, the structure and attributes of the show-through
layer often resemble the content of the foreground image, posing challenges in removing
unnecessary information from the show-through layer while preserving the content and
details of the foreground image with reduced redundancy.
So far, many studies have adopted methods based on generative adversarial networks (GANs) [2] to restore document images or natural images. Souibgui et al. [3] proposed
a robust end-to-end framework called document-enhancement GAN, which effectively
restores severely degraded document images using conditional GAN [4]. However, these
methods still require the use of a large number of paired datasets to train the model for
good experimental results. It is difficult to find real, clear images that match the degraded
document images. In addition, Liu et al. [5] proposed a single-image dehazing method
based on cycle GAN (CycleGAN) [6], which uses unpaired hazy images for training and
constrains the process of haze removal through double-cycle consistency loss. This method
works well for haze removal in natural scenes, but the double-cycle consistency strategy has rarely been applied to document images.
Therefore, the removal of show-through effects in document images faces several
challenges:
a. The foreground and show-through parts in document images are highly similar
because they both comprise document content. Without additional prior knowledge,
it is difficult to accurately recover the desired foreground images.
b. In real-world scenarios, there is a lack of authentic images that match the show-
through images, making supervised training infeasible. As a result, most document
image processing algorithms rely on synthetic training data. However, there ex-
ists a gap between synthetic and real-world data, which limits the performance of
these algorithms.
To address these challenges, this paper proposes a self-supervised-learning-based
method for removing show-through effects in document images. This method, called the
CycleGAN-based color document show-through-removal network (CDSR-CycleGAN),
eliminates show-through effects without requiring paired datasets. The network framework
adopts a two-stage structure and employs double-cycle consistency loss and pseudo-
similarity loss constraints to guide the process of removal of image show-through. Two
datasets, each consisting of different show-through mixing ratios, are constructed for
training the document show-through-removal task.
The remaining parts of this article are organized as follows: Section 2 presents the
related work; Section 3 introduces the proposed method; Section 4 presents the experimental
results; Section 5 discusses the findings; and Section 6 concludes the paper.
2. Related Work
2.1. Show-through-Removal Method
Show-through refers to the presence of transparency or uneven materials in document
images during the scanning or capturing process, resulting in a visible transparency effect.
In 1994, Lins et al. [7] first described this “front-to-back interference” noise as “seen-
through.” Later, researchers referred to it as “show-through” or “bleed-through.” The initial
solution to this problem was proposed by Lins et al. [7], who suggested a “mirror-filtering”
strategy that scans both sides of the document, aligns the two images, and compares the
intensities of corresponding ink pixels. Sharma [8] initially analyzed the show-through
phenomenon based on physical principles and derived a simplified mathematical model.
Subsequently, they linearized the model and designed an adaptive linear filtering scheme
to eliminate the show-through effect. Rowley-Brooke et al. [9] introduced a database of
degraded document images affected by bleed-through and demonstrated its utility by
evaluating three non-blind bleed-through removal techniques against manually created
ground-truth foreground masks. Moghaddam et al. [10] segmented images from double-
sided ancient documents using a level-set framework and eliminated the influence of bleed-
through effects. The method comprises three key steps: binarization using adaptive global
thresholds, separation of interference patterns and genuine text using reverse diffusion
forces, and smoothing boundaries using small regularization forces. However, due to
variations in document image quality, the performance of traditional methods varies,
requiring a large number of empirical parameters to construct suitable solutions. He
et al. [11] proposed a novel DL-based document-enhancement-and-binarization model
(DeepOtsu), utilizing iterative refinement and superposition to achieve visually pleasing
results. Hanif et al. [12] proposed a document-restoration method aimed at eliminating
unwanted interfering degradation patterns from ancient color manuscripts.
2.2. GAN
GAN, proposed by Ian Goodfellow et al. [2] in 2014, is a deep learning (DL) [13] model
consisting of a generator and a discriminator. Its basic principle is to train the generator
and discriminator in an adversarial manner, allowing the generator to gradually generate
realistic samples. GAN has been widely applied in various fields such as images, audio,
and text due to its unique generation capabilities and capacity for unsupervised learning.
Poddar et al. [14] proposed TBM-GAN, a generative-adversarial-network framework
for synthesizing realistic handwritten documents with degraded backgrounds. De et al. [15]
developed a DL-based document image-binarization model employing dual discriminator
GAN and focal loss as the generator loss. Suh et al. [16] introduced a two-stage method
for color-document image enhancement and binarization using independent GANs. Lin
et al. [17] suggested a training method using GANs to address limited data availability.
Ju et al. [18] proposed CCDWT-GAN, a color channel-based discrete wavelet transform
GAN, for efficient extraction of text information from color degraded document images.
Zou et al. [19] introduced a method called “deep adversarial decomposition” for separating
individual layers of a single superimposed image. This method combines adversarial
training, a separation-critic network, and a crossroad L1 loss function to effectively handle
both linear and non-linear mixtures. However, previously developed GAN methods require
large amounts of paired datasets. Therefore, in this paper, we choose to use CycleGAN,
which does not require paired images.
3. Proposed Methods
3.1. Overall Network Framework
The overall network framework of the self-supervised-learning-based color docu-
ment show-through-removal network, CDSR-CycleGAN, is illustrated in Figures 1 and 2.
CDSR-CycleGAN transforms the unpaired image show-through-removal problem into
an image-to-image generation problem, generating non-show-through images through a
self-supervised-learning cycle-consistency network. This section provides a detailed de-
scription of the network composition, which consists of two parts: a show-through-removal
framework and a show-through-generation framework.
In the diagram, X represents the input show-through image. De represents the show-
through-removal generator, and De( X ) represents the show-through-removal image. Y
represents the clear image. Re represents the show-through generator, and Re(Y ) represents
the synthetic show-through image. S represents the similarity network; S_De and S_Re
are, respectively, used to generate similarity non-show-through images and show-through
images. D_show-through is the adversarial discriminator for classifying between the real
show-through image and the generated show-through image. D_non-show-through is the
adversarial discriminator for distinguishing between the real non-show-through image
and the show-through-removal image. The light blue lines in the figure represent pseudo-
similarity loss and cycle-consistency loss, respectively.
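To make the data flow between these components concrete, the sketch below shows how one generator update of the double-cycle setup could be wired. It is a minimal illustration under stated assumptions (PyTorch modules named De, Re, S, D_show, and D_non_show that return image-sized tensors/logits; a non-saturating adversarial term; loss weights and discriminator updates omitted), not the authors' implementation.

```python
# A minimal sketch (not the authors' released code) of one CDSR-CycleGAN generator step.
import torch
import torch.nn.functional as F

def generator_step(x_show, y_clean, De, Re, S, D_show, D_non_show):
    # Show-through-removal branch: X -> De(X), with a second path through S
    de_x = De(x_show)            # estimated non-show-through image
    s_de_x = S(de_x)             # similarity non-show-through image
    # Show-through-generation branch: Y -> Re(Y), with a second path through S
    re_y = Re(y_clean)           # synthetic show-through image
    s_re_y = S(re_y)             # similarity show-through image

    # Double-cycle consistency: both paths must map back to the original domain
    loss_cycle = (F.l1_loss(Re(de_x), x_show) + F.l1_loss(Re(s_de_x), x_show)
                  + F.l1_loss(De(re_y), y_clean) + F.l1_loss(De(s_re_y), y_clean))

    # Pseudo-similarity loss: S outputs should stay close to the generator outputs
    loss_ps = F.l1_loss(s_de_x, de_x) + F.l1_loss(s_re_y, re_y)

    # Adversarial terms (non-saturating form), judged by the two discriminators
    logits_clean = D_non_show(de_x)
    logits_show = D_show(re_y)
    loss_adv = (F.binary_cross_entropy_with_logits(logits_clean, torch.ones_like(logits_clean))
                + F.binary_cross_entropy_with_logits(logits_show, torch.ones_like(logits_show)))

    return loss_cycle + loss_ps + loss_adv
```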
Figure 3. The network architecture of generator De.

Both De_remove and De_predict take input images with show-through effects. Firstly, De_remove undergoes shallow feature extraction to obtain preliminary features. Next, a triple feature attention module (TFAM) with multiple skip connections is applied to achieve deeper feature attention. Subsequently, further processing is conducted using two convolutional networks and a long-skip global residual learning module. Ultimately, an image with the initial removal of the show-through effect is obtained.

The TFAM network structure, illustrated in Figure 4, consists of three feature attention modules (FAMs) and a convolutional layer. The FAM network structure, shown in Figure 5, comprises local residual learning and channel-pixel attention (CPA) modules. Local residual learning allows less-important bleeding information to bypass through multiple local residual connections, while the main network focuses on effective foreground information. CPA retains shallow information and transfers it to deep layers, allowing De_remove to focus more on recovering effective information such as show-through overlapping regions, detail textures, and color accuracy. This not only enhances the restoration accuracy of De_remove but also reduces redundancy and improves efficiency.

Figure 5. The network architecture of FAM.

The De_predict network is used to predict and locate foreground information. Firstly, the receptive field is expanded through four sets of dilated convolutions and standard convolution operations to enable the network to capture more contextual information. This enhances the accuracy and precision of foreground prediction. Next, the global and local features of the De_predict network are integrated through long paths to extract more robust features. Attention mechanisms are then employed to quickly capture foreground information hidden in complex backgrounds. First, a 1 × 1 convolutional layer com-
(2) Generator Re
The generator Re (show-through-generation network) is another component of the
CDSR-CycleGAN model, and its network structure is shown in Figure 7. Its role is to
take in input images without the show-through effect and generate the corresponding
show-through images. Unlike the other generator De, which focuses on generating cycle
mappings, the generator Re specifically focuses on generating show-through mappings.
Re is the network used for show-through generation and consists of two sub-networks:
the foreground prediction network Repredict , which predicts and locates the foreground
information, and the background blur network Reblur , which performs blurring on the
background. Similar to Depredict , Repredict has a similar network structure but differs in the
way it reconstructs the image. Reblur has a network structure similar to Derefine but serves a
different purpose. Repredict combines the extracted foreground features with the original
input image to obtain a clear foreground image with known foreground information
positions. Reblur is used to blur the input image and provides a foundation for adding the
back-to-front interference effect. Subsequently, the show-through effect is simulated around
the predicted foreground information by further enhancing the realism of the image.
(3) Similarity network S

The similarity network S is a crucial component of the CDSR-CycleGAN model. In the absence of labeled values, this paper utilizes S to provide unsupervised similarity labels and compute pseudo-similarity loss. In practical applications, S is employed to generate similarity images for both the De generator and the Re generator. Therefore, it is referred to as the similarity-De network S_De in the De architecture and the similarity-Re network S_Re in the Re architecture.

The architecture of S is illustrated in Figure 8, with a similar application for Re as for De. Firstly, a 7 × 7 initial convolution operation is applied, followed by instance normalization and ReLU activation. Subsequently, downsampling is performed twice using 3 × 3 convolution kernels, IN, and ReLU activation, gradually increasing the number of channels. Then, feature extraction is conducted through nine CABs, as shown in Figure 6. Finally, two upsampling operations are performed using deconvolution blocks, each consisting of a 3 × 3 deconvolution operation, IN, ReLU activation, and a gradual reduction in channel numbers. Ultimately, the output of network S is generated using a 7 × 7 convolution kernel and mapped to the range (−1, 1) through the tanh activation function.

Figure 8. The network architecture of similarity network S.

Through the combined action of the De generator, the Re generator, and the similarity network S, the CDSR-CycleGAN model achieves high-quality show-through removal and show-through generation.

3.3. Discriminative Network

For the show-through-removal task, the discriminator plays a role in identifying the differences between the generated show-through images and the real non-bleeding removal images. In the show-through-removal network, two discriminators with identical network structures are employed, as shown in Figure 9. The discriminator D_show-through is used to differentiate between input show-through images and generated images with show-through, while the discriminator D_non-show-through is used to distinguish between the non-show-through images generated by the generator G and the input non-show-through images. We utilize five convolution blocks for classifying real and fake images, where each convolution layer has a kernel size of 4 × 4; a stride of 2; and channel numbers of 64, 128, 256, 512, and 1, from low to high. Within each convolution block, a convolution layer, LReLU activation function, and IN normalization layer are sequentially applied.

Figure 9. The network architecture of discriminator D.

3.4. Loss Functions

In the CycleGAN network, this paper utilizes multiple loss functions to train the network; these aim to capture different objectives between the generator and discriminator.

(1) Adversarial loss L_adv

Adversarial loss was used for adversarial learning between the generator and discriminator. The goal of the generator is to generate realistic fake images that the discriminator cannot distinguish from real images. On the other hand, the goal of the discriminator is to precisely differentiate between real and fake images.

For the generators Re and S and the discriminator D_show-through, the adversarial loss for generating show-through images can be defined as shown in Equation (3). This loss encourages the generator to generate images that are more similar to the real images, while the discriminator aims to distinguish between the generated show-through images and real images that have not been subjected to show-through removal.

$$
\begin{aligned}
\mathcal{L}_{re\text{-}adv}\big(Re, S, D_{show\text{-}through}, Y, X\big)
&= \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\big[\log D_{show\text{-}through}(X)\big] \\
&\quad + \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\big[\log\big(1 - D_{show\text{-}through}(Re(Y))\big)\big] \\
&\quad + \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\big[\log\big(1 - D_{show\text{-}through}(S(Re(Y)))\big)\big]
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\mathcal{L}_{de\text{-}adv}\big(De, S, D_{non\text{-}show\text{-}through}, X, Y\big)
&= \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\big[\log D_{non\text{-}show\text{-}through}(Y)\big] \\
&\quad + \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\big[\log\big(1 - D_{non\text{-}show\text{-}through}(De(X))\big)\big] \\
&\quad + \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\big[\log\big(1 - D_{non\text{-}show\text{-}through}(S(De(X)))\big)\big]
\end{aligned}
\tag{4}
$$
In the aforementioned equations, De, S, and Re minimize the target values, while the discriminators D_show-through and D_non-show-through attempt to maximize them. However, when the adversarial loss is used, some artifacts may appear in the generated results.
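For illustration, the discriminator-side objectives implied by Equations (3) and (4) might be implemented as below. Expressing the log terms as binary cross-entropy on logits is the usual practical equivalent of the minimax formulation; this is a sketch under that assumption, not the authors' code.

```python
# Hedged sketch of the discriminator losses corresponding to Eqs. (3) and (4),
# assuming discriminators that output raw logits.
import torch
import torch.nn.functional as F

def discriminator_losses(x_show, y_clean, De, Re, S, D_show, D_non_show):
    bce = F.binary_cross_entropy_with_logits

    # Eq. (3): D_show-through judges real show-through vs. Re(Y) and S(Re(Y))
    real_s = D_show(x_show)
    fake_s = D_show(Re(y_clean).detach())
    fake_s_sim = D_show(S(Re(y_clean)).detach())
    loss_d_show = (bce(real_s, torch.ones_like(real_s))
                   + bce(fake_s, torch.zeros_like(fake_s))
                   + bce(fake_s_sim, torch.zeros_like(fake_s_sim)))

    # Eq. (4): D_non-show-through judges real clean vs. De(X) and S(De(X))
    real_n = D_non_show(y_clean)
    fake_n = D_non_show(De(x_show).detach())
    fake_n_sim = D_non_show(S(De(x_show)).detach())
    loss_d_non_show = (bce(real_n, torch.ones_like(real_n))
                       + bce(fake_n, torch.zeros_like(fake_n))
                       + bce(fake_n_sim, torch.zeros_like(fake_n_sim)))

    return loss_d_show, loss_d_non_show
```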
(2) Cycle consistency loss L_cycle
To better correct the bleeding images and preserve more details, we introduced a cy-
cle consistency loss [6] function for show-through images. The goal of this function is to
minimize the difference between the show-through image X and its reconstructed bleeding im-
ages Re(De(X )) and Re(S(De(X ))), as well as the difference between the non-show-through
image Y and its reconstructed non-show-through images De(Re(Y)) and De(S(Re(Y))) (as
illustrated in Figures 1 and 2). Through this loss function, the consistency of the images can
be maintained after they have been transformed in both directions. That is, an image that
has been transformed into a show-through image should be able to be restored back to the
original domain. This loss function helps to reduce information loss and maintain the content
of the images.
The cycle consistency loss from domain X to domain Y is defined as follows:
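The individual cycle terms (Equations (5) and (6)) are not reproduced here; based on the description above (two reconstruction paths per domain, one routed through S) and the double-cycle loss of [5,6], they presumably take a form along these lines (a hedged sketch, not necessarily the paper's exact notation):

$$
\mathcal{L}_{cycle\_de}(De, S, Re) = \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\big[\lVert Re(De(X)) - X\rVert_1 + \lVert Re(S(De(X))) - X\rVert_1\big] \tag{5}
$$

$$
\mathcal{L}_{cycle\_re}(Re, S, De) = \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\big[\lVert De(Re(Y)) - Y\rVert_1 + \lVert De(S(Re(Y))) - Y\rVert_1\big] \tag{6}
$$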
The overall definition of the cycle consistency loss is the sum of the losses from both
directions and is shown in Equation (7), as follows:
$$ \mathcal{L}_{cycle}(Re, S, De) = \mathcal{L}_{cycle\_de}(De, S, Re) + \mathcal{L}_{cycle\_re}(Re, S, De) \tag{7} $$
generated results. Because the similarity loss is minimized, the generated images become
more similar to the real images, leading to better transformation outcomes.
The pseudo-similarity loss is defined as shown in Equation (9):
$$
\mathcal{L}_{ps}(De, S, Re) = \mathbb{E}_{X \sim p_{show\text{-}through}(X)}\big[\lVert S(De(X)) - De(X)\rVert_1\big] + \mathbb{E}_{Y \sim p_{non\text{-}show\text{-}through}(Y)}\big[\lVert S(Re(Y)) - Re(Y)\rVert_1\big]
\tag{9}
$$
$$
\mathcal{L}_{perceptual} = \lVert \delta(X) - \delta(Re(De(X))) \rVert_2^2 + \lVert \delta(Y) - \delta(Re(De(Y))) \rVert_2^2
\tag{10}
$$
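The sketch below shows how Eq. (9) and a perceptual term in the spirit of Eq. (10) could be implemented. The use of frozen VGG-16 features as δ(·) is a common choice consistent with the citation of [25], but it is an assumption here rather than a confirmed detail.

```python
# Hedged sketch of the pseudo-similarity loss (Eq. (9)) and a VGG-based
# perceptual term (cf. Eq. (10)); not the authors' implementation.
import torch
import torch.nn.functional as F
import torchvision

class VGGFeatures(torch.nn.Module):
    def __init__(self, layers=16):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT)
        self.features = vgg.features[:layers].eval()   # frozen feature extractor delta(.)
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        return self.features(x)

def pseudo_similarity_loss(x_show, y_clean, De, Re, S):
    de_x, re_y = De(x_show), Re(y_clean)
    return F.l1_loss(S(de_x), de_x) + F.l1_loss(S(re_y), re_y)   # Eq. (9)

def perceptual_loss(x_show, y_clean, De, Re, delta):
    return (F.mse_loss(delta(x_show), delta(Re(De(x_show))))
            + F.mse_loss(delta(y_clean), delta(Re(De(y_clean)))))  # squared-error form, cf. Eq. (10)
```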
Figure 10. Example of synthesized show-through image.
S-color adopts a hybrid linear show-through model, where the show-through component of S-color is achieved by flipping the image horizontally during the image-preprocessing stage. The specific operation in Equation (2) is as follows: the flipped image is subjected to Gaussian blur using the function φ(·), with a kernel size of 5 × 5 and a selectable standard deviation of 1 or 0.5. Additionally, a randomly selected parameter α with a value between 0.1 and 0.2 is used to simulate a real bleeding phenomenon.
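A sketch of this synthesis step is shown below. It assumes OpenCV, a per-channel linear blend, and placeholder file names; it illustrates Eq. (2) with the parameters described above rather than reproducing the authors' data pipeline.

```python
# Sketch of the S-color synthesis: mirror the verso page, blur it with a 5x5
# Gaussian kernel (sigma 0.5 or 1), and blend with a random alpha in [0.1, 0.2].
import random
import cv2
import numpy as np

def synthesize_show_through(front_path, back_path, sigma=1.0):
    front = cv2.imread(front_path).astype(np.float32) / 255.0   # I_GT (front side)
    back = cv2.imread(back_path).astype(np.float32) / 255.0     # S (back side)
    back = cv2.flip(back, 1)                                    # horizontal flip
    back = cv2.GaussianBlur(back, (5, 5), sigma)                # phi(.): bleeding attenuation
    alpha = random.uniform(0.1, 0.2)                            # blending ratio
    mixed = (1.0 - alpha) * front + alpha * back                # Eq. (2)
    return (mixed * 255.0).clip(0, 255).astype(np.uint8)

# Example: synthesize_show_through("front.png", "back.png", sigma=0.5)
```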
The S-color1.0 and S-color0.5 datasets each consist of 1000 pairs of training images and 100 pairs of testing images. Each dataset consists of show-through images and non-show-through images, and there is no exact correspondence between these two sets of images. This means that the model needs to learn how to transform images from one domain to another, rather than simply performing a straightforward mapping.
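One way such unpaired data could be served during training is sketched below: show-through and non-show-through images are drawn independently, so no pixel-wise correspondence is assumed. Directory layout and file format are placeholders.

```python
# Sketch of an unpaired show-through / clean dataset for domain-to-domain training.
import random
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class UnpairedShowThroughDataset(Dataset):
    def __init__(self, show_dir, clean_dir, transform=None):
        self.show = sorted(Path(show_dir).glob("*.png"))
        self.clean = sorted(Path(clean_dir).glob("*.png"))
        self.transform = transform

    def __len__(self):
        return max(len(self.show), len(self.clean))

    def __getitem__(self, idx):
        x = Image.open(self.show[idx % len(self.show)]).convert("RGB")
        y = Image.open(random.choice(self.clean)).convert("RGB")   # deliberately unaligned pick
        if self.transform:
            x, y = self.transform(x), self.transform(y)
        return x, y
```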
(2) Real dataset
We evaluated the proposed method using the Media Team Document Database (MTDB) [27], which contains various types of document images such as address lists, advertisements, articles, business cards, music sheets, and maps. In our experiments, we focused on 10 out of the 19 predefined categories in MTDB that, more than the others, contain text documents exhibiting show-through phenomena. The scans from the MTDB were performed using a Hewlett Packard ScanJet scanner and DeskScan II 2.3.1a software, with color-adjustment parameters set to distance 0, direction 0, saturation 50, normal emphasis curve, resolution of 300 dpi, true-color mode, no image sharpening, and photo style with highlight and shadow values set to 125. We first randomly cropped images of size 256 × 256 from higher-resolution images (2000+ × 3000+) and then selected 200 images with obvious show-through phenomena as our test set. The access link for MTDB is as follows: https://github.com/liangzongbao/CDSR-CycleGAN/tree/main/data/MDTB (25 May 2024).
The ICDAR-SROIE dataset [28] was created to support the ICDAR 2019 competition
and aims to evaluate the performance of scene text-recognition and information-extraction
algorithms on real business documents, specifically receipts and invoices. This dataset con-
sists of 1000 complete scanned receipt images, with each image containing approximately
four key text fields such as item names, unit prices, total costs, etc. The text primarily
consists of numbers and English characters, and some images may have issues like stains,
noise, and blurriness. In an approach similar to that used for MTDB, we first randomly
cropped images of size 256 × 256 from a subset of higher-resolution images, and then
selected 30 images with noticeable stains as our test set to validate the generalizability of
our proposed method in this paper.
The MS Táin Bó Cuailnge (MS) [9] is an ancient manuscript. It was created by Seosamh
Ó Longáin (1817–1880), one of the last professional Gaelic scribes. He produced numerous
Gaelic manuscripts, especially during the 19th century. We selected the first and thirteenth
pages from the book, then cropped them to a size of 256 × 256 for testing. The URL is
https://www.isos.dias.ie/AUS1/AUS_MS_Tain_Bo_Cuailgne.html (22 April 2024).
The Bleed-Through Database (BTD) [9] is designed to be a resource for people working
in the field of digital document restoration and more specifically on the problem of bleed-
through degradation. It consists of a set of 25 registered recto–verso sample grayscale
image pairs taken from larger manuscript images with varied degrees of bleed-through.
The URL is https://www.isos.dias.ie/Sigmedia/Bleed_Through_Database.html (8 May
2024). Both the MS and BTD datasets can be accessed at https://www.isos.dias.ie (8 May
2024) [29].
(3) Evaluation metrics:
To quantitatively evaluate the network’s performance in removing show-through
effects, we used PSNR and SSIM [30] as performance metrics. For both PSNR and SSIM,
larger values indicate a closer resemblance to the ground truth and better removal of
show-through effects.
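The evaluation could be scripted as follows; the sketch assumes 8-bit RGB arrays and scikit-image ≥ 0.19 (for the channel_axis argument), which is an assumption rather than a stated detail of the paper.

```python
# Sketch of PSNR/SSIM evaluation against the ground truth with scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(restored: np.ndarray, ground_truth: np.ndarray):
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    ssim = structural_similarity(ground_truth, restored, data_range=255, channel_axis=-1)
    return psnr, ssim
```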
Quantitative analysis was conducted on benchmark datasets with ground truth. We
calculated FgError, the probability that a pixel in the foreground text is classified as back-
ground or translucent; BgError, the probability that a background or translucent pixel
is classified as foreground; and TotError, which is the average of FgError and BgError
weighted by the number of foreground pixels and background pixels, which are derived
from the corresponding ground-truth images. The TotError metric indicates the probability of
misclassification of any pixel in the image. According to [9], these quality metrics are
defined as follows:
$$
\mathrm{FgError} = \frac{1}{N}\sum_{GT(Fg)} \lvert GT - BY \rvert, \qquad
\mathrm{BgError} = \frac{1}{N}\sum_{GT(Bg)} \lvert GT - BY \rvert, \qquad
\mathrm{TotError} = \frac{1}{N}\sum_{GT} \lvert GT - BY \rvert
\tag{13}
$$
where GT is the ground truth, BY is the binarized restoration result, and GT ( Fg) is the
foreground region only of the ground truth image; similarly, GT ( Bg) corresponds to the
background region only, and N is the number of pixels in the image.
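A direct implementation of Eq. (13) is sketched below; it assumes GT and BY are binary NumPy arrays (1 = foreground text, 0 = background/translucent), which is an assumption about the data format.

```python
# Sketch of the FgError / BgError / TotError probabilities of Eq. (13).
import numpy as np

def bleed_through_errors(gt: np.ndarray, by: np.ndarray):
    n = gt.size
    diff = np.abs(gt.astype(int) - by.astype(int))
    fg_error = diff[gt == 1].sum() / n    # foreground pixels misclassified
    bg_error = diff[gt == 0].sum() / n    # background/translucent pixels misclassified
    tot_error = diff.sum() / n            # any pixel misclassified
    return fg_error, bg_error, tot_error
```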
Table 1. Quantitative comparison of PSNR (dB) and SSIM on the synthetic datasets.

Dataset      Index   DAD     DeepOtsu   Uformer   MPRNet   VDIR    S-CycleGAN   YTMT    Ours
S-color0.5   PSNR    28.65   21.49      24.69     32.86    27.81   32.92        25.17   33.79
S-color0.5   SSIM    0.977   0.861      0.902     0.972    0.956   0.979        0.912   0.984
S-color1.0   PSNR    28.87   21.48      24.68     32.98    27.80   32.99        25.40   33.91
S-color1.0   SSIM    0.980   0.861      0.914     0.981    0.958   0.983        0.948   0.989
Average      PSNR    28.76   21.49      24.69     32.92    27.81   32.96        25.29   33.85
Average      SSIM    0.979   0.861      0.908     0.977    0.957   0.981        0.930   0.987
Figure 11. Qualitative analysis of S-color0.5 dataset.
As shown in Table 1, on the synthetic datasets, the proposed method achieved higher
average SSIM and PSNR (dB) values than the compared methods. Figures 11 and 12 respec-
tively present the restoration results of a sample image from the S-color0.5 and S-color1.0
synthetic datasets. It is observed that both the DAD and DeepOtsu methods removed the
show-through effects, but the result of the DeepOtsu method suffered from color distortion,
while the DAD method resulted in color deviation. The results of the Uformer and YTMT
methods still contained a significant number of show-through components. The MPR-Net,
VDIR, and S-CycleGAN methods all removed the show-through effects to some extent,
but some residual show-through components remained. In contrast, the proposed method
completely removed the show-through effects and achieved better restoration results.
Figure 13. Qualitative analysis of MTDB dataset.
Figure 14 displays an example restoration of a sample with stains in the ICDAR-SROIE dataset. It can be seen that the DAD and S-CycleGAN methods and our proposed method yielded good restoration results, while the DeepOtsu, Uformer, MPR-Net, VDIR, and YTMT methods all yielded images with some residual stains, failing to produce clear images.

Figure 15 shows the results of show-through removal on MS books. Although there are still some remnants of show-through distortion, the use of our method has made the image clearer and more readable. This also confirms the applicability of our method to the digitization of ancient books.
Figure 14. Qualitative analysis of ICDAR-SROIE dataset.

Figure 15. Qualitative analysis of MS dataset.
4.5. Quantitative and Qualitative Analysis on the Bleed-through Database
This section utilizes the BTD dataset with ground truth for both quantitative and qualitative analysis of our proposed method and existing methods, including Hua [35], Mog [10], and Ro2 [36]. Figure 16 displays the restoration results for the dataset named NLI.MSG311.265/6. It can be observed that, compared to other methods, our proposed approach effectively removes most of the show-through while preserving the foreground. Hua performs well in handling dark, translucent areas in isolation but often deletes foreground text in overlapping image regions, thereby reducing readability. Mog retains foreground information but struggles with dark, translucent regions, leaving visible show-through. Ro2 generally preserves foreground information well but still results in some residual show-through in most cases. Table 2 presents the quantitative analysis. Our method performs the best across all three metrics, indicating superior performance in show-through removal. Due to the Hua method removing a significant amount of foreground text while eliminating show-through, it has the highest average FgError. The results of the Mog and Ro2 methods exhibit some residual show-through.
Table 2. Analysis of mean error probabilities for the entire BTD dataset.

Index      Hua      Mog      Ro2      Ours
FgError    0.2308   0.0746   0.0696   0.0633
BgError    0.0012   0.0148   0.0085   0.0008
TotError   0.0413   0.0244   0.0196   0.0160
Table 3. Analysis of average runtime (seconds) using images sized 256 × 256.

Table 4. Ablation study results (PSNR/SSIM).

Index   w/o De_predict   w/o De_remove   w/o Re_predict   w/o CAB   w/o S   w/o L_ps   w/o L_cycle2   CDSR-CycleGAN
PSNR    32.52            33.73           29.87            33.75     32.94   30.45      33.71          33.79
SSIM    0.978            0.981           0.969            0.983     0.982   0.974      0.983          0.984
From observations of the last three columns of Table 4, it can be seen that there
is a decline in performance when the L_cycle2 and L_ps terms are removed. This confirms their
advantages in improving the quality of restoration results. By combining all the aforemen-
tioned components, our proposed method achieved the best performance, with a PSNR of
33.79 dB. Therefore, each key component considered during the ablation process has made
its own contribution.
The first row of the image in Figure 17 is used as an example for evaluation of the proposed method and comparison with other methods. The OCR-recognition comparison results, as shown in Figure 18, consist of three rows: the first row presents the experimental result image, the second row displays the recognition result visualization, and the third row shows the output text result. The results of the DAD and DeepOtsu methods are missing some text content, while YTMT incorrectly recognized the show-through content as the correct text. A comparison reveals the significant effectiveness of the proposed algorithm in eliminating document show-through. By applying this algorithm to process the experimental result images, the document show-through effect can be successfully eliminated, greatly improving the readability and accuracy of the images. Taking the sample in Figure 16 as an example, incomplete data from the top row and right side were not included in the evaluation. Table 5 presents a quantitative analysis of OCR recognition based on counts of the number of words and characters and the number of correct words and characters. The number of words in the ground truth is 65, and the number of characters is 303, so the closer the measured quantities are to these values, the better the recognition performance. In terms of word count, some adjacent words are recognized as a single word due to being too close together; for example, "a bag to" is recognized as "abagto", which reduces the word count. In terms of characters, it was observed that "n", "r", and "c" are prone to recognition errors. Specifically, the proposed method demonstrates better recognition accuracy compared to several other methods.
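The OCR-based comparison could be reproduced along the lines of the sketch below. The use of pytesseract and the simple counting rules are assumptions; the paper does not name its OCR engine or exact counting protocol.

```python
# Sketch: run OCR on a restored image and count words/characters against the
# ground-truth transcription (crude membership-based proxy for "correct" counts).
import pytesseract
from PIL import Image

def ocr_counts(image_path: str, gt_text: str):
    pred = pytesseract.image_to_string(Image.open(image_path))
    pred_words, gt_words = pred.split(), gt_text.split()
    correct_words = sum(1 for w in pred_words if w in gt_words)
    correct_chars = sum(1 for c in pred if c in gt_text)
    return {"words": len(pred_words),
            "chars": len(pred.replace(" ", "").replace("\n", "")),
            "correct_words": correct_words,
            "correct_chars": correct_chars}
```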
In future work, the possibility of applying this algorithm in practical engineering will be explored. For example, in fields
such as digital archive management, print quality control, and image post-processing, this
algorithm can provide a reliable solution for relevant applications. Eliminating the document
show-through effect can allow clearer and more accurate images to be obtained, thereby
enhancing the efficiency and accuracy of subsequent processing tasks.
5. Discussion
One limitation of our proposed method is that CDSR-CycleGAN cannot fully restore
complex backgrounds in real images. For instance, in the MTDB dataset, although our
method successfully removes the show-through effect, it does not perfectly preserve the
original appearance of the images. This may impact the visual experience, despite being
advantageous for document image recognition. The reason lies in the inability to accurately
describe real show-through images using existing synthesized data in such scenarios. In
future research, we plan to explore new methods to address this issue and create diverse
show-through datasets to train our network and improve upon this limitation.
Another limitation is that this paper employs synthetic data as the training set; therefore,
its capability to handle the maximum show-through level may be affected. Our method
simulates different degrees of show-through in document images by adjusting the standard
deviation of Gaussian blur. Specifically, we used Gaussian blurs with standard deviations
of 0.5 and 1, respectively. Therefore, the maximum degradation supported by our method
is that of show-through document images with a Gaussian blur standard deviation of 0.5.
In the future, we will focus on addressing higher-level show-through documents.
6. Conclusions
In this paper, we propose a self-supervised-learning-based method named CDSR-
CycleGAN for removing the show-through effect in color document images. The method
utilizes unpaired image data for training, eliminating the need for extensive preparation of
synthesized show-through images. An innovative cyclic generative adversarial network
was introduced to oversee the mapping from the show-through domain to the non-show-
through domain. The network employs a two-stage structure and incorporates double-cycle
consistency loss and pseudo-similarity loss as constraints for the process of removal of
image show-through. Extensive experiments on both synthesized and real show-through
images demonstrate the effectiveness of the method in recovering show-through images
and outperforming several other approaches.
Author Contributions: Conceptualization, Z.L. and J.X.; methodology, Z.L.; software, M.N.; val-
idation, Z.L., M.N. and J.X.; formal analysis, Z.L.; investigation, Z.L.; resources, M.N. and J.X.;
data curation, Z.L.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L., J.X.
and M.N.; visualization, M.N.; supervision, M.N. and J.X.; project administration, M.N.; funding
acquisition, J.X. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported in part by the National Natural Science Foundation of China
under Grant 62072391 and Grant 62066013.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in the study are included in the
article; further inquiries can be directed to the corresponding author, Zongbao Liang, at liangzong-
bao@s.ytu.edu.cn.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Chandio, A.A.; Asikuzzaman, M.; Pickering, M.R.; Leghari, M. Cursive text recognition in natural scene images using deep
convolutional recurrent neural network. IEEE Access 2022, 10, 10062–10078. [CrossRef]
2. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial
nets. Adv. Neural Inf. Process. Syst. 2014, 2, 2672–2680.
3. Souibgui, M.A.; Kessentini, Y. DE-GAN: A conditional generative adversarial network for document enhancement. IEEE Trans.
Pattern Anal. Mach. Intell. 2020, 44, 1180–1191. [CrossRef] [PubMed]
4. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
5. Liu, W.; Hou, X.; Duan, J.; Qiu, G. End-to-end single image fog removal using enhanced cycle consistent adversarial networks.
IEEE Trans. Image Process. 2020, 29, 7819–7833. [CrossRef]
6. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
7. Lins, R.D.; Neto, M.G.; Neto, L.F.; Rosa, L.G. An environment for processing images of historical documents. Microprocess.
Microprogramming 1994, 40, 939–942. [CrossRef]
8. Sharma, G. Show-through cancellation in scans of duplex printed documents. IEEE Trans. Image Process. 2001, 10, 736–754.
[CrossRef] [PubMed]
9. Rowley-Brooke, R.; Pitié, F.; Kokaram, A. A ground truth bleed-through document image database. In Theory and Practice
of Digital Libraries: Second International Conference, TPDL 2012, Paphos, Cyprus, September 23–27, 2012. Proceedings 2; Springer:
Berlin/Heidelberg, Germany, 2012; pp. 185–196.
10. Moghaddam, R.F.; Cheriet, M. A variational approach to degraded document enhancement. IEEE Trans. Pattern Anal. Mach. Intell.
2009, 32, 1347–1361. [CrossRef] [PubMed]
11. He, S.; Schomaker, L. DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognit. 2019,
91, 379–390. [CrossRef]
12. Hanif, M.; Tonazzini, A.; Hussain, S.F.; Khalil, A.; Habib, U. Restoration and content analysis of ancient manuscripts via color
space based segmentation. PLoS ONE 2023, 18, e0282142. [CrossRef] [PubMed]
13. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
14. Poddar, A.; Dey, S.; Jawanpuria, P.; Mukhopadhyay, J.; Kumar Biswas, P. TBM-GAN: Synthetic document generation with
degraded background. In Proceedings of the International Conference on Document Analysis and Recognition, San José, CA,
USA, 21–26 August 2023; pp. 366–383.
15. De, R.; Chakraborty, A.; Sarkar, R. Document image binarization using dual discriminator generative adversarial networks. IEEE
Signal Process. Lett. 2020, 27, 1090–1094. [CrossRef]
16. Suh, S.; Kim, J.; Lukowicz, P.; Lee, Y.O. Two-stage generative adversarial networks for binarization of color document images.
Pattern Recognit. 2022, 130, 108810. [CrossRef]
17. Lin, Y.-S.; Lin, T.-Y.; Chiang, J.-S.; Chen, C.-C. Binarization of color document image based on adversarial generative network
and discrete wavelet transform. In Proceedings of the 2022 IET International Conference on Engineering Technologies and
Applications (IET-ICETA), Changhua, Taiwan, 14–16 October 2022; pp. 1–2.
18. Ju, R.-Y.; Lin, Y.-S.; Chiang, J.-S.; Chen, C.-C.; Chen, W.-H.; Chien, C.-T. CCDWT-GAN: Generative adversarial networks based on
color channel using discrete wavelet transform for document image binarization. In Proceedings of the Pacific Rim International
Conference on Artificial Intelligence (PRICAI), Jakarta, Indonesia, 15–19 November 2023; pp. 186–198.
19. Zou, Z.; Lei, S.; Shi, T.; Shi, Z.; Ye, J. Deep adversarial decomposition: A unified framework for separating superimposed images.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 12806–12816.
20. Gangeh, M.J.; Plata, M.; Nezhad, H.R.M.; Duffy, N.P. End-to-end unsupervised document image blind denoising. In Proceedings
of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 7888–7897.
21. Torbunov, D.; Huang, Y.; Yu, H.; Huang, J.; Yoo, S.; Lin, M.; Viren, B.; Ren, Y. Uvcgan: Unet vision transformer cycle-consistent
gan for unpaired image-to-image translation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer
Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 702–712.
22. Wu, S.; Dong, C.; Qiao, Y. Blind image restoration based on cycle-consistent network. IEEE Trans. Multimed. 2022, 25, 1111–1124.
[CrossRef]
23. Wang, Y.; Zhou, W.; Lu, Z.; Li, H. Udoc-gan: Unpaired document illumination correction with background light prior. In
Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 5074–5082.
24. Xu, J.D.; Ma, Y.L.; Liang, Z.B.; Ni, M.Y. Single bleed-through image restoration with self-supervised learning. Acta Autom. Sin.
2023, 49, 219–228.
25. Simonyan, K.; Zisserman, A. Very Deep convolutional networks for large-Scale image recognition. In Proceedings of the
International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
26. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning
Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
27. Sauvola, J.; Kauniskangas, H. MediaTeam Document Database II, a CD-ROM Collection of Document Images; University of Oulu: Oulu,
Finland, 1999.
28. Huang, Z.; Chen, K.; He, J.; Bai, X.; Karatzas, D.; Lu, S.; Jawahar, C. Icdar2019 competition on scanned receipt ocr and information
extraction. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney,
Australia, 20–25 September 2019; pp. 1516–1520.
29. Irish Script On Screen Project (2012). Available online: www.isos.dias.ie (accessed on 8 May 2024).
30. Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern
Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369.
31. Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022;
pp. 17683–17693.
32. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Multi-stage progressive image restoration. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021;
pp. 14821–14831.
33. Soh, J.W.; Cho, N.I. Variational deep image restoration. IEEE Trans. Image Process. 2022, 31, 4363–4376. [CrossRef] [PubMed]
34. Hu, Q.; Guo, X. Trash or treasure? an interactive dual-stream strategy for single image reflection separation. Adv. Neural Inf.
Process. Syst. 2021, 34, 24683–24694.
35. Huang, Y.; Brown, M.S.; Xu, D. User assisted ink-bleed reduction. IEEE Trans. Image Process. 2010, 19, 2646–2658. [CrossRef]
[PubMed]
36. Rowley-Brooke, R.; Pitié, F.; Kokaram, A. A non-parametric framework for document bleed-through removal. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2954–2960.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.