FFDNet

The article presents FFDNet, a fast and flexible convolutional neural network for image denoising that utilizes a tunable noise level map to effectively handle various noise levels and spatially variant noise. FFDNet outperforms existing state-of-the-art denoising methods in terms of efficiency and denoising performance, making it suitable for practical applications. The proposed model is evaluated through extensive experiments on both synthetic and real noisy images, demonstrating its effectiveness and potential in the field of image processing.


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2018.2839891, IEEE Transactions on Image Processing

FFDNet: Toward a Fast and Flexible Solution for CNN based Image Denoising
Kai Zhang, Wangmeng Zuo, Senior Member, IEEE, and Lei Zhang, Fellow, IEEE

This project is partially supported by the National Natural Scientific Foundation of China (NSFC) under Grant No. 61671182 and 61471146, and the HK RGC GRF grant (under no. PolyU 152124/15E). (Corresponding author: Wangmeng Zuo.)
K. Zhang is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cskaizhang@gmail.com).
W. Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: cswmzuo@gmail.com).
L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cslzhang@comp.polyu.edu.hk).

Abstract: Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images, achieving a good trade-off between inference speed and denoising performance. In contrast to the existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.

Index Terms: Image denoising, convolutional neural networks, Gaussian noise, spatially variant noise

I. INTRODUCTION

THE importance of image denoising in low level vision can be revealed from many aspects. First, noise corruption is inevitable during the image sensing process and it may heavily degrade the visual quality of acquired image. Removing noise from the observed image is an essential step in various image processing and computer vision tasks [1], [2]. Second, from the Bayesian perspective, image denoising is an ideal test bed for evaluating image prior models and optimization methods [3], [4], [5]. Last but not least, in the unrolled inference via variable splitting techniques, many image restoration problems can be addressed by sequentially solving a series of denoising subproblems, which further broadens the application fields of image denoising [6], [7], [8], [9].

As in many previous literature of image denoising [10], [11], [12], [13], in this paper we assume that the noise is additive white Gaussian noise (AWGN) and the noise level is given. In order to handle practical image denoising problems, a flexible image denoiser is expected to have the following desirable properties: (i) it is able to perform denoising using a single model; (ii) it is efficient, effective and user-friendly; and (iii) it can handle spatially variant noise. Such a denoiser can be directly deployed to recover the clean image when the noise level is known or can be well estimated. When the noise level is unknown or is difficult to estimate, the denoiser should allow the user to adaptively control the trade-off between noise reduction and detail preservation. Furthermore, the noise can be spatially variant and the denoiser should be flexible enough to handle spatially variant noise.

However, state-of-the-art image denoising methods are still limited in flexibility or efficiency. In general, image denoising methods can be grouped into two major categories, model-based methods and discriminative learning based ones. Model-based methods such as BM3D [11] and WNNM [5] are flexible in handling denoising problems with various noise levels, but they suffer from several drawbacks. For example, their optimization algorithms are generally time-consuming, and cannot be directly used to remove spatially variant noise. Moreover, model-based methods usually employ hand-crafted image priors (e.g., sparsity [14], [15] and nonlocal self-similarity [12], [13], [16]), which may not be strong enough to characterize complex image structures.

As an alternative, discriminative denoising methods aim to learn the underlying image prior and fast inference from a training set of degraded and ground-truth image pairs. One approach is to learn stage-wise image priors in the context of truncated inference procedure [17]. Another more popular approach is plain discriminative learning, such as the MLP [18] and convolutional neural network (CNN) based methods [19], [20], among which the DnCNN [20] method has achieved very competitive denoising performance. The success of CNN for image denoising is attributed to its large modeling capacity and tremendous advances in network training and design. However, existing discriminative denoising methods are limited in flexibility, and the learned model is usually tailored to a specific noise level. From the perspective of regression, they aim to learn a mapping function x = F(y; Θσ) between the input noisy observation y and the desired output x. The model parameters Θσ are trained for noisy images corrupted by AWGN with a fixed noise level σ, while the trained model with Θσ is hard to be directly deployed to images with other noise levels. Though a single CNN model (i.e., DnCNN-B) is trained in [20] for Gaussian denoising, it does not generalize well to real noisy images and works only

1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
if the noise level is in the preset range, e.g., [0, 55]. Besides, all the existing discriminative learning based methods lack flexibility to deal with spatially variant noise.

To overcome the drawbacks of existing CNN based denoising methods, we present a fast and flexible denoising convolutional neural network (FFDNet). Specifically, our FFDNet is formulated as x = F(y, M; Θ), where M is a noise level map. In the DnCNN model x = F(y; Θσ), the parameters Θσ vary with the change of noise level σ, while in the FFDNet model, the noise level map is modeled as an input and the model parameters Θ are invariant to noise level. Thus, FFDNet provides a flexible way to handle different noise levels with a single network.

By introducing a noise level map as input, it is natural to expect that the model performs well when the noise level map matches the ground-truth one of noisy input. Furthermore, the noise level map should also play the role of controlling the trade-off between noise reduction and detail preservation. It is found that heavy visual quality degradation may be engendered when setting a larger noise level to smooth out the details. We highlight this problem and adopt a method of orthogonal initialization on convolutional filters to alleviate this. Besides, the proposed FFDNet works on downsampled sub-images, which largely accelerates the training and testing speed, and enlarges the receptive field as well.

Using images corrupted by AWGN, we quantitatively compare FFDNet with state-of-the-art denoising methods, including model-based methods such as BM3D [11] and WNNM [5] and discriminative learning based methods such as TNRD [17] and DnCNN [20]. The results clearly demonstrate the superiority of FFDNet in terms of both denoising performance and computational efficiency. In addition, FFDNet performs favorably on images corrupted by spatially variant AWGN. We further evaluate FFDNet on real-world noisy images, where the noise is often signal-dependent, non-Gaussian and spatially variant. The proposed FFDNet model still achieves perceptually convincing results by setting proper noise level maps. Overall, FFDNet enjoys high potentials for practical denoising applications.

The main contribution of our work is summarized as follows:
• A fast and flexible denoising network, namely FFDNet, is proposed for discriminative image denoising. By taking a tunable noise level map as input, a single FFDNet is able to deal with noise on different levels, as well as spatially variant noise.
• We highlight the importance to guarantee the role of the noise level map in controlling the trade-off between noise reduction and detail preservation.
• FFDNet exhibits perceptually appealing results on both synthetic noisy images corrupted by AWGN and real-world noisy images, demonstrating its potential for practical image denoising.

The remainder of this paper is organized as follows. Sec. II reviews existing discriminative denoising methods. Sec. III presents the proposed image denoising model. Sec. IV reports the experimental results. Sec. V concludes the paper.

II. RELATED WORK

In this section, we briefly review and discuss the two major categories of relevant methods to this work, i.e., maximum a posteriori (MAP) inference guided discriminative learning and plain discriminative learning.

A. MAP Inference Guided Discriminative Learning

Instead of first learning the prior and then performing the inference, this category of methods aims to learn the prior parameters along with a compact unrolled inference through minimizing a loss function [21]. Following the pioneer work of fields of experts [3], Barbu [21] trained a discriminative Markov random field (MRF) model together with a gradient descent inference for image denoising. Samuel and Tappen [22] independently proposed a compact gradient descent inference learning framework, and discussed the advantages of discriminative learning over model-based optimization method with MRF prior. Sun and Tappen [23] proposed a novel nonlocal range MRF (NLR-MRF) framework, and employed the gradient-based discriminative learning method to train the model. Generally speaking, the methods above only learn the prior parameters in a discriminative manner, while the inference parameters are stage-invariant.

With the aid of unrolled half quadratic splitting (HQS) techniques, Schmidt et al. [24], [25] proposed a cascade of shrinkage fields (CSF) framework to learn stage-wise inference parameters. Chen et al. [17] further proposed a trainable nonlinear reaction diffusion (TNRD) model through discriminative learning of a compact gradient descent inference step. Recently, Lefkimmiatis [26] and Qiao et al. [27] adopted a proximal gradient-based denoising inference from a variational model to incorporate the nonlocal self-similarity prior. It is worth noting that, apart from MAP inference, Vemulapalli et al. [28] derived an end-to-end trainable patch-based denoising network based on Gaussian Conditional Random Field (GCRF) inference.

MAP inference guided discriminative learning usually requires much fewer inference steps, and is very efficient in image denoising. It also has clear interpretability because the discriminative architecture is derived from optimization algorithms such as HQS and gradient descent [17], [21], [22], [23], [24]. However, the learned priors and inference procedure are limited by the form of MAP model [25], and generally perform inferior to the state-of-the-art CNN-based denoisers. For example, the inference of CSF [24] is not very flexible since it is strictly derived from the HQS optimization under the field of experts (FoE) framework. The capacity of FoE is however not large enough to fully characterize image priors, which in turn makes CSF less effective. For these reasons, Kruse et al. [29] generalized CSF for better performance by replacing some modular parts of unrolled inference with more powerful CNN.

B. Plain Discriminative Learning

Instead of modeling image priors explicitly, the plain discriminative learning methods learn a direct mapping function to model image prior implicitly. The multi-layer perceptron (MLP) and CNNs have been adopted to learn such priors.

Fig. 1. The architecture of the proposed FFDNet for image denoising. The input image is reshaped to four sub-images, which are then input to the CNN together with a noise level map. The final output is reconstructed by the four denoised sub-images.
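
The reshaping described in the caption of Fig. 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `downsample`, `upsample` and `make_input` are hypothetical, the channels-last array layout is an assumption, and the order in which the four pixel phases are stacked is an arbitrary choice that merely has to match between the forward and reverse operators.

```python
import numpy as np

def downsample(img):
    """Reversible 2x downsampling: (H, W, C) -> (H/2, W/2, 4C)."""
    h, w, _ = img.shape
    assert h % 2 == 0 and w % 2 == 0, "pad the image to an even size first"
    # Each sub-image collects one of the four pixel phases of every 2x2 block.
    subs = [img[i::2, j::2, :] for i in (0, 1) for j in (0, 1)]
    return np.concatenate(subs, axis=-1)

def upsample(subs):
    """Inverse of downsample: (H/2, W/2, 4C) -> (H, W, C)."""
    h2, w2, c4 = subs.shape
    c = c4 // 4
    img = np.empty((2 * h2, 2 * w2, c), dtype=subs.dtype)
    k = 0
    for i in (0, 1):
        for j in (0, 1):
            # Write each sub-image back to the pixel phase it was taken from.
            img[i::2, j::2, :] = subs[..., k * c:(k + 1) * c]
            k += 1
    return img

def make_input(noisy, sigma):
    """Stack the sub-images with a noise level map: -> (H/2, W/2, 4C + 1)."""
    subs = downsample(noisy)
    # sigma may be a scalar (uniform map) or an (H/2, W/2) array (non-uniform map).
    m = np.broadcast_to(np.asarray(sigma, dtype=noisy.dtype), subs.shape[:2])
    return np.concatenate([subs, m[..., None]], axis=-1)
```

For a color image (C = 3), `make_input(noisy, 25.0)` yields a tensor with 13 channels, and `upsample(downsample(noisy))` returns the original array exactly, which is what makes the downsampling reversible. Passing a two-dimensional array as `sigma` gives a spatially variant noise level map.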

The use of CNN for image denoising can be traced back to [19], where a five-layer network with sigmoid nonlinearity was proposed. Subsequently, auto-encoder based methods have been suggested for image denoising [30], [31]. However, early MLP and CNN-based methods are limited in denoising performance and cannot compete with the benchmark BM3D method [11].

The first discriminative denoising method which achieves comparable performance with BM3D is the plain MLP method proposed by Burger et al. [18]. Benefitted from the advances in deep CNN, Zhang et al. [20] proposed a plain denoising CNN (DnCNN) method which achieves state-of-the-art denoising performance. They showed that residual learning and batch normalization [32] are particularly useful for the success of denoising. For a better trade-off between accuracy and speed, Zhang et al. [9] introduced a 7-layer denoising network with dilated convolution [33] to expand the receptive field of CNN. Mao et al. [34] proposed a very deep fully convolutional encoding-decoding network with symmetric skip connection for image denoising. Santhanam et al. [35] developed a recursively branched deconvolutional network (RBDN) for image denoising as well as generic image-to-image regression. Tai et al. [36] proposed a very deep persistent memory network (MemNet) by introducing a memory block to mine persistent memory through an adaptive learning process.

Plain discriminative learning has shown better performance than MAP inference guided discriminative learning; however, existing discriminative learning methods have to learn multiple models for handling images with different noise levels, and are incapable to deal with spatially variant noise. To the best of our knowledge, it remains an unaddressed issue to develop a single discriminative denoising model which can handle noise of different levels, even spatially variant noise, in a speed even faster than BM3D.

III. PROPOSED FAST AND FLEXIBLE DISCRIMINATIVE CNN DENOISER

We present a single discriminative CNN model, namely FFDNet, to achieve the following three objectives:
• Fast speed: The denoiser is expected to be highly efficient without sacrificing denoising performance.
• Flexibility: The denoiser is able to handle images with different noise levels and even spatially variant noise.
• Robustness: The denoiser should introduce no visual artifacts in controlling the trade-off between noise reduction and detail preservation.

In this work, we take a tunable noise level map M as input to make the denoising model flexible to noise levels. To improve the efficiency of the denoiser, a reversible downsampling operator is introduced to reshape the input image of size W × H × C into four downsampled sub-images of size W/2 × H/2 × 4C. Here C is the number of channels, i.e., C = 1 for grayscale image and C = 3 for color image. In order to enable the noise level map to robustly control the trade-off between noise reduction and detail preservation by introducing no visual artifacts, we adopt the orthogonal initialization method to the convolution filters.

A. Network Architecture

Fig. 1 illustrates the architecture of FFDNet. The first layer is a reversible downsampling operator which reshapes a noisy image y into four downsampled sub-images. We further concatenate a tunable noise level map M with the downsampled sub-images to form a tensor ỹ of size W/2 × H/2 × (4C + 1) as the inputs to CNN. For spatially invariant AWGN with noise level σ, M is a uniform map with all elements being σ.

With the tensor ỹ as input, the following CNN consists of a series of 3 × 3 convolution layers. Each layer is composed of three types of operations: Convolution (Conv), Rectified Linear Units (ReLU) [37], and Batch Normalization (BN) [32]. More specifically, “Conv+ReLU” is adopted for the first convolution layer, “Conv+BN+ReLU” for the middle layers, and “Conv” for the last convolution layer. Zero-padding is employed to keep the size of feature maps unchanged after each convolution. After the last convolution layer, an upscaling operation is applied as the reverse operator of the downsampling operator applied in the input stage to produce the estimated clean image x̂ of size W × H × C. Different from DnCNN, FFDNet does not predict the noise. The reason is given in Sec. III-F. Since FFDNet operates on downsampled sub-images, it is not necessary to employ the dilated convolution [33] to further increase the receptive field.

By considering the balance of complexity and performance, we empirically set the number of convolution layers as 15 for grayscale image and 12 for color image. As to the channels of feature maps, we set 64 for grayscale image and 96 for color image. The reason that we use different settings for grayscale and color images is twofold. First, since there are high dependencies among the R, G, B channels, using a smaller number of convolution layers encourages the model to exploit the inter-channel dependency. Second, color image has more channels as input, and hence more feature (i.e.,
more channels of feature map) is required. According to our experimental results, increasing the number of feature maps contributes more to the denoising performance on color images. Using different settings for color images, FFDNet can bring an average gain of 0.15dB by PSNR on different noise levels. As we shall see from Sec. IV-F, 12-layer FFDNet for color image runs slightly slower than 15-layer FFDNet for grayscale image. Taking both denoising performance and efficiency into account, we set the number of convolution layers as 12 and the number of feature maps as 96 for color image denoising.

B. Noise Level Map

Let’s first revisit the model-based image denoising methods to analyze why they are flexible in handling noises at different levels, which will in turn help us to improve the flexibility of CNN-based denoiser. Most of the model-based denoising methods aim to solve the following problem

$$\hat{x} = \arg\min_{x} \frac{1}{2\sigma^{2}} \|y - x\|^{2} + \lambda \Phi(x), \tag{1}$$

where $\frac{1}{2\sigma^{2}} \|y - x\|^{2}$ is the data fidelity term with noise level σ, Φ(x) is the regularization term associated with image prior, and λ controls the balance between the data fidelity and regularization terms. It is worth noting that in practice λ governs the compromise between noise reduction and detail preservation. When it is too small, much noise will remain; on the opposite, details will be smoothed out along with suppressing noise.

With some optimization algorithms, the solution of Eqn. (1) actually defines an implicit function given by

$$\hat{x} = F(y, \sigma, \lambda; \Theta). \tag{2}$$

Since λ can be absorbed into σ, Eqn. (2) can be rewritten as

$$\hat{x} = F(y, \sigma; \Theta). \tag{3}$$

In this sense, setting noise level σ also plays the role of setting λ to control the trade-off between noise reduction and detail preservation. In a word, model-based methods are flexible in handling images with various noise levels by simply specifying σ in Eqn. (3).

According to the above discussion, it is natural to utilize CNN to learn an explicit mapping of Eqn. (3) which takes the noise image and noise level as input. However, since the inputs y and σ have different dimensions, it is not easy to directly feed them into CNN. Inspired by the patch based denoising methods which actually set σ for each patch, we resolve the dimensionality mismatching problem by stretching the noise level σ into a noise level map M. In M, all the elements are σ. As a result, Eqn. (3) can be further rewritten as

$$\hat{x} = F(y, M; \Theta). \tag{4}$$

It is worth emphasizing that M can be extended to degradation maps with multiple channels for more general noise models such as the multivariate (3D) Gaussian noise model N(0, Σ) with zero mean and covariance matrix Σ in the RGB color space [38]. As such, a single CNN model is expected to inherit the flexibility of handling noise model with different parameters, even spatially variant noises by noting M can be non-uniform.

C. Denoising on Sub-images

Efficiency is another crucial issue for practical CNN-based denoising. One straightforward idea is to reduce the depth and number of filters. However, such a strategy will sacrifice much the modeling capacity and receptive field of CNN [20]. In [9], dilated convolution is introduced to expand receptive field without the increase of network depth, resulting in a 7-layer denoising CNN. Unfortunately, we empirically find that FFDNet with dilated convolution tends to generate artifacts around sharp edges.

Shi et al. [39] proposed to extract deep features directly from the low-resolution image for super-resolution, and introduced a sub-pixel convolution layer to improve computational efficiency. In the application of image denoising, we introduce a reversible downsampling layer to reshape the input image into a set of small sub-images. Here the downsampling factor is set to 2 since it can largely improve the speed without reducing modeling capacity. The CNN is deployed on the sub-images, and finally a sub-pixel convolution layer is adopted to reverse the downsampling process.

Denoising on downsampled sub-images can also effectively expand the receptive field which in turn leads to a moderate network depth. For example, the proposed network with a depth of 15 and 3 × 3 convolution will have a large receptive field of 62 × 62. In contrast, a plain 15-layer CNN only has a receptive field size of 31 × 31. We note that the receptive field of most state-of-the-art denoising methods ranges from 35 × 35 to 61 × 61 [20]. Further increase of receptive field actually benefits little in improving denoising performance [40]. What is more, the introduction of subsampling and sub-pixel convolution is effective in reducing the memory burden.

Experiments are conducted to validate the effectiveness of downsampling for balancing denoising accuracy and efficiency on the BSD68 dataset with σ = 15 and 50. For grayscale image denoising, we train a baseline CNN which has the same depth as FFDNet without downsampling. The comparison of average PSNR values is given as follows: (i) when σ is small (i.e., 15), the baseline CNN slightly outperforms FFDNet by 0.02dB; (ii) when σ is large (i.e., 50), FFDNet performs better than the baseline CNN by 0.09dB. However, FFDNet is nearly 3 times faster and is more memory-friendly than the baseline CNN. As a result, by performing denoising on sub-images, FFDNet significantly improves efficiency while maintaining denoising performance.

D. Examining the Role of Noise Level Map

By training the model with abundant data units (y, M; x), where M is exactly the noise level map of y, the model is expected to perform well when the noise level matches the ground-truth one (see Fig. 2(a)). On the other hand, in practice, we may need to use the learned model to smooth out some details with a higher noise level map than the ground-truth one (see Fig. 2(b)). In other words, one may take advantage
real noisy images whose noises are much more complex than
AWGN (see the results of DnCNN-B in Fig. 8). Actually, since
the CNN model can be treated as the inference of Eqn. (1)
and the data fidelity term corresponds to the degradation
process (or the noise model), the modeling accuracy of the
degradation process is very important for the success of a
denoising model. For example, a model trained for AWGN
removal is not expected to be still effective for Poisson noise
removal. By contrast, the non-blind FFDNet model can be
viewed as multiple denoisers, each of which is anchored with
a noise level. Accordingly, it has the ability to control the
(a) (b) (c) trade-off between noise removal and detail preservation which
in turn facilitates the removal of real noise to some extent (see
Fig. 2. An example to show the importance of guaranteeing the role of
noise level map in controlling the trade-off between noise reduction and detail the results of DnCNN and FFDNet in Fig. 8).
preservation. The input is a noisy image with noise level 25. (a) Result without Second, the performance for AWGN removal is different.
visual artifacts by matched noise level 25. (b) Result without visual artifacts The non-blind model with noise level map has moderately
by mismatched noise level 60. (c) Result with visual artifacts by mismatched
noise level 60. better performance for AWGN removal than the blind one
(about 0.1dB gain on average for the BSD68 dataset), possibly
of the role of λ to control the trade-off between noise reduction because the noise level map provides additional information
and detail preservation. Hence, it is very necessary to further to the input. Similar phenomenon has also been recognized in
examine whether M can play the role of λ. the task of single image super-resolution (SISR) [47].
Third, the application range is different. In the variable
Unfortunately, the use of both M and y as input also
splitting algorithms for general image restoration tasks, the
increases the difficulty to train the model. According to our
prior term involves a denoising subproblem with a current
experiments on several learned models, the model may give
noise level [6], [7], [8]. Thus, the non-blind model can be
rise to visual artifacts especially when the input noise level
easily plugged into variable splitting algorithms to solve var-
is much higher than the ground-truth one (see Fig. 2(c)),
ious image restoration tasks, such as image deblurring, SISR,
which indicates M fails to control the trade-off between noise
and image inpainting [9], [48]. However, the blind model does
reduction and detail preservation. Note that it does not mean all the models suffer from such problem. One possible solution to avoid this is to regularize the convolution filters. As a widely-used regularization method, orthogonal regularization has proven to be effective in eliminating the correlation between convolution filters, facilitating gradient propagation and improving the compactness of the learned model. In addition, recent studies have demonstrated the advantage of orthogonal regularization in enhancing the network generalization ability in applications of deep hashing and image classification [41], [42], [43], [44], [45]. According to our experiments, we empirically find that the orthogonal initialization of the convolution filters [43], [46] works well in suppressing the above mentioned visual artifacts.
It is worth emphasising that this section aims to highlight the necessity of guaranteeing the role of M in controlling the trade-off between noise reduction and detail preservation rather than proposing a method to avoid the possible visual artifacts caused by noise level mismatch. In practice, one may retrain the model until M plays its role and results in no visual artifacts with a larger noise level.

E. FFDNet vs. a Single Blind Model
So far, we have seen that it is possible to learn a single model for blind and non-blind Gaussian denoising, respectively. It is therefore important to clarify their differences.
First, the generalization ability is different. Although the blind model performs favorably for synthetic AWGN removal without knowing the noise level, it does not generalize well to
not have this merit.

F. Residual vs. Non-residual Learning of Plain CNN
It has been pointed out that the integration of residual learning for plain CNN and batch normalization is beneficial to the removal of AWGN as it eases the training and tends to deliver better performance [20]. The main reason is that the residual (noise) output follows a Gaussian distribution which facilitates the Gaussian normalization step of batch normalization. The denoising network gains most from such a task-specific merit especially when a single noise level is considered.
In FFDNet, we instead consider a wide range of noise levels and introduce a noise level map as input. Thus, it is interesting to revisit the integration of residual learning and batch normalization for plain CNN. According to our experiments, batch normalization can always accelerate the training of the denoising network regardless of the residual or non-residual learning strategy of plain CNN. In particular, with batch normalization, while the residual learning enjoys a faster convergence than non-residual learning, their final performances after fine-tuning are almost exactly the same. Some recent works have proposed to train very deep plain networks with nearly the same performance as that with residual learning [44], [49]. In fact, when a network is moderately deep (e.g., fewer than 20 layers), it is feasible to train a plain network without the residual learning strategy by using advanced CNN training and design techniques such as ReLU [37], batch normalization [32] and Adam [50]. For simplicity, we do not use residual learning for network design.
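As an illustration of the orthogonal initialization of convolution filters mentioned in Section III-D, the following minimal sketch builds mutually orthonormal flattened filters with Gram-Schmidt orthogonalization. It is not the authors' implementation: the helper name `orthogonal_rows`, the use of Gram-Schmidt, and the example sizes are all assumptions made for illustration.

```python
import math
import random


def orthogonal_rows(num_filters, fan_in, seed=0):
    """Return num_filters mutually orthonormal rows of length fan_in.

    Each row stands for one flattened convolution filter. Requires
    num_filters <= fan_in. Hypothetical helper, not the authors' code.
    """
    if num_filters > fan_in:
        raise ValueError("need num_filters <= fan_in for orthonormal rows")
    rng = random.Random(seed)
    rows = []
    while len(rows) < num_filters:
        v = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
        # Two Gram-Schmidt passes: remove projections onto accepted rows.
        for _ in range(2):
            for u in rows:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip a (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


# Example (assumed sizes): 8 filters, each a 3x3 kernel over 4 input channels.
W = orthogonal_rows(num_filters=8, fan_in=3 * 3 * 4)

# Gram matrix of the filters; it should be close to the identity.
gram = [[sum(a * b for a, b in zip(r, s)) for s in W] for r in W]
```

Off-diagonal entries of `gram` close to zero indicate that the filters are uncorrelated at initialization, which is the property the text associates with fewer visual artifacts.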

1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

G. Un-clipping vs. Clipping of Noisy Images for Training
In the AWGN denoising literature, there exist two widely-used settings, i.e., un-clipping [5], [11], [17], [18] and clipping [24], [28], of synthetic noisy image to evaluate the performance of denoising methods. The main difference between the two settings lies in whether the noisy image is clipped into the range of 0-255 (or more precisely, quantized into 8-bit format) after adding the noise.
On the one hand, the un-clipping setting, which is also the most widely-used setting, serves as an ideal test bed for evaluating the denoising methods. This is because most denoising methods assume the noise is ideal AWGN, and the clipping of noisy input would make the noise characteristics deviate from being AWGN. Furthermore, in the variable splitting algorithms for solving general image restoration problems, there exists a subproblem which, from a Bayesian perspective, corresponds to a Gaussian denoising problem [9], [48]. This further broadens the use of the un-clipping setting. Thus, unless otherwise specified, FFDNet in this work refers to the model trained on images without clipping or quantization.
On the other hand, since real noisy images are always integer-valued and range-limited, it has been argued that the clipping setting of noisy image makes the data more realistic [24]. However, when the noise level is high, the noise will no longer be zero-mean due to clipping effects [51]. This in turn will lead to an unreliable denoiser for plugging into the variable splitting algorithms to solve other image restoration problems.
To thoroughly evaluate the proposed method, we also train an FFDNet model with clipping setting of noisy image, namely FFDNet-Clip, for comparison. During training and testing of FFDNet-Clip, the noisy images are quantized into 8-bit format. Specifically, for a clean image x, we use the MATLAB function imnoise(x, 'gaussian', 0, (σ/255)^2) to generate the quantized noisy y with noise level σ.

IV. EXPERIMENTS
A. Dataset Generation and Network Training
To train the FFDNet model, we need to prepare a training dataset of input-output pairs $\{(\mathbf{y}_i, \mathbf{M}_i; \mathbf{x}_i)\}_{i=1}^{N}$. Here, $\mathbf{y}_i$ is obtained by adding AWGN to latent image $\mathbf{x}_i$, and $\mathbf{M}_i$ is the noise level map. The reason to use AWGN to generate the training dataset is two-fold. First, AWGN is a natural choice when there is no specific prior information on noise source. Second, real-world noise can be approximated as locally AWGN [52]. More specifically, FFDNet model is trained on the noisy images $\mathbf{y}_i = \mathbf{x}_i + \mathbf{v}_i$ without quantization to 8-bit integer values. Though the real noisy images are generally 8-bit quantized, we empirically found that the learned model still works effectively on real noisy images. For FFDNet-Clip, as mentioned in Sec. III-G, we use the MATLAB function imnoise to generate the quantized noisy image from a clean one.
We collected a large dataset of source images, including 400 BSD images, 400 images selected from the validation set of ImageNet [53], and the 4,744 images from the Waterloo Exploration Database [54]. In each epoch, we randomly crop N = 128 × 8,000 patches from these images for training. The patch size should be larger than the receptive field of FFDNet, and we set it to 70×70 and 50×50 for grayscale images and color images, respectively. The noisy patches are obtained by adding AWGN of noise level σ ∈ [0, 75] to the clean patches. For each noisy patch, the noise level map is uniform. Since FFDNet is a fully convolutional neural network, it inherits the local connectivity property that the output pixel is determined by the local noisy input and local noise level. Hence, the trained FFDNet naturally has the ability to handle spatially variant noise by specifying a non-uniform noise level map. For clarity, in Table I we list the main specifications of the FFDNet models.

TABLE I
MAIN SPECIFICATIONS OF THE PROPOSED FFDNET

FFDNet models   Number of layers   Number of channels   Noise level range   Training patch size
Grayscale       15                 64                   [0, 75]             70×70
Color           12                 96                   [0, 75]             50×50

The ADAM algorithm [50] is adopted to optimize FFDNet by minimizing the following loss function,
$$\mathcal{L}(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| \mathcal{F}(\mathbf{y}_i, \mathbf{M}_i; \Theta) - \mathbf{x}_i \right\|^2. \tag{5}$$
The learning rate starts from 10^-3 and reduces to 10^-4 when the training error stops decreasing. When the training error remains unchanged for five sequential epochs, we merge the parameters of each batch normalization into the adjacent convolution filters. Then, a smaller learning rate of 10^-6 is adopted for an additional 50 epochs to fine-tune the FFDNet model. As for the other hyper-parameters of ADAM, we use their default settings. The mini-batch size is set as 128, and the rotation and flip based data augmentation is also adopted during training. The FFDNet models are trained in Matlab (R2015b) environment with MatConvNet package [55] and an Nvidia Titan X Pascal GPU. The training of a single model can be done in about two days.
To evaluate the proposed FFDNet denoisers on grayscale image denoising, we use BSD68 [3] and Set12 datasets to test FFDNet for removing AWGN noise, and use the “RNI6” dataset [56] to test FFDNet for removing real noise. The BSD68 dataset consists of 68 images from the separate test set of the BSD300 dataset [57]. The Set12 dataset is a collection of widely-used testing images. The RNI6 dataset contains 6 real noisy images without ground-truth. In particular, to evaluate FFDNet-Clip, we use the quantized “Clip300” dataset which comprises the 100 images of test set from the BSD300 dataset [57] and 200 images from PASCALVOC 2012 [58] dataset. Note that none of the testing images is included in the training dataset.
As for color image denoising, we employ four datasets, namely CBSD68, Kodak24 [59], McMaster [60], and “RNI15” [56], [61]. The CBSD68 dataset is the corresponding color version of the grayscale BSD68 dataset. The Kodak24 dataset consists of 24 center-cropped images of size 500×500 from the original Kodak dataset. The McMaster dataset is a widely-used dataset for color demosaicing, which contains 18 cropped images of size 500×500. Compared to the Kodak24 images, the images in McMaster dataset exhibit more saturated


colors [60]. The RNI15 dataset consists of 15 real noisy images. We note that RNI6 and RNI15 cover a variety of real noise types, such as camera noise and JPEG compression noise. Since the ground-truth clean images are unavailable for real noisy images, we thus only provide the visual comparisons on these images. The source codes of FFDNet and its extension to multivariate Gaussian noise are available at https://github.com/cszn/FFDNet.
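The training-pair generation of Section IV-A and the loss of Eq. (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: the helper names are hypothetical, images are plain nested lists on the 0-255 scale, and the `clip=True` branch only approximates what imnoise does for FFDNet-Clip (add noise, then round and clip to 8 bits). The example also shows the bias discussed in Section III-G: on a dark patch, clipping makes the noise mean clearly positive.

```python
import random


def make_training_pair(clean, sigma, rng, clip=False):
    """Build (noisy, noise_map) from a clean patch with values in [0, 255].

    clean: 2-D list of floats; sigma: noise level on the 0-255 scale.
    With clip=True the noisy patch is rounded and clipped to 8-bit values
    (an approximation of the FFDNet-Clip setting).
    """
    noisy = [[p + rng.gauss(0.0, sigma) for p in row] for row in clean]
    if clip:
        noisy = [[float(min(255, max(0, round(p)))) for p in row] for row in noisy]
    noise_map = [[float(sigma)] * len(row) for row in clean]  # uniform map
    return noisy, noise_map


def loss(outputs, targets):
    """Eq. (5): 1/(2N) * sum_i ||F(y_i, M_i) - x_i||^2 over N patches."""
    total = 0.0
    for out, tgt in zip(outputs, targets):
        total += sum((o - t) ** 2
                     for row_o, row_t in zip(out, tgt)
                     for o, t in zip(row_o, row_t))
    return total / (2 * len(outputs))


def mean_noise(noisy, clean):
    """Average of (noisy - clean) over all pixels."""
    diffs = [n - c for rn, rc in zip(noisy, clean) for n, c in zip(rn, rc)]
    return sum(diffs) / len(diffs)


rng = random.Random(0)
clean = [[10.0] * 64 for _ in range(64)]  # a dark, flat 64x64 patch
sigma = 50.0                              # one level from the range [0, 75]

y_plain, M = make_training_pair(clean, sigma, rng)
y_clip, _ = make_training_pair(clean, sigma, rng, clip=True)

bias_plain = mean_noise(y_plain, clean)  # close to zero
bias_clip = mean_noise(y_clip, clean)    # clearly positive after clipping
```

In actual training the noise level would be drawn anew from [0, 75] for every patch; a fixed value is used here only to make the clipping bias easy to see.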

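Section IV-A states that the parameters of each batch normalization are merged into the adjacent convolution filters before fine-tuning. The algebra behind such a merge can be sketched as below. The function name, the per-channel list layout, and the value of `eps` are assumptions for illustration; the source does not give these details.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (z - mean) / sqrt(var + eps) + beta into the conv.

    z is the convolution output w . x + b. weights[k] is the flattened
    filter of output channel k; the other arguments hold one value per
    output channel. eps is an assumed constant.
    """
    new_w, new_b = [], []
    for k, w in enumerate(weights):
        scale = gamma[k] / math.sqrt(var[k] + eps)
        new_w.append([scale * v for v in w])
        new_b.append(scale * (biases[k] - mean[k]) + beta[k])
    return new_w, new_b


def conv_at_one_position(w, b, x):
    """A convolution evaluated at one position is a dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy single-channel example with made-up numbers.
weights, biases = [[0.5, -1.0, 2.0]], [0.1]
gamma, beta, mean, var = [1.5], [-0.2], [0.3], [4.0]
x = [1.0, 2.0, 3.0]

z = conv_at_one_position(weights[0], biases[0], x)
bn_out = gamma[0] * (z - mean[0]) / math.sqrt(var[0] + 1e-5) + beta[0]

folded_w, folded_b = fold_batchnorm(weights, biases, gamma, beta, mean, var)
folded_out = conv_at_one_position(folded_w[0], folded_b[0], x)
```

After folding, the network has no separate normalization layers, so the subsequent fine-tuning updates the merged filters directly.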
B. Experiments on AWGN Removal
In this subsection, we test FFDNet on noisy images corrupted by spatially invariant AWGN. For grayscale image denoising, we mainly compare FFDNet with state-of-the-art methods BM3D [11], WNNM [5], MLP [18], TNRD [17], and DnCNN [20]. Note that BM3D and WNNM are two representative model-based methods based on nonlocal self-similarity prior, whereas TNRD, MLP and DnCNN are discriminative learning based methods. Tables II and III report the PSNR results on Set12 and BSD68 datasets, respectively. We also use two CNN-based denoising methods, i.e., RED30 [34] and MemNet [36], for further comparison. Their PSNR results on BSD68 dataset with noise level 50 are 26.34dB and 26.35dB, respectively. Note that RED30 and MemNet are trained on a specific noise level and are less efficient than DnCNN. From Tables II and III, one can have the following observations.
First, FFDNet surpasses BM3D by a large margin and outperforms WNNM, MLP and TNRD by about 0.2dB for a wide range of noise levels on BSD68. Second, FFDNet is slightly inferior to DnCNN when the noise level is low (e.g., σ ≤ 25), but gradually outperforms DnCNN with the increase of noise level (e.g., σ > 25). This phenomenon may result from the trade-off between receptive field size and modeling capacity. FFDNet has a larger receptive field than DnCNN, which favors the removal of strong noise, while DnCNN has better modeling capacity which is beneficial for denoising images with lower noise level. Third, FFDNet outperforms WNNM on images such as “House”, while it is inferior to WNNM on image “Barbara”. This is because “Barbara” has a rich amount of repetitive structures, which can be effectively exploited by nonlocal self-similarity based WNNM method. The visual comparisons of different methods are given in Fig. 3. Overall, FFDNet produces the best perceptual quality of denoised images.
To evaluate FFDNet-Clip, Table IV shows the PSNR comparison with DCGRF [28] and RBDN [35] on the Clip300 dataset. It can be seen that FFDNet-Clip with matched noise level achieves better performance than DCGRF and RBDN, showing that FFDNet performs well under the clipping setting. We also tested FFDNet-Clip on BSD68 dataset with clipping setting and found that the PSNR result is similar to that of FFDNet with un-clipping setting.
For color image denoising, we compare FFDNet with CBM3D [11] and CDnCNN [20]. Table V reports the performance of different methods on CBSD68, Kodak24, and McMaster datasets, and Fig. 4 presents the visual comparisons. It can be seen that FFDNet consistently outperforms CBM3D on different noise levels in terms of both quantitative and qualitative evaluation, and performs competitively with CDnCNN.

TABLE III
THE AVERAGE PSNR (dB) RESULTS OF DIFFERENT METHODS ON BSD68 WITH NOISE LEVELS 15, 25, 35, 50 AND 75

Methods   BM3D    WNNM    MLP     TNRD    DnCNN   FFDNet
σ = 15    31.07   31.37   –       31.42   31.72   31.63
σ = 25    28.57   28.83   28.96   28.92   29.23   29.19
σ = 35    27.08   27.30   27.50   –       27.69   27.73
σ = 50    25.62   25.87   26.03   25.97   26.23   26.29
σ = 75    24.21   24.40   24.59   –       24.64   24.79

TABLE IV
THE AVERAGE PSNR (dB) RESULTS OF DIFFERENT METHODS ON CLIP300 WITH NOISE LEVELS 15, 25, 35, 50 AND 60

Methods       σ = 15   σ = 25   σ = 35   σ = 50   σ = 60
DCGRF         31.35    28.67    27.08    25.38    24.45
RBDN          31.05    28.77    27.31    25.80    23.25
FFDNet-Clip   31.68    29.25    27.75    26.25    25.51

TABLE V
THE AVERAGE PSNR (dB) RESULTS OF CBM3D, CDNCNN AND FFDNET ON CBSD68, KODAK24 AND MCMASTER DATASETS WITH NOISE LEVELS 15, 25, 35, 50 AND 75

Datasets   Methods   σ=15    σ=25    σ=35    σ=50    σ=75
CBSD68     CBM3D     33.52   30.71   28.89   27.38   25.74
CBSD68     CDnCNN    33.89   31.23   29.58   27.92   24.47
CBSD68     FFDNet    33.87   31.21   29.58   27.96   26.24
Kodak24    CBM3D     34.28   31.68   29.90   28.46   26.82
Kodak24    CDnCNN    34.48   32.03   30.46   28.85   25.04
Kodak24    FFDNet    34.63   32.13   30.57   28.98   27.27
McMaster   CBM3D     34.06   31.66   29.92   28.51   26.79
McMaster   CDnCNN    33.44   31.51   30.14   28.61   25.10
McMaster   FFDNet    34.66   32.35   30.81   29.18   27.33

Fig. 4. Color image denoising results by CBM3D, CDnCNN and FFDNet on noise level σ = 50. (a) CBM3D (25.49dB); (b) CDnCNN (26.19dB); (c) FFDNet (26.28dB).

C. Experiments on Spatially Variant AWGN Removal
We then test the flexibility of FFDNet to deal with spatially variant AWGN. To synthesize spatially variant AWGN, we first generate an AWGN image $\mathbf{v}_1$ with unit standard deviation and a noise level map M of the same size. Then, element-wise multiplication is applied on $\mathbf{v}_1$ and M to produce the spatially variant AWGN, i.e., $\mathbf{v} = \mathbf{v}_1 \odot \mathbf{M}$. In the denoising stage, we take the bilinearly downsampled noise level map as the input to FFDNet. Since the noise level map is spatially smooth, the use of downsampled noise level map generally has very little effect on the final denoising performance.
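The synthesis of spatially variant AWGN described in Section IV-C, unit-variance noise multiplied element-wise by a noise level map, can be sketched as follows. The linear-ramp map, the image size, and the helper names are hypothetical choices for illustration. The 2×2 block averaging is only a simple stand-in for the bilinear downsampling mentioned in the text, and the factor of 2 is an assumption.

```python
import random


def spatially_variant_awgn(noise_map, rng):
    """Return v = v1 * M element-wise, where v1 is unit-variance AWGN."""
    return [[rng.gauss(0.0, 1.0) * m for m in row] for row in noise_map]


def downsample2(noise_map):
    """Halve the map with 2x2 block averaging (even height/width assumed)."""
    h, w = len(noise_map), len(noise_map[0])
    return [[(noise_map[i][j] + noise_map[i][j + 1]
              + noise_map[i + 1][j] + noise_map[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def column_std(img, j):
    """Standard deviation of column j of a 2-D list."""
    vals = [row[j] for row in img]
    mean = sum(vals) / len(vals)
    return (sum((x - mean) ** 2 for x in vals) / len(vals)) ** 0.5


height, width = 128, 128
# Hypothetical smooth map: the noise level ramps from 0 to 50 across columns.
M = [[50.0 * j / (width - 1) for j in range(width)] for _ in range(height)]

rng = random.Random(0)
v = spatially_variant_awgn(M, rng)
clean = [[128.0] * width for _ in range(height)]
noisy = [[c + n for c, n in zip(rc, rn)] for rc, rn in zip(clean, v)]

M_small = downsample2(M)               # map at half resolution
left_std = column_std(v, 0)            # zero, because M is 0 in that column
right_std = column_std(v, width - 1)   # roughly 50
```

Because the map here is smooth, `M_small` differs little from the full-resolution map at corresponding positions, which is consistent with the remark that downsampling the map has very little effect.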


(a) (b) (c) (d) (e) (f)

Fig. 3. Denoising results on image “102061” from the BSD68 dataset with noise level 50 by different methods. (a) BM3D (26.21dB); (b) WNNM (26.51dB);
(c) MLP (26.54dB); (d) TNRD (26.59dB); (e) DnCNN (26.89dB); (f) FFDNet (26.93dB).

TABLE II
THE PSNR (dB) RESULTS OF DIFFERENT METHODS ON SET12 DATASET WITH NOISE LEVELS 15, 25, 35, 50 AND 75. THE BEST TWO RESULTS ARE HIGHLIGHTED IN RED AND BLUE COLORS, RESPECTIVELY

Images C.man House Peppers Starfish Monarch Airplane Parrot Lena Barbara Boat Man Couple Average
Noise Level σ = 15
BM3D 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10 32.37
WNNM 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17 32.70
TNRD 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11 32.50
DnCNN 32.61 34.97 33.30 32.20 33.09 31.70 31.83 34.62 32.64 32.42 32.46 32.47 32.86
FFDNet 32.42 35.01 33.10 32.02 32.77 31.58 31.77 34.63 32.50 32.35 32.40 32.45 32.75
Noise Level σ = 25
BM3D 29.45 32.85 30.16 28.56 29.25 28.42 28.93 32.07 30.71 29.90 29.61 29.71 29.97
WNNM 29.64 33.22 30.42 29.03 29.84 28.69 29.15 32.24 31.24 30.03 29.76 29.82 30.26
MLP 29.61 32.56 30.30 28.82 29.61 28.82 29.25 32.25 29.54 29.97 29.88 29.73 30.03
TNRD 29.72 32.53 30.57 29.02 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71 30.06
DnCNN 30.18 33.06 30.87 29.41 30.28 29.13 29.43 32.44 30.00 30.21 30.10 30.12 30.43
FFDNet 30.06 33.27 30.79 29.33 30.14 29.05 29.43 32.59 29.98 30.23 30.10 30.18 30.43
Noise Level σ = 35
BM3D 27.92 31.36 28.51 26.86 27.58 26.83 27.40 30.56 28.98 28.43 28.22 28.15 28.40
WNNM 28.08 31.92 28.75 27.27 28.13 27.10 27.69 30.73 29.48 28.54 28.33 28.24 28.69
MLP 28.08 31.18 28.54 27.12 27.97 27.22 27.72 30.82 27.62 28.53 28.47 28.24 28.46
DnCNN 28.61 31.61 29.14 27.53 28.51 27.52 27.94 30.91 28.09 28.72 28.66 28.52 28.82
FFDNet 28.54 31.99 29.18 27.58 28.54 27.47 28.02 31.20 28.29 28.82 28.70 28.68 28.92
Noise Level σ = 50
BM3D 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46 26.72
WNNM 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.94 26.64 27.05
MLP 26.37 29.64 26.68 25.43 26.26 25.56 26.12 29.32 25.24 27.03 27.06 26.67 26.78
TNRD 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50 26.81
DnCNN 27.03 30.00 27.32 25.70 26.78 25.87 26.48 29.39 26.22 27.20 27.24 26.90 27.18
FFDNet 27.03 30.43 27.43 25.77 26.88 25.90 26.58 29.68 26.48 27.32 27.30 27.07 27.32
Noise Level σ = 75
BM3D 24.32 27.51 24.73 23.27 23.91 23.48 24.18 27.25 25.12 25.12 25.32 24.70 24.91
WNNM 24.60 28.24 24.96 23.49 24.31 23.74 24.43 27.54 25.81 25.29 25.42 24.86 25.23
MLP 24.63 27.78 24.88 23.57 24.40 23.87 24.55 27.68 23.39 25.44 25.59 25.02 25.07
DnCNN 25.07 27.85 25.17 23.64 24.71 24.03 24.71 27.54 23.63 25.47 25.64 24.97 25.20
FFDNet 25.29 28.43 25.39 23.82 24.99 24.18 24.94 27.97 24.24 25.64 25.75 25.29 25.49
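For reference, the PSNR values in the tables can be related to mean squared error with the standard definition. The sketch below assumes a peak value of 255 (8-bit intensity scale); the exact evaluation script used for the tables is not given in the text, so this is an illustration of the metric rather than the authors' code.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized 2-D lists of pixel values."""
    sq_err, count = 0.0, 0
    for row_r, row_e in zip(reference, estimate):
        for r, e in zip(row_r, row_e):
            sq_err += (r - e) ** 2
            count += 1
    mse = sq_err / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# A constant error of 5 gray levels gives MSE = 25.
ref = [[100.0] * 8 for _ in range(8)]
est = [[105.0] * 8 for _ in range(8)]
value = psnr(ref, est)  # 10 * log10(255^2 / 25), about 34.15 dB

# For unclipped AWGN the expected MSE of the noisy input is sigma^2, so a
# noisy image at sigma = 50 sits near 10 * log10(255^2 / 50^2) dB.
noisy_level_50 = 10.0 * math.log10(255.0 ** 2 / 50.0 ** 2)  # about 14.15 dB
```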

Fig. 5 gives an example to show the effectiveness of FFDNet on removing spatially variant AWGN. We do not compare FFDNet with other methods because no state-of-the-art AWGN denoising method can be readily extended to handle spatially variant AWGN. From Fig. 5, one can see that FFDNet with non-uniform noise level map is flexible and powerful to remove spatially variant AWGN. In contrast, FFDNet with uniform noise level map would fail to remove strong noise at the region with higher noise level while smoothing out the details at the region with lower noise level.

Fig. 5. Examples of FFDNet on removing spatially variant AWGN. (a) Noisy image (20.55dB) with spatially variant AWGN. (b) Ground-truth noise level map and corresponding denoised image (30.08dB) by FFDNet; (c) Uniform noise level map constructed by using the mean value of ground-truth noise level map and corresponding denoised image (27.45dB) by FFDNet.

D. Experiments on Noise Level Sensitivity
In practical applications, the noise level map may not be accurately estimated from the noisy observation, and mismatch between the input and real noise levels is inevitable. If the input noise level is lower than the real noise level, the noise cannot be completely removed. Therefore, users often prefer to set a higher noise level to remove more noise. However, this may also remove some image details together with noise. A practical denoiser should tolerate certain mismatch of noise levels. In this subsection, we evaluate FFDNet in comparison with benchmark BM3D and DnCNN by varying different input noise levels for a given ground-truth noise level.
Fig. 6 illustrates the noise level sensitivity curves of BM3D, DnCNN and FFDNet. Different methods with different input noise levels (e.g., “FFDNet-15” represents FFDNet with input noise level fixed as 15) are evaluated on BSD68 images with noise level ranging from 0 to 50. Fig. 7 shows the visual comparisons between BM3D/CBM3D and FFDNet by setting different input noise levels to denoise a noisy image. Four typical image structures, including flat region, sharp edge, line with high contrast, and line with low contrast, are selected for visual comparison to investigate the noise level sensitivity of BM3D and FFDNet. From Figs. 6 and 7, we have the following observations.
• On all noise levels, FFDNet achieves similar denoising results to BM3D and DnCNN when their input noise levels are the same.
• With the fixed input noise level, for all the three methods, the PSNR value tends to stay the same when the ground-truth noise level is lower, and begins to decrease when the ground-truth noise level is higher.
• The best visual quality is obtained when the input noise level matches the ground-truth one. BM3D and FFDNet produce similar visual results with lower input noise levels, while they exhibit certain difference with higher input noise levels. Both of them will smooth out noise in flat regions, and gradually smooth out image structures with the increase of input noise levels. Particularly, FFDNet may wipe out some low contrast line structure, whereas BM3D can still preserve the mean patch regardless of the input noise levels due to its use of nonlocal information.
• Using a higher input noise level can generally produce better visual results than using a lower one. In addition, there is not much visual difference when the input noise level is a little higher than the ground-truth one.

Fig. 6. Noise level sensitivity curves of BM3D, DnCNN and FFDNet. The averaged PSNR results are evaluated on BSD68. [Plot: PSNR (dB) versus image noise level (0 to 50); curves FFDNet-5, FFDNet-15, FFDNet-25, FFDNet-50, BM3D-5, BM3D-15, BM3D-25, BM3D-50, DnCNN-15, DnCNN-25, DnCNN-50.]

According to the above observations, FFDNet exhibits similar noise level sensitivity performance to BM3D and DnCNN in balancing noise reduction and detail preservation. When the ground-truth noise level is unknown, it is preferable to set a larger input noise level than a lower one to remove noise with better perceptual quality.

E. Experiments on Real Noisy Images
In this subsection, real noisy images are used to further assess the practicability of FFDNet. However, such an evaluation is difficult to conduct due to the following reasons. (i) Both the ground-truth clean image and noise level are unknown for real noisy image. (ii) The real noise comes from various sources such as camera imaging pipeline (e.g., shot noise, amplifier noise and quantization noise), scanning, lossy compression and image resizing [62], [63], and it is generally non-Gaussian, spatially variant, and signal-dependent. As a result, the AWGN assumption in many denoising algorithms does not hold, and the associated noise level estimation methods do not work well for real noisy images.
Instead of adopting any noise level estimation methods, we adopt an interactive strategy to handle real noisy images. First of all, we empirically found that the assumption of spatially invariant noise usually works well for most real noisy images. We then employ a set of typical input noise levels to produce multiple outputs, and select the one which has the best trade-off between noise reduction and detail preservation. Second, the spatially variant noise in most real-world images is signal-dependent. In this case, we first sample several typical regions of distinct colors. For each typical region, we apply different noise levels with an interval of 5, and choose the best noise level by observing the denoising results. The noise levels at other regions are then interpolated from the noise levels of the typical regions to constitute an approximated non-uniform noise level map. Our FFDNet focuses on non-blind denoising and assumes the noise level map is known. In practice, some advanced noise level estimation methods [62], [64] can be adopted to assist the estimation of noise level map. In our following experiments, unless otherwise specified, we assume spatially invariant noise for the real noisy images.
Since there is no ground-truth image for a real noisy image, visual comparison is employed to evaluate the performance of FFDNet. We choose BM3D for comparison because it is widely accepted as a benchmark for denoising applications.
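The interactive strategy for spatially variant real noise, where noise levels chosen at a few typical regions are interpolated into a full map, can be sketched as below. The text does not say which interpolation is used, so inverse-distance weighting is purely an assumption here, and the sample positions and levels are made-up values.

```python
def interpolate_noise_map(height, width, samples, power=2.0):
    """Inverse-distance-weighted noise level map.

    samples: list of (row, col, sigma) picked at typical regions.
    The weighting scheme is an assumption; the text does not name one.
    """
    noise_map = []
    for i in range(height):
        row = []
        for j in range(width):
            num, den, exact = 0.0, 0.0, None
            for (r, c, sigma) in samples:
                d2 = (i - r) ** 2 + (j - c) ** 2
                if d2 == 0:
                    exact = float(sigma)  # pixel coincides with a sample
                    break
                w = 1.0 / d2 ** (power / 2.0)
                num += w * sigma
                den += w
            row.append(exact if exact is not None else num / den)
        noise_map.append(row)
    return noise_map


# Hypothetical picks: (row, col, chosen noise level) for three regions.
samples = [(5, 5, 10.0), (5, 40, 35.0), (40, 20, 25.0)]
M = interpolate_noise_map(48, 48, samples)

# Candidate levels to try per region, spaced by 5 as in the text; the upper
# bound of 75 follows the training range and is otherwise an assumption.
candidates = list(range(5, 80, 5))
```

The resulting map reproduces the chosen level at each sampled position and varies smoothly in between, which matches the smooth maps that the network is fed in the spatially variant experiments.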


(a) (b) (c) (d) (e) (f)

Fig. 7. Visual comparisons between FFDNet and BM3D/CBM3D by setting different input noise levels to denoise a noisy image. (a) From top to bottom:
ground-truth image, four clean zoom-in regions, and the corresponding noisy regions (AWGN, noise level 15). (b) From top to bottom: denoising results
by BM3D with input noise levels 5, 10, 15, 20, 50, and 75, respectively. (c) Results by FFDNet with the same settings as in (b). (d) From top to bottom:
ground-truth image, four clean zoom-in regions, and the corresponding noisy regions (AWGN, noise level 25). (e) From top to bottom: denoising results by
CBM3D with input noise levels 10, 20, 25, 30, 45 and 60, respectively. (f) Results by FFDNet with the same settings as in (e).

Given a noisy image, the same input noise level is used for BM3D and FFDNet. Another CNN-based denoising method DnCNN and a blind denoising method Noise Clinic [56] are also used for comparison. Note that, apart from the non-blind DnCNN models for specific noise levels, the blind DnCNN model (i.e., DnCNN-B) trained on noise level range of [0, 55] is also used for grayscale image denoising. For color image denoising, the blind CDnCNN-B is used for comparison.
Fig. 8 compares the grayscale image denoising results of Noise Clinic, BM3D, DnCNN, DnCNN-B and FFDNet on RNI6 images. As one can see, Noise Clinic removes much of the noise, but it also generates many algorithm-induced artifacts. BM3D, DnCNN and FFDNet produce more visually pleasant results. While the non-blind DnCNN models perform favorably, the blind DnCNN-B model performs poorly in removing the non-AWGN real noise. This phenomenon clearly demonstrates the better generalization ability of non-blind model over blind one for controlling the trade-off between noise removal and detail preservation. It is worth noting that, for image “Building” which contains structured noise, Noise Clinic and BM3D fail to remove those structured noises since the structured noises fit the nonlocal self-similarity prior adopted in Noise Clinic and BM3D. In contrast, FFDNet and DnCNN successfully remove such noise without losing underlying image textures.
Fig. 9 shows the denoising results of Noise Clinic, CBM3D, CDnCNN-B and FFDNet on five color noisy images from RNI15. It can be seen that CDnCNN-B yields very pleasing results for noisy image with AWGN-like noise such as image “Frog”, but is still unable to handle non-AWGN noise. Notably, from the denoising results of “Boy”, one can see that CBM3D leaves the structured color noise unremoved whereas FFDNet successfully removes such kind of noise. We can conclude that while the nonlocal self-similarity prior helps to remove random noise, it hinders the removal of structured noise. In comparison, the prior implicitly learned by CNN is able to remove both random noise and structured noise.
Fig. 10 further shows more visual results of FFDNet on the other nine images from RNI15. It can be seen that FFDNet can handle various kinds of noises, such as JPEG lossy compression noise (see image “Audrey Hepburn”), and video noise (see image “Movie”).
Fig. 11 shows a more challenging example to demonstrate the advantage of FFDNet for denoising noisy images with spatially variant noise. We select five typical regions to estimate the noise levels, including two background regions, the coffee region, the milk-foam region, and the specular reflection region. In our experiment, we manually and interactively set σ = 10 for the milk-foam and specular reflection regions, σ = 35 for the background region with high noise (i.e., green region), and σ = 25 for the other regions. We then interpolate the non-uniform noise level map for the whole image based on the estimated five noise levels. As one can see, while FFDNet with a small uniform input noise level can recover the details of regions with low noise level, it fails to remove strong noise. On the other hand, FFDNet with a large uniform input noise level can remove strong noise but it will also smooth out the details in the region with low noise level. In contrast, the denoising result with a proper non-uniform noise level map not only preserves image details but also removes the strong noise.
Finally, according to the above experiments on real noisy images, we can see that the FFDNet model trained with unquantized image data performs well on 8-bit quantized real noisy images.

F. Running Time
Table VI lists the running time results of BM3D, DnCNN and FFDNet for denoising grayscale and color images with size 256×256, 512×512 and 1,024×1,024. The evaluation was performed in Matlab (R2015b) environment on a computer with a six-core Intel(R) Core(TM) i7-5820K CPU @ 3.3GHz, 32 GB of RAM and an Nvidia Titan X Pascal GPU. For BM3D, we evaluate its running time by denoising images with noise level 25. For DnCNN, the grayscale and color image denoising models have 17 and 20 convolution layers, respectively. The Nvidia cuDNN-v5.1 deep learning library is used to accelerate the computation of DnCNN and FFDNet.


(a) (b) (c) (d) (e) (f)

Fig. 8. Grayscale image denoising results by different methods on real noisy images. From top to bottom: noisy images, denoised images by Noise Clinic,
denoised images by BM3D, denoised images by DnCNN, denoised images by DnCNN-B, denoised images by FFDNet. (a) David Hilbert, σ = 14 (15 for
DnCNN); (b) Old Tom Morris, σ = 15; (c) Chupa Chups, σ = 10; (d) Vinegar, σ = 20; (e) Building, σ = 20; (f) Marilyn, σ = 7 (10 for DnCNN).


(a) (b) (c) (d) (e)

Fig. 9. Color image denoising results by different methods on real noisy images. From top to bottom: noisy images, denoised images by Noise Clinic,
denoised images by CBM3D, denoised images by CDnCNN-B, denoised images by FFDNet. (a) Dog, σ = 28; (b) Frog, σ = 15; (c) Pattern1, σ = 12; (d)
Pattern2, σ = 40; (e) Boy, σ = 45.

The memory transfer time between CPU and GPU is also counted. Note that DnCNN and FFDNet can be implemented with both single-threaded (ST) and multi-threaded (MT) CPU computations.

TABLE VI
RUNNING TIME (IN SECONDS) OF DIFFERENT METHODS FOR DENOISING IMAGES WITH SIZE 256×256, 512×512 AND 1,024×1,024

                     256×256          512×512          1,024×1,024
Methods   Device     Gray    Color    Gray    Color    Gray     Color
BM3D      CPU(ST)    0.59    0.98     2.52    3.57     10.77    20.15
DnCNN     CPU(ST)    2.14    2.44     8.63    9.85     32.82    38.11
DnCNN     CPU(MT)    0.74    0.98     3.41    4.10     12.10    15.48
DnCNN     GPU        0.011   0.014    0.033   0.040    0.124    0.167
FFDNet    CPU(ST)    0.44    0.62     1.81    2.51     7.24     10.17
FFDNet    CPU(MT)    0.18    0.21     0.73    0.98     2.96     3.95
FFDNet    GPU        0.006   0.008    0.012   0.017    0.038    0.057

From Table VI, we have the following observations. First, BM3D spends much more time on denoising color images than grayscale images. The reason is that, compared to gray-BM3D, CBM3D needs extra time to denoise the chrominance components after luminance-chrominance color transformation. Second, while DnCNN can benefit from GPU computation for fast implementation, it has comparable CPU time to BM3D.


Third, FFDNet takes almost the same time to process grayscale and color images. More specifically, FFDNet with multi-threaded implementation is about three times faster than DnCNN and BM3D on CPU, and much faster than DnCNN on GPU. Even with single-threaded implementation, FFDNet is also faster than BM3D. Taking denoising performance and flexibility into consideration, FFDNet is very competitive for practical applications.

V. CONCLUSION

In this paper, we proposed a new CNN model, namely FFDNet, for fast, effective and flexible discriminative denoising. To achieve this goal, several techniques were utilized in network design and training, such as the use of a noise level map as input and denoising in the downsampled sub-image space. The results on synthetic images with AWGN demonstrated that FFDNet not only produces state-of-the-art results when the input noise level matches the ground-truth noise level, but also can robustly control the trade-off between noise reduction and detail preservation. The results on images with spatially variant AWGN validated the flexibility of FFDNet for handling inhomogeneous noise. The results on real noisy images further demonstrated that FFDNet can deliver perceptually appealing denoising results. Finally, the running time comparisons showed that FFDNet is faster than other competing methods such as BM3D. Considering its flexibility, efficiency and effectiveness, FFDNet provides a practical solution to CNN denoising applications.

REFERENCES

[1] H. C. Andrews and B. R. Hunt, “Digital image restoration,” Prentice-Hall Signal Processing Series, Englewood Cliffs: Prentice-Hall, 1977, vol. 1, 1977.
[2] P. Chatterjee and P. Milanfar, “Is denoising dead?” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 895–911, 2010.
[3] S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 860–867.
[4] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in IEEE International Conference on Computer Vision, 2011, pp. 479–486.
[5] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.
[6] M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, “Fast image recovery using variable splitting and constrained optimization,” IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2345–2356, 2010.
[7] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian et al., “FlexISP: A flexible camera image processing framework,” ACM Transactions on Graphics, vol. 33, no. 6, p. 231, 2014.
[8] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” submitted to SIAM Journal on Imaging Sciences, 2016.
[9] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3929–3938.
[10] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of gaussians in the wavelet domain,” IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1338–1351, 2003.
[11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
[12] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration,” in IEEE International Conference on Computer Vision, 2009, pp. 2272–2279.
[13] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1620–1630, 2013.
[14] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[15] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53–69, 2008.
[16] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.
[17] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
[18] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.
[19] V. Jain and S. Seung, “Natural image denoising with convolutional networks,” in Advances in Neural Information Processing Systems, 2009, pp. 769–776.
[20] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, July 2017.
[21] A. Barbu, “Training an active random field for real-time image denoising,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2451–2462, 2009.
[22] K. G. Samuel and M. F. Tappen, “Learning optimized MAP estimates in continuously-valued MRF models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 477–484.
[23] J. Sun and M. F. Tappen, “Learning non-local range markov random field for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2745–2752.
[24] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2774–2781.
[25] U. Schmidt, “Half-quadratic inference and learning for natural images,” Ph.D. dissertation, Technische Universität, Darmstadt, 2017. [Online]. Available: http://tuprints.ulb.tu-darmstadt.de/6044/
[26] S. Lefkimmiatis, “Non-local color image denoising with convolutional neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3587–3596.
[27] P. Qiao, Y. Dou, W. Feng, R. Li, and Y. Chen, “Learning non-local image diffusion for image denoising,” in Proceedings of the 2017 ACM on Multimedia Conference, 2017, pp. 1847–1855.
[28] R. Vemulapalli, O. Tuzel, and M.-Y. Liu, “Deep gaussian conditional random field network: A model-based deep network for discriminative denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
[29] J. Kruse, C. Rother, and U. Schmidt, “Learning to push the limits of efficient FFT-based image deconvolution,” in IEEE International Conference on Computer Vision, Oct 2017.
[30] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 341–349.
[31] F. Agostinelli, M. R. Anderson, and H. Lee, “Robust image denoising with multi-column deep neural networks,” in Advances in Neural Information Processing Systems, 2013, pp. 1493–1501.
[32] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
[33] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in International Conference on Learning Representations, 2016.
[34] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Advances in Neural Information Processing Systems, 2016, pp. 2802–2810.
[35] V. Santhanam, V. I. Morariu, and L. S. Davis, “Generalized deep image to image regression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5609–5619.
[36] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in IEEE International Conference on Computer Vision, 2017, pp. 4539–4547.



Fig. 10. More denoising results of FFDNet on real image denoising. (a) Flowers, σ = 70; (b) Bears, σ = 15; (c) Audrey Hepburn, σ = 10; (d) Postcards, σ
= 15; (e) Stars, σ = 18; (f) Window, σ = 15; (g) Singer, σ = 30; (h) Movie, σ = 12; (i) Pattern3, σ = 25.


Fig. 11. An example of FFDNet on image “Glass” with spatially variant noise. (a) Noisy image; (b) Denoised image by Noise Clinic; (c) Denoised image
by FFDNet with σ = 10; (d) Denoised image by FFDNet with σ = 25; (e) Denoised image by FFDNet with σ = 35; (f) Denoised image by FFDNet with
non-uniform noise level map.
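
As a rough illustration of how a non-uniform noise level map such as the one in Fig. 11(f) could be fed to a network that denoises in the downsampled sub-image space, the NumPy sketch below splits a noisy image into four sub-images and appends a noise level map as an extra channel. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `pixel_unshuffle`, `pixel_shuffle` and `build_input`, the left-to-right noise ramp, the [0, 1] intensity range, and the choice of subsampling the map by taking every second pixel are all hypothetical.

```python
import numpy as np


def pixel_unshuffle(img, factor=2):
    """Reversibly split an (H, W, C) image into factor**2 sub-images,
    stacked along the channel axis: (H/f, W/f, f*f*C)."""
    h, w, c = img.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be divisible by factor"
    x = img.reshape(h // factor, factor, w // factor, factor, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/f, W/f, f, f, C)
    return x.reshape(h // factor, w // factor, factor * factor * c)


def pixel_shuffle(sub, factor=2):
    """Inverse of pixel_unshuffle: reassemble the full-resolution image."""
    hs, ws, ch = sub.shape
    c = ch // (factor * factor)
    x = sub.reshape(hs, ws, factor, factor, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(hs * factor, ws * factor, c)


def build_input(noisy, sigma_map, factor=2):
    """Concatenate the sub-images with a downsampled noise level map.
    Assumption: the map is subsampled by keeping every factor-th pixel."""
    sub = pixel_unshuffle(noisy, factor)
    m = sigma_map[::factor, ::factor, np.newaxis]
    return np.concatenate([sub, m], axis=-1)


# Toy example with spatially variant noise (values are illustrative only).
rng = np.random.default_rng(0)
h, w, c = 4, 6, 3
clean = rng.random((h, w, c))
# Noise level grows linearly from left to right.
sigma_map = np.tile(np.linspace(0.0, 0.2, w), (h, 1))
noisy = clean + rng.standard_normal((h, w, c)) * sigma_map[..., np.newaxis]

net_in = build_input(noisy, sigma_map)
print(net_in.shape)  # (2, 3, 13): 4 sub-images x 3 channels + 1 map channel
```

Setting `sigma_map` to a constant would correspond to the uniform settings in panels (c) to (e). Because `pixel_shuffle` exactly inverts `pixel_unshuffle`, the downsampling step loses no image information.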


[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[38] S. Nam, Y. Hwang, Y. Matsushita, and S. Joo Kim, “A holistic approach to cross-channel image noise modeling and its application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
[39] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874–1883.
[40] A. Levin and B. Nadler, “Natural image denoising: Optimality and inherent bounds,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2833–2840.
[41] D. Wang, P. Cui, M. Ou, and W. Zhu, “Deep multimodal hashing with orthogonal regularization,” in International Joint Conference on Artificial Intelligence, 2015, pp. 2291–2297.
[42] Z. Mhammedi, A. Hellicar, A. Rahman, and J. Bailey, “Efficient orthogonal parametrisation of recurrent neural networks using householder reflections,” arXiv preprint arXiv:1612.00188, 2016.
[43] K. Jia, “Improving training of deep neural networks via singular value bounding,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4344–4352.
[44] D. Xie, J. Xiong, and S. Pu, “All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6176–6185.
[45] Y. Sun, L. Zheng, W. Deng, and S. Wang, “SVDNet for pedestrian retrieval,” arXiv preprint arXiv:1703.05693, 2017.
[46] D. Mishkin and J. Matas, “All you need is a good init,” ArXiv e-prints, 2015.
[47] G. Riegler, S. Schulter, M. Ruther, and H. Bischof, “Conditioned regression models for non-blind single image super-resolution,” in IEEE International Conference on Computer Vision, 2015, pp. 522–530.
[48] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-Play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
[49] S. Zagoruyko and N. Komodakis, “Diracnets: training very deep neural networks without skip-connections,” arXiv preprint arXiv:1706.00388, 2017.
[50] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference for Learning Representations, 2015.
[51] T. Plotz and S. Roth, “Benchmarking denoising algorithms with real photographs,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[52] J.-S. Lee, “Refined filtering of image noise using local statistics,” Computer Graphics and Image Processing, vol. 15, no. 4, pp. 380–389, 1981.
[53] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[54] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo exploration database: New challenges for image quality assessment models,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, 2017.
[55] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for matlab,” in ACM Conference on Multimedia Conference, 2015, pp. 689–692.
[56] M. Lebrun, M. Colom, and J.-M. Morel, “The noise clinic: A blind image denoising algorithm,” Image Processing On Line, vol. 5, pp. 1–54, 2015. [Online]. Available: http://demo.ipol.im/demo/125/
[57] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. 8th Int’l Conf. Computer Vision, vol. 2, July 2001, pp. 416–423.
[58] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, Jan 2015.
[59] R. Franzen, “Kodak lossless true color image suite,” source: http://r0k.us/graphics/kodak, vol. 4, 1999.
[60] L. Zhang, X. Wu, A. Buades, and X. Li, “Color demosaicking by local directional interpolation and nonlocal adaptive thresholding,” Journal of Electronic Imaging, vol. 20, no. 2, pp. 1–15, 2011.
[61] [Online]. Available: https://ni.neatvideo.com/home
[62] C. Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 299–314, 2008.
[63] M. Colom, M. Lebrun, A. Buades, and J.-M. Morel, “A non-parametric approach for the estimation of intensity-frequency dependent noise,” in IEEE International Conference on Image Processing, 2014, pp. 4261–4265.
[64] L. Azzari and A. Foi, “Gaussian-cauchy mixture modeling for robust signal-dependent noise estimation,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 5357–5361.

Kai Zhang received the M.Sc. degree in applied mathematics from China Jiliang University, Hangzhou, China, in 2014. He is currently pursuing the Ph.D. degree in computer science and technology at Harbin Institute of Technology, Harbin, China, under the supervision of Prof. Wangmeng Zuo and Prof. Lei Zhang. From July 2015 to June 2017, he was a Research Assistant in the Department of Computing, The Hong Kong Polytechnic University, Hong Kong. His research interests include machine learning and image processing.

Wangmeng Zuo (M’09-SM’14) received the Ph.D. degree in computer application technology from the Harbin Institute of Technology, Harbin, China, in 2007. He is currently a Professor in the School of Computer Science and Technology, Harbin Institute of Technology. His current research interests include image enhancement and restoration, image and face editing, object detection, visual tracking, and image classification. He has published over 70 papers in top-tier academic journals and conferences. He has served as a Tutorial Organizer at ECCV 2016 and as an Associate Editor of IET Biometrics and the Journal of Electronic Imaging.

Lei Zhang (M’04-SM’14-F’18) received his B.Sc. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, P.R. China, and M.Sc. and Ph.D. degrees in Control Theory and Engineering from Northwestern Polytechnical University, Xi’an, P.R. China, in 1998 and 2001, respectively. From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006 he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. In 2006, he joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor, where he has been a Chair Professor since 2017. His research interests include computer vision, pattern recognition, image and video analysis, and biometrics. He has published over 200 papers in those areas. As of 2018, his publications have been cited over 33,000 times in the literature. Prof. Zhang is an Associate Editor of IEEE Trans. on Image Processing, SIAM Journal of Imaging Sciences and Image and Vision Computing, etc. He was a “Web of Science Highly Cited Researcher” from 2015 to 2017. More information can be found on his homepage http://www4.comp.polyu.edu.hk/~cslzhang/.
