FFDNet
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2018.2839891, IEEE Transactions on Image Processing
Abstract—Due to the fast inference and good performance, discriminative learning methods have been widely studied in image denoising. However, these methods mostly learn a specific model for each noise level, and require multiple models for denoising images with different noise levels. They also lack flexibility to deal with spatially variant noise, limiting their applications in practical denoising. To address these issues, we present a fast and flexible denoising convolutional neural network, namely FFDNet, with a tunable noise level map as the input. The proposed FFDNet works on downsampled sub-images, achieving a good trade-off between inference speed and denoising performance. In contrast to the existing discriminative denoisers, FFDNet enjoys several desirable properties, including (i) the ability to handle a wide range of noise levels (i.e., [0, 75]) effectively with a single network, (ii) the ability to remove spatially variant noise by specifying a non-uniform noise level map, and (iii) faster speed than benchmark BM3D even on CPU without sacrificing denoising performance. Extensive experiments on synthetic and real noisy images are conducted to evaluate FFDNet in comparison with state-of-the-art denoisers. The results show that FFDNet is effective and efficient, making it highly attractive for practical denoising applications.

Index Terms—Image denoising, convolutional neural networks, Gaussian noise, spatially variant noise

This project is partially supported by the National Natural Scientific Foundation of China (NSFC) under Grant No. 61671182 and 61471146, and the HK RGC GRF grant (under no. PolyU 152124/15E). (Corresponding author: Wangmeng Zuo.)
K. Zhang is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cskaizhang@gmail.com).
W. Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China (e-mail: cswmzuo@gmail.com).
L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cslzhang@comp.polyu.edu.hk).
1057-7149 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

The importance of image denoising in low level vision can be revealed from many aspects. First, noise corruption is inevitable during the image sensing process and it may heavily degrade the visual quality of acquired image. Removing noise from the observed image is an essential step in various image processing and computer vision tasks [1], [2]. Second, from the Bayesian perspective, image denoising is an ideal test bed for evaluating image prior models and optimization methods [3], [4], [5]. Last but not least, in the unrolled inference via variable splitting techniques, many image restoration problems can be addressed by sequentially solving a series of denoising subproblems, which further broadens the application fields of image denoising [6], [7], [8], [9].

As in many previous literature of image denoising [10], [11], [12], [13], in this paper we assume that the noise is additive white Gaussian noise (AWGN) and the noise level is given. In order to handle practical image denoising problems, a flexible image denoiser is expected to have the following desirable properties: (i) it is able to perform denoising using a single model; (ii) it is efficient, effective and user-friendly; and (iii) it can handle spatially variant noise. Such a denoiser can be directly deployed to recover the clean image when the noise level is known or can be well estimated. When the noise level is unknown or is difficult to estimate, the denoiser should allow the user to adaptively control the trade-off between noise reduction and detail preservation. Furthermore, the noise can be spatially variant and the denoiser should be flexible enough to handle spatially variant noise.

However, state-of-the-art image denoising methods are still limited in flexibility or efficiency. In general, image denoising methods can be grouped into two major categories, model-based methods and discriminative learning based ones. Model-based methods such as BM3D [11] and WNNM [5] are flexible in handling denoising problems with various noise levels, but they suffer from several drawbacks. For example, their optimization algorithms are generally time-consuming, and cannot be directly used to remove spatially variant noise. Moreover, model-based methods usually employ hand-crafted image priors (e.g., sparsity [14], [15] and nonlocal self-similarity [12], [13], [16]), which may not be strong enough to characterize complex image structures.

As an alternative, discriminative denoising methods aim to learn the underlying image prior and fast inference from a training set of degraded and ground-truth image pairs. One approach is to learn stage-wise image priors in the context of truncated inference procedure [17]. Another more popular approach is plain discriminative learning, such as the MLP [18] and convolutional neural network (CNN) based methods [19], [20], among which the DnCNN [20] method has achieved very competitive denoising performance. The success of CNN for image denoising is attributed to its large modeling capacity and tremendous advances in network training and design. However, existing discriminative denoising methods are limited in flexibility, and the learned model is usually tailored to a specific noise level. From the perspective of regression, they aim to learn a mapping function x = F(y; Θ_σ) between the input noisy observation y and the desired output x. The model parameters Θ_σ are trained for noisy images corrupted by AWGN with a fixed noise level σ, while the trained model with Θ_σ is hard to be directly deployed to images with other noise levels. Though a single CNN model (i.e., DnCNN-B) is trained in [20] for Gaussian denoising, it does not generalize well to real noisy images and works only
if the noise level is in the preset range, e.g., [0, 55]. Besides, all the existing discriminative learning based methods lack flexibility to deal with spatially variant noise.

To overcome the drawbacks of existing CNN based denoising methods, we present a fast and flexible denoising convolutional neural network (FFDNet). Specifically, our FFDNet is formulated as x = F(y, M; Θ), where M is a noise level map. In the DnCNN model x = F(y; Θ_σ), the parameters Θ_σ vary with the change of noise level σ, while in the FFDNet model, the noise level map is modeled as an input and the model parameters Θ are invariant to noise level. Thus, FFDNet provides a flexible way to handle different noise levels with a single network.

By introducing a noise level map as input, it is natural to expect that the model performs well when the noise level map matches the ground-truth one of noisy input. Furthermore, the noise level map should also play the role of controlling the trade-off between noise reduction and detail preservation. It is found that heavy visual quality degradation may be engendered when setting a larger noise level to smooth out the details. We highlight this problem and adopt a method of orthogonal initialization on convolutional filters to alleviate this. Besides, the proposed FFDNet works on downsampled sub-images, which largely accelerates the training and testing speed, and enlarges the receptive field as well.

Using images corrupted by AWGN, we quantitatively compare FFDNet with state-of-the-art denoising methods, including model-based methods such as BM3D [11] and WNNM [5] and discriminative learning based methods such as TNRD [17] and DnCNN [20]. The results clearly demonstrate the superiority of FFDNet in terms of both denoising performance and computational efficiency. In addition, FFDNet performs favorably on images corrupted by spatially variant AWGN. We further evaluate FFDNet on real-world noisy images, where the noise is often signal-dependent, non-Gaussian and spatially variant. The proposed FFDNet model still achieves perceptually convincing results by setting proper noise level maps. Overall, FFDNet enjoys high potentials for practical denoising applications.

The main contribution of our work is summarized as follows:
• A fast and flexible denoising network, namely FFDNet, is proposed for discriminative image denoising. By taking a tunable noise level map as input, a single FFDNet is able to deal with noise on different levels, as well as spatially variant noise.
• We highlight the importance to guarantee the role of the noise level map in controlling the trade-off between noise reduction and detail preservation.
• FFDNet exhibits perceptually appealing results on both synthetic noisy images corrupted by AWGN and real-world noisy images, demonstrating its potential for practical image denoising.

The remainder of this paper is organized as follows. Sec. II reviews existing discriminative denoising methods. Sec. III presents the proposed image denoising model. Sec. IV reports the experimental results. Sec. V concludes the paper.

II. RELATED WORK

In this section, we briefly review and discuss the two major categories of relevant methods to this work, i.e., maximum a posteriori (MAP) inference guided discriminative learning and plain discriminative learning.

A. MAP Inference Guided Discriminative Learning

Instead of first learning the prior and then performing the inference, this category of methods aims to learn the prior parameters along with a compact unrolled inference through minimizing a loss function [21]. Following the pioneer work of fields of experts [3], Barbu [21] trained a discriminative Markov random field (MRF) model together with a gradient descent inference for image denoising. Samuel and Tappen [22] independently proposed a compact gradient descent inference learning framework, and discussed the advantages of discriminative learning over model-based optimization method with MRF prior. Sun and Tappen [23] proposed a novel nonlocal range MRF (NLR-MRF) framework, and employed the gradient-based discriminative learning method to train the model. Generally speaking, the methods above only learn the prior parameters in a discriminative manner, while the inference parameters are stage-invariant.

With the aid of unrolled half quadratic splitting (HQS) techniques, Schmidt et al. [24], [25] proposed a cascade of shrinkage fields (CSF) framework to learn stage-wise inference parameters. Chen et al. [17] further proposed a trainable nonlinear reaction diffusion (TNRD) model through discriminative learning of a compact gradient descent inference step. Recently, Lefkimmiatis [26] and Qiao et al. [27] adopted a proximal gradient-based denoising inference from a variational model to incorporate the nonlocal self-similarity prior. It is worth noting that, apart from MAP inference, Vemulapalli et al. [28] derived an end-to-end trainable patch-based denoising network based on Gaussian Conditional Random Field (GCRF) inference.

MAP inference guided discriminative learning usually requires much fewer inference steps, and is very efficient in image denoising. It also has clear interpretability because the discriminative architecture is derived from optimization algorithms such as HQS and gradient descent [17], [21], [22], [23], [24]. However, the learned priors and inference procedure are limited by the form of MAP model [25], and generally perform inferior to the state-of-the-art CNN-based denoisers. For example, the inference of CSF [24] is not very flexible since it is strictly derived from the HQS optimization under the field of experts (FoE) framework. The capacity of FoE is however not large enough to fully characterize image priors, which in turn makes CSF less effective. For these reasons, Kruse et al. [29] generalized CSF for better performance by replacing some modular parts of unrolled inference with more powerful CNN.

B. Plain Discriminative Learning

Instead of modeling image priors explicitly, the plain discriminative learning methods learn a direct mapping function to model image prior implicitly. The multi-layer perceptron (MLP) and CNNs have been adopted to learn such priors.
Fig. 1. The architecture of the proposed FFDNet for image denoising. The input image is reshaped to four sub-images, which are then input to the CNN
together with a noise level map. The final output is reconstructed by the four denoised sub-images.
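The reshaping summarized in the caption of Fig. 1 can be illustrated with a minimal sketch. It assumes a single-channel image stored as a list of rows with even height and width, and it assumes one particular ordering of the four sub-images (by row and column parity); the caption does not fix that ordering. The function names `downsample`, `upsample`, `noise_level_map` and `build_input` are hypothetical and chosen only for illustration.

```python
def downsample(img):
    """Split an H x W image (H, W even) into four H/2 x W/2 sub-images.

    Sub-image k = 2*i + j holds the pixels at row parity i and column
    parity j (an assumed ordering, used only for this sketch).
    """
    return [[row[j::2] for row in img[i::2]] for i in (0, 1) for j in (0, 1)]


def upsample(subs):
    """Inverse of downsample: interleave four sub-images into one image."""
    h, w = len(subs[0]), len(subs[0][0])
    img = [[None] * (2 * w) for _ in range(2 * h)]
    for k, sub in enumerate(subs):
        i, j = divmod(k, 2)  # recover the row/column parity of sub-image k
        for r in range(h):
            for c in range(w):
                img[2 * r + i][2 * c + j] = sub[r][c]
    return img


def noise_level_map(sigma, h, w):
    """Uniform noise level map: every element equals sigma."""
    return [[sigma] * w for _ in range(h)]


def build_input(img, sigma):
    """Stack the four sub-images and the noise level map as 4C + 1 channels (C = 1)."""
    subs = downsample(img)
    h, w = len(subs[0]), len(subs[0][0])
    return subs + [noise_level_map(sigma, h, w)]
```

Because `upsample(downsample(img))` returns the original image, no information is lost by the reshaping. For spatially variant noise, the uniform map would be replaced by a non-uniform one of the same H/2 × W/2 size.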
The use of CNN for image denoising can be traced back to [19], where a five-layer network with sigmoid nonlinearity was proposed. Subsequently, auto-encoder based methods have been suggested for image denoising [30], [31]. However, early MLP and CNN-based methods are limited in denoising performance and cannot compete with the benchmark BM3D method [11].

The first discriminative denoising method which achieves comparable performance with BM3D is the plain MLP method proposed by Burger et al. [18]. Benefitted from the advances in deep CNN, Zhang et al. [20] proposed a plain denoising CNN (DnCNN) method which achieves state-of-the-art denoising performance. They showed that residual learning and batch normalization [32] are particularly useful for the success of denoising. For a better trade-off between accuracy and speed, Zhang et al. [9] introduced a 7-layer denoising network with dilated convolution [33] to expand the receptive field of CNN. Mao et al. [34] proposed a very deep fully convolutional encoding-decoding network with symmetric skip connection for image denoising. Santhanam et al. [35] developed a recursively branched deconvolutional network (RBDN) for image denoising as well as generic image-to-image regression. Tai et al. [36] proposed a very deep persistent memory network (MemNet) by introducing a memory block to mine persistent memory through an adaptive learning process.

Plain discriminative learning has shown better performance than MAP inference guided discriminative learning; however, existing discriminative learning methods have to learn multiple models for handling images with different noise levels, and are incapable to deal with spatially variant noise. To the best of our knowledge, it remains an unaddressed issue to develop a single discriminative denoising model which can handle noise of different levels, even spatially variant noise, in a speed even faster than BM3D.

III. PROPOSED FAST AND FLEXIBLE DISCRIMINATIVE CNN DENOISER

We present a single discriminative CNN model, namely FFDNet, to achieve the following three objectives:
• Fast speed: The denoiser is expected to be highly efficient without sacrificing denoising performance.
• Flexibility: The denoiser is able to handle images with different noise levels and even spatially variant noise.
• Robustness: The denoiser should introduce no visual artifacts in controlling the trade-off between noise reduction and detail preservation.

In this work, we take a tunable noise level map M as input to make the denoising model flexible to noise levels. To improve the efficiency of the denoiser, a reversible downsampling operator is introduced to reshape the input image of size W × H × C into four downsampled sub-images of size W/2 × H/2 × 4C. Here C is the number of channels, i.e., C = 1 for grayscale image and C = 3 for color image. In order to enable the noise level map to robustly control the trade-off between noise reduction and detail preservation by introducing no visual artifacts, we adopt the orthogonal initialization method to the convolution filters.

A. Network Architecture

Fig. 1 illustrates the architecture of FFDNet. The first layer is a reversible downsampling operator which reshapes a noisy image y into four downsampled sub-images. We further concatenate a tunable noise level map M with the downsampled sub-images to form a tensor ỹ of size W/2 × H/2 × (4C + 1) as the inputs to CNN. For spatially invariant AWGN with noise level σ, M is a uniform map with all elements being σ.

With the tensor ỹ as input, the following CNN consists of a series of 3 × 3 convolution layers. Each layer is composed of three types of operations: Convolution (Conv), Rectified Linear Units (ReLU) [37], and Batch Normalization (BN) [32]. More specifically, “Conv+ReLU” is adopted for the first convolution layer, “Conv+BN+ReLU” for the middle layers, and “Conv” for the last convolution layer. Zero-padding is employed to keep the size of feature maps unchanged after each convolution. After the last convolution layer, an upscaling operation is applied as the reverse operator of the downsampling operator applied in the input stage to produce the estimated clean image x̂ of size W × H × C. Different from DnCNN, FFDNet does not predict the noise. The reason is given in Sec. III-F. Since FFDNet operates on downsampled sub-images, it is not necessary to employ the dilated convolution [33] to further increase the receptive field.

By considering the balance of complexity and performance, we empirically set the number of convolution layers as 15 for grayscale image and 12 for color image. As to the channels of feature maps, we set 64 for grayscale image and 96 for color image. The reason that we use different settings for grayscale and color images is twofold. First, since there are high dependencies among the R, G, B channels, using a smaller number of convolution layers encourages the model to exploit the inter-channel dependency. Second, color image has more channels as input, and hence more feature (i.e.,
more channels of feature map) is required. According to our experimental results, increasing the number of feature maps contributes more to the denoising performance on color images. Using different settings for color images, FFDNet can bring an average gain of 0.15dB by PSNR on different noise levels. As we shall see from Sec. IV-F, 12-layer FFDNet for color image runs slightly slower than 15-layer FFDNet for grayscale image. Taking both denoising performance and efficiency into account, we set the number of convolution layers as 12 and the number of feature maps as 96 for color image denoising.

B. Noise Level Map

Let’s first revisit the model-based image denoising methods to analyze why they are flexible in handling noises at different levels, which will in turn help us to improve the flexibility of CNN-based denoiser. Most of the model-based denoising methods aim to solve the following problem

$$\hat{x} = \arg\min_{x} \frac{1}{2\sigma^2}\|y - x\|^2 + \lambda\Phi(x), \tag{1}$$

where $\frac{1}{2\sigma^2}\|y - x\|^2$ is the data fidelity term with noise level σ, Φ(x) is the regularization term associated with image prior, and λ controls the balance between the data fidelity and regularization terms. It is worth noting that in practice λ governs the compromise between noise reduction and detail preservation. When it is too small, much noise will remain; on the opposite, details will be smoothed out along with suppressing noise.

With some optimization algorithms, the solution of Eqn. (1) actually defines an implicit function given by

$$\hat{x} = \mathcal{F}(y, \sigma, \lambda; \Theta). \tag{2}$$

Since λ can be absorbed into σ, Eqn. (2) can be rewritten as

$$\hat{x} = \mathcal{F}(y, \sigma; \Theta). \tag{3}$$

In this sense, setting noise level σ also plays the role of setting λ to control the trade-off between noise reduction and detail preservation. In a word, model-based methods are flexible in handling images with various noise levels by simply specifying σ in Eqn. (3).

According to the above discussion, it is natural to utilize CNN to learn an explicit mapping of Eqn. (3) which takes the noise image and noise level as input. However, since the inputs y and σ have different dimensions, it is not easy to directly feed them into CNN. Inspired by the patch based denoising methods which actually set σ for each patch, we resolve the dimensionality mismatching problem by stretching the noise level σ into a noise level map M. In M, all the elements are σ. As a result, Eqn. (3) can be further rewritten as

$$\hat{x} = \mathcal{F}(y, M; \Theta). \tag{4}$$

It is worth emphasizing that M can be extended to degradation maps with multiple channels for more general noise models such as the multivariate (3D) Gaussian noise model N(0, Σ) with zero mean and covariance matrix Σ in the RGB color space [38]. As such, a single CNN model is expected to inherit the flexibility of handling noise model with different parameters, even spatially variant noises by noting M can be non-uniform.

C. Denoising on Sub-images

Efficiency is another crucial issue for practical CNN-based denoising. One straightforward idea is to reduce the depth and number of filters. However, such a strategy will sacrifice much the modeling capacity and receptive field of CNN [20]. In [9], dilated convolution is introduced to expand receptive field without the increase of network depth, resulting in a 7-layer denoising CNN. Unfortunately, we empirically find that FFDNet with dilated convolution tends to generate artifacts around sharp edges.

Shi et al. [39] proposed to extract deep features directly from the low-resolution image for super-resolution, and introduced a sub-pixel convolution layer to improve computational efficiency. In the application of image denoising, we introduce a reversible downsampling layer to reshape the input image into a set of small sub-images. Here the downsampling factor is set to 2 since it can largely improve the speed without reducing modeling capacity. The CNN is deployed on the sub-images, and finally a sub-pixel convolution layer is adopted to reverse the downsampling process.

Denoising on downsampled sub-images can also effectively expand the receptive field which in turn leads to a moderate network depth. For example, the proposed network with a depth of 15 and 3 × 3 convolution will have a large receptive field of 62 × 62. In contrast, a plain 15-layer CNN only has a receptive field size of 31 × 31. We note that the receptive field of most state-of-the-art denoising methods ranges from 35 × 35 to 61 × 61 [20]. Further increase of receptive field actually benefits little in improving denoising performance [40]. What is more, the introduction of subsampling and sub-pixel convolution is effective in reducing the memory burden.

Experiments are conducted to validate the effectiveness of downsampling for balancing denoising accuracy and efficiency on the BSD68 dataset with σ = 15 and 50. For grayscale image denoising, we train a baseline CNN which has the same depth as FFDNet without downsampling. The comparison of average PSNR values is given as follows: (i) when σ is small (i.e., 15), the baseline CNN slightly outperforms FFDNet by 0.02dB; (ii) when σ is large (i.e., 50), FFDNet performs better than the baseline CNN by 0.09dB. However, FFDNet is nearly 3 times faster and is more memory-friendly than the baseline CNN. As a result, by performing denoising on sub-images, FFDNet significantly improves efficiency while maintaining denoising performance.

D. Examining the Role of Noise Level Map

By training the model with abundant data units (y, M; x), where M is exactly the noise level map of y, the model is expected to perform well when the noise level matches the ground-truth one (see Fig. 2(a)). On the other hand, in practice, we may need to use the learned model to smooth out some details with a higher noise level map than the ground-truth one (see Fig. 2(b)). In other words, one may take advantage
of the role of λ to control the trade-off between noise reduction and detail preservation. Hence, it is very necessary to further examine whether M can play the role of λ.

Fig. 2. An example to show the importance of guaranteeing the role of noise level map in controlling the trade-off between noise reduction and detail preservation. The input is a noisy image with noise level 25. (a) Result without visual artifacts by matched noise level 25. (b) Result without visual artifacts by mismatched noise level 60. (c) Result with visual artifacts by mismatched noise level 60.

Unfortunately, the use of both M and y as input also increases the difficulty to train the model. According to our experiments on several learned models, the model may give rise to visual artifacts especially when the input noise level is much higher than the ground-truth one (see Fig. 2(c)), which indicates M fails to control the trade-off between noise reduction and detail preservation. Note that it does not mean all the models suffer from such a problem. One possible solution to avoid this is to regularize the convolution filters. As a widely-used regularization method, orthogonal regularization has proven to be effective in eliminating the correlation between convolution filters, facilitating gradient propagation and improving the compactness of the learned model. In addition, recent studies have demonstrated the advantage of orthogonal regularization in enhancing the network generalization ability in applications of deep hashing and image classification [41], [42], [43], [44], [45]. According to our experiments, we empirically find that the orthogonal initialization of the convolution filters [43], [46] works well in suppressing the above mentioned visual artifacts.

It is worth emphasising that this section aims to highlight the necessity of guaranteeing the role of M in controlling the trade-off between noise reduction and detail preservation rather than proposing a method to avoid the possible visual artifacts caused by noise level mismatch. In practice, one may retrain the model until M plays its role and results in no visual artifacts with a larger noise level.

E. FFDNet vs. a Single Blind Model

So far, we have known that it is possible to learn a single model for blind and non-blind Gaussian denoising, respectively. And it is of significant importance to clarify their differences.

First, the generalization ability is different. Although the blind model performs favorably for synthetic AWGN removal without knowing the noise level, it does not generalize well to real noisy images whose noises are much more complex than AWGN (see the results of DnCNN-B in Fig. 8). Actually, since the CNN model can be treated as the inference of Eqn. (1) and the data fidelity term corresponds to the degradation process (or the noise model), the modeling accuracy of the degradation process is very important for the success of a denoising model. For example, a model trained for AWGN removal is not expected to be still effective for Poisson noise removal. By contrast, the non-blind FFDNet model can be viewed as multiple denoisers, each of which is anchored with a noise level. Accordingly, it has the ability to control the trade-off between noise removal and detail preservation which in turn facilitates the removal of real noise to some extent (see the results of DnCNN and FFDNet in Fig. 8).

Second, the performance for AWGN removal is different. The non-blind model with noise level map has moderately better performance for AWGN removal than the blind one (about 0.1dB gain on average for the BSD68 dataset), possibly because the noise level map provides additional information to the input. A similar phenomenon has also been recognized in the task of single image super-resolution (SISR) [47].

Third, the application range is different. In the variable splitting algorithms for general image restoration tasks, the prior term involves a denoising subproblem with a current noise level [6], [7], [8]. Thus, the non-blind model can be easily plugged into variable splitting algorithms to solve various image restoration tasks, such as image deblurring, SISR, and image inpainting [9], [48]. However, the blind model does not have this merit.

F. Residual vs. Non-residual Learning of Plain CNN

It has been pointed out that the integration of residual learning for plain CNN and batch normalization is beneficial to the removal of AWGN as it eases the training and tends to deliver better performance [20]. The main reason is that the residual (noise) output follows a Gaussian distribution which facilitates the Gaussian normalization step of batch normalization. The denoising network gains most from such a task-specific merit especially when a single noise level is considered.

In FFDNet, we instead consider a wide range of noise level and introduce a noise level map as input. Thus, it is interesting to revisit the integration of residual learning and batch normalization for plain CNN. According to our experiments, batch normalization can always accelerate the training of denoising network regardless of the residual or non-residual learning strategy of plain CNN. In particular, with batch normalization, while the residual learning enjoys a faster convergence than non-residual learning, their final performances after fine-tuning are almost exactly the same. Some recent works have proposed to train very deep plain networks with nearly the same performance to that with residual learning [44], [49]. In fact, when a network is moderately deep (e.g., less than 20), it is feasible to train a plain network without the residual learning strategy by using advanced CNN training and design techniques such as ReLU [37], batch normalization [32] and Adam [50]. For simplicity, we do not use residual learning for network design.
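As a recap of the configuration given in Sec. III-A and the receptive-field figures quoted in Sec. III-C, the short sketch below tabulates the per-layer channel counts and reproduces the 31 × 31 versus 62 × 62 comparison. It is only an illustration under stated assumptions: all convolutions are 3 × 3 with stride 1, and the receptive field measured on the sub-image grid is scaled by the downsampling factor to express it in input pixels. The function names `ffdnet_layers` and `receptive_field` are hypothetical.

```python
def ffdnet_layers(color=False):
    """List (in_channels, out_channels, ops) for each 3x3 convolution layer.

    Depth and width follow the text: 15 layers / 64 feature maps for
    grayscale, 12 layers / 96 feature maps for color.
    """
    c = 3 if color else 1
    depth, width = (12, 96) if color else (15, 64)
    layers = [(4 * c + 1, width, "Conv+ReLU")]          # sub-images + noise map in
    layers += [(width, width, "Conv+BN+ReLU")] * (depth - 2)
    layers.append((width, 4 * c, "Conv"))               # four denoised sub-images out
    return layers


def receptive_field(num_layers, kernel=3, downsampling=1):
    """Receptive field side length, in input pixels, of stacked stride-1 convs.

    Each k x k layer widens the field by k - 1; working on sub-images
    downsampled by a factor s multiplies the extent in input pixels by s
    (an assumption consistent with the figures quoted in the text).
    """
    return (1 + num_layers * (kernel - 1)) * downsampling


plain = receptive_field(15)                    # plain 15-layer CNN
on_sub = receptive_field(15, downsampling=2)   # same depth on sub-images
```

With these definitions, `plain` evaluates to 31 and `on_sub` to 62, matching the values stated in Sec. III-C.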
Fig. 3. Denoising results on image “102061” from the BSD68 dataset with noise level 50 by different methods. (a) BM3D (26.21dB); (b) WNNM (26.51dB);
(c) MLP (26.54dB); (d) TNRD (26.59dB); (e) DnCNN (26.89dB); (f) FFDNet (26.93dB).
TABLE II
THE PSNR (dB) RESULTS OF DIFFERENT METHODS ON SET12 DATASET WITH NOISE LEVELS 15, 25, 35, 50 AND 75. THE BEST TWO RESULTS ARE HIGHLIGHTED IN RED AND BLUE COLORS, RESPECTIVELY
Images C.man House Peppers Starfish Monarch Airplane Parrot Lena Barbara Boat Man Couple Average
Noise Level σ = 15
BM3D 31.91 34.93 32.69 31.14 31.85 31.07 31.37 34.26 33.10 32.13 31.92 32.10 32.37
WNNM 32.17 35.13 32.99 31.82 32.71 31.39 31.62 34.27 33.60 32.27 32.11 32.17 32.70
TNRD 32.19 34.53 33.04 31.75 32.56 31.46 31.63 34.24 32.13 32.14 32.23 32.11 32.50
DnCNN 32.61 34.97 33.30 32.20 33.09 31.70 31.83 34.62 32.64 32.42 32.46 32.47 32.86
FFDNet 32.42 35.01 33.10 32.02 32.77 31.58 31.77 34.63 32.50 32.35 32.40 32.45 32.75
Noise Level σ = 25
BM3D 29.45 32.85 30.16 28.56 29.25 28.42 28.93 32.07 30.71 29.90 29.61 29.71 29.97
WNNM 29.64 33.22 30.42 29.03 29.84 28.69 29.15 32.24 31.24 30.03 29.76 29.82 30.26
MLP 29.61 32.56 30.30 28.82 29.61 28.82 29.25 32.25 29.54 29.97 29.88 29.73 30.03
TNRD 29.72 32.53 30.57 29.02 29.85 28.88 29.18 32.00 29.41 29.91 29.87 29.71 30.06
DnCNN 30.18 33.06 30.87 29.41 30.28 29.13 29.43 32.44 30.00 30.21 30.10 30.12 30.43
FFDNet 30.06 33.27 30.79 29.33 30.14 29.05 29.43 32.59 29.98 30.23 30.10 30.18 30.43
Noise Level σ = 35
BM3D 27.92 31.36 28.51 26.86 27.58 26.83 27.40 30.56 28.98 28.43 28.22 28.15 28.40
WNNM 28.08 31.92 28.75 27.27 28.13 27.10 27.69 30.73 29.48 28.54 28.33 28.24 28.69
MLP 28.08 31.18 28.54 27.12 27.97 27.22 27.72 30.82 27.62 28.53 28.47 28.24 28.46
DnCNN 28.61 31.61 29.14 27.53 28.51 27.52 27.94 30.91 28.09 28.72 28.66 28.52 28.82
FFDNet 28.54 31.99 29.18 27.58 28.54 27.47 28.02 31.20 28.29 28.82 28.70 28.68 28.92
Noise Level σ = 50
BM3D 26.13 29.69 26.68 25.04 25.82 25.10 25.90 29.05 27.22 26.78 26.81 26.46 26.72
WNNM 26.45 30.33 26.95 25.44 26.32 25.42 26.14 29.25 27.79 26.97 26.94 26.64 27.05
MLP 26.37 29.64 26.68 25.43 26.26 25.56 26.12 29.32 25.24 27.03 27.06 26.67 26.78
TNRD 26.62 29.48 27.10 25.42 26.31 25.59 26.16 28.93 25.70 26.94 26.98 26.50 26.81
DnCNN 27.03 30.00 27.32 25.70 26.78 25.87 26.48 29.39 26.22 27.20 27.24 26.90 27.18
FFDNet 27.03 30.43 27.43 25.77 26.88 25.90 26.58 29.68 26.48 27.32 27.30 27.07 27.32
Noise Level σ = 75
BM3D 24.32 27.51 24.73 23.27 23.91 23.48 24.18 27.25 25.12 25.12 25.32 24.70 24.91
WNNM 24.60 28.24 24.96 23.49 24.31 23.74 24.43 27.54 25.81 25.29 25.42 24.86 25.23
MLP 24.63 27.78 24.88 23.57 24.40 23.87 24.55 27.68 23.39 25.44 25.59 25.02 25.07
DnCNN 25.07 27.85 25.17 23.64 24.71 24.03 24.71 27.54 23.63 25.47 25.64 24.97 25.20
FFDNet 25.29 28.43 25.39 23.82 24.99 24.18 24.94 27.97 24.24 25.64 25.75 25.29 25.49
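For readers who wish to reproduce figures of the kind reported in Table II, the following is a minimal sketch of the standard PSNR computation. It assumes 8-bit images with a peak value of 255; the function names psnr and average_psnr are illustrative and are not taken from the authors' code.

```python
import numpy as np


def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    clean = np.asarray(clean, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((clean - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)


def average_psnr(pairs, peak=255.0):
    """Mean PSNR over (clean, estimate) pairs, as in the 'Average' column."""
    return float(np.mean([psnr(c, e, peak) for c, e in pairs]))
```

Note that the average is taken over per-image PSNR values rather than computed from a pooled mean squared error, which matches how an "Average" column over individual images is normally formed.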
Fig. 5 gives an example to show the effectiveness of FFDNet on removing spatially variant AWGN. We do not compare FFDNet with other methods because no state-of-the-art AWGN denoising method can be readily extended to handle spatially variant AWGN. From Fig. 5, one can see that FFDNet with a non-uniform noise level map is flexible and effective in removing spatially variant AWGN. In contrast, FFDNet with a uniform noise level map would fail to remove strong noise in the region with higher noise level while smoothing out the details in the region with lower noise level.

D. Experiments on Noise Level Sensitivity
In practical applications, the noise level map may not be accurately estimated from the noisy observation, and mismatch between the input and real noise levels is inevitable. If the input noise level is lower than the real noise level, the noise cannot be completely removed. Therefore, users often prefer to set a higher noise level to remove more noise. However, this may also remove some image details together with noise. A practical denoiser should tolerate a certain mismatch of noise levels. In this subsection, we evaluate FFDNet in comparison
with benchmark BM3D and DnCNN by varying the input noise level for a given ground-truth noise level.

Fig. 6. Noise level sensitivity curves of BM3D, DnCNN and FFDNet. The averaged PSNR results are evaluated on BSD68. [Plot: PSNR (dB) versus Image Noise Level (0 to 50); legible legend entries: BM3D-5, BM3D-15, BM3D-25, BM3D-50, DnCNN-15, DnCNN-25, DnCNN-50, FFDNet-50.]

Fig. 6 illustrates the noise level sensitivity curves of BM3D, DnCNN and FFDNet. Different methods with different input noise levels (e.g., “FFDNet-15” represents FFDNet with input noise level fixed as 15) are evaluated on BSD68 images with noise level ranging from 0 to 50. Fig. 7 shows the visual comparisons between BM3D/CBM3D and FFDNet by setting different input noise levels to denoise a noisy image. Four typical image structures, including flat region, sharp edge, line with high contrast, and line with low contrast, are selected for visual comparison to investigate the noise level sensitivity of BM3D and FFDNet. From Figs. 6 and 7, we have the following observations.
• On all noise levels, FFDNet achieves similar denoising results to BM3D and DnCNN when their input noise levels are the same.
• With the fixed input noise level, for all the three methods, the PSNR value tends to stay the same when the ground-truth noise level is lower, and begins to decrease when the ground-truth noise level is higher.
• The best visual quality is obtained when the input noise

with better perceptual quality.

E. Experiments on Real Noisy Images
In this subsection, real noisy images are used to further assess the practicability of FFDNet. However, such an evaluation is difficult to conduct due to the following reasons. (i) Both the ground-truth clean image and noise level are unknown for a real noisy image. (ii) The real noise comes from various sources such as camera imaging pipeline (e.g., shot noise, amplifier noise and quantization noise), scanning, lossy compression and image resizing [62], [63], and it is generally non-Gaussian, spatially variant, and signal-dependent. As a result, the AWGN assumption in many denoising algorithms does not hold, and the associated noise level estimation methods do not work well for real noisy images.
Instead of adopting any noise level estimation methods, we adopt an interactive strategy to handle real noisy images. First of all, we empirically found that the assumption of spatially invariant noise usually works well for most real noisy images. We then employ a set of typical input noise levels to produce multiple outputs, and select the one which has the best trade-off between noise reduction and detail preservation. Second, the spatially variant noise in most real-world images is signal-dependent. In this case, we first sample several typical regions of distinct colors. For each typical region, we apply different noise levels with an interval of 5, and choose the best noise level by observing the denoising results. The noise levels at other regions are then interpolated from the noise levels of the typical regions to constitute an approximated non-uniform noise level map. Our FFDNet focuses on non-blind denoising and assumes the noise level map is known. In practice, some advanced noise level estimation methods [62], [64] can be adopted to assist the estimation of the noise level map. In the following experiments, unless otherwise specified, we assume spatially invariant noise for the real noisy images.
Since there is no ground-truth image for a real noisy image, visual comparison is employed to evaluate the performance of FFDNet. We choose BM3D for comparison because it is widely accepted as a benchmark for denoising applications.
Fig. 7. Visual comparisons between FFDNet and BM3D/CBM3D by setting different input noise levels to denoise a noisy image. (a) From top to bottom:
ground-truth image, four clean zoom-in regions, and the corresponding noisy regions (AWGN, noise level 15). (b) From top to bottom: denoising results
by BM3D with input noise levels 5, 10, 15, 20, 50, and 75, respectively. (c) Results by FFDNet with the same settings as in (b). (d) From top to bottom:
ground-truth image, four clean zoom-in regions, and the corresponding noisy regions (AWGN, noise level 25). (e) From top to bottom: denoising results by
CBM3D with input noise levels 10, 20, 25, 30, 45 and 60, respectively. (f) Results by FFDNet with the same settings as in (e).
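The sensitivity protocol behind Figs. 6 and 7 (a denoiser with a fixed input noise level evaluated on images corrupted at several ground-truth noise levels) can be sketched as follows. This is only an illustration under stated assumptions: box_denoiser is a hypothetical stand-in whose smoothing strength grows with the input noise level, it is not FFDNet, BM3D or DnCNN, and the synthetic ramp image replaces the BSD68 images.

```python
import numpy as np


def psnr(clean, estimate, peak=255.0):
    """PSNR in dB, assuming a peak value of 255."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(estimate, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)


def box_denoiser(noisy, sigma_in):
    """Hypothetical stand-in denoiser for 2-D (grayscale) arrays.

    A box filter whose radius grows with the input noise level, so a larger
    sigma_in removes more noise but also smooths more detail.
    """
    radius = int(round(sigma_in / 10.0))
    if radius == 0:
        return noisy.astype(np.float64)
    size = 2 * radius + 1
    padded = np.pad(noisy.astype(np.float64), radius, mode="reflect")
    h, w = noisy.shape
    out = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / size ** 2


def sensitivity_curve(clean, denoiser, sigma_in, true_sigmas, seed=0):
    """PSNR of a denoiser with fixed input level over several true noise levels."""
    rng = np.random.default_rng(seed)
    curve = []
    for sigma in true_sigmas:
        noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
        curve.append(psnr(clean, denoiser(noisy, sigma_in)))
    return curve


# Synthetic horizontal ramp used in place of a real test image.
clean = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
curve = sensitivity_curve(clean, box_denoiser, sigma_in=30, true_sigmas=[5, 25, 50])
```

Plotting one such curve per fixed input level, with a real denoiser passed in as the denoiser argument, would give a figure in the style of Fig. 6.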
Given a noisy image, the same input noise level is used for BM3D and FFDNet. Another CNN-based denoising method DnCNN and a blind denoising method Noise Clinic [56] are also used for comparison. Note that, apart from the non-blind DnCNN models for specific noise levels, the blind DnCNN model (i.e., DnCNN-B) trained on noise level range of [0, 55] is also used for grayscale image denoising. For color image denoising, the blind CDnCNN-B is used for comparison.
Fig. 8 compares the grayscale image denoising results of Noise Clinic, BM3D, DnCNN, DnCNN-B and FFDNet on RNI6 images. As one can see, Noise Clinic removes much of the noise, but it also generates many algorithm-induced artifacts. BM3D, DnCNN and FFDNet produce more visually pleasant results. While the non-blind DnCNN models perform favorably, the blind DnCNN-B model performs poorly in removing the non-AWGN real noise. This phenomenon clearly demonstrates the better generalization ability of the non-blind model over the blind one for controlling the trade-off between noise removal and detail preservation. It is worth noting that, for image “Building” which contains structured noise, Noise Clinic and BM3D fail to remove the structured noise since it fits the nonlocal self-similarity prior adopted in Noise Clinic and BM3D. In contrast, FFDNet and DnCNN successfully remove such noise without losing underlying image textures.
Fig. 9 shows the denoising results of Noise Clinic, CBM3D, CDnCNN-B and FFDNet on five color noisy images from RNI15. It can be seen that CDnCNN-B yields very pleasing results for noisy images with AWGN-like noise such as image “Frog”, but is still unable to handle non-AWGN noise. Notably, from the denoising results of “Boy”, one can see that CBM3D leaves the structured color noise unremoved whereas FFDNet successfully removes this kind of noise. We can conclude that while the nonlocal self-similarity prior helps to remove random noise, it hinders the removal of structured noise. In comparison, the prior implicitly learned by CNN is able to remove both random noise and structured noise.
Fig. 10 further shows more visual results of FFDNet on the other nine images from RNI15. It can be seen that FFDNet can handle various kinds of noise, such as JPEG lossy compression noise (see image “Audrey Hepburn”), and video noise (see image “Movie”).
Fig. 11 shows a more challenging example to demonstrate the advantage of FFDNet for denoising noisy images with spatially variant noise. We select five typical regions to estimate the noise levels, including two background regions, the coffee region, the milk-foam region, and the specular reflection region. In our experiment, we manually and interactively set σ = 10 for the milk-foam and specular reflection regions, σ = 35 for the background region with high noise (i.e., green region), and σ = 25 for the other regions. We then interpolate the non-uniform noise level map for the whole image based on the estimated five noise levels. As one can see, while FFDNet with a small uniform input noise level can recover the details of regions with low noise level, it fails to remove strong noise. On the other hand, FFDNet with a large uniform input noise level can remove strong noise but it will also smooth out the details in the region with low noise level. In contrast, the denoising result with a proper non-uniform noise level map not only preserves image details but also removes the strong noise.
Finally, according to the above experiments on real noisy images, we can see that the FFDNet model trained with unquantized image data performs well on 8-bit quantized real noisy images.

F. Running Time
Table VI lists the running time results of BM3D, DnCNN and FFDNet for denoising grayscale and color images with size 256×256, 512×512 and 1,024×1,024. The evaluation was performed in Matlab (R2015b) environment on a computer with a six-core Intel(R) Core(TM) i7-5820K CPU @ 3.3GHz, 32 GB of RAM and an Nvidia Titan X Pascal GPU. For BM3D, we evaluate its running time by denoising images with noise level 25. For DnCNN, the grayscale and color image denoising models have 17 and 20 convolution layers, respectively. The Nvidia cuDNN-v5.1 deep learning library is used to accelerate the computation of DnCNN and FFDNet.
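A running-time comparison over several image sizes could be organized along the following lines. This is a hedged sketch in Python rather than the Matlab environment used for the reported measurements: time_denoiser and placeholder_denoiser are hypothetical names, the placeholder only clips values and performs no denoising, and the warm-up run and the choice of the minimum over repeats are assumptions about measurement practice, not details taken from the paper.

```python
import time

import numpy as np


def time_denoiser(denoiser, sizes=(256, 512, 1024), channels=1,
                  sigma=25.0, repeats=3, seed=0):
    """Return {size: best wall-clock seconds} for square images of each size."""
    rng = np.random.default_rng(seed)
    results = {}
    for size in sizes:
        image = rng.uniform(0.0, 255.0, size=(size, size, channels))
        noisy = image + rng.normal(0.0, sigma, size=image.shape)
        denoiser(noisy, sigma)  # warm-up run, excluded from the timing
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            denoiser(noisy, sigma)
            timings.append(time.perf_counter() - start)
        results[size] = min(timings)
    return results


def placeholder_denoiser(noisy, sigma):
    """Placeholder with the assumed (image, sigma) interface; not a denoiser."""
    return np.clip(noisy, 0.0, 255.0)
```

Setting channels=1 or channels=3 covers the grayscale and color cases, and any callable with the same (image, sigma) interface can be timed in place of the placeholder.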
Fig. 8. Grayscale image denoising results by different methods on real noisy images. From top to bottom: noisy images, denoised images by Noise Clinic,
denoised images by BM3D, denoised images by DnCNN, denoised images by DnCNN-B, denoised images by FFDNet. (a) David Hilbert, σ = 14 (15 for
DnCNN); (b) Old Tom Morris, σ = 15; (c) Chupa Chups, σ = 10; (d) Vinegar, σ = 20; (e) Building, σ = 20; (f) Marilyn, σ = 7 (10 for DnCNN).
Fig. 9. Color image denoising results by different methods on real noisy images. From top to bottom: noisy images, denoised images by Noise Clinic,
denoised images by CBM3D, denoised images by CDnCNN-B, denoised images by FFDNet. (a) Dog, σ = 28; (b) Frog, σ = 15; (c) Pattern1, σ = 12; (d)
Pattern2, σ = 40; (e) Boy, σ = 45.
Third, FFDNet spends almost the same time for processing grayscale and color images. More specifically, FFDNet with multi-threaded implementation is about three times faster than DnCNN and BM3D on CPU, and much faster than DnCNN on GPU. Even with single-threaded implementation, FFDNet is also faster than BM3D. Taking denoising performance and flexibility into consideration, FFDNet is very competitive for practical applications.

V. CONCLUSION
In this paper, we proposed a new CNN model, namely FFDNet, for fast, effective and flexible discriminative denoising. To achieve this goal, several techniques were utilized in network design and training, such as the use of a noise level map as input and denoising in downsampled sub-images space. The results on synthetic images with AWGN demonstrated that FFDNet can not only produce state-of-the-art results when the input noise level matches the ground-truth noise level, but also robustly control the trade-off between noise reduction and detail preservation. The results on images with spatially variant AWGN validated the flexibility of FFDNet for handling inhomogeneous noise. The results on real noisy images further demonstrated that FFDNet can deliver perceptually appealing denoising results. Finally, the running time comparisons showed the faster speed of FFDNet over other competing methods such as BM3D. Considering its flexibility, efficiency and effectiveness, FFDNet provides a practical solution to CNN denoising applications.

REFERENCES
[1] H. C. Andrews and B. R. Hunt, “Digital image restoration,” Prentice-Hall Signal Processing Series, Englewood Cliffs: Prentice-Hall, 1977, vol. 1, 1977.
[2] P. Chatterjee and P. Milanfar, “Is denoising dead?” IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 895–911, 2010.
[3] S. Roth and M. J. Black, “Fields of experts: A framework for learning image priors,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 860–867.
[4] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in IEEE International Conference on Computer Vision, 2011, pp. 479–486.
[5] S. Gu, L. Zhang, W. Zuo, and X. Feng, “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.
[6] M. V. Afonso, J. M. Bioucas-Dias, and M. A. Figueiredo, “Fast image recovery using variable splitting and constrained optimization,” IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2345–2356, 2010.
[7] F. Heide, M. Steinberger, Y.-T. Tsai, M. Rouf, D. Pajak, D. Reddy, O. Gallo, J. Liu, W. Heidrich, K. Egiazarian et al., “FlexISP: A flexible camera image processing framework,” ACM Transactions on Graphics, vol. 33, no. 6, p. 231, 2014.
[8] Y. Romano, M. Elad, and P. Milanfar, “The little engine that could: Regularization by denoising (RED),” submitted to SIAM Journal on Imaging Sciences, 2016.
[9] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep CNN denoiser prior for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3929–3938.
[10] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of gaussians in the wavelet domain,” IEEE Transactions on Image Processing, vol. 12, no. 11, pp. 1338–1351, 2003.
[11] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080–2095, 2007.
[12] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, “Non-local sparse models for image restoration,” in IEEE International Conference on Computer Vision, 2009, pp. 2272–2279.
[13] W. Dong, L. Zhang, G. Shi, and X. Li, “Nonlocally centralized sparse representation for image restoration,” IEEE Transactions on Image Processing, vol. 22, no. 4, pp. 1620–1630, 2013.
[14] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[15] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image restoration,” IEEE Transactions on Image Processing, vol. 17, no. 1, pp. 53–69, 2008.
[16] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 2005, pp. 60–65.
[17] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1256–1272, 2017.
[18] H. C. Burger, C. J. Schuler, and S. Harmeling, “Image denoising: Can plain neural networks compete with BM3D?” in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.
[19] V. Jain and S. Seung, “Natural image denoising with convolutional networks,” in Advances in Neural Information Processing Systems, 2009, pp. 769–776.
[20] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising,” IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3142–3155, July 2017.
[21] A. Barbu, “Training an active random field for real-time image denoising,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2451–2462, 2009.
[22] K. G. Samuel and M. F. Tappen, “Learning optimized MAP estimates in continuously-valued MRF models,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 477–484.
[23] J. Sun and M. F. Tappen, “Learning non-local range markov random field for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2745–2752.
[24] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2774–2781.
[25] U. Schmidt, “Half-quadratic inference and learning for natural images,” Ph.D. dissertation, Technische Universität, Darmstadt, 2017. [Online]. Available: http://tuprints.ulb.tu-darmstadt.de/6044/
[26] S. Lefkimmiatis, “Non-local color image denoising with convolutional neural networks,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3587–3596.
[27] P. Qiao, Y. Dou, W. Feng, R. Li, and Y. Chen, “Learning non-local image diffusion for image denoising,” in Proceedings of the 2017 ACM on Multimedia Conference, 2017, pp. 1847–1855.
[28] R. Vemulapalli, O. Tuzel, and M.-Y. Liu, “Deep gaussian conditional random field network: A model-based deep network for discriminative denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
[29] J. Kruse, C. Rother, and U. Schmidt, “Learning to push the limits of efficient FFT-based image deconvolution,” in IEEE International Conference on Computer Vision, Oct 2017.
[30] J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 341–349.
[31] F. Agostinelli, M. R. Anderson, and H. Lee, “Robust image denoising with multi-column deep neural networks,” in Advances in Neural Information Processing Systems, 2013, pp. 1493–1501.
[32] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, 2015, pp. 448–456.
[33] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in International Conference on Learning Representations, 2016.
[34] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” in Advances in Neural Information Processing Systems, 2016, pp. 2802–2810.
[35] V. Santhanam, V. I. Morariu, and L. S. Davis, “Generalized deep image to image regression,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5609–5619.
[36] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in IEEE International Conference on Computer Vision, 2017, pp. 4539–4547.
Fig. 10. More denoising results of FFDNet on real image denoising. (a) Flowers, σ = 70; (b) Bears, σ = 15; (c) Audrey Hepburn, σ = 10; (d) Postcards, σ
= 15; (e) Stars, σ = 18; (f) Window, σ = 15; (g) Singer, σ = 30; (h) Movie, σ = 12; (i) Pattern3, σ = 25.
Fig. 11. An example of FFDNet on image “Glass” with spatially variant noise. (a) Noisy image; (b) Denoised image by Noise Clinic; (c) Denoised image
by FFDNet with σ = 10; (d) Denoised image by FFDNet with σ = 25; (e) Denoised image by FFDNet with σ = 35; (f) Denoised image by FFDNet with
non-uniform noise level map.
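The non-uniform noise level map used for panel (f) is obtained by interpolating noise levels chosen at a few typical regions. The text does not state which interpolation scheme is used, so the sketch below assumes inverse-distance weighting as one plausible choice. The function name interpolate_noise_map and the region coordinates are hypothetical; only the noise levels 10, 35 and 25 come from the description of the “Glass” example.

```python
import numpy as np


def interpolate_noise_map(shape, points, sigmas, power=2.0, eps=1e-8):
    """Build an H x W noise level map from a few (row, col) samples.

    Uses inverse-distance weighting (an assumed scheme): every pixel gets a
    weighted average of the sampled noise levels, with weights that decay
    with distance to each sample point.
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    pts = np.asarray(points, dtype=np.float64)
    sig = np.asarray(sigmas, dtype=np.float64)
    # Squared distance from every pixel to every sample: shape (K, H, W).
    d2 = (rows[None] - pts[:, 0, None, None]) ** 2 \
        + (cols[None] - pts[:, 1, None, None]) ** 2
    weights = 1.0 / (d2 ** (power / 2.0) + eps)
    return (weights * sig[:, None, None]).sum(axis=0) / weights.sum(axis=0)


# Five sampled regions; the coordinates are made up for illustration.
points = [(20, 30), (40, 90), (100, 20), (100, 100), (64, 64)]
sigmas = [10, 10, 35, 25, 25]
noise_map = interpolate_noise_map((128, 128), points, sigmas)
```

Because each pixel value is a convex combination of the sampled levels, the resulting map stays within the range of the chosen noise levels and approximately reproduces them at the sample locations.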
[37] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[38] S. Nam, Y. Hwang, Y. Matsushita, and S. Joo Kim, “A holistic approach to cross-channel image noise modeling and its application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition, June 2016.
[39] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874–1883.
[40] A. Levin and B. Nadler, “Natural image denoising: Optimality and inherent bounds,” in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2833–2840.
[41] D. Wang, P. Cui, M. Ou, and W. Zhu, “Deep multimodal hashing with orthogonal regularization,” in International Joint Conference on Artificial Intelligence, 2015, pp. 2291–2297.
[42] Z. Mhammedi, A. Hellicar, A. Rahman, and J. Bailey, “Efficient orthogonal parametrisation of recurrent neural networks using householder reflections,” arXiv preprint arXiv:1612.00188, 2016.
[43] K. Jia, “Improving training of deep neural networks via singular value bounding,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4344–4352.
[44] D. Xie, J. Xiong, and S. Pu, “All you need is beyond a good init: Exploring better solution for training extremely deep convolutional neural networks with orthonormality and modulation,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6176–6185.
[45] Y. Sun, L. Zheng, W. Deng, and S. Wang, “SVDNet for pedestrian retrieval,” arXiv preprint arXiv:1703.05693, 2017.
[46] D. Mishkin and J. Matas, “All you need is a good init,” ArXiv e-prints, 2015.
[47] G. Riegler, S. Schulter, M. Ruther, and H. Bischof, “Conditioned regression models for non-blind single image super-resolution,” in IEEE International Conference on Computer Vision, 2015, pp. 522–530.
[48] S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-Play ADMM for image restoration: Fixed-point convergence and applications,” IEEE Transactions on Computational Imaging, vol. 3, no. 1, pp. 84–98, 2017.
[49] S. Zagoruyko and N. Komodakis, “Diracnets: training very deep neural networks without skip-connections,” arXiv preprint arXiv:1706.00388, 2017.
[50] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference for Learning Representations, 2015.
[51] T. Plotz and S. Roth, “Benchmarking denoising algorithms with real photographs,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[52] J.-S. Lee, “Refined filtering of image noise using local statistics,” Computer Graphics and Image Processing, vol. 15, no. 4, pp. 380–389, 1981.
[53] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[54] K. Ma, Z. Duanmu, Q. Wu, Z. Wang, H. Yong, H. Li, and L. Zhang, “Waterloo exploration database: New challenges for image quality assessment models,” IEEE Transactions on Image Processing, vol. 26, no. 2, pp. 1004–1016, 2017.
[55] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for matlab,” in ACM Conference on Multimedia Conference, 2015, pp. 689–692.
[56] M. Lebrun, M. Colom, and J.-M. Morel, “The noise clinic: A blind image denoising algorithm,” Image Processing On Line, vol. 5, pp. 1–54, 2015. [Online]. Available: http://demo.ipol.im/demo/125/
[57] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proc. 8th Int’l Conf. Computer Vision, vol. 2, July 2001, pp. 416–423.
[58] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes challenge: A retrospective,” International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, Jan 2015.
[59] R. Franzen, “Kodak lossless true color image suite,” source: http://r0k.us/graphics/kodak, vol. 4, 1999.
[60] L. Zhang, X. Wu, A. Buades, and X. Li, “Color demosaicking by local directional interpolation and nonlocal adaptive thresholding,” Journal of Electronic Imaging, vol. 20, no. 2, pp. 1–15, 2011.
[61] [Online]. Available: https://ni.neatvideo.com/home
[62] C. Liu, R. Szeliski, S. B. Kang, C. L. Zitnick, and W. T. Freeman, “Automatic estimation and removal of noise from a single image,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 299–314, 2008.
[63] M. Colom, M. Lebrun, A. Buades, and J.-M. Morel, “A non-parametric approach for the estimation of intensity-frequency dependent noise,” in IEEE International Conference on Image Processing, 2014, pp. 4261–4265.
[64] L. Azzari and A. Foi, “Gaussian-cauchy mixture modeling for robust signal-dependent noise estimation,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014, pp. 5357–5361.

Kai Zhang received the M.Sc. degree in applied mathematics from China Jiliang University, Hangzhou, China, in 2014. He is currently pursuing the Ph.D. degree in computer science and technology at Harbin Institute of Technology, Harbin, China, under the supervision of Prof. Wangmeng Zuo and Prof. Lei Zhang. From July 2015 to June 2017, he was a Research Assistant in the Department of Computing, The Hong Kong Polytechnic University, Hong Kong. His research interests include machine learning and image processing.

Wangmeng Zuo (M’09-SM’14) received the Ph.D. degree in computer application technology from the Harbin Institute of Technology, Harbin, China, in 2007. He is currently a Professor in the School of Computer Science and Technology, Harbin Institute of Technology. His current research interests include image enhancement and restoration, image and face editing, object detection, visual tracking, and image classification. He has published over 70 papers in top-tier academic journals and conferences. He has served as a Tutorial Organizer in ECCV 2016, and as an Associate Editor of the IET Biometrics and Journal of Electronic Imaging.

Lei Zhang (M’04-SM’14-F’18) received his B.Sc. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, P.R. China, and M.Sc. and Ph.D. degrees in Control Theory and Engineering from Northwestern Polytechnical University, Xi’an, P.R. China, in 1998 and 2001, respectively. From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006 he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. In 2006, he joined the Department of Computing, The Hong Kong Polytechnic University, as an Assistant Professor, where he has been a Chair Professor since 2017. His research interests include computer vision, pattern recognition, image and video analysis, and biometrics. He has published over 200 papers in those areas. As of 2018, his publications have been cited over 33,000 times in the literature. Prof. Zhang is an Associate Editor of IEEE Trans. on Image Processing, SIAM Journal of Imaging Sciences and Image and Vision Computing, etc. He was a “Web of Science Highly Cited Researcher” from 2015 to 2017. More information can be found on his homepage http://www4.comp.polyu.edu.hk/∼cslzhang/.