Image Anomaly Detection with Generative Adversarial Networks
1 Introduction
2 Background
Fig. 1. An illustration of ADGAN. In this example, ones from MNIST are considered
normal (yc = 1). After an initial draw from pz , the loss between the first generation
gθ0 (z0 ) and the image x whose anomaly we are assessing is computed. This information
is used to generate a consecutive image gθ1(z1) more like x. After k steps, samples are scored. If x is similar to the training data (red example, y = yc), then a similar object should be contained in the image of gθk. For a dissimilar x (blue example, y ≠ yc), no similar image is found, resulting in a large loss. (Color figure online)
The generator, a neural network gθ, maps samples drawn from some simple noise prior pz (e.g. a multivariate Gaussian)
to samples in the image space, thereby inducing a distribution qθ (the push-
forward of pz with respect to gθ ) that approximates p. To achieve this a second
neural network, the discriminator dω , learns to classify the data from p and qθ .
Through an alternating training procedure the discriminator becomes better at
separating samples from p and samples from qθ , while the generator adjusts θ
to fool the discriminator, thereby approximating p more closely. The objective
function of the GAN framework is thus:
$$\min_{\theta}\,\max_{\omega}\; V(\theta, \omega) = \mathbb{E}_{x \sim p}\left[\log d_\omega(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - d_\omega(g_\theta(z))\right)\right]. \tag{1}$$
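As a concrete illustration of objective (1), the following is a minimal PyTorch sketch of one alternating update. The `generator`, `discriminator`, the two optimizers, and the batch `x_real` are hypothetical placeholders; the discriminator is assumed to end in a sigmoid so that its output lies in (0, 1), and common stabilizations (e.g. the non-saturating generator loss) are omitted.

```python
import torch

def gan_step(generator, discriminator, opt_g, opt_d, x_real, latent_dim):
    """One alternating update of objective (1): the discriminator ascends
    V(theta, omega) while the generator descends it (minimal sketch)."""
    eps = 1e-8  # numerical guard inside the logarithms
    z = torch.randn(x_real.size(0), latent_dim)  # z ~ p_z (standard normal here)

    # Discriminator update: maximize E[log d(x)] + E[log(1 - d(g(z)))],
    # implemented as minimizing the negated objective.
    opt_d.zero_grad()
    d_loss = -(torch.log(discriminator(x_real) + eps).mean()
               + torch.log(1.0 - discriminator(generator(z).detach()) + eps).mean())
    d_loss.backward()
    opt_d.step()

    # Generator update: minimize E[log(1 - d(g(z)))].
    opt_g.zero_grad()
    g_loss = torch.log(1.0 - discriminator(generator(z)) + eps).mean()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```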
3 Algorithm
Our proposed method (ADGAN, see Algorithm 1) sets in after GAN training has
converged. If the generator has indeed captured the distribution of the training
data then, given a new sample x ∼ p, there should exist a point z in the latent
space, such that gθ (z) ≈ x. Additionally we expect points away from the support
of p to have no representation in the latent space, or at least occupy a small
portion of the probability mass in the latent distribution, since they are easily
discerned by dω as not coming from p. Thus, given a test sample x, if there
exists no z such that gθ(z) ≈ x, or if such a z is difficult to find, then it can be inferred that x does not come from p and should be flagged as anomalous.
¹ That p may be approximated via transformations from a low-dimensional space is an assumption implicitly motivated by the manifold hypothesis [36].
² This lower bound becomes tight for an optimal discriminator, making apparent that V(θ, ω*) ∝ JS[p ‖ qθ].
³ This is achieved by restricting the class over which the IPM is optimized to functions that have a Lipschitz constant of less than one. Note that in Wasserstein GANs, an expression corresponding to a lower bound is optimized.
Fig. 2. The coordinates (z1 , z2 ) of 500 samples from MNIST are shown, represented in
a latent space with d = 2. At different iterations t of ADGAN, no particular structure
arises in the z-space: samples belonging to the normal class (•) and the anomalous
class (•) are scattered around freely. Note that this behavior also prevents pz (zt ) from
providing a sensible anomaly score. The sizes of points correspond to the reconstruction
loss ℓ(gθ(zt), x) between generated samples and their original image. The normal and anomalous classes differ markedly in terms of this metric. (Color figure online)
3.1 ADGAN
To find z, we initialize from z0 ∼ pz, where pz is the same noise prior used during GAN training. For t = 0, . . . , k − 1, we backpropagate the reconstruction loss ℓ(gθt(zt), x), making the subsequent generation gθt+1(zt+1) more like x. At each iteration we also allow a small amount of flexibility in the parametrization of the generator, resulting in a series of generations gθ0(z0), . . . , gθk(zk) that increasingly resemble x. Adjusting θ gives the generator additional representative capacity, which we found to improve the algorithm's performance. Note that these adjustments to θ are not part of the GAN training procedure; θ is reset to its original trained value for each new test point.
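The per-seed search just described can be sketched as follows. This is an illustrative PyTorch rendering under our own naming rather than the authors' reference implementation: `generator` is a trained module, `x` and `z0` are tensors, the defaults mirror the hyperparameters reported in Sect. 4, and the squared L2 loss is averaged over pixels here.

```python
import copy
import torch

def adgan_search(generator, x, z0, k=5, lr_z=0.25, lr_theta=5e-5):
    """Search a single seed: jointly adjust the latent vector z and a copy of
    the generator parameters theta so that g_theta(z) approaches the test
    image x; return the final squared-L2 reconstruction loss."""
    gen = copy.deepcopy(generator)        # theta is reset for every test point
    z = z0.clone().requires_grad_(True)   # z_0 ~ p_z, the GAN's noise prior

    opt_z = torch.optim.Adam([z], lr=lr_z, betas=(0.5, 0.999))
    opt_theta = torch.optim.Adam(gen.parameters(), lr=lr_theta, betas=(0.5, 0.999))

    for _ in range(k):
        opt_z.zero_grad()
        opt_theta.zero_grad()
        loss = ((gen(z) - x) ** 2).mean()  # squared L2 reconstruction loss
        loss.backward()                     # backpropagate into z and theta
        opt_z.step()
        opt_theta.step()

    with torch.no_grad():
        return ((gen(z) - x) ** 2).mean().item()
```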
To limit the risk of seeding in unsuitable regions and address the non-convex
nature of the underlying optimization problem, the search is initialized from nseed
individual points. The key idea underlying ADGAN is that if the generator was
trained on the same distribution x was drawn from, then the average over the
final set of reconstruction losses {ℓ(x, gθj,k(zj,k)) : j = 1, . . . , nseed} will assume low values,
and high values otherwise. In Fig. 2 we track a collection of samples through
their search in a latent space of dimensionality d = 2.
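Building on the `adgan_search` sketch above, the anomaly score is then the mean of the final reconstruction losses over the nseed seeds; `sample_prior` is a hypothetical callable that draws z0 ∼ pz.

```python
import torch

def adgan_score(generator, x, sample_prior, n_seed=64, **search_kwargs):
    """Anomaly score of a test image x: the average of the final reconstruction
    losses over n_seed independently initialized searches."""
    losses = [adgan_search(generator, x, sample_prior(), **search_kwargs)
              for _ in range(n_seed)]
    return sum(losses) / len(losses)

# Hypothetical usage with a standard normal prior of dimensionality d = 256:
# score = adgan_score(trained_generator, x, lambda: torch.randn(1, 256))
```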
Our method may also be understood from the standpoint of approximate
inversion of the generator. In this sense, the above backpropagation finds latent
vectors z that lie close to gθ−1 (x). Inversion of the generator was previously
studied in [7], where it was verified experimentally that this task can be carried
out with high fidelity. In addition [29] showed that generated images can be
successfully recovered by backpropagating through the latent space.4 Jointly
optimizing latent vectors and the generator parametrization via backpropagation
of reconstruction losses was investigated in detail by [4]. The authors found that
it is possible to train the generator entirely without a discriminator, still yielding
a model that incorporates many of the desirable properties of GANs, such as
smooth interpolations between samples.
Given that GAN training also gives us a discriminator for discerning between
real and fake samples, one might reasonably consider directly applying the dis-
criminator for detecting anomalies. However, once converged, the discriminator
exploits checkerboard-like artifacts on the pixel level, induced by the generator
architecture [31,38]. While it perfectly separates real from forged data, it is not
equipped to deal with samples which are completely unlike the training data.
This line of reasoning is verified in Sect. 4 experimentally.
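For concreteness, the naive baseline discussed here amounts to the following sketch, assuming (as above) a discriminator that outputs a probability of the input being real; this helper is our own illustration, not part of ADGAN.

```python
import torch

def discriminator_score(discriminator, x):
    """Naive baseline: use the discriminator's estimate that x is fake,
    1 - d_omega(x), directly as an anomaly score (Sect. 4 shows this
    is not competitive)."""
    with torch.no_grad():
        return (1.0 - discriminator(x)).mean().item()
```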
Another approach we considered was to evaluate the likelihood of the final
latent vectors {zj,k : j = 1, . . . , nseed} under the noise prior pz. This approach was tested
experimentally in Sect. 4, and while it showed some promise, it was consistently
outperformed by ADGAN.
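Under the standard normal prior used in our experiments, this alternative score reduces to the log-density of a final latent vector, as in the following sketch (our own helper, added for illustration only).

```python
import math
import torch

def prior_log_likelihood(z_final):
    """Log-density of a final latent vector under a standard normal prior p_z;
    a lower density would be read as more anomalous."""
    d = z_final.numel()
    return (-0.5 * (z_final ** 2).sum() - 0.5 * d * math.log(2.0 * math.pi)).item()
```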
In [46], the authors propose a technique for anomaly detection (called Ano-
GAN) which uses GANs in a way somewhat similar to our proposed algo-
rithm. Their algorithm also begins by training a GAN. Given a test point x,
their algorithm searches for a point z in the latent space such that gθ (z) ≈ x
and computes the reconstruction loss. Additionally they use an intermediate
discriminator layer dω and compute the loss between dω(gθ(z)) and dω(x). They use a convex combination of these two quantities as their anomaly score.

⁴ While it was shown that any gθ(z) may be reconstructed from some other z0 ∈ Rd, this does not mean that the same holds for an x not in the image of gθ.

Table 1. ROC-AUC of classic anomaly detection methods. For both MNIST and CIFAR-10, each model was trained on every class, as indicated by yc, and then used to score against the remaining classes. Results for KDE and OC-SVM are reported both in conjunction with PCA and after transforming images with a pre-trained AlexNet.
In ADGAN we never use the discriminator, which is discarded after training. This makes it easy to couple ADGAN with any GAN-based approach, e.g. LSGAN [32], but also with any other differentiable generator network, such as VAEs or moment matching networks [27]. In addition, we account for the non-convexity of the underlying optimization by seeding from multiple areas in the latent space. Lastly, during inference we update not only the latent vectors z but also the parametrization θ of the generator.
4 Experiments
4.1 Datasets
Our experiments are carried out on three benchmark datasets with varying com-
plexity: (i) MNIST [26] which contains grayscale scans of handwritten digits. (ii)
CIFAR-10 [23] which contains color images of real world objects belonging to
ten classes. (iii) LSUN [52], a dataset of images that show different scenes (such
as bedrooms, bridges, or conference rooms). For all datasets the training and
test splits remain as their default. All images are rescaled to assume pixel values
in [−1, 1].
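For reference, this rescaling is the usual affine map from 8-bit pixel values; the helper below is a hypothetical NumPy one-liner, not code from the paper.

```python
import numpy as np

def rescale_to_unit_range(img_uint8: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values from [0, 255] to [-1, 1], matching the
    preprocessing described above."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0
```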
The AnoGAN baseline [46] is paired with DCGAN [44], the same architecture also used for training in our approach.
ADGAN requires a trained generator. For this purpose we trained with the WGAN objective (2), as this was much more stable than the standard GAN objective. The architecture was fixed to that of DCGAN [44]. Following [34], we set the dimensionality of the latent space to d = 256.
For ADGAN, the searches in the latent space were initialized from the same
noise prior that the GAN was trained on (in our case a normal distribution). To
take into account the non-convexity of the problem, we seeded with nseed = 64
points. For the optimization of latent vectors and the parameters of the generator
we used the Adam optimizer [21].5 When searching for a point in the latent space
to match a test point, we found that more iterations helped performance but that this gain saturated quickly. As a trade-off between execution time and accuracy we fixed k = 5 and used this value in all reported results. Unless
otherwise noted, we measured reconstruction quality with a squared L2 loss.
⁵ From a quick parameter sweep, we set the learning rate to γ = 0.25 and (β1, β2) = (0.5, 0.999). We update the generator with γθ = 5 · 10⁻⁵, the default learning rate recommended in [2].
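For convenience, the settings reported in this section can be collected as follows; the dictionary and its key names are our own and simply mirror the values stated above.

```python
# Hyperparameters reported in Sect. 4, gathered in one place for reference.
ADGAN_CONFIG = {
    "latent_dim": 256,       # d, following [34]
    "n_seed": 64,            # number of latent-space initializations
    "k": 5,                  # search iterations per seed
    "lr_z": 0.25,            # Adam learning rate gamma for the latent vectors
    "lr_theta": 5e-5,        # Adam learning rate gamma_theta for the generator
    "betas": (0.5, 0.999),   # Adam (beta_1, beta_2)
    "recon_loss": "squared_l2",
}
```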
The resulting ROC curves for MNIST are shown in Fig. 3 (left). The second experiment follows this guideline with the colored
images from CIFAR-10, and the resulting ROC curves are shown in Fig. 3 (right).
In Table 1, we report the AUCs that resulted from leaving out each individual
class.
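The evaluation protocol behind Table 1 can be sketched as follows, assuming scikit-learn's `roc_auc_score`; the one-vs-rest labeling (normal class as inliers, all remaining classes as anomalies) follows the table caption, and the helper name is our own.

```python
from sklearn.metrics import roc_auc_score

def one_class_auc(scores, labels, normal_class):
    """ROC-AUC for the leave-one-class-in protocol: images of `normal_class`
    are treated as inliers (label 0) and all remaining classes as anomalies
    (label 1); `scores` are the detector's anomaly scores."""
    y_true = [0 if y == normal_class else 1 for y in labels]
    return roc_auc_score(y_true, scores)
```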
Fig. 4. Starting from the top left, the first three rows show samples contained in the
LSUN bedrooms validation set which, according to ADGAN, are the most anomalous
(have the highest anomaly score). Again starting from the top left corner, the bottom
rows contain images deemed normal (have the lowest score).
Samples with the highest and lowest anomaly scores of three different scene categories
are shown in Figs. 4, 5, and 6. Note that the large training set sizes in this
experiment would complicate the use of non-parametric methods such as KDE
and OC-SVMs.
Fig. 5. Scenes from LSUN showing conference rooms as ranked by ADGAN. The top
rows contain anomalous samples, the bottom rows scenes categorized as normal.
Fig. 6. Scenes from LSUN showing churches, ranked by ADGAN. Top rows: anomalous
samples. Bottom rows: normal samples.
5 Conclusion
We showed that searching the latent space of the generator can be leveraged for
use in anomaly detection tasks. To that end, our proposed method: (i) delivers
state-of-the-art performance on standard image benchmark datasets; (ii) can be
used to scan large collections of unlabeled images for anomalous samples.
To the best of our knowledge, we also reported the first results on using VAEs for anomaly detection. We remain optimistic that their performance can be boosted by additional tuning of the underlying neural network architecture or an informed substitution of the latent prior.
Accounting for unsuitable initializations by seeding from multiple points and jointly optimizing latent vectors and the generator parametrization are key ingredients that help ADGAN achieve strong experimental performance. Nonetheless, we are confident that approaches such as initializing from an approximate inversion of the generator as in ALI [9,10], or replacing the reconstruction loss with a more elaborate variant such as the Laplacian pyramid loss [28], can improve our method further.
References
1. Arjovsky, M., Bottou, L.: Towards principled methods for training generative
adversarial networks. In: International Conference on Learning Representations
(2017)
2. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks.
In: International Conference on Machine Learning, pp. 214–223 (2017)
3. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning
to align and translate. In: International Conference on Learning Representations
(2015)
4. Bojanowski, P., Joulin, A., Lopez-Paz, D., Szlam, A.: Optimizing the latent space
of generative networks. In: International Conference on Machine Learning (2018)
5. Campbell, C., Bennett, K.P.: A linear programming approach to novelty detection.
In: Advances in Neural Information Processing Systems, pp. 395–401 (2001)
6. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Com-
put. Surv. (CSUR) 41(3), 15 (2009)
7. Creswell, A., Bharath, A.A.: Inverting the generator of a generative adversarial
network. arXiv preprint arXiv:1611.05644 (2016)
8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale
hierarchical image database. In: Computer Vision and Pattern Recognition, pp.
248–255. IEEE (2009)
9. Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial feature learning. In: Inter-
national Conference on Learning Representations (2017)
10. Dumoulin, V., et al.: Adversarially learned inference. In: International Conference
on Learning Representations (2017)
11. Dutta, H., Giannella, C., Borne, K., Kargupta, H.: Distributed top-k outlier detec-
tion from astronomy catalogs using the DEMAC system. In: International Confer-
ence on Data Mining, pp. 473–478. SIAM (2007)
12. Edgeworth, F.: XLI. on discordant observations. Lond. Edinb. Dublin Philos. Mag.
J. Sci. 23(143), 364–375 (1887)
13. Emmott, A.F., Das, S., Dietterich, T., Fern, A., Wong, W.K.: Systematic con-
struction of anomaly detection benchmarks from real data. In: ACM SIGKDD
Workshop on Outlier Detection and Description, pp. 16–21. ACM (2013)
14. Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C.: High-dimensional and
large-scale anomaly detection using a linear one-class SVM with deep learning.
Pattern Recognit. 58, 121–134 (2016)
15. Eskin, E.: Anomaly detection over noisy data using learned probability distribu-
tions. In: International Conference on Machine Learning (2000)
16. Evangelista, P.F., Embrechts, M.J., Szymanski, B.K.: Some properties of the
gaussian kernel for one class learning. In: de Sá, J.M., Alexandre, L.A., Duch,
W., Mandic, D. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 269–278. Springer,
Heidelberg (2007). https://doi.org/10.1007/978-3-540-74690-4_28
17. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Infor-
mation Processing Systems, pp. 2672–2680 (2014)
18. Görnitz, N., Braun, M., Kloft, M.: Hidden Markov anomaly detection. In: Inter-
national Conference on Machine Learning, pp. 1833–1842 (2015)
19. Hu, W., Liao, Y., Vemuri, V.R.: Robust anomaly detection using support vector
machines. In: International Conference on Machine Learning, pp. 282–289 (2003)
20. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by
reducing internal covariate shift. In: International Conference on Machine Learning,
pp. 448–456 (2015)
21. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014)
22. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint
arXiv:1312.6114 (2013)
23. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images.
Technical report, University of Toronto (2009)
24. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep con-
volutional neural networks. In: Advances in Neural Information Processing Sys-
tems, pp. 1097–1105 (2012)
25. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In:
International Conference on Machine Learning, pp. 1188–1196 (2014)
26. LeCun, Y.: The MNIST database of handwritten digits (1998). http://yann.lecun.
com/exdb/mnist
27. Li, Y., Swersky, K., Zemel, R.: Generative moment matching networks. In: Inter-
national Conference on Machine Learning, pp. 1718–1727 (2015)
28. Ling, H., Okada, K.: Diffusion distance for histogram comparison. In: Computer
Vision and Pattern Recognition, pp. 246–253. IEEE (2006)
29. Lipton, Z.C., Tripathi, S.: Precise recovery of latent vectors from generative adver-
sarial networks. In: International Conference on Learning Representations, Work-
shop Track (2017)
30. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: International Conference
on Data Mining, pp. 413–422. IEEE (2008)
31. Lopez-Paz, D., Oquab, M.: Revisiting classifier two-sample tests. In: International
Conference on Learning Representations (2017)
32. Mao, X., Li, Q., Xie, H., Lau, R.Y., Wang, Z., Smolley, S.P.: Least squares gener-
ative adversarial networks. In: International Conference on Computer Vision, pp.
2794–2802. IEEE (2017)
33. Masci, J., Meier, U., Cireşan, D., Schmidhuber, J.: Stacked convolutional auto-
encoders for hierarchical feature extraction. Artificial Neural Networks and
Machine Learning (ICANN), pp. 52–59 (2011)
34. Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial
networks. In: International Conference on Learning Representations (2017)
35. Müller, A.: Integral probability metrics and their generating classes of functions.
Adv. Appl. Probab. 29(2), 429–443 (1997)
36. Narayanan, H., Mitter, S.: Sample complexity of testing the manifold hypothesis.
In: Advances in Neural Information Processing Systems, pp. 1786–1794 (2010)
37. Nowozin, S., Cseke, B., Tomioka, R.: f-GAN: training generative neural samplers
using variational divergence minimization. In: Advances in Neural Information
Processing Systems, pp. 271–279 (2016)
38. Odena, A., Dumoulin, V., Olah, C.: Deconvolution and checkerboard artifacts.
Distill 1(10), e3 (2016)
39. Ourston, D., Matzner, S., Stump, W., Hopkins, B.: Applications of hidden Markov
models to detecting multi-stage network attacks. In: Proceedings of the 36th
Annual Hawaii International Conference on System Sciences. IEEE (2003)
40. Parzen, E.: On estimation of a probability density function and mode. Ann. Math.
Stat. 33(3), 1065–1076 (1962)
41. Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos.
Mag. 2(11), 559–572 (1901)
42. Pelleg, D., Moore, A.W.: Active learning for anomaly and rare-category detection.
In: Advances in Neural Information Processing Systems, pp. 1073–1080 (2005)
43. Protopapas, P., Giammarco, J., Faccioli, L., Struble, M., Dave, R., Alcock, C.:
Finding outlier light curves in catalogues of periodic variable stars. Mon. Not. R.
Astron. Soc. 369(2), 677–696 (2006)
44. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434
(2015)
45. Ruff, L., et al.: Deep one-class classification. In: International Conference on
Machine Learning (2018)
46. Schlegl, T., Seeböck, P., Waldstein, S.M., Schmidt-Erfurth, U., Langs, G.: Unsu-
pervised anomaly detection with generative adversarial networks to guide marker
discovery. In: Niethammer, M., et al. (eds.) IPMI 2017. LNCS, vol. 10265, pp.
146–157. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59050-9_12
47. Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C.: Esti-
mating the support of a high-dimensional distribution. Technical report MSR-TR-
99-87, Microsoft Research (1999)
48. Singliar, T., Hauskrecht, M.: Towards a learning traffic incident detection system.
In: Workshop on Machine Learning Algorithms for Surveillance and Event Detec-
tion, International Conference on Machine Learning (2006)
49. Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.:
On integral probability metrics, φ-divergences and binary classification. arXiv
preprint arXiv:0901.2698 (2009)
50. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural
networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112
(2014)
51. Wong, W.K., Moore, A.W., Cooper, G.F., Wagner, M.M.: Bayesian network
anomaly pattern detection for disease outbreaks. In: International Conference on
Machine Learning, pp. 808–815 (2003)
52. Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: Sun database: large-scale
scene recognition from abbey to zoo. In: Computer Vision and Pattern Recognition,
pp. 3485–3492. IEEE (2010)
53. Yeung, D.Y., Chow, C.: Parzen-window network intrusion detectors. In: Interna-
tional Conference on Pattern Recognition, vol. 4, pp. 385–388. IEEE (2002)