Abstract
Generalised zero-shot learning (GZSL) is defined by a training process that contains a set of visual samples from seen classes and a set of semantic samples from seen and unseen classes, while the testing process consists of classifying visual samples from both the seen and the unseen classes. Current approaches rely on the result of a single-modality classifier (visual, semantic, or latent joint space) and balance the classification between the seen and unseen classes with gating mechanisms. Such approaches have two problems: 1) multi-modal classifiers are generally more accurate than single-modality classifiers, so relying on a single modality is sub-optimal, and 2) gating mechanisms rely on the complex one-class training of an external domain classifier that modulates the seen and unseen classifiers. In this paper, we mitigate these issues by proposing a novel GZSL method, an augmentation network that tackles multi-modal and multi-domain inference for generalised zero-shot learning (AN-GZSL). The multi-modal inference combines visual and semantic classification and automatically balances the seen and unseen classification using temperature calibration, without requiring any gating mechanisms or external domain classifiers. Experiments show that our method produces new state-of-the-art GZSL results on the fine-grained benchmark data sets CUB and FLO and on the large-scale data set ImageNet. We also obtain competitive results on the coarse-grained data sets SUN and AWA, and we present an ablation study that justifies each stage of the proposed AN-GZSL.
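To make the balancing mechanism concrete, the sketch below shows one way temperature calibration can replace a gating network at inference time: a single softmax is computed over the joint seen-plus-unseen label set, with a higher temperature applied to the typically overconfident seen-class logits, and the two modality classifiers are fused by averaging their calibrated probabilities. This is a minimal illustration, not the authors' implementation; the function names, the averaging rule, and the temperature values t_seen and t_unseen are assumptions made for the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibrated_probs(logits, seen_mask, t_seen=2.0, t_unseen=1.0):
    """Per-domain temperature scaling over the joint seen+unseen label set.

    Using t_seen > t_unseen softens the seen-class logits, so unseen
    classes can compete within a single softmax, without an external
    domain classifier or gating mechanism.
    """
    temps = np.where(seen_mask, t_seen, t_unseen)
    return softmax(logits / temps)

# Toy example: 3 seen + 2 unseen classes, scores from two modalities.
seen_mask = np.array([True, True, True, False, False])
visual_logits = np.array([4.0, 1.0, 0.5, 3.6, 0.2])
semantic_logits = np.array([3.5, 0.8, 0.3, 3.9, 0.4])

# Multi-modal inference: average the calibrated per-modality probabilities.
p = 0.5 * (calibrated_probs(visual_logits, seen_mask)
           + calibrated_probs(semantic_logits, seen_mask))
print("predicted class:", int(p.argmax()))
```

In this toy run the uncalibrated scores favour a seen class, but after the seen logits are divided by the larger temperature, the unseen class (index 3) wins, which is exactly the seen/unseen rebalancing the abstract describes.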
Notes
1. See supplementary material for more information on data sets.
2. This work was partially supported by Australian Research Council grants (FT190100525 and CE140100016).
Cite this paper
Felix, R., Sasdelli, M., Reid, I., Carneiro, G.: Augmentation network for generalised zero-shot learning. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. LNCS, vol. 12625. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69538-5_27