Abstract
Recognizing multi-label images is a significant but challenging task toward high-level visual understanding. Remarkable success has been achieved by CNN–RNN models that capture the underlying semantic dependencies among labels and predict label distributions over the global-level features output by CNNs. However, such global-level features often fuse the information of multiple objects, making it difficult to recognize small objects and to capture label correlations. To address this problem, we propose a novel multi-label image classification framework that improves upon the CNN–RNN design pattern. By introducing an attention module into the CNN–RNN architecture, object features in the attention map are separated by channel and then fed to an LSTM network, which captures label dependencies and predicts labels sequentially. A category-wise max-pooling operation then integrates these predictions into the final result. Experimental results on the PASCAL VOC 2007 and MS-COCO datasets demonstrate that our model effectively exploits the correlations between labels to improve classification performance and better recognizes small objects.
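For concreteness, the pipeline described above (CNN features, channel-separated attention, sequential LSTM prediction, category-wise max-pooling) can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' released code; the backbone choice, the number of attention maps, and all layer sizes are illustrative assumptions.

# Minimal sketch of the described framework; all module names and sizes are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class AttentionCNNRNN(nn.Module):
    def __init__(self, num_classes=20, num_attention_maps=5, hidden_size=512):
        super().__init__()
        backbone = models.resnet18(weights=None)                    # any CNN backbone works
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep spatial feature maps
        feat_dim = 512                                               # resnet18 final channel count
        # 1x1 conv head producing K attention maps over the spatial grid.
        self.attention = nn.Conv2d(feat_dim, num_attention_maps, kernel_size=1)
        # LSTM reads one attended feature vector per step (one per attention channel).
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, images):
        feats = self.cnn(images)                                     # (B, C, H, W)
        attn = torch.softmax(self.attention(feats).flatten(2), dim=-1)  # (B, K, H*W)
        flat = feats.flatten(2)                                      # (B, C, H*W)
        # Each attention channel pools the feature map into one object-centric vector.
        attended = torch.einsum('bkn,bcn->bkc', attn, flat)          # (B, K, C)
        out, _ = self.lstm(attended)                                 # (B, K, hidden)
        scores = self.classifier(out)                                # per-step label scores
        # Category-wise max-pooling over the K steps gives the final prediction.
        return scores.max(dim=1).values                              # (B, num_classes)

if __name__ == "__main__":
    model = AttentionCNNRNN(num_classes=20)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 20])

In practice the step-wise scores would be trained with a multi-label loss (e.g., binary cross-entropy on the pooled scores), and the attention maps encourage each LSTM step to focus on a different object region, which is what lets small objects contribute their own prediction rather than being averaged into a single global feature.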





Funding
Funding was provided by the National Natural Science Foundation of China (Grant No. 61672202).
About this article
Cite this article
Chen, L., Wang, R., Yang, J. et al. Multi-label image classification with recurrently learning semantic dependencies. Vis Comput 35, 1361–1371 (2019). https://doi.org/10.1007/s00371-018-01615-0