NeighborMix data augmentation for image recognition

Multimedia Tools and Applications

Abstract

Data augmentation enriches the diversity of training datasets and thereby improves the generalization ability of deep learning models. Existing augmentation methods have achieved excellent performance on image recognition. However, the images they generate do not sufficiently improve the accuracy and robustness of models trained on natural scene data. This paper proposes an augmentation method called NeighborMix to address this problem. NeighborMix performs only simple operations on the original image. Specifically, it selects and pastes an occlusion object closely related to the original input image in two successive stages. In the first stage, a block from a single original image is selected as the occlusion object. In the second stage, the occlusion object is copied and pasted into the original image to generate a new image. Extensive experiments on benchmark datasets show that NeighborMix achieves top-1 accuracy of 93.54%, 70.62%, and 89.30% on CIFAR-10, CIFAR-100, and STL-10, respectively, and is the most robust of the compared methods, with a standard error of 0.05%, outperforming state-of-the-art data augmentation techniques.
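
The two-stage procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the block is chosen or where it is pasted, so the sketch assumes a square block picked uniformly at random and a paste location picked uniformly at random. The function names `select_block`, `paste_block`, and `copy_paste_augment` and the `block_size` parameter are hypothetical.

```python
import numpy as np


def select_block(image, block_size, rng):
    """Stage 1: take a square block from the image itself.

    Assumption: the block position is uniformly random; the paper's
    actual selection rule is not given in the abstract.
    """
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - block_size + 1))
    left = int(rng.integers(0, w - block_size + 1))
    return image[top:top + block_size, left:left + block_size].copy()


def paste_block(image, block, rng):
    """Stage 2: paste the block into a copy of the same image.

    Assumption: the paste position is uniformly random.
    """
    h, w = image.shape[:2]
    bh, bw = block.shape[:2]
    top = int(rng.integers(0, h - bh + 1))
    left = int(rng.integers(0, w - bw + 1))
    out = image.copy()  # leave the input image untouched
    out[top:top + bh, left:left + bw] = block
    return out


def copy_paste_augment(image, block_size=8, rng=None):
    """Run both stages on a single H x W x C image."""
    rng = np.random.default_rng() if rng is None else rng
    block = select_block(image, block_size, rng)
    return paste_block(image, block, rng)


# Usage on a random CIFAR-sized image (32 x 32 RGB).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = copy_paste_augment(image, block_size=8, rng=rng)
print(augmented.shape, augmented.dtype)
```

Because the occlusion block comes from the same image, the class label stays unchanged in this sketch, unlike methods that mix two images and their labels.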

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61763019.

Author information

Corresponding authors

Correspondence to Feipeng Wang or Kerong Ben.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, F., Ben, K., Peng, H. et al. NeighborMix data augmentation for image recognition. Multimed Tools Appl 83, 26581–26598 (2024). https://doi.org/10.1007/s11042-023-16603-3

  • DOI: https://doi.org/10.1007/s11042-023-16603-3
