Abstract
Data augmentation enriches the diversity of training datasets and thereby improves the generalization ability of deep learning models. Existing augmentation methods have achieved excellent performance on image recognition. However, the images they generate do little to improve the accuracy and robustness of models trained on natural scene data. In this paper, an augmentation method called NeighborMix is proposed to address this problem. NeighborMix performs only simple operations on the original image. Specifically, it selects and pastes an occlusion object closely related to the original input image in two successive stages. In the first stage, a block from a single original image is selected as the occlusion object. In the second stage, the occlusion object is copied and pasted into the original image to generate a new image. Extensive experiments on benchmark datasets demonstrate that NeighborMix achieves top-1 accuracy of 93.54%, 70.62%, and 89.30% on CIFAR-10, CIFAR-100, and STL-10, respectively, indicating superior model generalization. It is also the most robust of the compared methods, with a standard error of 0.05%, and outperforms state-of-the-art data augmentation techniques.
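The two-stage procedure described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the block is chosen, how large it is, or where it is pasted, so the function name `neighbormix_sketch`, the `block_frac` parameter, and the uniform random choice of source and target positions are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def neighbormix_sketch(image, block_frac=0.25, rng=None):
    """Copy a block of `image` and paste it elsewhere in the same image.

    Assumptions (not taken from the paper): the block sides are
    `block_frac` of the image sides, and both the source and the target
    positions are drawn uniformly at random. The label is left unchanged
    because the pasted content comes from the same image.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Clamp the block size so it always fits inside the image.
    bh = min(h, max(1, int(h * block_frac)))
    bw = min(w, max(1, int(w * block_frac)))

    # Stage 1: select a block from the original image as the occlusion object.
    sy = int(rng.integers(0, h - bh + 1))
    sx = int(rng.integers(0, w - bw + 1))
    block = image[sy:sy + bh, sx:sx + bw].copy()

    # Stage 2: paste the occlusion object into a copy of the original image.
    ty = int(rng.integers(0, h - bh + 1))
    tx = int(rng.integers(0, w - bw + 1))
    out = image.copy()
    out[ty:ty + bh, tx:tx + bw] = block
    return out, (sy, sx), (ty, tx)
```

For a CIFAR-sized input, `neighbormix_sketch(img)` with `img` of shape `(32, 32, 3)` returns an augmented image of the same shape and dtype, together with the source and target corners of the 8×8 block. Because the occlusion comes from the image itself, this sketch needs no second training sample and no label mixing, unlike CutMix or mixup.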







Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61763019.
About this article
Cite this article
Wang, F., Ben, K., Peng, H. et al. NeighborMix data augmentation for image recognition. Multimed Tools Appl 83, 26581–26598 (2024). https://doi.org/10.1007/s11042-023-16603-3
DOI: https://doi.org/10.1007/s11042-023-16603-3