深度学习图像数据增广方法研究综述 Review of data augmentation for image in deep learning
2021, Vol. 26, No. 3, pp. 487-502
Print publication date: 2021-03-16
Accepted: 2020-07-05
DOI: 10.11834/jig.200089
马岽奡, 唐娉, 赵理君, 张正. 深度学习图像数据增广方法研究综述[J]. 中国图象图形学报, 2021,26(3):487-502.
Dongao Ma, Ping Tang, Lijun Zhao, Zheng Zhang. Review of data augmentation for image in deep learning[J]. Journal of Image and Graphics, 2021,26(3):487-502.
Data, as the driving force of deep learning, are essential for model training. Sufficient training data not only alleviate overfitting during training but also enlarge the parameter search space, helping the model move further toward the globally optimal solution. In many fields and tasks, however, acquiring sufficient training samples is difficult and costly, so data augmentation has become a common means of enlarging the training set. This paper surveys current image data augmentation methods in deep learning, organizing the methods proposed to relieve model overfitting into four classes according to their underlying principles: single-data warping, multi-data mixing, learning the data distribution, and learning the augmentation strategy. Taking image data as the main research object, each class is further subdivided by core idea, and the principles, applicable scenarios, advantages, and disadvantages of the methods are compared and analyzed, helping researchers choose augmentation methods suited to the characteristics of their data and providing a foundation for subsequent application and development of data augmentation research at home and abroad. For image data, single-data warping divides into five kinds: geometric transformation, color-space transformation, sharpness transformation, noise injection, and local erasing. Multi-data mixing can be divided into mixing in the image domain and mixing in feature space. Methods that learn the data distribution are classified mainly by applications of generative adversarial networks and image style transfer. Typical methods that learn the augmentation strategy can be classified as meta-learning based or reinforcement-learning based. Data augmentation has become an important technique for advancing deep learning across application fields: it effectively relieves the overfitting caused by insufficient training data and further improves model accuracy. In practice, the most suitable methods can be selected and combined according to the characteristics of the data and task to form an effective augmentation scheme, providing stronger impetus for applying deep learning methods. Looking ahead, exploring optimal combination strategies with reinforcement learning, adaptively learning optimal warping and mixing with meta-learning, further fitting the true data distribution with generative adversarial networks to sample high-quality unseen data, and exploring multimodal data translation with style transfer are all directions well worth investigating, with broad prospects.
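The single-data-warping family described above (geometric transformation, color-space transformation, noise injection, and local erasing) can be illustrated with a minimal NumPy sketch. This is not code from the paper: the function name `augment` and all parameter values (jitter range, noise scale, patch size) are illustrative assumptions, chosen only to show how the five transform kinds compose on one image.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Illustrative single-data warping: flip, rotation (geometric),
    brightness jitter (color), Gaussian noise injection, and local erasing.
    All magnitudes below are arbitrary example values."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:                      # geometric: horizontal flip
        out = out[:, ::-1]
    out = np.rot90(out, k=rng.integers(0, 4))   # geometric: 90-degree rotation
    out *= rng.uniform(0.8, 1.2)                # color: brightness jitter
    out += rng.normal(0.0, 5.0, out.shape)      # noise injection (Gaussian)
    h, w = out.shape[:2]                        # local erasing: zero one patch
    eh, ew = h // 4, w // 4
    y, x = rng.integers(0, h - eh), rng.integers(0, w - ew)
    out[y:y + eh, x:x + ew] = 0.0
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

In practice, each transform is usually applied with its own probability and magnitude, and the label is kept unchanged because these warpings are assumed to be label-preserving.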
Deep learning has a tremendous influence on numerous research fields due to its outstanding performance in representing high-level features of high-dimensional data. Especially in the computer vision field, deep learning has shown its powerful abilities in various tasks such as image classification, object detection, and image segmentation. Normally, when constructing networks and using a deep learning-based method, a suitable neural network architecture is designed for the data and task, a reasonable task-oriented objective function is set, and a large amount of labeled training data is used to calculate the target loss and optimize the model parameters by gradient descent, finally training an "end-to-end" deep neural network model to perform the task. Data, as the driving force for deep learning, are essential for training the model. With sufficient data, the overfitting problem during training can be alleviated, and the parametric search space can be expanded such that the model can be further optimized toward the globally optimal solution. However, in several areas or tasks, attaining sufficient labeled samples for training a model is difficult and expensive. As a result, overfitting during training occurs often and prevents deep learning models from achieving higher performance. Thus, many methods have been proposed to address this issue, and data augmentation has become one of the most important solutions, increasing the amount and variety of a limited data set. Innumerable works have proven the effectiveness of data augmentation in improving the performance of deep learning models, which can be traced back to the seminal convolutional neural network, LeNet.

In this review, we examine the most representative image data augmentation methods for deep learning, to help researchers adopt appropriate methods for their tasks and to promote the research progression of data augmentation. Current diverse data augmentation methods that can relieve the overfitting problem in deep learning models are compared and analyzed. Based on differences in internal mechanism, a taxonomy of data augmentation methods is proposed with four classes: single-data warping, multi-data mixing, learning the data distribution, and learning the augmentation strategy.

First, for image data, single-data warping generates new data by image transformation over the spatial or spectral domain. These methods can be divided into five categories: geometric transformations, color-space transformations, sharpness transformations, noise injection, and local erasing. They have long been widely used for image data augmentation because of their simplicity.

Second, multi-data mixing can be divided according to whether the mixture takes place in image space or in feature space; the mixing modes include linear and nonlinear mixing of more than one image. Although mixing images seems a counter-intuitive way to augment data, experiments in many works have proven its effectiveness in improving the performance of deep learning models.

Third, methods that learn the data distribution try to capture the underlying probability distribution of the training data and generate new samples by sampling from that distribution. This goal can be achieved with adversarial networks; therefore, this kind of data augmentation is based mainly on generative adversarial networks and applications of image-to-image translation.

Fourth, methods that learn the augmentation strategy train a model to select the optimal data augmentation strategy adaptively according to the characteristics of the data or task. This goal can be achieved by meta-learning, replacing data augmentation with a trainable neural network; the strategy-search problem can also be solved by reinforcement learning.

When performing data augmentation in practical applications, researchers can select and combine the most suitable of the above methods according to the characteristics of their data and tasks to form an effective data augmentation scheme, which in turn provides stronger motivation for applying deep learning methods with more effective training data. Although a better data augmentation strategy can be obtained more intelligently by learning the data distribution or searching augmentation strategies, how to customize an optimal data augmentation scheme automatically for a given task remains to be studied. In the future, theoretical analysis and experimental verification of the suitability of various data augmentation methods for different data and tasks will be of great research significance and application value, enabling researchers to customize an optimal augmentation scheme for their task. A large gap remains in applying the idea of meta-learning to data augmentation by constructing a "data augmentation network" that learns an optimal way of warping or mixing data. Moreover, improving the ability of generative adversarial networks (GANs) to fit the data distribution is substantial, because sampling in the real data space would be the ideal way to obtain unobserved new data without limit. The real world also has abundant cross-domain and cross-modality data; the style transfer ability of encoder-decoder networks and GANs can build mapping functions between different data distributions and achieve complementarity between data in different domains. Thus, exploring the application of image-to-image translation in different fields has bright prospects.
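The linear image-space mixing mentioned above is epitomized by mixup (Zhang et al., 2017): two images and their one-hot labels are interpolated with a weight drawn from a Beta distribution. The sketch below is a minimal illustration of that idea, not the authors' implementation; the function name and default `alpha` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Linearly mix two images and their one-hot labels.
    lam ~ Beta(alpha, alpha) controls the mixing ratio, so the
    soft label reflects exactly how much of each image is present."""
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

Because the label is mixed with the same weight as the pixels, the model is trained on soft targets, which is one explanation for why this counter-intuitive augmentation regularizes effectively.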
deep learning; overfitting; data augmentation; image transformation; generative adversarial networks (GAN); meta-learning; reinforcement learning
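The strategy-learning family surveyed above is exemplified by RandAugment (Cubuk et al., 2019b), which replaces a learned search with uniform sampling of N operations at a shared magnitude M. The following sketch only conveys that reduced-search-space idea; the operation pool, names, and magnitudes are illustrative assumptions, not the published operation set.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy transform pool; each op maps (image, magnitude in [0, 1]) -> image.
OPS = {
    "identity":   lambda img, m: img,
    "flip":       lambda img, m: img[:, ::-1],
    "brightness": lambda img, m: np.clip(img * (1.0 + m), 0, 255),
    "noise":      lambda img, m: np.clip(img + rng.normal(0, 25 * m, img.shape), 0, 255),
}

def rand_augment(img, n_ops=2, magnitude=0.5):
    """Apply n_ops transforms drawn uniformly from OPS at one global
    magnitude, so the whole policy is described by just (n_ops, magnitude)."""
    out = img.astype(np.float32)
    for name in rng.choice(list(OPS), size=n_ops):
        out = OPS[name](out, magnitude)
    return out.astype(np.uint8)
```

With the policy reduced to two scalars, a grid search over `(n_ops, magnitude)` stands in for the expensive reinforcement-learning search used by AutoAugment.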
Brock A, Donahue J and Simonyan K. 2018. Large scale GAN training for high fidelity natural image synthesis[EB/OL]. 2018-09-28[2020-03-03]. https://arxiv.org/pdf/1809.11096.pdf
Chawla N V, Bowyer K W, Hall L O and Kegelmeyer W P. 2002. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16(1): 321-357[DOI: 10.1613/jair.953]
Chen P G, Liu S, Zhao H S and Jia J Y. 2020. GridMask data augmentation[EB/OL]. 2020-01-13[2020-03-03]. https://arxiv.org/pdf/2001.04086.pdf
Cubuk E D, Zoph B, Mané D, Vasudevan V and Le Q V. 2019a. AutoAugment: learning augmentation strategies from data//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach, USA: IEEE: 113-123[DOI: 10.1109/CVPR.2019.00020]
Cubuk E D, Zoph B, Shlens J and Le Q V. 2019b. RandAugment: practical automated data augmentation with a reduced search space[EB/OL]. 2019-09-30[2020-03-03]. https://arxiv.org/pdf/1909.13719.pdf
Devries T and Taylor G W. 2017a. Dataset augmentation in feature space[EB/OL]. 2017-02-17[2020-03-03]. https://arxiv.org/pdf/1702.05538.pdf
Devries T and Taylor G W. 2017b. Improved regularization of convolutional neural networks with cutout[EB/OL]. 2017-08-15[2020-03-03]. https://arxiv.org/pdf/1708.04552.pdf
Erhan D, Bengio Y, Courville A, Manzagol P A, Vincent P and Bengio S. 2010. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11: 625-660[DOI: 10.5555/1756006.1756025]
Frid-Adar M, Diamant I, Klang E, Amitai M, Goldberger J and Greenspan H. 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing, 321: 321-331[DOI: 10.1016/j.neucom.2018.09.013]
Gatys L A, Ecker A S and Bethge M. 2015. A neural algorithm of artistic style[EB/OL]. 2015-08-26[2020-03-03]. https://arxiv.org/pdf/1508.06576.pdf
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
Hiasa Y, Otake Y, Takao M, Matsuoka T, Takashima K, Carass A, Prince J L, Sugano N and Sato Y. 2018. Cross-modality image synthesis from unpaired data using CycleGAN//Gooya A, Goksel O, Oguz I and Burgos N, eds. Simulation and Synthesis in Medical Imaging. Cham: Springer: 31-41[DOI: 10.1007/978-3-030-00536-8_4]
Huang G, Liu Z, Maaten L V D and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/CVPR.2017.243]
Inoue H. 2018. Data augmentation by pairing samples for images classification[EB/OL]. 2018-01-09[2020-03-03]. https://arxiv.org/pdf/1801.02929.pdf
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. 2015-02-11[2020-03-03]. https://arxiv.org/pdf/1502.03167.pdf
Isola P, Zhu J Y, Zhou T H and Efros A A. 2017. Image-to-image translation with conditional adversarial networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA: IEEE: 5967-5976[DOI: 10.1109/CVPR.2017.632]
Jackson P T, Atapour-Abarghouei A, Bonner S, Breckon T and Obara B. 2018. Style augmentation: data augmentation via style randomization[EB/OL]. 2018-09-14[2020-03-03]. https://arxiv.org/pdf/1809.05375.pdf
Jurio A, Pagola M, Galar M, Lopez-Molina C and Paternain D. 2010. A comparison study of different color spaces in clustering based image segmentation//Hüllermeier E, Kruse R and Hoffmann F, eds. Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications. Berlin, Heidelberg: Springer: 532-541[DOI: 10.1007/978-3-642-14058-7_55]
Kang G L, Dong X Y, Zheng L and Yang Y. 2017. PatchShuffle regularization[EB/OL]. 2017-07-22[2020-03-03]. https://arxiv.org/pdf/1707.07103.pdf
Karras T, Aila T, Laine S and Lehtinen J. 2017. Progressive growing of GANs for improved quality, stability, and variation[EB/OL]. 2017-12-27[2020-03-03]. https://arxiv.org/pdf/1710.10196.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90[DOI: 10.1145/3065386]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324[DOI: 10.1109/5.726791]
LeCun Y, Bengio Y and Hinton G. 2015. Deep learning. Nature, 521: 436-444[DOI: 10.1038/nature14539]
Lemley J, Bazrafkan S and Corcoran P. 2017. Smart augmentation learning an optimal data augmentation strategy. IEEE Access, 5: 5858-5869[DOI: 10.1109/ACCESS.2017.2696121]
Li S T, Chen Y K, Peng Y L and Bai L. 2018. Learning more robust features with adversarial training[EB/OL]. 2018-04-20[2020-03-03]. https://arxiv.org/pdf/1804.07757.pdf
Ma D G, Tang P and Zhao L J. 2019. SiftingGAN: generating and sifting labeled samples to improve the remote sensing image scene classification baseline in vitro. IEEE Geoscience and Remote Sensing Letters, 16(7): 1046-1050[DOI: 10.1109/LGRS.2018.2890413]
Mirza M and Osindero S. 2014. Conditional generative adversarial nets[EB/OL]. 2014-11-06[2020-03-03]. https://arxiv.org/pdf/1411.1784.pdf
Moosavi-Dezfooli S M, Fawzi A and Frossard P. 2016. DeepFool: a simple and accurate method to fool deep neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 2574-2582[DOI: 10.1109/CVPR.2016.282]
Moreno-Barea F J, Strazzera F, Jerez J M, Urda D and Franco L. 2018. Forward noise adjustment scheme for data augmentation//Proceedings of 2018 IEEE Symposium Series on Computational Intelligence (SSCI). Bangalore, India: IEEE: 728-734[DOI: 10.1109/SSCI.2018.8628917]
Perez L and Wang J. 2017. The effectiveness of data augmentation in image classification using deep learning[EB/OL]. 2017-12-13[2020-03-03]. https://arxiv.org/pdf/1712.04621.pdf
Radford A, Metz L and Chintala S. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. 2015-11-19[2020-03-03]. https://arxiv.org/pdf/1511.06434.pdf
Shorten C and Khoshgoftaar T M. 2019. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1): 1-48[DOI: 10.1186/s40537-019-0197-0]
Simonyan K and Zisserman A. 2014. Very deep convolutional networks for large-scale image recognition[EB/OL]. 2014-09-04[2020-03-03]. https://arxiv.org/pdf/1409.1556.pdf
Singh K K and Lee Y J. 2017. Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3544-3553[DOI: 10.1109/ICCV.2017.381]
Srivastava N, Hinton G, Krizhevsky A, Sutskever I and Salakhutdinov R. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1): 1929-1958[DOI: 10.5555/2627435.2670313]
Su J W, Vargas D V and Sakurai K. 2019. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5): 828-841[DOI: 10.1109/TEVC.2019.2890858]
Summers C and Dinneen M J. 2019. Improved mixed-example data augmentation//Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa Village, USA: IEEE: 1262-1270[DOI: 10.1109/WACV.2019.00139]
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208[DOI: 10.1109/CVPR.2018.00131]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I and Fergus R. 2013. Intriguing properties of neural networks[EB/OL]. 2013-12-21[2020-03-03]. https://arxiv.org/pdf/1312.6199.pdf
Takahashi R, Matsubara T and Uehara K. 2019. Data augmentation using random image cropping and patching for deep CNNs. IEEE Transactions on Circuits and Systems for Video Technology, 30(9): 2917-2931[DOI: 10.1109/TCSVT.2019.2935128]
Taylor L and Nitschke G. 2017. Improving deep learning using generic data augmentation[EB/OL]. 2017-08-20[2020-03-03]. https://arxiv.org/pdf/1708.06020.pdf
Tobin J, Fong R, Ray A, Schneider J, Zaremba W and Abbeel P. 2017. Domain randomization for transferring deep neural networks from simulation to the real world//Proceedings of 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Vancouver, Canada: IEEE: 23-30[DOI: 10.1109/IROS.2017.8202133]
Tokozume Y, Ushiku Y and Harada T. 2018. Between-class learning for image classification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5486-5494[DOI: 10.1109/CVPR.2018.00575]
Wang L, Xu X, Yu Y, Yang R, Gui R, Xu Z Z and Pu F L. 2019. SAR-to-optical image translation using supervised cycle-consistent adversarial networks. IEEE Access, 7: 129136-129149[DOI: 10.1109/ACCESS.2019.2939649]
Weiss K, Khoshgoftaar T M and Wang D D. 2016. A survey of transfer learning. Journal of Big Data, 3(1): 1-40[DOI: 10.1186/s40537-016-0043-6]
Wong S C, Gatt A, Stamatescu V and Mcdonnell M D. 2016. Understanding data augmentation for classification: when to warp?//Proceedings of 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA). Gold Coast, Australia: IEEE: 1-6[DOI: 10.1109/DICTA.2016.7797091]
Xie L X, Wang J D, Wei Z, Wang M and Tian Q. 2016. DisturbLabel: regularizing CNN on the loss layer//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA: IEEE: 4753-4762[DOI: 10.1109/CVPR.2016.514]
Zhang H Y, Cisse M, Dauphin Y N and Lopez-Paz D. 2017. Mixup: beyond empirical risk minimization[EB/OL]. 2017-10-25[2020-03-03]. https://arxiv.org/pdf/1710.09412.pdf
Zheng Z D, Zheng L and Yang Y. 2017. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 3774-3782[DOI: 10.1109/ICCV.2017.405]
Zhong Z, Zheng L, Kang G L, Li S Z and Yang Y. 2017. Random erasing data augmentation[EB/OL]. 2017-08-16[2020-03-03]. https://arxiv.org/pdf/1708.04896.pdf
Zhu J Y, Park T, Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision (ICCV). Venice, Italy: IEEE: 2242-2251[DOI: 10.1109/ICCV.2017.244]
Zhu X Y, Liu Y F, Li J H, Wan T and Qin Z C. 2018. Emotion classification with data augmentation using generative adversarial networks//Phung D, Tseng V, Webb G, Ho B, Ganji M and Rashidi L, eds. Advances in Knowledge Discovery and Data Mining. Cham, Switzerland: Springer: 349-360[DOI: 10.1007/978-3-319-93040-4_28]