Review of data augmentation for image in deep learning
- 2021, Vol. 26, No. 3, pp. 487-502
Accepted: 2020-07-05
DOI: 10.11834/jig.200089
马岽奡, 唐娉, 赵理君, 张正. 深度学习图像数据增广方法研究综述[J]. 中国图象图形学报, 2021,26(3):487-502.
Dongao Ma, Ping Tang, Lijun Zhao, Zheng Zhang. Review of data augmentation for image in deep learning[J]. Journal of Image and Graphics, 2021,26(3):487-502.
Deep learning has had a tremendous influence on numerous research fields owing to its outstanding performance in representing high-level features of high-dimensional data. In computer vision especially, deep learning has shown powerful capabilities in tasks such as image classification, object detection, and image segmentation. Normally, when constructing a network and applying a deep learning-based method, a suitable neural network architecture is designed for the data and task, a reasonable task-oriented objective function is set, and a large amount of labeled training data is used to calculate the target loss, optimize the model parameters by gradient descent, and finally train an "end-to-end" deep neural network model to perform the task. Data, as the driving force of deep learning, are essential for training the model. With sufficient data, the overfitting problem during training can be alleviated, and the parametric search space can be expanded such that the model can be further optimized toward the global optimal solution. However, in several areas or tasks, obtaining sufficient labeled samples for training a model is difficult and expensive. As a result, overfitting often occurs during training and prevents deep learning models from achieving higher performance. Thus, many methods have been proposed to address this issue, and data augmentation has become one of the most important solutions because it increases the amount and variety of a limited data set. Numerous works have demonstrated the effectiveness of data augmentation in improving the performance of deep learning models, a practice that can be traced back to LeNet, the seminal work on convolutional neural networks. In this review, we examine the most representative image data augmentation methods for deep learning. This review can help researchers adopt appropriate methods for their tasks and promote research progress in data augmentation. The diverse data augmentation methods currently used to relieve overfitting in deep learning models are compared and analyzed. Based on differences in their internal mechanisms, a taxonomy of data augmentation methods is proposed with four classes: single data warping, multiple data mixing, learning the data distribution, and learning the augmentation strategy. First, for image data, single data warping generates new data by transforming an image in spatial or spectral space. These methods can be divided into five categories: geometric transformations, color space transformations, sharpness transformations, noise injection, and local erasing. These methods have long been widely used in image data augmentation because of their simplicity. Second, multiple data mixing can be divided into mixing in image space and mixing in feature space. The mixing modes include linear mixing and nonlinear mixing of more than one image. Although mixing images seems to be a counter-intuitive method for data augmentation, experiments in many works have proven its effectiveness in improving the performance of deep learning models. Third, methods that learn the data distribution try to capture the underlying probability distribution of the training data and generate new samples by sampling from that distribution. This goal can be achieved by adversarial networks. Therefore, this kind of data augmentation method is mainly based on generative adversarial networks (GANs) and the application of image-to-image translation. Fourth, methods that learn the augmentation strategy try to train a model to select the optimal data augmentation strategy adaptively according to the characteristics of the data or task. This goal can be achieved by meta-learning, in which data augmentation is replaced with a trainable neural network. The strategy search problem can also be solved by reinforcement learning. When performing data augmentation in practical applications, researchers can select and combine the most suitable of the above methods according to the characteristics of their data and tasks to form a set of effective data augmentation schemes, which in turn provides a stronger impetus for the application of deep learning methods by supplying more effective training data. Although a better data augmentation strategy can be obtained more intelligently by learning the data distribution or searching for data augmentation strategies, how to customize an optimal data augmentation scheme automatically for a given task remains to be studied. In the future, theoretical analysis and experimental verification of the suitability of various data augmentation methods for different data and tasks will be of great research significance and application value, and will enable researchers to customize an optimal data augmentation scheme for their task. A large gap remains in applying the idea of meta-learning to data augmentation, that is, in constructing a "data augmentation network" that learns an optimal way of data warping or data mixing. Moreover, improving the ability of GANs to fit the data distribution more accurately is important because oversampling in the real data space should be the ideal way to obtain an unlimited supply of unobserved new data. The real world contains abundant cross-domain and cross-modality data. The style transfer ability of encoder-decoder networks and GANs can establish mapping functions between different data distributions so that data from different domains complement one another. Thus, exploring the application of "image-to-image translation" in different fields is promising.
Keywords: deep learning; overfitting; data augmentation; image transformation; generative adversarial networks (GAN); meta-learning; reinforcement learning
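
The single data warping class named in the abstract can be illustrated with three of its five categories: a geometric transformation, noise injection, and local erasing. The following is only a minimal sketch under stated assumptions, not the procedure of any method surveyed in the review. Images are assumed to be grayscale and stored as nested lists of floats in [0, 1] so that the example needs only the Python standard library. The function names, the noise level, and the patch size are all hypothetical choices made for illustration.

```python
import random

def horizontal_flip(img):
    """Geometric transformation: mirror each row left to right."""
    return [row[::-1] for row in img]

def add_gaussian_noise(img, sigma=0.05, rng=random):
    """Noise injection: add zero-mean Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in img]

def random_erase(img, erase_h, erase_w, fill=0.0, rng=random):
    """Local erasing: overwrite one randomly placed rectangle with `fill`."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - erase_h)    # randint is inclusive at both ends
    left = rng.randint(0, w - erase_w)
    out = [row[:] for row in img]        # copy so the input is not modified
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = fill
    return out

def augment(img, rng=random):
    """Flip with probability 0.5, then add noise and erase one small patch."""
    out = [row[:] for row in img]
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    out = add_gaussian_noise(out, sigma=0.05, rng=rng)  # assumed noise level
    out = random_erase(out, erase_h=1, erase_w=2, rng=rng)  # assumed patch size
    return out

rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8],
         [0.9, 1.0, 0.0, 0.1]]
augmented = augment(image, rng=rng)
```

Each call to `augment` returns a new image of the same size, so calling it repeatedly on one training image yields several warped variants that keep the original label.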
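
Linear mixing in image space, one of the mixing modes mentioned in the abstract, can be sketched as a weighted blend of two images and their labels. The abstract does not name a specific algorithm. The example below assumes a mixup-style blend in which the weight is drawn from a Beta(alpha, alpha) distribution, and it should be read as one possible instance rather than the review's own formulation. The function name `linear_mix`, the value `alpha=0.2`, and the nested-list image representation are illustrative assumptions.

```python
import random

def linear_mix(img_a, label_a, img_b, label_b, alpha=0.2, rng=random):
    """Blend two images and their one-hot labels with a shared weight lam."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed_img = [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(ra, rb)]
                 for ra, rb in zip(img_a, img_b)]
    mixed_label = [lam * ya + (1.0 - lam) * yb
                   for ya, yb in zip(label_a, label_b)]
    return mixed_img, mixed_label, lam

rng = random.Random(0)  # fixed seed so the sketch is repeatable
img_a = [[0.0, 0.0], [0.0, 0.0]]   # all-black image, class 0
img_b = [[1.0, 1.0], [1.0, 1.0]]   # all-white image, class 1
mixed_img, mixed_label, lam = linear_mix(
    img_a, [1.0, 0.0], img_b, [0.0, 1.0], rng=rng)
```

Because the same weight is applied to the pixels and to the labels, the mixed label remains a valid probability vector, and the model is trained on a soft target that reflects how much of each source image is present.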
