Swin-GAN: generative adversarial network based on shifted windows transformer architecture for image generation

S Wang, Z Gao, D Liu - The Visual Computer, 2023 - Springer
Abstract
Successful generative adversarial networks (GANs) have largely relied on generators and discriminators based on convolutional neural networks (CNNs). However, a CNN cannot model long-range dependencies well because its convolution operator has a local receptive field, which can cause problems for GANs, such as optimization difficulties and the loss of feature resolution and fine details. To address the problem of long-range dependencies, we propose a GAN model based on the shifted windows Transformer architecture, called Swin-GAN, in which the CNN architecture is replaced by a Transformer. In our model, we build a memory-friendly generator based on the shifted window attention mechanism that gradually increases the resolution of the feature maps at each stage. In addition, we build a multi-scale discriminator that splits the image into patches of different sizes as the input at different stages, which balances capturing global contextual semantic information against capturing local detailed features. To further improve fidelity and stability, we use techniques such as data augmentation, layer normalization and relative position encoding in our model. The experimental results show that, compared with current schemes, our scheme has better performance, fewer parameters and lower computational cost. Specifically, the Swin-GAN model has 30.254M parameters and requires 4.086G floating-point operations (FLOPs). On CIFAR-10, it achieves an Inception Score (IS) of 9.04 and a Fréchet Inception Distance (FID) of 9.23.
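The abstract does not give implementation details for the shifted window attention mechanism, so the following is only a minimal sketch of the general window-partitioning idea behind it, not the authors' code. It assumes a toy 4 x 4 grid of token indices, a window size of 2 and a cyclic shift of 1; the helper names `cyclic_shift` and `window_partition` are hypothetical and chosen for illustration.

```python
from typing import List

Grid = List[List[int]]


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
        for r in range(h)
    ]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping window x window blocks.

    Each block is returned flattened in row-major order.
    """
    h, w = len(grid), len(grid[0])
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = []
    for top in range(0, h, window):
        for left in range(0, w, window):
            windows.append(
                [
                    grid[r][c]
                    for r in range(top, top + window)
                    for c in range(left, left + window)
                ]
            )
    return windows


# Toy 4 x 4 feature map whose entries are token indices 0..15 (an assumption).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular partition: attention would be computed inside each 2 x 2 window.
regular = window_partition(grid, 2)

# Shifted partition: shift first, then partition, so windows straddle
# the boundaries of the regular windows.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [0, 1, 4, 5]
print(shifted[0])  # [5, 6, 9, 10]
```

In this toy case the first shifted window groups tokens 5, 6, 9 and 10, each of which belongs to a different regular window, which is how alternating the two partitions lets information cross window boundaries while attention stays local. Windows that wrap around the border after a cyclic shift are usually handled with an attention mask, which this sketch omits.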