Computer Science > Computer Vision and Pattern Recognition

arXiv:2310.05400 (cs)

[Submitted on 9 Oct 2023]

Title: Efficient-VQGAN: Towards High-Resolution Image Generation with Efficient Vision Transformers

Authors: Shiyue Cao, Yueqin Yin, Lianghua Huang, Yu Liu, Xin Zhao, Deli Zhao, Kaiqi Huang

Abstract: Vector-quantized image modeling has shown great potential for synthesizing high-quality images. However, generating high-resolution images remains challenging because the computational cost of self-attention grows quadratically with the number of tokens. In this study, we explore a more efficient two-stage framework for high-resolution image generation, with improvements in three aspects. (1) Based on the observation that the first (quantization) stage is strongly local in nature, we employ a quantization model based on local attention instead of the global attention mechanism used in previous methods, leading to better efficiency and reconstruction quality. (2) We emphasize the importance of multi-grained feature interaction during image generation and introduce an efficient attention mechanism that combines global attention (long-range semantic consistency across the whole image) with local attention (fine-grained details). This approach yields faster generation, higher generation fidelity, and improved resolution. (3) We propose a new generation pipeline that incorporates autoencoding training and an autoregressive generation strategy, demonstrating a better paradigm for image synthesis. Extensive experiments demonstrate the superiority of our approach in high-quality, high-resolution image reconstruction and generation.
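To make the efficiency argument in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It counts query-key pairs for global attention versus attention restricted to non-overlapping windows, and runs a toy windowed self-attention in NumPy. The function names, the non-overlapping window partition, and the use of identity query/key/value projections are all assumptions made for illustration; the paper's actual attention design may differ.

```python
import numpy as np


def global_attention_pairs(h, w):
    """Query-key pairs when every token attends to every token."""
    n = h * w
    return n * n


def local_attention_pairs(h, w, window):
    """Query-key pairs when tokens attend only within their own window."""
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    num_windows = (h // window) * (w // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window):
    """Toy self-attention over non-overlapping windows of an (H, W, C) grid.

    Assumption: identity projections are used for queries, keys and values
    so that the example stays short.
    """
    h, w, c = x.shape
    if h % window or w % window:
        raise ValueError("grid size must be divisible by the window size")
    # Partition the grid into windows: (num_windows, window * window, C).
    blocks = (
        x.reshape(h // window, window, w // window, window, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(-1, window * window, c)
    )
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(c)
    out = softmax(scores) @ blocks
    # Undo the partition to recover the (H, W, C) layout.
    return (
        out.reshape(h // window, w // window, window, window, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(h, w, c)
    )


if __name__ == "__main__":
    print(global_attention_pairs(32, 32))    # 1048576
    print(local_attention_pairs(32, 32, 8))  # 65536
```

On a 32 x 32 token grid with 8 x 8 windows, the windowed variant evaluates 16 times fewer query-key pairs than global attention. The price is that tokens in different windows do not interact, which is consistent with the abstract's motivation for pairing local attention with a global component in the generation stage.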

Comments:	Accepted to ICCV 2023
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.05400 [cs.CV]
	(or arXiv:2310.05400v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.05400

Submission history

From: Xin Zhao
[v1] Mon, 9 Oct 2023 04:38:52 UTC (13,151 KB)
