GPT4Video

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

Zhanyu Wang, Longyue Wang*, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou*, Shuming Shi, Zhaopeng Tu

Tencent AI Lab, University of Sydney (*Correspondence)

✨ Demo video

11.24.1.mp4

Framework

Video Encoding stage: The video encoding module uses a frozen ViT-L/14 model to extract raw video features. The video abstraction module then applies a transformer-based cross-attention layer with two novel learnable tokens, which condense information along the temporal and spatial axes.
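The condensing step can be illustrated with a small, dependency-free sketch. This is not the released implementation: it assumes single-head dot-product attention without learned projections, and it assumes the temporal token pools the patches of each frame while the spatial token pools each patch position across frames. The names `attend` and `abstract_video` are hypothetical.

```python
import math


def attend(query, keys):
    """Single-head cross-attention of one query over a list of key/value vectors.

    Keys double as values here (an assumption made to keep the sketch short).
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def abstract_video(features, temporal_token, spatial_token):
    """Condense features[t][n] (frame t, patch n, each a D-dim vector).

    Returns T + N vectors instead of the original T * N.
    """
    num_frames, num_patches = len(features), len(features[0])
    # One summary per frame: the temporal token attends over that frame's patches.
    temporal = [attend(temporal_token, features[t]) for t in range(num_frames)]
    # One summary per patch position: the spatial token attends across frames.
    spatial = [
        attend(spatial_token, [features[t][n] for t in range(num_frames)])
        for n in range(num_patches)
    ]
    return temporal + spatial
```

With 4 frames of 6 patches each, the sketch turns 24 feature vectors into 10, which is the kind of reduction that keeps the LLM's input sequence short.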

LLM reasoning: The core of GPT4Video is a LLaMA model whose base weights stay frozen while it is fine-tuned efficiently via LoRA. The LLM is trained on custom video-centric and safety-aligned data, enabling it to comprehend videos and generate appropriate video prompts (indicated by underlined text).
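To make the combination of a frozen base model and LoRA concrete, here is a minimal sketch of a single LoRA-adapted linear layer in plain Python. It follows the standard LoRA formulation (output = W x + (alpha / r) B A x, with B initialised to zero); the class name `LoRALinear`, the initialisation scale, and the default `alpha` are illustrative assumptions, not values taken from GPT4Video.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen weight matrix plus a trainable low-rank update B @ A."""

    def __init__(self, frozen_weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(frozen_weight), len(frozen_weight[0])
        self.weight = frozen_weight  # frozen: never updated during fine-tuning
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Count only the adapter parameters (A and B)."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)
```

Only `A` and `B` would receive gradient updates, so the number of trained parameters grows with the rank rather than with the full weight matrix.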

Video Generation: The prompts generated by the LLM are then used as text inputs for the models in the Text-to-Video Model Gallery to create videos. We use ZeroScope as our video generation model in this work.
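The hand-off from the LLM to the gallery can be pictured as a simple dispatch on the model name. The sketch below is purely illustrative: the registry `GALLERY`, the helper `make_stub_backend`, and the function `generate_video` are hypothetical, and the backends are stubs that return a string instead of calling a real text-to-video model.

```python
from typing import Callable, Dict

# A backend takes a text prompt and returns a handle to the generated video.
Generator = Callable[[str], str]


def make_stub_backend(name: str) -> Generator:
    """Build a placeholder backend; a real one would run a text-to-video model."""
    def generate(prompt: str) -> str:
        return f"[{name}] video for: {prompt}"
    return generate


# Hypothetical gallery; the keys are illustrative labels, not real model IDs.
GALLERY: Dict[str, Generator] = {
    "zeroscope": make_stub_backend("zeroscope"),
    "other-t2v": make_stub_backend("other-t2v"),
}


def generate_video(prompt: str, model: str = "zeroscope") -> str:
    """Route an LLM-generated prompt to the chosen gallery backend."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("empty video prompt")
    if model not in GALLERY:
        raise KeyError(f"unknown text-to-video model: {model}")
    return GALLERY[model](prompt)
```

Keeping the LLM's output as plain text means a different generator can be swapped in by changing the `model` argument, without retraining the LLM.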

Citation

@article{wang2023gpt4video,
  title={GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation},
  author={Zhanyu Wang and Longyue Wang and Minghao Wu and Zhen Zhao and Chenyang Lyu and Huayang Li and Deng Cai and Luping Zhou and Shuming Shi and Zhaopeng Tu},
  journal={CoRR},
  year={2023}
}
