Pre-trained Large Language Models (LLMs) have demonstrated strong generalization across a wide range of linguistic tasks. However, owing to the inherent modality and task discrepancy, parameter-efficient transfer learning for adapting LLMs to vision-language (VL) tasks remains challenging: it can incur excessive extra computation and data expenditure for VL pre-training and suffer from disconnection between multi-modal representations. This paper concentrates on the parameter-efficient adaptation of LLMs to VL tasks without inflexible multi-modal alignment pre-training on additional image-text pairs. Inspired by Instruction Tuning and the nature of multi-modal representation learning, we propose Multi-modal Prompt Tuning for Language Models (MPT4LM). MPT4LM generates text-relevant visual prompts via a plug-and-play Cross-Attention module and injects them, together with a textual Learnable Instruction, into LLMs as multi-modal prompts. We further combine MPT4LM with the currently prevalent Adapter approach to reduce the trainable parameter scale and facilitate the collaboration of multi-modal prompts. We evaluate MPT4LM on two representative LLMs, LLaMA-2 and Flan-T5, across two VL tasks: Visual Question Answering (VQAv2.0, GQA) and Visual Entailment (SNLI-VE). Extensive experimental results show that MPT4LM achieves state-of-the-art performance among prompting methods while fine-tuning only about 0.65% of the backbone parameters, indicating a better trade-off between computation and data overhead and model performance. Our code is available at: https://github.com/YzM1a0/MPT4LM.
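To make the described mechanism concrete, the following is a minimal sketch of what a plug-and-play cross-attention module producing text-relevant visual prompts could look like. This is an illustrative assumption, not the authors' implementation: the class name, dimensions, and the use of learnable query vectors standing in for the Learnable Instruction are all hypothetical choices; consult the linked repository for the actual code.

```python
import torch
import torch.nn as nn

class VisualPromptCrossAttention(nn.Module):
    """Hypothetical sketch of a plug-and-play cross-attention module that
    converts frozen image-encoder features into a small set of visual
    prompt vectors to be prepended to an LLM's token embeddings."""

    def __init__(self, d_model=768, d_visual=1024, n_prompts=8, n_heads=8):
        super().__init__()
        # Learnable queries (assumption: these play a role analogous to the
        # textual Learnable Instruction described in the abstract)
        self.prompt_queries = nn.Parameter(torch.randn(n_prompts, d_model))
        # Project visual features into the LLM's embedding space
        self.visual_proj = nn.Linear(d_visual, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, visual_feats):
        # visual_feats: (batch, n_patches, d_visual) from a frozen image encoder
        b = visual_feats.size(0)
        kv = self.visual_proj(visual_feats)
        q = self.prompt_queries.unsqueeze(0).expand(b, -1, -1)
        prompts, _ = self.attn(q, kv, kv)   # (batch, n_prompts, d_model)
        return prompts  # prepended to the LLM input as visual prompts

# Usage sketch: 196 ViT patch features per image, batch of 2
module = VisualPromptCrossAttention()
visual_prompts = module(torch.randn(2, 196, 1024))  # shape (2, 8, 768)
```

Only this module (and any adapters) would be trained, while the LLM and image encoder stay frozen, which is consistent with the parameter-efficient setting the abstract describes.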