Computer Science > Computation and Language

arXiv:2407.12841 (cs)

[Submitted on 4 Jul 2024]

Title: What to do if language models disagree? Black-box model ensembling for textual and visual question answering

Authors: Yuxi Xia, Klim Zaporojets, Benjamin Roth

Abstract: A diverse range of large language models (LLMs), e.g., ChatGPT, and visual question answering (VQA) models, e.g., BLIP, have been developed for solving textual and visual question answering tasks. However, both LLMs and VQA models encounter challenges when applied to task-specific datasets. Fine-tuning these models is either difficult, because they are accessible only via APIs and are thus effectively black boxes, or costly, because a large number of parameters must be tuned. To address this, we introduce InfoSel, a data-efficient and lightweight ensemble method that learns to dynamically pick the winner from existing black-box models for predictions on both textual and multimodal visual question answering tasks. Unlike traditional ensemble models, InfoSel does not rely on prediction probabilities or confidences, which are typically not available from black-box models. Experimental results on four datasets demonstrate that our approach achieves an absolute increase of up to +5.27% in F1-score compared to standalone LLMs. Remarkably, this improvement is achieved using only 1K training instances and 110M model parameters for training task-specific ensemble models.
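The idea of picking a winner among black-box answers, using only their text and no probabilities, can be illustrated with a toy sketch. This is not the InfoSel architecture: the abstract describes a trained 110M-parameter model, whereas the hypothetical `NearestQuestionSelector` below is a stand-in that simply reuses the winning model of the most similar training question. The names `token_f1`, `winner_index`, `jaccard`, and `NearestQuestionSelector` are all assumptions made for illustration, and the whitespace-token F1 is a common QA convention rather than a detail taken from the paper.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Whitespace-token F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def winner_index(answers: list[str], reference: str) -> int:
    """Index of the black-box model whose answer scores the highest F1."""
    scores = [token_f1(a, reference) for a in answers]
    return max(range(len(answers)), key=scores.__getitem__)


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


class NearestQuestionSelector:
    """Toy selector (assumption): copy the winner of the nearest training question."""

    def fit(self, questions, candidate_answers, references):
        # Store each training question with the index of its winning model.
        self.memory = [
            (q, winner_index(answers, ref))
            for q, answers, ref in zip(questions, candidate_answers, references)
        ]
        return self

    def select(self, question: str, answers: list[str]) -> str:
        # Uses only the question text and answer strings, never probabilities.
        _, idx = max(self.memory, key=lambda item: jaccard(question, item[0]))
        return answers[idx]


selector = NearestQuestionSelector().fit(
    questions=["what color is the sky", "who wrote hamlet"],
    candidate_answers=[["blue", "green"], ["dickens", "william shakespeare"]],
    references=["blue", "shakespeare"],
)
chosen = selector.select("who wrote macbeth", ["dickens", "shakespeare"])
```

The selector only sees strings returned by the base models, which is the constraint the abstract emphasizes; a learned model would replace the nearest-neighbour lookup in a realistic setting.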

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.12841 [cs.CL]
	(or arXiv:2407.12841v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.12841

Submission history

From: Yuxi Xia
[v1] Thu, 4 Jul 2024 12:59:10 UTC (9,949 KB)
