Computer Science > Information Retrieval

arXiv:2401.11509 (cs)

[Submitted on 21 Jan 2024 (v1), last revised 5 Jul 2024 (this version, v2)]

Title: Simple Domain Adaptation for Sparse Retrievers

Authors: Mathias Vast, Yuxuan Zong, Basile Van Cooten, Benjamin Piwowarski, Laure Soulier

Abstract: In Information Retrieval, and more generally in Natural Language Processing, models are adapted to specific domains through fine-tuning. Despite the successes and versatility of this method, its need for human-curated, labeled data makes it impractical for transfer to new tasks, domains, and/or languages when no training data exists. Using the model without training (zero-shot) is another option, but it comes at a cost in effectiveness, especially for first-stage retrievers. Numerous research directions have emerged to tackle these issues, most of them in the context of adapting to a task or a language. However, the literature on domain (or topic) adaptation is scarcer. In this paper, we address this cross-topic discrepancy for a sparse first-stage retriever by transposing a method initially designed for language adaptation. By leveraging pre-training on the target data to learn domain-specific knowledge, this technique alleviates the need for annotated data and expands the scope of domain adaptation. We show that even sparse retrievers, despite their relatively good generalization ability, can benefit from our simple domain adaptation method.

Comments:	Accepted at ECIR 2024
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:2401.11509 [cs.IR]
	(or arXiv:2401.11509v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2401.11509
Journal reference:	Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14610
Related DOI:	https://doi.org/10.1007/978-3-031-56063-7_32

Submission history

From: Mathias Vast
[v1] Sun, 21 Jan 2024 14:35:54 UTC (152 KB)
[v2] Fri, 5 Jul 2024 16:28:47 UTC (121 KB)
