Computer Science > Computation and Language

arXiv:2402.09394 (cs)

[Submitted on 14 Feb 2024 (v1), last revised 29 Mar 2024 (this version, v2)]

Title: Long-form evaluation of model editing

Authors: Domenic Rosati, Robie Gonzales, Jinkun Chen, Xuemin Yu, Melis Erkan, Yahya Kayani, Satya Deepika Chavatapalli, Frank Rudzicz, Hassan Sajjad

Abstract: Evaluations of model editing currently use only the 'next few token' completions after a prompt. As a result, the impact of these methods on longer natural language generation is largely unknown. We introduce long-form evaluation of model editing (LEME), a novel evaluation protocol that measures the efficacy and impact of model editing in long-form generative settings. Our protocol consists of a machine-rated survey and a classifier which correlates well with human ratings. Importantly, we find that our protocol has very little relationship with previous short-form metrics (despite being designed to extend efficacy, generalization, locality, and portability into a long-form setting), indicating that our method introduces a novel set of dimensions for understanding model editing methods. Using this protocol, we benchmark a number of model editing techniques and present several findings, including that, while some methods (ROME and MEMIT) perform well in making consistent edits within a limited scope, they suffer much more from factual drift than other methods. Finally, we present a qualitative analysis that illustrates common failure modes in long-form generative settings, including internal consistency, lexical cohesion, and locality issues.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.09394 [cs.CL]
	(or arXiv:2402.09394v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.09394

Submission history

From: Domenic Rosati
[v1] Wed, 14 Feb 2024 18:45:14 UTC (3,081 KB)
[v2] Fri, 29 Mar 2024 21:17:23 UTC (3,081 KB)
