Bart: new cache format by zucchini-nlp · Pull Request #35314 · huggingface/transformers

Merged: 31 commits merged into huggingface:main on May 16, 2025

Conversation

@zucchini-nlp (Member) commented on Dec 18, 2024:

What does this PR do?

As per the title, this enables the new cache format in Bart and several models copied from Bart. Since there are too many models copying attention from Bart, I decided not to touch the audio ones and changed their "Copied from" statements instead.

TODO:

  • Run all tests for models with new cache + slow ✅
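For readers unfamiliar with the cache classes, here is a minimal sketch (not part of the PR itself) of what the new format looks like from the user side: an `EncoderDecoderCache` instance is passed instead of the legacy tuple of tensors. The checkpoint name and inputs below are placeholders.

```python
from transformers import AutoTokenizer, BartForConditionalGeneration
from transformers import DynamicCache, EncoderDecoderCache

# Placeholder checkpoint and input, purely for illustration.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
inputs = tokenizer("The new cache format", return_tensors="pt")

# Encoder-decoder models keep two caches: one for the decoder's
# self-attention and one for cross-attention over the encoder states.
past_key_values = EncoderDecoderCache(DynamicCache(), DynamicCache())

out = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=10)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```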

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp changed the title from "[WIP] Bart: new cache format" to "Bart: new cache format" on Dec 18, 2024.
@zucchini-nlp (Member, Author) commented:

cc @BenjaminBossan: I am running the slow tests on the transformers side, and the current state of the PR should be almost ready for review, so we might need to run the PEFT tests now.

@BenjaminBossan (Member) commented:

> I am running the slow tests on the transformers side, and the current state of the PR should be almost ready for review, so we might need to run the PEFT tests now.

Thanks for the ping. I skimmed the PR and, if I'm not mistaken, of all the models that were changed, Bart is the only one covered by the PEFT test suite. Therefore, running the tests with `-k bart` should be sufficient. Let me know if you run them; otherwise I can get to it later.

@zucchini-nlp requested a review from eustlb as a code owner on January 13, 2025 at 13:42.
@zucchini-nlp (Member, Author) commented:

Cool, the code owners tagged all relevant people. Ready for review!

Slow tests for the text models that now support the cache class are passing on my end.

@gante (Member) left a comment:


In general LGTM. A few minor nits, hence the approval.

I'm assuming slow tests were run for all touched models and that there are no regressions with respect to main 🔍

(I've reviewed /generation, /models/bart, /tests/generation, and /tests/models/bart. I'm assuming other models follow the same pattern)

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
@ducviet00 (Contributor) commented:

@zucchini-nlp
I am curious about the status of this PR. Could you update it?

@zucchini-nlp (Member, Author) commented:

@ducviet00 sorry, but the PR is blocked by another one. These got a bit stale since we had some higher-priority releases recently. I will be back on Bart next week and get it merged, thanks!

@ducviet00 (Contributor) commented:

Hi @zucchini-nlp,
Can we have Bart with the new cache format? It would boost performance a lot when used with torch.compile.
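As a rough illustration of that point (this is not code from the PR, and whether `cache_implementation="static"` is already supported for BART depends on the installed transformers version), the idea is that a static, fixed-shape cache lets torch.compile reuse one compiled graph across decoding steps:

```python
import torch
from transformers import AutoTokenizer, BartForConditionalGeneration

# Placeholder checkpoint and input, purely for illustration.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
inputs = tokenizer("Compile-friendly generation", return_tensors="pt")

# Compile the forward pass; a static cache keeps tensor shapes fixed across
# decoding steps, so the compiled graph can be reused instead of recompiled.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

out = model.generate(**inputs, max_new_tokens=20, cache_implementation="static")
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```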

@zucchini-nlp (Member, Author) commented on May 14, 2025:

It is currently blocked by another PR (#35786). cc @ArthurZucker, can you review it again, please?

@ducviet00 (Contributor) commented:

awesome @zucchini-nlp thank you so much

@ArthurZucker (Collaborator) left a comment:


LGTM, but let's refactor the EncoderDecoderCache to hide the complicated legacy logic!

Comment on lines 1289 to 1291
position_ids = cache_position.unsqueeze(0)
position_ids = self.embed_positions(input, past_key_values_length, position_ids=position_ids)
position_ids = position_ids.to(inputs_embeds.device)
Collaborator:


The unsqueeze can be done inside embed_positions, no?

Member Author:


Oh yeah, in the case of Bart we could. I wanted the module to expect proper 2D position ids, to account for padding, but Bart apparently never used padded positions.
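A hypothetical sketch of the suggestion above; the class below is a simplified stand-in for BART's learned positional embedding, not the PR's actual module. The point is that the embedding layer can accept 1D cache positions and add the batch dimension itself:

```python
import torch
from torch import nn


class LearnedPositionalEmbeddingSketch(nn.Embedding):
    """Simplified stand-in for BART's learned positional embedding."""

    def __init__(self, num_embeddings: int, embedding_dim: int, offset: int = 2):
        # BART reserves a couple of extra positions via an offset.
        super().__init__(num_embeddings + offset, embedding_dim)
        self.offset = offset

    def forward(self, position_ids: torch.Tensor) -> torch.Tensor:
        # Accept 1D cache positions and add the batch dimension here,
        # so callers don't have to unsqueeze before the call.
        if position_ids.dim() == 1:
            position_ids = position_ids.unsqueeze(0)
        return super().forward(position_ids + self.offset)


# Usage: 1D cache positions work directly.
emb = LearnedPositionalEmbeddingSketch(1024, 16)
print(emb(torch.arange(4)).shape)  # torch.Size([1, 4, 16])
```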

next_cache = past_key_values.self_attention_cache
if return_legacy_cache:
    next_cache = past_key_values.to_legacy_cache()

if not return_dict:
Collaborator:


Using `can_return_tuple` would be welcome here as well, but no worries.

@ArthurZucker (Collaborator) left a comment:


Very nice, let's go!

Comment on lines +1245 to 1250
return_legacy_cache = True
logger.warning_once(
    "Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.58.0. "
    "You should pass an instance of `EncoderDecoderCache` instead, e.g. "
    "`past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`."
)
Collaborator:


Let's put this warning inside `from_legacy_cache` directly!

Member Author:


I don't think we can put a specific version inside the cache class. Each model is deprecated until a different release, because we update them slowly, at a lower priority than other tasks.

Also, `from_legacy_cache` per se isn't deprecated and will still be available; the warning applies only to the model's forward pass.
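For context, here is a minimal sketch (not from the PR) of the conversion this thread is about. `EncoderDecoderCache.from_legacy_cache` and `to_legacy_cache` are the existing transformers APIs; the tensors and shapes are dummies chosen for illustration.

```python
import torch
from transformers import EncoderDecoderCache

# Legacy format: one tuple per layer of
# (self_attn_key, self_attn_value, cross_attn_key, cross_attn_value).
dummy = torch.zeros(1, 8, 4, 64)  # (batch, num_heads, seq_len, head_dim)
legacy = ((dummy, dummy, dummy, dummy),)

# Wrap the legacy tuples in the new cache class...
cache = EncoderDecoderCache.from_legacy_cache(legacy)

# ...and convert back for callers that still expect the old format.
roundtrip = cache.to_legacy_cache()
assert roundtrip[0][0].shape == legacy[0][0].shape
```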

@zucchini-nlp merged commit 01ad9f4 into huggingface:main on May 16, 2025 (20 checks passed).
@zucchini-nlp (Member, Author) commented:

cc @BenjaminBossan, totally forgot to ping you. Bart is basically the same as T5; hope it won't cause red CI on PEFT 😅
