8000 Refactor `MambaCache` to `modeling_mamba.py` by manueldeprada · Pull Request #38086 · huggingface/transformers · GitHub

Merged: 78 commits into huggingface:main on Jul 21, 2025

Conversation

manueldeprada
Contributor
@manueldeprada manueldeprada commented May 12, 2025

This PR moves the specialized MambaCache class from cache_utils.py to src/transformers/models/mamba/modeling_mamba.py. This is preliminary work for #38077.

Changes:

  • Moved MambaCache to its own file, aligning with Zamba, Bamba, etc.
  • Removed unnecessary Mamba-specific code from generate.
  • Moved the Mamba cache init from forward() into prepare_inputs_for_generation. See this comment.
    • Why? Bamba, Jamba, GraniteMoeHybrid, Zamba, and Zamba2 had settled on initializing custom caches in prepare_inputs_for_generation; only Mamba, Mamba2, and FalconMamba were doing it in forward, which is bad for torch.compile.
  • We don't break BC with any import (thanks Joao for the idea!)
  • Cleaned up some Mamba and FalconMamba slow tests, which had been failing on main for a long time.
  • Removed DDP Mamba tests. I had a DDP implementation for Mamba in 66b7162 so the tests passed, but removed it since DDP is not needed for Mamba, per Joao's instructions.
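To illustrate the torch.compile point above, here is a minimal, self-contained sketch (not the actual transformers code; all names are hypothetical stand-ins) of why initializing the cache in prepare_inputs_for_generation keeps forward() free of data-dependent object construction:

```python
# Illustrative sketch only: a model-specific state cache is created once in
# prepare_inputs_for_generation, before decoding starts, so that forward()
# never constructs objects mid-trace. ToyMambaCache is a made-up stand-in
# for a Mamba-style cache.

class ToyMambaCache:
    """Minimal stand-in for a Mamba-style state cache."""
    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.conv_states = [0.0] * batch_size  # placeholder state buffers
        self.ssm_states = [0.0] * batch_size


def prepare_inputs_for_generation(input_ids, use_cache=True, cache_params=None):
    # The cache is created here, once, outside the compiled forward pass.
    if use_cache and cache_params is None:
        cache_params = ToyMambaCache(batch_size=len(input_ids))
    return {"input_ids": input_ids, "use_cache": use_cache, "cache_params": cache_params}


def forward(input_ids, use_cache=True, cache_params=None):
    # forward() can now assume the cache already exists; a compiler tracing
    # this function sees no conditional allocation.
    assert not use_cache or cache_params is not None
    return cache_params
```

The same shape is what the PR moves Mamba, Mamba2, and FalconMamba toward, matching the other hybrid-cache models.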
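On the "we don't break BC with any import" bullet: one common way to keep an old import path alive after moving a class is a module-level __getattr__ (PEP 562) that lazily forwards to the new location. The sketch below demonstrates the mechanism with made-up module names; it is not necessarily the exact mechanism this PR uses.

```python
# Hedged sketch: keep `from old_cache_utils import MambaCache` working after
# the class moved to another module, via PEP 562 module __getattr__.
# "new_home" and "old_cache_utils" are hypothetical stand-ins built at
# runtime so the example is self-contained.
import sys
import types

# Stand-in for the class's new home (think: modeling_mamba).
new_home = types.ModuleType("new_home")

class MambaCacheStub:
    pass

new_home.MambaCache = MambaCacheStub
sys.modules["new_home"] = new_home

# Stand-in for the old module (think: cache_utils): re-export lazily.
old_module = types.ModuleType("old_cache_utils")

def _module_getattr(name):
    if name == "MambaCache":
        from new_home import MambaCache
        return MambaCache
    raise AttributeError(name)

old_module.__getattr__ = _module_getattr  # PEP 562 hook
sys.modules["old_cache_utils"] = old_module

# The old-style import still resolves to the relocated class.
from old_cache_utils import MambaCache
```

The lazy forward also avoids importing the model file at package-import time, which matters for optional-dependency handling.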

@github-actions github-actions bot marked this pull request as draft May 12, 2025 15:14

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@manueldeprada manueldeprada marked this pull request as ready for review May 12, 2025 15:30
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@manueldeprada manueldeprada requested a review from gante May 12, 2025 15:45
Member
@gante gante left a comment


LGTM, added a few tiny nits 👍

@gante
Member
gante commented May 13, 2025

@manueldeprada have you run slow mamba (and falcon mamba) tests to ensure there are no regressions?

(after we confirm slow tests are okay, let's tag arthur)

@manueldeprada
Contributor Author
manueldeprada commented May 14, 2025

> @manueldeprada have you run slow mamba (and falcon mamba) tests to ensure there are no regressions?
>
> (after we confirm slow tests are okay, let's tag arthur)

I ran the tests locally; there are some failures that were already present before this PR:

FAILED tests/models/mamba/test_modeling_mamba.py::MambaModelTest::test_multi_gpu_data_parallel_forward - TypeError: 'MambaCache' object is not iterable
FAILED tests/models/mamba/test_modeling_mamba.py::MambaIntegrationTests::test_compile_mamba_cache - AssertionError: Attempt to trace forbidden callable <function mark_static_address at 0x7fd585001440>
---
FAILED tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaModelTest::test_multi_gpu_data_parallel_forward - TypeError: 'MambaCache' object is not iterable
FAILED tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_batched_generation - AssertionError: Lists differ: ['Hello today I am going to be talking abo[148 chars]The"] != ["Hello today I'm going to show you how to[149 chars].\n']
FAILED tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_4bit - AssertionError: "Hello today Iava,\n\nI'm sorry to hear that you're having trouble with the " != 'Hello today I\'m going to talk about the "C" in the "C-I-'
FAILED tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_bf16 - AssertionError: "Hello today Iava,\n\nI'm sorry to hear t[31 chars]the " != 'Hello today I am going to show you how t[47 chars]Step'
FAILED tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_torch_compile - AssertionError: "Hello today Iava,\n\nI'm sorry to hear t[31 chars]the " != 'Hello today I am going to show you how t[47 chars]Step'

I'll run them in CI once I have permissions.

@huggingface huggingface deleted a comment from github-actions bot May 14, 2025
@manueldeprada
Contributor Author

Same failures on CI as locally, @gante. Let me know if those slow tests should be fixed, ignored, or deleted.

Collaborator
@ArthurZucker ArthurZucker left a comment


Thanks! There are still quite a few outstanding issues, notably the imports that are not at the top.

if use_cache:
if cache_params_not_initialized:
Collaborator


Suggested change
if cache_params_not_initialized:
if use_cache and cache_params_not_initialized:

Contributor Author
@manueldeprada manueldeprada Jul 16, 2025


The reason is keeping BC with respect to how generate() did things before, now that we are moving MambaCache out of generate. I just rewrote the same behaviour without a new variable. See this comment: #38086 (comment)

Collaborator


Sorry, isn't it strictly the same, given the indentation?

@manueldeprada manueldeprada changed the title Refactor MambaCache to modeling_mamba.py (parity with Zamba) Refactor MambaCache to modeling_mamba.py Jul 16, 2025
@@ -651,6 +765,8 @@ def prepare_inputs_for_generation(
):
# Overwritten -- uses `cache_params` as opposed to `past_key_values`

if use_cache and cache_params is None:
Contributor Author
@manueldeprada manueldeprada Jul 16, 2025


These 2 new additions are needed for BC and affect all classes that use MambaCache: they emulate the order in which generate() initialized MambaCache. See here

Collaborator


I don't think it is relevant to modify the code to emulate generate. generate is the abstraction that needs to be changed

Collaborator
@ArthurZucker ArthurZucker left a comment


Thanks for proposing the pragma! I think we can try not renaming functions/import sources at all, I don't think any model does that today!

selective_scan_fn,
causal_conv1d_fn,
causal_conv1d_update,
mamba_inner_fn, # modular: no_replace
Collaborator


If (IF) we have the pragma, this should not appear here.

@@ -141,7 +141,12 @@ def leave_Name(self, original_node, updated_node):
return updated_node

def leave_ImportFrom(self, original_node, updated_node):
"""The imports from other file types (configuration, processing etc) should use original model name."""
Contributor Author


Skipping the rename for absolute imports does not affect other models and is probably a reasonable assumption.

Relative imports need to be renamed: from .configuration_mamba import ...MambaConfig
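The distinction above is easy to make mechanically. The modular converter uses libcst, but as an illustration with the stdlib ast module: a relative import carries a nonzero `level` (the count of leading dots), which is what lets a visitor rename only `from .configuration_mamba import ...`-style imports while leaving absolute ones untouched.

```python
# Illustrative sketch (stdlib ast; the real converter's leave_ImportFrom uses
# libcst): relative imports are identified by ImportFrom.level > 0.
import ast


def classify_imports(source):
    """Return (relative, absolute) module names from `from ... import` statements."""
    relative, absolute = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # node.level counts leading dots: 0 = absolute, 1 = `.`, 2 = `..`, ...
            if node.level > 0:
                relative.append(node.module)
            else:
                absolute.append(node.module)
    return relative, absolute


src = (
    "from .configuration_mamba import MambaConfig\n"
    "from transformers.utils import logging\n"
)
rel, abs_ = classify_imports(src)
```

Only the names in `rel` would be candidates for the model-name rewrite.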

Collaborator
@ArthurZucker ArthurZucker left a comment


A few small things left!



[For maintainers] Suggested jobs to run (before merge)

run-slow: falcon_mamba, mamba, mamba2

@manueldeprada
Contributor Author
manueldeprada commented Jul 21, 2025

Thanks, @ArthurZucker! I ended up refactoring the entire prepare_inputs_for_generation method to make it clearer, rather than just making minimal changes to get the tests passing. I’ll do this from the start going forward!

Let me know if there's anything left!

Collaborator
@ArthurZucker ArthurZucker left a comment


Thanks, that looks better 😉

@manueldeprada manueldeprada merged commit 1aa7256 into huggingface:main Jul 21, 2025
25 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 22, 2025
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf1.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba