Fix error in calculating `cache_position` with past_length for Chatglm and Mamba model #38134

kailixu-x · 2025-05-15T03:00:33Z

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


github-actions · 2025-05-15T03:00:44Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

Rocketknight1 · 2025-05-15T13:28:09Z

cc @gante for generation, but is there an issue or an explanation somewhere for why these changes are needed?

kailixu-x · 2025-05-16T01:50:04Z


I am working with the ChatGLM and Mamba models:
https://huggingface.co/THUDM/chatglm3-6b,
https://huggingface.co/mistralai/Mamba-Codestral-7B-v0.1.
For these two models the past_length calculation is not handled correctly: the first one uses shape[0] and the second one uses cache_params, so I added a workaround.
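
For context, a minimal sketch of why `past_length` matters, assuming the initial `cache_position` is the index range of the full input with the already-cached prefix dropped. The function name `initial_cache_position` is hypothetical and works on plain Python lists rather than tensors; it is an illustration, not the code in `generation/utils.py`.

```python
def initial_cache_position(input_len: int, past_length: int) -> list[int]:
    """Positions of the tokens that still need to be processed.

    input_len: total number of tokens in input_ids (cached + new).
    past_length: number of tokens already stored in the cache.
    """
    positions = list(range(input_len))
    return positions[past_length:]


# 6 tokens total, 4 already cached: only positions 4 and 5 remain.
print(initial_cache_position(6, 4))  # [4, 5]
# If past_length is misread as 0, every position is treated as new.
print(initial_cache_position(6, 0))  # [0, 1, 2, 3, 4, 5]
```

A misread `past_length` therefore shifts every position handed to the model, which is the symptom this PR tries to address.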

gante · 2025-05-22T09:37:41Z

@manueldeprada with the PR you have open for mambacache, this won't be needed, correct?

gante · 2025-05-22T09:39:37Z

src/transformers/generation/utils.py

+                if "ChatGLM" in self.__class__.__name__:
+                    past_length = cache[0][0].shape[0]


This would be needed because ChatGLM, a custom model, also uses a custom cache format.

We don't add logic for custom models' custom choices in transformers. My advice would be to open a PR in the Hub repo to fix your issue 🤗
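
To illustrate the layout difference, here is a sketch under two assumptions: that the default tuple-style cache stores keys as `(batch, heads, seq_len, head_dim)`, so the sequence length sits at `shape[2]`, and that ChatGLM's custom cache puts the sequence length first, as the `shape[0]` in the diff above suggests. `FakeTensor` and `legacy_past_length` are hypothetical stand-ins so the example runs without any dependencies.

```python
from typing import NamedTuple, Sequence, Tuple


class FakeTensor(NamedTuple):
    """Stand-in for a tensor: only the shape matters here."""
    shape: Tuple[int, ...]


def legacy_past_length(cache: Sequence[Sequence[FakeTensor]], seq_dim: int = 2) -> int:
    """Read the cached sequence length from the first layer's key tensor."""
    return cache[0][0].shape[seq_dim]


# Assumed default layout: (batch, heads, seq_len, head_dim)
standard = [(FakeTensor((1, 32, 7, 128)), FakeTensor((1, 32, 7, 128)))]
# Assumed ChatGLM-style layout: (seq_len, batch, heads, head_dim)
chatglm_style = [(FakeTensor((7, 1, 2, 128)), FakeTensor((7, 1, 2, 128)))]

print(legacy_past_length(standard))                  # 7
print(legacy_past_length(chatglm_style))             # 2, read from the wrong axis
print(legacy_past_length(chatglm_style, seq_dim=0))  # 7
```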

manueldeprada · 2025-05-22T10:01:28Z

Yes, with #38086 the cache position calculation is more consistent with Mamba models. @kailixu-x could you provide a simple snippet that shows the bug for Mamba? Thanks for reporting!!

manueldeprada · 2025-05-22T10:02:29Z

src/transformers/generation/utils.py

+        elif model_kwargs.get("cache_params") is not None:
+            cache = model_kwargs["cache_params"]
+            past_length = 0
+            if hasattr(cache, "seqlen_offset"):


Where does seqlen_offset come from? A custom MambaCache implementation? I can only find it on falcon_h1 in the transformers codebase.

kailixu-x · 2025-05-22T11:32:27Z

My modification is in src/transformers/generation/utils.py :: _get_initial_cache_position().
My models are mamba and mamba2.
My use case is calling generate() with my mamba2 cache.
In the Mamba2Output layout, the field is "cache_params", not "past_key_values" -> https://github.com/huggingface/transformers/blob/v4.46.3/src/transformers/models/mamba2/modeling_mamba2.py#L762
seqlen_offset comes from ->
https://github.com/huggingface/transformers/blob/v4.46.3/src/transformers/models/mamba2/modeling_mamba2.py#L132
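
A sketch of the lookup described here, assuming the cache object arrives under the `cache_params` key of `model_kwargs` and exposes an integer `seqlen_offset` attribute, as the v4.46.3 links above indicate. `past_length_from_kwargs` is a hypothetical helper, and `SimpleNamespace` stands in for the real cache class.

```python
from types import SimpleNamespace


def past_length_from_kwargs(model_kwargs: dict) -> int:
    """Return the number of already-processed tokens, or 0 if unknown."""
    cache = model_kwargs.get("cache_params")
    if cache is None:
        return 0
    # Fall back to 0 when the cache has no seqlen_offset attribute.
    return int(getattr(cache, "seqlen_offset", 0))


print(past_length_from_kwargs({}))  # 0
print(past_length_from_kwargs({"cache_params": SimpleNamespace(seqlen_offset=5)}))  # 5
print(past_length_from_kwargs({"cache_params": SimpleNamespace()}))  # 0
```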

Fix error in calculating cache_position with past_length for Chatgl…

42f294b

…m and Mamba model Signed-off-by: Kaili Xu <kaili.xu@intel.com>

github-actions bot marked this pull request as draft May 15, 2025 03:00

kailixu-x marked this pull request as ready for review May 15, 2025 03:29

github-actions bot requested a review from gante May 15, 2025 03:29

gante reviewed May 22, 2025


manueldeprada reviewed May 22, 2025


kailixu-x closed this Jul 1, 2025
