Hotfix: Flash Attention 2 support in Pixtral by uminaty · Pull Request #38146 · huggingface/transformers · GitHub

Hotfix: Flash Attention 2 support in Pixtral #38146

Merged
merged 1 commit into from
May 15, 2025

Conversation

uminaty
Contributor
@uminaty uminaty commented May 15, 2025

Context

Pixtral support for ALL_ATTENTION_FUNCTIONS was added in this PR, but a subsequent rebase unintentionally modified a line that sets attention_mask to None when using Flash Attention 2.

Currently, without this condition, using Flash Attention 2 with Pixtral raises the following error:

RuntimeError: cu_seqlens_q must have shape (batch_size + 1)
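For readers unfamiliar with the name in that message: in variable-length (packed) Flash Attention calls, `cu_seqlens_q` holds cumulative sequence lengths with a leading zero, so it has one more entry than there are sequences. The following is only a minimal pure-Python sketch of that layout; the helper name `cumulative_seqlens` and the example lengths are hypothetical and are not taken from the Pixtral code.

```python
from itertools import accumulate


def cumulative_seqlens(seq_lens):
    """Return [0, l1, l1 + l2, ...] for a list of per-sequence token counts.

    Hypothetical helper for illustration: it mirrors the layout expected for
    cu_seqlens in packed attention, not the actual transformers implementation.
    """
    return [0, *accumulate(seq_lens)]


# Three hypothetical patch sequences packed back to back.
seq_lens = [4, 9, 16]
boundaries = cumulative_seqlens(seq_lens)

# One boundary per sequence plus the leading zero: batch_size + 1 entries.
print(boundaries, len(boundaries) == len(seq_lens) + 1)
```

Sequence `i` then occupies the token range `boundaries[i]` to `boundaries[i + 1]` in the packed tensor.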

Setting attention_mask to None resolves the issue. It also appears that the current tests don't catch this case.
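The condition described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual diff: the helper name `resolve_attention_mask` is hypothetical, and the only detail taken from transformers is the `"flash_attention_2"` implementation string.

```python
from typing import Any, Optional


def resolve_attention_mask(attn_implementation: str, attention_mask: Optional[Any]) -> Optional[Any]:
    """Drop the mask when Flash Attention 2 is selected.

    Hypothetical helper: the real change lives inside the Pixtral modeling
    code and may be structured differently.
    """
    if attn_implementation == "flash_attention_2":
        # Assumption from the PR description: passing a mask here leads to
        # the cu_seqlens_q shape error, so the mask is discarded.
        return None
    # Other attention backends keep the mask unchanged.
    return attention_mask


mask = [[1, 1, 0]]  # placeholder standing in for a real mask tensor
print(resolve_attention_mask("flash_attention_2", mask))
print(resolve_attention_mask("sdpa", mask))
```

A regression test along these lines would call the model once per attention implementation and check that the Flash Attention 2 path runs without raising.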

cc: @zucchini-nlp, @ArthurZucker

@github-actions github-actions bot marked this pull request as draft May 15, 2025 09:10
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@uminaty uminaty marked this pull request as ready for review May 15, 2025 09:10
Member
@zucchini-nlp zucchini-nlp left a comment

Great, thanks a lot! Moving our convo here, I realize that adding a proper test for all models would take some time. No problem for me then, I will add it to my todo :)

@zucchini-nlp
Member

run-slow: pixtral

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/pixtral']
quantizations: [] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit b11b28c into huggingface:main May 15, 2025
17 of 18 checks passed
3 participants