Move GLM4 f32 attention fix to the correct function by 0cc4m · Pull Request #13750 · ggml-org/llama.cpp · GitHub

Move GLM4 f32 attention fix to the correct function #13750


Merged

merged 1 commit into master from 0cc4m/glm4-fix2 on May 24, 2025

Conversation

@0cc4m (Collaborator) commented on May 24, 2025

@ggerganov You merged SWA support (#13194) 3 hours before I merged my GLM4 fix (#13639). They touched the same build_attn functions, so there should have been a merge conflict. For whatever reason, my patch was applied to the newly-created build_attn function with a unified_iswa kv cache, which is not used by GLM4. So it didn't work anymore. Here's the fix, moving my patch back to the unified build_attn function... I don't think I've seen something like this before, quite the coincidence.
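For context, the fix being moved boils down to forcing f32 precision on the attention matmul for GLM4 inside the build_attn path that uses the unified KV cache. The sketch below only illustrates that kind of change using ggml's public `ggml_mul_mat_set_prec` API; it is not the actual diff of #13639 or #13750, and the function and parameter names (`build_attn_scores`, `force_f32`) are hypothetical:

```cpp
// Illustrative sketch, not the real llama.cpp code: shows pinning the
// attention score matmul to f32 precision, which is the kind of change
// the GLM4 fix applies in the unified build_attn function.
#include "ggml.h"

static struct ggml_tensor * build_attn_scores(
        struct ggml_context * ctx,
        struct ggml_tensor  * k,   // keys:    [n_embd_head, n_kv,     n_head]
        struct ggml_tensor  * q,   // queries: [n_embd_head, n_tokens, n_head]
        bool                  force_f32) {
    // KQ = K^T * Q, computed per head
    struct ggml_tensor * kq = ggml_mul_mat(ctx, k, q);

    if (force_f32) {
        // force this matmul to run in f32 instead of the backend default
        ggml_mul_mat_set_prec(kq, GGML_PREC_F32);
    }

    return kq;
}
```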

0cc4m requested a review from ggerganov on May 24, 2025 at 13:31
@ggerganov (Member)

Huh, it's indeed strange that we didn't get a merge conflict.

@LostRuins (Collaborator)

Thanks for spotting it quickly.

0cc4m merged commit 259469c into master on May 24, 2025
46 checks passed
0cc4m deleted the 0cc4m/glm4-fix2 branch on May 24, 2025 at 14:49