[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100` by eqy · Pull Request #152378 · pytorch/pytorch · GitHub
Open: eqy wants to merge 2 commits into main
Conversation

eqy (Collaborator) commented Apr 28, 2025

CUTLASS only supports SM carveout via green contexts on sm100

cc @ptrblck @msaroufim @jerryzh168 @yanbing-j @vkuzo @albanD @kadeng @penguinwu


@eqy eqy added module: cuda Related to torch.cuda, and CUDA support in general open source topic: not user facing topic category matrix multiplication module: float8 For torch.float8_e5m2 and torch.float8_e4m3 labels Apr 28, 2025
@eqy eqy requested a review from lw April 28, 2025 23:33
pytorch-bot bot commented Apr 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152378

Note: Links to docs will display an error until the docs builds have been completed.

❌ 50 New Failures, 6 Cancelled Jobs, 2 Unrelated Failures

As of commit 9d23557 with merge base 119f64d:

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jerryzh168 jerryzh168 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 29, 2025
albanD (Collaborator) left a comment


I'm confused. I would expect the scaled_mm op to still work, even if it is at lower performance?

eqy (Collaborator, Author) commented Apr 29, 2025

> I'm confused. I would expect the scaled_mm op to still work, even if it is at lower performance?

This test isn't really about exercising scaled_mm, which is still expected to work. Rather, it's about gating the number of SMs that the kernel launches on, which CUTLASS currently only respects on sm90. In fact, the SM carveout setting in the kernel params is expected to emit a compile-time warning that it will have no effect.
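The gating pattern described above can be sketched with a conditional xfail decorator. This is a minimal illustration, not the PR's actual test code: `device_capability` is a hardcoded stand-in for `torch.cuda.get_device_capability()` so the example runs without a GPU, and `xfail_on_sm100` is a hypothetical helper name.

```python
import unittest

def device_capability():
    # Hypothetical stand-in for torch.cuda.get_device_capability(),
    # which returns (major, minor) compute capability of the current GPU.
    # Hardcoded to sm100 (10, 0) for illustration.
    return (10, 0)

def xfail_on_sm100(test_func):
    """Mark a test as expected-to-fail on sm100, where CUTLASS only
    supports SM carveout via green contexts and the carveout setting
    in the kernel params is ignored."""
    if device_capability() == (10, 0):
        return unittest.expectedFailure(test_func)
    return test_func

class TestCarveout(unittest.TestCase):
    @xfail_on_sm100
    def test_honor_sm_carveout(self):
        # A real test would check that limiting the SM count changes the
        # number of SMs the CUTLASS kernel occupies; on sm100 the setting
        # has no effect, so the check fails there as expected.
        self.assertNotEqual(device_capability(), (10, 0))

if __name__ == "__main__":
    unittest.main()
```

On sm90 the decorator is a no-op and the assertion is exercised normally; on sm100 the failure is recorded as an expected failure rather than an error.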

Labels
matrix multiplication · module: cuda (Related to torch.cuda, and CUDA support in general) · module: float8 (For torch.float8_e5m2 and torch.float8_e4m3) · open source · topic: not user facing · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)