[WIP][Inductor-CPU] int8 WoQ concat linear by sanchitintel · Pull Request #153004 · pytorch/pytorch · GitHub




Draft · wants to merge 7 commits into main


sanchitintel (Collaborator) commented May 6, 2025

Summary

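An int8 weight-only quantization (WoQ) GEMM concat-linear optimization for the pattern in which the same activation is multiplied by 3 weight matrices of identical shape: the 3 linears are replaced by a single GEMM over the concatenated weights.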
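The idea can be illustrated with a minimal pure-Python sketch. It is not the PR's Inductor CPU kernel. It assumes symmetric per-output-channel int8 scales and the `[N][K]` weight layout of `torch.nn.Linear`. The helpers `quantize_per_channel`, `woq_linear` and `concat_woq_linear` and the toy weights are hypothetical, and the three weights merely stand in for something like the Q, K and V projections of an attention block. Each output channel keeps its own scale, so the int8 weights and their scales can be concatenated along the output dimension. One GEMM then produces all three outputs, which are split afterwards.

```python
# Minimal pure-Python sketch of int8 weight-only-quantized "concat linear".
# Assumptions (not from the PR): symmetric per-output-channel int8 scales,
# weights laid out [N][K] like torch.nn.Linear, and hypothetical helper names.


def quantize_per_channel(w):
    """Quantize each output row of w ([N][K] floats) to int8 with its own scale."""
    q, scales = [], []
    for row in w:
        amax = max(abs(v) for v in row) or 1.0  # avoid a zero scale for all-zero rows
        scale = amax / 127.0
        q.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q, scales


def woq_linear(x, q, scales):
    """y = x @ dequant(q).T for x of shape [M][K]; returns [M][N].

    The per-channel scale is constant along K, so it is applied once after
    the int8-weight dot product instead of dequantizing every weight.
    """
    return [
        [sum(a * b for a, b in zip(x_row, q_row)) * s for q_row, s in zip(q, scales)]
        for x_row in x
    ]


def concat_woq_linear(x, packed):
    """Run several WoQ linears that share activation x as one GEMM."""
    q_cat, s_cat, sizes = [], [], []
    for q, s in packed:
        q_cat.extend(q)  # concatenate int8 weights along the output (N) dim
        s_cat.extend(s)  # scales follow their output channels
        sizes.append(len(q))
    y_cat = woq_linear(x, q_cat, s_cat)  # a single GEMM with N = sum(sizes)
    outs, start = [], 0
    for n in sizes:  # split the fused output back per projection
        outs.append([row[start:start + n] for row in y_cat])
        start += n
    return outs


# Usage: one activation [M=2][K=3] and three hypothetical [N=2][K=3] weights.
x = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
w_q = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
w_k = [[-0.7, 0.8, 0.9], [1.0, -1.1, 1.2]]
w_v = [[0.3, 0.3, -0.3], [-0.2, 0.1, 0.05]]
packed = [quantize_per_channel(w) for w in (w_q, w_k, w_v)]

fused = concat_woq_linear(x, packed)
separate = [woq_linear(x, q, s) for q, s in packed]
assert fused == separate  # same per-element arithmetic, so results match exactly
```

The fused and separate paths give identical results. The benefit of fusion is therefore purely in execution: one larger GEMM that reads the shared activation once instead of three smaller ones. How much that helps depends on the actual kernel, which is what the perf data below measures.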

  • Add a unit test (UT) corresponding to the torchao pattern

Perf data

Model: GPT-J, with 128 input tokens and 128 output tokens.
Hardware: 32 physical cores of one socket of an Intel(R) Xeon(R) 6972P (Xeon 6). tcmalloc and Intel OpenMP were preloaded.

| Metric | May 8 nightly | This implementation |
| --- | --- | --- |
| First token latency | 202 ms | 190 ms |
| Rest token latency | 33 ms | 30 ms (combined with #149373) |
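The relative gains implied by these four latencies can be checked with a few lines of arithmetic. This is only a sketch: `latency_gain` is a hypothetical helper, and the inputs are exactly the numbers reported above.

```python
def latency_gain(baseline_ms: float, new_ms: float) -> tuple[float, float]:
    """Return (speedup factor, percent latency reduction) relative to baseline."""
    return baseline_ms / new_ms, (baseline_ms - new_ms) / baseline_ms * 100.0


results = {
    "first token": latency_gain(202.0, 190.0),  # May 8 nightly vs. this PR
    "rest token": latency_gain(33.0, 30.0),     # this PR combined with #149373
}
for name, (speedup, pct) in results.items():
    print(f"{name}: {speedup:.3f}x faster, {pct:.1f}% lower latency")
# first token: 1.063x faster, 5.9% lower latency
# rest token: 1.100x faster, 9.1% lower latency
```

The rest-token figure for this implementation was measured together with #149373, so its roughly 9% reduction cannot be attributed to this PR alone.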

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @chunyuan-w @leslie-fang-intel @Xia-Weiwen

pytorch-bot bot commented May 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153004

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0786b16 with merge base d1f1ff8:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
