[WIP][Inductor-CPU] int8 WoQ concat linear #153004

sanchitintel · 2025-05-06T23:20:11Z

Summary

Adds an int8 weight-only quantization (WoQ) GEMM concat-linear optimization for the case where the same activation is applied to 3 sets of weights of the same shape.

Adds a unit test (UT) corresponding to the torchao pattern.
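The idea in the summary can be illustrated with a minimal eager-mode sketch. This is not the Inductor implementation from this PR: `woq_linear` and `woq_concat_linear` are hypothetical helper names, the per-output-channel scale layout is an assumption, and float32 activations are used only to keep the comparison simple. It shows that three WoQ linears sharing one activation give the same results as a single GEMM over the concatenated weights followed by a split.

```python
import torch


def woq_linear(x, w_int8, scale):
    # Weight-only quantized linear: int8 weight of shape [N, K],
    # one scale per output channel (assumed layout), no bias.
    return (x @ w_int8.to(x.dtype).t()) * scale


def woq_concat_linear(x, weights, scales):
    # Concatenate weights and scales along the output-channel dim,
    # run one larger GEMM, then split the result back per weight set.
    w_cat = torch.cat(weights, dim=0)
    s_cat = torch.cat(scales, dim=0)
    y = woq_linear(x, w_cat, s_cat)
    return torch.split(y, [w.shape[0] for w in weights], dim=-1)


torch.manual_seed(0)
M, K, N = 4, 16, 8  # illustrative sizes only
x = torch.randn(M, K)
weights = [torch.randint(-128, 128, (N, K), dtype=torch.int8) for _ in range(3)]
scales = [torch.rand(N) * 0.1 for _ in range(3)]

separate = [woq_linear(x, w, s) for w, s in zip(weights, scales)]
fused = woq_concat_linear(x, weights, scales)
```

Here `separate` and `fused` each hold three `[M, N]` outputs that should agree up to floating-point rounding. The fused form reads the activation once instead of three times.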

Perf data

GPT-J with 128 input tokens and 128 output tokens.
Run on 32 physical cores of one socket of an Intel(R) Xeon(R) 6972P (Xeon Gen 5). tcmalloc & Intel OpenMP were preloaded.

| First token latency (May 8 nightly) | First token latency (this implementation) | Rest token latency (May 8 nightly) | Rest token latency (this implementation combined with #149373) |
|---|---|---|---|
| 202 ms | 190 ms | 33 ms | 30 ms |

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @chunyuan-w @leslie-fang-intel @Xia-Weiwen

pytorch-bot · 2025-05-06T23:20:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153004

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0786b16 with merge base d1f1ff8:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge) (gh) (similar failure)
backends/xnnpack/test/ops/test_conv1d.py::TestConv1d::test_qs8_conv1d_batchnorm_seq

This comment was automatically generated by Dr. CI and updates every 15 minutes.

sanchitintel added 3 commits May 6, 2025 16:09

[skip ci] WoQ concat linear Change 1

558dc10

[skip ci] Change 2/3

406d8ad

[Change 3/3] Add concat linear

6a92f61

pytorch-bot bot added ciflow/inductor module: inductor labels May 6, 2025

sanchitintel added the topic: not user facing topic category label May 6, 2025

pytorchbot added the open source label May 6, 2025

sanchitintel added 4 commits May 9, 2025 00:06

Fix all corner-cases

e7a197b

Reduce duplicate code

545e0bc

Remove redundant code

175a0b0

Merge branch 'pytorch:main' into woq_concat_linear

0786b16
