[cpu][vec] support reduce ops for add and max #144065
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144065
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 211ea77 with merge base 0431d47.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Generally OK, but it's better to have avx2 added as well; otherwise avx2 is going to be slow, which the caller might not be aware of.
Thanks! The avx2-related ops have been added.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Description
During the support of INT8 SDPA (pytorch/ao#1372), we found that `at::vec::vec_reduce_all<int32_t>` would fall into a slow scalar path when doing sum and max. So here, we support the two reduce-related ops `reduce_add` and `reduce_max` for `vec512` and `vec256`, using sequences of vector instructions.
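For illustration, here is a minimal sketch of the kind of shuffle-based horizontal reduction this refers to, written with raw AVX2 intrinsics. This is not the PR's code; the actual implementation lives in the `Vectorized` specializations for `vec256`/`vec512` and its exact instruction sequence may differ.

```cpp
#include <immintrin.h>
#include <cstdint>

// Illustrative horizontal sum of eight int32 lanes in a 256-bit AVX2 register.
static inline int32_t hsum_epi32_avx2(__m256i v) {
  // Fold the high 128-bit lane onto the low lane: 8 lanes -> 4 lanes.
  __m128i lo   = _mm256_castsi256_si128(v);
  __m128i hi   = _mm256_extracti128_si256(v, 1);
  __m128i sum4 = _mm_add_epi32(lo, hi);
  // 4 lanes -> 2 lanes, then 2 lanes -> 1 lane, via in-register shuffles.
  __m128i sum2 = _mm_add_epi32(sum4, _mm_shuffle_epi32(sum4, _MM_SHUFFLE(1, 0, 3, 2)));
  __m128i sum1 = _mm_add_epi32(sum2, _mm_shuffle_epi32(sum2, _MM_SHUFFLE(2, 3, 0, 1)));
  return _mm_cvtsi128_si32(sum1);
}

// Same folding pattern, with max instead of add.
static inline int32_t hmax_epi32_avx2(__m256i v) {
  __m128i lo   = _mm256_castsi256_si128(v);
  __m128i hi   = _mm256_extracti128_si256(v, 1);
  __m128i max4 = _mm_max_epi32(lo, hi);
  __m128i max2 = _mm_max_epi32(max4, _mm_shuffle_epi32(max4, _MM_SHUFFLE(1, 0, 3, 2)));
  __m128i max1 = _mm_max_epi32(max2, _mm_shuffle_epi32(max2, _MM_SHUFFLE(2, 3, 0, 1)));
  return _mm_cvtsi128_si32(max1);
}
```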
Details

- Support `reduce_add` and `reduce_max` for dtypes `int32` and `float32`, using sequences of vector instructions;
- Add `reduce` in the vec base, in order to simplify the code.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
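On the caller side (as in the INT8 SDPA kernel mentioned above), the reductions are expressed through `at::vec::vec_reduce_all`; with this change, the int32 sum/max cases should take the vectorized path instead of the element-by-element scalar loop. A hedged usage sketch, assuming the standard ATen vec headers and the two-argument `vec_reduce_all` overload (not copied from the PR):

```cpp
#include <ATen/cpu/vec/vec.h>
#include <ATen/cpu/vec/functional.h>
#include <cstdint>

using Vec = at::vec::Vectorized<int32_t>;

// Horizontal sum of one full vector of int32 loaded from memory.
int32_t row_sum(const int32_t* ptr) {
  Vec acc = Vec::loadu(ptr);
  return at::vec::vec_reduce_all<int32_t>(
      [](Vec& a, Vec& b) { return a + b; }, acc);
}

// Horizontal max of one full vector of int32 loaded from memory.
int32_t row_max(const int32_t* ptr) {
  Vec acc = Vec::loadu(ptr);
  return at::vec::vec_reduce_all<int32_t>(
      [](Vec& a, Vec& b) { return at::vec::maximum(a, b); }, acc);
}
```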