Use float data type for Half sum in fallback implementation of batchnorm backward on CPU by CaoE · Pull Request #147353 · pytorch/pytorch · GitHub

Use float data type for Half sum in fallback implementation of batchnorm backward on CPU #147353


Closed
wants to merge 1 commit

Conversation

CaoE
Collaborator
@CaoE CaoE commented Feb 18, 2025

Fixes #147303.
Use the float data type for the Half sum in the fallback implementation of batchnorm backward on CPU, since the representable range of Half is small.
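
For background, here is a minimal Python-level sketch of the kind of overflow this guards against (the tensor shape is illustrative, not taken from the linked issue):

import torch

# float16 ("Half") has a small representable range: its largest finite value is 65504.
print(torch.finfo(torch.half).max)   # 65504.0

# Any running total kept in Half overflows to inf once it passes that limit:
a = torch.tensor(60000.0, dtype=torch.half)
print(a + a)                         # tensor(inf, dtype=torch.float16)

# The pattern applied in this PR, expressed at the Python level: cast the Half
# gradient to float before reducing, so the per-channel sum is accumulated in float32.
grad_out = torch.ones(16, 3, 224, 224, dtype=torch.half)   # hypothetical shape
per_channel_sum = grad_out.float().sum(dim=(0, 2, 3))      # 16*224*224 = 802816, exact in float32
print(per_channel_sum)               # tensor([802816., 802816., 802816.])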

pytorch-bot bot commented Feb 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147353

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 746a203 with merge base e8b20f6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Feb 18, 2025
@CaoE CaoE added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor labels Feb 18, 2025
Collaborator
@leslie-fang-intel leslie-fang-intel left a comment


Please write the PR description.

@CaoE CaoE requested a review from mingfeima February 18, 2025 06:42
@CaoE CaoE marked this pull request as ready for review February 18, 2025 06:43
@CaoE CaoE requested a review from cpuhrsch February 18, 2025 06:43
test/test_nn.py Outdated
for bwd_format in [torch.contiguous_format, torch.channels_last]:
helper(self, nn.BatchNorm2d, (16, 3, 224, 224), torch.float, fwd_format, bwd_format)

for fwd_format in [torch.contiguous_format, torch.channels_last_3d]:
Contributor


nit: you could also write this as

formats = [torch.contiguous_format, torch.channels_last_3d]
for (fwd_format, bwd_format) in itertools.product(formats, formats):
    helper(...)

which should be easier to read.

See the following to illustrate

>>> choices = [0, 1]
>>> list(itertools.product(choices, choices))
[(0, 0), (0, 1), (1, 0), (1, 1)]

or use parametrize as in the other tests to be able to get one test for each combo.

This should make it easier to catch any future failures.
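
For reference, a rough sketch of the parametrize variant mentioned above (the test class, test name, and inline body are illustrative stand-ins for the existing helper in test_nn.py):

import itertools
import torch
import torch.nn as nn
from torch.testing._internal.common_utils import (
    TestCase, instantiate_parametrized_tests, parametrize, run_tests)

memory_formats = [torch.contiguous_format, torch.channels_last]

class TestBatchNormFormats(TestCase):
    # One generated test per (fwd_format, bwd_format) pair, so a failure report
    # names the offending combination directly instead of the whole loop.
    @parametrize("fwd_format,bwd_format",
                 list(itertools.product(memory_formats, memory_formats)))
    def test_batchnorm_backward_formats(self, fwd_format, bwd_format):
        # Stand-in body; the real test would call the existing helper with
        # nn.BatchNorm2d, (16, 3, 224, 224), torch.float and the two formats.
        mod = nn.BatchNorm2d(3)
        x = torch.randn(4, 3, 8, 8, requires_grad=True)
        out = mod(x.to(memory_format=fwd_format))
        out.backward(torch.ones_like(out).contiguous(memory_format=bwd_format))
        self.assertTrue(torch.isfinite(x.grad).all())

instantiate_parametrized_tests(TestBatchNormFormats)

if __name__ == "__main__":
    run_tests()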

Collaborator Author


Thanks for your comments. Revised.

auto sum = grad_out_.scalar_type() == kHalf
? at::sum(grad_out_.to(ScalarType::Float), /*dim=*/reduce_dims)
: at::sum(grad_out_, /*dim=*/reduce_dims);
using sum_t = std::conditional_t<std::is_same_v<scalar_t, at::Half>, float, scalar_t>;
Contributor


Also a nit, but would it be useful to use accscalar_t from above?

Collaborator Author
@CaoE CaoE Feb 19, 2025


Thanks for your comments. Do you mean using sum_t = std::conditional_t<std::is_same_v<scalar_t, at::Half>, accscalar_t, scalar_t>;? We may not be able to use accscalar_t in place of sum_t, as I only convert the Half input of the sum to float for the computation.

Contributor


I meant accscalar_t instead of sum_t since for half it should map to float: https://github.com/pytorch/pytorch/blob/af3164039158f38ebe7ff17300c0307ecb0abcd6/aten/src/ATen/AccumulateType.h

Collaborator Author


This would also change the other types, e.g. accscalar_t = float for scalar_t = BFloat16 and accscalar_t = double for scalar_t = float. This PR is currently only intended to fix the Half overflow case. Do you mean the other types should also use higher precision for the sum?

Contributor


@CaoE - oh, I'd still guard on Half, but replace float with accscalar_t to create an explicit connection / prevent issues under any changes. In any case, it might be worthwhile to check the other types as well if there are numerical stability issues for Half. But this can be addressed in a follow up PR.
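
For reference, the floating-point ranges involved (from torch.finfo), which is the motivation for limiting the change to Half in this PR:

import torch

# Half's small maximum is what lets the per-channel gradient sum overflow;
# BFloat16 shares float32's exponent range, so its sums do not hit this limit.
print(torch.finfo(torch.half).max)      # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38
print(torch.finfo(torch.float32).max)   # ~3.40e38
print(torch.finfo(torch.float64).max)   # ~1.80e308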

auto sum_a = sum.accessor<scalar_t, 1>();
// Using float data type for Half sum to avoid overflow
// since the representation range of Half is small.
auto sum = grad_out_.scalar_type() == kHalf
Contributor
@cpuhrsch cpuhrsch Feb 19, 2025


also very nit: I think an if/else here might be more readable than a ternary expression

Collaborator Author


Can I keep the ternary expression? If I use if/else, the code becomes longer:

Tensor sum;
if (grad_out_.scalar_type() == kHalf) {
  sum = at::sum(grad_out_.to(ScalarType::Float), /*dim=*/reduce_dims);
} else {
  sum = at::sum(grad_out_, /*dim=*/reduce_dims);
}

Contributor


Sure, either one should work. I personally don't mind more verbose / longer code if it's more readable and easier to maintain.

@cpuhrsch
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorch-bot bot pushed a commit that referenced this pull request Feb 24, 2025
Use float data type for Half sum in fallback implementation of batchnorm backward on CPU (#147353)

Fixes #147303.
Use float data type for Half sum in fallback implementation of batchnorm backward on CPU as the representation range of Half is small.

Pull Request resolved: #147353
Approved by: https://github.com/leslie-fang-intel, https://github.com/cpuhrsch
majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025
Use float data type for Half sum in fallback implementation of batchnorm backward on CPU (pytorch#147353)

Fixes pytorch#147303.
Use float data type for Half sum in fallback implementation of batchnorm backward on CPU as the representation range of Half is small.

Pull Request resolved: pytorch#147353
Approved by: https://github.com/leslie-fang-intel, https://github.com/cpuhrsch
Labels
ciflow/inductor, ciflow/trunk, Merged, open source, release notes: nn
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fp16 channels_last created Nan in batchnorm backward
5 participants