Add assertion to align with cuda #153233

shiyang-weng · 2025-05-09T05:39:13Z

Fixes #153137

Aligns the assertion in batch_norm_cpu_out with the one in batch_norm_cuda_out.
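For readers who want the intent in a few lines, here is a minimal Python sketch of the consistency check. It is not the C++ code from this PR: `check_running_stats` is a hypothetical stand-in, and the variable names only mirror the `has_running_mean == has_running_var` wording quoted in the repro script further down. The message text is the one shown in the output later in this thread, and `ValueError` is what `TORCH_CHECK_VALUE` surfaces as in Python.

```python
from typing import Optional, Sequence


def check_running_stats(
    running_mean: Optional[Sequence[float]],
    running_var: Optional[Sequence[float]],
) -> None:
    """Hypothetical stand-in for the check: both stats set, or both None."""
    has_running_mean = running_mean is not None
    has_running_var = running_var is not None
    # Exactly one of the two being None is the inconsistent case.
    if has_running_mean != has_running_var:
        raise ValueError(
            "running_mean and running_var must either both be None "
            "or neither be None"
        )
```

Calling `check_running_stats(None, None)` or `check_running_stats([0.0], [1.0])` returns quietly, while `check_running_stats([0.0], None)` raises `ValueError`.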

cc @malfet

pytorch-bot · 2025-05-09T05:39:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153233

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5fd22bf with merge base c1055f4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shiyang-weng · 2025-05-09T05:48:42Z

@pytorchbot label "module: error checking"

shiyang-weng · 2025-05-09T05:49:43Z

@pytorchbot label "module: norms and normalization"

aten/src/ATen/native/Normalization.cpp

shiyang-weng · 2025-05-12T02:23:24Z

import torch

print(f"PyTorch Version: {torch.__version__}")

# Common parameters for torch.batch_norm
weight_param = None
bias_param = None
is_training_param = True # Error occurs with True or False
momentum_param = 0.1
eps_param = 1e-5
cudnn_enabled_param = True # Also occurs with False on GPU

# --- Scenario 1: running_mean is Tensor, running_var is None ---
print("\n--- Scenario 1: running_mean is Tensor, running_var is None ---")
# Input tensor
input_tensor_shape = (3, 4, 5) # N, C, D*
num_features = input_tensor_shape[1]

# CPU
print("  CPU (Scenario 1):")
try:
    input_tensor_cpu = torch.randn(input_tensor_shape)
    running_mean_param_cpu = torch.randn(num_features)
    running_var_param_cpu = None

    torch.batch_norm(
        input_tensor_cpu,
        weight_param,
        bias_param,
        running_mean_param_cpu,
        running_var_param_cpu,
        is_training_param,
        momentum_param,
        eps_param,
        cudnn_enabled_param
    )
    print("    CPU: Error not triggered.")
except ValueError as e:
    print(f"    CPU Error: {e}")
    if "Expected has_running_mean == has_running_var to be true, but got false" in str(e):
        print("    CPU: Successfully triggered the target error (unexpected based on current behavior).")

# GPU
if torch.cuda.is_available():
    print("  GPU (Scenario 1):")
    try:
        input_tensor_gpu = torch.randn(input_tensor_shape).cuda()
        running_mean_param_gpu = torch.randn(num_features).cuda()
        running_var_param_gpu = None

        torch.batch_norm(
            input_tensor_gpu,
            weight_param,
            bias_param,
            running_mean_param_gpu,
            running_var_param_gpu,
            is_training_param,
            momentum_param,
            eps_param,
            cudnn_enabled_param
        )
        print("    GPU: Error not triggered (unexpected for this specific error message).")
    except ValueError as e:
        print(f"    GPU Error: {e}")
        if "Expected has_running_mean == has_running_var to be true, but got false" in str(e):
            print("    GPU: Successfully triggered the target error.")
else:
    print("  GPU (Scenario 1): CUDA not available, skipping GPU test.")

# --- Scenario 2: running_mean is None, running_var is Tensor ---
print("\n--- Scenario 2: running_mean is None, running_var is Tensor ---")

# CPU
print("  CPU (Scenario 2):")
try:
    input_tensor_cpu = torch.randn(input_tensor_shape)
    running_mean_param_cpu = None
    running_var_param_cpu = torch.randn(num_features)

    torch.batch_norm(
        input_tensor_cpu,
        weight_param,
        bias_param,
        running_mean_param_cpu,
        running_var_param_cpu,
        is_training_param,
        momentum_param,
        eps_param,
        cudnn_enabled_param
    )
    print("    CPU: Error not triggered.")
except ValueError as e:
    print(f"    CPU Error: {e}")
    if "Expected has_running_mean == has_running_var to be true, but got false" in str(e):
        print("    CPU: Successfully triggered the target error (unexpected based on current behavior).")

# GPU
if torch.cuda.is_available():
    print("  GPU (Scenario 2):")
    try:
        input_tensor_gpu = torch.randn(input_tensor_shape).cuda()
        running_mean_param_gpu = None
        running_var_param_gpu = torch.randn(num_features).cuda()

        torch.batch_norm(
            input_tensor_gpu,
            weight_param,
            bias_param,
            running_mean_param_gpu,
            running_var_param_gpu,
            is_training_param,
            momentum_param,
            eps_param,
            cudnn_enabled_param
        )
        print("    GPU: Error not triggered (unexpected for this specific error message).")
    except ValueError as e:
        print(f"    GPU Error: {e}")
        if "Expected has_running_mean == has_running_var to be true, but got false" in str(e):
            print("    GPU: Successfully triggered the target error.")
else:
    print("  GPU (Scenario 2): CUDA not available, skipping GPU test.")

Changed to use ValueError.
The output is as expected:
--- Scenario 1: running_mean is Tensor, running_var is None ---
CPU (Scenario 1):
CPU Error: running_mean and running_var must either both be None or neither be None
GPU (Scenario 1): CUDA not available, skipping GPU test.

--- Scenario 2: running_mean is None, running_var is Tensor ---
CPU (Scenario 2):
CPU Error: running_mean and running_var must either both be None or neither be None
GPU (Scenario 2): CUDA not available, skipping GPU test.

shiyang-weng · 2025-05-15T07:50:14Z

Could you help review this PR? @Skylion007

shiyang-weng · 2025-05-23T02:32:11Z

Could someone help review this PR?
@Skylion007 @colesbury @malfet

malfet · 2025-05-23T02:34:09Z

@shiyang-weng do you mind adding the test?

malfet

If it'll pass the tests, sure

shiyang-weng · 2025-05-23T04:53:42Z

@shiyang-weng do you mind adding the test?

This is an error-checking issue; it is not related to functionality or performance.
Is a unit test still needed for this change?

shiyang-weng · 2025-05-23T04:56:51Z

@pytorchbot merge

pytorchmergebot · 2025-05-23T04:59:11Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchbot added the open source label May 9, 2025

pytorch-bot bot added the module: error checking Bugs related to incorrect/lacking error checking label May 9, 2025

pytorch-bot bot added the module: norms and normalization label May 9, 2025

Skylion007 reviewed May 9, 2025

aten/src/ATen/native/Normalization.cpp Outdated

shiyang-weng added 2 commits May 11, 2025 21:45

Add assertion to align with cuda

2a93f9f

use TORCH_CHECK_VALUE instead of TORCH_CHECK

136ab31

shiyang-weng force-pushed the wengshiy/add_bn_assert branch from e402021 to 136ab31 Compare May 12, 2025 02:19

shiyang-weng requested review from eqy, syed-ahmed, kulinseth and malfet as code owners May 12, 2025 02:19

pytorch-bot bot added the release notes: mps Release notes category label May 12, 2025

shiyang-weng requested a review from Skylion007 May 12, 2025 02:21

colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 13, 2025

Merge branch 'main' into wengshiy/add_bn_assert

5fd22bf

malfet approved these changes May 23, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 23, 2025

pytorchmergebot added the merging label May 23, 2025

pytorchmergebot added the Merged label May 23, 2025

pytorchmergebot closed this in ba5d45d May 23, 2025

pytorchmergebot removed the merging label May 23, 2025
