RuntimeError when running backward on MPS: "view size is not compatible" with self-attention block · Issue #142344 · pytorch/pytorch · GitHub

RuntimeError when running backward on MPS: "view size is not compatible" with self-attention block #142344

Closed
NripeshN opened this issue Dec 9, 2024 · 12 comments
Assignees: malfet
Labels:
  high priority
  module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
  module: mps - Related to Apple Metal Performance Shaders framework
  module: regression - It used to work, and now it doesn't
  triaged - This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Milestone: 2.6.0

Comments

@NripeshN
NripeshN commented Dec 9, 2024

🐛 Describe the bug

I’m running a simple model with a self-attention block on an Apple M2 Max using the MPS backend. The code runs fine on CPU and CUDA, but fails on MPS with a runtime error during the backward pass. The error suggests an issue with tensor shape or memory layout. Even after using .contiguous() and .reshape() instead of .view(), the problem persists only on MPS.

Minimal Reproducer:

import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

class SelfAttentionBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.query_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()
        query = self.query_conv(x).contiguous().reshape(B, -1, H*W)
        key   = self.key_conv(x).contiguous().reshape(B, -1, H*W)
        value = self.value_conv(x).contiguous().reshape(B, -1, H*W)

        attention = torch.bmm(query.permute(0, 2, 1), key)
        attention = torch.softmax(attention, dim=-1)
        out = torch.bmm(value, attention.permute(0, 2, 1))
        out = out.contiguous().reshape(B, C, H, W)
        return self.gamma * out + x

class SimpleDenoiseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.attn = SelfAttentionBlock(32)
        self.conv_out = nn.Conv2d(32, 3, kernel_size=3, padding=1)
    
    def forward(self, x):
        x = self.conv_in(x)
        x = self.attn(x)
        x = self.conv_out(x)
        return x

model = SimpleDenoiseNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input_data = torch.rand(2, 3, 32, 32, device=device)
target_data = torch.rand(2, 3, 32, 32, device=device)

optimizer.zero_grad()
output = model(input_data)
loss = criterion(output, target_data)
loss.backward()  # Fails on MPS, works on CPU/CUDA
optimizer.step()

Error:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Full Traceback (if applicable):

{
	"name": "RuntimeError",
	"message": "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.",
	"stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[135], line 52
     50 output = model(input_data)
     51 loss = criterion(output, target_data)
---> 52 loss.backward()  # Check if this triggers the RuntimeError on MPS
     53 optimizer.step()
     55 print(\"Forward and backward pass completed successfully!\")

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    571 if has_torch_function_unary(self):
    572     return handle_torch_function(
    573         Tensor.backward,
    574         (self,),
   (...)
    579         inputs=inputs,
    580     )
--> 581 torch.autograd.backward(
    582     self, gradient, retain_graph, create_graph, inputs=inputs
    583 )

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    342     retain_graph = create_graph
    344 # The reason we repeat the same comment below is that
    345 # some Python versions print out the first line of a multi-line function
    346 # calls in the traceback and some print out the last line
--> 347 _engine_run_backward(
    348     tensors,
    349     grad_tensors_,
    350     retain_graph,
    351     create_graph,
    352     inputs,
    353     allow_unreachable=True,
    354     accumulate_grad=True,
    355 )

File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
    823     unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
    824 try:
--> 825     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    826         t_outputs, *args, **kwargs
    827     )  # Calls into the C++ engine to run the backward pass
    828 finally:
    829     if attach_logging_hooks:

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."
}

This seems like a backend-specific bug. Any guidance or fix would be appreciated.
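For readers unfamiliar with the message, here is a minimal plain-Python sketch of the stride rule behind "view size is not compatible". It assumes the usual rule that dimensions can only be merged into a view when the outer stride equals the inner size times the inner stride. The helper names are illustrative and are not PyTorch APIs. The shape used is the one the gradient of query_conv's output would have in the reproducer (batch 2, 32 // 8 = 4 channels, 32x32).

```python
# Plain-Python sketch of the stride rule behind "view size is not compatible".
# Helper names are illustrative; they are not PyTorch APIs.

def contiguous_strides(shape):
    """Row-major (NCHW-style) strides, in elements, for a dense tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Strides of a 4-D (N, C, H, W) tensor whose memory is laid out as NHWC."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def can_flatten_as_view(shape, strides):
    """True if all dimensions can be merged into one without copying data."""
    # Size-1 dimensions never constrain merging, so they are skipped.
    dims = [(size, stride) for size, stride in zip(shape, strides) if size != 1]
    return all(
        outer_stride == inner_size * inner_stride
        for (_, outer_stride), (inner_size, inner_stride) in zip(dims, dims[1:])
    )


shape = (2, 4, 32, 32)  # batch 2, 32 // 8 = 4 channels, 32x32 spatial
nchw = contiguous_strides(shape)
nhwc = channels_last_strides(shape)
print("contiguous   ", nchw, can_flatten_as_view(shape, nchw))
print("channels-last", nhwc, can_flatten_as_view(shape, nhwc))
```

With a channels-last layout the channel dimension has stride 1, so the dimensions cannot be merged in NCHW order without a copy. That is the situation in which .view() raises and .reshape() silently copies.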

Versions

(venv) nripeshn@Nripeshs-MacBook-Pro ~/D/P/ML123 (main)> python collect_env.py
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.24.1)
CMake version: version 3.30.3
Libc version: N/A

Python version: 3.10.14 (main, May 12 2024, 02:15:34) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M2 Max

Versions of relevant libraries:
[pip3] nirtorch==1.0
[pip3] numpy==2.1.3
[pip3] snntorch==0.9.1
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen

@NripeshN
Author
NripeshN commented Dec 9, 2024

@pytorchbot label "module: mps"

@pytorch-bot pytorch-bot bot added the module: mps Related to Apple Metal Performance Shaders framework label Dec 9, 2024
@malfet malfet added module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) module: regression It used to work, and now it doesn't labels Dec 9, 2024
@malfet malfet added this to the 2.6.0 milestone Dec 9, 2024
@malfet
Contributor
malfet commented Dec 9, 2024

This is a regression: the same code worked fine in 2.4 but is broken in 2.5, so it must be fixed before 2.6.

The exception is thrown from mps_convolution_backward_input:

auto gradOutputPlaceholder = Placeholder(cachedGraph->gradOutputTensor_, grad_output_t, gradOutputShape);

Here gradOutputShape is @[@2, @32, @32, @4], but the shape of grad_output_t is {2, 4, 32, 32} and (surprise, surprise) is_channels_last is true.
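To illustrate why those two shapes can refer to the same buffer, here is a small plain-Python sketch. It assumes the standard channels-last convention (logical NCHW indexing over NHWC memory). The function names are made up for illustration and are not PyTorch or MPS APIs.

```python
# Sketch: one buffer, described either as a logical NCHW tensor with a
# channels-last layout or as a dense NHWC array. Names are illustrative only.

def nhwc_shape(nchw_shape):
    """Physical (N, H, W, C) shape of a channels-last (N, C, H, W) tensor."""
    n, c, h, w = nchw_shape
    return (n, h, w, c)


def channels_last_offset(index, nchw_shape):
    """Memory offset of logical element (n, c, h, w) in a channels-last tensor."""
    _, c_size, h_size, w_size = nchw_shape
    n, c, h, w = index
    return n * (h_size * w_size * c_size) + h * (w_size * c_size) + w * c_size + c


def row_major_offset(index, shape):
    """Memory offset of `index` in a dense row-major array of `shape`."""
    offset = 0
    for i, size in zip(index, shape):
        offset = offset * size + i
    return offset


logical = (2, 4, 32, 32)        # grad_output_t shape quoted above
physical = nhwc_shape(logical)  # matches the quoted gradOutputShape

n, c, h, w = 1, 3, 5, 7         # arbitrary element used as a spot check
same = channels_last_offset((n, c, h, w), logical) == row_major_offset((n, h, w, c), physical)
print(logical, "->", physical, "same offset:", same)
```

Both descriptions address the same element, but code that takes the NHWC shape and then tries to view the NCHW tensor with it needs the strides to agree, which is where the mismatch shows up.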

@malfet malfet self-assigned this Dec 9, 2024
@malfet
Contributor
malfet commented Dec 9, 2024

Oh, it's almost exactly the same as #140902: I've fixed it for weights, but not for inputs.

@janeyx99 janeyx99 added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module and removed triage review labels Dec 9, 2024
@NripeshN
Author

Hi @janeyx99 @malfet

Are there any updates on this issue? I do not mind writing a PR myself with a little help. I believe others are facing similar issues (#143123).

cc @sebassaras02

@malfet
Contributor
malfet commented Dec 13, 2024

Adjusting the simple reproducer from #140902 to include the input as well:

import torch
device,ic,oc,f = 'mps', 1, 2, 3

bias = torch.rand(oc, device=device, requires_grad=True)
weight = torch.rand(oc, ic, 3, device=device, requires_grad=True)
inp = torch.rand(1, ic, f, device=device, requires_grad=True)
out = torch.nn.functional.conv1d(inp, weight, bias, padding=1)
torch.autograd.grad((out,), (inp, weight, bias), (torch.rand(1, f, oc, device=device).transpose(1, 2),))
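A note on why the gradient in the last line is built with transpose: the result is a non-contiguous tensor whose channel dimension has stride 1. The following plain-Python sketch models only the shape and stride bookkeeping of a transpose. The helpers are illustrative, not PyTorch functions.

```python
# Sketch of the shape/stride bookkeeping for rand(1, f, oc).transpose(1, 2).
# Helper names are illustrative; no tensor data is modelled.

def contiguous_strides(shape):
    """Row-major strides, in elements, for a dense tensor of `shape`."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def transpose(shape, strides, dim0, dim1):
    """Swap two dimensions by swapping their sizes and strides (no copy)."""
    shape, strides = list(shape), list(strides)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    strides[dim0], strides[dim1] = strides[dim1], strides[dim0]
    return tuple(shape), tuple(strides)


f, oc = 3, 2                         # same values as in the reproducer
base_shape = (1, f, oc)              # shape passed to rand()
base_strides = contiguous_strides(base_shape)
grad_shape, grad_strides = transpose(base_shape, base_strides, 1, 2)

print("before:", base_shape, base_strides)
print("after: ", grad_shape, grad_strides)
print("dense layout would be:", contiguous_strides(grad_shape))
```

The transposed gradient has the (1, oc, f) shape that conv1d's output requires, but its strides differ from the dense ones, so the backward kernel receives a gradient with a non-default memory layout.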

And here is the patch that fixes it

diff --git a/aten/src/ATen/native/mps/operations/Convolution.mm b/aten/src/ATen/native/mps/operations/Convolution.mm
index 5852be8fb74..1e977c8a327 100644
--- a/aten/src/ATen/native/mps/operations/Convolution.mm
+++ b/aten/src/ATen/native/mps/operations/Convolution.mm
@@ -372,6 +372,7 @@ static Tensor mps_convolution_backward_input(IntArrayRef input_size,
   using namespace at::native::mps;
   using namespace mps;
   bool is3DConv = grad_output_t.dim() == 5;
+  const auto has_strided_api = is_macos_13_or_newer(MacOSVersion::MACOS_VER_15_0_PLUS);
 
   if (!is_macos_13_or_newer(MacOSVersion::MACOS_VER_15_1_PLUS)) {
     // On macOS < 15.1, MPS convolution kernel does not support output channels > 2^16
@@ -417,7 +418,7 @@ static Tensor mps_convolution_backward_input(IntArrayRef input_size,
         assert(0 && "Check should have been done earlier\n");
     }
 
-    MPSShape* gradOutputShape = getMPSShape(grad_output_t, memory_format);
+    MPSShape* gradOutputShape = has_strided_api ? getMPSShape(grad_output_t) : getMPSShape(grad_output_t, memory_format);
     MPSShape* mps_input_shape = getMPSShape(input_size);
     NSString* ns_shape_key = [[gradOutputShape valueForKey:@"description"] componentsJoinedByString:@","];
     string key;
@@ -440,7 +441,7 @@ static Tensor mps_convolution_backward_input(IntArrayRef input_size,
       MPSGraphTensor* weightTensor = mpsGraphRankedPlaceHolder(mpsGraph, weight_t);
 
       MPSGraphTensor* gradOutputTensorTranspose = gradOutputTensor;
-      if (is_channels_last) {
+      if (is_channels_last && !has_strided_api) {
         gradOutputTensorTranspose = mps::convertNHWCtoNCHW(mpsGraph, gradOutputTensorTranspose);
       }
       MPSGraphTensor* gradInputTensor;
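As a reading aid, here is a rough Python paraphrase of the branching this diff introduces. It is a sketch under assumptions, not the actual implementation: it assumes getMPSShape(grad_output_t) yields the logical NCHW shape, that the memory_format overload yields the NHWC shape for channels-last tensors (consistent with the shapes quoted earlier in the thread), and it handles only 4-D tensors. All Python names are hypothetical.

```python
# Hypothetical paraphrase of the diff's branching; not PyTorch code.

def has_strided_api(macos_version):
    """Stands in for the MACOS_VER_15_0_PLUS check in the diff."""
    return tuple(macos_version) >= (15, 0)


def plan_grad_output(logical_shape, is_channels_last, macos_version):
    """Return the placeholder shape and whether an NHWC->NCHW convert is added."""
    if is_channels_last and not has_strided_api(macos_version):
        # Old path: describe the buffer as NHWC, then convert inside the graph.
        n, c, h, w = logical_shape
        return {"placeholder_shape": (n, h, w, c), "convert_nhwc_to_nchw": True}
    # New path (or non-channels-last input): pass the logical shape through.
    return {"placeholder_shape": tuple(logical_shape), "convert_nhwc_to_nchw": False}


shape = (2, 4, 32, 32)
print(plan_grad_output(shape, True, (15, 2)))
print(plan_grad_output(shape, True, (14, 5)))
```

The point of the sketch is only that the diff makes both the shape choice and the layout conversion depend on the same OS-version flag.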

malfet added a commit that referenced this issue Dec 13, 2024
This is a continuation of #140902 but extends the same logic to input

Fixes #142344
@malfet
Contributor
malfet commented Dec 13, 2024

And as expected, it fixes the problem on MacOS-15, but produces garbage on older MacOS

@NripeshN
Author

And as expected, it fixes the problem on MacOS-15, but produces garbage on older MacOS

Can we have a fallback to run something similar to this commit for older MacOS? I believe this fixes the issue for older Macs too.

@malfet
Contributor
malfet commented Dec 14, 2024

Can we have a fallback to run something similar to this commit for older MacOS? I believe this fixes the issue for older Macs too.

Sorry if I wasn't clear: I had no intention of landing a change that would break a previous release of macOS, as PyTorch should be accessible and work regression-free on the last two OS releases. So instead of trying to preserve the channels-last logic in the backward_input op, I've deleted it, because it failed to produce any results on macOS 15 and produced garbage on macOS 14. I.e., the PR that was landed should result in a faster backward pass on Sequoia and a slower but correct one on Sonoma/Ventura.

bluenote10 pushed a commit to bluenote10/pytorch that referenced this issue Dec 14, 2024
This is a continuation of pytorch#140902 but extends the same logic to input.

Looks like existing channels-last logic just produced incorrect results on pre MacOS-15 versions and fails on MacOS-15, so removing it feels like a right idea

Fixes pytorch#142344
Pull Request resolved: pytorch#143196
Approved by: https://github.com/manuelcandales
@sidharrth2002

Hi all, I'm also facing this issue on my M2 Pro. Downgrading to PyTorch 2.4.1 seems to fix the issue though

pytorchbot pushed a commit that referenced this issue Jan 10, 2025
This is a continuation of #140902 but extends the same logic to input.

Looks like existing channels-last logic just produced incorrect results on pre MacOS-15 versions and fails on MacOS-15, so removing it feels like a right idea

Fixes #142344
Pull Request resolved: #143196
Approved by: https://github.com/manuelcandales

(cherry picked from commit 8a04018)
kit1980 pushed a commit that referenced this issue Jan 10, 2025
[MPS] Fix conv backward for channels last (cont) (#143196)

This is a continuation of #140902 but extends the same logic to input.

Looks like existing channels-last logic just produced incorrect results on pre MacOS-15 versions and fails on MacOS-15, so removing it feels like a right idea

Fixes #142344
Pull Request resolved: #143196
Approved by: https://github.com/manuelcandales

(cherry picked from commit 8a04018)

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
@shaanchandra

Hi,
I am facing the same issue (conv and attention blocks) with MPS on my MacBook Pro with an M4 Pro chip (macOS 15.2).
I tried downgrading to 2.4.1, but now the problem has become:
RuntimeError: Expected scalar_type == ScalarType::Float || inputTensor.scalar_type() == ScalarType::Int || scalar_type == ScalarType::Bool to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I have also tried the following, but the same Float error persists:

loss = self.model(data, target=data)

loss = loss / self.gradient_accumulate_every

loss.to(torch.float32)

loss.backward()

Can someone assist?

@atalman
Copy link
Contributor
atalman commented Jan 21, 2025

Hi @shaanchandra, please try the nightly or test build:

pip3 install torch --index-url https://download.pytorch.org/whl/nightly/cpu --force-reinstall

or

pip3 install torch --index-url https://download.pytorch.org/whl/test/cpu --force-reinstall

Confirmed fixed in the final 2.6 RC:

python test_mps.py
2.6.0

Broken in 2.5.1:

python test_mps.py 
Traceback (most recent call last):
  File "/Users/atalman/Downloads/release26/test_mps.py", line 50, in <module>
    loss.backward()  # Fails on MPS, works on CPU/CUDA
  File "/Users/atalman/miniconda3/envs/py310/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/Users/atalman/miniconda3/envs/py310/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/Users/atalman/miniconda3/envs/py310/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
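Going only by the versions reported in this thread (2.4.x working, 2.5.1 broken, 2.6.0 fixed), a small guard like the sketch below could flag an affected install. Treating every 2.5.x release as affected is an assumption based on the "broken in 2.5" remark above, and the function names are hypothetical.

```python
import re

# Sketch: flag PyTorch versions that this thread reports as affected.
# Assumption: all 2.5.x releases are affected; only 2.5.1 was actually tested here.

def parse_major_minor(version):
    """Extract (major, minor) from strings such as '2.5.1' or '2.6.0.dev20250101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def affected_by_issue_142344(version):
    """True for releases in the 2.5 series, False otherwise."""
    return parse_major_minor(version) == (2, 5)


for v in ("2.4.1", "2.5.1", "2.6.0"):
    print(v, "affected" if affected_by_issue_142344(v) else "not affected")
```

In a training script, the string to check would be the installed version, for example the value of torch.__version__.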

@sebassaras02

@shaanchandra install version 2.4.0; this worked for me. However, some functionality is still not fully implemented.

7 participants