Labels: high priority, module: convolution, module: mps, module: regression, triaged
Description
🐛 Describe the bug
I’m running a simple model with a self-attention block on an Apple M2 Max using the MPS backend. The code runs fine on CPU and CUDA, but fails on MPS with a runtime error during the backward pass. The error suggests an issue with tensor shape or memory layout. Even after using .contiguous() and .reshape() instead of .view(), the problem persists only on MPS.
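For context, here is a tiny illustration of my own (not taken from the failing model) of what the error text normally means: .view() requires a contiguous memory layout, while .reshape() copies when needed, which is why I switched the attention block to .contiguous().reshape() in the first place. The error still surfaces inside the MPS backward pass where I have no .view() call of my own.

import torch

t = torch.rand(2, 3, 4).permute(0, 2, 1)  # permute makes the tensor non-contiguous
# t.view(2, 12)                           # would raise the same "view size is not compatible..." error
r = t.reshape(2, 12)                      # reshape copies when necessary, so it succeeds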
Minimal Reproducer:
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

class SelfAttentionBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.query_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        B, C, H, W = x.size()
        query = self.query_conv(x).contiguous().reshape(B, -1, H * W)
        key = self.key_conv(x).contiguous().reshape(B, -1, H * W)
        value = self.value_conv(x).contiguous().reshape(B, -1, H * W)
        attention = torch.bmm(query.permute(0, 2, 1), key)
        attention = torch.softmax(attention, dim=-1)
        out = torch.bmm(value, attention.permute(0, 2, 1))
        out = out.contiguous().reshape(B, C, H, W)
        return self.gamma * out + x

class SimpleDenoiseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.attn = SelfAttentionBlock(32)
        self.conv_out = nn.Conv2d(32, 3, kernel_size=3, padding=1)

    def forward(self, x):
        x = self.conv_in(x)
        x = self.attn(x)
        x = self.conv_out(x)
        return x

model = SimpleDenoiseNet().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

input_data = torch.rand(2, 3, 32, 32, device=device)
target_data = torch.rand(2, 3, 32, 32, device=device)

optimizer.zero_grad()
output = model(input_data)
loss = criterion(output, target_data)
loss.backward()  # Fails on MPS, works on CPU/CUDA
optimizer.step()

Error:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
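A small isolation script along these lines (my own sketch, not part of the original run) may help confirm whether the attention block alone triggers the failing backward on MPS, independent of the surrounding 3x3 convolutions. Running the same snippet with the device set to "cpu" succeeds for me, which is what points at the MPS backward kernels.

# Hypothetical isolation sketch: exercise only SelfAttentionBlock (defined in the
# reproducer above) so the backward graph contains just the 1x1 convs, bmm,
# permute, softmax, and reshape ops.
import torch

assert torch.backends.mps.is_available()
device = torch.device("mps")

block = SelfAttentionBlock(32).to(device)
x = torch.rand(2, 32, 32, 32, device=device, requires_grad=True)

out = block(x)
out.sum().backward()  # if this raises the same RuntimeError, the attention block alone reproduces it
print("attention-only backward completed on MPS")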
Full Traceback (if applicable):
{
"name": "RuntimeError",
"message": "view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.",
"stack": "---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[135], line 52
50 output = model(input_data)
51 loss = criterion(output, target_data)
---> 52 loss.backward() # Check if this triggers the RuntimeError on MPS
53 optimizer.step()
55 print(\"Forward and backward pass completed successfully!\")
File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/_tensor.py:581, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
571 if has_torch_function_unary(self):
572 return handle_torch_function(
573 Tensor.backward,
574 (self,),
(...)
579 inputs=inputs,
580 )
--> 581 torch.autograd.backward(
582 self, gradient, retain_graph, create_graph, inputs=inputs
583 )
File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/__init__.py:347, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
342 retain_graph = create_graph
344 # The reason we repeat the same comment below is that
345 # some Python versions print out the first line of a multi-line function
346 # calls in the traceback and some print out the last line
--> 347 _engine_run_backward(
348 tensors,
349 grad_tensors_,
350 retain_graph,
351 create_graph,
352 inputs,
353 allow_unreachable=True,
354 accumulate_grad=True,
355 )
File ~/Documents/Python Programs/ML_assignment1/venv/lib/python3.10/site-packages/torch/autograd/graph.py:825, in _engine_run_backward(t_outputs, *args, **kwargs)
823 unregister_hooks = _register_logging_hooks_on_whole_graph(t_outputs)
824 try:
--> 825 return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
826 t_outputs, *args, **kwargs
827 ) # Calls into the C++ engine to run the backward pass
828 finally:
829 if attach_logging_hooks:
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead."
}
This seems like a backend-specific bug. Any guidance or fix would be appreciated.
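In case it helps anyone else hitting this, one interim workaround I have been considering (a sketch, not a validated fix) is to keep the rest of the model on MPS but route only the attention block through the CPU. Autograd records the device transfers, so the backward pass still runs, at the cost of extra host/device copies.

# Hypothetical workaround sketch, not a confirmed fix: wrap the attention block so
# its forward/backward run on CPU while the surrounding convolutions stay on MPS.
import torch
import torch.nn as nn

class CPUFallbackAttention(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.attn = SelfAttentionBlock(in_channels)  # SelfAttentionBlock from the reproducer; must stay on CPU

    def forward(self, x):
        out = self.attn(x.cpu())   # device transfer is tracked by autograd
        return out.to(x.device)    # move the result back to MPS for the next layer

# Usage sketch: replace `self.attn = SelfAttentionBlock(32)` with
# `self.attn = CPUFallbackAttention(32)` in SimpleDenoiseNet, and after
# `model.to(device)` move the wrapper back with `model.attn.to("cpu")` so its
# parameters are not migrated to MPS along with the rest of the model.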
Versions
(venv) nripeshn@Nripeshs-MacBook-Pro ~/D/P/ML123 (main)> python collect_env.py
Collecting environment information...
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.24.1)
CMake version: version 3.30.3
Libc version: N/A
Python version: 3.10.14 (main, May 12 2024, 02:15:34) [Clang 15.0.0 (clang-1500.3.9.4)] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M2 Max
Versions of relevant libraries:
[pip3] nirtorch==1.0
[pip3] numpy==2.1.3
[pip3] snntorch==0.9.1
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[conda] Could not collect
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen