MPS Regression when rendering LTXVideo (after pytorch2.4.1) #141471
Can you please provide the ComfyUI workflow you're using? In the meantime I'll try to repro with the model directly. @pytorchbot label "module: mps" "module: correctness (silent)"
I can confirm it fails in the same way using the LTX repo code with inference.py modified to use MPS.
The MPS inference script fails with PyTorch 2.5.1 and nightlies (last nightly tried 2024-11-22) in the same way as reported by the OP.
Found the workflow in Lightricks/LTX-Video#5. I was also able to reproduce. I'll propose a device selection option downstream, and will add repro steps using the model directly (without Comfy) here after that. Attempting to repro a good result on 2.4.1; if successful, I'll run a bisect to find the culprit.
I was able to reproduce a good result on 2.4.1. Running a bisect to identify the culprit.
Culprit commit identified as 861bdf9 (PR #128393). Bisect replay log
For the device option proposal please see Lightricks/LTX-Video#25. With that proposal applied, the failure mode via LTX-Video can be reproduced by running:

```
CKPT_DIR=~/.cache/huggingface/hub/models--Lightricks--LTX-Video/snapshots/a5ab70cf0b89a0b90dfafe3556c24f1b4767bdc8
PROMPT="The waves crash against the pearly white beach of the shoreline, sending spray high into the tropical air. The palm trees streches high, with beautiful green leaves and bright colors. The water is a clear blue-green, with white foam where the waves break and details of the ocean floor visible below the pure water. The sky is blue, with a few white clouds dotting the horizon."
python inference.py --ckpt_dir "${CKPT_DIR:?}" --prompt "${PROMPT:?}" --seed 0 --num_frames 9 --bfloat16 --device mps
```

Still need to figure out where in PyTorch the bug is.
Found a crumb.

```diff
diff --git a/aten/src/ATen/native/mps/operations/Convolution.mm b/aten/src/ATen/native/mps/operations/Convolution.mm
index f0aac14814b..7f4e611898f 100644
--- a/aten/src/ATen/native/mps/operations/Convolution.mm
+++ b/aten/src/ATen/native/mps/operations/Convolution.mm
@@ -125,7 +125,7 @@ static Tensor _mps_convolution_impl(const Tensor& input_t_,
                                     int64_t groups,
                                     std::optional<IntArrayRef> input_shape) {
   const bool is_macOS_13_2_or_newer = is_macos_13_or_newer(MacOSVersion::MACOS_VER_13_2_PLUS);
-  const bool is_macOS_15_0_or_newer = is_macos_13_or_newer(MacOSVersion::MACOS_VER_15_0_PLUS);
+  const bool is_macOS_15_0_or_newer = false;
   Tensor input_t = input_t_;
   if (!is_macOS_15_0_or_newer) {
     input_t = input_t.contiguous();
```

produces good golden.
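The invariant that patch points at can be stated in plain PyTorch terms: Conv3d output should not depend on the input tensor's memory format. A minimal CPU-only sketch of that invariant (illustrative only; the actual failure shows up on the MPS backend, not on CPU):

```python
import torch

torch.manual_seed(0)
conv = torch.nn.Conv3d(4, 8, kernel_size=3, padding=1)
x = torch.randn(2, 4, 5, 6, 7)

# Run the same convolution on the default NCDHW layout and on
# channels_last_3d (NDHWC strides, same underlying data).
out_contig = conv(x)
x_cl3d = x.to(memory_format=torch.channels_last_3d)
out_cl3d = conv(x_cl3d)

# The values must agree regardless of layout; this is the property
# that broke for Conv3d on the MPS backend.
print(torch.allclose(out_contig, out_cl3d, atol=1e-5))
```

On the broken PyTorch builds, the MPS equivalent of this comparison (against a CPU reference) fails once the input is in the channels-last-3d layout.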
Are you also seeing the slower s/it on the failing versions, as I was? It seems odd, since this commit was meant as a performance improvement; it's strange that it both breaks this path and is slower.
I've seen the slower speed with the more recent (broken) PyTorch. One thought: because the data becomes garbage at some point, it could be producing denormal values, invalid values, NaNs, etc., all of which might hit much slower paths as they're operated on further.
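One way to probe that theory (a hypothetical debugging helper, not code from the thread) is to scan intermediate tensors for NaNs, infs, and subnormals:

```python
import torch

def health_check(name, t):
    # Hypothetical helper: count NaNs, infs, and subnormal values in a tensor.
    f = t.float()
    n_nan = torch.isnan(f).sum().item()
    n_inf = torch.isinf(f).sum().item()
    tiny = torch.finfo(torch.float32).tiny  # smallest normal float32
    n_sub = ((f != 0) & (f.abs() < tiny)).sum().item()
    print(f"{name}: nan={n_nan} inf={n_inf} subnormal={n_sub}")
    return n_nan, n_inf, n_sub

# Demo on a tensor containing one of each problem value
# (1e-45 is representable only as a float32 subnormal).
x = torch.tensor([1.0, float("nan"), float("inf"), 1e-45])
health_check("probe", x)
```

Sprinkling a helper like this between model stages would show where the garbage first appears and whether subnormals could explain the slowdown.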
Torch 2.5.1 has a performance regression from 2.4.1 (for MPS) and uses significantly more memory in image-generation workloads. Both are significantly improved in the nightlies, which are a little faster than 2.4.1 in my test cases, but there's still a small memory increase compared to 2.4.1. See #139389.
Well, it's not that simple though. I tried downgrading to 2.4.1 and couldn't even get Comfy to boot anymore after that, so I'm going back to torch nightly, it seems. I hope the issue can be fixed; going back to torch==2.4.1 doesn't seem to be an option for me, probably due to dependency issues.
Yeah, Comfy works fine on 2.4.1 out of the box, so I'd imagine you've got some other dependency breaking compatibility; I know a few extensions depend on the autocast support added in more recent versions. I too am hoping the team can deduce what's going on here. It's definitely over my head. Glad to see that hvaara managed to track it down to a single file/value, though it still seems like we don't know why that code breaks it; I don't understand enough about the stride API to even guess, lol. I looked through the code and the is_macOS_15_0_or_newer variable is only used in a few spots and doesn't seem to affect much outside of that scope, so hopefully it's traceable.
@hvaara a bit unrelated, but when I tried to reproduce it with nightlies, using
until I disabled autocast completely...
Unrelated, but still a bug 😄 Tried to RCA it; prior to #139390 you'd get a warning when you run LTX-Video without
because it would then use mixed precision with bf16. Autocast was disabled because bf16 wasn't supported. After #139390 bf16 is supported, so you no longer get the warning, but it isn't really supported for SDPA. With

```diff
diff --git a/aten/src/ATen/autocast_mode.cpp b/aten/src/ATen/autocast_mode.cpp
index 1129892dd25..6649708c706 100644
--- a/aten/src/ATen/autocast_mode.cpp
+++ b/aten/src/ATen/autocast_mode.cpp
@@ -236,6 +236,7 @@ TORCH_LIBRARY_IMPL(aten, AutocastMPS, m) {
   KERNEL_MPS(chain_matmul, lower_precision_fp)
   KERNEL_MPS(linalg_multi_dot, lower_precision_fp)
   KERNEL_MPS(lstm_cell, lower_precision_fp)
+  KERNEL_MPS(scaled_dot_product_attention, lower_precision_fp)
   // fp32
   KERNEL_MPS(acos, fp32)
```

autocast works for me.
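Those KERNEL_MPS lines register ops on autocast's lower-precision list for the MPS backend. The same mechanism exists for CPU, which makes the effect easy to see without Apple hardware; a minimal sketch (CPU used purely for illustration):

```python
import torch

a = torch.randn(8, 16)
b = torch.randn(16, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # matmul is on the lower-precision list, so autocast runs it in bf16
    out = a @ b

# Outside the autocast context the same op stays in float32
out_fp32 = a @ b

print(out.dtype, out_fp32.dtype)
```

Ops not on the list keep running in fp32 inside the context, which is why adding scaled_dot_product_attention to the registration changes its behavior under autocast.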
Think I got to the bottom of it. Looks like the root cause is a memory format issue ( Fix incoming.
Looking forward to it; trying to learn more about these types of weird bugs.
Fix proposal in #141780. xref https://gist.github.com/hvaara/340bc4bf740d97c15351db7b6759643d |
Hi @hvaara, did it work for you?
@haqatak yes, it worked for me, but I might have run it differently than you. What's the command/code you ran? I'd like to try to repro so I can debug it. |
…`nn.Conv3d` (pytorch#141780) When the input tensor to Conv3d is in the channels_last_3d memory format the Conv3d op will generate incorrect output (see example image in pytorch#141471). This PR checks if the op is 3d, and then attempts to convert the input tensor to contiguous. Added a regression test that verifies the output by running the same op on the CPU. I'm unsure if Conv3d supports the channels last memory format after pytorch#128393. If it does, we should consider updating the logic to utilize this as it would be more efficient. Perhaps @DenisVieriu97 knows or has more context? Fixes pytorch#141471 Pull Request resolved: pytorch#141780 Approved by: https://github.com/malfet
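As a rough illustration of what that commit message describes (a sketch, not the PR's actual code): a channels_last_3d tensor is not contiguous in the default NCDHW sense, and calling `.contiguous()` restores the layout the old MPS kernel path expects.

```python
import torch

# A 5-D tensor in channels_last_3d (NDHWC strides) -- the layout that
# triggered the bad Conv3d output on MPS.
x = torch.randn(1, 3, 4, 5, 6).to(memory_format=torch.channels_last_3d)
print(x.is_contiguous())                                      # default-layout check
print(x.is_contiguous(memory_format=torch.channels_last_3d))  # layout it's actually in

# The fix effectively does this for the 3d case before handing the
# tensor to the MPS convolution kernel:
y = x.contiguous()
print(y.is_contiguous())
```

This also explains why the workaround of forcing `is_macOS_15_0_or_newer = false` earlier in the thread produced good output: it re-enabled the unconditional `input_t.contiguous()` call.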
Hm, I can't test this from nightlies -- both
Unfortunately #141780 didn't make it into torch-2.6.0.dev20241202, but it should be in torch-2.6.0.dev20241203. The torchvision release process for macOS does indeed look broken. From https://download.pytorch.org/whl/nightly/cpu/torchvision/ I can see that the last successful release appears to be torchvision-0.20.0.dev20241126. The release process workflow also fails with
If vision still fails to promote a nightly, you might have to compile it manually after installing the torch nightly if you want to test the changes from #141780.
Ah, thanks! I wasn't sure if the nightly builds failing warranted filing an issue, now I know :) |
The nightly build issue seems to be fixed, judging by the nightly downloads. On my first test, the regression looks to be addressed: I generated clean, clear video without the noise issue, using:
Awesome! Thanks a lot everyone for collaborating on this issue! 😄 |
This worked for me:

```
pip install --pre torch==2.6.0.dev20241205 torchvision==0.20.0.dev20241205 torchaudio==2.5.0.dev20241205 --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
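After installing, a quick sanity check (illustrative) that the environment actually picked up the intended build:

```python
import torch

# Nightly builds report versions like '2.6.0.devYYYYMMDD'
print(torch.__version__)

# True only on Apple Silicon macOS with a working MPS backend
print(torch.backends.mps.is_available())
```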
What's the latest pip install command for this to work? The one from @peterdn1 didn't work:

ERROR: Ignored the following yanked versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.11.0, 0.15.0

On Sequoia 15.1.1 with an M3 Max.
@pachieh please post the command you ran, including the entire output, on https://gist.github.com/ and add a link here. If you try a new environment (venv, conda, or whatever you're using), does it work when you install with the recommended nightly procedure from https://pytorch.org/ (i.e.
Confirmed -- this is fixed in release 2.6:
🐛 Describe the bug
Testing on Apple MPS using ComfyUI: several PyTorch versions, including nightly and 2.5.1, produce nothing but noise, while on PyTorch 2.4.1 the LTX-Video model renders correctly without any issues.
Oddly, the broken PyTorch versions are also ~40% slower with LTX-Video (15.89s/it vs 11.31s/it). Not sure how to narrow down what's causing the completely borked results and slower iterations, but it's definitely due to PyTorch changes from 2.4.1 to 2.5.1.
2.4.1 Results. 11.31s/it

2.5.1 Results. 15.89s/it

Versions
Working Version:
PyTorch version: 2.4.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 19.1.3
CMake version: version 3.31.1
Libc version: N/A
Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M3 Pro
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.4.1
[pip3] torchaudio==2.4.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.19.1
[conda] numpy 1.24.3 py311hb57d4eb_0
[conda] numpy-base 1.24.3 py311h1d85a46_0
[conda] numpydoc 1.5.0 py311hca03da5_0
[conda] onnx2torch 1.5.14 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] torchinfo 1.8.0 pypi_0 pypi
[conda] torchvision 0.18.0 pypi_0 pypi
Failing version (also failed on the nightly version I tried yesterday):
PyTorch version: 2.5.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.2 (arm64)
GCC version: Could not collect
Clang version: 19.1.3
CMake version: version 3.31.1
Libc version: N/A
Python version: 3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.2-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M3 Pro
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.20.1
[conda] numpy 1.24.3 py311hb57d4eb_0
[conda] numpy-base 1.24.3 py311h1d85a46_0
[conda] numpydoc 1.5.0 py311hca03da5_0
[conda] onnx2torch 1.5.14 pypi_0 pypi
[conda] torch 2.3.0 pypi_0 pypi
[conda] torchinfo 1.8.0 pypi_0 pypi
[conda] torchvision 0.18.0 pypi_0 pypi
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @kulinseth @albanD @malfet @DenisVieriu97 @jhavukainen