triggered internal assert in matmul · Issue #153172 · pytorch/pytorch
Open
Angramme opened this issue May 8, 2025 · 2 comments
Labels
module: cuda (Related to torch.cuda, and CUDA support in general)
module: CUDACachingAllocator
needs reproduction (Someone else needs to try reproducing the issue given the instructions. No action needed from user.)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.)

Comments

@Angramme commented May 8, 2025

🐛 Describe the bug

The error triggers when calling matmul (at the line marked HERE below):

https://github.com/huggingface/transformers/blob/d23aae2b8c8738a12ab1b6710e60ae5866beaf9d/src/transformers/models/qwen2/modeling_qwen2.py#L116

# code taken from transformers/models/qwen2/modeling_qwen2.py
# (repeat_kv is defined in the same module)
from typing import Optional

import torch
from torch import nn
def eager_attention_forward(
    module: nn.Module,
    query: torch.Tensor,
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor],
    scaling: float,
    dropout: float = 0.0,
    **kwargs,
):
    key_states = repeat_kv(key, module.num_key_value_groups)
    value_states = repeat_kv(value, module.num_key_value_groups)

    ##### HERE
    attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
    #########
    if attention_mask is not None:
        causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
        attn_weights = attn_weights + causal_mask

    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value_states)
    attn_output = attn_output.transpose(1, 2).contiguous()

    return attn_output, attn_weights

RuntimeError: !handles_.at(i) INTERNAL ASSERT FAILED at "/pytorch/c10/cuda/CUDACachingAllocator.cpp":396, please report a bug to PyTorch.

I apologise in advance: since the tensors are quite large, I think it would be difficult to include them here.
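
For reference, here is a minimal standalone sketch of the failing operation using random tensors. The shapes, dtype, and sequence length below are guesses in the style of a Qwen2-sized model, not values captured from the failing run:

# hypothetical repro sketch; shapes, dtype, and device are assumptions,
# not values taken from the failing run
import torch

batch, num_heads, seq_len, head_dim = 1, 28, 8192, 128
scaling = head_dim ** -0.5

query = torch.randn(batch, num_heads, seq_len, head_dim, device="cuda", dtype=torch.bfloat16)
# key_states as it looks after repeat_kv, i.e. already expanded to num_heads
key_states = torch.randn(batch, num_heads, seq_len, head_dim, device="cuda", dtype=torch.bfloat16)

# the call that triggers the internal assert in the report above
attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
torch.cuda.synchronize()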

pip freeze

accelerate==1.6.0
aiohappyeyeballs==2.6.1
aiohttp==3.11.18
aiosignal==1.3.2
asttokens==3.0.0
async-timeout==5.0.1
attrs==25.3.0
bitsandbytes==0.45.5
certifi==2025.4.26
charset-normalizer==3.4.2
-e git+ssh://git@github.com/McLavish/cnlp_icl.git@93c0ce1948b19d9592a94bcdd8793d3c458992eb#egg=cnlp_icl
comm==0.2.2
contourpy==1.3.2
cycler==0.12.1
datasets==3.5.1
debugpy==1.8.14
decorator==5.2.1
dill==0.3.8
exceptiongroup==1.2.2
executing==2.2.0
filelock==3.18.0
fonttools==4.57.0
frozenlist==1.6.0
fsspec==2025.3.0
huggingface-hub==0.30.2
idna==3.10
ipykernel==6.29.5
ipython==8.36.0
jedi==0.19.2
Jinja2==3.1.6
joblib==1.5.0
jupyter_client==8.6.3
jupyter_core==5.7.2
kiwisolver==1.4.8
MarkupSafe==3.0.2
matplotlib==3.10.1
matplotlib-inline==0.1.7
mpmath==1.3.0
multidict==6.4.3
multiprocess==0.70.16
nest-asyncio==1.6.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
packaging==25.0
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==11.2.1
platformdirs==4.3.7
prompt_toolkit==3.0.51
propcache==0.3.1
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==20.0.0
Pygments==2.19.1
pyparsing==3.2.3
python-dateutil==2.9.0.post0
pytz==2025.2
PyYAML==6.0.2
pyzmq==26.4.0
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
scikit-learn==1.6.1
scipy==1.15.2
seaborn==0.13.2
six==1.17.0
stack-data==0.6.3
sympy==1.14.0
threadpoolctl==3.6.0
tokenizers==0.21.1
torch==2.7.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
transformers==4.51.3
triton==3.3.0
typing_extensions==4.13.2
tzdata==2025.2
urllib3==2.4.0
wcwidth==0.2.13
xxhash==3.5.0
yarl==1.20.0

Versions

[pip3] numpy==2.2.5
[pip3] nvidia-cublas-cu12==12.6.4.1
[pip3] nvidia-cuda-cupti-cu12==12.6.80
[pip3] nvidia-cuda-nvrtc-cu12==12.6.77
[pip3] nvidia-cuda-runtime-cu12==12.6.77
[pip3] nvidia-cudnn-cu12==9.5.1.17
[pip3] nvidia-cufft-cu12==11.3.0.4
[pip3] nvidia-curand-cu12==10.3.7.77
[pip3] nvidia-cusolver-cu12==11.7.1.2
[pip3] nvidia-cusparse-cu12==12.5.4.2
[pip3] nvidia-cusparselt-cu12==0.6.3
[pip3] nvidia-nccl-cu12==2.26.2
[pip3] nvidia-nvjitlink-cu12==12.6.85
[pip3] nvidia-nvtx-cu12==12.6.77
[pip3] torch==2.7.0
[pip3] triton==3.3.0
[conda] Could not collect

cc @ptrblck @msaroufim @eqy @jerryzh168

@jbschlosser added the module: cuda, triaged, and module: CUDACachingAllocator labels on May 8, 2025
@jbschlosser (Contributor) commented

Hey @Angramme, is it at all possible to distill this down into a small reproduction script demonstrating the problem (e.g. with random, similarly-sized tensors)? That will help us investigate this. It seems possible to me this is a result of an OOM.

@jbschlosser added the needs reproduction label on May 8, 2025
@Angramme (Author) commented

Hey, sorry for the delay. Yes, this occurs interchangeably with OOM errors. I will try to put together a reproducible example soon.
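
Since the assert shows up interchangeably with OOM errors, one possible way to gather extra context while building the repro (a sketch that assumes the failing call site is editable, not a confirmed diagnosis) is to log allocator state around the matmul:

# hypothetical instrumentation sketch; the commented-out matmul stands in
# for the actual call in eager_attention_forward
import torch

def log_cuda_memory(tag: str) -> None:
    # memory_allocated/memory_reserved are standard torch.cuda APIs; values in bytes
    print(f"[{tag}] allocated={torch.cuda.memory_allocated()} "
          f"reserved={torch.cuda.memory_reserved()}")

log_cuda_memory("before matmul")
# attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
log_cuda_memory("after matmul")

torch.cuda.memory_summary() gives a more detailed per-pool breakdown if the two counters above are not enough.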
