[ROCm] fix torch.layer_norm invalid configuration problem when input is large tensor by hongxiayang · Pull Request #144007 · pytorch/pytorch · GitHub

Merged

Conversation

hongxiayang (Collaborator) commented Dec 31, 2024

Fixes #136291

This PR fixes the "invalid configuration argument" error that occurs on ROCm when torch.layer_norm is called on a large input tensor.

 File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/functional.py", line 2573, in layer_norm
    return torch.layer_norm
RuntimeError: HIP error: invalid configuration argument
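For context, a minimal reproducer along the lines of the one in issue #136291 (this sketch is illustrative, not copied verbatim from the issue):

```python
import torch
import torch.nn.functional as F

# Shape from issue #136291; layer_norm over the last dimension.
x = torch.randn(16, 3000, 3000, 16, device="cuda", requires_grad=True)
y = F.layer_norm(x, normalized_shape=(16,))
# Before this fix, the kernel launch on ROCm failed with
# "HIP error: invalid configuration argument" for inputs this large.
y.sum().backward()
```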

After investigation, I found the reason this error happens: the AMD compute language runtime checks whether gridDim.x * blockDim.x is greater than std::numeric_limits<uint32_t>::max(). If it is, the launch errors out with the "invalid configuration argument" message.
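To see why this input trips that check, here is the back-of-the-envelope arithmetic, assuming one block per normalized row and an illustrative block size of 256 threads (the actual launch parameters of vectorized_layer_norm_kernel may differ):

```python
# Rows being normalized for input [16, 3000, 3000, 16] with
# normalized_shape=(16,): all leading dimensions are flattened.
M = 16 * 3000 * 3000      # 144,000,000 rows
block_x = 256             # illustrative threads per block
grid_x = M                # assume one block per row

print(grid_x * block_x)   # 36,864,000,000
print(2**32 - 1)          # 4,294,967,295 -> the launch is rejected
```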

The fix is to split the whole task into several chunks so that no single kernel launch triggers the failure condition, as sketched below. This ensures correctness and completeness given the current kernel implementation logic of vectorized_layer_norm_kernel.
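A sketch of the chunking idea, written in Python for clarity (the actual fix lives in the HIP/CUDA launch code; launch_kernel and row_offset are hypothetical names):

```python
UINT32_MAX = 2**32 - 1

def launch_in_chunks(num_rows, block_x, launch_kernel):
    # Largest grid size per launch such that gridDim.x * blockDim.x
    # stays within uint32 range, avoiding the runtime check above.
    max_blocks_per_launch = UINT32_MAX // block_x
    start = 0
    while start < num_rows:
        blocks = min(max_blocks_per_launch, num_rows - start)
        # Each chunk processes rows [start, start + blocks).
        launch_kernel(grid_x=blocks, block_x=block_x, row_offset=start)
        start += blocks
```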

Also added a large-tensor layer_norm unit test, test_layer_norm_large_tensor, with the same shape [16, 3000, 3000, 16] as the one used in issue #136291, so that the test can check the expected output values and ensure correctness.
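For illustration, a hypothetical check in the spirit of that test (the actual test_layer_norm_large_tensor in the PR may be written differently):

```python
import torch
import torch.nn.functional as F

x = torch.ones(16, 3000, 3000, 16, device="cuda")
y = F.layer_norm(x, normalized_shape=(16,))
# A constant input has zero variance per row, so layer_norm maps it
# to all zeros; that is cheap to verify even at this size.
assert torch.equal(y, torch.zeros_like(y))
```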

Future work may include performance optimization of layer_norm and integration of CK layer_norm.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @naromero77amd

On Dec 31, 2024, pytorch-bot added labels: ciflow/rocm (Trigger "default" config CI on ROCm), module: rocm (AMD GPU support for Pytorch), release notes: cuda (release notes category).
pytorch-bot bot commented Dec 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144007

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b67eb8b with merge base 2966fb3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

eqy (Collaborator) left a comment

Please add a test for this case

hongxiayang (Collaborator, Author) commented Dec 31, 2024

> Please add a test for this case

Will do. Added a large tensor test.

hongxiayang (Collaborator, Author) commented

@eqy @malfet: All checks are green now. Could you help merge this PR? Thanks.

eqy (Collaborator) commented Jan 7, 2025

@pytorchmergebot merge

On Jan 7, 2025, pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request).
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/rocm, ciflow/trunk, Merged, module: rocm, open source, release notes: cuda
Development

Successfully merging this pull request may close these issues.

On AMD GPUs (ROCm 5.7-6.2), cannot backpropagate loss tensor containing more than 2e8 elements