[ROCm] remove caffe2 from hipify by jeffdaily · Pull Request #137157 · pytorch/pytorch · GitHub


[ROCm] remove caffe2 from hipify #137157


Closed
wants to merge 7 commits into from


jeffdaily
Collaborator
jeffdaily commented Oct 2, 2024
  • Remove all "MasqueradingAsCUDA" files and classes.
  • Do not rename "CUDA" classes to "HIP".
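The second bullet concerns how hipify renames identifiers. A toy sketch of mapping-driven renaming, assuming a hypothetical two-entry table (the real table in cuda_to_hip_mappings.py is far larger, and hipify's actual matching logic is more involved): identifiers present in the mapping are rewritten, while a CUDA class name that is absent from it, such as CUDAGuard here, passes through unchanged.

```python
import re

# Hypothetical, heavily simplified mapping; not the real cuda_to_hip_mappings table.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaStream_t": "hipStream_t",
}

# Longest keys first so a shorter key can never shadow a longer one.
_KEYS = sorted(CUDA_TO_HIP, key=len, reverse=True)
# \b anchors ensure only whole identifiers are replaced (e.g. not "mycudaMalloc").
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, _KEYS)) + r")\b")


def hipify_line(line: str) -> str:
    """Replace whole-word CUDA identifiers that appear in the toy mapping."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], line)


if __name__ == "__main__":
    # The class name has no mapping entry, so only the runtime call is renamed.
    print(hipify_line("c10::cuda::CUDAGuard g(dev); cudaMalloc(&p, n);"))
```

Keeping class names out of the table, rather than mapping them to "MasqueradingAsCUDA" variants, is one way to read "do not rename CUDA classes"; the sketch only illustrates that idea.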

NOTE: The corresponding hipify_torch PR, ROCm/hipify_torch#73, also needs to be merged so that the two stay in sync if/when this PR is merged again.

cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @albanD @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov

@jeffdaily requested a review from pruthvistony October 2, 2024 02:23
pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137157

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures, 4 Unrelated Failures

As of commit 765a9fd with merge base bcaa0f5:

NEW FAILURES - The following jobs have failed:

  • inductor / cuda12.1-py3.10-gcc9-sm86 / build (gh)
  • inductor / cuda12.1-py3.12-gcc9-sm86 / build (gh)
  • inductor / cuda12.4-py3.10-gcc9-sm86 / build (gh)
  • inductor / linux-jammy-cpu-py3.12-gcc11-inductor-halide / build (gh)
  • inductor / linux-jammy-cpu-py3.12-gcc11-inductor-triton-cpu / build (gh)
  • inductor / linux-jammy-cpu-py3.9-gcc11-inductor / build (gh)
  • inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / build (gh)
  • inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / build (gh)
  • linux-binary-manywheel / manywheel-py3_9-cuda11_8-test / test (gh)
    RuntimeError: recursive_directory_iterator in used pre-CXX11 binaries, see; ['std::filesystem::recursive_directory_iterator::recursion_pending() const', 'std::filesystem::recursive_directory_iterator::depth() const', 'std::filesystem::recursive_directory_iterator::options() const', 'std::filesystem::recursive_directory_iterator::operator*() const', 'std::filesystem::recursive_directory_iterator::disable_recursion_pending()', 'std::filesystem::recursive_directory_iterator::pop(std::error_code&)', 'std::filesystem::recursive_directory_iterator::pop()', 'std::filesystem::recursive_directory_iterator::pop() [clone .cold]', 'std::filesystem::recursive_directory_iterator::increment(std::error_code&)', 'std::filesystem::recursive_directory_iterator::increment(std::error_code&) [clone .cold]', 'std::filesystem::recursive_directory_iterator::operator=(std::filesystem::recursive_directory_iterator&&)', 'std::filesystem::recursive_directory_iterator::operator=(std::filesystem::recursive_directory_iterator const&)', 'std::filesystem::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::path const&, std::filesystem::directory_options, std::error_code*)', 'std::filesystem::recursive_directory_iterator::recursive_directory_iterator(std::filesystem::path const&, std::filesystem::directory_options, std::error_code*) [clone .cold]', 'std::filesystem::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::recursive_directory_iterator::~recursive_directory_iterator()', 'std::filesystem::recursive_directory_iterator::operator++()', 'std::filesystem::recursive_directory_iterator::operator++() [clone .cold]']
  • linux-binary-manywheel / manywheel-py3_9-cuda12_1-test / test (gh)
    RuntimeError: recursive_directory_iterator in used pre-CXX11 binaries (same symbol list as the manywheel-py3_9-cuda11_8-test failure above)
  • linux-binary-manywheel / manywheel-py3_9-cuda12_4-test / test (gh)
    RuntimeError: recursive_directory_iterator in used pre-CXX11 binaries (same symbol list as the manywheel-py3_9-cuda11_8-test failure above)

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the ciflow/rocm (trigger "default" config CI on ROCm), module: rocm (AMD GPU support for PyTorch), and release notes: sparse (release notes category) labels Oct 2, 2024
@jeffdaily force-pushed the hipify_without_caffe branch from c5758b9 to 8903858 October 2, 2024 18:11
@jeffdaily
Collaborator Author

@albanD, I will need your help with the LOC (lines of code) PR sanity check. In the cuda_to_hip_mappings file we removed two fields from each mapping. All of the changes were made by a script, so nothing was missed, but the result is a large number of changed lines.

@jeffdaily
Collaborator Author

@albanD, the new cuda_to_hip_mappings.py file was generated using this script:

https://gist.github.com/jeffdaily/be8961a5ee180ff2ec3dc2da6db0461f
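The gist's contents are not reproduced in this thread. As a rough, hypothetical illustration of this kind of automated edit, the sketch below drops the last two fields from every mapping entry in a source string. It assumes an entry shape of `("cudaName", ("hipName", FIELD_A, FIELD_B, FIELD_C))` with placeholder field names; the real file's layout and the actual fields removed may differ.

```python
import re

# Hypothetical entry shape (placeholder field names, not the real table):
#     ("cudaName", ("hipName", FIELD_A, FIELD_B, FIELD_C)),
ENTRY = re.compile(
    r'\(\s*"(?P<cuda>[^"]+)"\s*,'   # outer tuple: CUDA name
    r'\s*\(\s*"(?P<hip>[^"]+)"'      # inner tuple: HIP name
    r'(?P<fields>(?:\s*,\s*\w+)*)'   # zero or more bare-identifier fields
    r'\s*\)\s*\)'
)


def drop_trailing_fields(source: str, n: int = 2) -> str:
    """Drop the last n identifier fields from every matching mapping entry."""

    def rewrite(m):
        fields = [f.strip() for f in m.group("fields").split(",") if f.strip()]
        if len(fields) < n:
            return m.group(0)  # too few fields: leave the entry untouched
        kept = fields[: len(fields) - n]
        hip = m.group("hip")
        inner = ", ".join([f'"{hip}"'] + kept)
        if not kept:
            inner += ","  # keep a one-element tuple a tuple
        cuda = m.group("cuda")
        return f'("{cuda}", ({inner}))'

    # Only matched entries are rewritten; comments and layout elsewhere survive.
    return ENTRY.sub(rewrite, source)


if __name__ == "__main__":
    print(drop_trailing_fields('    ("cudaMalloc", ("hipMalloc", FIELD_A, FIELD_B, FIELD_C)),\n'))
```

A text-level rewrite like this keeps surrounding comments and formatting intact, which a parse-and-unparse approach (for example via `ast.unparse`) would not. It also explains why such a change touches nearly every line of a large mapping file.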

@jeffdaily marked this pull request as ready for review October 4, 2024 19:11
@jeffdaily added the topic: not user facing label (topic category) Oct 4, 2024
@albanD
Collaborator
albanD commented Oct 4, 2024

Skipping sanity check since this is generated code. Thanks for the details!

@bdhirsh added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Oct 4, 2024
@cyyever
Collaborator
cyyever commented Oct 5, 2024

@pytorchmergebot merge

@pytorch-bot added the ciflow/trunk label (trigger trunk jobs on your pull request) Oct 5, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@cyyever
Collaborator
cyyever commented Oct 5, 2024

@pytorchmergebot merge -f "Unrelated failures."

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@xw285cornell
Contributor
xw285cornell commented Oct 8, 2024

@pytorchbot revert -m "this is breaking internal where we still use caffe2"

pytorch-bot bot commented Oct 8, 2024

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@xw285cornell
Contributor

@pytorchbot revert -m "this is breaking internal where we still use caffe2" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 8, 2024
This reverts commit 40d8260.

Reverted #137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 (comment)
@pytorchmergebot
Collaborator

@jeffdaily your PR has been successfully reverted.

@@ -65,7 +64,7 @@ def __str__(self):
'preprocess_file_and_save_result', 'compute_stats', 'add_dim3', 'processKernelLaunches', 'find_closure_group',
'find_bracket_group', 'find_parentheses_group', 'replace_math_functions', 'hip_header_magic', 'replace_extern_shared',
Collaborator


The replace_extern_shared definition should be marked as deprecated, and the following lines should be deleted:

# Replace the extern __shared__
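One conventional way to keep a function for backward compatibility while marking it deprecated is to have it emit a DeprecationWarning when called. A minimal sketch, assuming a simplified stand-in for replace_extern_shared (the real function's body is not shown in this thread, and the deprecated decorator here is a hypothetical helper, not an existing hipify API):

```python
import functools
import warnings


def deprecated(reason):
    """Decorator: emit a DeprecationWarning each time the wrapped function is called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller, not this wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("kept only for backward compatibility")
def replace_extern_shared(input_string):
    # Hypothetical stand-in: this version returns its input unchanged.
    return input_string
```

On Python 3.13 and later, warnings.deprecated (PEP 702) provides a built-in decorator with similar runtime behavior that static type checkers also recognize.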

@@ -619,7 +613,7 @@ def is_out_of_place(rel_filepath):
return True


# Keep this synchronized with includes/ignores in build_amd.py
# deprecated
Collaborator


@malfet These were added to keep the BC linter happy. When are we allowed to remove them? Is there a written deprecation policy somewhere?

Contributor

@xw285cornell has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@functionstackx
Contributor

Hi,

This PR, as it currently stands, breaks my internal fp8 MI300X training codebase and breaks ROCm/TransformerEngine :(

cc: @hliuca

Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

Labels
ciflow/inductor, ciflow/rocm, ciflow/trunk, Merged, module: inductor, module: rocm, open source, release notes: sparse, Reverted, skip-pr-sanity-checks, Stale, topic: not user facing, triaged
Projects
None yet
