[symbolic shapes] Log SymNode id for provenance #146532
Conversation
@pytorchbot merge

Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 2 jobs have failed; the first few of them are: inductor / unit-test / cuda12.4-py3.10-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / unit-test / cuda12.4-py3.13-gcc9-sm86 / test (inductor, 1, 2, linux.g5.4xlarge.nvidia.gpu). Details for Dev Infra team: Raised by workflow job.
Using a custom logger so that we can store our own buffer to dedup logs that look the same. The schema for deduping is as follows:

```python
if key == "missing_fake_kernel":
    return hash((key, data["op"]))  # Same ops get deduped
elif key == "mismatched_fake_kernel":
    return hash((key, data["op"], data["reason"]))  # Same op and reason for errors get deduped
elif key == "propagate_real_tensors":
    return hash((key, json.dumps(data["stack"])))  # Guards appearing on the same stacktrace get deduped
elif key == "create_unbacked_symbol":
    return hash((key, json.dumps(data["stack"])))  # Unbacked symbols appearing on the same stacktrace get deduped
```

Notably, guards appearing on the same stacktrace get deduped. This is because in some PT2I models a piece of code that creates a new unbacked symint and then hits a data-dependent error (DDE) gets called 800 times, producing 800 new symints and 800 propagate_real_tensor errors that are all the same expression. This is hard to read through, so we deduplicate it. The downside is that if multiple DDEs occur on the same stacktrace, only the first issue is shown.

Pull Request resolved: #146533
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #146532
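For illustration only, here is a minimal sketch of how such a dedup buffer could be wired up; `DedupLogger`, `_dedup_key`, and the logger name are hypothetical, not the actual PyTorch implementation:

```python
import json
import logging

logger = logging.getLogger("export_warnings")  # hypothetical logger name


class DedupLogger:
    """Sketch of a logger wrapper that drops structured logs it has seen before."""

    def __init__(self):
        self._seen = set()  # hashes of entries already emitted

    def _dedup_key(self, key, data):
        # Same hashing schema as described above.
        if key == "missing_fake_kernel":
            return hash((key, data["op"]))
        elif key == "mismatched_fake_kernel":
            return hash((key, data["op"], data["reason"]))
        elif key in ("propagate_real_tensors", "create_unbacked_symbol"):
            return hash((key, json.dumps(data["stack"])))
        return hash((key, json.dumps(data, sort_keys=True)))

    def log(self, key, data):
        h = self._dedup_key(key, data)
        if h in self._seen:
            return  # an equivalent entry was already logged; skip it
        self._seen.add(h)
        logger.warning("%s: %s", key, json.dumps(data))
```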
Added some additional logging so we can also run tlparse on generic export errors.

Pull Request resolved: #146534
Approved by: https://github.com/pianpwk
ghstack dependencies: #146532, #146533
Added a utility function for capturing the user stack and the framework stacktrace.

Pull Request resolved: #146858
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #146532, #146533, #146534
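As a rough illustration of what such a utility might look like, here is a minimal sketch; splitting "user" from "framework" frames by a path prefix is a hypothetical heuristic, not the code from #146858:

```python
import traceback


def capture_stacks(framework_prefix="torch/"):
    """Capture the current stack and split it into user and framework frames.

    Simplified sketch: frames whose filename contains ``framework_prefix``
    are treated as framework frames, everything else as user code.
    """
    frames = traceback.extract_stack()[:-1]  # drop this helper's own frame
    user_stack = [f for f in frames if framework_prefix not in f.filename]
    framework_stack = [f for f in frames if framework_prefix in f.filename]
    return user_stack, framework_stack


if __name__ == "__main__":
    user, framework = capture_stacks()
    print("user frames:", [f"{f.filename}:{f.lineno}" for f in user])
    print("framework frames:", len(framework))
```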
We can use the SymNode id to point us back to how previous expressions were created, and construct this nice tree in tlparse:

<img width="761" alt="image" src="https://github.com/user-attachments/assets/531b03e8-4398-4d0a-bd11-16078256041c" />

Pull Request resolved: #146532
Approved by: https://github.com/bobrenjc93
Pull Request resolved: #146859
Approved by: https://github.com/pianpwk
ghstack dependencies: #146532, #146533, #146534, #146858
Stack from ghstack (oldest at bottom):
We can use the SymNode id to point us back to how previous expressions were created, and construct this nice tree in tlparse:

<img width="761" alt="image" src="https://github.com/user-attachments/assets/531b03e8-4398-4d0a-bd11-16078256041c" />
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv
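To make the provenance idea concrete, here is a hedged sketch of how a per-node id could be used to rebuild a derivation tree from logged records; `SymNodeLog`, its field names, and `render_tree` are made up for illustration and are not the tlparse schema:

```python
from dataclasses import dataclass, field


@dataclass
class SymNodeLog:
    """One logged SymNode creation: its id, the expression it represents,
    and the ids of the nodes it was derived from (illustrative fields only)."""
    node_id: int
    expr: str
    arg_ids: list = field(default_factory=list)


def render_tree(logs, root_id):
    """Print how ``root_id`` was derived, following arg ids back to earlier nodes."""
    by_id = {log.node_id: log for log in logs}

    def walk(node_id, indent):
        node = by_id[node_id]
        print(" " * indent + f"{node.node_id}: {node.expr}")
        for arg in node.arg_ids:
            if arg in by_id:
                walk(arg, indent + 2)

    walk(root_id, 0)


# Example: u0 and u1 are unbacked symbols; "u0 + u1" points back to both,
# so the rendered tree shows where each piece of the expression came from.
logs = [
    SymNodeLog(1, "u0"),
    SymNodeLog(2, "u1"),
    SymNodeLog(3, "u0 + u1", arg_ids=[1, 2]),
]
render_tree(logs, 3)
```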