[symbolic shapes] Log SymNode id for provenance by angelayi · Pull Request #146532 · pytorch/pytorch · GitHub

[symbolic shapes] Log SymNode id for provenance #146532


Closed
angelayi wants to merge 5 commits

Conversation

angelayi (Contributor) commented Feb 5, 2025

pytorch-bot bot commented Feb 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146532

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 5 Unrelated Failures

As of commit 92c0f4a with merge base 6818945:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

angelayi requested a review from bobrenjc93 February 10, 2025 22:28

We can use the SymNode id to point us back to how previous expressions were created, and construct this nice tree in tlparse:
<img width="761" alt="image" src="https://github.com/user-attachments/assets/531b03e8-4398-4d0a-bd11-16078256041c" />


cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv

[ghstack-poisoned]
angelayi (Contributor, Author) commented Feb 11, 2025

@pytorchbot merge

pytorch-bot bot added the ciflow/trunk label Feb 11, 2025
pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator) merged this pull request.

Pull Request resolved: #146532
Approved by: https://github.com/bobrenjc93
pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2025
Using a custom logger so that we can store our own buffer to dedup logs that look the same. The schema for deduping is as follows:

```python
if key == "missing_fake_kernel":
    return hash((key, data["op"]))  # Same ops get deduped
elif key == "mismatched_fake_kernel":
    return hash((key, data["op"], data["reason"]))  # Same op and reason for errors get deduped
elif key == "propagate_real_tensors":
    return hash((key, json.dumps(data["stack"])))  # Guards appearing on the same stacktrace get deduped
elif key == "create_unbacked_symbol":
    return hash((key, json.dumps(data["stack"])))  # Unbacked symbols appearing on the same stacktrace get deduped
```

Notably, guards appearing on the same stacktrace get deduped. This is because there are cases in PT2I models where a piece of code that creates a new unbacked symint and then runs into a data-dependent error (DDE) gets called 800 times, creating 800 new symints and 800 propagate_real_tensor errors that are all the same expression. This is hard to read through, so we just deduplicate it.

The downside is that if multiple DDEs occur on the same stacktrace, we will only show the first one.
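As a rough illustration of what a custom logger with its own dedup buffer can look like (the class name `DedupLogger` and its interface are made up for this sketch and are not the actual torch.export logger):

```python
import json
import logging

class DedupLogger:
    """Illustrative logger that keeps a buffer of dedup keys and drops repeats."""
    def __init__(self, logger=None):
        self._logger = logger or logging.getLogger(__name__)
        self._seen = set()  # buffer of dedup hashes emitted so far

    def _dedup_key(self, key, data):
        # Mirrors the schema above: same op / same (op, reason) / same stacktrace.
        if key == "missing_fake_kernel":
            return hash((key, data["op"]))
        elif key == "mismatched_fake_kernel":
            return hash((key, data["op"], data["reason"]))
        elif key in ("propagate_real_tensors", "create_unbacked_symbol"):
            return hash((key, json.dumps(data["stack"])))
        return None  # unknown keys are never deduped

    def log(self, key, data):
        dedup = self._dedup_key(key, data)
        if dedup is not None:
            if dedup in self._seen:
                return  # identical entry already logged; skip it
            self._seen.add(dedup)
        self._logger.warning("%s: %s", key, json.dumps(data))

# Usage: the second call is dropped because the op is the same.
log = DedupLogger()
log.log("missing_fake_kernel", {"op": "my_ns::my_op"})
log.log("missing_fake_kernel", {"op": "my_ns::my_op"})
```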
Pull Request resolved: #146533
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #146532
pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2025
Added some additional logging so we can also run tlparse on generic export errors
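The actual change lives in torch.export's internal logging; purely as an illustration of the pattern (emit one machine-readable record when export fails so a trace viewer such as tlparse has an artifact to render), a generic sketch might look like the following. `structured_log`, `export_with_logging`, and the payload fields are assumptions for the example.

```python
import json
import logging
import traceback

log = logging.getLogger("export_errors")

def structured_log(name, payload):
    # One JSON object per line so downstream tooling can parse the log.
    log.error(json.dumps({"artifact": name, **payload}))

def export_with_logging(export_fn, *args, **kwargs):
    try:
        return export_fn(*args, **kwargs)
    except Exception as exc:
        # Record the failure in a structured form before re-raising.
        structured_log("export_failure", {
            "error": repr(exc),
            "stack": traceback.format_exc().splitlines(),
        })
        raise
```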

Pull Request resolved: #146534
Approved by: https://github.com/pianpwk
ghstack dependencies: #146532, #146533
pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2025
Added a utility function for capturing the user stack and framework stacktrace.
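A minimal sketch of what such a utility can look like using only the standard `traceback` module; the path-prefix heuristic and the name `capture_stacks` are assumptions for illustration, not the helper actually added in #146858.

```python
import traceback

def capture_stacks(framework_prefix="torch/"):
    """Return (user_stack, framework_stack) for the current call site."""
    frames = traceback.extract_stack()[:-1]  # drop this helper's own frame
    user_stack = [f for f in frames if framework_prefix not in f.filename]
    framework_stack = [f for f in frames if framework_prefix in f.filename]
    return (
        traceback.format_list(user_stack),
        traceback.format_list(framework_stack),
    )

# Usage: call it where an error or guard is recorded, then attach both
# stacks to the structured log entry.
user_stack, framework_stack = capture_stacks()
```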

Pull Request resolved: #146858
Approved by: https://github.com/bobrenjc93
ghstack dependencies: #146532, #146533, #146534
Raymo111 pushed commits that referenced this pull request Feb 20, 2025
Ryo-not-rio pushed commits to Ryo-not-rio/pytorch that referenced this pull request Feb 24, 2025
majing921201 pushed commits to majing921201/pytorch that referenced this pull request Mar 4, 2025
github-actions bot deleted the gh/angelayi/66/head branch March 23, 2025 02:17
Labels
ciflow/inductor · ciflow/trunk · fx · Merged · release notes: fx
4 participants