8000 [3.9] bpo-43048: RecursionError traceback RecursionError bugfix for cpy3.9 (GH-24460) by vlad17 · Pull Request #24460 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

[3.9] bpo-43048: RecursionError traceback RecursionError bugfix for cpy3.9 (GH-24460) #24460

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

vlad17
Copy link
@vlad17 vlad17 commented Feb 6, 2021

When traceback is formatting a RecursionError with a long __context__ chain it runs the risk of itself raising a RecursionError. This patch catches the corresponding recursive call's RecursionError, truncates the traceback and emits a warning instead.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@vlad17

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@vlad17 vlad17 changed the title bpo-43048: RecursionError traceback RecursionError bugfix for cpy3.9 (GH-24460) [3.9] bpo-43048: RecursionError traceback RecursionError bugfix for cpy3.9 (GH-24460) Feb 7, 2021
f()
except Exception:
exc_info = sys.exc_info()
traceback.format_exception(exc_info[0], exc_info[1], exc_info[2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purpose of testing the traceback methods it would be ok to unindent the traceback.format_exception(exc_info[0], exc_info[1], exc_info[2]) line so that it's not happening in the except block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Lib/traceback.py Outdated
@@ -465,6 +465,9 @@ class TracebackException:
- :attr:`msg` For syntax errors - the compiler error message.
"""

def hello(self, exc_value):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comment here, the surprising fact that this seemingly no-op function is necessary for tests to pass is what tells me there's a bug in whatever mechanism is used to clear the tstate->recursion_headroom flag.

Solving this seems well out of scope for this PR. (I just pushed this so that you could see what was going on.) I'll remove this no-op hello as well as the problematic test case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you create a standalone script to show what happened, and report that as a separate issue?

I think what happened here was that when format_exception is called in the except block, the recursion limit was already reached, so things go haywire. I don't think it's an exponential blowup in traversing the context and cause links.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I got it wrong - it was in the except after the recursion returned. I haven't tried running it yet, will try to look into it tomorrow.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if you check out 8550dbdd53b87dde1f32288f250507bc43b68130 then without the call to hello() there's a fatal error regardless of the indentation of format_exception() in the test case.

I tried for a bit to make a minimum example script but wasn't able to replicate. I agree there's no exponential blowup (I was mistaken in my long comment in the other thread) because the seen set prevents revisiting nodes in the implicit binary tree I was imagining. Will try some more later.

I still have no explanation as to why the headroom flag isn't getting cleared when we pop out of the recursion error that happens inside the TracebackException constructor (i.e., the second recursion error in the unit test overall in time).

Lib/traceback.py Outdated
'Warning: the above exception was the cause of other '
'exceptions, but those were truncated to avoid '
'stack overflow in the exception handler. '
'See https://bugs.python.org/issue43048 for details.\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct - you truncated the cause of what you emit, not the other way around.

Also, I think this should be more concise, something like:

yield "(chained exceptions have been truncated to avoid deep recursion):"

(and certainly not a link to the bpo issue).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, was looking for a more concise way of putting it. I did something along those lines.

…ssages

There seems to be another bug that's out-of-scope for this PR, which is handling the tstate->recursion_headroom clearing that's needed for the cause_context test case to pass. I fixed this temporarily by calling a no-op function which I think allows cpython to notice that the stack has been unwound, but it's a convoluted hack and is better addressed head-on by a fix in the ceval code.
@vlad17
Copy link
Author
vlad17 commented Feb 8, 2021 via email

Lib/traceback.py Outdated
@@ -476,34 +476,50 @@ def __init__(self, exc_type, exc_value, exc_traceback, *, limit=None,
_seen.add(id(exc_value))
# Gracefully handle (the way Python 2.4 and earlier did) the case of
# being called with no type or value (None, None, None).
self.__suppress_context__ = False
self._truncated = False
if (exc_value and exc_value.__cause__ is not None
and id(exc_value.__cause__) not in _seen):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If recursion into constructing cause exceeded the recursion limit then surely the call to construct context will too. Why do we need to try blocks?

@vlad17
Copy link
Author
vlad17 commented Feb 8, 2021 via email

@@ -0,0 +1 @@
When traceback is formatting a RecursionError with a long __context__ chain it runs the risk of itself raising a RecursionError. This patch catches the corresponding recursive call's RecursionError, truncates the traceback and emits a warning instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ambiguous because there is also traceback formatting code in the interpreter. Also probably too long, and mentions context but not cause. This is probably enough to indicate the impact (and there are always more details in the bpo):

Handle RecursionErrors in :class:~traceback.TracebackException's constructor, so that long exceptions chains are truncated instead of causing traceback formatting to fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. thanks for the patience!

Copy link
Member
@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice and simple.

@iritkatriel
Copy link
Member
9E81

Thanks. You'll need to sign a CLA as well.

@vlad17
Copy link
Author
vlad17 commented Feb 25, 2021

@iritkatriel should this PR have "awaiting core review" as a label?

@iritkatriel
Copy link
Member

@iritkatriel should this PR have "awaiting core review" as a label?

Yes, it needs to be approved by a core developer.

@github-actions
Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Mar 28, 2021
@ambv ambv closed this Jul 12, 2021
@ambv ambv reopened this Jul 12, 2021
@ambv
Copy link
Contributor
ambv commented Jul 12, 2021

Closed and re-opened to trigger CI.

@ambv ambv merged commit 489c273 into python:3.9 Jul 12, 2021
@bedevere-bot
Copy link

@ambv: Please replace # with GH- in the commit message next time. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0