8000 gh-83151: Make closure work on pdb by gaogaotiantian · Pull Request #111094 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-83151: Make closure work on pdb #111094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 6, 2024
Merged

Conversation

gaogaotiantian
Copy link
Member
@gaogaotiantian gaogaotiantian commented Oct 20, 2023

Closure on pdb has been an issue for years, and is often considered "to difficult to fix". PEP709 alleviated the issue by inlining all list, dict and set comprehensions, but the fundamental issue is still there - you can easily reproduce the issue with lambda functions ((lambda: x)()) or generators (any(x for x in lst if x != z)). Now that pdb supports multi-line statement, you can even create your own function scope. Neither of them work in current pdb.

The related (open) issues I can find are #83151, #80052, #76987, #70260 and #65360.

This PR fixes all the issues mentioned above, while keep (almost?) all the current correct behaviors.

Boiler alert, black magic involved.

The fundamental reason for the issue above, is that when python compiles code that generates nested functions, the inner function has no idea about the context. For example:

def f():
    z = 1
    any([x for x in range(2) if x != z])

The code above works fine because python compiles any([x for x in range(2) if x != z]) with the awareness that z is a local variable of f, so closure should be used. However, in pdb, if you execute/compile any([x for x in range(2) if x != z]), there's no way for Python to know that z is an outer local variable and should be passed as a cell variable.

So, the solution is - trick the compiler.

What we can do, is to force compiler to compile the code we need in a closure-awareness environment. Take the code above as an example, instead of compiling any([x for x in range(2) if x != z]), we compile the following code:

def outer():
    z = None
    def __pdb_scope():
        nonlocal z
        any([x for x in range(2) if x != z])

Then we get the code object of outer.__pdb_scope, and that is the code object we actually want - it considers z as a free variable. Then, we pass the value of z with closure argument of exec:

exec(code, globals, locals, closure=(types.CellType(1),))

The solution is not done, as we still need to write the local variables back (locals won't be changed as all the variables are considered freevar). Also, it's possible for the code itself to create new variables and we want that in locals as well (k=1 for example). So, we just write all the local variables in the scope back by adding some new code at the end of the code.

To keep the original behavior as close as possible, if any exception is raised in the new method, we fallback to the original simple exec solution (it might be a valid exception, but we will always have the same exception as before).

Why not ...

  • just combine globals() and locals() as the global namespace?
    • result of globals() would be wrong
    • result of global x; print(x) would be wrong if x appears in both global and local namespace (if local is preferred, same issue if global is preferred).
    • impossible to do writeback correctly
  • use the new method only?
    • the new method will give different exception on some cases (UnboundLocalError vs NameError)
    • the new method does not work well with global x; print(x) as x would be defined as both global and nonlocal (it will fallback to the original method and work)

This is not the perfect solution, but it's pretty close. It passed all the existing tests and fixed all the cases in the issues mentioned above. I have not came up with an example that does not work with this solution (I would guess there will be some dark corners).

Yes, I know this is complicated, and it kind of depends on the implementation of CPython, but I still believe we should do this, because this issue has been repeatedly raised by users of pdb for probably > 10 years.

@gvanrossum
Copy link
Member

Great work! I'm not sure I have the time (or even, at this point, the context) to review this. Maybe someone else can do this more justice?

@gaogaotiantian
Copy link
Member Author

Great work! I'm not sure I have the time (or even, at this point, the context) to review this. Maybe someone else can do this more justice?

Oh I invited you because you reopened #65360 (comment) and this fix involves 8000 some black magic of Python. But no worries I'm sure I can find great reviewers for feedbacks of the PR (a couple invited already)!

Copy link
Member
@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been playing around with this locally for a while, and it seems like it works really well! Nice job.

Here's a first round of review. This is a tricky task, but I think the current solution may be overcomplicating it in places:

@brandtbucher brandtbucher self-assigned this May 4, 2024
@gaogaotiantian
Copy link
Member Author

I just realized this might have some conflict with PEP 667 which will probably be merged today. I will rework the code. Also taking the notes from your reviews. I will also hopefully finish this tomorrow.

@gaogaotiantian
Copy link
Member Author

I had to make quite a bit of changes to make this work with PEP 667 - locals() is not write-through anymore for function scope. Neither can it do pop(). So a couple of things were changed:

  1. locals is copied first and the writeback only happens at the end of the function.
  2. Whether there is nested function is checked first, if not, use the normal exec. This actually solves plenty of interesting issues like locals() (which will print the extra helper variables without the protection).
  3. Some of the reviews are applied.
  4. We can't use locals() to pass around data anymore, so everything we need is now stored in a dict __pdb_eval__.

This relies on the fix from #118583 so please do not merge it yet. I applied the change from that issue just to make the tests pass. I'll merge it in this PR when that one is merged in main.

Copy link
Member
@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this is starting to look. Some more thoughts:

gaogaotiantian and others added 2 commits May 5, 2024 21:12
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
@gaogaotiantian
Copy link
Member Author

We need to put the other exec in the try .. except block as well. The newly added test case is for that. Basically if we have a global variable x and we do

global x; (lambda: x)()

We should be able to get the value. But compiling this with our instrumentation would be:

nonlocal x
global x; (lambda: x)()

and returns an error that's confusing to users - *** SyntaxError: name 'g' is nonlocal and global

That's why we need to fall back to normal.

Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
Copy link
Member
@brandtbucher brandtbucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this seems correct! Thanks for tackling such a longstanding and hairy issue, and for taking the time to walk me through the fix. :)

@brandtbucher brandtbucher merged commit e5353d4 into python:main May 6, 2024
35 checks passed
@gaogaotiantian gaogaotiantian deleted the pdb-closure branch May 6, 2024 19:41
@gaogaotiantian
Copy link
Member Author

I'm glad this finally got fixed :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0