8000 gh-120452: improve documentation about private name mangling by picnixz · Pull Request #120451 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-120452: improve documentation about private name mangling #120451

8000
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jul 13, 2024

Conversation

picnixz
Copy link
Member
@picnixz picnixz commented Jun 13, 2024

@bedevere-app bedevere-app bot added docs Documentation in the Doc dir skip news awaiting review labels Jun 13, 2024
@picnixz picnixz changed the title [docs] improve documentation about private name mangling gh-120452: improve documentation about private name mangling Jun 13, 2024
@nineteendo
Copy link
Contributor

Maybe it's worth linking to https://docs.python.org/3/tutorial/classes.html#private-variables?

@picnixz
Copy link
Member Author
picnixz commented Jun 13, 2024

Oh great, there are actually TWOTHREE sections. Let me actually check whether I should rather link or not.

@picnixz
Copy link
Member Author
picnixz commented Jun 13, 2024

Ok, there are some imprecision in the paragraph you just linked:

Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped.

And I think I know why I always forget about the "all underscore" case... Because it's not explained here.

@evle-zzz
Copy link

There are also weird situations with import statement, which, I belive, are not described by this.
import __super_module is mangled like _A__super_module
whereas import __super_module.sub remains not mangled.

Also, this explain classes named with underscores, but not identifiers within classes. Like, self.__ = 1 within class with any name will not be mangled.

Additionally it might be nice to explain the goal behind mangling and applicability, but that could be a more debatable topic that require longer discussion.

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

Also, this explain classes named with underscores, but not identifiers within classes. Like, self.__ = 1 within class with any name will not be mangled.

This one can indeed be important since I do not think it's ever been mentioned explicitly (like, the distinction between using the identifier via self. or cls. internally and externally).

Additionally it might be nice to explain the goal behind mangling and applicability, but that could be a more debatable topic that require longer discussion.

This should be done in a separate PR IMO. There is already some kind of motivation where we justify its use for private/protected interfaces (not in those terms though).

import __super_module is mangled like _A__super_module
whereas import __super_module.sub remains not mangled.

Do you mean something like this:

# .
# ├── __haha
# │   ├── hehe.py
# │   └── __init__.py
# └── __haha.py 

class A1:
	import __haha    # fails

class A2:
	import _A2__haha  # impossible to import since '_A2__haha' does not exist

class A3:
	import __haha.hehe  # ok

A3._A3__haha  		# <module '__haha' from '/tmp/__haha/__init__.py'>
A3._A3__haha.hehe 	# <module '__haha.hehe' from '/tmp/__haha/hehe.py'>

class __:
	import __haha   # success

__.__haha  			# <module '__haha' from '/tmp/__haha/__init__.py'>

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

Actually, this highlights the impossibility of importing the __haha module in the scope of a class using an import statement 1 unless the class only consists of underscores! The question I have now is: should I document this behaviour or not? I would put it in the Expressions section but not in the others because this is a quite convoluted case.

Footnotes

  1. You can still do something like __haha = __import__("__haha") (credits to https://github.com/python/cpython/pull/120451#issuecomment-2167522715).

@nineteendo
Copy link
Contributor

this highlights the impossibility of importing the __haha module

It's possible if you really need it:

class A:
    __haha = __import__("__haha")

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

I don't think I will document this hack (but I'll edit my answer) but thank you!

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

FTR (and for future PRs), I'm force-pushing when I'm hotfixing typos in a commit (it does not make sense to review a commit with a known error).

@nineteendo
Copy link
Contributor

Thanks, I always to prefer to know why a force push was necessary.

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

In the end, I documented the hack because it feels a bit weird to say "you cannot do it" without telling you how to do it if you really need to.

Thanks, I always to prefer to know why a force push was necessary.

Yes, and I know that maintainers don't like force-pushing (and I don't like it either when working on Sphinx, it's hard to review and my "reviewed" changes becomes messy).

@evle-zzz
Copy link
evle-zzz commented Jun 14, 2024

There is already some kind of motivation where we justify its use for private/protected interfaces (not in those terms though).

There is, and it's misread and misused to often by newcomers, which is why it would benefit from rephrasing. But I agree, that's way wider topic that clarification that this PR provides.

Do you mean something like this:

Yes, like this.

The question I have now is: should I document this behaviour or not?
There are actually three questions:

  1. Is this an intended behaviour or just a fluke?
  2. Is this part of language specification to be relied on or an implementation detail? It would be bad to document somehing that is not guaranteed.
  3. Should that be known by public?
    Basically I'm ok with any wording if it's not wrong. Initial explanation from PR is not entirely correct, because real behaviour contradicts documentation. Just adding something like "Behaviour might be different for import statements, do not rely on mangling for that" or something like that could be a sufficient solution.

BTW, Murphy from mCoding made a nice video with investigation on how mangling works in different scenarios. I might have missed some cases. Not sure if YouTube links are appropriate here, but should be searchable by "Every Python dev falls for this (name mangling)" name.

@nineteendo
Copy link
Contributor
  1. Is this an intended behaviour or just a fluke?
  2. Is this part of language specification to be relied on or an implementation detail?

Yes, __import__("__haha") works the same as getattr(self, "__haha"), and I wouldn't expect anything different:

class A:
    def __init__(self, haha) -> None:
        setattr(self, "__haha", haha)

    @property
    def haha(self):
        return getattr(self, "__haha")

print(A(123).haha) # 123

But that is also not documented here.

@picnixz
Copy link
Member Author
picnixz commented Jun 14, 2024

Is this an intended behaviour or just a fluke?

The way I see it still makes partially sense but this is subject to interpretation. When you do a import __spam, I assume that the parser first changes the name and tries to import it, which is correct according to the specs (transformation is done before code is emitted, this is what is being written). On the other hand, when you do import __spam.util, it feels to me that it's a bit inconsistent with what is being said. I would expect __spam being changed into _Ham__spam and then import _Ham__spam.util which should fail.

@JelleZijlstra since you have quite a good knowledge of compiler-related stuff, do you have an explanation for the second behaviour?

Is this part of language specification to be relied on or an implementation detail? It would be bad to document somehing that is not guaranteed.

I don't think it's an implementation detail. The only implementation detail is when the name is longer than 255 characters and its something that is in the docs already.

Basically I'm ok with any wording if it's not wrong. Initial explanation from PR is not entirely correct

That's what I added in the expressions section (so the one that should serve as the formal definition).

But that is also not documented here.

Actually, it's documented below in the Classes section (at the end of the paragraph). But I may move that one into the formal definition as well:

Notice that code passed to exec() or eval() does not consider the classname of
the invoking class to be the current class; this is similar to the effect of
the global statement, the effect of which is likewise restricted to code that is
byte-compiled together. The same restriction applies to getattr(), setattr() and
delattr(), as well as when referencing __dict__ directly.

@evle-zzz
Copy link
  1. Is this an intended behaviour or just a fluke?
  2. Is this part of language specification to be relied on or an implementation detail?

Yes, __import__("__haha") works the same as getattr(self, "__haha"), and I wouldn't expect anything different:

I'm talking about mangling rules as a whole, not __import__ trick, this one is obvious.
I just doubt that anyone actually paid much attention to what would happen to import statements with double undrescores inside a class because of mangling to make this behaviour deliberate: it's an extremely weird scenario. As soon as it documented, we are forcing future parser changes to comply with this behaviour (or thinking about changing documentation again).

On the other hand, when you do import __spam.util, it feels to me that it's a bit inconsistent with what is being said.

I have a hunch that in this case whole string is sent into mangling procedure and a dot prevents further actions silently. I'm too lazy to go ahead and check though. :-)

@JelleZijlstra
Copy link
Member

I'll have to look at this later. @evle-zzz is probably right that the interaction with import may have been partly accidental, but backward compatibility probably means we can't change it now, so we may have to document and test the current behavior.

@bedevere-app
Copy link
bedevere-app bot commented Jun 15, 2024

Thanks for making the requested changes!

@JelleZijlstra: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from JelleZijlstra June 15, 2024 15:42
@picnixz
Copy link
Member Author
picnixz commented Jul 12, 2024

@JelleZijlstra Friendly ping for reviewing this one again (or drop the PR)

picnixz and others added 3 commits July 12, 2024 17:08
@picnixz
Copy link
Member Author
picnixz commented Jul 13, 2024

The journey was long but I think we eventually managed to effictively improve the documentation. And I'd bet that I'll forget about it next time I'll face such issue :D

@JelleZijlstra JelleZijlstra merged commit f4d6e45 into python:main Jul 13, 2024
25 checks passed
@miss-islington-app
Copy link

Thanks @picnixz for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 13, 2024
…ythonGH-120451)

(cherry picked from commit f4d6e45)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@bedevere-app
Copy link
bedevere-app bot commented Jul 13, 2024

GH-121715 is a backport of this pull request to the 3.13 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 13, 2024
…ythonGH-120451)

(cherry picked from commit f4d6e45)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jul 13, 2024
@bedevere-app
Copy link
bedevere-app bot commented Jul 13, 2024

GH-121716 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Jul 13, 2024
@picnixz picnixz deleted the docs-mangling branch July 13, 2024 14:51
JelleZijlstra added a commit that referenced this pull request Jul 13, 2024
…H-120451) (#121716)

gh-120452: improve documentation about private name mangling (GH-120451)
(cherry picked from commit f4d6e45)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
JelleZijlstra added a commit that referenced this pull request Jul 13, 2024
…H-120451) (#121715)

gh-120452: improve documentation about private name mangling (GH-120451)
(cherry picked from commit f4d6e45)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation in the Doc dir skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0