-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: allow importlib.LazyLoader to work with numpy and add test of this #22045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dschult. The numpy/lib/
changes seem more or less fine, the numpy/core/
ones less so.
This is yet again a pretty bad importlib
bug it looks like, it is soooo buggy as a whole. I think we can get this merged, but it's best not to also use LazyLoader
at all in networkx
. Do you use it with scipy
as well right now? I'd expect problems there too, based on this diff.
@@ -75,8 +75,10 @@ | |||
from . import fromnumeric | |||
from .fromnumeric import * | |||
from . import defchararray as char | |||
from . import records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already imported in the line below, exposing it under two names seems wrong. Better to adjust the calling code like np.core.records
-> np.core.rec
(or a direct import and not use the fully-qualified name at all, like for _byteorderconv
in ma/mrecords.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worried about the two names also, rec
and records
. But the standard import exposes two names, and while the numpy __init__
code mostly uses records, the rec
is included in an __all__
so that would be a public API change. All we're doing is making the lazyloader version have the same names as the standard version. If we don't provide both names then we're breaking current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the difference is that we have np.rec
but not np.records
. np.rec
is the public namespace; everything else is private anyway. I checked that this change does not cause np.records
to appear, so we should be okay with this.
numpy/core/__init__.py
Outdated
from . import records as rec | ||
from .records import record, recarray, format_parser | ||
from . import memmap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.core.memmap
exists already: np.core.memmap is np.memmap
. So this is not correct and could break some existing code possibly (ideally no one should use np.core.<xxx>
, but they may be doing that today).
from . import npyio | ||
from . import arrayterator | ||
from . import arraypad | ||
from . import _version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There aren't really harmful imports to add, because they're already public-ish today; we should clean this up later though. Again no one should be using these namespaces directly; I wonder if the LazyLoader
issue can also be fixed by not using the fully-qualified names elsewhere?
I think using LazyLoader within a single package is probably just fine. You have control over how it is used there. But NetworkX went beyond the bounds of our package and that caused the trouble. We have backed that out -- it was only numpy and not for very long... Luckily it requires many factors before it shows its head. Perhaps those of us interested in lazyloading should take a look at |
That makes sense, thanks for the context. And +1 for having a look at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more comments on the test, then it should be ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a few places we run such tests as a subprocess to avoid reloading, etc. (e.g. the test which tests that warning). But since this works, I don't mind it and I think this is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, in it goes. Thanks @dschult
Python's LazyLoader tool from importlib, does not load a submodule name to the local namespace. This differs from a standard import. It is mentioned in PEP 650 as a difference. Indeed, it makes sense that
from .type_check import *
would load the names within thetype_check
module and not load the nametype_check
. But the standard import does that and is unlikely to change soon.The difference in treatment of submodule names between standard and lazy imports is not a concern unless the package being imported depends on its submodule names being available without explicitly importing them. Even then it is only a problem if the user tries to import from a subpackage without first importing the package.
Currently when a user LazyLoads numpy (or any package the user imports LazyLoads numpy) and imports a subpackage as the first access to numpy that triggers a load, a NameError is raised. E.g.
from numpy.lib import recfunctions
works for standard imports, but not if the NumPy package has been lazyloaded and the lazyload has not been triggered. Addingimport numpy
before thefrom ...
subpackage import solves the trouble.See networkx/networkx#5838 for more discussion.
This PR allows numpy to be loaded with the
importlib.LazyLoader
tools by explicitly importing submodule names where needed. There are only a few places where numpy does this, most notably innumpy.lib.__init__.py
.A test is also added to check that this works. The test does not check every submodule, so in the future, if some code is added that expects a submodule name to exist in the local namespace without explicitly importing it, that will potentially break the lazyloading/subpackage-access-as-1st-access capability for that submodule. I'm not sure it is worth testing every submodule, but we could attempt to do that.
The test manually changes
sys.modules
(and restores it in aFinally
clause) because we must mimic conditions wherenumpy
has not been previously imported. Changingsys.modules
always makes me nervous, but I don't see another way to test this. We could leave it untested too -- especially since this test is not checking every subpackage anyway.I'd love for this issue to be fixed by Python's builtin
importlib
. But the PEP indicates it is a known difference between the import systems. We are likely going to need to point out the difficulty that this difference creates. But in the meantime, it seems reasonable to make sure our code explicitly imports the submodule names when it uses them.