Don't import pkg_resources unless we need to parse dev version numbers. #19102
LGTM.
Thank you for the PR @anntzer !
# setuptools not installed
parse_version = LooseVersion  # type: ignore


def parse_version(v):
We can have a small test to make sure that parse_version works with versions that have dev version numbers.
sure
a7ccf62 to 9d73b5f
Given that distutils is getting deprecated, we most likely can have a simple version comparison function for version strings that contain only the characters "0123456789."
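A minimal sketch of what such a comparison helper might look like, assuming version strings made up only of digits and dots. The name `simple_version_key` is hypothetical and is not part of scikit-learn or of this PR.

```python
def simple_version_key(version):
    """Turn a purely numeric version string such as "1.10.2" into a tuple of ints."""
    if not set(version).issubset("0123456789."):
        raise ValueError(f"unsupported version string: {version!r}")
    return tuple(int(part) for part in version.split("."))


# Tuples compare element by element, so "1.10" sorts after "1.9",
# which a plain string comparison would get wrong.
print(simple_version_key("1.9") < simple_version_key("1.10"))
```

One limitation of this sketch: "1.0" and "1.0.0" map to different tuples, so they do not compare equal.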
sklearn/utils/fixes.py
Outdated
parse_version = LooseVersion  # type: ignore


def parse_version(v):
    if not {*v} <= {*"0123456789."}:
Nit: Do you think this would be clearer? (I have no strong preference)
- if not {*v} <= {*"0123456789."}:
+ if not set(v) <= set("0123456789."):
sure, works for me.
9d73b5f to a62d442
Let's leave handling distutils deprecation to another time? This PR doesn't change the situation wrt. distutils.
Yes, let's address import times for now, and deal with distutils deprecation/removal when we get there. I imagine it would be non-trivial with numpy.distutils anyway (numpy/numpy#18588). @anntzer Could you please add the test requested in #19102 (comment)? Otherwise LGTM!
sklearn/utils/fixes.py
Outdated
parse_version = LooseVersion  # type: ignore


def parse_version(v):
    if not set(v) <= set("0123456789."):
Also, maybe to be more explicit:
- if not set(v) <= set("0123456789."):
+ if not set(v).issubset("0123456789."):
I had to check the documentation to understand what <= does for sets.
Also, I would have expected not (a <= b) to imply a > b; however, that's not true for sets as far as I can tell, so it is kind of confusing:
>>> a = {1}
>>> b = {2}
>>> a <= b, a > b
(False, False)
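For reference, a short sketch of the set semantics discussed in this comment. Sets are only partially ordered, so two sets can be unrelated in both directions. The version strings below are illustrative.

```python
a, b = {1}, {2}

# On sets, <= means "is a subset of" and > means "is a proper superset of".
# Neither holds for two disjoint non-empty sets.
print(a <= b, a > b)  # (False, False)

# issubset() states the intent by name and accepts any iterable,
# so the right-hand side does not need to be wrapped in set().
print(set("1.0").issubset("0123456789."))       # True
print(set("1.0.dev0").issubset("0123456789."))  # False
```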
done
With this PR, the following will fail, because the two calls return different kinds of version objects that cannot be compared with each other:
from sklearn.utils.fixes import parse_version
parse_version("1.0.dev0") < parse_version("1.1")
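The same kind of failure can be reproduced without scikit-learn installed. This is a sketch using two stand-in classes; `FastVersion`, `SlowVersion`, and `parse_version_sketch` are hypothetical names, and the ordering inside `SlowVersion` is a placeholder rather than real version parsing.

```python
class FastVersion:
    """Hypothetical stand-in for the type returned for digits-and-dots versions."""

    def __init__(self, text):
        self.parts = tuple(int(p) for p in text.split("."))

    def __lt__(self, other):
        if not isinstance(other, FastVersion):
            return NotImplemented  # refuses to compare with foreign types
        return self.parts < other.parts


class SlowVersion:
    """Hypothetical stand-in for the type returned for dev versions."""

    def __init__(self, text):
        self.text = text

    def __lt__(self, other):
        if not isinstance(other, SlowVersion):
            return NotImplemented  # refuses to compare with foreign types
        return self.text < other.text  # placeholder ordering only


def parse_version_sketch(v):
    # Dispatch on the characters in the string, as the PR under review does.
    if set(v).issubset("0123456789."):
        return FastVersion(v)
    return SlowVersion(v)


try:
    parse_version_sketch("1.0.dev0") < parse_version_sketch("1.1")
except TypeError as exc:
    print("comparison failed:", exc)
```

Both classes return NotImplemented for a foreign type, so Python has no way to order the pair and raises TypeError.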
Nice catch! Not sure what we can do then. It's unfortunate that https://github.com/pypa/packaging/blob/main/packaging/version.py is difficult to vendor.
If we really want to vendor it, I think we can combine https://github.com/pypa/packaging/blob/main/packaging/_structures.py and https://github.com/pypa/packaging/blob/main/packaging/version.py into one file. Or keep it separate in its own "vendor/packaging" directory.
Yeah, but vendoring that would still be unpleasant. Adding a runtime dependency for this is also hard to justify.
pkg_resources can be very slow to import, although this depends on the details of the python setup. On my machine, this PR speeds up `import sklearn` (with no dev versions of anything other than sklearn installed) by ~33%, from ~550ms to ~360ms.
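The deferred-import idea can be sketched roughly as follows. This is an illustration, not the PR's actual code: `parse_version_lazy` is a hypothetical name, and returning a tuple on the fast path is an assumption made to keep the example self-contained.

```python
def parse_version_lazy(v):
    """Hypothetical helper: only pay for the slow import when it is needed."""
    if set(v).issubset("0123456789."):
        # Fast path: plain numeric versions need no third-party import.
        return tuple(int(part) for part in v.split("."))
    # Slow path: this import statement runs only when the branch is reached,
    # so `import sklearn` itself would no longer trigger it.
    from pkg_resources import parse_version

    return parse_version(v)


print(parse_version_lazy("0.24.1"))
```

Note that the two branches return different types, which is exactly what leads to the comparison problem raised in the review.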
a62d442 to eb68443
(I haven't added a test yet because I agree the incomparability between the two kinds of versions is probably more of a dealbreaker that needs to be resolved first...)
I think our options are a bit limited. (setuptools vendors packaging: https://github.com/pypa/setuptools/tree/main/pkg_resources/_vendor/packaging).
Reference Issues/PRs
Fixes #19098.
What does this implement/fix? Explain your changes.
pkg_resources can be very slow to import, although this depends on the details of the Python setup. On my machine, avoiding the import (this PR) speeds up `import sklearn` (with no dev versions of anything other than sklearn installed) by ~33%, from ~550ms to ~360ms.
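One way such a measurement might be reproduced is to time the import in a fresh interpreter. This is a sketch only; `time_import` is a hypothetical helper, and the result includes interpreter startup time.

```python
import subprocess
import sys
import time


def time_import(module_name):
    """Return wall-clock seconds for a fresh interpreter to import module_name."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


# "json" is only a placeholder; substitute "sklearn" where it is installed.
print(f"{time_import('json'):.3f} s")
```

For a per-module breakdown, `python -X importtime -c "import sklearn"` prints the import time of each module to stderr.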
Any other comments?