Don't import pkg_resources unless we need to parse dev version numbers. by anntzer · Pull Request #19102 · scikit-learn/scikit-learn · GitHub
Don't import pkg_resources unless we need to parse dev version numbers. #19102

Closed

@anntzer (Contributor) commented Jan 4, 2021

Reference Issues/PRs

Fixes #19098.

What does this implement/fix? Explain your changes.

pkg_resources can be very slow to import, although this depends on the
details of the Python setup. On my machine, not importing it (this
PR) speeds up import sklearn (with no dev versions of anything other
than sklearn installed) by ~33%, from ~550ms to ~360ms.
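
A rough way to reproduce this kind of measurement is sketched below. It is not the method used for the numbers above; the helper name `time_import` is hypothetical, and the result includes interpreter startup, so it is only useful for comparing two modules (or two branches) on the same machine. Python's built-in `-X importtime` option gives a finer per-module breakdown.

```python
import subprocess
import sys
import time


def time_import(module_name, repeats=3):
    """Best wall-clock time (seconds) to run `import module_name` in a
    fresh interpreter. Interpreter startup is included in the figure."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A fresh process avoids hits from the parent's sys.modules cache.
        subprocess.run(
            [sys.executable, "-c", f"import {module_name}"], check=True
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # "json" is a placeholder; substitute "sklearn" or "pkg_resources"
    # if they are installed in the environment being measured.
    print(f"import json: {time_import('json') * 1000:.0f} ms")
```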

Any other comments?

@ogrisel (Member) left a comment

LGTM.

@thomasjpfan (Member) left a comment

Thank you for the PR @anntzer !

# setuptools not installed
parse_version = LooseVersion # type: ignore

def parse_version(v):

Member

We can have a small test to make sure that parse_version works with versions that have dev version numbers.

Contributor Author

sure

Base automatically changed from master to main January 22, 2021 10:53
@anntzer anntzer force-pushed the delaypkgresources branch from a7ccf62 to 9d73b5f Compare March 7, 2021 19:36
@thomasjpfan (Member) left a comment

Given that distutils is being deprecated, we can most likely get by with a simple version comparison function for version strings that contain only the characters "0123456789."
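
A minimal sketch of what such a function could look like, assuming plain tuple comparison is acceptable. The name `parse_numeric_version` is hypothetical and this is not code from the PR. Note that trailing zeros are significant under this scheme: "1.0" and "1.0.0" parse to different tuples.

```python
def parse_numeric_version(v):
    """Parse a version such as "1.19.2" into a tuple of ints.

    Hypothetical helper: only strings made of digits and dots are
    accepted; anything else (e.g. "1.0.dev0") raises ValueError.
    """
    if not set(v).issubset("0123456789."):
        raise ValueError(f"not a purely numeric version: {v!r}")
    # int("") raises ValueError too, which rejects inputs like "1..2".
    return tuple(int(part) for part in v.split("."))


# Tuples compare element by element, so 10 > 9 is handled numerically
# rather than as the strings "10" and "9".
assert parse_numeric_version("1.10.0") > parse_numeric_version("1.9.2")
```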

parse_version = LooseVersion # type: ignore

def parse_version(v):
if not {*v} <= {*"0123456789."}:

Member

Nit: Do you think this would be clearer? (I have no strong preference)

Suggested change
- if not {*v} <= {*"0123456789."}:
+ if not set(v) <= set("0123456789."):

Contributor Author

sure, works for me.

@anntzer anntzer force-pushed the delaypkgresources branch from 9d73b5f to a62d442 Compare March 7, 2021 22:50
@anntzer (Contributor Author) commented Mar 7, 2021

Let's leave handling distutils deprecation to another time? This PR doesn't change the situation with respect to distutils.

@rth (Member) commented Mar 10, 2021

> Let's leave handling distutils deprecation to another time?

Yes, let's address import times for now, and deal with distutils deprecation/removal when we get there. I imagine it would be nontrivial with numpy.distutils anyway (see numpy/numpy#18588).

@anntzer Could you please add the test requested in #19102 (comment)? Otherwise LGTM!

parse_version = LooseVersion # type: ignore

def parse_version(v):
if not set(v) <= set("0123456789."):

Member

Also maybe to be more explicit,

Suggested change
- if not set(v) <= set("0123456789."):
+ if not set(v).issubset("0123456789."):

I had to check the documentation to understand what <= does for sets.

Also, I would have expected not (a <= b) to imply a > b, but that is not true for sets as far as I can tell, which is somewhat confusing:

>>> a = {1}
>>> b = {2}
>>> a <= b, a > b
(False, False)
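
The behaviour above follows from `<=` on sets meaning "is a subset of", which is only a partial order: two sets can be unrelated in both directions. A small sketch of how the check reads with `issubset` (the helper name `is_plain_release` is hypothetical, not from the PR):

```python
ALLOWED = "0123456789."


def is_plain_release(v):
    """True if every character of v is a digit or a dot."""
    # issubset accepts any iterable, so the string need not be
    # converted to a set first.
    return set(v).issubset(ALLOWED)


assert is_plain_release("0.24.1")
assert not is_plain_release("1.0.dev0")

# Disjoint sets are neither subsets nor strict supersets of each other,
# so not (a <= b) does not imply a > b.
a, b = {1}, {2}
assert not (a <= b)
assert not (a > b)
```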

Contributor Author

done

@thomasjpfan (Member) commented

With this PR, the following will fail because parse_version returns two different kinds of "Version" objects, which cannot be compared with each other:

from sklearn.utils.fixes import parse_version
parse_version("1.0.dev0") < parse_version("1.1")
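
The failure mode can be illustrated without distutils or pkg_resources. In the sketch below, `PlainVersion` and `DevAwareVersion` are hypothetical stand-ins for the two real classes, and this `parse_version` is a simplified imitation of the dispatch in the PR, not its actual code. The exact exception raised by the real classes may differ.

```python
class PlainVersion:
    """Hypothetical stand-in for the type returned on the fast path."""

    def __init__(self, text):
        self.key = tuple(int(part) for part in text.split("."))

    def __lt__(self, other):
        if not isinstance(other, PlainVersion):
            return NotImplemented  # unknown type: let Python decide
        return self.key < other.key


class DevAwareVersion:
    """Hypothetical stand-in for the type returned on the slow path.
    Only handles strings of the form "<release>.dev<N>"."""

    def __init__(self, text):
        release, _, dev = text.partition(".dev")
        self.key = (tuple(int(p) for p in release.split(".")), int(dev))

    def __lt__(self, other):
        if not isinstance(other, DevAwareVersion):
            return NotImplemented
        return self.key < other.key


def parse_version(v):
    # Same dispatch idea as the PR: digits and dots take the cheap path.
    if set(v).issubset("0123456789."):
        return PlainVersion(v)
    return DevAwareVersion(v)


def can_compare(a, b):
    """True if the parsed forms of a and b support `<`."""
    try:
        parse_version(a) < parse_version(b)
    except TypeError:
        return False
    return True
```

With these stand-ins, `can_compare("1.0", "1.1")` and `can_compare("1.0.dev0", "1.0.dev1")` succeed, while `can_compare("1.0.dev0", "1.1")` does not, because each class declines to compare against the other.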

@rth (Member) commented Mar 10, 2021

Nice catch! Not sure what we can do then. It's unfortunate that https://github.com/pypa/packaging/blob/main/packaging/version.py is difficult to vendor.

@thomasjpfan (Member) commented Mar 10, 2021

If we really want to vendor it, I think we can combine https://github.com/pypa/packaging/blob/main/packaging/_structures.py and https://github.com/pypa/packaging/blob/main/packaging/version.py into one file.

Or keep it separate, in its own "vendor/packaging" directory.

@rth (Member) commented Mar 10, 2021

Yeah, but vendoring that would still be unpleasant. Adding a runtime dependency for this is also hard to justify..

pkg_resources can be very slow to import, although this depends on the
details of the python setup.  On my machine, this PR speeds up `import
sklearn` (with no dev versions of anything other than sklearn installed)
by ~33%, from ~550ms to ~360ms.
@anntzer anntzer force-pushed the delaypkgresources branch from a62d442 to eb68443 Compare March 10, 2021 22:31
@anntzer (Contributor Author) commented Mar 10, 2021

(I haven't added a test yet because I agree that the incomparability between the two kinds of versions is probably a dealbreaker that needs to be resolved first...)

@thomasjpfan (Member) commented

> Yeah, but vendoring that would still be unpleasant. Adding a runtime dependency for this is also hard to justify..

I think our options are a bit limited. (setuptools vendors packaging https://github.com/pypa/setuptools/tree/main/pkg_resources/_vendor/packaging).

Successfully merging this pull request may close these issues.

Avoid importing pkg_resources to speed up import