Avoid importing pkg_resources to speed up import · Issue #19098 · scikit-learn/scikit-learn · GitHub

Closed
anntzer opened this issue Jan 3, 2021 · 3 comments · Fixed by #19826

@anntzer
Contributor
anntzer commented Jan 3, 2021

Describe the bug

Since #17670, sklearn unconditionally imports pkg_resources. Such an import can be very slow (pypa/setuptools#926), so it should be avoided if possible.
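To put a number on the import cost on a given machine, one option is to time the import in a fresh interpreter. The sketch below assumes nothing about sklearn itself; `import_seconds` is a hypothetical helper name, and a subprocess is used because a module already cached in `sys.modules` would otherwise look free to import.

```python
import subprocess
import sys


def import_seconds(module_name):
    """Time `import module_name` in a fresh interpreter and return seconds."""
    # The child process starts with a cold sys.modules, so the measurement
    # includes everything the module pulls in transitively.
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - t0)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Use the last line in case the imported module prints something itself.
    return float(result.stdout.strip().splitlines()[-1])
```

Calling `import_seconds("pkg_resources")` and `import_seconds("sklearn")` gives a rough comparison; `python -X importtime -c "import sklearn"` prints a per-module breakdown on Python 3.7 and later.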

Given that the point of #17670 was to improve support for prereleases, a possible fix could be for example to delay importing pkg_resources until needed, e.g.

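from distutils.version import LooseVersion

def parse_version(v):
    # Purely numeric versions (digits and dots only) take the cheap path.
    if {*v} <= {*"1234567890."}:
        return LooseVersion(v)
    else:
        # Prereleases and the like: pay for the slow import only when needed.
        import pkg_resources
        return pkg_resources.parse_version(v)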

Steps/Code to Reproduce

Expected Results

Actual Results

N/A

Versions

System:
    python: 3.9.1 (default, Dec 13 2020, 11:55:53)  [GCC 10.2.0]
executable: /usr/bin/python
   machine: Linux-5.9.14-arch1-1-x86_64-with-glibc2.32

Python dependencies:
          pip: 20.3.3
   setuptools: 51.1.1
      sklearn: 0.23.2
        numpy: 1.19.4
        scipy: 1.5.4
       Cython: 0.29.21
       pandas: 1.1.5
   matplotlib: 3.3.2.post2048+g263120690
       joblib: 1.0.0
threadpoolctl: 2.1.0

Built with OpenMP: True
@glemaitre
Member

It could be an improvement. I don't know if this is currently dramatic, since we mainly import parse_version in the tests (but we might import utils.fixes, which will load pkg_resources).

@venaturum

Would it be worth trying importlib.metadata first and catching the exception (for Python < 3.8) before falling back to pkg_resources?
https://docs.python.org/3/library/importlib.metadata.html

@anntzer
Contributor Author
anntzer commented Jan 4, 2021

> I don't know if this is currently dramatic, since we mainly import parse_version in the tests (but we might import utils.fixes, which will load pkg_resources).

But a plain import sklearn already imports sklearn.utils.fixes... (check e.g. with python -c 'import sys, sklearn; print("sklearn.utils.fixes" in sys.modules)').
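The same check can be written as a small reusable function. This is only a sketch: `pulled_in_by` is a hypothetical name, and the result is meaningful only in a fresh interpreter, since a candidate module may already have been imported for unrelated reasons.

```python
import importlib
import sys


def pulled_in_by(package, candidates):
    """Import `package`, then report which `candidates` are in sys.modules."""
    importlib.import_module(package)
    # True means the module is loaded now; it does not prove that
    # `package` was the one that loaded it.
    return {name: name in sys.modules for name in candidates}
```

For the case discussed here, the call would be `pulled_in_by("sklearn", ["sklearn.utils.fixes", "pkg_resources"])`.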

> Would it be worth trying importlib.metadata first

importlib.metadata.version returns a string, so it doesn't help here as the point is the need to parse the version string.
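A short illustration of why a raw version string is not enough: strings compare lexicographically, so some parsing step is needed before versions can be ordered. `numeric_version` below is a hypothetical helper that only covers the purely numeric case; prereleases such as "1.0rc1" still need a real parser such as pkg_resources.parse_version or packaging.version.parse.

```python
def numeric_version(v):
    """Parse a purely numeric dotted version such as "0.23.2" into ints."""
    if not set(v) <= set("0123456789."):
        raise ValueError(f"not a purely numeric version: {v!r}")
    return tuple(int(part) for part in v.split("."))


# Plain string comparison is lexicographic, so it misorders these two.
assert "0.9" > "0.10"
# Comparing the parsed tuples gives the intended order.
assert numeric_version("0.9") < numeric_version("0.10")
```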
