MNT Bump min Python version to 3.10 and dependencies #30895
Conversation
Do you mean that numpy 1.21.0 doesn't have Python 3.10 wheels?
I think we can be less conservative with optional dependencies. Bumping them has a much lower impact.
This one is the most problematic to me because we bumped from 20.04 to 22.04 not that long ago and 22.04 is still widely used. I don't know, however, how much users rely on the system numpy, scipy, ...
For the record, we didn't "forget" 😄 Nobody expressed a necessity/desire to do so, and it didn't allow that much cleaning, so there was no real incentive to do it.
Another consideration before merging this: so far we've been backporting all CI stuff to the stable branch (1.6.X currently) to keep both synced. If we merge this before the last 1.6 bug-fix release before 1.7, then CI fixes/updates won't be backportable anymore and both CIs will diverge. I think it's fine because we use lock files, so the CI in the stable branch shouldn't suddenly stop working as it used to.
Indeed 1.21.0 doesn't support Python 3.10. 1.21.2 is the oldest numpy release that supports Python 3.10, see for example https://pypi.org/project/numpy/1.21.2/ and it also has Python 3.10 wheels.
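(For reference, this kind of wheel-availability check can be done programmatically against PyPI's JSON API; the helper below is only an illustrative sketch, not something from this PR.)

```python
# Sketch (not from this PR): check whether a release has CPython 3.10 wheels,
# using the public PyPI JSON API at https://pypi.org/pypi/<name>/<version>/json.
import json
from urllib.request import urlopen


def has_cp310_wheel(package, version):
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    with urlopen(url) as response:
        data = json.load(response)
    # "urls" lists the files uploaded for this release (sdists and wheels).
    return any(
        f["packagetype"] == "bdist_wheel" and "cp310" in f["filename"]
        for f in data["urls"]
    )


print(has_cp310_wheel("numpy", "1.21.0"))  # False: no Python 3.10 wheels
print(has_cp310_wheel("numpy", "1.21.2"))  # True: first numpy release with 3.10 wheels
```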
I went for a minor (rather than bugfix) version because I find it simpler, it allows more clean-ups (while not being that different in terms of timing, a few months generally), and this is what SPEC0 does too. Of course this is one of the many variants that are up for discussion 😉.
Yeah, I could have guessed this maybe 😉. In my opinion, the Ubuntu rule was a historical artifact that was useful as a guiding principle, but I would argue we should not be influenced by the numpy versions packaged in older Ubuntu releases, i.e. testing on Ubuntu 24.04 is perfectly fine.

How many Ubuntu users are on Ubuntu 22.04 rather than a more recent version? How many of them want to use the system numpy rather than create a venv with only Python in it and pip install into it? How many of them are happy with system numpy and system scipy but strongly insist on having a more recent version of scikit-learn than the system scikit-learn? Honestly, at this point I think only a very small number of our users care about this.
I removed the clean-ups due to bumping numpy and scipy to make the PR easier to review and to keep the discussion centered on deciding our bumping policy. Once this is decided, I will open a separate PR with the clean-ups.
I don't know what my opinion on the Ubuntu-version-based "policy" is. I think we could have a lot more "Ubuntu" users than we estimate from the website, because people use Windows/Mac on their laptop but deploy/run their models in docker containers that are based on Ubuntu. (I heard that 2026 is the year of Linux on the desktop.) But probably/hopefully they use something like venv/conda envs in the container. There is also https://hub.docker.com/_/python which I've used in the past as a base image for "deployment". These are based on Debian (or Alpine), which is even more conservative than Ubuntu (I think). Here I just installed things with pip and used the system Python, because I didn't want to add more layers of "stuff". The reason I picked this as a base image was that it was a well maintained, trustworthy image. But I did not rely on the OS-provided packages, so maybe this is all irrelevant (up to Python version compatibility).
Maybe the way to think about "Ubuntu versions" support is that it isn't strictly about supporting people who are literally using Ubuntu, but about supporting people who are stuck with old versions on systems where they can't easily update them (clusters, grid computing, and various other centrally administered systems for scientists).
So, trying to take a step back (I think talking about Ubuntu is a bit of a distraction, I should not have taken the bait 😅), the main discussion point so far seems to be between:

1. bumping to the oldest minor version (`X.Y.0`);
2. bumping to the oldest bugfix version (`X.Y.Z`) that has wheels for the minimum Python version.

Points for 1: it is a slightly simpler rule (one less digit to care about), it is not that different in terms of timing (a few months; I could redo the plan with bugfix versions from #30888 to see the impact), and it allows getting rid of sklearn.utils.fixes entries a bit quicker. If we think longer term, this is also more consistent with SPEC0, which only talks about minor versions (i.e. `X.Y.0`).

Points for 2: it's less constraining for the user and more in line with what we have done historically.

Let's wait for more opinions on this!

Opinion counts
I'm fine with |
I prefer 2 because it is more in line with what we have done historically.
Here is a comparison of the plans.

Plan 1: oldest minor version (`X.Y.0`)
scikit-learn | scikit-learn-date | numpy | numpy-date-diff | scipy | scipy-date-diff | pandas | pandas-date-diff |
---|---|---|---|---|---|---|---|
1.7 | 2025-06-01 | 1.22.0 | 3 years 5 months | 1.8.0 | 3 years 4 months | 1.4.0 | 3 years 4 months |
1.8 | 2025-12-01 | 1.24.0 | 2 years 12 months | 1.11.0 | 2 years 5 months | 2.0.0 | 2 years 8 months |
1.9 | 2026-06-01 | 1.24.0 | 3 years 6 months | 1.11.0 | 2 years 11 months | 2.0.0 | 3 years 2 months |
1.10 | 2026-12-01 | 1.26.0 | 3 years 3 months | 1.12.0 | 2 years 10 months | 2.2.0 | 2 years 10 months |
Plan 2: oldest bugfix version (`X.Y.Z`) with Python wheels
scikit-learn | scikit-learn-date | numpy | numpy-date-diff | scipy | scipy-date-diff | pandas | pandas-date-diff |
---|---|---|---|---|---|---|---|
1.7 | 2025-06-01 | 1.21.2 | 3 years 10 months | 1.8 | 3 years 4 months | 1.3.4 | 3 years 8 months |
1.8 | 2025-12-01 | 1.23.3 | 3 years 3 months | 1.10.1 | 2 years 10 months | 1.5.2 | 3 years 0 months |
1.9 | 2026-06-01 | 1.23.3 | 3 years 9 months | 1.10.1 | 3 years 3 months | 1.5.2 | 3 years 6 months |
1.10 | 2026-12-01 | 1.26.0 | 3 years 3 months | 1.11.3 | 3 years 2 months | 2.2.0 | 2 years 10 months |
Quick Analysis
This changes things only by a few (3-5) months. I would guess that older releases are a bit more affected because I think the numpy/scipy release cadence has increased since then. For example, for scipy the age stays a bit below 3 years:
- December 2025 release: plan 1 is 2 years 5 months (+5 months for plan 2)
- December 2026 release: 2 years 10 months (+4 months for plan 2).
To be honest, if you look at #30888 (comment), for the scikit-learn 1.1 bump[^1] we went with 2 years 6 months for scipy and I don't remember anyone complaining about this ...
I am going to cc @scikit-learn/core-devs to get more opinions on this.
Footnotes

[^1]: … our most aggressive bump in scikit-learn history, since even min Python was only 2 years 7 months old, but again I don't remember anyone complaining about this ...
All these options are nice enough for me. I'm happy either way.
Plan 1 is easier to explain, in my view. I won't get upset if we go with any plan though. I just want us to converge, because it will be a net improvement either way :)
I also care more about us making a choice for either 1 or 2 than I do about which one we choose.
I would either vote for plan 2, or for plan 1 without the ".0", to imply that we support the latest Z value for a fixed X.Y.
Supporting .0 might not be possible because the .0 release of a dependency might be broken at the time we produce a given scikit-learn release.
cc @crusaderky FYI, we can remove the 3.9 shims in |
I'm happy with Plan 1.B: #30895 (comment)
So the consensus at the biweekly scikit-learn meeting seems to be the following: plan 1 (oldest minor `X.Y.0`). Amongst other things, plan 1 is what this PR is doing, so reviews would be more than welcome 🙏!

    "polars": ("0.20.30", "docs, tests"),
    "pyarrow": ("12.0.0", "tests"),
    "sphinx": ("7.3.7", "docs"),
    "sphinx-copybutton": ("0.5.2", "docs"),
    "sphinx-gallery": ("0.17.1", "docs"),
    "numpydoc": ("1.2.0", "docs, tests"),
    - "Pillow": ("7.1.2", "docs"),
    + "Pillow": ("8.4.0", "docs"),
    "pooch": ("1.6.0", "docs, examples, tests"),
    "sphinx-prompt": ("1.4.0", "docs"),
    "sphinxext-opengraph": ("0.9.1", "docs"),
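For context, each entry above follows the `package: (min_version, comma-separated tags)` layout of scikit-learn's minimum-dependency table. The snippet below is only an illustrative sketch of how such a mapping can be turned into per-tag pip requirements; it is not the project's actual tooling.

```python
# Illustrative sketch (not scikit-learn's actual tooling): group the
# {package: (min_version, tags)} entries into pip requirement strings per tag.
dependent_packages = {
    "polars": ("0.20.30", "docs, tests"),
    "pyarrow": ("12.0.0", "tests"),
    "Pillow": ("8.4.0", "docs"),
}

tag_to_packages = {}
for package, (min_version, tags) in dependent_packages.items():
    for tag in tags.split(", "):
        tag_to_packages.setdefault(tag, []).append(f"{package}>={min_version}")

print(tag_to_packages["docs"])   # ['polars>=0.20.30', 'Pillow>=8.4.0']
print(tag_to_packages["tests"])  # ['polars>=0.20.30', 'pyarrow>=12.0.0']
```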
Shall we also bump all the other dependencies (e.g. `tests` and `docs` similarly), or shall we do this on a per-need basis?
Shall we write a script to automate min version bumping based on the planned date of the next scikit-learn winter release (to be run after the summer release)? Maybe for a follow-up PR ;)
> Shall we also bump all the other dependencies (e.g. `tests` and `docs` similarly), or shall we do this on a per-need basis?
Right now this PR only bumps compiled dependencies. I didn't look at pure Python dependencies, but this can be done in a further PR with the agreed rule "take the release that is a bit more than 2 years old", i.e. roughly the same as SPEC0.
In general I would argue we should try to apply the same rule everywhere, unless there is a good reason to bump more aggressively for tests and docs (e.g. an annoying bug in pytest, or a matplotlib change that makes an example significantly more complicated).
> Shall we write a script to automate min version bumping based on the planned date of the next scikit-learn winter release (to be run after the summer release)? Maybe for a follow-up PR ;)
Why not 😉. I was thinking about writing one for the Python + compiled dependency rule because I had been doing it by hand until now and this is a bit cumbersome.
@ogrisel there you go, a hacky script that does the job today, with plenty of room for improvements 😉
Here is the output for a release in November 2025:
For future release at date 2025-11-01
- python: 3.10 -> 3.11
- numpy: 1.22.0 -> 1.24.0
- scipy: 1.8.0 -> 1.10.0
- pandas: 1.4.0 -> 1.5.0
- matplotlib: 3.5.0 -> 3.6.0
- joblib: 1.2.0 -> 1.3.0
- threadpoolctl: 3.1.0 -> 3.2.0
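The script itself isn't reproduced above, but a minimal sketch of the underlying idea, assuming it queries the PyPI JSON API for release dates and applies the "oldest minor release that is at least about 2 years old at the planned release date" rule, could look like the following (the package list, cut-off, and helper names are illustrative, not the actual script):

```python
# Sketch of a min-version bump helper (illustrative, not the script used here):
# for each package, pick the newest X.Y.0 release that is already ~2 years old
# at the planned scikit-learn release date, i.e. the minimum we would support.
import json
from datetime import datetime, timedelta
from urllib.request import urlopen


def minor_release_dates(package):
    """Return {(X, Y): upload date of the X.Y.0 release} from the PyPI JSON API."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        releases = json.load(response)["releases"]
    dates = {}
    for version, files in releases.items():
        parts = version.split(".")
        # Only keep plain X.Y.0 releases (this skips pre-releases like 1.24.0rc1).
        if len(parts) == 3 and parts[2] == "0" and files:
            dates[(int(parts[0]), int(parts[1]))] = datetime.fromisoformat(
                files[0]["upload_time"]
            )
    return dates


def min_version(package, release_date, window=timedelta(days=365 * 2)):
    cutoff = release_date - window
    old_enough = {v: d for v, d in minor_release_dates(package).items() if d <= cutoff}
    major, minor = max(old_enough)  # newest minor that is already "old enough"
    return f"{major}.{minor}.0"


for package in ["numpy", "scipy", "pandas"]:
    print(package, min_version(package, datetime(2025, 11, 1)))
```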
This made me notice that pyamg was bumped too aggressively, because I had checked the Python version metadata in the trove classifiers rather than the wheel availability.
Thanks all!
Other locations where the Python version should be bumped accordingly include:
Split off from #30888.
Since we forgot to bump our dependencies in 1.6, this bumps the min Python version to 3.10 and bumps dependencies following the plan from #30888. This is a good test to see in practice what it actually means and how people feel about it 😉.
A few things to note:

- version comparisons in `utils.fixes` are based on minor versions, not bugfix versions, following what SPEC0 does.

CI results
- I ran `pytest sklearn/datasets` after removing the fetch_20newsgroups and fetch_lfw datasets, and they ran fine locally.
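Regarding the `utils.fixes` point above, here is a minimal illustration of a minor-version gate (the threshold and branches are hypothetical, not code from this PR):

```python
# Hypothetical example of a minor-version gate of the kind used in
# sklearn.utils.fixes: compare against "1.25.0" rather than a bugfix release,
# so that 1.25.1, 1.25.2, ... all take the same branch.
import numpy as np
from packaging.version import parse as parse_version

if parse_version(np.__version__) < parse_version("1.25.0"):
    print("use the backport/shim path for older numpy")
else:
    print("use the newer numpy API directly")
```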