MNT Bump min Python version to 3.10 and dependencies by lesteve · Pull Request #30895 · scikit-learn/scikit-learn · GitHub

MNT Bump min Python version to 3.10 and dependencies #30895

Merged
merged 23 commits into from
Mar 18, 2025

Conversation

lesteve
Member
@lesteve lesteve commented Feb 25, 2025

Split off from #30888.

Since we forgot to bump our dependencies in 1.6, this bumps up Python min version to 3.10 and dependencies following the plan from #30888. This is a good test to see in practice what it actually means and how people feel about it 😉.

A few things to note:

  • the minimum numpy, scipy, etc. versions use a minor version and not a bugfix release, i.e. pick numpy 1.22.0 rather than numpy 1.21.2 (which also has Python 3.10 wheels). numpy 1.22.0 was released 2021-12-31, i.e. 4 months after numpy 1.21.2 (2021-08-15). Are we fine with this? The main reasons behind this: it is a simpler rule, it makes clean-ups easier (utils.fixes checks are generally based on minor version comparisons, not bugfix ones), and it follows what SPEC0 does.
  • for pyamg, which is a non pure-Python dependency, the minor version rule requires bumping to 5.0.0 (released 2023-04-17), so more than one year after 4.2.2 (released 2022-02-07), which is the first bugfix release with Python 3.10 wheels. I think it is kind of acceptable since in June pyamg 5.0.0 will be a bit more than 2 years old and it is not a core dependency, but I am happy to hear other opinions. The rule may need refining for non-core dependencies 😓.
  • Ubuntu needs to be bumped to 24.04. Ubuntu 22.04 does not have recent enough versions of numpy. It has numpy 1.21.2 and we require numpy 1.22.
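To illustrate the clean-up argument in the first point, here is a minimal sketch of a shim guarded by a minor-version comparison. The helpers `version_tuple` and `needs_compat_shim` are hypothetical names made up for this example; they are not the actual code in `sklearn.utils.fixes`.

```python
def version_tuple(version):
    """Turn a plain 'X.Y' or 'X.Y.Z' string into a tuple of ints.

    Hypothetical helper: pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in version.split("."))


def needs_compat_shim(installed, fixed_in_minor):
    """Return True when `installed` predates the minor series `fixed_in_minor`."""
    return version_tuple(installed)[:2] < version_tuple(fixed_in_minor)[:2]


# With a minimum of 1.22.0, a shim guarded by "older than 1.22" can never
# trigger on a supported install, so it can be removed.
print(needs_compat_shim("1.22.0", "1.22"))  # False

# With a minimum of 1.21.2, the same shim is still reachable and must stay.
print(needs_compat_shim("1.21.2", "1.22"))  # True
```

Under this assumption, picking X.Y.0 as the minimum removes every guard written against that minor series in one go, which is the clean-up benefit mentioned above.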

CI results

  • Full build log passed in 38836eb
  • wheels build passed in 3363c23
  • the codecov red status is because tests that download data are skipped in PRs. I ran pytest sklearn/datasets locally after removing the fetch_20newsgroups and fetch_lfw datasets and the tests passed.

github-actions bot commented Feb 25, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 32a8b0c. Link to the linter CI: here

@lesteve lesteve marked this pull request as ready for review February 25, 2025 09:30
@jeremiedbb
Member

using a minor version and not a bugfix, i.e. pick numpy 1.22.0 rather than numpy 1.21.2 (which also has Python 3.10 wheels). numpy 1.22.0 was released 2021-12-31, i.e. 4 months after numpy 1.21.2 2021-08-15. Are we fine with this?

Do you mean that numpy 1.21.0 doesn't have Python 3.10 wheels ?
In the past, we haven't always set a minor version as our min dependency. For instance numpy min is currently 1.19.5. I don't really see the motivation for enforcing a minor as min version. (besides that 1.22 allows to get rid of more fixes 😄 )

for pyamg which is a non pure-Python dependency, the minor version rule requires to bump to 5.0.0 (released 2023-04-17) so more than one year after 4.2.2 (released 2022-02-07) which is the first bugfix with Python 3.10 wheels.

I think we can be less conservative with optional dependencies. Bumping them has a much lower impact.

Ubuntu needs to be bumped to 24.04. Ubuntu 22.04 does not have recent enough versions of numpy. It has numpy 1.21.2 and we require numpy 1.22.

This one is the most problematic to me because we bumped from 20.04 to 22.04 not that long ago and 22.04 is still used a lot. I don't know however how many users rely on the system numpy, scipy, ...
If we set 1.21.2 as our min dep, this issue goes away :)

@jeremiedbb
Member
jeremiedbb commented Feb 25, 2025

Since we forgot to bump our dependencies in 1.6

For the record, we didn't "forget" 😄 Nobody expressed a necessity/desire to do so, and it didn't allow that much cleaning so there was no real incentive to do it.

@jeremiedbb
Member

Another consideration before merging this:

So far we've been backporting all CI stuff in the stable branch (1.6.X currently) to keep both synced. If we merge this before the last 1.6 bug-fix release before 1.7, then all CI fixes/updates won't be backportable anymore and both CIs will diverge.

I think it's fine because we use lock files so the CI in the stable branch shouldn't stop working suddenly as it used to.

@lesteve
Member Author
lesteve commented Feb 25, 2025

Do you mean that numpy 1.21.0 doesn't have Python 3.10 wheels ?

Indeed 1.21.0 doesn't support Python 3.10. 1.21.2 is the oldest numpy release that supports Python 3.10, see for example https://pypi.org/project/numpy/1.21.2/ and it also has Python 3.10 wheels.

In the past, we haven't always set a minor version as our min dependency. For instance numpy min is currently 1.19.5. I don't really see the motivation for enforcing a minor as min version. (besides that 1.22 allows to get rid of more fixes 😄 )

I went for a minor (rather than bugfix) version because I find it simpler, it allows to do more clean-ups (while not being that different in terms of timing, a few months generally), and this is what SPEC0 does too. Of course this is one of the many variants that is up to discussion 😉.

This one is the most problematic to me because we bumped from 20.04 to 22.04 not that long ago and 22.04 is still used a lot. I don't know however how many users rely on the system numpy, scipy, ...

Yeah I could have guessed this maybe 😉. In my opinion, the Ubuntu rule was a historical artifact that was useful as a guiding principle, but I would advocate that we should not be influenced by the numpy versions packaged in older Ubuntu releases, i.e. testing on Ubuntu 24.04 is perfectly fine:

  • a small number of our users use Ubuntu 1
  • I would argue that it is a lot less relevant now than before because people will create a venv with only Python and pip install everything in it. It may have been useful at one point before wheels, since it was easier to use system numpy than compile from source.

How many of the Ubuntu users are on Ubuntu 22.04 rather than a more recent version? How many of them want to use system numpy rather than create a venv with only Python and pip install in it? How many of them are happy with system numpy and system scipy but strongly insist on having a more recent version of scikit-learn than system scikit-learn 2 ? Honestly at this point I think a very small number of our users may care about this.

Footnotes

  1. according to the website stats 2.4% of users are on Ubuntu (not sure how reliable this is, so maybe add ~6% of Linux users), so let's say 10%. You need to click on OS on the bottom right panel to see these numbers.

  2. 0.23.2 if you are curious, which is quite surprising/impressive in itself because 1.1.0 is our first release that supports Python 3.10 🤔

@lesteve
Member Author
lesteve commented Feb 26, 2025

I removed the cleanups due to bumping numpy and scipy to make the PR easier to review and the discussion centered around deciding on our bumping policy.

Once this is decided, I will open a separate PR with the clean-ups.

@betatim
Member
betatim commented Feb 26, 2025

I don't know what my opinion on the Ubuntu version based "policy" is. I think we could have a lot more "ubuntu" users than we estimate from the website because people use windows/mac on their laptop but deploy their models/run them in docker containers that are based on ubuntu. (I heard that 2026 is the year of linux on the desktop) But probably/hopefully they use something like venv/conda envs in the container.

There is also https://hub.docker.com/_/python which I've used in the past as base images for "deployment". These are based on debian (or alpine), which is even more conservative than ubuntu (I think). Here I did just install things with pip and used the system python, because I didn't want to add more layers of "stuff". The reason I picked this as a base image was because it was a well maintained, trustworthy image. But I did not rely on the OS provided packages, so maybe this is all irrelevant (up to Python version compatibility).

@betatim
Member
betatim commented Feb 26, 2025

Maybe the way to think about "ubuntu versions" support is that it isn't strictly about supporting people who are literally using ubuntu, but about supporting people who are stuck with old versions on systems where they can't easily update them (clusters, grid computing, and various other centrally administered systems for scientists).

@lesteve
Member Author
lesteve commented Feb 27, 2025

So trying to take a step back (I think talking about Ubuntu is a bit of a distraction, I should not have taken the bait 😅), the main discussion point until now seems to be between:

  1. numpy 1.22.0 = oldest numpy minor version (i.e. X.Y) that supports Python 3.10
  2. numpy 1.21.2 = oldest numpy bugfix version (i.e. X.Y.Z) that supports Python 3.10

Points for 1.: It seems a slightly simpler rule (one less digit to care about), is not that different in terms of timing (a few months; I could redo the plan with bugfix versions from #30888 to see the impact), and allows us to get rid of sklearn.utils.fixes a bit quicker. If we think longer-term this is also more consistent with SPEC0 which only talks about minor versions (i.e. X.Y not X.Y.Z).

Points for 2.: it's less constraining for the user and more in line with what we have done historically.
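As a rough sketch of how the two options differ, the snippet below picks a minimum version under each rule. The release list only contains the numpy versions mentioned in this thread, with the Python 3.10 support flags stated above; the function names are hypothetical and the version parsing is deliberately simplistic.

```python
# (version, supports Python 3.10) pairs, limited to the releases discussed here.
NUMPY_RELEASES = [
    ("1.21.0", False),
    ("1.21.2", True),
    ("1.22.0", True),
]


def version_tuple(version):
    """Parse a plain 'X.Y.Z' string into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


def oldest_minor(releases):
    """Option 1: oldest X.Y.0 release that supports the target Python."""
    candidates = [v for v, ok in releases if ok and version_tuple(v)[2] == 0]
    return min(candidates, key=version_tuple)


def oldest_bugfix(releases):
    """Option 2: oldest X.Y.Z release that supports the target Python."""
    candidates = [v for v, ok in releases if ok]
    return min(candidates, key=version_tuple)


print(oldest_minor(NUMPY_RELEASES))   # 1.22.0
print(oldest_bugfix(NUMPY_RELEASES))  # 1.21.2
```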

Let's wait for more opinions on this !

Opinion counts

@jeremiedbb
Member

I'm fine with 1. I was just asking because it was not the rule we used to apply most of the time so I needed convincing arguments to change. Your arguments convinced me :)

@thomasjpfan
Member

I prefer 2 because it is more in line with what we have done historically.

@lesteve
Member Author
lesteve commented Feb 28, 2025

Here is a comparison of plans

Plan 1.: oldest minor (X.Y) release with Python wheels

| scikit-learn | scikit-learn-date | numpy | numpy-date-diff | scipy | scipy-date-diff | pandas | pandas-date-diff |
|---|---|---|---|---|---|---|---|
| 1.7 | 2025-06-01 | 1.22.0 | 3 years 5 months | 1.8.0 | 3 years 4 months | 1.4.0 | 3 years 4 months |
| 1.8 | 2025-12-01 | 1.24.0 | 2 years 12 months | 1.11.0 | 2 years 5 months | 2.0.0 | 2 years 8 months |
| 1.9 | 2026-06-01 | 1.24.0 | 3 years 6 months | 1.11.0 | 2 years 11 months | 2.0.0 | 3 years 2 months |
| 1.10 | 2026-12-01 | 1.26.0 | 3 years 3 months | 1.12.0 | 2 years 10 months | 2.2.0 | 2 years 10 months |

Plan 2.: oldest bugfix X.Y.Z with Python wheels

| scikit-learn | scikit-learn-date | numpy | numpy-date-diff | scipy | scipy-date-diff | pandas | pandas-date-diff |
|---|---|---|---|---|---|---|---|
| 1.7 | 2025-06-01 | 1.21.2 | 3 years 10 months | 1.8 | 3 years 4 months | 1.3.4 | 3 years 8 months |
| 1.8 | 2025-12-01 | 1.23.3 | 3 years 3 months | 1.10.1 | 2 years 10 months | 1.5.2 | 3 years 0 months |
| 1.9 | 2026-06-01 | 1.23.3 | 3 years 9 months | 1.10.1 | 3 years 3 months | 1.5.2 | 3 years 6 months |
| 1.10 | 2026-12-01 | 1.26.0 | 3 years 3 months | 1.11.3 | 3 years 2 months | 2.2.0 | 2 years 10 months |
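For reference, the "date-diff" columns can be approximated with a small helper like the one below. This is only a sketch: `age_in_years_months` is a hypothetical name, it truncates to whole months, and it is not the script that generated the tables. The numpy release dates are the ones quoted earlier in this thread.

```python
from datetime import date


def age_in_years_months(released, target):
    """Whole years and months elapsed between two dates (days are truncated)."""
    months = (target.year - released.year) * 12 + (target.month - released.month)
    if target.day < released.day:
        months -= 1  # the last month is not complete yet
    return divmod(months, 12)


sklearn_1_7 = date(2025, 6, 1)  # planned release date used in the tables

print(age_in_years_months(date(2021, 12, 31), sklearn_1_7))  # numpy 1.22.0 -> (3, 5)
print(age_in_years_months(date(2021, 8, 15), sklearn_1_7))   # numpy 1.21.2 -> (3, 9)
```

The second result is one month lower than the "3 years 10 months" in the table, presumably because the tables round to the nearest month rather than truncate.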

Quick Analysis

This changes things by only a few (3-5) months. I would guess that older releases are a bit more affected because I think the numpy/scipy release cadence has increased since.

For example scipy is a bit less than 3 years:

  • December 2025 release: plan 1 is 2 years 5 months (+5 months for plan 2)
  • December 2026 release: 2 years 10 months (+4 months for plan 2).

To be honest if you look at #30888 (comment) in scikit-learn 1.1 bump 1, we have done 2 years 6 months for scipy and I don't remember anyone complaining about this ...

I am going to cc @scikit-learn/core-devs to have more opinion on this.

Footnotes

  1. our most aggressive bump in scikit-learn history since even min Python was only 2 years 7 months old, but again I don't remember anyone complaining about this ...

@adrinjalali
Member

all these options are nice enough to me. I'm happy either way.

@glemaitre
Member

plan 1 is easier to explain to me.

I won't get upset if we go with either plan though. I just want us to converge because it will be a net improvement either way :)

@betatim
Member
betatim commented Mar 4, 2025

I also care more about us making a choice for either 1 or 2 than I do about which one we choose.

@ogrisel
Member
ogrisel commented Mar 4, 2025

I would vote either for plan 2 or for plan 1 without the ".0", to imply that we support the latest Z value for a fixed X.Y at any given time:

  • Plan 1.b: newest (X.Y.Z) bugfix of the oldest fixed minor release with Python wheels

Supporting .0 might not be possible because the .0 release of a dependency might be broken at the time we produce a given scikit-learn release.
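A minimal sketch of how Plan 1.b could be computed is shown below. The release list is entirely hypothetical (a made-up package, not real numpy data), and `newest_bugfix_of_oldest_minor` is an illustrative name rather than an existing helper.

```python
# Hypothetical (version, supports target Python) pairs for a made-up package.
RELEASES = [
    ("2.3.4", False),
    ("2.4.0", True),
    ("2.4.1", True),
    ("2.4.2", True),
    ("2.5.0", True),
]


def version_tuple(version):
    """Parse a plain 'X.Y.Z' string into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in version.split("."))


def newest_bugfix_of_oldest_minor(releases):
    """Plan 1.b: newest X.Y.Z within the oldest supported X.Y series."""
    supported = [version_tuple(v) for v, ok in releases if ok]
    # Oldest minor series whose X.Y.0 release supports the target Python.
    oldest_minor = min(v for v in supported if v[2] == 0)[:2]
    # Newest bugfix release inside that series.
    newest = max(v for v in supported if v[:2] == oldest_minor)
    return ".".join(str(part) for part in newest)


print(newest_bugfix_of_oldest_minor(RELEASES))  # 2.4.2
```

With this rule a broken X.Y.0 release is not a problem, since the required minimum moves to the latest bugfix of the same series.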

@lucascolley
Contributor

cc @crusaderky FYI, we can remove the 3.9 shims in xpx once this merges

@thomasjpfan
Member

I'm happy with Plan 1.B: #30895 (comment)

Member Author
lesteve commented Mar 17, 2025

So the consensus at the biweekly scikit-learn meeting seems to be the following: plan 1 (X.Y version for dependencies) with the disclaimer that X.Y.0 may not be supported if it is too much of an effort for us, for example if there is a critical bugfix or security fix in X.Y.Z we may decide to require X.Y.Z.

Amongst other things, plan 1 is what this PR is doing, so reviews would be more than welcome 🙏!

"polars": ("0.20.30", "docs, tests"),
"pyarrow": ("12.0.0", "tests"),
"sphinx": ("7.3.7", "docs"),
"sphinx-copybutton": ("0.5.2", "docs"),
"sphinx-gallery": ("0.17.1", "docs"),
"numpydoc": ("1.2.0", "docs, tests"),
- "Pillow": ("7.1.2", "docs"),
+ "Pillow": ("8.4.0", "docs"),
"pooch": ("1.6.0", "docs, examples, tests"),
"sphinx-prompt": ("1.4.0", "docs"),
"sphinxext-opengraph": ("0.9.1", "docs"),
Member

Shall we also bump all the other dependencies (e.g. tests and docs similarly), or shall we do this on a per-need basis?

Member

Shall we write a script to automate min version bumping based on the planned date of the next scikit-learn winter release (to be run after the summer release)? Maybe for a follow-up PR ;)

Member Author
@lesteve lesteve Mar 18, 2025

Shall we also bump all the other dependencies (e.g. tests and docs similarly), or shall we do this on a per-need basis?

Right now this PR only bumps compiled dependencies. I didn't look at pure Python dependencies, but this can be done in a further PR with the agreed rule "take the release that is a bit more than 2 years old" i.e. roughly the same as SPEC0.

In general I would argue we should try to apply the same rule everywhere, unless there is a good reason to bump more aggressively for tests and docs (e.g. an annoying bug in pytest, or a matplotlib change that makes an example significantly more complicated).
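To make the "a bit more than 2 years old" rule concrete, here is a small sketch. It assumes the rule means "the newest release that is at least two years old at the scikit-learn release date", which is one possible reading. The package history is hypothetical and `min_version_for` is a made-up name.

```python
from datetime import date

# Hypothetical release history for a made-up pure-Python dependency.
RELEASES = {
    "1.0.0": date(2022, 3, 1),
    "1.1.0": date(2023, 2, 10),
    "1.2.0": date(2023, 9, 5),
    "1.3.0": date(2024, 6, 20),
}


def min_version_for(release_date, releases, min_age_years=2):
    """Newest release that is at least `min_age_years` old at `release_date`."""
    try:
        cutoff = release_date.replace(year=release_date.year - min_age_years)
    except ValueError:
        # release_date is Feb 29 and the cutoff year is not a leap year
        cutoff = release_date.replace(year=release_date.year - min_age_years, day=28)
    old_enough = [v for v, released in releases.items() if released <= cutoff]
    if not old_enough:
        return None
    return max(old_enough, key=lambda v: releases[v])


print(min_version_for(date(2025, 6, 1), RELEASES))   # 1.1.0
print(min_version_for(date(2025, 12, 1), RELEASES))  # 1.2.0
```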

Shall we write a script to automate min version bumping based on the planned date of the next scikit-learn winter release (to be run after the summer release)? Maybe for a follow-up PR ;)

Why not 😉. I was thinking about writing one for the Python + compiled dependency rule because I had been doing it by hand until now and this is a bit cumbersome.

Member Author

@ogrisel there you go: a hacky script that does the job today, with plenty of room for improvements 😉

Here is the output for a release in November 2025:

For future release at date 2025-11-01
- python: 3.10 -> 3.11
- numpy: 1.22.0 -> 1.24.0
- scipy: 1.8.0 -> 1.10.0
- pandas: 1.4.0 -> 1.5.0
- matplotlib: 3.5.0 -> 3.6.0
- joblib: 1.2.0 -> 1.3.0
- threadpoolctl: 3.1.0 -> 3.2.0

This made me notice that pyamg was updated too aggressively because I checked the Python version metadata in the trove classifiers and not the wheels availability.

@ogrisel ogrisel merged commit a76b029 into scikit-learn:main Mar 18, 2025
35 checks passed
@lesteve lesteve deleted the bump-python-3.10 branch March 18, 2025 15:30
@lucascolley
Contributor

thanks all!

@DimitriPapadopoulos
Contributor

Other locations where the Python version should be bumped accordingly include:

9 participants