RFC Write an explicit rule about bumping our minimum dependencies · Issue #30888 · scikit-learn/scikit-learn · GitHub

Closed
lesteve opened this issue Feb 24, 2025 · 16 comments · Fixed by #31118

@lesteve
Member
lesteve commented Feb 24, 2025

Roughly a year ago, SPEC0 was rejected following a vote and we said we would write our own rule, but we did not 😅.

Until now 💪.

This was spurred by a Discord discussion with @lucascolley, @betatim, @jeremiedbb and @ogrisel.

cc @glemaitre whom I had a chat with about this.

Proposed rule

  • Python: in each scikit-learn December release, we bump our minimum supported Python to the Python version that was released a bit more than 3 years ago (Python releases happen yearly, around the beginning of October).
  • non pure-Python dependencies (numpy, scipy, pandas, etc.): in each December release, they are bumped to the oldest minor release that has wheels for the minimum Python version.
  • pure Python dependencies: in each release (December and June), bump to the most recent minor release that is more than 2 years old.
  • we expect that exceptions may arise, although hopefully not too often, for example security or critical bug fixes
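The Python part of the proposed rule fits in a few lines. This is a minimal sketch, not the script used for the tables below: it assumes ages are measured as days / 365, uses the CPython X.Y.0 release dates, and treats any release in the second half of the year as "the December release" (`min_python_proposed` is a hypothetical helper name).

```python
from datetime import date

# Initial (X.Y.0) release dates of recent CPython versions.
PYTHON_RELEASES = {
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
    "3.12": date(2023, 10, 2),
    "3.13": date(2024, 10, 7),
}


def age_in_years(released: date, at: date) -> float:
    """Age of a release at a given date, counting 365 days per year."""
    return (at - released).days / 365


def min_python_proposed(release_date: date) -> str:
    """Minimum Python for a scikit-learn release under the proposed rule.

    Hypothetical helper: Python is only bumped in the end-of-year
    ("December") release; a mid-year ("June") release keeps the minimum
    chosen at the previous December release.
    """
    # Assumption: releases in July-December count as the "December" release.
    if release_date.month >= 7:
        reference = release_date
    else:
        reference = date(release_date.year - 1, 12, 1)
    # Newest Python released more than 3 years before the reference date.
    old_enough = [
        version
        for version, released in PYTHON_RELEASES.items()
        if age_in_years(released, reference) > 3
    ]
    return max(old_enough, key=lambda version: PYTHON_RELEASES[version])


for release in [date(2025, 6, 1), date(2025, 12, 1), date(2026, 6, 1), date(2026, 12, 1)]:
    print(release, min_python_proposed(release))
```

With these assumptions the loop gives 3.10, 3.11, 3.11 and 3.12, i.e. the python column of the plan for the next 4 releases in this comment.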

Rationale

  • we want a simple rule
  • we don't want to be even more conservative than what we have been doing historically
  • in an ideal world, we would want to avoid requiring newer versions if there is not a "good reason" to, although there is some tension between having a "simple rule" and this bullet point

Proposed plan

We didn't bump our dependencies in 1.6, so we would bump them in 1.7 (June 2025) and start the regular December version bump in December 2025.

This is what it would look like for the next 4 scikit-learn releases (python-date-diff column is the age of the min Python at the time of the scikit-learn release, and similarly for other dependencies).

Python:

  scikit-learn scikit-learn-date python  python-date-diff
0          1.7        2025-06-01   3.10          3.660274
1          1.8        2025-12-01   3.11          3.106849
2          1.9        2026-06-01   3.11          3.605479
3         1.10        2026-12-01   3.12          3.167123
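The date-diff columns appear to be the number of days between the dependency release and the scikit-learn release, divided by 365. The minimal sketch below, assuming the CPython X.Y.0 release dates and the pretend scikit-learn release dates from the table, reproduces the python-date-diff values above. `age_in_years` and `PLAN` are illustrative names, not the actual script.

```python
from datetime import date

# Initial (X.Y.0) release dates of the CPython versions in the table above.
PYTHON_RELEASES = {
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
    "3.12": date(2023, 10, 2),
}

# (scikit-learn version, pretend release date, minimum Python) from the table.
PLAN = [
    ("1.7", date(2025, 6, 1), "3.10"),
    ("1.8", date(2025, 12, 1), "3.11"),
    ("1.9", date(2026, 6, 1), "3.11"),
    ("1.10", date(2026, 12, 1), "3.12"),
]


def age_in_years(released: date, at: date) -> float:
    """Age of a release at a given date, counting 365 days per year."""
    return (at - released).days / 365


for sklearn_version, sklearn_date, python in PLAN:
    diff = age_in_years(PYTHON_RELEASES[python], sklearn_date)
    print(f"{sklearn_version:>5} {sklearn_date} {python:>5} {diff:.6f}")
```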

non pure-Python dependencies:

  scikit-learn scikit-learn-date numpy  numpy-date-diff scipy  scipy-date-diff pandas  pandas-date-diff
0          1.7        2025-06-01  1.22         3.416438   1.8         3.317808    1.4          3.356164
1          1.8        2025-12-01  1.24         2.953425  1.11         2.435616    2.0          2.663014
2          1.9        2026-06-01  1.24         3.452055  1.11         2.934247    2.0          3.161644
3         1.10        2026-12-01  1.26         3.208219  1.12         2.863014    2.2          2.863014

pure Python dependencies:

  scikit-learn scikit-learn-date joblib  joblib-date-diff threadpoolctl  threadpoolctl-date-diff
0          1.7        2025-06-01    1.2          2.706849           3.1                 3.331507
1          1.8        2025-12-01    1.3          2.427397           3.2                 2.386301
2          1.9        2026-06-01    1.4          2.145205           3.5                 2.087671
3         1.10        2026-12-01    1.4          2.646575           3.5                 2.589041

Comparison to SPEC0

  • Python: the proposed rule corresponds to dropping Python support after 4 years (instead of 3 years for SPEC0)
  • non pure-Python dependencies: This corresponds to dropping support for a version that is a bit more than 3 years old (rather than 2 years for SPEC0).
  • pure-Python dependencies: same as SPEC0

Note: SPEC0 talks about the age of the dropped dependency release, which I find slightly harder to think about than the age of the minimum dependency release. The age of the dropped dependency release also depends on the dependency release cadence. This does not matter that much for dependencies with regular releases like numpy, but makes more of a difference for dependencies with less regular releases, like joblib.

"What is our implicit rule?" investigation

tl;dr: very ad hoc and sometimes more adventurous than the proposed rule

I used a quick and dirty script to figure out what we have been doing historically.

Python:

  scikit-learn python  python-date-diff
0         0.21    3.5          3.657534
1         0.22    3.5          4.224658
2         0.23    3.6          3.386301
3          1.0    3.7          3.246575
4          1.1    3.8          2.578082
5          1.2    3.8          3.153425
6          1.3    3.8          3.712329
7          1.4    3.9          3.287671
8          1.5    3.9          3.627397
9          1.6    3.9          4.180822

The proposed rule means our minimum Python is a bit more than 3 years old, which fits what we have done historically. Note that scikit-learn 1.1 dropped 3.7 quite aggressively (#22617 does not mention any particular reason), and that we did not bump our dependencies for 1.6.

numpy:

  scikit-learn python   numpy  numpy-date-diff
0         0.21    3.5  1.11.0         3.117808
1         0.22    3.5  1.11.0         3.684932
2         0.23    3.6  1.13.3         2.616438
3          1.0    3.7  1.14.6         3.002740
4          1.1    3.8  1.17.3         2.567123
5          1.2    3.8  1.17.3         3.142466
6          1.3    3.8  1.17.3         3.701370
7          1.4    3.9  1.19.5         3.032877
8          1.5    3.9  1.19.5         3.372603
9          1.6    3.9  1.19.5         3.926027

Roughly 3 years in terms of a time-based rule, which roughly matches what the proposed rule would do. Note the aggressive bump in 1.1.

scipy:

  scikit-learn python   scipy  scipy-date-diff
0         0.21    3.5  0.17.0         3.293151
1         0.22    3.5  0.17.0         3.860274
2         0.23    3.6  0.19.1         2.887671
3          1.0    3.7   1.1.0         3.389041
4          1.1    3.8   1.3.2         2.504110
5          1.2    3.8   1.3.2         3.079452
6          1.3    3.8   1.5.0         3.021918
7          1.4    3.9   1.6.0         3.046575
8          1.5    3.9   1.6.0         3.389041
9          1.6    3.9   1.6.0         3.942466

Roughly 3 years in terms of a time-based rule, which roughly matches what the proposed rule would do. Note the aggressive bump in 1.1.

pandas:

  scikit-learn python  pandas  scipy-date-diff
0         0.21    3.5     NaN         3.293151
1         0.22    3.5     NaN         3.860274
2         0.23    3.6     NaN         2.887671
3          1.0    3.7  0.25.0         3.389041
4          1.1    3.8   1.0.5         2.504110
5          1.2    3.8   1.0.5         3.079452
6          1.3    3.8   1.0.5         3.021918
7          1.4    3.9   1.1.5         3.046575
8          1.5    3.9   1.1.5         3.389041
9          1.6    3.9   1.1.5         3.942466

Roughly 3 years in terms of a time-based rule, which roughly matches what the proposed rule would do. Note the aggressive bump in 1.1.

joblib:

  scikit-learn python joblib  joblib-date-diff
0         0.21    3.5   0.11          2.186301
1         0.22    3.5   0.11          2.756164
2         0.23    3.6   0.11          3.197260
3          1.0    3.7   0.11          4.567123
4          1.1    3.8  1.0.0          1.408219
5          1.2    3.8  1.1.1          0.161644
6          1.3    3.8  1.1.1          0.717808
7          1.4    3.9  1.2.0          1.336986
8          1.5    3.9  1.2.0          1.679452
9          1.6    3.9  1.2.0          2.232877

The proposed rule (2 years) roughly corresponds to what we have been doing when deciding to bump. Note:

  • aggressive bumping in 1.1: the joblib minimum version was less than 1.5 years old
  • bumping in 1.2 was due to a security bug-fix

threadpoolctl:

  scikit-learn python threadpoolctl  threadpoolctl-date-diff
0         0.21    3.5           NaN                      NaN
1         0.22    3.5           NaN                      NaN
2         0.23    3.6           NaN                      NaN
3          1.0    3.7         2.0.0                 1.802740
4          1.1    3.8         2.0.0                 2.432877
5          1.2    3.8         2.0.0                 3.008219
6          1.3    3.8         2.0.0                 3.567123
7          1.4    3.9         2.0.0                 4.120548
8          1.5    3.9         3.1.0                 2.304110
9          1.6    3.9         3.1.0                 2.854795

The proposed rule (2 years) roughly corresponds to what we have been doing historically when deciding to bump.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Feb 24, 2025
@lesteve lesteve changed the title RFC Writing an explicit rule about upgrading our dependency RFC Write an explicit rule about upgrading our dependencies Feb 24, 2025
@lesteve lesteve changed the title RFC Write an explicit rule about upgrading our dependencies RFC Write an explicit rule about bumping our minimum dependencies Feb 24, 2025
@glemaitre
Member
glemaitre commented Feb 24, 2025

Speaking with @lesteve in real life and looking at the proposal, I'm fine with it.

I really think that there is value in having a statement instead of going back each time we need to move forward with version bumping. To me, it is a strong proposal because:

  • it makes a retrospective and applies rules that we implicitly followed in the past;
  • the fact that it looks at other libraries and their availability for a specific Python version is pragmatic: if a user tries to install a version not available with wheels, the experience will not be great. So it makes total sense to me because this rational argument is based on user experience.

@betatim
Member
betatim commented Feb 24, 2025

I like the proposed schedule.

I think we should formalise this as quickly as reasonable and then make a blog post announcing the changes that are coming for 1.7. We could also announce the general policy, but maybe in a separate post to give room to show the policy, the retrospective, etc.

Note: SPEC0 talks about the age of the dropped dependency release, which I find slightly harder to think about than the age of the minimum dependency release.

Can you explain this again? Though maybe it isn't worth it because I like the proposal, even without understanding this difference.

@jeremiedbb jeremiedbb added RFC and removed Needs Triage Issue requires triage labels Feb 24, 2025
@jeremiedbb
Member

pure Python dependencies: in each release (December and June) bump to the most recent minor release older than 2 years old

What do you think about bumping pure Python deps only in November (it's usually December but the goal is November 😄) as well?
This way the rule is even simpler while not keeping old dependencies much longer. It also makes the May release simpler.

@adrinjalali
Member

I'm happy with this proposal, and we should add a note to our "how to release doc" to make sure we don't forget.

cc @scikit-learn/core-devs for visibility.

@thomasjpfan
Member
thomasjpfan commented Feb 24, 2025

I am happy with the proposal. Can we formalize this proposal with a quick SLEP? This way users know the schedule and other projects can see an alternative to SPEC0.

Edit: I meant a SLEP (scikit-learn enhancement proposal)

@lesteve
Member Author
lesteve commented Feb 25, 2025

So as discussed during the monthly meeting, the soft consensus seemed to be:

  • bump up to Python 3.10 and dependencies in main now, since we forgot to do it for 1.6. I opened a PR about this MNT Bump min Python version to 3.10 and dependencies #30895
  • since this is more or less writing down something we kind of have done historically, a SPEC may not be required. @thomasjpfan slightly leans towards a quick SLEP.
  • if not a SLEP, it would be nice to have it somewhere we can point people to: docs and/or a blog post
  • @ogrisel said it could be called SPEC0+1, which is roughly the case for Python and non pure-Python dependencies, and it is similar to SPEC0 for pure Python dependencies.

@lesteve
Member Author
lesteve commented Feb 26, 2025

What do you think about bumping pure Python deps only in November (it's usually December but the goal is November 😄) as well?
This way the rule is even simpler while not keeping old dependencies much longer. It also makes the May release simpler.

One of the many variants to be discussed 😉. At the meeting @ogrisel suggested bumping numpy, scipy, etc. both in December and in June, whereas I initially went for only in December, when Python is bumped, to be more conservative.

I'll wait for more feedback and try to identify the kind of variants we have ...

@betatim
Copy link
Member
betatim commented Feb 26, 2025

I think we should write a blog post no matter what. I think it gets read by more people and/or we can use it to announce that there is a SLEP.

Jeremie's suggestion to only bump things in the November release sounds nice.


When you say SPEC, do you mean SLEP? To me it seems like if we want to formally write things down it should be a SLEP as this is just about scikit-learn. Or is it useful for others to use this "SPEC0+1", in which case maybe a SPEC?

@lucascolley
Copy link
Contributor

If possible, the best case scenario would be to add to the existing SPEC0 IMO. Stefan said on Discord that he is open to including a variant for longer term support. I think that makes sense since there is nothing completely specific to scikit-learn here, right?

@lesteve
Member Author
lesteve commented Feb 27, 2025

My current plan:

  • get MNT Bump min Python version to 3.10 and dependencies #30895 merged. This will already be a small victory, bumping the minimum Python to 3.10, and will let us iron out, on a concrete case, a few of the variations that have been mentioned.
  • agree on a scikit-learn general rule internally in this issue
  • longer term see how the rule scikit-learn agreed on can be fed upstream to SPEC0

Maybe I am being overly cautious, but IMO coupling this scikit-learn discussion with SPEC0 too early is the best way to entangle too many things and lose the momentum that has been building on this topic 😉.

@lucascolley
Contributor
lucascolley commented Mar 22, 2025

agree on a scikit-learn general rule internally in this issue

from #30895 (comment) @lesteve :

So the consensus at the biweekly scikit-learn meeting seems to be the following: plan 1 (X.Y version for dependencies) with the disclaimer that X.Y.0 may not be supported if it is too much of an effort for us, for example if there is a critical bugfix or security fix in X.Y.Z we may decide to require X.Y.Z.

With "plan 1" described at #30895 (comment)

@lesteve
Copy link
Member Author
lesteve commented Mar 28, 2025

I also have a quick and dirty script as mentioned in #30895 (comment).

I'll try to devote some time to have something written up for the next monthly developer meeting (next Monday, i.e. March 31).

@lesteve
Member Author
lesteve commented Mar 31, 2025

Here is a draft of what the consensus seems to be for our "dependencies minimum version bumping rule":

Guideline

  • minimum Python version: at the time of a minor scikit-learn release (X.Y.0), we drop any Python version whose initial release is more than 4 years old. In other words, our minimum Python version is between 3 and 4 years old.
  • compiled dependencies (numpy, scipy, as well as compiled optional dependencies like pandas, matplotlib, pyamg, pillow): we take the oldest minor release (X.Y.0) that has wheels for our minimum Python version. In practice this means that our minimum supported version is around 3 years old, maybe a bit less.
  • pure Python dependencies (joblib, threadpoolctl): at the time of the scikit-learn release our minimum supported version is the most recent minor release (X.Y.0) that is at least 2 years old.
  • we may decide to be less conservative than this guideline in some edge cases, for example when a security bugfix or a critical bugfix in one of our dependencies makes supporting older versions too costly in terms of maintenance.
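The first two bullets of this guideline can be sketched as two small functions: pick the minimum Python, then pick the oldest X.Y.0 of a compiled dependency that has wheels for it. This is only a minimal sketch under stated assumptions: ages are days / 365, `min_python` and `min_compiled` are hypothetical names, and `NUMPY_WHEELS` is hand-written data that mirrors the 1.8 example in this comment (numpy 1.24.0 being the oldest X.Y.0 with Python 3.11 wheels); real data would come from PyPI.

```python
from datetime import date

# Initial (X.Y.0) release dates of recent CPython versions.
PYTHON_RELEASES = {
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
    "3.12": date(2023, 10, 2),
    "3.13": date(2024, 10, 7),
}

# Hand-written, illustrative data (oldest first): Python versions that have
# wheels on each numpy X.Y.0 release.
NUMPY_WHEELS = [
    ("1.22.0", {"3.8", "3.9", "3.10"}),
    ("1.23.0", {"3.8", "3.9", "3.10"}),
    ("1.24.0", {"3.8", "3.9", "3.10", "3.11"}),
]


def age_in_years(released: date, at: date) -> float:
    """Age of a release at a given date, counting 365 days per year."""
    return (at - released).days / 365


def min_python(release_date: date, max_age_years: float = 4) -> str:
    """Oldest Python whose X.Y.0 release is at most `max_age_years` old."""
    supported = [
        version
        for version, released in PYTHON_RELEASES.items()
        if 0 <= age_in_years(released, release_date) <= max_age_years
    ]
    return min(supported, key=lambda version: PYTHON_RELEASES[version])


def min_compiled(releases, python_version: str) -> str:
    """Oldest X.Y.0 release (list sorted oldest first) with wheels for python_version."""
    for version, pythons in releases:
        if python_version in pythons:
            return version
    raise ValueError(f"no release has wheels for Python {python_version}")


py = min_python(date(2025, 12, 1))
print(py, min_compiled(NUMPY_WHEELS, py))  # 3.11 1.24.0
```

Expressing the compiled-dependency rule in terms of the minimum Python (rather than a fixed age) is what makes its support window vary a bit, depending on how quickly each dependency shipped wheels for that Python.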

Example for the upcoming scikit-learn 1.8 release

The upcoming 1.8 release is scheduled around November/December. Let's pretend it will happen on 2025-12-01: Python 3.10 (released in October 2021) will then be more than 4 years old and will be dropped, so our minimum supported Python will be Python 3.11 (a bit more than 3 years old).

The dependencies' minimum versions will be bumped as follows:

  • python: 3.10 -> 3.11
  • numpy: 1.22.0 -> 1.24.0 (oldest version with Python 3.11 wheels)
  • scipy: 1.8.0 -> 1.10.0 (oldest scipy version with Python 3.11 wheels)
  • pandas: 1.4.0 -> 1.5.0 (oldest pandas version with Python 3.11 wheels)
  • matplotlib: 3.5.0 -> 3.6.0 (oldest matplotlib version with Python 3.11 wheels)
  • joblib: 1.2.0 -> 1.3.0 (1.3.0 will be ~2 years 5 months old, 1.4.0 will be less than 2 years old, see release history)
  • threadpoolctl: 3.1.0 -> 3.2.0 (3.2.0 will be ~2 years 5 months old, 3.3.0 will be less than 2 years old, see release history)
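The pure Python rule behind the joblib line can be sketched as "most recent X.Y.0 that is at least 2 years old". A minimal sketch, assuming ages in days / 365; the joblib dates below are approximate and only for illustration (check PyPI or the joblib changelog for the exact ones), and `min_pure_python` is a hypothetical name.

```python
from datetime import date

# Approximate joblib X.Y.0 release dates (assumption, for illustration only).
JOBLIB_RELEASES = {
    "1.2.0": date(2022, 9, 16),
    "1.3.0": date(2023, 6, 28),
    "1.4.0": date(2024, 4, 9),
}


def age_in_years(released: date, at: date) -> float:
    """Age of a release at a given date, counting 365 days per year."""
    return (at - released).days / 365


def min_pure_python(releases, release_date: date, min_age_years: float = 2) -> str:
    """Most recent X.Y.0 release that is at least `min_age_years` old."""
    old_enough = [
        version
        for version, released in releases.items()
        if age_in_years(released, release_date) >= min_age_years
    ]
    return max(old_enough, key=lambda version: releases[version])


print(min_pure_python(JOBLIB_RELEASES, date(2025, 12, 1)))  # 1.3.0
```

With these dates, 1.4.0 is only about 1.6 years old on 2025-12-01, so 1.3.0 is kept as the minimum; it would become eligible for a release around mid-2026.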

The new minimum versions were generated from this hacky script, with definitely some room for improvement.

Rough Comparison with SPEC0

  • Python has one more year of support: 4 years instead of 3 years.
  • compiled dependencies have roughly one more year of support: roughly 3 years instead of 2 years. In practice the exact support window depends on how quickly our dependencies released a new minor version (X.Y.0) that supported our minimum Python version.
  • pure Python dependencies: a bit more conservative than SPEC0. Our minimum supported version is at least 2 years old, whereas SPEC0 says you can drop support for a version if it's more than 2 years old. This also depends on our dependencies' release cadence. joblib and threadpoolctl tend not to have very regular release cadences, compared to scikit-learn or numpy (roughly 2 minor releases a year). This makes SPEC0 too aggressive compared to what we have been doing in scikit-learn. For example, for the scikit-learn 1.8 release (scheduled 2025-12-01), SPEC0 would recommend dropping joblib 1.3 (1.3.0 is a bit more than 2 years old at the time of release), but the next minor joblib version, 1.4.0, was released in April 2024, so it would be only about 1 year and 8 months old in December 2025, which seems way too recent compared to what we have been doing historically (minimum version roughly 2 years old).

@lucascolley
Contributor
lucascolley commented Mar 31, 2025

I unfortunately won't be there, but just to flag this, is anyone from scikit-learn planning to attend the Scientific Python Developer Summit this year? If so, I think it would be really useful to work on an extension of SPEC 0 that accommodates these slightly more conservative rules.

@lesteve
Member Author
lesteve commented Apr 1, 2025

I think @virchan is planning to attend physically. Some of us expressed interest in going to the Scientific Python Developer Summit, but unfortunately cannot make it this year. @glemaitre (and maybe I) may try to get involved remotely, depending on how feasible this is.

I tried to highlight the difference with respect to SPEC0 in #30888 (comment). IMO, one of the main differences is that SPEC0 thinks in terms of "when can I drop support for a particular version" whereas I find it slightly more natural to think in terms of "how old should my minimum supported version be".

For dependencies with regular releases (Python, numpy, scikit-learn, etc.), the difference between the two is small. If your dependency releases every 6 months, dropping support for a release that is more than 2 years old means your minimum version is roughly 1.5 years old.

For dependencies with a somewhat irregular release cadence (e.g. joblib or threadpoolctl), the difference is greater. Dropping support for joblib 1.3.0, which will be more than 2 years old in December 2025, means our minimum version 1.4.0 would be only about 1 year and 8 months old, which is too aggressive compared to what we have been doing (rule of thumb for joblib + threadpoolctl: minimum version is roughly 2 years old).
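The two ways of thinking can be compared side by side. This is a minimal sketch under stated assumptions: ages are days / 365, "SPEC0-style" follows the description in this thread (drop releases more than 2 years old, keep the oldest remaining one) rather than the full SPEC0 text, `REGULAR_RELEASES` is a hypothetical package releasing every 6 months, and the joblib dates are approximate.

```python
from datetime import date


def age_in_years(released: date, at: date) -> float:
    """Age of a release at a given date, counting 365 days per year."""
    return (at - released).days / 365


def min_version_spec0(releases, at: date, window_years: float = 2) -> str:
    """Drop releases older than `window_years`; the oldest one left is the minimum."""
    kept = [
        version
        for version, released in releases.items()
        if 0 <= age_in_years(released, at) <= window_years
    ]
    return min(kept, key=lambda version: releases[version])


def min_version_proposed(releases, at: date, min_age_years: float = 2) -> str:
    """The minimum is the most recent release that is at least `min_age_years` old."""
    old_enough = [
        version
        for version, released in releases.items()
        if age_in_years(released, at) >= min_age_years
    ]
    return max(old_enough, key=lambda version: releases[version])


# Hypothetical package with a regular 6-month release cadence.
REGULAR_RELEASES = {
    "2.0.0": date(2023, 7, 1),
    "2.1.0": date(2024, 1, 1),
    "2.2.0": date(2024, 7, 1),
    "2.3.0": date(2025, 1, 1),
    "2.4.0": date(2025, 7, 1),
}

# Approximate joblib X.Y.0 release dates (assumption, for illustration only).
JOBLIB_RELEASES = {
    "1.2.0": date(2022, 9, 16),
    "1.3.0": date(2023, 6, 28),
    "1.4.0": date(2024, 4, 9),
}

at = date(2025, 12, 1)
for name, releases in [("regular", REGULAR_RELEASES), ("joblib", JOBLIB_RELEASES)]:
    spec0 = min_version_spec0(releases, at)
    proposed = min_version_proposed(releases, at)
    print(
        f"{name}: SPEC0-style -> {spec0} ({age_in_years(releases[spec0], at):.2f} y), "
        f"proposed -> {proposed} ({age_in_years(releases[proposed], at):.2f} y)"
    )
```

In both cases the two rules differ by one release, but how old the SPEC0-style minimum ends up being depends on the gap between releases, which is exactly why the irregular joblib cadence matters here.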

@virchan
Member
virchan commented Apr 8, 2025

The Scientific Python Summit 2025 issue tracker is now open. It might be a good idea to mention this SPEC0+1 discussion there as well. I’d be happy to follow up on anything for scikit-learn during the summit.

8 participants