pip install sklearn behaviour #8215

lesteve · 2017-01-19T10:22:37Z

I was surprised that pip install sklearn would do something without giving an error, which I learned from
#8185 (comment). All in all it is not so bad because scikit-learn is in install_requires of sklearn setup.py so you do end-up with scikit-learn installed. Still it is a bit confusing that pip list shows both scikit-learn 0.18.1 and sklearn 0.0.

@chrisfilo if I believe https://pypi.org/project/sklearn it looks like you uploaded it. I see a similar approach is done for bids and pybids https://pypi.org/project/bids. Is it a common thing to do prevent package names to be squatted on PyPI?

There must be a way to have a slightly clever setup.py that allows upload but prevents install with an error saying something like "Please use pip install scikit-learn".

The text was updated successfully, but these errors were encountered:

glemaitre · 2017-01-19T11:04:56Z

Maybe you could make sklearn an alias of scikit-learn.
I went across the following package.
I didn't check the mechanism thought.

lesteve · 2017-01-19T11:18:41Z

Quickly looking at the pypi-alias code it may well be that pypi-alias was used to generate the sklearn PyPI package indeed.

glemaitre · 2017-01-19T12:38:52Z

Indeed you are right.

At the end, pip install sklearn or pip install scikit-learn --- apart from the annoying sklearn (0.0) showed in the pip list --- will install the latest available build from PyPI. I would have thought that this is the expected behaviour.

However, I could find a case which I find annoying. As a user, I would expect pip install sklearn==0.17 to do something as pip install scikit-learn==0.17. This will throw an error since that sklearn as only a (0.0) version.

Regarding the issue in #8185, I was checking on the doc if it was mentioned that you should use pip install . to trigger the install of the local version. I couldn't find it. Did I miss it?

glemaitre · 2017-01-19T12:40:20Z

And I forgot that pip uninstall sklearn will not do what you are expecting as well if you expect to remove scikit-learn.

hugovk · 2017-11-15T14:52:15Z

Maybe it would be wise to phase out the sklearn package.

How about:

first allowing installation through pip install sklearn but with a warning,
then some time later don't install and just show an error,
and finally remove the package altogether.

lesteve · 2017-11-15T15:50:33Z

My earlier comment is still valid:

There must be a way to have a slightly clever setup.py that allows upload but prevents install with an error saying something like "Please use pip install scikit-learn".

@hugovk if you can find or put together a setup.py that just raises an exception that would be a step forward on this issue.

Just to be clear I don't think we should have a warning phase (since the problem is super easy to fix and to be honest I suspect very few people are doing pip install sklearn). We should not remove the package it is there for a good reason: people might still do pip install sklearn and/or it could be squatted by not so benevolent people.

hugovk · 2017-11-15T16:08:07Z

Here's PyPI downloads stats (via pypinfo) for the last month suggesting over a quarter of scikit-learn installs are through sklearn!

package	download_count
sklearn	283,395
scikit-learn	1,082,341

How about something like this in sklearn's setup.py?

raise RuntimeError("sklearn is now known as scikit-learn. Please run: pip install scikit-learn")

chrisgorgo · 2017-11-15T16:25:34Z

Happy to transfer ownership of the sklearn pypi to someone more involved in the project.

My two cents: if it ain't broken don't fix it. I see little benefit from preventing people from using sklearn and a lot of broken CI setups and packages if you do implement this exception.
I also vaguely remember that we added this meta package because of some meta-programming pattern that was looking for packages, by the name of the the module (which is sklearn). This would also break - sadly I don't remember the details.

amueller · 2017-11-15T16:50:32Z

@lesteve do you have an account? @chrisfilo you could also transfer to me. (t3kcit I think).

I would argue that keeping it as an alias is fine. And yes, I think a lot of CI scripts will break and it'll be no benefit to anyone imho.

lesteve · 2017-11-15T18:09:27Z

Here are a few ways this is broken:

pip install sklearn==0.18.1

Output:

 Could not find a version that satisfies the requirement sklearn==0.18.1 (from versions: 0.0)
No matching distribution found for sklearn==0.18.1

pip install sklearn
pip uninstall -y sklearn
python -c "import sklearn"  # Oh wait why does this work I thought I just uninstalled sklearn

pip list ouput is also is misleading (why do I have two versions of scikit-learn and why is one 0.0, which was actually the question from #8185 (comment)):

scikit-learn (0.19.1)
sklearn (0.0)

If we believe the numbers from PyPI (1 in 5 scikit-learn pip install is through pip install sklearn really ???), maybe it's worth having a warning for a while. Because the fix is so simple and could be very well explained in the error message I would slightly be in favour of just raising an exception.

My two cents: if it ain't broken don't fix it. I see little benefit from preventing people from using sklearn and a lot of broken CI setups and packages if you do implement this exception.

@chrisfilo I tend to be with you on this but in this case I am not, not sure exactly why. Probably part of the story is the mix of surprise and WTF I felt when I realised pip install sklearn was actually working. It's a good point that sklearn in requirements.txt or setup.py could be an issue. Maybe it is worth setting using github advanced search or using the github API to figure out how much people do that.

@lesteve do you have an account? @chrisfilo you could also transfer to me. (t3kcit I think).

I do have an account on PyPI which is lesteve.

chrisgorgo · 2017-11-16T17:57:09Z

Ownership transferred. Remember: With great power comes... ability to mess up a lot of packages ;)

ghost · 2018-01-29T01:28:13Z

pip install -U scikit-learn==0.18

rth · 2018-06-27T14:58:52Z

For future reference, as I couldn't find a version under version control, I have made a copy of the current contents of https://pypi.org/project/sklearn/ under https://github.com/rth/scikit-learn-invalid-pypi

The consensus about this issue seems to be not to do anything, but if we decide to change something in that PyPi package in the future, please make a PR there first and ping relevant maintainers so the changes can be reviewed / commented on before uploading to PyPi.

lesteve · 2018-06-27T15:50:06Z

For future reference, as I couldn't find a version under version control, I have made a copy of the current contents of https://pypi.org/project/sklearn/ under https://github.com/rth/scikit-learn-invalid-pypi

Thanks! For compleness' sake, last time I looked I got the feeling that they were done through a pypi alias tool, maybe https://pypi.org/project/pypi-alias?

The consensus about this issue seems to be not to do anything,

Although it makes me slightly sad, I think that this is not worth the pain indeed.

If someone has the time, it would be great to:

look at PyPI downloads to see whether pip install sklearn behaviour #8215 (comment) is still true
see whether a github search can help evaluating the number of packages using the sklearn package on PyPI, e.g. how many packages have sklearn in their setup.py

Other ideas:

adding a warning in setup.py could be a middle ground solution and reevaluating in a few years. I doubt a warning would have significant impact though
coupling the breaking change to a scikit-learn release so that we can advertise it and people are more aware of it.

Just for the story, I was on the wrong side on msgpack-python being renamed to msgpack on PyPI and this was clearly not my definition of fun (bonuses were: interaction between conda keeping the old name and pip, different behaviour whether you install a wheel or from source, see dask/distributed#1913 (comment) for the short version if you really care).

zhenruiliao · 2018-08-15T21:59:45Z

An issue that could cause significant confusion is that if a new version is released after you install using this alias, pip install --upgrade sklearn fails silently (the only requirement is some version of scikit-learn, not necessarily the most recent version). I suppose it would be possible to continually release new versions of sklearn requiring the most recent version of scikit-learn, but that would be absurd

rth · 2018-08-16T06:35:45Z

Another possibility without breaking peoples code could be to detect whether the sklearn package is installed (e.g. by looking for site-packages/sklearn-0.0.dist-info/ or with some more standard python way), and raise a warning at import time if that is the case. This may add some small overhead for the import though.

timkpaine · 2019-04-22T19:24:22Z

for reference from airflow

import os
from setuptools import setup, find_packages

with open("README.md", "r") as fh:
    long_description = fh.read()

setup(
    name="airflow",
    version="0.6",
    author="Apache Author",
    author_email="dev@airflow.incubator.apache.org",
    description="Placeholder for the old Airflow package",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/apache/incubator-airflow",
    packages=find_packages(),
    classifiers=[
        'Development Status :: 5 - Production/Stable',
        'Environment :: Console',
        'Environment :: Web Environment',
        'Intended Audience :: Developers',
        'Intended Audience :: System Administrators',
        'License :: OSI Approved :: Apache Software License',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Topic :: System :: Monitoring',
    ]
)

if not os.getenv("OVERRIDE"):
    raise RuntimeError('Please install package apache-airflow instead of airflow')

See scikit-learn/scikit-learn#8215 (comment)

lesteve · 2021-06-21T07:00:57Z

I guess maybe the 1.0 release could be an opportunity to do this cleanup + breaking change?

I am imagining something like to make the disruption a bit smoother:

soon: uploading sklearn 0.1 on PyPI which raises an error explaining the change and allow an easy way out through an environment variable (better suggestions welcome for the easy way out)
in combination with the scikit-learn 1.0 release (I guess in a few months?): upload sklearn 0.2 on PyPI which always raises an error (no easy way out). Optionally remove sklearn 0.1 from PyPI and maybe sklearn 0.0.

Alternatives considered but discarded: add a warning in setup.py, the warning will only be visible when using pip install -v (see https://stackoverflow.com/a/44617404) and will probably be ignored anyway (pip install -v is very verbose) ...

One possible thing to look at is sklearn dependents from https://libraries.io/pypi/sklearn. One caveat is that there are plenty of false positives which seems like historical dependents of sklearn. For example pycaret is listed probably because sklearn was amongst the dependencies at one point (2 years ago):
https://github.com/pycaret/pycaret/blob/b9756afeae6b24107d86c74544deb83fe5527b50/requirements.txt#L7
but this has been fixed since.

For completeness, I checked the numbers pypistats.org, roughly 29% of the scikit-learn installs come from the sklearn package ... which is in the same ballpark as #8215 (comment). I find this both amazing and slightly sad at the same time 😉.

sklearn

https://pypistats.org/packages/sklearn
Downloads last month: 6,528,058

scikit-learn

https://pypistats.org/packages/scikit-learn
Downloads last month: 22,800,647

rth · 2021-06-21T09:23:44Z

It would be good to fix this. I think if we error we always need a way out, because some popular but no longer maintained libraries can have sklearn in their list of dependencies and there is not much one can do about it.

Maybe we could make 1 sklearn release,

that changes the description in https://pypi.org/project/sklearn/ to be much more verbose and explicitly say that bad things will happen if you use it.
Always keep the env variable to disable the error
say starting from September 1st start producing an error each Monday between 3PM CEST to 7PM CEST . This should raise enough attention so that issues are opened in popular libraries and this is fixed.
Then progressively increase the frequency of failure, from once per week to every other day, every day etc, with a sufficiently long period. All this can be defined initially in that 0.1 sklearn release.

This would give time libraries that have sklearn in their dependencies to update and make a new release. I forgot what's the name for such voluntary intermittent failures is.

I think it would be better to do this completely independently from any scikit-learn release, to avoid implying this is somehow a breaking change / regression, though we can mention it in release notes.

hugovk · 2021-06-21T10:01:58Z

I forgot what's the name for such voluntary intermittent failures is.

Brownout? Compare:

To help make sure all affected customers are aware of this change, we will temporarily suspend workflows on Ubuntu 16.04 LTS environment for two short 'brownout' periods ahead of the final removal. Builds that are scheduled to run during the brownout periods will fail.

https://github.blog/changelog/2021-04-29-github-actions-ubuntu-16-04-lts-virtual-environment-will-be-removed-on-september-20-2021/

rth · 2021-06-21T10:06:38Z

Yes, thanks. For some reason my brain only suggested hashbrown when I tried to find the term :)

lesteve · 2021-06-21T13:43:05Z

I think if we error we always need a way out, because some popular but no longer maintained libraries can have sklearn in their list of dependencies and there is not much one can do about it.

Good point, this also applies to old package versions that at one point required sklearn and unfortunately we can not fix the past 😉.

I think it would be better to do this completely independently from any scikit-learn release, to avoid implying this is somehow a breaking change / regression, though we can mention it in release notes.

OK I am fine with this.

About the brownout, why not ... I'd rather have something simpler if possible. It feels slightly weird to have something failing and then working the next day (I do get the increasing failure frequency thing but still ...). IMO one of the reason it works for github is that they can detect you are still not using the recommended way and they can send you emails from time to time to nudge you into the right direction (e.g. creating a token to use git on the command line if you are using https).

So maybe a fixed schedule with a few different environment variables like IGNORE_SKLEARN_PACKAGE_WARNING, IGNORE_SKLEARN_PACKAGE_SECOND_WARNING, IGNORE_SKLEARN_PACKAGE_LAST_WARNING ... one small advantage is that then you can easily google who is actually using these environment variables (I am slightly annoyed I was not able to find a good way to know who is the root cause of all of these sklearn package PyPI downloads, it must be the CI of a popular package and of its dependents ...).

rth · 2021-06-21T14:38:55Z

IMO one of the reason it works for github is that they can detect you are still not using the recommended way and they can send you emails from time to time to nudge you

Brownout is the nudge. Other per-user notifications are not expected. PyPi did a similar thing for deprecating non SNI compatible clients pypi/support#978. It quite suitable for situations where you want some change to happen, make sure users are aware but don't want to make it a hard blocker either. It also allows one to monitor the migration as it happens (in our case PyPi stats) and adjust the schedule if necessary.

If we just fail permanently, it becomes a blocker for all maintainers who happened to depend on sklearn. They would be expected to drop everything and start working on their CI, updating documentation, responding to user complaints etc. Which I find is not very nice, particularly that there is no urgency to fix this. In our case, we don't really care if this migration happens even say by mid 2022 as long as it does happens.

So maybe a fixed schedule with a few different environment variables like IGNORE_SKLEARN_PACKAGE_WARNING, IGNORE_SKLEARN_PACKAGE_SECOND_WARNING, IGNORE_SKLEARN_PACKAGE_LAST_WARNING

That would be a bit confusing for downstream packages to document IMHO :) I would be rather +1 for a single env variable.

I think we can search for packages that currently use sklearn as a dependency with BigQuery, assuming one can filter files to requirements*.txt, setup.py only and search inside the file contents.

ogrisel · 2021-06-22T15:02:11Z

+1 for the brownout strategy. We can setup a wiki page or an issue that documents the planned brownout schedule and announce it on twitter + the mailing list.

ogrisel · 2021-06-22T15:02:32Z

We should discuss this on the next online dev meeting.

hugovk · 2021-06-24T15:44:33Z

sklearn had 7,232,064 downloads over 30 days.

I've created PRs for four packages which had a total 7,039,690 downloads over 30 days:

These don't necessarily account for 7,039,690 sklearn downloads, but it'll help!

rth · 2021-06-24T15:58:23Z

Thanks a lot @hugovk!

lesteve · 2021-06-25T09:18:18Z

Very nice indeed @hugovk! Out of curiosity (and in case we need to do it again in the future), how did you find the sklearn dependents with the most PyPI dowloads?

hugovk · 2021-06-26T10:25:04Z

I knocked up a script to find which sdist and wheel packages in https://github.com/DavHau/pypi-deps-db have sklearn as a dependency, and filtered out only those in the top 4k most-downloaded: https://github.com/hugovk/top-pypi-packages:

jsonpickle {'jsonpickle-1.5.2-py2.py3-none-any.whl'}
xgboost {'xgboost-1.0.0rc1-py2.py3-none-manylinux1_x86_64.whl'}
shap {'0.5'}
hmmlearn {'hmmlearn-0.2.0-cp34-cp34m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl', 'hmmlearn-0.2.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl', 'hmmlearn-0.2.0-cp33-cp33m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl'}
moviepy {'moviepy-0.2.3.2-py2.py3-none-any.whl'}
fastai {'0.7.0'}
pyldavis {'3.3.0'}
dbnd {'dbnd-0.42.1-py2.py3-none-any.whl'}
pdpbox {'0.2.1'}
datarobot {'datarobot-2.22.0-py2-none-any.whl', 'datarobot-2.25.0-py3-none-any.whl'}
caer {'1.3.6'}
flair {'0.4.1'}
recordlinkage {'0.2'}

Some of these are older releases, and they've since replaced sklearn. I made PRs for the rest, except for https://pypi.org/project/datarobot/ (108k downloads in 30 days) which might be closed source (or I couldn't find the repo). Could be worth emailing them.

Here's also a list of all matching packages: sklearn-all.txt

(To run the script, clone the repo and check the setup info in the top-level docstring. Likely Python 3.9+ but easy to adjust for 3.6+. If anyone would like a hand running the script, feel free to open an issue in its repo!)

rth · 2021-06-26T10:40:07Z

except for pypi.org/project/datarobot (108k downloads in 30 days) which might be closed source (or I couldn't find the repo). Could be worth emailing them.

Email sent. It's only used in their extra examples, so most of their users probably wouldn't download it.

lesteve · 2021-06-28T16:44:11Z

Looking at #8215 (comment) I had some big hope that most (6.7M over 7.2M) sklearn downloads were coming from jsonpickle but actually sklearn is only in their testing extras_require (see https://github.com/jsonpickle/jsonpickle/blob/master/setup.cfg#L59) so this is quite unlikely that jsonpickle is the main culprit.

ahjota · 2021-09-13T18:46:16Z

except for pypi.org/project/datarobot (108k downloads in 30 days) which might be closed source (or I couldn't find the repo). Could be worth emailing them.

Email sent. It's only used in their extra examples, so most of their users probably wouldn't download it.

Speaking as one of the maintainers, we just released datarobot v2.26.0 which is no longer dependent on sklearn. Same with our related datarobot-early-access package in v2.26.0.2021.9.13. Thanks for the heads up.

eikevons · 2021-09-21T19:35:43Z

For reference, there is a anti-typo-squatting package for NLTK that performs the "fail with helpful error" strategy: ntlk

lesteve · 2022-10-13T15:37:01Z

I have created a repo to to implement the sklearn package deprecation brownout:
https://github.com/scikit-learn/sklearn-pypi-package

Feed-back on the repo, more than welcome. The start date is currently set to November 1st with a brownout period of one year but this is open to discussion.

lesteve · 2022-11-09T05:27:21Z

sklearn 0.0.post1 has been uploaded, the brownout will start December 1st. I think we can close this even if the brownout period will end December 1st 2023.

More details can be found on the repo: https://github.com/scikit-learn/sklearn-pypi-package

rth mentioned this issue Aug 10, 2017

PIP Package: Missed dependency? #9519

Closed

rth mentioned this issue Feb 6, 2018

Minor fixes johannfaouzi/pyts#4

Merged

plbecker mentioned this issue Jun 26, 2018

Scikit learn instead of sklearn tmadl/semisup-learn#9

Closed

lesteve mentioned this issue Jun 27, 2018

problem installing scikit-learn #11358

Closed

lesteve mentioned this issue Jun 27, 2018

Use scikit-learn in requirements.txt tmadl/semisup-learn#10

Open

hugovk mentioned this issue Sep 11, 2018

Override sklearn with scikit-learn brettcannon/caniusepython3#185

Merged

jamadden mentioned this issue Apr 22, 2019

PEP541 - sklearn transfer to proper owner pypi/warehouse#5730

Closed

splaisan mentioned this issue Jun 6, 2019

sklearn not found bcgsc/NanoSim#61

Closed

felker added a commit to PPPLDeepLearning/plasma-python that referenced this issue Oct 14, 2019

Install scikit-learn, not dummy sklearn, from pip

3ec7037

See scikit-learn/scikit-learn#8215 (comment)

lesteve mentioned this issue Oct 30, 2019

Fix scikit-learn package name in setup.py microsoft/tensorwatch#45

Merged

hugovk mentioned this issue Jan 3, 2020

Replace PIL with Pillow on PyPI python-pillow/Pillow#1535

Closed

AntoineTheb mentioned this issue Feb 26, 2020

Smooth shuffle streamlines scilus/scilpy#116

Merged

cmarmo mentioned this issue Oct 23, 2020

Locally build development sklearn then get both scikit-learn (0.18.1) and sklearn (0.0) #8185

Closed

lesteve mentioned this issue Jun 21, 2021

Use scikit-learn package rather than sklearn in setup.py huggingface/datasets#2525

Merged

lesteve mentioned this issue Jun 21, 2021

Use scikit-learn rather than sklearn in docs/requirements.txt jax-ml/jax#7036

Merged

This was referenced Jun 24, 2021

Replace sklearn with scikit-learn jsonpickle/jsonpickle#360

Merged

Replace sklearn with scikit-learn bmabey/pyLDAvis#212

Closed

Replace sklearn with scikit-learn databand-ai/dbnd#51

Closed

Replace sklearn with scikit-learn SauceCat/PDPbox#77

Closed

rth mentioned this issue Aug 19, 2022

Upload a Python wheel for sklearn package #24204

Closed

lesteve mentioned this issue Aug 19, 2022

Use correct name for scikit-learn PyPI package StatguyUser/BaselineRemoval#9

Merged

lesteve closed this as completed Nov 9, 2022

NickleDave mentioned this issue Feb 17, 2023

Rename the top-level package from maad to scikit_maad scikit-maad/scikit-maad#71

Closed

thomasjpfan mentioned this issue Nov 6, 2023

FIX Use scikit-learn instead of sklearn in requirements.txt unionai-oss/unionml#239

Open

bkolosk1 mentioned this issue Dec 28, 2024

Fix requirments SkBlaz/autobot#3

Merged

Uh oh!

pip install sklearn behaviour #8215

pip install sklearn behaviour #8215

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sklearn

scikit-learn

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!