-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
pip install sklearn behaviour #8215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Maybe you could make |
Quickly looking at the pypi-alias code it may well be that pypi-alias was used to generate the sklearn PyPI package indeed. |
Indeed you are right. At the end, However, I could find a case which I find annoying. As a user, I would expect Regarding the issue in #8185, I was checking on the doc if it was mentioned that you should use |
And I forgot that |
Maybe it would be wise to phase out the sklearn package. How about:
|
My earlier comment is still valid:
@hugovk if you can find or put together a setup.py that just raises an exception that would be a step forward on this issue. Just to be clear I don't think we should have a warning phase (since the problem is super easy to fix and to be honest I suspect very few people are doing |
Here's PyPI downloads stats (via pypinfo) for the last month suggesting over a quarter of scikit-learn installs are through sklearn!
How about something like this in sklearn's setup.py? raise RuntimeError("sklearn is now known as scikit-learn. Please run: pip install scikit-learn") |
Happy to transfer ownership of the sklearn pypi to someone more involved in the project. My two cents: if it ain't broken don't fix it. I see little benefit from preventing people from using sklearn and a lot of broken CI setups and packages if you do implement this exception. |
@lesteve do you have an account? @chrisfilo you could also transfer to me. (t3kcit I think). I would argue that keeping it as an alias is fine. And yes, I think a lot of CI scripts will break and it'll be no benefit to anyone imho. |
Here are a few ways this is broken:
Output:
pip list ouput is also is misleading (why do I have two versions of scikit-learn and why is one 0.0, which was actually the question from #8185 (comment)):
If we believe the numbers from PyPI (1 in 5 scikit-learn pip install is through
@chrisfilo I tend to be with you on this but in this case I am not, not sure exactly why. Probably part of the story is the mix of surprise and WTF I felt when I realised
I do have an account on PyPI which is lesteve. |
Ownership transferred. Remember: With great power comes... ability to mess up a lot of packages ;) |
pip install -U scikit-learn==0.18 |
For future reference, as I couldn't find a version under version control, I have made a copy of the current contents of https://pypi.org/project/sklearn/ under https://github.com/rth/scikit-learn-invalid-pypi The consensus about this issue seems to be not to do anything, but if we decide to change something in that PyPi package in the future, please make a PR there first and ping relevant maintainers so the changes can be reviewed / commented on before uploading to PyPi. |
Thanks! For compleness' sake, last time I looked I got the feeling that they were done through a pypi alias tool, maybe https://pypi.org/project/pypi-alias?
Although it makes me slightly sad, I think that this is not worth the pain indeed. If someone has the time, it would be great to:
Other ideas:
Just for the story, I was on the wrong side on msgpack-python being renamed to msgpack on PyPI and this was clearly not my definition of fun (bonuses were: interaction between conda keeping the old name and pip, different behaviour whether you install a wheel or from source, see dask/distributed#1913 (comment) for the short version if you really care). |
An issue that could cause significant confusion is that if a new version is released after you install using this alias, |
Another possibility without breaking peoples code could be to detect whether the |
for reference from airflow import os
from setuptools import setup, find_packages
with open("README.md", "r") as fh:
long_description = fh.read()
setup(
name="airflow",
version="0.6",
author="Apache Author",
author_email="dev@airflow.incubator.apache.org",
description="Placeholder for the old Airflow package",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/apache/incubator-airflow",
packages=find_packages(),
classifiers=[
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'Environment :: Web Environment',
'Intended Audience :: Developers',
'Intended Audience :: System Administrators',
'License :: OSI Approved :: Apache Software License',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Topic :: System :: Monitoring',
]
)
if not os.getenv("OVERRIDE"):
raise RuntimeError('Please install package apache-airflow instead of airflow') |
I guess maybe the 1.0 release could be an opportunity to do this cleanup + breaking change? I am imagining something like to make the disruption a bit smoother:
Alternatives considered but discarded: add a warning in setup.py, the warning will only be visible when using One possible thing to look at is sklearn dependents from https://libraries.io/pypi/sklearn. One caveat is that there are plenty of false positives which seems like historical dependents of sklearn. For example pycaret is listed probably because sklearn was amongst the dependencies at one point (2 years ago): For completeness, I checked the numbers pypistats.org, roughly 29% of the scikit-learn installs come from the sklearnhttps://pypistats.org/packages/sklearn scikit-learnhttps://pypistats.org/packages/scikit-learn |
It would be good to fix this. I think if we error we always need a way out, because some popular but no longer maintained libraries can have Maybe we could make 1
This would give time libraries that have I think it would be better to do this completely independently from any scikit-learn release, to avoid implying this is somehow a breaking change / regression, though we can mention it in release notes. |
Brownout? Compare:
|
Yes, thanks. For some reason my brain only suggested hashbrown when I tried to find the term :) |
Good point, this also applies to old package versions that at one point required sklearn and unfortunately we can not fix the past 😉.
OK I am fine with this. About the brownout, why not ... I'd rather have something simpler if possible. It feels slightly weird to have something failing and then working the next day (I do get the increasing failure frequency thing but still ...). IMO one of the reason it works for github is that they can detect you are still not using the recommended way and they can send you emails from time to time to nudge you into the right direction (e.g. creating a token to use git on the command line if you are using https). So maybe a fixed schedule with a few different environment variables like IGNORE_SKLEARN_PACKAGE_WARNING, IGNORE_SKLEARN_PACKAGE_SECOND_WARNING, IGNORE_SKLEARN_PACKAGE_LAST_WARNING ... one small advantage is that then you can easily google who is actually using these environment variables (I am slightly annoyed I was not able to find a good way to know who is the root cause of all of these sklearn package PyPI downloads, it must be the CI of a popular package and of its dependents ...). |
Brownout is the nudge. Other per-user notifications are not expected. PyPi did a similar thing for deprecating non SNI compatible clients pypi/support#978. It quite suitable for situations where you want some change to happen, make sure users are aware but don't want to make it a hard blocker either. It also allows one to monitor the migration as it happens (in our case PyPi stats) and adjust the schedule if necessary. If we just fail permanently, it becomes a blocker for all maintainers who happened to depend on sklearn. They would be expected to drop everything and start working on their CI, updating documentation, responding to user complaints etc. Which I find is not very nice, particularly that there is no urgency to fix this. In our case, we don't really care if this migration happens even say by mid 2022 as long as it does happens.
That would be a bit confusing for downstream packages to document IMHO :) I would be rather +1 for a single env variable. I think we can search for packages that currently use sklearn as a dependency with BigQuery, assuming one can filter files to |
+1 for the brownout strategy. We can setup a wiki page or an issue that documents the planned brownout schedule and announce it on twitter + the mailing list. |
We should discuss this on the next online dev meeting. |
sklearn had 7,232,064 downloads over 30 days. I've created PRs for four packages which had a total 7,039,690 downloads over 30 days:
These don't necessarily account for 7,039,690 sklearn downloads, but it'll help! |
Thanks a lot @hugovk! |
Very nice indeed @hugovk! Out of curiosity (and in case we need to do it again in the future), how did you find the sklearn dependents with the most PyPI dowloads? |
I knocked up a script to find which sdist and wheel packages in https://github.com/DavHau/pypi-deps-db have sklearn as a dependency, and filtered out only those in the top 4k most-downloaded: https://github.com/hugovk/top-pypi-packages:
Some of these are older releases, and they've since replaced sklearn. I made PRs for the rest, except for https://pypi.org/project/datarobot/ (108k downloads in 30 days) which might be closed source (or I couldn't find the repo). Could be worth emailing them. Here's also a list of all matching packages: sklearn-all.txt (To run the script, clone the repo and check the setup info in the top-level docstring. Likely Python 3.9+ but easy to adjust for 3.6+. If anyone would like a hand running the script, feel free to open an issue in its repo!) |
Email sent. It's only used in their extra |
Looking at #8215 (comment) I had some big hope that most (6.7M over 7.2M) |
Speaking as one of the maintainers, we just released |
For reference, there is a anti-typo-squatting package for NLTK that performs the "fail with helpful error" strategy: ntlk |
I have created a repo to to implement the Feed-back on the repo, more than welcome. The start date is currently set to November 1st with a brownout period of one year but this is open to discussion. |
sklearn 0.0.post1 has been uploaded, the brownout will start December 1st. I think we can close this even if the brownout period will end December 1st 2023. More details can be found on the repo: https://github.com/scikit-learn/sklearn-pypi-package |
Uh oh!
There was an error while loading. Please reload this page.
I was surprised that
pip install sklearn
would do something without giving an error, which I learned from#8185 (comment). All in all it is not so bad because scikit-learn is in install_requires of sklearn setup.py so you do end-up with scikit-learn installed. Still it is a bit confusing that
pip list
shows both scikit-learn 0.18.1 and sklearn 0.0.@chrisfilo if I believe https://pypi.org/project/sklearn it looks like you uploaded it. I see a similar approach is done for bids and pybids https://pypi.org/project/bids. Is it a common thing to do prevent package names to be squatted on PyPI?
There must be a way to have a slightly clever setup.py that allows upload but prevents install with an error saying something like "Please use pip install scikit-learn".
The text was updated successfully, but these errors were encountered: