CPU version of PyTorch on PyPI #26340
The gating issues here are:
|
Assuming wheels are built with setuptools, for example:

```python
setuptools.setup(name="torch-cpu", packages=["torch"])
```

this will produce a wheel like torch_cpu-<version>-<tags>.whl. After installing it, you still know which version of PyTorch is installed in your environment:
This is currently not possible, because both the GPU and CPU version have the same project name. Using local version identifiers becomes obsolete when publishing as a separate project, and the time and effort is minimal (might be wrong here, I know building PyTorch is quite complex), because you only have to set one project name or the other when building (a rough sketch is below).
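(A minimal sketch of the idea, not PyTorch's actual build script; the version string and package list are illustrative.)

```python
# Hypothetical setup.py for a CPU-only distribution of the same code base.
# Only the distribution name differs from the regular GPU build.
import setuptools

setuptools.setup(
    name="torch-cpu",   # distribution name on PyPI
    version="1.3.1",    # plain PEP 440 version, no "+cpu" local identifier
    packages=["torch"], # the import name stays "torch"
)
```

|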
I am also having issues with this, mainly because we use Nexus internally, and it has issues with the plus sign. But if PyPI had the +cpu versions, it would make things simpler for us. |
For those who rely on requirements.txt for deployment, we can install torch+cpu with a format like below:

```
numpy==1.17.2
pandas==0.25.2
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.3.1+cpu
```

|
So we're using Nexus and apparently the sync doesn't work with plus signs.
So we're waiting on that to be resolved.
|
An alternative way to make things compatible with Poetry could be to support the pip repo format in addition to plain HTML pages with package links. Say, several repos targeted at different CUDA versions should do the trick. Poetry allows specifying custom repos and pointing to the proper one for a specific package, roughly as sketched below. Update: I've found issue #25639, which is about this, I guess.
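(A rough pyproject.toml sketch of the idea; the source name is hypothetical, and the exact source-configuration syntax varies across Poetry versions.)

```toml
# Hypothetical per-variant repo pointed at the PyTorch CPU wheel index,
# which would need to serve the pip repo format for this to work.
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"

[tool.poetry.dependencies]
python = "^3.8"
torch = { version = "1.3.1", source = "pytorch-cpu" }
```

|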
I would like to gently request that you take another look at this. We're using torch in https://github.com/neuropoly/spinalcordtoolbox/ but we can't ask our users to download almost a gigabyte of software. Not everyone is running in a big high-performance data centre. A lot of our users just have some old Windows computer, or their MacBook, and not all of our users even use us for the neural network parts. |
But keep requirements.txt, because we need to use it for pytorch (pytorch/pytorch#26340) and --find-links (-f) isn't supported by pip's new URL pinning format: pypa/pip#5898 (comment)
Packages/Projects depending on PyTorch could distinguish between pip install mylib[cpu] or pip install mylib[gpu], where the corresponding extra pulls in the matching PyTorch build (see the sketch below).
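(A hedged sketch, assuming hypothetical torch-cpu and torch-gpu projects existed on PyPI; mylib and the version pins are placeholders.)

```python
# Hypothetical setup.py for a downstream package exposing CPU/GPU extras.
import setuptools

setuptools.setup(
    name="mylib",
    packages=["mylib"],
    extras_require={
        "cpu": ["torch-cpu>=1.3"],  # pip install mylib[cpu]
        "gpu": ["torch-gpu>=1.3"],  # pip install mylib[gpu]
    },
)
```

|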
That's a great idea @sisp! The mutual-exclusivity problem could be sidestepped if |
Hi. Some good ideas here. Just a note that it might make sense to name the extension |
For context on how this might be used, here's how an HPC cluster is currently consuming scientific Python: https://docs.computecanada.ca/wiki/Python#Installing_dependent_packages (archive):
I think they built their own wheels for tensorflow, compiled with tuning for their particular cluster setup. But it's awkward to have to consume them that way (with |
It's even awkward to install dependencies ad hoc using |
I fully agree with what @sisp said. Unfortunately, this sounds like just uploading an optimized wheel file to a custom PyPI server and pointing to that using |
@dsuess Oh, wow, that's unexpected behavior! I've never tested it, but expected setting --index-url to take precedence:

```
$ pip install --no-deps --index-url <source> --extra-index-url https://pypi.org/simple/ <package>
```

|
Yeah, I've used the same pattern successfully to give precedence to a private PyPI server. The main downside to this approach would of course be that it wouldn't be portable. If someone wanted to install packages from both a private pip server and a pip server with optimized wheels, which one do you put in as the main index? The other way that I can see is to have the PyPI |
All the bulk that takes up that 2.4GB is in the libs folder. You could simply upload your payload to another service. When torch runs, it checks to see that everything is downloaded; if not, it fetches it from your file service. For an example of how this could be done, check out my library. Torch+CUDA could actually be very small on PyPI. |
@zackees I believe this to be a bad idea. I would expect |
It's not insecure, because it's trivial to add SHA checking on the archive, and that SHA would live in the repo itself. Are you concerned that someone could hack the file repo and PyPI at the same time? Is the solution to this problem for app developers to fork torch+cuda, manually separate the lib files into an archive, and then upload the unofficial package ourselves? |
@zackees Leave package management tasks to package managers. I would have to review the code of such a library responsible for downloads and hash checking with every release, and still do extra special-case treatment of this specific package to make it available for offline cases. |
In light of the security vulnerability that happened today, I again call on the PyTorch team to put the GPU package on PyPI and sideload the large install files. App developers need this, and not having it means we have to use the extra-index-url workaround. |
@zackees I didn't notice your comment from last month, but it leaves me confused. The version of PyTorch on PyPI is the GPU version. We've worked closely with NVIDIA to even have cudatoolkit now published on PyPI, and the PyTorch wheel on PyPI now depends on it (instead of us plugging in a bunch of binary code). |
@soumith, @zackees must have made a typo and meant "CPU version". I had the exact same thought as soon as I read the writeup (thank you for the writeup; it is very clear, honest, to the point, and helpfully includes remediation steps). pip is susceptible by design to dependency confusion attacks,
and my sense is most of their team strongly resists anyone using extra repos; for example, they don't have any method to specify alternate indexes within wheels -- it must be done at the command line, with --extra-index-url. They explained once to me that they think forcing end-users to type out the extra index keeps them aware of what they are installing, though I'm not sure it achieves that. Clearly lots of people copy-pasted the pip line off https://pytorch.org/get-started/locally/ without interrogating it, but that's their choice, so we have to play in their sandbox; and here with this attack, it wasn't that the extra index was compromised, but that the extra index was inadvertently pointing to the "safe" PyPI that had had a malicious package uploaded. Though what it definitely does protect against, and this is important too, is people building packages that have a URL to a single obscure repo server baked in, all of which then break when funding for that repo runs out.

Given that it's Python's sandbox you're playing in, I think you've got to recognize that PyTorch is the one using the wrong design. I know it's a lot of work to change torch. But it's even more work to change pip when tens of thousands of public packages have been built against its current behaviour, and the need for split-packaging tricks only seems to ever show up with corporate packages, and that's why they are extremely hesitant to even broach it:
torch is in a funny grey zone: it is open source now, but it was built by Facebook, with Facebook's environment in mind (i.e. lots of cheap Linux VMs on commodity hardware backed by an expensive storage cluster, devs mostly working on macOS), which is very different from the environment the rest of us are in (usually one, maybe two machines, with limited disk space, on any OS). It seems like it was originally packaged for this "private, internal environment", hence why the Linux packages vendor CUDA and the macOS and Windows ones don't, which is the root of this whole thread. When it was open-sourced, instead of re-packaging it to fit in PyPI, the existing infrastructure of alternate indexes was carried over. So please, reconsider rearranging torch. Put separate |
I've done this exact same pattern with https://github.com/zackees/static-ffmpeg so I know it works. The current issue seems to be that the devs are resistant to sideloading assets. It was said that it was open to attacks, but then I pointed out that the SHA fingerprint could be used to verify the integrity. I offered to do the work and was told no. Not having this creates problems for app developers like myself. For example, see the GPU installation step in my https://github.com/zackees/transcribe-anything app. I was able to get it to use one line of code like this:
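(The original snippet did not survive extraction; a plausible reconstruction of the pattern, a single pip invocation against the PyTorch CUDA wheel index, might look like the following. The pinned index URL is illustrative.)

```python
# Hypothetical one-liner: install the CUDA build of torch at runtime
# by pointing pip at the PyTorch wheel index as an extra index.
import subprocess, sys

subprocess.run(
    [sys.executable, "-m", "pip", "install", "torch",
     "--extra-index-url", "https://download.pytorch.org/whl/cu118"],
    check=True,
)
```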
But I consider this an ugly hack to work around this issue. |
I couldn't agree more!
I think the problem is that 2.4GB number. PyPA are very aware of this problem, but it's clearly not something they can fix. This is more evidence to me that Torch was designed to be used internally in Facebook's network: 2.4GB is nothing inside a datacentre where they own all the wires, but it's a lot for PyPI to pay to serve to the general public. Most of that 2.4GB is not torch, it's CUDA. Torch could help the situation by figuring out ways to compromise by cutting down what parts of CUDA need to be included.
We basically do the same thing: we have a custom installer written in bash which calls pip, and our requirements.txt uses an extra index URL, roughly as below (funny that requirements.txt is allowed, sort of half contradicting PyPA's assertion that users should ALWAYS see what repos they are using up front). I also have a lot of misgivings about this design. But I don't see a way around it, given the lack of a good CPU copy of torch on pypi!
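(A sketch of the kind of requirements.txt being described, assuming the standard PyTorch CPU wheel index; the exact pins are illustrative.)

```
--extra-index-url https://download.pytorch.org/whl/cpu
torch==1.13.1+cpu
```

|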
I don't think you quite understand what I'm saying. For torch GPU, there is only one large asset, the .so/.dll file. This one file can be moved outside of pip and then downloaded as soon as torch needs it. If this one file is moved off of PyPI, the entire torch GPU package fits. The pseudocode for this would be something like:

```python
import os
from filelock import FileLock  # third-party "filelock" package

def ensure_large_shared_object():
    with FileLock(lock_file):  # guard against concurrent first-run downloads
        if os.path.exists(large_shared_object):
            return  # already fetched on a previous run
        download_file(large_shared_object)
```

Here's an example in my repo: |
I'm hoping the large DLL can be broken up. Surely a DLL that large consists of sublibraries.
I understand this pattern. We are using this pattern. But I think it's a bad idea. It means your dependencies are now: a. pinned to specific URLs, which can easily linkrot |
All these problems already exist now. We have to use an extra index URL, which resolves to a pinned URL. Also, torch GPU can't be packaged in any app. The proposal reduces friction and allows GPU-accelerated app deployment. All the points you brought up with my proposal already exist with the status quo. |
I guess so, but I'm here to solve the problem, not to replace it with the same problem. I'm trying to make packaging for my team less crazy, ideally with standard reference docs we can point to (I've been using this page a lot). Sideloading breaks that. If you're already going to ignore the standards and write a custom installer to work around
This is exactly my concern with your solution: the offline case and the extra review overhead every time something changes. Please don't ask |
Hey folks, a lot of discussion, so let me try to respond in order.
|
This is amazing news. Thank you! |
With regard to pip not propagating index information:
This is not a problem with pip; there is simply no way to express this in the Python ecosystem at present. What you are asking for is a hybrid of standard index/constraint dependencies and direct references, which is not something on anyone's radar in the Python packaging world right now. |
By the way, there are discussions on adding support for external hosting to PyPI, e.g. https://discuss.python.org/t/external-hosting-linked-to-via-pypi/8917 and https://discuss.python.org/t/fallback-links-on-pypi-for-the-same-file/14678. I think there is no fundamental blocker to this idea, just that someone has to come up with a convincing design. |
@soumith https://download.pytorch.org/whl/ is not PEP 503 compatible, the spec that specifically describes the PyPI simple index API. Please take a look at the issue in Nexus, so PyTorch could be installed via it. Should I create a separate issue? |
Coming at this from another angle: licenses, and only needing the torch CPU version. We have a setup where packages from PyPI are ingested into an AWS CodeArtifact repo, and our teams are only allowed to use packages from there. There are policies regarding which package licenses are approved for ingestion, which exclude proprietary ones. The nvidia-* packages have a proprietary license and are declared as requirements for the Linux wheels, e.g. PyPI's torch-1.13.1-cp39-cp39-manylinux1_x86_64.whl. So we're left in a non-ideal situation, wanting to adopt newer torch versions for CPU usage but unable to, due to the unnecessary requirement on the nvidia-* packages. |
@smiles3983 @mironnn On the issue with Nexus: I came across the same issue as you and worked with the Sonatype community to create a mirror of the PyTorch index that works with Nexus Repository. Feedback appreciated. |
What can we do to make this happen? Does this require community funding, or an appeal to PyPI to grant torch a larger amount of space? This would be huge for many that depend on torch. I also think defaulting new versions published on PyPI to CPU would decrease overall load on PyPI itself, since the default behavior changes to avoid pulling the heavy CUDA dependencies from PyPI. |
🚀 Feature
Publish the CPU version of PyTorch on PyPI to make it installable in a more convenient way.
Motivation
Using PyTorch in production does not necessarily require the (~700 MB big) GPU version, but installing the much smaller CPU version as suggested on the website:
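(the command block here did not survive extraction; reconstructed from the -f pattern quoted elsewhere in this thread, with an illustrative version pin)

```
pip install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
```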
makes it hard to use tools like Poetry, which do not work with pip itself and therefore do not support an argument like -f https://download.pytorch.org/whl/torch_stable.html.
Pitch
Publish the CPU version (e.g. as torch-cpu) on PyPI.