Small files on portable drives #757
Comments
Your efforts to address the root cause upstream are welcome. I never expected WinPython to be "good enough" on a USB stick; that's interesting. The wiki is open for your advice. |
Truly, the best solution would be to store the little files:
|
This came to mind again, so I changed the installation section of the wiki to include space requirements and mention this issue. A fast decompressor like zstd would be good, but unlike zip, zstd is just a compressor, not an archiver, and is usually paired with a separate archiver like tar. (That's why JAR uses zip and why Python eggs used it too.) Accessing an individual file in a tar.zst essentially requires decompressing the entire archive, so a different compression-aware archive format would be required. |
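To make the zip vs. tar point concrete, here is a minimal sketch using only the standard library. It assumes gzip as a stand-in for zstd (the structural issue is the same: one compressed stream with no index), and the helper names `build_archives`, `read_from_zip`, and `read_from_tar` are made up for this illustration.

```python
import io
import tarfile
import zipfile


def build_archives(files):
    """Build an in-memory zip and an in-memory tar.gz holding the same files.

    `files` maps member names to bytes. gzip stands in for zstd here.
    """
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))

    return zip_buf.getvalue(), tar_buf.getvalue()


def read_from_zip(zip_bytes, name):
    # The zip central directory records where each member starts, and each
    # member is compressed on its own, so only this member is decompressed.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name)


def read_from_tar(tar_bytes, name):
    # A compressed tar is a single stream without an index: finding a member
    # means decompressing and scanning the data that precedes it.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tf:
        return tf.extractfile(name).read()
```

Both readers return the same bytes; the difference is how much of the archive has to be decompressed to get there, which is what matters for per-file access at import time.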
Wait until 7-Zip supports zstd. |
It's not a WinPython-level problem. It stems from the file system and the physical medium you use. So:
|
Apparently the moment is now, at least for reading: Windows and 7-Zip can read zstd compression. |
WinPython includes a ton of tiny files. Of the 77898 files in my WinPython install, the median size is 2940 bytes and 92% of the files are smaller than 32KB.
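For anyone who wants to check numbers like these on their own install, a rough sketch follows. The function names `file_sizes` and `summarize` are hypothetical, and this is not necessarily how the figures above were produced.

```python
import os
import statistics


def file_sizes(root):
    """Return the size in bytes of every regular file under `root`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append(os.path.getsize(path))
    return sizes


def summarize(sizes, threshold=32 * 1024):
    """Return (file count, median size, fraction of files below `threshold`)."""
    if not sizes:
        return 0, 0, 0.0
    small = sum(1 for s in sizes if s < threshold)
    return len(sizes), statistics.median(sizes), small / len(sizes)
```

Calling `summarize(file_sizes(path_to_install))` on a WinPython directory gives the file count, the median size, and the share of files under 32KB.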
Since WinPython's portability is a major selling point, some people (myself included) will be using it on thumb drives. The default exFAT cluster size for modern drives is either 32KB (drives up to 32GB) or 128KB (from 32GB up to the exFAT maximum size, 256TB). Even a single-byte file occupies a full 128KB cluster, and WinPython has so many small files that an install of only 2.5GB takes nearly 12GB on disk. Massive wasted space.
(This would also apply to any disk with a large cluster size, but NTFS keeps a 4KB cluster size by default all the way to 16TB, so it's not worth worrying about there.)
In addition, many thumb drives perform much worse with large collections of small files. Thankfully that has improved a good bit over the last 5 years and is now mostly only obvious on writes, giving WinPython a painfully slow install process but usually OK performance after install.
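The cluster arithmetic can be sketched in a few lines. This is a simplified model that assumes every non-empty file is rounded up to a whole number of clusters and ignores file system metadata; `allocated_size` and `on_disk_total` are hypothetical helper names.

```python
def allocated_size(size, cluster):
    """Bytes occupied by a file of `size` bytes when space is allocated
    in whole clusters of `cluster` bytes (simplified model)."""
    # Ceiling division: round up to the next multiple of the cluster size.
    # An empty file needs no data clusters in this model.
    return -(-size // cluster) * cluster


def on_disk_total(sizes, cluster):
    """Total allocated bytes for a list of file sizes."""
    return sum(allocated_size(s, cluster) for s in sizes)


KIB = 1024
# A 1-byte file and a 2940-byte file both take one full cluster.
print(allocated_size(1, 128 * KIB))     # 131072
print(allocated_size(2940, 128 * KIB))  # 131072
print(allocated_size(2940, 4 * KIB))    # 4096
```

Under this model, 77898 non-empty files need at least 77898 × 128KB, which is about 10.2GB, before counting the extra clusters used by the larger files. That is in line with the nearly 12GB observed.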
In other languages, shared libraries, JARs, etc. avoid this type of problem. I don't know enough about the status of egg / wheel / zipimport etc. to know whether something like that would be a reasonable option.
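As a small illustration of what zipimport already allows, here is a sketch that packs one pure-Python module into a zip and imports it from there. The module name `tiny_demo_module`, the archive name `bundle.zip`, and the helper `import_from_zip` are all invented for the example; whether this approach is practical for a whole WinPython install is a separate question.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_zip(module_name, source):
    """Write `source` as <module_name>.py into a new zip and import it from there."""
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(module_name + ".py", source)
    # A zip file listed on sys.path is searched much like a directory;
    # the standard zipimport machinery handles the lookup.
    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


mod = import_from_zip("tiny_demo_module", "ANSWER = 42\n")
print(mod.ANSWER)    # 42
print(mod.__file__)  # a path inside bundle.zip
```

One known limit: zipimport handles `.py` and `.pyc` files only, so compiled extension modules (`.pyd`, `.so`) cannot be loaded from inside an archive and would still have to live on disk as separate files.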
It's worth at least simply putting a notice in your documentation: "If you're actually using our portable python on a portable drive you may want to format your drive with a non-default cluster size to save tons of space."