Small files on portable drives · Issue #757 · winpython/winpython · GitHub

Small files on portable drives #757

Open
peripatetes opened this issue May 23, 2019 · 8 comments
Labels: Procedure

Comments

@peripatetes

WinPython includes a ton of tiny files. Of the 77898 files in my WinPython install, the median size is 2940 bytes and 92% of the files are smaller than 32KB.
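
For anyone who wants to check their own install, here is a minimal sketch of how numbers like these could be gathered. The function names `file_sizes` and `summarize` are made up for this illustration, and the 32KB threshold is just the figure quoted above.

```python
import os
import statistics

def file_sizes(root):
    """Return the size in bytes of every regular file under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append(os.path.getsize(path))
            except OSError:
                pass  # skip files that vanish or cannot be read
    return sizes

def summarize(sizes, threshold=32 * 1024):
    """Return (file count, median size, fraction of files below threshold)."""
    if not sizes:
        return 0, 0, 0.0
    below = sum(1 for size in sizes if size < threshold)
    return len(sizes), statistics.median(sizes), below / len(sizes)
```

Calling `summarize(file_sizes(path_to_install))` on a WinPython folder should give the file count, the median size, and the share of files under 32KB.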

Since WinPython's portability is a major selling point, some people (myself included) will be using it on thumb drives. The default exFAT cluster size for modern drives is either 32KB (drives up to 32GB) or 128KB (from 32GB up to the exFAT maximum size, 256TB). With 128KB clusters, even a single-byte file takes 128KB on disk. Because WinPython has so many small files, an install that is only 2.5GB ends up taking nearly 12GB on disk. Massive wasted space.
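
The arithmetic behind that can be sketched as follows. This is a simplification that only rounds each file up to whole clusters; it ignores directory entries and other file system metadata, and `allocated_size` is a name invented for this example.

```python
def allocated_size(sizes, cluster_size):
    """Bytes occupied on disk when every file is rounded up to whole clusters."""
    total = 0
    for size in sizes:
        clusters = -(-size // cluster_size)  # ceiling division
        total += clusters * cluster_size
    return total

KB = 1024
# A single-byte file still occupies one full 128KB cluster.
print(allocated_size([1], 128 * KB))
# Hypothetical case: 77898 files that each fit in one 128KB cluster.
print(allocated_size([2940] * 77898, 128 * KB) / 1024**3)
```

If all 77898 files each occupied exactly one 128KB cluster, that alone would be roughly 9.5 GiB, so the figure of nearly 12GB once the larger files are included looks plausible.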

(This would also apply to any disk with a large cluster size, but NTFS keeps a 4KB cluster size by default all the way to 16TB, so it's not worth worrying about there.)

In addition, many thumb drives perform much worse with large collections of small files. Thankfully that has improved a good bit over the last 5 years and is now mostly noticeable on writes, which gives WinPython a painfully slow install process but usually OK performance after install.

In other languages, shared libraries, JARs, etc. avoid this type of problem. I don't know enough about the status of egg / wheel / zipimport etc. to know whether something like that would be a reasonable option.
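
As a rough illustration of what zipimport offers, the sketch below packs one pure-Python module into a zip archive and imports it through `sys.path`. The helper `import_from_zip`, the module name `tiny_demo_module`, and the archive name `bundle.zip` are all invented for this example. Note that the standard zip import mechanism handles `.py` and `.pyc` files only, so compiled extension modules (`.pyd`) would still have to live on disk as ordinary files.

```python
import importlib
import os
import sys
import tempfile
import zipfile

def import_from_zip(module_name, source):
    """Pack one module into a zip archive and import it from there."""
    workdir = tempfile.mkdtemp()
    archive = os.path.join(workdir, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(module_name + ".py", source)
    sys.path.insert(0, archive)  # zip archives are valid sys.path entries
    importlib.invalidate_caches()
    return importlib.import_module(module_name)

mod = import_from_zip(
    "tiny_demo_module",
    "def greet():\n    return 'hello from a zip'\n",
)
print(mod.greet())
print(mod.__file__)  # a path that points inside bundle.zip
```

Many small modules stored this way would occupy one file on the drive instead of one cluster each.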

It's worth at least simply putting a notice in your documentation: "If you're actually using our portable python on a portable drive you may want to format your drive with a non-default cluster size to save tons of space."

@stonebig
Contributor

Your efforts to address the root cause upstream are welcome.

I never expected WinPython to be "good enough" on a usb stick, that's interesting.

The Wiki is open for your advice.

@stonebig
Contributor
stonebig commented May 26, 2019

truly the best solutions would be to store little files:

@peripatetes
Author

This came to mind again so I made a change to the installation section of the wiki to include space requirements and mention this issue.

A fast decompressor like zstd would be good, but unlike zip, zstd is just a compressor, not an archiver, and is usually used with a separate archiver like tar. (That's why JAR uses zip and why Python eggs used it too.) Accessing an individual file in a tar.zst essentially requires decompressing the entire archive. So another compression-aware archive format would be required.
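
To make the contrast concrete: a zip archive compresses each member separately and keeps an index of members, so a single file can be read by name without touching the others. A minimal sketch using the standard `zipfile` module, where `build_archive`, `read_member`, and the file names are invented for the illustration:

```python
import io
import zipfile

def build_archive(files):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    buf.seek(0)
    return buf

def read_member(archive, name):
    """Read one member by name; other members are not decompressed."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(name)

archive = build_archive({"a.txt": b"alpha", "pkg/b.txt": b"beta"})
print(read_member(archive, "pkg/b.txt"))
```

A tar stream wrapped in a single compressed stream has no such per-member index, which is the limitation described above.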

@stonebig
Contributor

Wait for the moment when 7-Zip supports zstd.

@stonebig
Contributor
stonebig commented Jan 5, 2020

It's not a WinPython-level problem. It's in the nature of the file system and physical medium you use.

So:

  • either you just use a "limited" version of WinPython, like WinPythonDot or WinPythonZero,
  • or a virtualisation layer, or a better combination of file system and physical medium, solves it for you,
  • or the Python fundamentals or node-js fundamentals tackle it.

@stonebig
Contributor
stonebig commented Jul 21, 2024

Apparently the moment is now, at least for reading: Windows and 7-Zip can read zstd compression.

@stonebig added the Procedure label May 4, 2025