convert_hf : faster lazy safetensors #8482
Merged
Currently, with lazy conversion, a relatively big portion of the model files is read before even beginning to write the output file, and then, if the disk cache is smaller than the model, it is read from disk again when actually converting.
Most of the time in the initial read is spent on `model_part.get_tensor(name)` (at least when using `safetensors`).

Turns out `safetensors` has the much faster `.get_slice(name)`, which doesn't read the tensor data before it's needed, while still giving access to the shape and dtype of each tensor.

As a nice result, this makes `convert_hf_to_gguf.py --dry-run` much, much faster than before for slow disks and/or big models (seconds instead of minutes). Normal lazy conversion is also faster, since the initial metadata reading step doesn't unnecessarily read all the data anymore.

Note that I've also removed some unused code in `gguf-py/gguf/tensor_mapping.py` related to the number of experts. `xid` does not exist in the mappings since stacked experts were implemented, so `.format(xid = xid)` does not do anything.

Testing
After fixing the problem found in #8482 (comment), I've run some more tests.

`-no-slices-` means `master` at commit 97bdd26, while `-slices-recurse-` means after the memory leak was fixed in b971122.
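
For anyone wondering why the slice-based variant can get shapes and dtypes without touching tensor data, here's a minimal standard-library sketch of the idea. It is not the code from this PR. It assumes the safetensors on-disk layout, as I understand it, of an 8-byte little-endian header length, then a JSON header holding `dtype`, `shape` and `data_offsets` per tensor, then the raw data. `write_demo_file`, `LazyTensorRef` and `read_lazy_index` are hypothetical names made up for this example; the actual change relies on `.get_slice(name)` instead.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_demo_file(path, tensors):
    """Write a tiny safetensors-style file: u64 header size, JSON header, raw data."""
    header, blobs, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        blobs.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(b"".join(blobs))


class LazyTensorRef:
    """Hypothetical stand-in for a lazy slice: metadata now, bytes on demand."""

    def __init__(self, path, data_start, meta):
        self.path = path
        self.dtype = meta["dtype"]
        self.shape = tuple(meta["shape"])
        # Offsets in the header are relative to the start of the data section.
        self._begin = data_start + meta["data_offsets"][0]
        self._end = data_start + meta["data_offsets"][1]

    def load(self):
        # Only here is the tensor data actually read from disk.
        with open(self.path, "rb") as f:
            f.seek(self._begin)
            return f.read(self._end - self._begin)


def read_lazy_index(path):
    """Read only the header and return {name: LazyTensorRef}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    data_start = 8 + header_len
    return {
        name: LazyTensorRef(path, data_start, meta)
        for name, meta in header.items()
        if name != "__metadata__"  # optional free-form metadata entry
    }


demo_path = Path(tempfile.mkdtemp()) / "demo.safetensors"
write_demo_file(demo_path, {
    "a.weight": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)),
    "b.bias": ("F32", [2], struct.pack("<2f", 0.5, -0.5)),
})

index = read_lazy_index(demo_path)
for name, ref in index.items():
    print(name, ref.dtype, ref.shape)  # metadata only, no tensor bytes read yet

values = struct.unpack("<2f", index["b.bias"].load())  # data is read here
print(values)
```

In this toy version the metadata pass reads only the header, so its cost does not depend on how large the tensors are. That is the same property that makes `--dry-run` fast with slices.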