docs : add Moondream2 pre-quantized link by ddpasa · Pull Request #13745 · ggml-org/llama.cpp · GitHub

docs : add Moondream2 pre-quantized link #13745


Merged · 2 commits into ggml-org:master on May 25, 2025

Conversation

@ddpasa (Contributor) commented on May 24, 2025

The Moondream2 model GGUFs at https://huggingface.co/vikhyatk/moondream2 have been updated to the latest version, and they work with llama.cpp. However, the files vikhyatk published do not include a default chat template. The version at https://huggingface.co/Hahasb/moondream2-20250414-GGUF has been updated with tokenizer.chat_template=vicuna, which seems to work OK, though I'm not sure this is the optimal setup.
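For reference, here is a rough sketch of how such a field can be set with the gguf-new-metadata helper from llama.cpp's gguf-py package (installed via pip install gguf). This is not necessarily the exact procedure used here; the file names are placeholders, and the flag should be double-checked against --help:

```sh
# Copy the GGUF and set tokenizer.chat_template in the copy.
# "vicuna" is a template name that llama.cpp resolves to its built-in
# Vicuna template; a full Jinja template string can be passed instead.
pip install gguf

gguf-new-metadata \
    moondream2-text-model-f16.gguf \
    moondream2-text-model-f16-vicuna.gguf \
    --chat-template vicuna
```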

Fixes #13332
Fixes vikhyat/moondream#96

Moondream2 is a crazy good model for its tiny size. After this is merged, I'll start experimenting with quantizations, but even the fp16 version is tiny (less than 3 GB for the text model, less than 1 GB for the mmproj).
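As a rough usage sketch (the file names follow the Hugging Face repos above; the image path and prompt are placeholders), the pair of GGUFs can be smoke-tested with the mtmd CLI:

```sh
# The text model and the vision projector are passed separately;
# --image attaches the picture the prompt refers to.
llama-mtmd-cli \
    -m moondream2-text-model-f16.gguf \
    --mmproj moondream2-mmproj-f16.gguf \
    --image some_photo.jpg \
    -p "Describe this image."
```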

@ddpasa (Contributor, Author) commented on May 24, 2025

@ngxson for visibility. It might be good to move the model GGUFs from a private repo to the official ggml-org repo.

The github-actions bot added the documentation (Improvements or additions to documentation) label on May 24, 2025.
@ngxson (Collaborator) commented on May 24, 2025

Can you also share the steps and commands you used to generate the mmproj GGUF?

It would be nice if we could add llava to convert_hf_to_gguf, but I don't have time yet. A guide specifically for Moondream could be a temporary solution.

@ddpasa (Contributor, Author) commented on May 24, 2025

> Can you also share the steps and commands you used to generate the mmproj GGUF?
>
> It would be nice if we could add llava to convert_hf_to_gguf, but I don't have time yet. A guide specifically for Moondream could be a temporary solution.

Hello @ngxson, I didn't create the mmproj. The author updated the files on Hugging Face a few days ago. However, the text model didn't have a chat template in it, so I just edited the GGUF to add that field.

There is a create_gguf.py script in one of the branches of the moondream repo; I expect the GGUFs came from there: https://github.com/vikhyat/moondream/blob/moondream-ggml/create_gguf.py
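To see what actually ended up in the file, the gguf-dump helper from the same gguf-py package can print the metadata. A sketch (the file name is a placeholder, and --no-tensors keeps the output short):

```sh
# Dump only the key/value metadata, skipping per-tensor info, and
# filter for the chat template field that was added.
gguf-dump --no-tensors moondream2-text-model-f16-vicuna.gguf | grep chat_template
```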

@ngxson changed the title from "Multimodal: Added Moondream2 model and fixed ggml.org link" to "docs : add Moondream2 pre-quantized link" on May 25, 2025.
@ngxson merged commit a08c1d2 into ggml-org:master on May 25, 2025 · 2 checks passed
@kth8 commented on May 25, 2025

I saw this model on /r/locallama the other day and the benchmarks looked impressive, so I ran it through a few tests with Gemini 2.5 as judge: https://gist.github.com/kth8/195bfe61e8c3b2ef8cce4bf263808e2d

@lus105 commented on May 28, 2025

Hello, is it possible to use it with the detect or point methods in llama.cpp?

Labels: documentation (Improvements or additions to documentation)
Projects: none yet
Development

Successfully merging this pull request may close these issues:

- How to run on llama.cpp
- Feature Request: moondream2 vlm support in mtmd
4 participants