docs : add Moondream2 pre-quantized link #13745

ddpasa · 2025-05-24T10:35:19Z

The Moondream2 GGUF at https://huggingface.co/vikhyatk/moondream2 has been updated to the latest version, and it works with llama.cpp. However, the model vikhyatk published does not have a default chat template. The version at https://huggingface.co/Hahasb/moondream2-20250414-GGUF has been updated with tokenizer.chat_template=vicuna, which seems to work OK, but I'm not sure whether this is the optimal setup.

Fixes #13332
Fixes vikhyat/moondream#96

Moondream2 is a crazy good model for its tiny size. After this is merged, I'll start experimenting with quantizations, but even the fp16 version is tiny (less than 3GB for the text model, less than 1GB for the mmproj).

ddpasa · 2025-05-24T10:36:14Z

@ngxson for visibility. It might be good to move the model GGUFs from a private repo to the official ggml-org repo.

ngxson · 2025-05-24T12:54:48Z

Can you also share the steps and commands you used to generate the mmproj GGUF?

It would be nice if we could add llava to convert_hf_to_gguf, but I don't have time yet. A guide specifically for moondream could be a temporary solution.

ddpasa · 2025-05-24T13:05:52Z

> Can you also share the steps and commands you used to generate the mmproj GGUF?
>
> It would be nice if we could add llava to convert_hf_to_gguf, but I don't have time yet. A guide specifically for moondream could be a temporary solution.

Hello @ngxson, I didn't create the mmproj. The author updated the files on Hugging Face a few days ago. However, the text model didn't have a chat template in it, so I just edited the GGUF to add that field.

There is a create_gguf.py script in one of the branches of the moondream repo; I expect it came from there: https://github.com/vikhyat/moondream/blob/moondream-ggml/create_gguf.py
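For anyone who wants to check whether a given text-model GGUF already carries the `tokenizer.chat_template` field mentioned above, here is a minimal sketch that parses only the GGUF header using the Python standard library. It assumes a little-endian GGUF file of version 2 or later; the function names (`read_gguf_metadata`, `has_chat_template`) are made up for illustration and are not part of llama.cpp. It only reads metadata and is not the procedure that was used to edit the file.

```python
import struct

# Fixed-size GGUF scalar value types: type id -> little-endian struct format.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value from the binary stream."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of GGUF header")
    return data.decode("utf-8")


def _read_value(f, vtype):
    """Read a metadata value of the given type id (arrays may nest)."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f):
    """Return the metadata key/value pairs from an open binary GGUF stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit lengths; not handled here")
    _read(f, "<Q")             # tensor count, not needed for metadata
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    meta = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        meta[key] = _read_value(f, vtype)
    return meta


def has_chat_template(f):
    """True if the header contains a tokenizer.chat_template entry."""
    return "tokenizer.chat_template" in read_gguf_metadata(f)
```

Usage would be along the lines of `with open(path, "rb") as f: print(has_chat_template(f))`, where `path` points at the text-model GGUF. The gguf Python package that ships with llama.cpp is probably the more practical tool for real inspection or editing; the sketch above is only meant to show where the field lives in the file.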

docs/multimodal.md

kth8 · 2025-05-25T21:24:59Z

I saw this model on /r/locallama the other day and the benchmarks looked impressive, so I ran it through a few tests with Gemini 2.5 as judge: https://gist.github.com/kth8/195bfe61e8c3b2ef8cce4bf263808e2d

lus105 · 2025-05-28T06:18:27Z

Hello, is it possible to use it with detect or point methods in llama.cpp?

Multimodal: Added Moondream2 model and fixed ggml.org link

8db2386

github-actions bot added the documentation Improvements or additions to documentation label May 24, 2025

ngxson reviewed May 25, 2025

Apply suggestions from code review

40fb654

ngxson changed the title ~~Multimodal: Added Moondream2 model and fixed ggml.org link~~ docs : add Moondream2 pre-quantized link May 25, 2025

ngxson approved these changes May 25, 2025

ngxson merged commit a08c1d2 into ggml-org:master May 25, 2025
2 checks passed
