I came across a model on Hugging Face, Bunny-Llama-3-8B-V (bunny-llama), that adds multimodal support on top of Llama 3, and I'd like to be able to deploy it using llama-cpp-python!
But I found that the existing `chat_format: llama-3` doesn't seem to support running it as a multimodal model.
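For reference, the multimodal route llama-cpp-python currently exposes is a chat handler paired with a separate CLIP projector GGUF (the LLaVA-style path). A minimal sketch of that pattern, assuming a hypothetical `bunny-mmproj.gguf` projector file (llama.cpp's converter does not produce one for Bunny as far as I can tell):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# LLaVA-style handlers pair the language-model GGUF with a separate CLIP
# projector GGUF. "bunny-mmproj.gguf" is a hypothetical file name here;
# whether Bunny's vision tower can be exported this way is exactly the
# open question of this issue.
chat_handler = Llava15ChatHandler(clip_model_path="bunny-mmproj.gguf")

llm = Llama(
    model_path="bunny-llama.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # image embeddings consume context, so don't go too small
    n_gpu_layers=-1,
)
```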
I converted it to GGUF format via llama.cpp and ran it with the following configuration:

```bash
python llama.cpp/convert.py \
    Bunny-Llama-3-8B-V --outtype f16 \
    --outfile converted.bin \
    --vocab-type bpe
```
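Before putting the converted file behind the server, it may be worth a quick text-only smoke test to confirm the GGUF itself loads (a sketch, assuming the output was renamed to `bunny-llama.gguf` to match the config below):

```python
from llama_cpp import Llama

# Text-only sanity check of the converted GGUF: convert.py only exports the
# language model, so this exercises the llama-3 chat template without vision.
llm = Llama(model_path="bunny-llama.gguf", chat_format="llama-3", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```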
`bunny-llama.json`:

```json
{
    "host": "0.0.0.0",
    "port": 8080,
    "api_key": "xx",
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llama-3",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        }
    ]
}
```
```bash
python3 -m llama_cpp.server \
    --config_file bunny-llama.json
```
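With the server running, a request against its OpenAI-compatible endpoint, using the `model_alias` and `api_key` from the config above, would look roughly like:

```python
from openai import OpenAI

# llama_cpp.server exposes an OpenAI-compatible API under /v1;
# "xx" matches the api_key placeholder in bunny-llama.json.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="xx")

resp = client.chat.completions.create(
    model="bunny-llama",  # the model_alias from the config
    messages=[{"role": "user", "content": "Describe this setup in one line."}],
)
print(resp.choices[0].message.content)
```

Plain text chat works this way, but image inputs would additionally need a multimodal chat handler on the server side, which is what `chat_format: llama-3` lacks here.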