I came across a model on Hugging Face, Bunny-Llama-3-8B-V (bunny-llama), that adds multimodal support on top of Llama 3, and I'd like to be able to deploy it using llama-cpp-python!
But I found that the existing `chat_format: llama-3` doesn't seem to support running it as a multimodal model.
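For reference, the multimodal route llama-cpp-python currently exposes is a chat handler paired with a separate CLIP projector GGUF (the LLaVA-style path). A minimal sketch of that pattern, assuming a hypothetical `bunny-mmproj.gguf` projector file (llama.cpp's converter does not produce one for Bunny as far as I can tell):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# LLaVA-style handlers pair the language-model GGUF with a separate CLIP
# projector GGUF. "bunny-mmproj.gguf" is a hypothetical file name here;
# whether Bunny's vision tower can be exported this way is exactly the
# open question of this issue.
chat_handler = Llava15ChatHandler(clip_model_path="bunny-mmproj.gguf")

llm = Llama(
    model_path="bunny-llama.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,  # image embeddings consume context, so don't go too small
    n_gpu_layers=-1,
)
```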
I converted it to GGUF format via llama.cpp and ran it with the following configuration:

```bash
python llama.cpp/convert.py \
    Bunny-Llama-3-8B-V --outtype f16 \
    --outfile converted.bin \
    --vocab-type bpe
```
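Before putting the converted file behind the server, it may be worth a quick text-only smoke test to confirm the GGUF itself loads (a sketch, assuming the output was renamed to `bunny-llama.gguf` to match the config below):

```python
from llama_cpp import Llama

# Text-only sanity check of the converted GGUF: convert.py only exports the
# language model, so this exercises the llama-3 chat template without vision.
llm = Llama(model_path="bunny-llama.gguf", chat_format="llama-3", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```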
`bunny-llama.json`:

```json
{
    "host": "0.0.0.0",
    "port": 8080,
    "api_key": "xx",
    "models": [
        {
            "model": "bunny-llama.gguf",
            "model_alias": "bunny-llama",
            "chat_format": "llama-3",
            "n_gpu_layers": -1,
            "offload_kqv": true,
            "n_threads": 12,
            "n_batch": 512,
            "n_ctx": 2048
        }
    ]
}
```
```bash
python3 -m llama_cpp.server \
    --config_file bunny-llama.json
```
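With the server running, a request against its OpenAI-compatible endpoint, using the `model_alias` and `api_key` from the config above, would look roughly like:

```python
from openai import OpenAI

# llama_cpp.server exposes an OpenAI-compatible API under /v1;
# "xx" matches the api_key placeholder in bunny-llama.json.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="xx")

resp = client.chat.completions.create(
    model="bunny-llama",  # the model_alias from the config
    messages=[{"role": "user", "content": "Describe this setup in one line."}],
)
print(resp.choices[0].message.content)
```

Plain text chat works this way, but image inputs would additionally need a multimodal chat handler on the server side, which is what `chat_format: llama-3` lacks here.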