Is your feature request related to a problem? Please describe.
The current high-level implementation of multimodality relies on a specific, hard-coded prompt format.
Describe the solution you'd like
Models like Obsidian work with the llama.cpp server but use a different prompt format. It would be nice to have a high-level multimodality API in llama-cpp-python that accepts an image (or several images) as an argument after `Llama()` has been initialized with the paths to all required extra models, without relying on a pre-defined prompt format such as `Llava15ChatHandler`.
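To make the request concrete, here is a minimal sketch of what such an API surface could look like. Everything below is hypothetical: the `MultimodalLlama` class, the `clip_model_path` parameter, the `images` keyword, and the model filenames are illustrative placeholders, not existing llama-cpp-python API.

```python
from typing import List, Optional


class MultimodalLlama:
    """Hypothetical wrapper illustrating the requested interface:
    extra-model paths are given once at init, images are passed per call,
    and no fixed prompt format is assumed."""

    def __init__(self, model_path: str, clip_model_path: str):
        # Paths to the base model and the image-projection (CLIP) model.
        self.model_path = model_path
        self.clip_model_path = clip_model_path

    def create_completion(self, prompt: str,
                          images: Optional[List[str]] = None) -> dict:
        # A real implementation would embed each image with the CLIP model
        # and interleave the embeddings with the prompt tokens; this stub
        # only echoes back what it received.
        images = images or []
        return {"prompt": prompt, "n_images": len(images)}


# Hypothetical usage; the .gguf filenames are placeholders.
llm = MultimodalLlama("obsidian-3b.gguf", "mmproj-obsidian.gguf")
out = llm.create_completion("Describe the image.", images=["photo.jpg"])
```

The key point of the sketch is that the image paths travel with the completion call, while the prompt formatting is left entirely to the model or the caller.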
Describe alternatives you've considered
Alternatively, a custom prompt-format class that supports images could be implemented, where the prompt template string is passed in as an argument.
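The alternative above could be sketched as follows. This is not existing library code: `ImageChatFormat`, its constructor argument, and the `<image>` placeholder convention are all assumptions, shown only to illustrate a format class whose template is user-supplied rather than hard-coded as in `Llava15ChatHandler`.

```python
class ImageChatFormat:
    """Hypothetical prompt-format class: the template string is supplied
    by the user instead of being baked into the handler."""

    def __init__(self, template: str):
        # e.g. "{image_tokens}\nUSER: {prompt}\nASSISTANT:"
        self.template = template

    def format(self, prompt: str, n_images: int) -> str:
        # Emit one image placeholder token per attached image, then fill
        # the user-provided template.
        image_tokens = "<image>" * n_images
        return self.template.format(image_tokens=image_tokens, prompt=prompt)


# Hypothetical usage with an Obsidian-style template.
fmt = ImageChatFormat("{image_tokens}\nUSER: {prompt}\nASSISTANT:")
text = fmt.format("What is shown here?", n_images=1)
```

With this shape, supporting a new multimodal model would mean passing a different template string instead of writing a new handler class.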