docs: Update high-level python api examples in README to include chat… · jerryrelmore/llama-cpp-python@bd43fb2 · GitHub

Commit bd43fb2

docs: Update high-level python api examples in README to include chat formats, function calling, and multi-modal models.
1 parent d977b44 commit bd43fb2

File tree

1 file changed (+112, -2 lines)

README.md

Lines changed: 112 additions & 2 deletions
@@ -110,12 +110,17 @@ Detailed MacOS Metal GPU install documentation is available at [docs/install/mac

The high-level API provides a simple managed interface through the `Llama` class.

Below is a short example demonstrating how to use the high-level API for basic text completion:

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/llama-model.gguf")
>>> output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens
      stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
      echo=True # Echo the prompt back in the output
)
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  ...
}
```
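
The returned completion follows an OpenAI-style schema, so the generated text itself sits inside the `choices` list; a minimal way to pull it out:

```python
>>> # The first (and only) choice holds the text; with echo=True the prompt is included in it
>>> print(output["choices"][0]["text"])
```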

### Chat Completion

The high-level API also provides a simple interface for chat completion.

Note that the `chat_format` option must be set for the particular model you are using.

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/llama-2/llama-model.gguf", chat_format="llama-2")
>>> llm.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes images."},
          {
              "role": "user",
              "content": "Describe this image in detail please."
          }
      ]
)
```
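
Chat completion can also be streamed by passing `stream=True`, which returns an iterator of chunks instead of a single response. This is a minimal sketch assuming the chunks follow the OpenAI-style `delta` format:

```python
stream = llm.create_chat_completion(
    messages = [
        {"role": "user", "content": "Name the planets in the solar system."}
    ],
    stream=True
)
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    # Intermediate chunks carry text in "content"; the first and last chunks may not
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```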

### Function Calling

The high-level API also provides a simple interface for function calling.

Note that the only model that supports full function calling at this time is "functionary".
The gguf-converted files for this model can be found here: [functionary-7b-v1](https://huggingface.co/abetlen/functionary-7b-v1-GGUF)

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/functionary/llama-model.gguf", chat_format="functionary")
>>> llm.create_chat_completion(
      messages = [
          {
              "role": "system",
              "content": "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. The assistant calls functions with appropriate input when necessary."
          },
          {
              "role": "user",
              "content": "Extract Jason is 25 years old"
          }
      ],
      tools=[{
          "type": "function",
          "function": {
              "name": "UserDetail",
              "parameters": {
                  "type": "object",
                  "title": "UserDetail",
                  "properties": {
                      "name": {
                          "title": "Name",
                          "type": "string"
                      },
                      "age": {
                          "title": "Age",
                          "type": "integer"
                      }
                  },
                  "required": [ "name", "age" ]
              }
          }
      }],
      tool_choice={
          "type": "function",
          "function": {
              "name": "UserDetail"
          }
      }
)
```
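
The `parameters` block above is plain JSON Schema (the `title` fields suggest it was generated from a pydantic model), so if you already use pydantic you can generate it instead of writing it by hand. A sketch, assuming pydantic v2's `model_json_schema()`:

```python
from pydantic import BaseModel

class UserDetail(BaseModel):
    name: str
    age: int

# Produces a JSON Schema equivalent to the hand-written "parameters" object above
tools = [{
    "type": "function",
    "function": {
        "name": "UserDetail",
        "parameters": UserDetail.model_json_schema()
    }
}]
```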

### Multi-modal Models

`llama-cpp-python` supports the llava1.5 family of multi-modal models which allow the language model to
read information from both text and images.

You'll first need to download one of the available multi-modal models in GGUF format:

- [llava-v1.5-7b](https://huggingface.co/mys/ggml_llava-v1.5-7b)
- [llava-v1.5-13b](https://huggingface.co/mys/ggml_llava-v1.5-13b)
- [bakllava-1-7b](https://huggingface.co/mys/ggml_bakllava-1)

Then you'll need to use a custom chat handler to load the clip model and process the chat messages and images.

```python
>>> from llama_cpp import Llama
>>> from llama_cpp.llama_chat_format import Llava15ChatHandler
>>> chat_handler = Llava15ChatHandler(clip_model_path="path/to/llava/mmproj.bin")
>>> llm = Llama(model_path="./path/to/llava/llama-model.gguf", chat_handler=chat_handler)
>>> llm.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes images."},
          {
              "role": "user",
              "content": [
                  {"type": "image_url", "image_url": {"url": "https://.../image.png"}},
                  {"type": "text", "text": "Describe this image in detail please."}
              ]
          }
      ]
)
```
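
To use an image from the local filesystem instead of a remote URL, one option is to embed it as a base64 data URI in the same `image_url` field. A sketch using only the standard library, assuming the chat handler accepts data URIs as well as plain URLs (the `image_to_data_uri` helper below is not part of the library):

```python
import base64

def image_to_data_uri(path, mime="image/png"):
    # Hypothetical helper: read the image bytes and wrap them in a data URI
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# Drop-in replacement for the remote image_url in the example above
local_image = {"type": "image_url", "image_url": {"url": image_to_data_uri("path/to/image.png")}}
```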

### Adjusting the Context Window

The context window of the Llama models determines the maximum number of tokens that can be processed at once. By default, this is set to 512 tokens, but can be adjusted based on your requirements.
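
For example, a larger window can be requested when the model is loaded by passing `n_ctx` to the `Llama` constructor (2048 here is just an illustrative value):

```python
>>> llm = Llama(model_path="./models/7B/llama-model.gguf", n_ctx=2048)  # Raise the context window from the 512-token default
```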
