Text completion is available through the [`__call__`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.__call__) and [`create_completion`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_completion) methods of the [`Llama`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama) class.
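
For example, a minimal sketch of a completion call (the model path below is illustrative):

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/model.gguf")  # illustrative model path
>>> output = llm(
      "Q: Name the planets in the solar system. A: ",  # prompt
      max_tokens=32,  # cap the number of generated tokens
      stop=["Q:", "\n"],  # stop before the model starts a new question
)
>>> print(output["choices"][0]["text"])
```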

### Pulling models from Hugging Face Hub

You can download `Llama` models in `gguf` format directly from Hugging Face using the [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.from_pretrained) method.
You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",  # illustrative repo id
    filename="*q8_0.gguf",  # glob pattern selecting which gguf file to download
)
```

By default [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.from_pretrained) will download the model to the Hugging Face cache directory; you can then manage installed model files with the [`huggingface-cli`](https://huggingface.co/docs/huggingface_hub/en/guides/cli) tool.
### Chat Completion

Note that the `chat_format` option must be set for the particular model you are using.

Chat completion is available through the [`create_chat_completion`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_chat_completion) method of the [`Llama`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama) class.

For OpenAI API v1 compatibility, use the [`create_chat_completion_openai_v1`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_chat_completion_openai_v1) method, which returns pydantic models instead of dicts.
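
A minimal sketch of a chat completion call (the model path and chat format are illustrative):

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/model.gguf", chat_format="llama-2")  # illustrative values
>>> llm.create_chat_completion(
      messages=[
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Name the planets in the solar system."},
      ]
)
```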
### JSON and JSON Schema Mode

To constrain chat responses to only valid JSON or a specific JSON Schema, use the `response_format` argument in [`create_chat_completion`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_chat_completion).
#### JSON Mode

The following example will constrain the response to valid JSON strings only.

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")  # illustrative model path
>>> llm.create_chat_completion(
      messages=[
          {"role": "system", "content": "You are a helpful assistant that outputs in JSON."},
          {"role": "user", "content": "Who won the world series in 2020?"},
      ],
      response_format={"type": "json_object"},
      temperature=0.7,
)
```
#### JSON Schema Mode

To constrain the response further to a specific JSON Schema, add the schema to the `schema` property of the `response_format` argument.

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")  # illustrative model path
>>> llm.create_chat_completion(
      messages=[
          {"role": "system", "content": "You are a helpful assistant that outputs in JSON."},
          {"role": "user", "content": "Who won the world series in 2020?"},
      ],
      response_format={
          "type": "json_object",
          "schema": {
              "type": "object",
              "properties": {"team_name": {"type": "string"}},
              "required": ["team_name"],
          },
      },
      temperature=0.7,
)
```
### Embeddings

To generate text embeddings, use [`create_embedding`](https://llama-cpp-python.readthedocs.io/en/latest/api-reference/#llama_cpp.Llama.create_embedding).
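
A minimal sketch (the model path is illustrative; `embedding=True` puts the model in embedding mode):

```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="path/to/model.gguf", embedding=True)  # illustrative model path
>>> result = llm.create_embedding("Hello, world!")
>>> vector = result["data"][0]["embedding"]  # OpenAI-style response; a list of floats
```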