docs: Update README with more param common examples · sjanaX01/llama-cpp-python@5b258bf

Commit 5b258bf

docs: Update README with more param common examples
1 parent c343baa commit 5b258bf

1 file changed: +15 -4 lines

README.md

Lines changed: 15 additions & 4 deletions
````diff
@@ -104,6 +104,7 @@ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
 ### Windows Notes
 
 If you run into issues where it complains it can't find `'nmake'` `'?'` or CMAKE_C_COMPILER, you can extract w64devkit as [mentioned in llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to CMAKE_ARGS before running `pip` install:
+
 ```ps
 $env:CMAKE_GENERATOR = "MinGW Makefiles"
 $env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
````
````diff
@@ -118,17 +119,19 @@ Detailed MacOS Metal GPU install documentation is available at [docs/install/mac
 #### M1 Mac Performance Issue
 
 Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:
-```
+
+```bash
 wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
 bash Miniforge3-MacOSX-arm64.sh
 ```
+
 Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.
 
 #### M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
 
 Try installing with
 
-```
+```bash
 CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
 ```
 
````
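The note above tells Apple Silicon users to install an arm64 build of Python. A quick way to verify which build you are running, before installing, is to check `platform.machine()` from the standard library; this snippet is an illustration and not part of the commit:

```python
import platform

# A native Apple Silicon build of Python reports "arm64" here.
# An x86_64 build running under Rosetta reports "x86_64" and will
# trigger the slow x86 llama.cpp build described in the README note.
print(platform.machine())
```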

````diff
@@ -152,7 +155,12 @@ Below is a short example demonstrating how to use the high-level API to for basic
 
 ```python
 >>> from llama_cpp import Llama
->>> llm = Llama(model_path="./models/7B/llama-model.gguf")
+>>> llm = Llama(
+      model_path="./models/7B/llama-model.gguf",
+      # n_gpu_layers=-1, # Uncomment to use GPU acceleration
+      # seed=1337, # Uncomment to set a specific seed
+      # n_ctx=2048, # Uncomment to increase the context window
+)
 >>> output = llm(
       "Q: Name the planets in the solar system? A: ", # Prompt
       max_tokens=32, # Generate up to 32 tokens
````
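For reference, here is a minimal runnable sketch of the updated high-level API example with the commented-out parameters enabled. The parameter values and model path are the placeholders from the diff; adjust them for your own setup:

```python
from llama_cpp import Llama

# Values mirror the commented-out examples added in this commit.
llm = Llama(
    model_path="./models/7B/llama-model.gguf",  # placeholder model path
    n_gpu_layers=-1,  # offload all layers to the GPU
    seed=1337,        # fixed seed for reproducible sampling
    n_ctx=2048,       # larger context window
)

output = llm(
    "Q: Name the planets in the solar system? A: ",  # prompt
    max_tokens=32,  # generate up to 32 tokens
)
print(output["choices"][0]["text"])
```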
````diff
@@ -191,7 +199,10 @@ Note that `chat_format` option must be set for the particular model you are using
 
 ```python
 >>> from llama_cpp import Llama
->>> llm = Llama(model_path="path/to/llama-2/llama-model.gguf", chat_format="llama-2")
+>>> llm = Llama(
+      model_path="path/to/llama-2/llama-model.gguf",
+      chat_format="llama-2"
+)
 >>> llm.create_chat_completion(
     messages = [
         {"role": "system", "content": "You are an assistant who perfectly describes images."},
````

0 commit comments