@@ -104,6 +104,7 @@ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
### Windows Notes
If you run into issues where it complains it can't find `'nmake'` `'?'` or CMAKE_C_COMPILER, you can extract w64devkit as [mentioned in llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to CMAKE_ARGS before running `pip` install:
+
```ps
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
@@ -118,17 +119,19 @@ Detailed MacOS Metal GPU install documentation is available at [docs/install/mac
#### M1 Mac Performance Issue
Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:
- ```
+
+ ```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```
+
Otherwise, the install will build the x86 version of llama.cpp, which will be 10x slower on an Apple Silicon (M1) Mac.
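
One quick way to confirm that the active interpreter is a native arm64 build before installing (a minimal check using only the Python standard library; adjust for your own environment):

```python
import platform

# A native arm64 Python build reports "arm64" here; "x86_64" means the
# interpreter is running under Rosetta and pip would build the x86 version.
print(platform.machine())
```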
#### M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
Try installing with
- ```
+ ```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
```
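
If the reinstall succeeds, importing the package from the same interpreter should no longer fail with the architecture error. A minimal check, assuming `llama-cpp-python` is installed in the active environment:

```python
import platform

# If the wheel was rebuilt for arm64, this import succeeds; an x86_64 build
# fails to load here with the "incompatible architecture" message above.
import llama_cpp

print("llama_cpp imported OK on", platform.machine())
```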
@@ -152,7 +155,12 @@ Below is a short example demonstrating how to use the high-level API for basi
```python
>>> from llama_cpp import Llama
- >>> llm = Llama(model_path="./models/7B/llama-model.gguf")
+ >>> llm = Llama(
+       model_path="./models/7B/llama-model.gguf",
+       # n_gpu_layers=-1, # Uncomment to use GPU acceleration
+       # seed=1337, # Uncomment to set a specific seed
+       # n_ctx=2048, # Uncomment to increase the context window
+ )
>>> output = llm(
      "Q: Name the planets in the solar system? A: ", # Prompt
      max_tokens=32, # Generate up to 32 tokens
@@ -191,7 +199,10 @@ Note that the `chat_format` option must be set for the particular model you are usin
```python
>>> from llama_cpp import Llama
- >>> llm = Llama(model_path="path/to/llama-2/llama-model.gguf", chat_format="llama-2")
+ >>> llm = Llama(
+       model_path="path/to/llama-2/llama-model.gguf",
+       chat_format="llama-2"
+ )
>>> llm.create_chat_completion(
      messages = [
          {"role": "system", "content": "You are an assistant who perfectly describes images."},