sycl : backend documentation review (#13544) · ggml-org/llama.cpp@725f23f

sycl : backend documentation review (#13544)
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
1 parent 92ecdcc commit 725f23f

6 files changed: 96 additions and 39 deletions
docs/backend/SYCL.md

Lines changed: 51 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -17,25 +17,25 @@
1717

1818
**SYCL** is a high-level parallel programming model designed to improve developer productivity when writing code across various hardware accelerators such as CPUs, GPUs, and FPGAs. It is a single-source language designed for heterogeneous computing and based on standard C++17.
1919

20-
**oneAPI** is an open ecosystem and a standard-based specification, supporting multiple architectures including but not limited to intel CPUs, GPUs and FPGAs. The key components of the oneAPI ecosystem include:
20+
**oneAPI** is an open ecosystem and a standard-based specification, supporting multiple architectures including but not limited to Intel CPUs, GPUs and FPGAs. The key components of the oneAPI ecosystem include:
2121

2222
- **DPCPP** *(Data Parallel C++)*: The primary oneAPI SYCL implementation, which includes the icpx/icx Compilers.
2323
- **oneAPI Libraries**: A set of highly optimized libraries targeting multiple domains *(e.g. Intel oneMKL, oneMath and oneDNN)*.
24-
- **oneAPI LevelZero**: A high performance low level interface for fine-grained control over intel iGPUs and dGPUs.
24+
- **oneAPI LevelZero**: A high performance low level interface for fine-grained control over Intel iGPUs and dGPUs.
2525
- **Nvidia & AMD Plugins**: These are plugins extending oneAPI's DPCPP support to SYCL on Nvidia and AMD GPU targets.
2626

2727
### Llama.cpp + SYCL
2828

29-
The llama.cpp SYCL backend is designed to support **Intel GPU** firstly. Based on the cross-platform feature of SYCL, it also supports other vendor GPUs: Nvidia and AMD.
29+
The llama.cpp SYCL backend is primarily designed for **Intel GPUs**.
30+
SYCL cross-platform capabilities enable support for Nvidia GPUs as well, with limited support for AMD.
3031

3132
## Recommended Release
3233

33-
The SYCL backend would be broken by some PRs due to no online CI.
34-
35-
The following release is verified with good quality:
34+
The following releases are verified and recommended:
3635

3736
|Commit ID|Tag|Release|Verified Platform| Update date|
3837
|-|-|-|-|-|
38+
|24e86cae7219b0f3ede1d5abdf5bf3ad515cccb8|b5377 |[llama-b5377-bin-win-sycl-x64.zip](https://github.com/ggml-org/llama.cpp/releases/download/b5377/llama-b5377-bin-win-sycl-x64.zip) |ArcB580/Linux/oneAPI 2025.1<br>LNL Arc GPU/Windows 11/oneAPI 2025.1.1|2025-05-15|
3939
|3bcd40b3c593d14261fb2abfabad3c0fb5b9e318|b4040 |[llama-b4040-bin-win-sycl-x64.zip](https://github.com/ggml-org/llama.cpp/releases/download/b4040/llama-b4040-bin-win-sycl-x64.zip) |Arc770/Linux/oneAPI 2024.1<br>MTL Arc GPU/Windows 11/oneAPI 2024.1| 2024-11-19|
4040
|fb76ec31a9914b7761c1727303ab30380fd4f05c|b3038 |[llama-b3038-bin-win-sycl-x64.zip](https://github.com/ggml-org/llama.cpp/releases/download/b3038/llama-b3038-bin-win-sycl-x64.zip) |Arc770/Linux/oneAPI 2024.1<br>MTL Arc GPU/Windows 11/oneAPI 2024.1||
4141

@@ -106,15 +106,14 @@ SYCL backend supports Intel GPU Family:
106106
|-------------------------------|---------|---------------------------------------|
107107
| Intel Data Center Max Series | Support | Max 1550, 1100 |
108108
| Intel Data Center Flex Series | Support | Flex 170 |
109-
| Intel Arc Series | Support | Arc 770, 730M, Arc A750 |
110-
| Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake, Arrow Lake |
111-
| Intel iGPU | Support | iGPU in 13700k,iGPU in 13400, i5-1250P, i7-1260P, i7-1165G7 |
109+
| Intel Arc Series | Support | Arc 770, 730M, Arc A750, B580 |
110+
| Intel built-in Arc GPU | Support | built-in Arc GPU in Meteor Lake, Arrow Lake, Lunar Lake |
111+
| Intel iGPU | Support | iGPU in 13700k, 13400, i5-1250P, i7-1260P, i7-1165G7 |
112112

113113
*Notes:*
114114

115115
- **Memory**
116116
- The device memory is a limitation when running a large model. The loaded model size, *`llm_load_tensors: buffer_size`*, is displayed in the log when running `./bin/llama-cli`.
117-
118117
- Please make sure the GPU shared memory from the host is large enough to account for the model's size. For example, *llama-2-7b.Q4_0* requires at least 8.0GB for an integrated GPU and 4.0GB for a discrete GPU.
119118

120119
- **Execution Unit (EU)**
@@ -138,19 +137,22 @@ Note: AMD GPU support is highly experimental and is incompatible with F16.
138137
Additionally, it only supports GPUs with a sub_group_size (warp size) of 32.
139138

140139
## Docker
141-
The docker build option is currently limited to *intel GPU* targets.
140+
141+
The docker build option is currently limited to *Intel GPU* targets.
142142

143143
### Build image
144+
144145
```sh
145146
# Using FP16
146147
docker build -t llama-cpp-sycl --build-arg="GGML_SYCL_F16=ON" --target light -f .devops/intel.Dockerfile .
147148
```
148149

149150
*Notes*:
150151

151-
To build in default FP32 *(Slower than FP16 alternative)*, you can remove the `--build-arg="GGML_SYCL_F16=ON"` argument from the previous command.
152+
To build in default FP32 *(Slower than FP16 alternative)*, set `--build-arg="GGML_SYCL_F16=OFF"` in the previous command.
152153

153154
You can also use the `.devops/llama-server-intel.Dockerfile`, which builds the *"server"* alternative.
155+
Check the [documentation for Docker](../docker.md) to see the available images.
154156

155157
### Run container
156158

@@ -250,7 +252,7 @@ sycl-ls
250252

251253
- **Intel GPU**
252254

253-
When targeting an intel GPU, the user should expect one or more level-zero devices among the available SYCL devices. Please make sure that at least one GPU is present, for instance [`level_zero:gpu`] in the sample output below:
255+
When targeting an Intel GPU, the user should expect one or more devices among the available SYCL devices. Please make sure that at least one GPU is present via `sycl-ls`, for instance `[level_zero:gpu]` in the sample output below:
254256

255257
```
256258
[opencl:acc][opencl:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.10.0.17_160000]
@@ -282,7 +284,7 @@ For AMD GPUs we should expect at least one SYCL-HIP device [`hip:gpu`]:
282284

283285
#### Intel GPU
284286

285-
```
287+
```sh
286288
./examples/sycl/build.sh
287289
```
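Before launching the build, the `sycl-ls` device check described earlier can be scripted. The following is a minimal sketch, assuming the `sycl-ls` output uses the `[level_zero:gpu]` tag shown in the sample output; `count_level_zero_gpus` is a hypothetical helper name and not part of the repository.

```sh
# Count the Level Zero GPU entries in `sycl-ls` output read from stdin.
# Hypothetical helper: assumes each GPU line contains the literal tag
# "[level_zero:gpu]".
count_level_zero_gpus() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function from failing.
    grep -cF '[level_zero:gpu]' || true
}

# Only query the real tool when it is on PATH
# (typically after sourcing /opt/intel/oneapi/setvars.sh).
if command -v sycl-ls >/dev/null 2>&1; then
    found=$(sycl-ls | count_level_zero_gpus)
    echo "Level Zero GPUs found: ${found}"
fi
```

A count of 0 suggests that the GPU driver or the oneAPI environment is not set up yet.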
288290

@@ -351,7 +353,7 @@ cmake --build build --config Release -j -v
351353

352354
#### Retrieve and prepare model
353355

354-
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
356+
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or download an already quantized model like [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) or [Meta-Llama-3-8B-Instruct-Q4_0.gguf](https://huggingface.co/aptha/Meta-Llama-3-8B-Instruct-Q4_0-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_0.gguf).
355357

356358
##### Check device
357359

@@ -398,11 +400,15 @@ Choose one of following methods to run.
398400

399401
```sh
400402
./examples/sycl/run-llama2.sh 0
403+
# OR
404+
./examples/sycl/run-llama3.sh 0
401405
```
402406
- Use multiple devices:
403407

404408
```sh
405409
./examples/sycl/run-llama2.sh
410+
# OR
411+
./examples/sycl/run-llama3.sh
406412
```
407413

408414
2. Command line
@@ -425,13 +431,13 @@ Examples:
425431
- Use device 0:
426432

427433
```sh
428-
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm none -mg 0
434+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 99 -sm none -mg 0
429435
```
430436

431437
- Use multiple devices:
432438

433439
```sh
434-
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33 -sm layer
440+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -no-cnv -m models/llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 99 -sm layer
435441
```
436442

437443
*Notes:*
@@ -452,7 +458,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
452458

453459
1. Install GPU driver
454460

455-
Intel GPU drivers instructions guide and download page can be found here: [Get intel GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
461+
The Intel GPU driver installation guide and download page can be found here: [Get Intel GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
456462

457463
2. Install Visual Studio
458464

@@ -629,7 +635,7 @@ Once it is completed, final results will be in **build/Release/bin**
629635

630636
#### Retrieve and prepare model
631637

632-
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model prepration, or simply download [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) model as example.
638+
You can refer to the general [*Prepare and Quantize*](README.md#prepare-and-quantize) guide for model preparation, or download an already quantized model like [llama-2-7b.Q4_0.gguf](https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q4_0.gguf) or [Meta-Llama-3-8B-Instruct-Q4_0.gguf](https://huggingface.co/aptha/Meta-Llama-3-8B-Instruct-Q4_0-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_0.gguf).
633639

634640
##### Check device
635641

@@ -648,7 +654,7 @@ Similar to the native `sycl-ls`, available SYCL devices can be queried as follow
648654
build\bin\llama-ls-sycl-device.exe
649655
```
650656

651-
This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *intel GPU* it would look like the following:
657+
This command will only display the selected backend that is supported by SYCL. The default backend is level_zero. For example, in a system with 2 *Intel GPUs* it would look like the following:
652658
```
653659
found 2 SYCL devices:
654660
| | | |Compute |Max compute|Max work|Max sub| |
@@ -658,13 +664,14 @@ found 2 SYCL devices:
658664
| 1|[level_zero:gpu:1]| Intel(R) UHD Graphics 770| 1.3| 32| 512| 32| 53651849216|
659665
660666
```
667+
661668
#### Choose level-zero devices
662669

663670
|Chosen Device ID|Setting|
664671
|-|-|
665-
|0|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"` or no action|
672+
|0|Default option. You may also want to `set ONEAPI_DEVICE_SELECTOR="level_zero:0"`|
666673
|1|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
667-
|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"`|
674+
|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"` or `set ONEAPI_DEVICE_SELECTOR="level_zero:*"`|
668675

669676
#### Execute
670677

@@ -673,7 +680,13 @@ Choose one of following methods to run.
673680
1. Script
674681

675682
```
676-
examples\sycl\win-run-llama2.bat
683+
examples\sycl\win-run-llama2.bat
684+
```
685+
686+
or
687+
688+
```
689+
examples\sycl\win-run-llama3.bat
677690
```
678691

679692
2. Command line
@@ -697,13 +710,13 @@ Examples:
697710
- Use device 0:
698711

699712
```
700-
build\bin\llama-cli.exe -no-cnv -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm none -mg 0
713+
build\bin\llama-cli.exe -no-cnv -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 99 -sm none -mg 0
701714
```
702715

703716
- Use multiple devices:
704717

705718
```
706-
build\bin\llama-cli.exe -no-cnv -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 33 -s 0 -sm layer
719+
build\bin\llama-cli.exe -no-cnv -m models\llama-2-7b.Q4_0.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 99 -sm layer
707720
```
708721

709722

@@ -714,7 +727,9 @@ Note:
714727
```sh
715728
detect 1 SYCL GPUs: [0] with top Max compute units:512
716729
```
730+
717731
Or
732+
718733
```sh
719734
use 1 SYCL GPUs: [0] with Max compute units:512
720735
```
@@ -726,15 +741,17 @@ use 1 SYCL GPUs: [0] with Max compute units:512
726741

727742
| Name | Value | Function |
728743
|--------------------|---------------------------------------|---------------------------------------------|
729-
| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path.<br>FP32 path - recommended for better perforemance than FP16 on quantized model|
744+
| GGML_SYCL | ON (mandatory) | Enable build with SYCL code path. |
730745
| GGML_SYCL_TARGET | INTEL *(default)* \| NVIDIA \| AMD | Set the SYCL target device type. |
731746
| GGML_SYCL_DEVICE_ARCH | Optional (except for AMD) | Set the SYCL device architecture, optional except for AMD. Setting the device architecture can improve the performance. See the table [--offload-arch](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OffloadDesign.md#--offload-arch) for a list of valid architectures. |
732-
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. |
747+
| GGML_SYCL_F16 | OFF *(default)* \|ON *(optional)* | Enable FP16 build with SYCL code path. (1.) |
733748
| GGML_SYCL_GRAPH | ON *(default)* \|OFF *(Optional)* | Enable build with [SYCL Graph extension](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc). |
734749
| GGML_SYCL_DNN | ON *(default)* \|OFF *(Optional)* | Enable build with oneDNN. |
735750
| CMAKE_C_COMPILER | `icx` *(Linux)*, `icx/cl` *(Windows)* | Set `icx` compiler for SYCL code path. |
736751
| CMAKE_CXX_COMPILER | `icpx` *(Linux)*, `icx` *(Windows)* | Set `icpx/icx` compiler for SYCL code path. |
737752

753+
1. FP16 is recommended for better prompt processing performance on quantized models. Text generation performance is equivalent. Set `GGML_SYCL_F16=OFF` if you experience issues with FP16 builds.
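As a sketch of how the build options in the table combine on Linux, the helper below prints a `cmake` configure command without running it. The option names and the `icx`/`icpx` compilers come from the table; the helper name `sycl_configure_cmd` and the `build` directory are assumptions made for illustration.

```sh
# Print (not run) a cmake configure command for the SYCL backend on Linux.
# $1: ON or OFF for GGML_SYCL_F16 (defaults to OFF, the documented default).
# Hypothetical helper, not part of the repository.
sycl_configure_cmd() {
    f16="${1:-OFF}"
    echo "cmake -B build -DGGML_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_SYCL_F16=${f16}"
}

# Show the FP16 variant; copy the printed line into a shell where the
# oneAPI environment has been sourced to actually configure the build.
sycl_configure_cmd ON
```

Printing the command rather than executing it keeps the sketch usable on machines without the oneAPI toolchain installed.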
754+
738755
#### Runtime
739756

740757
| Name | Value | Function |
@@ -752,7 +769,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
752769

753770
## Q&A
754771

755-
- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
772+
- Error: `error while loading shared libraries: libsycl.so: cannot open shared object file: No such file or directory`.
756773

757774
- Potential cause: oneAPI is not installed, or its ENV variables are not set.
758775
- Solution: Install *oneAPI base toolkit* and enable its ENV through: `source /opt/intel/oneapi/setvars.sh`.
@@ -781,18 +798,18 @@ use 1 SYCL GPUs: [0] with Max compute units:512
781798

782799
It's the same for other projects, including the llama.cpp SYCL backend.
783800

784-
- Meet issue: `Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -999 (UNKNOWN PI error)` or `failed to allocate SYCL0 buffer`
801+
- `Native API failed. Native API returns: 39 (UR_RESULT_ERROR_OUT_OF_DEVICE_MEMORY)`, `ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 3503030272 Bytes of memory on device`, or `failed to allocate SYCL0 buffer`
785802

786-
Device Memory is not enough.
803+
You are running out of Device Memory.
787804

788805
|Reason|Solution|
789806
|-|-|
790-
|Default Context is too big. It leads to more memory usage.|Set `-c 8192` or smaller value.|
791-
|Model is big and require more memory than device's.|Choose smaller quantized model, like Q5 -> Q4;<br>Use more than one devices to load model.|
807+
| The default context is too big. It leads to excessive memory usage.|Set `-c 8192` or a smaller value.|
808+
| The model is too big and requires more memory than what is available.|Choose a smaller model or change to a smaller quantization, like Q5 -> Q4;<br>Alternatively, use more than one device to load the model.|
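When diagnosing this error, it can help to see how much memory the failed allocation asked for. The following is a rough sketch, assuming the log line has exactly the shape quoted in the entry above; `alloc_failure_mib` is a hypothetical helper name.

```sh
# Read a llama.cpp log from stdin and print the size, in MiB, of the first
# failed SYCL device allocation. Prints nothing if no such line is found.
# Hypothetical helper: relies on the "can't allocate N Bytes of memory on
# device" wording quoted in this document.
alloc_failure_mib() {
    bytes=$(sed -n "s/.*can't allocate \([0-9][0-9]*\) Bytes of memory on device.*/\1/p" | head -n 1)
    if [ -n "${bytes}" ]; then
        echo $((bytes / 1024 / 1024))
    fi
}

# Example with the message quoted above:
printf '%s\n' "ggml_backend_sycl_buffer_type_alloc_buffer: can't allocate 3503030272 Bytes of memory on device" | alloc_failure_mib   # prints 3340
```

Comparing that figure with the memory reported for the device indicates whether a smaller context or a smaller quantization is the more promising fix.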
792809

793810
### **GitHub contribution**:
794-
Please add the **[SYCL]** prefix/tag in issues/PRs titles to help the SYCL-team check/address them without delay.
811+
Please add the `SYCL :` prefix/tag in issue/PR titles to help the SYCL contributors check/address them without delay.
795812

796813
## TODO
797814

798-
- NA
815+
- Review ZES_ENABLE_SYSMAN: https://github.com/intel/compute-runtime/blob/master/programmers-guide/SYSMAN.md#support-and-limitations

docs/docker.md

Lines changed: 3 additions & 0 deletions
@@ -22,6 +22,9 @@ Additionally, there the following images, similar to the above:
2222
- `ghcr.io/ggml-org/llama.cpp:full-musa`: Same as `full` but compiled with MUSA support. (platforms: `linux/amd64`)
2323
- `ghcr.io/ggml-org/llama.cpp:light-musa`: Same as `light` but compiled with MUSA support. (platforms: `linux/amd64`)
2424
- `ghcr.io/ggml-org/llama.cpp:server-musa`: Same as `server` but compiled with MUSA support. (platforms: `linux/amd64`)
25+
- `ghcr.io/ggml-org/llama.cpp:full-intel`: Same as `full` but compiled with SYCL support. (platforms: `linux/amd64`)
26+
- `ghcr.io/ggml-org/llama.cpp:light-intel`: Same as `light` but compiled with SYCL support. (platforms: `linux/amd64`)
27+
- `ghcr.io/ggml-org/llama.cpp:server-intel`: Same as `server` but compiled with SYCL support. (platforms: `linux/amd64`)
2528

2629
The GPU enabled images are not currently tested by CI beyond being built. They are not built with any variation from the ones in the Dockerfiles defined in [.devops/](../.devops/) and the GitHub Action defined in [.github/workflows/docker.yml](../.github/workflows/docker.yml). If you need different settings (for example, a different CUDA, ROCm or MUSA library), you'll need to build the images locally for now.
2730

examples/sycl/run-llama2.sh

Lines changed: 4 additions & 4 deletions
@@ -12,16 +12,16 @@ source /opt/intel/oneapi/setvars.sh
1212

1313
INPUT_PROMPT="Building a website can be done in 10 simple steps:\nStep 1:"
1414
MODEL_FILE=models/llama-2-7b.Q4_0.gguf
15-
NGL=33
16-
CONEXT=4096
15+
NGL=99
16+
CONTEXT=4096
1717

1818
if [ $# -gt 0 ]; then
1919
GGML_SYCL_DEVICE=$1
2020
echo "use $GGML_SYCL_DEVICE as main GPU"
2121
#use single GPU only
22-
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -s 0 -c ${CONEXT} -mg $GGML_SYCL_DEVICE -sm none
22+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -s 0 -c ${CONTEXT} -mg $GGML_SYCL_DEVICE -sm none
2323

2424
else
2525
#use multiple GPUs with same max compute units
26-
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -s 0 -c ${CONEXT}
26+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -s 0 -c ${CONTEXT}
2727
fi

examples/sycl/run-llama3.sh

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
#!/bin/bash
2+
3+
# MIT license
4+
# Copyright (C) 2025 Intel Corporation
5+
# SPDX-License-Identifier: MIT
6+
7+
# If you want more control, DPC++ allows selecting a specific device through the
8+
# following environment variable
9+
#export ONEAPI_DEVICE_SELECTOR="level_zero:0"
10+
source /opt/intel/oneapi/setvars.sh
11+
12+
#export GGML_SYCL_DEBUG=1
13+
14+
# ZES_ENABLE_SYSMAN=1 enables querying the GPU's free memory through sycl::aspect::ext_intel_free_memory. Recommended when --split-mode = layer.
15+
16+
INPUT_PROMPT="Building a website can be done in 10 simple steps:\nStep 1:"
17+
MODEL_FILE=models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
18+
NGL=99 # Layers offloaded to the GPU. If the device runs out of memory, reduce this value according to the model you are using.
19+
CONTEXT=4096
20+
21+
if [ $# -gt 0 ]; then
22+
GGML_SYCL_DEVICE=$1
23+
echo "Using $GGML_SYCL_DEVICE as the main GPU"
24+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -c ${CONTEXT} -mg $GGML_SYCL_DEVICE -sm none
25+
else
26+
#use multiple GPUs with same max compute units
27+
ZES_ENABLE_SYSMAN=1 ./build/bin/llama-cli -m ${MODEL_FILE} -p "${INPUT_PROMPT}" -n 400 -e -ngl ${NGL} -c ${CONTEXT}
28+
fi
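The commented-out `ONEAPI_DEVICE_SELECTOR` export near the top of this script pins a single device. A minimal sketch of building the multi-device form listed in the SYCL.md device table (`level_zero:0;level_zero:1`) follows; `level_zero_selector` is a hypothetical helper and not part of the repository.

```sh
# Build an ONEAPI_DEVICE_SELECTOR value from a list of Level Zero device
# indices, joining the terms with ';'. Hypothetical helper for illustration.
level_zero_selector() {
    selector=""
    for id in "$@"; do
        # Prefix a ';' separator only when a term has already been added.
        selector="${selector:+${selector};}level_zero:${id}"
    done
    echo "${selector}"
}

level_zero_selector 0 1    # prints level_zero:0;level_zero:1

# To apply it before running the script:
# export ONEAPI_DEVICE_SELECTOR="$(level_zero_selector 0 1)"
```

Leaving the variable unset keeps the default device selection described in the documentation.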

examples/sycl/win-run-llama2.bat

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
66
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
77

88

9-
.\build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 33 -s 0
9+
.\build\bin\llama-cli.exe -m models\llama-2-7b.Q4_0.gguf -p %INPUT2% -n 400 -e -ngl 99 -s 0

examples/sycl/win-run-llama3.bat

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
:: MIT license
2+
:: Copyright (C) 2024 Intel Corporation
3+
:: SPDX-License-Identifier: MIT
4+
5+
set INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
6+
@call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" intel64 --force
7+
8+
9+
.\build\bin\llama-cli.exe -m models\Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -p %INPUT2% -n 400 -e -ngl 99
