### Libraries
The necessary **llama.cpp** libraries are distributed as part of this repository's releases; you can find them under the "Releases" section. Here's an explanation of the libraries available:
#### CPU Build
CPU-only builds for Windows, Linux, and macOS. Inference runs slowly on the CPU, so consider using one of the GPU-based libraries below.
#### BLAS Build
Building the program with BLAS support may lead to some performance improvements in prompt processing when using batch sizes higher than 32 (the default is 512); a short, hedged batch-size sketch follows the list below. Using BLAS doesn't affect the generation performance. There are several different BLAS implementations available for build and use:
- **Accelerate Framework**: Available on macOS, enabled by default.
- **OpenBLAS**: Provides CPU-based BLAS acceleration. Ensure OpenBLAS is installed on your machine.
- **BLIS**: A high-performance portable BLAS framework. [Learn more](https://github.com/flame/blis).
- **Intel oneMKL**: Optimized for Intel processors, supporting advanced instruction sets like avx_vnni.
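
A minimal, hedged sketch of the batch-size point above: the `TLlama` class, its `Settings.NBatch` property, and the unit name are assumptions made for this illustration (mirroring llama.cpp's `n_batch` parameter), not necessarily the exact llama-cpp-delphi API; check the repository's samples for the real names.

```pascal
program BlasBatchSketch;

{$APPTYPE CONSOLE}

uses
  LlamaCpp.Wrapper; // hypothetical unit name, for illustration only

var
  Llama: TLlama; // hypothetical wrapper class
begin
  Llama := TLlama.Create;
  try
    // BLAS only helps prompt processing when the batch size is above 32;
    // 512 mirrors llama.cpp's default n_batch value.
    Llama.Settings.NBatch := 512;
    Llama.LoadModel('path/to/model.gguf');
  finally
    Llama.Free;
  end;
end.
```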
#### SYCL
SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators.
The SYCL-based build of llama.cpp is used to **support Intel GPUs** (Data Center Max series, Flex series, Arc series, built-in GPUs, and iGPUs).
For detailed info, please refer to [llama.cpp for SYCL](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md).
#### Metal Build
On macOS, Metal is enabled by default. Using Metal makes the computation run on the GPU.
When built with Metal support, you can explicitly disable GPU inference with the `--n-gpu-layers 0` option in the Llama settings.
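
As a hedged illustration, the same setting expressed in Delphi code, continuing the hypothetical `TLlama` sketch from the BLAS section above; the `NGpuLayers` property name is an assumption mirroring llama.cpp's `--n-gpu-layers` option, not a confirmed part of the llama-cpp-delphi API.

```pascal
// Hypothetical sketch: the equivalent of --n-gpu-layers 0, keeping every
// layer on the CPU even though the library was built with Metal support.
Llama.Settings.NGpuLayers := 0;
Llama.LoadModel('path/to/model.gguf');
```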
#### CUDA
Provides GPU acceleration using an NVIDIA GPU. [Refer to the CUDA guide](https://github.com/ggerganov/llama.cpp/blob/master/docs/cuda-fedora.md) for Fedora setup.
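
Conversely, offloading work to the GPU is a matter of raising the layer count. Below is a hedged sketch using the same hypothetical `TLlama` and `NGpuLayers` names as above; pick a value at least as large as the model's layer count to offload the whole model.

```pascal
// Hypothetical sketch: offload as many layers as possible to the CUDA device.
// 99 is simply larger than the layer count of typical models.
Llama.Settings.NGpuLayers := 99;
Llama.LoadModel('path/to/model.gguf');
```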
#### Vulkan
Vulkan provides GPU acceleration through a modern, low-overhead API. To use Vulkan:
* Ensure Vulkan is installed and supported by your GPU drivers.
Learn more at the [official Vulkan site](https://vulkan.org).
#### Kompute
Kompute offers efficient, general-purpose GPU compute built on top of Vulkan and is designed for AI inference tasks.
#### CANN
Provides NPU acceleration using the AI cores of Ascend NPUs. [Learn more about CANN](https://www.hiascend.com/en/software/cann).
#### HIP
Supports GPU acceleration on AMD GPUs compatible with HIP.
#### MUSA
Provides GPU acceleration using the MUSA cores of Moore Threads MTT GPUs.