Insights: ggml-org/llama.cpp
Overview
49 Releases published by 1 person
- b5416, published May 19, 2025
- b5417, published May 19, 2025
- b5421, published May 19, 2025
- b5422, published May 19, 2025
- b5423, published May 19, 2025
- b5425, published May 19, 2025
- b5426, published May 19, 2025
- b5427, published May 20, 2025
- b5429, published May 20, 2025
- b5430, published May 20, 2025
- b5431, published May 20, 2025
- b5432, published May 20, 2025
- b5434, published May 20, 2025
- b5435, published May 20, 2025
- b5436, published May 20, 2025
- b5437, published May 20, 2025
- b5438, published May 20, 2025
- b5439, published May 21, 2025
- b5440, published May 21, 2025
- b5441, published May 21, 2025
- b5442, published May 21, 2025
- b5443, published May 21, 2025
- b5444, published May 21, 2025
- b5446, published May 21, 2025
- b5448, published May 21, 2025
- b5449, published May 21, 2025
- b5450, published May 21, 2025
- b5451, published May 21, 2025
- b5452, published May 21, 2025
- b5453, published May 22, 2025
- b5454, published May 22, 2025
- b5456, published May 22, 2025
- b5458, published May 22, 2025
- b5459, published May 22, 2025
- b5460, published May 22, 2025
- b5461, published May 23, 2025
- b5462, published May 23, 2025
- b5463, published May 23, 2025
- b5464, published May 23, 2025
- b5465, published May 23, 2025
- b5466, published May 23, 2025
- b5468, published May 23, 2025
- b5471, published May 24, 2025
- b5472, published May 24, 2025
- b5473, published May 24, 2025
- b5474, published May 24, 2025
- b5475, published May 24, 2025
- b5476, published May 24, 2025
- b5477, published May 24, 2025
59 Pull requests merged by 25 people
- releases : bundle llvm omp library in windows release (#13763, merged May 24, 2025)
- releases : enable openmp in windows cpu backend build (#13756, merged May 24, 2025)
- ggml-cpu : set openmp wait time if not set (#13758, merged May 24, 2025)
- Move GLM4 f32 attention fix to the correct function (#13750, merged May 24, 2025)
- ggml : add ggml_gelu_erf() CUDA kernel (#13719, merged May 24, 2025)
- vocab : fix ugm tokenizer precision (#13743, merged May 24, 2025)
- CUDA: fix race condition in FA vector kernels (#13742, merged May 24, 2025)
- ci : enable winget package updates (#13734, merged May 23, 2025)
- ci : add winget package updater (#13732, merged May 23, 2025)
- hparams : initialize arrays (#13728, merged May 23, 2025)
- llama : allow custom list of swa_layers (#13726, merged May 23, 2025)
- server : support audio input (#13714, merged May 23, 2025)
- [CANN]Support OP MUL_MAT_ID Q8 && Q4 (#13705, merged May 23, 2025)
- ggml : fix the order of ggml_unary_op (#13718, merged May 23, 2025)
- vulkan: support CPY from any type to itself (#13695, merged May 23, 2025)
- vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (#13696, merged May 23, 2025)
- use LOG_WARN to replace std::cerr (#13657, merged May 23, 2025)
- release : fix windows hip release (#13707, merged May 22, 2025)
- tts : fix n_ubatch + make WavTokenizer cache-less (#13713, merged May 22, 2025)
- mtmd : add ultravox audio input (#13623, merged May 22, 2025)
- common: Include torch package for s390x (#13699, merged May 22, 2025)
- server : pad small embedding batches (#13692, merged May 22, 2025)
- gguf-py : correct charsmap parameter typing (#13701, merged May 22, 2025)
- sycl: Remove waits from async functions call (#13702, merged May 22, 2025)
- SYCL: Avoid using SYCL-Graph for unsupported nodes (#13587, merged May 22, 2025)
- opencl: Add support for multiple devices (#12622, merged May 21, 2025)
- opencl: fix couple crashes (#12795, merged May 21, 2025)
- releases : build CPU backend separately (windows) (#13642, merged May 21, 2025)
- hparams : support models for which all layers use SWA (#13682, merged May 21, 2025)
- server : improve error reporting (#13680, merged May 21, 2025)
- convert : add qwen2vl support for unsloth merges (#13686, merged May 21, 2025)
- examples : switch retrieval to llama_encode (#13685, merged May 21, 2025)
- gguf-py : display the invalid gguf type (#13687, merged May 21, 2025)
- ggml : add ggml_gelu_erf() (#13667, merged May 21, 2025)
- Add the endpoints /api/tags and /api/chat (#13659, merged May 21, 2025)
- server : fix first message identification (#13634, merged May 21, 2025)
- kv-cache : simplify the interface (#13660, merged May 21, 2025)
- model : disable SWA for Phi models (#13676, merged May 21, 2025)
- musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647, merged May 21, 2025)
- vulkan: small fixes (#13626, merged May 20, 2025)
- mtmd-helper : bug fix to token batching in mtmd (#13650, merged May 20, 2025)
- model : fix llama4 graph (#13663, merged May 20, 2025)
- llama : remove llama_kv_cache_view API + remove deprecated (#13653, merged May 20, 2025)
- CUDA: skip fully masked-out KV in FA vec kernel (#13584, merged May 20, 2025)
- tests : avoid github urls due to throttling (#13654, merged May 20, 2025)
- sycl: disable reorder for sycl mulmat (#13536, merged May 20, 2025)
- Fix GLM4 incoherence with fp16 accumulators (#13639, merged May 20, 2025)
- metal : fix typo in FA kernel comments (#13651, merged May 20, 2025)
- kv-cache : add SWA support (#13194, merged May 20, 2025)
- [CANN] Update CANN model support status (#13162, merged May 20, 2025)
- sycl : Overcoming workaround for mmap() allocation on Windows (#13482, merged May 20, 2025)
- added load_progress_callback to common_params (#13617, merged May 19, 2025)
- Vulkan: Support fp32 accumulator in quantized matmul to fix GLM4-32B incoherence (#13607, merged May 19, 2025)
- sycl : reviewing the backend documentation (#13544, merged May 19, 2025)
- mtmd : add vision support for llama 4 (#13282, merged May 19, 2025)
- ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532, merged May 19, 2025)
- sync : ggml (#13630, merged May 19, 2025)
- fix: check model pointer validity before use (#13631, merged May 19, 2025)
- [CANN]Support OP MUL_MAT_ID (#13042, merged May 19, 2025)
28 Pull requests opened by 22 people
- SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611, opened May 18, 2025)
- scripts: update pyproject.toml - deprecated poetry config + support uv (#13615, opened May 18, 2025)
- cuda: fix CMAKE_CUDA_COMPILER not found error (#13528) (#13625, opened May 19, 2025)
- [CANN]: add the basic supports of Flash Attention kernel (#13627, opened May 19, 2025)
- sycl: Add more debug prints (#13640, opened May 19, 2025)
- sycl: add find_package call for OpenCL (#13643, opened May 19, 2025)
- webui: Allow editing file attachments when editing messages. (#13645, opened May 20, 2025)
- MLA kv cache: fix split graph backend assignment when kv cache store on CPU (#13648, opened May 20, 2025)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation (#13649, opened May 20, 2025)
- Memory tests (#13669, opened May 20, 2025)
- model : jina-embeddings-v3 support (#13693, opened May 21, 2025)
- kv-cache : rework kv_cell (#13706, opened May 22, 2025)
- common/llama: align structures for reduce cacheline size on 64bit platforms (#13710, opened May 22, 2025)
- Replace alert and confirm with custom modals. (#13711, opened May 22, 2025)
- ggml : riscv: add xtheadvector support (#13720, opened May 23, 2025)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, opened May 23, 2025)
- Move page cache via mbind to prevent cross-NUMA access (#13731, opened May 23, 2025)
- SYCL: Implement few same quantized type copy kernels (#13739, opened May 24, 2025)
- cmake : set `RPATH` to `$ORIGIN` on Linux (#13740) (#13741, opened May 24, 2025)
- Multimodal: Added Moondream2 model and fixed ggml.org link (#13745, opened May 24, 2025)
- kv-cache : simplify (#13746, opened May 24, 2025)
- SYCL: add gelu_erf kernel (#13749, opened May 24, 2025)
- SYCL: Temporarily revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752, opened May 24, 2025)
- SYCL: Add mrope kernel (#13755, opened May 24, 2025)
- convert : fix nomic-bert-moe mask token (#13757, opened May 24, 2025)
- mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760, opened May 24, 2025)
- vulkan: readd GGML_VULKAN_PERF (#13761, opened May 24, 2025)
- Add comprehensive test for llama_batch/sbatch/ubatch concepts (#13764, opened May 24, 2025)
33 Issues closed by 15 people
- Misc. bug: llama-server token per second slow down sigificant after release b5450 (#13642) (#13735, closed May 24, 2025)
- Eval bug: UGM tokenizer sometimes outputs wrong tokens/in the wrong order (#13725, closed May 24, 2025)
- Compile bug: Build failure for Intel oneMKL on Windows (#12478, closed May 24, 2025)
- Add support for gemma 3 in the server? (#12762, closed May 24, 2025)
- CUDA performance bug when two cards are visible and only one is used (#12838, closed May 24, 2025)
- Misc. bug: Overflow in Cast ( (#13722, closed May 23, 2025)
- Phi-4-mini reasoning CRASH!!! (Vulkan) (#13464, closed May 23, 2025)
- OpenCL: Performance comparison depending on gpu_offloads (#12810, closed May 23, 2025)
- Llama 4 convert_hf_to_gguf.py tokenizer error (#12819, closed May 23, 2025)
- Misc. bug: HIP / ROCm memory allocation broken after release b5450 (#13698, closed May 22, 2025)
- Eval bug: `llama-tts` fails (abort) with longer lines (#13712, closed May 22, 2025)
- GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed (#13689, closed May 22, 2025)
- Eval bug: MUSA backend cause non-sense output on unsloth/deepseek-r1 quantized model (#12779, closed May 22, 2025)
- Misc. bug: Metric names are invalid (#12803, closed May 22, 2025)
- crash: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") (#13688, closed May 21, 2025)
- Eval bug: phi-4 crashes with new versions (#13665, closed May 21, 2025)
- OpenCL: Add CPU fallback for unsupported operations (#13621, closed May 21, 2025)
- Eval bug: Cannot run unsloth/deepseek-r1 2bit Model (#12778, closed May 21, 2025)
- Qwen3 32B and 30B models are similar size, But there is 4x difference between the performance!? (#13652, closed May 20, 2025)
- Eval bug: NVIDIA Jetson AGX Xavier CUDA Compatibility Issue with llama.cpp (#13629, closed May 20, 2025)
- Eval bug: vulkan Llama cpp prefers shared memory over dedicated memory (#12748, closed May 20, 2025)
- Compile bug: `binary-ops.cpp: error: invalid conversion` (#12765, closed May 20, 2025)
- Cannot compile SYCL backend SYCL_LIBRARY=SYCL_LIBRARY - NOTFOUND as per documentation (#12696, closed May 19, 2025)
- Eval bug: No output using llama-batched-bench (#13553, closed May 19, 2025)
- Feature Request: when llama.cpp can support convert qwen2.5 VL 7B/72B model to gguf? (#11541, closed May 19, 2025)
- Misc. bug: HIP backend performs poorly on AMD Ryzen AI MAX 395 (Strix Halo gfx1151) (#13565, closed May 18, 2025)
- Misc. bug: llama_tokenize parse_special is ignored in newer versions (#12743, closed May 18, 2025)
40 Issues opened by 38 people
- Misc. bug: segfault in test-gbnf-validator (#13762, opened May 24, 2025)
- Feature Request: video support in mtmd-cli / server (#13754, opened May 24, 2025)
- Large performance drop when using pipeline parallelism and layer splitting on multiple GPUs (#13751, opened May 24, 2025)
- Feature Request: Add keep_alive function for llama-server (#13748, opened May 24, 2025)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, opened May 24, 2025)
- Compile bug: ‘ggml_gelu_erf’ was not declared in this scope; did you mean ‘ggml_gelu’ (#13744, opened May 24, 2025)
- Misc. bug: RUNPATH properties are not properly set (#13740, opened May 24, 2025)
- open source dataset for low bit quantization? (#13736, opened May 24, 2025)
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727, opened May 23, 2025)
- Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp. (#13723, opened May 23, 2025)
- Eval bug: gemma3 getting stuck with no output when (#13715, opened May 22, 2025)
- Feature Request: (webui) do not throw away message if there is error in stream (#13709, opened May 22, 2025)
- Misc. bug: llama-mtmd-cli ignores multiple image input (#13704, opened May 22, 2025)
- Eval bug: Server Returns Empty Responses Under High Load (#13703, opened May 22, 2025)
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700, opened May 22, 2025)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, opened May 21, 2025)
- Eval bug: std::regex to split the text (#13691, opened May 21, 2025)
- Eval bug: swa_full = true is slower than false (#13683, opened May 21, 2025)
- Feature Request: Falcon-H1 (#13681, opened May 21, 2025)
- devops/nix: `flake.lock` is very obsolete (#13679, opened May 21, 2025)
- Misc. bug: AMX is not ready to be used! (#13678, opened May 21, 2025)
- Eval bug: SYCL branch produces mul_mat bug when trying to run. (#13674, opened May 21, 2025)
- Eval bug: Output garbled in dual-GPU environment (#13673, opened May 21, 2025)
- Feature Request: Llama-bench improvement (#13671, opened May 20, 2025)
- Misc. bug: Speed degradation in `bin-win-cpu-x64` compared to `bin-win-avx2-x64` on Intel Core i7-12700H (#13664, opened May 20, 2025)
- Feature Request: Procedure for reproducing test models (#13662, opened May 20, 2025)
- Eval bug: Not splitting model across rows correctly (#13661, opened May 20, 2025)
- Compile bug: GPU Detection Fails during cmake --build (#13636, opened May 19, 2025)
- Feature Request: Support for Qwen with Parallel Scaling (#13632, opened May 19, 2025)
- can't quant llama3 with expanded tokenizer (#13628, opened May 19, 2025)
- webui: First user prompt sometimes disappears after sending (#13622, opened May 18, 2025)
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620, opened May 18, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, opened May 18, 2025)
- Compile bug: tools build failing (#13614, opened May 18, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, opened May 18, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, opened May 18, 2025)
70 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat(server): Add tool call support to WebUI (LLama Server) (#13501, commented on May 22, 2025 • 27 new comments)
- Update python verions (#13574, commented on May 23, 2025 • 18 new comments)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, commented on May 23, 2025 • 7 new comments)
- server : separate the notion of position and KV tokens, remove prompt truncation (#13576, commented on May 19, 2025 • 6 new comments)
- ggml: aarch64: Implement SVE F32 kernels for Mamba Model (#13602, commented on May 22, 2025 • 4 new comments)
- llama : try loading tensors with pre-computed hashes (#13106, commented on May 24, 2025 • 4 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on May 24, 2025 • 1 new comment)
- feat: Hybrid unified/recurrent cache (#13276, commented on May 23, 2025 • 1 new comment)
- Compile bug: There was a errror while compiling support for the backend Vulkan (#12619, commented on May 23, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on May 23, 2025 • 0 new comments)
- Feature Reequest: Multi model cli tools: Add a possibility to specify a image in conversation mode plus tab auto completion for path (#12983, commented on May 23, 2025 • 0 new comments)
- Eval bug: LLaVa convert_image_encoder_to_gguf.py fails to byteswap v.head.ffn_up.bias tensor on Big-Endian system (#12863, commented on May 23, 2025 • 0 new comments)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, commented on May 24, 2025 • 0 new comments)
- Model Repeats Nonsensical Output (#13066, commented on May 24, 2025 • 0 new comments)
- Feature Request: Improve model load time when using the RPC backend (#12954, commented on May 23, 2025 • 0 new comments)
- Eval bug: Vulkan: "Requested buffer size exceeds device memory allocation limit" even with `-ngl 0` when trying to run very large models (#13024, commented on May 23, 2025 • 0 new comments)
- Feature Request: Installable package via winget (#8188, commented on May 24, 2025 • 0 new comments)
- Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls (#11970, commented on May 24, 2025 • 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on May 18, 2025 • 0 new comments)
- Allow user to compile with any cuda version using github actions (#10928, commented on May 23, 2025 • 0 new comments)
- Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed (#13059, commented on May 23, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on May 23, 2025 • 0 new comments)
- (draft) tts: Orpheus support (#12487, commented on May 18, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on May 19, 2025 • 0 new comments)
- Support for OuteTTS 1.0 (#12794, commented on May 20, 2025 • 0 new comments)
- gguf-py: byteswapping improvements (#12851, commented on May 21, 2025 • 0 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on May 23, 2025 • 0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on May 22, 2025 • 0 new comments)
- [Perf] [CPU] eliminate redundant memory access in group query attention (#13319, commented on May 19, 2025 • 0 new comments)
- Update README.md for using llama.cpp in Microsoft Word locally (#13401, commented on May 20, 2025 • 0 new comments)
- convert: Swap GLM4 EOS / EOT token (#13505, commented on May 20, 2025 • 0 new comments)
- cuda: set cuda compiler path (#13527) (#13528, commented on May 21, 2025 • 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on May 19, 2025 • 0 new comments)
- Granite Four (#13550, commented on May 23, 2025 • 0 new comments)
- gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561, commented on May 24, 2025 • 0 new comments)
- Doc. bug: docs/multimodal/gemma3.md need to be updated (#13064, commented on May 23, 2025 • 0 new comments)
- changelog : `libllama` API (#9289, commented on May 20, 2025 • 0 new comments)
- tutorials : list for llama.cpp (#13523, commented on May 20, 2025 • 0 new comments)
- Eval bug: repeated output for llama-server (#12782, commented on May 20, 2025 • 0 new comments)
- Eval bug: OpenAI incompatible image handling in server multimodal (#12947, commented on May 20, 2025 • 0 new comments)
- Eval bug: RWKV inference issue with llama-server (#13018, commented on May 20, 2025 • 0 new comments)
- llama : combined beam search + grammar sampling strategy (#2923, commented on May 19, 2025 • 0 new comments)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, commented on May 19, 2025 • 0 new comments)
- llama : add CLI assistant (#10688, commented on May 19, 2025 • 0 new comments)
- Feature Request: dynamic number of experts (hyperparam per request) (#13572, commented on May 19, 2025 • 0 new comments)
- Misc. bug: logit-bias doesn't seem to work (#13605, commented on May 19, 2025 • 0 new comments)
- Feature Request: add per-request "reasoning" options in llama-server (#13272, commented on May 19, 2025 • 0 new comments)
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655, commented on May 19, 2025 • 0 new comments)
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 (#12998, commented on May 19, 2025 • 0 new comments)
- Eval bug: Segmentation fault when running gemma3-cli on Android (#13000, commented on May 19, 2025 • 0 new comments)
- Eval bug: why Gemma 3 model has run into CPU inference (#13004, commented on May 19, 2025 • 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on May 18, 2025 • 0 new comments)
- Eval bug: Quad P40 unable to run 70B models on recent releases (#12990, commented on May 18, 2025 • 0 new comments)
- Eval bug: AttributeError: Moonlight-16B-A3B-Instruct - TikTokenTokenizer has no attribute vocab (#13072, commented on May 23, 2025 • 0 new comments)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, commented on May 22, 2025 • 0 new comments)
- Feature Request: support for image input in llama-server (and web ui) (#12792, commented on May 22, 2025 • 0 new comments)
- Compile bug: Prooted Debian in Droid Termux only (#12452, commented on May 22, 2025 • 0 new comments)
- [Build] Some Build Options/Definitions seems Missing in ggml-base (#13017, commented on May 22, 2025 • 0 new comments)
- Feature Request: Ability to pack multiple GGUFs into single one (#13028, commented on May 22, 2025 • 0 new comments)
- Eval bug: Error when load `bge-reranker-v2-gemma` model (#13041, commented on May 22, 2025 • 0 new comments)
- Misc. bug: Inconsistent Vulkan segfault (#10528, commented on May 21, 2025 • 0 new comments)
- Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp (#13609, commented on May 21, 2025 • 0 new comments)
- changelog : `llama-server` REST API (#9291, commented on May 21, 2025 • 0 new comments)
- Feature Request: Mapping model name to LoRA config (#11031, commented on May 21, 2025 • 0 new comments)
- something with llama_server? slow vs llama_cli (#13560, commented on May 21, 2025 • 0 new comments)
- Error while converting peft finetuned merged model to gguf (#12494, commented on May 21, 2025 • 0 new comments)
- Feature Request: Support Jina V3 arch (#9585, commented on May 21, 2025 • 0 new comments)
- Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54:error '_mm256_set_m128i' was not declared in this scope (#11385, commented on May 21, 2025 • 0 new comments)
- Perplexity script for non GGUF quantization (#13015, commented on May 21, 2025 • 0 new comments)
- How to start gemma3 multimodal model service using llama_server (#13465, commented on May 20, 2025 • 0 new comments)