Insights: ggml-org/llama.cpp
Overview
49 Releases published by 1 person
- b5416, published May 19, 2025
- b5417, published May 19, 2025
- b5421, published May 19, 2025
- b5422, published May 19, 2025
- b5423, published May 19, 2025
- b5425, published May 19, 2025
- b5426, published May 19, 2025
- b5427, published May 20, 2025
- b5429, published May 20, 2025
- b5430, published May 20, 2025
- b5431, published May 20, 2025
- b5432, published May 20, 2025
- b5434, published May 20, 2025
- b5435, published May 20, 2025
- b5436, published May 20, 2025
- b5437, published May 20, 2025
- b5438, published May 20, 2025
- b5439, published May 21, 2025
- b5440, published May 21, 2025
- b5441, published May 21, 2025
- b5442, published May 21, 2025
- b5443, published May 21, 2025
- b5444, published May 21, 2025
- b5446, published May 21, 2025
- b5448, published May 21, 2025
- b5449, published May 21, 2025
- b5450, published May 21, 2025
- b5451, published May 21, 2025
- b5452, published May 21, 2025
- b5453, published May 22, 2025
- b5454, published May 22, 2025
- b5456, published May 22, 2025
- b5458, published May 22, 2025
- b5459, published May 22, 2025
- b5460, published May 22, 2025
- b5461, published May 23, 2025
- b5462, published May 23, 2025
- b5463, published May 23, 2025
- b5464, published May 23, 2025
- b5465, published May 23, 2025
- b5466, published May 23, 2025
- b5468, published May 23, 2025
- b5471, published May 24, 2025
- b5472, published May 24, 2025
- b5473, published May 24, 2025
- b5474, published May 24, 2025
- b5475, published May 24, 2025
- b5476, published May 24, 2025
- b5477, published May 24, 2025
59 Pull requests merged by 25 people
- releases : bundle llvm omp library in windows release (#13763, merged May 24, 2025)
- releases : enable openmp in windows cpu backend build (#13756, merged May 24, 2025)
- ggml-cpu : set openmp wait time if not set (#13758, merged May 24, 2025)
- Move GLM4 f32 attention fix to the correct function (#13750, merged May 24, 2025)
- ggml : add ggml_gelu_erf() CUDA kernel (#13719, merged May 24, 2025)
- vocab : fix ugm tokenizer precision (#13743, merged May 24, 2025)
- CUDA: fix race condition in FA vector kernels (#13742, merged May 24, 2025)
- ci : enable winget package updates (#13734, merged May 23, 2025)
- ci : add winget package updater (#13732, merged May 23, 2025)
- hparams : initialize arrays (#13728, merged May 23, 2025)
- llama : allow custom list of swa_layers (#13726, merged May 23, 2025)
- server : support audio input (#13714, merged May 23, 2025)
- [CANN]Support OP MUL_MAT_ID Q8 && Q4 (#13705, merged May 23, 2025)
- ggml : fix the order of ggml_unary_op (#13718, merged May 23, 2025)
- vulkan: support CPY from any type to itself (#13695, merged May 23, 2025)
- vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (#13696, merged May 23, 2025)
- use LOG_WARN to replace std::cerr (#13657, merged May 23, 2025)
- release : fix windows hip release (#13707, merged May 22, 2025)
- tts : fix n_ubatch + make WavTokenizer cache-less (#13713, merged May 22, 2025)
- mtmd : add ultravox audio input (#13623, merged May 22, 2025)
- common: Include torch package for s390x (#13699, merged May 22, 2025)
- server : pad small embedding batches (#13692, merged May 22, 2025)
- gguf-py : correct charsmap parameter typing (#13701, merged May 22, 2025)
- sycl: Remove waits from async functions call (#13702, merged May 22, 2025)
- SYCL: Avoid using SYCL-Graph for unsupported nodes (#13587, merged May 22, 2025)
- opencl: Add support for multiple devices (#12622, merged May 21, 2025)
- opencl: fix couple crashes (#12795, merged May 21, 2025)
- releases : build CPU backend separately (windows) (#13642, merged May 21, 2025)
- hparams : support models for which all layers use SWA (#13682, merged May 21, 2025)
- server : improve error reporting (#13680, merged May 21, 2025)
- convert : add qwen2vl support for unsloth merges (#13686, merged May 21, 2025)
- examples : switch retrieval to llama_encode (#13685, merged May 21, 2025)
- gguf-py : display the invalid gguf type (#13687, merged May 21, 2025)
- ggml : add ggml_gelu_erf() (#13667, merged May 21, 2025)
- Add the endpoints /api/tags and /api/chat (#13659, merged May 21, 2025)
- server : fix first message identification (#13634, merged May 21, 2025)
- kv-cache : simplify the interface (#13660, merged May 21, 2025)
- model : disable SWA for Phi models (#13676, merged May 21, 2025)
- musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647, merged May 21, 2025)
- vulkan: small fixes (#13626, merged May 20, 2025)
- mtmd-helper : bug fix to token batching in mtmd (#13650, merged May 20, 2025)
- model : fix llama4 graph (#13663, merged May 20, 2025)
- llama : remove llama_kv_cache_view API + remove deprecated (#13653, merged May 20, 2025)
- CUDA: skip fully masked-out KV in FA vec kernel (#13584, merged May 20, 2025)
- tests : avoid github urls due to throttling (#13654, merged May 20, 2025)
- sycl: disable reorder for sycl mulmat (#13536, merged May 20, 2025)
- Fix GLM4 incoherence with fp16 accumulators (#13639, merged May 20, 2025)
- metal : fix typo in FA kernel comments (#13651, merged May 20, 2025)
- kv-cache : add SWA support (#13194, merged May 20, 2025)
- [CANN] Update CANN model support status (#13162, merged May 20, 2025)
- sycl : Overcoming workaround for mmap() allocation on Windows (#13482, merged May 20, 2025)
- added load_progress_callback to common_params (#13617, merged May 19, 2025)
- Vulkan: Support fp32 accumulator in quantized matmul to fix GLM4-32B incoherence (#13607, merged May 19, 2025)
- sycl : reviewing the backend documentation (#13544, merged May 19, 2025)
- mtmd : add vision support for llama 4 (#13282, merged May 19, 2025)
- ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532, merged May 19, 2025)
- sync : ggml (#13630, merged May 19, 2025)
- fix: check model pointer validity before use (#13631, merged May 19, 2025)
- [CANN]Support OP MUL_MAT_ID (#13042, merged May 19, 2025)
28 Pull requests opened by 22 people
- SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611, opened May 18, 2025)
- scripts: update pyproject.toml - deprecated poetry config + support uv (#13615, opened May 18, 2025)
- cuda: fix CMAKE_CUDA_COMPILER not found error (#13528) (#13625, opened May 19, 2025)
- [CANN]: add the basic supports of Flash Attention kernel (#13627, opened May 19, 2025)
- sycl: Add more debug prints (#13640, opened May 19, 2025)
- sycl: add find_package call for OpenCL (#13643, opened May 19, 2025)
- webui: Allow editing file attachments when editing messages. (#13645, opened May 20, 2025)
- MLA kv cache: fix split graph backend assignment when kv cache store on CPU (#13648, opened May 20, 2025)
- add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation (#13649, opened May 20, 2025)
- Memory tests (#13669, opened May 20, 2025)
- model : jina-embeddings-v3 support (#13693, opened May 21, 2025)
- kv-cache : rework kv_cell (#13706, opened May 22, 2025)
- common/llama: align structures for reduce cacheline size on 64bit platforms (#13710, opened May 22, 2025)
- Replace alert and confirm with custom modals. (#13711, opened May 22, 2025)
- ggml : riscv: add xtheadvector support (#13720, opened May 23, 2025)
- remove templates from soft_max_f32_submitter to allow SYCL graph updates (#13724, opened May 23, 2025)
- Move page cache via mbind to prevent cross-NUMA access (#13731, opened May 23, 2025)
- SYCL: Implement few same quantized type copy kernels (#13739, opened May 24, 2025)
- cmake : set `RPATH` to `$ORIGIN` on Linux (#13740) (#13741, opened May 24, 2025)
- Multimodal: Added Moondream2 model and fixed ggml.org link (#13745, opened May 24, 2025)
- kv-cache : simplify (#13746, opened May 24, 2025)
- SYCL: add gelu_erf kernel (#13749, opened May 24, 2025)
- SYCL: Temporarily revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752, opened May 24, 2025)
- SYCL: Add mrope kernel (#13755, opened May 24, 2025)
- convert : fix nomic-bert-moe mask token (#13757, opened May 24, 2025)
- mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760, opened May 24, 2025)
- vulkan: readd GGML_VULKAN_PERF (#13761, opened May 24, 2025)
- Add comprehensive test for llama_batch/sbatch/ubatch concepts (#13764, opened May 24, 2025)
33 Issues closed by 15 people
- Misc. bug: llama-server token per second slow down sigificant after release b5450 (#13642) (#13735, closed May 24, 2025)
- Eval bug: UGM tokenizer sometimes outputs wrong tokens/in the wrong order (#13725, closed May 24, 2025)
- Compile bug: Build failure for Intel oneMKL on Windows (#12478, closed May 24, 2025)
- Add support for gemma 3 in the server? (#12762, closed May 24, 2025)
- CUDA performance bug when two cards are visible and only one is used (#12838, closed May 24, 2025)
- Misc. bug: Overflow in Cast ( (#13722, closed May 23, 2025)
- Phi-4-mini reasoning CRASH!!! (Vulkan) (#13464, closed May 23, 2025)
- OpenCL: Performance comparison depending on gpu_offloads (#12810, closed May 23, 2025)
- Llama 4 convert_hf_to_gguf.py tokenizer error (#12819, closed May 23, 2025)
- Misc. bug: HIP / ROCm memory allocation broken after release b5450 (#13698, closed May 22, 2025)
- Eval bug: `llama-tts` fails (abort) with longer lines (#13712, closed May 22, 2025)
- GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed (#13689, closed May 22, 2025)
- Eval bug: MUSA backend cause non-sense output on unsloth/deepseek-r1 quantized model (#12779, closed May 22, 2025)
- Misc. bug: Metric names are invalid (#12803, closed May 22, 2025)
- crash: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") (#13688, closed May 21, 2025)
- Eval bug: phi-4 crashes with new versions (#13665, closed May 21, 2025)
- OpenCL: Add CPU fallback for unsupported operations (#13621, closed May 21, 2025)
- Eval bug: Cannot run unsloth/deepseek-r1 2bit Model (#12778, closed May 21, 2025)
- Qwen3 32B and 30B models are similar size, But there is 4x difference between the performance!? (#13652, closed May 20, 2025)
- Eval bug: NVIDIA Jetson AGX Xavier CUDA Compatibility Issue with llama.cpp (#13629, closed May 20, 2025)
- Eval bug: vulkan Llama cpp prefers shared memory over dedicated memory (#12748, closed May 20, 2025)
- Compile bug: `binary-ops.cpp: error: invalid conversion` (#12765, closed May 20, 2025)
- Cannot compile SYCL backend SYCL_LIBRARY=SYCL_LIBRARY - NOTFOUND as per documentation (#12696, closed May 19, 2025)
- Eval bug: No output using llama-batched-bench (#13553, closed May 19, 2025)
- Feature Request: when llama.cpp can support convert qwen2.5 VL 7B/72B model to gguf? (#11541, closed May 19, 2025)
- Misc. bug: HIP backend performs poorly on AMD Ryzen AI MAX 395 (Strix Halo gfx1151) (#13565, closed May 18, 2025)
- Misc. bug: llama_tokenize parse_special is ignored in newer versions (#12743, closed May 18, 2025)
40 Issues opened by 38 people
- Misc. bug: segfault in test-gbnf-validator (#13762, opened May 24, 2025)
- Feature Request: video support in mtmd-cli / server (#13754, opened May 24, 2025)
- Large performance drop when using pipeline parallelism and layer splitting on multiple GPUs (#13751, opened May 24, 2025)
- Feature Request: Add keep_alive function for llama-server (#13748, opened May 24, 2025)
- Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA (#13747, opened May 24, 2025)
- Compile bug: ‘ggml_gelu_erf’ was not declared in this scope; did you mean ‘ggml_gelu’ (#13744, opened May 24, 2025)
- Misc. bug: RUNPATH properties are not properly set (#13740, opened May 24, 2025)
- open source dataset for low bit quantization? (#13736, opened May 24, 2025)
- Eval bug: Server and mtmd both crashing when starting Ultravox (#13727, opened May 23, 2025)
- Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp. (#13723, opened May 23, 2025)
- Eval bug: gemma3 getting stuck with no output when (#13715, opened May 22, 2025)
- Feature Request: (webui) do not throw away message if there is error in stream (#13709, opened May 22, 2025)
- Misc. bug: llama-mtmd-cli ignores multiple image input (#13704, opened May 22, 2025)
- Eval bug: Server Returns Empty Responses Under High Load (#13703, opened May 22, 2025)
- Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working (#13700, opened May 22, 2025)
- Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates (#13694, opened May 21, 2025)
- Eval bug: std::regex to split the text (#13691, opened May 21, 2025)
- Eval bug: swa_full = true is slower than false (#13683, opened May 21, 2025)
- Feature Request: Falcon-H1 (#13681, opened May 21, 2025)
- devops/nix: `flake.lock` is very obsolete (#13679, opened May 21, 2025)
- Misc. bug: AMX is not ready to be used! (#13678, opened May 21, 2025)
- Eval bug: SYCL branch produces mul_mat bug when trying to run. (#13674, opened May 21, 2025)
- Eval bug: Output garbled in dual-GPU environment (#13673, opened May 21, 2025)
- Feature Request: Llama-bench improvement (#13671, opened May 20, 2025)
- Misc. bug: Speed degradation in `bin-win-cpu-x64` compared to `bin-win-avx2-x64` on Intel Core i7-12700H (#13664, opened May 20, 2025)
- Feature Request: Procedure for reproducing test models (#13662, opened May 20, 2025)
- Eval bug: Not splitting model across rows correctly (#13661, opened May 20, 2025)
- Compile bug: GPU Detection Fails during cmake --build (#13636, opened May 19, 2025)
- Feature Request: Support for Qwen with Parallel Scaling (#13632, opened May 19, 2025)
- can't quant llama3 with expanded tokenizer (#13628, opened May 19, 2025)
- webui: First user prompt sometimes disappears after sending (#13622, opened May 18, 2025)
- Misc. bug: batch in the mtmd-cli.cpp not freed (#13620, opened May 18, 2025)
- Feature Request: update readme for ideal MOE tensor override calculation (#13616, opened May 18, 2025)
- Compile bug: tools build failing (#13614, opened May 18, 2025)
- llama_model_load: error loading model: error loading model vocabulary: std::bad_cast (#13613, opened May 18, 2025)
- Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0) (#13612, opened May 18, 2025)
70 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat(server): Add tool call support to WebUI (LLama Server) (#13501, commented on May 22, 2025 • 27 new comments)
- Update python verions (#13574, commented on May 23, 2025 • 18 new comments)
- [CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE (#13104, commented on May 23, 2025 • 7 new comments)
- server : separate the notion of position and KV tokens, remove prompt truncation (#13576, commented on May 19, 2025 • 6 new comments)
- ggml: aarch64: Implement SVE F32 kernels for Mamba Model (#13602, commented on May 22, 2025 • 4 new comments)
- llama : try loading tensors with pre-computed hashes (#13106, commented on May 24, 2025 • 4 new comments)
- `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379, commented on May 24, 2025 • 1 new comment)
- feat: Hybrid unified/recurrent cache (#13276, commented on May 23, 2025 • 1 new comment)
- Compile bug: There was a errror while compiling support for the backend Vulkan (#12619, commented on May 23, 2025 • 0 new comments)
- Feature Request: Add support for Kokoro TTS (#11050, commented on May 23, 2025 • 0 new comments)
- Feature Reequest: Multi model cli tools: Add a possibility to specify a image in conversation mode plus tab auto completion for path (#12983, commented on May 23, 2025 • 0 new comments)
- Eval bug: LLaVa convert_image_encoder_to_gguf.py fails to byteswap v.head.ffn_up.bias tensor on Big-Endian system (#12863, commented on May 23, 2025 • 0 new comments)
- Refactor: (clip.cpp) identify and regroup pre-processing strategies (#13077, commented on May 24, 2025 • 0 new comments)
- Model Repeats Nonsensical Output (#13066, commented on May 24, 2025 • 0 new comments)
- Feature Request: Improve model load time when using the RPC backend (#12954, commented on May 23, 2025 • 0 new comments)
- Eval bug: Vulkan: "Requested buffer size exceeds device memory allocation limit" even with `-ngl 0` when trying to run very large models (#13024, commented on May 23, 2025 • 0 new comments)
- Feature Request: Installable package via winget (#8188, commented on May 24, 2025 • 0 new comments)
- Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls (#11970, commented on May 24, 2025 • 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on May 18, 2025 • 0 new comments)
- Allow user to compile with any cuda version using github actions (#10928, commented on May 23, 2025 • 0 new comments)
- Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed (#13059, commented on May 23, 2025 • 0 new comments)
- PR: Refine ggml-hexagon backend(Qualcomm Hexagon NPU backend) for latest ggml,whisper.cpp,llama.cpp (#12326, commented on May 23, 2025 • 0 new comments)
- (draft) tts: Orpheus support (#12487, commented on May 18, 2025 • 0 new comments)
- imatrix: add option to display importance score statistics for a given imatrix file (#12718, commented on May 19, 2025 • 0 new comments)
- Support for OuteTTS 1.0 (#12794, commented on May 20, 2025 • 0 new comments)
- gguf-py: byteswapping improvements (#12851, commented on May 21, 2025 • 0 new comments)
- Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client (#13196, commented on May 23, 2025 • 0 new comments)
- Introduce New Lookup-Table(LUT)-Based Matrix Multiplication Method (TMAC) (#13206, commented on May 22, 2025 • 0 new comments)
- [Perf] [CPU] eliminate redundant memory access in group query attention (#13319, commented on May 19, 2025 • 0 new comments)
- Update README.md for using llama.cpp in Microsoft Word locally (#13401, commented on May 20, 2025 • 0 new comments)
- convert: Swap GLM4 EOS / EOT token (#13505, commented on May 20, 2025 • 0 new comments)
- cuda: set cuda compiler path (#13527) (#13528, commented on May 21, 2025 • 0 new comments)
- [CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size (#13529, commented on May 19, 2025 • 0 new comments)
- Granite Four (#13550, commented on May 23, 2025 • 0 new comments)
- gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561, commented on May 24, 2025 • 0 new comments)
- Doc. bug: docs/multimodal/gemma3.md need to be updated (#13064, commented on May 23, 2025 • 0 new comments)
- changelog : `libllama` API (#9289, commented on May 20, 2025 • 0 new comments)
- tutorials : list for llama.cpp (#13523, commented on May 20, 2025 • 0 new comments)
- Eval bug: repeated output for llama-server (#12782, commented on May 20, 2025 • 0 new comments)
- Eval bug: OpenAI incompatible image handling in server multimodal (#12947, commented on May 20, 2025 • 0 new comments)
- Eval bug: RWKV inference issue with llama-server (#13018, commented on May 20, 2025 • 0 new comments)
- llama : combined beam search + grammar sampling strategy (#2923, commented on May 19, 2025 • 0 new comments)
- Eval bug: Qwen3 30B A3B Q4_0 failed to run (#13168, commented on May 19, 2025 • 0 new comments)
- llama : add CLI assistant (#10688, commented on May 19, 2025 • 0 new comments)
- Feature Request: dynamic number of experts (hyperparam per request) (#13572, commented on May 19, 2025 • 0 new comments)
- Misc. bug: logit-bias doesn't seem to work (#13605, commented on May 19, 2025 • 0 new comments)
- Feature Request: add per-request "reasoning" options in llama-server (#13272, commented on May 19, 2025 • 0 new comments)
- Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment (#12655, commented on May 19, 2025 • 0 new comments)
- gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1 (#12998, commented on May 19, 2025 • 0 new comments)
- Eval bug: Segmentation fault when running gemma3-cli on Android (#13000, commented on May 19, 2025 • 0 new comments)
- Eval bug: why Gemma 3 model has run into CPU inference (#13004, commented on May 19, 2025 • 0 new comments)
- Qwen3-8B and other models generate garbage output / repeat tokens (GGGGGG...) in llama.cpp via LM Studio Vulkan backend (#13310, commented on May 18, 2025 • 0 new comments)
- Eval bug: Quad P40 unable to run 70B models on recent releases (#12990, commented on May 18, 2025 • 0 new comments)
- Eval bug: AttributeError: Moonlight-16B-A3B-Instruct - TikTokenTokenizer has no attribute vocab (#13072, commented on May 23, 2025 • 0 new comments)
- Feature Request: add jina embeddings model availible convert to gguf (#12327, commented on May 22, 2025 • 0 new comments)
- Feature Request: support for image input in llama-server (and web ui) (#12792, commented on May 22, 2025 • 0 new comments)
- Compile bug: Prooted Debian in Droid Termux only (#12452, commented on May 22, 2025 • 0 new comments)
- [Build] Some Build Options/Definitions seems Missing in ggml-base (#13017, commented on May 22, 2025 • 0 new comments)
- Feature Request: Ability to pack multiple GGUFs into single one (#13028, commented on May 22, 2025 • 0 new comments)
- Eval bug: Error when load `bge-reranker-v2-gemma` model (#13041, commented on May 22, 2025 • 0 new comments)
- Misc. bug: Inconsistent Vulkan segfault (#10528, commented on May 21, 2025 • 0 new comments)
- Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp (#13609, commented on May 21, 2025 • 0 new comments)
- changelog : `llama-server` REST API (#9291, commented on May 21, 2025 • 0 new comments)
- Feature Request: Mapping model name to LoRA config (#11031, commented on May 21, 2025 • 0 new comments)
- something with llama_server? slow vs llama_cli (#13560, commented on May 21, 2025 • 0 new comments)
- Error while converting peft finetuned merged model to gguf (#12494, commented on May 21, 2025 • 0 new comments)
- Feature Request: Support Jina V3 arch (#9585, commented on May 21, 2025 • 0 new comments)
- Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54:error '_mm256_set_m128i' was not declared in this scope (#11385, commented on May 21, 2025 • 0 new comments)
- Perplexity script for non GGUF quantization (#13015, commented on May 21, 2025 • 0 new comments)
- How to start gemma3 multimodal model service using llama_server (#13465, commented on May 20, 2025 • 0 new comments)