Insights: ggml-org/llama.cpp
Overview
233 Releases published by 1 person
-
b5265
published
May 2, 2025 -
b5266
published
May 2, 2025 -
b5267
published
May 2, 2025 -
b5269
published
May 2, 2025 -
b5270
published
May 3, 2025 -
b5271
published
May 3, 2025 -
b5272
published
May 3, 2025 -
b5273
published
May 4, 2025 -
b5274
published
May 4, 2025 -
b5275
published
May 4, 2025 -
b5276
published
May 4, 2025 -
b5277
published
May 4, 2025 -
b5278
published
May 4, 2025 -
b5279
published
May 4, 2025 -
b5280
published
May 4, 2025 -
b5281
published
May 5, 2025 -
b5283
published
May 5, 2025 -
b5284
published
May 5, 2025 -
b5286
published
May 5, 2025 -
b5287
published
May 5, 2025 -
b5289
published
May 6, 2025 -
b5292
published
May 6, 2025 -
b5293
published
May 6, 2025 -
b5295
published
May 6, 2025 -
b5296
published
May 6, 2025 -
b5297
published
May 6, 2025 -
b5298
published
May 6, 2025 -
b5299
published
May 7, 2025 -
b5300
published
May 7, 2025 -
b5301
published
May 7, 2025 -
b5302
published
May 7, 2025 -
b5303
published
May 7, 2025 -
b5306
published
May 7, 2025 -
b5308
published
May 8, 2025 -
b5310
published
May 8, 2025 -
b5311
published
May 8, 2025 -
b5313
published
May 8, 2025 -
b5315
published
May 8, 2025 -
b5309
published
May 8, 2025 -
b5317
published
May 8, 2025 -
b5318
published
May 8, 2025 -
b5320
published
May 9, 2025 -
b5321
published
May 9, 2025 -
b5322
published
May 9, 2025 -
b5323
published
May 9, 2025 -
b5324
published
May 9, 2025 -
b5325
published
May 9, 2025 -
b5326
published
May 9, 2025 -
b5327
published
May 9, 2025 -
b5328
published
May 9, 2025 -
b5329
published
May 9, 2025 -
b5330
published
May 9, 2025 -
b5331
published
May 9, 2025 -
b5332
published
May 9, 2025 -
b5333
published
May 10, 2025 -
b5334
published
May 10, 2025 -
b5335
published
May 10, 2025 -
b5336
published
May 10, 2025 -
b5338
published
May 10, 2025 -
b5340
published
May 10, 2025 -
b5341
published
May 10, 2025 -
b5342
published
May 10, 2025 -
b5344
published
May 11, 2025 -
b5345
published
May 11, 2025 -
b5346
published
May 11, 2025 -
b5347
published
May 11, 2025 -
b5349
published
May 11, 2025 -
b5350
published
May 11, 2025 -
b5351
published
May 12, 2025 -
b5352
published
May 12, 2025 -
b5353
published
May 12, 2025 -
b5354
published
May 12, 2025 -
b5355
published
May 12, 2025 -
b5356
published
May 12, 2025 -
b5357
published
May 12, 2025 -
b5358
published
May 12, 2025 -
b5359
published
May 12, 2025 -
b5360
published
May 12, 2025 -
b5361
published
May 12, 2025 -
b5363
published
May 13, 2025 -
b5365
published
May 13, 2025 -
b5366
published
May 13, 2025 -
b5367
published
May 13, 2025 -
b5368
published
May 13, 2025 -
b5369
published
May 13, 2025 -
b5370
published
May 13, 2025 -
b5371
published
May 13, 2025 -
b5372
published
May 14, 2025 -
b5377
published
May 14, 2025 -
b5378
published
May 14, 2025 -
b5379
published
May 14, 2025 -
b5380
published
May 14, 2025 -
b5381
published
May 14, 2025 -
b5382
published
May 14, 2025 -
b5384
published
May 14, 2025 -
b5385
published
May 14, 2025 -
b5387
published
May 14, 2025 -
b5388
published
May 14, 2025 -
b5390
published
May 15, 2025 -
b5391
published
May 15, 2025 -
b5392
published
May 15, 2025 -
b5394
published
May 15, 2025 -
b5395
published
May 15, 2025 -
b5400
published
May 15, 2025 -
b5401
published
May 15, 2025 -
b5402
published
May 16, 2025 -
b5404
published
May 16, 2025 -
b5405
published
May 16, 2025 -
b5406
published
May 16, 2025 -
b5409
published
May 16, 2025 -
b5410
published
May 16, 2025 -
b5411
published
May 17, 2025 -
b5412
published
May 17, 2025 -
b5414
published
May 17, 2025 -
b5415
published
May 17, 2025 -
b5416
published
May 19, 2025 -
b5417
published
May 19, 2025 -
b5421
published
May 19, 2025 -
b5422
published
May 19, 2025 -
b5423
published
May 19, 2025 -
b5425
published
May 19, 2025 -
b5426
published
May 19, 2025 -
b5427
published
May 20, 2025 -
b5429
published
May 20, 2025 -
b5430
published
May 20, 2025 -
b5431
published
May 20, 2025 -
b5432
published
May 20, 2025 -
b5434
published
May 20, 2025 -
b5435
published
May 20, 2025 -
b5436
published
May 20, 2025 -
b5437
published
May 20, 2025 -
b5438
published
May 20, 2025 -
b5439
published
May 21, 2025 -
b5440
published
May 21, 2025 -
b5441
published
May 21, 2025 -
b5442
published
May 21, 2025 -
b5443
published
May 21, 2025 -
b5444
published
May 21, 2025 -
b5446
published
May 21, 2025 -
b5448
published
May 21, 2025 -
b5449
published
May 21, 2025 -
b5450
published
May 21, 2025 -
b5451
published
May 21, 2025 -
b5452
published
May 21, 2025 -
b5453
published
May 22, 2025 -
b5454
published
May 22, 2025 -
b5456
published
May 22, 2025 -
b5458
published
May 22, 2025 -
b5459
published
May 22, 2025 -
b5460
published
May 22, 2025 -
b5461
published
May 23, 2025 -
b5462
published
May 23, 2025 -
b5463
published
May 23, 2025 -
b5464
published
May 23, 2025 -
b5465
published
May 23, 2025 -
b5466
published
May 23, 2025 -
b5468
published
May 23, 2025 -
b5471
published
May 24, 2025 -
b5472
published
May 24, 2025 -
b5473
published
May 24, 2025 -
b5474
published
May 24, 2025 -
b5475
published
May 24, 2025 -
b5476
published
May 24, 2025 -
b5477
published
May 24, 2025 -
b5478
published
May 25, 2025 -
b5479
published
May 25, 2025 -
b5480
published
May 25, 2025 -
b5481
published
May 25, 2025 -
b5483
published
May 25, 2025 -
b5484
published
May 25, 2025 -
b5486
published
May 25, 2025 -
b5488
published
May 26, 2025 -
b5489
published
May 26, 2025 -
b5490
published
May 26, 2025 -
b5492
published
May 26, 2025 -
b5493
published
May 26, 2025 -
b5494
published
May 26, 2025 -
b5495
published
May 26, 2025 -
b5497
published
May 26, 2025 -
b5498
published
May 26, 2025 -
b5499
published
May 26, 2025 -
b5501
published
May 26, 2025 -
b5502
published
May 27, 2025 -
b5503
published
May 27, 2025 -
b5504
published
May 27, 2025 -
b5505
published
May 27, 2025 -
b5506
published
May 27, 2025 -
b5508
published
May 27, 2025 -
b5509
published
May 27, 2025 -
b5510
published
May 27, 2025 -
b5512
published
May 27, 2025 -
b5513
published
May 27, 2025 -
b5514
published
May 27, 2025 -
b5515
published
May 27, 2025 -
b5516
published
May 27, 2025 -
b5517
published
May 28, 2025 -
b5519
published
May 28, 2025 -
b5522
published
May 28, 2025 -
b5524
published
May 28, 2025 -
b5526
published
May 28, 2025 -
b5527
published
May 28, 2025 -
b5529
published
May 29, 2025 -
b5530
published
May 29, 2025 -
b5532
published
May 29, 2025 -
b5533
published
May 29, 2025 -
b5534
published
May 29, 2025 -
b5535
published
May 29, 2025 -
b5537
published
May 29, 2025 -
b5538
published
May 29, 2025 -
b5539
published
May 30, 2025 -
b5540
published
May 30, 2025 -
b5541
published
May 30, 2025 -
b5543
published
May 30, 2025 -
b5544
published
May 30, 2025 -
b5545
published
May 30, 2025 -
b5546
published
May 30, 2025 -
b5547
published
May 30, 2025 -
b5548
published
May 30, 2025 -
b5551
published
May 31, 2025 -
b5552
published
May 31, 2025 -
b5554
published
May 31, 2025 -
b5555
published
May 31, 2025 -
b5556
published
May 31, 2025 -
b5558
published
May 31, 2025 -
b5559
published
Jun 1, 2025 -
b5560
published
Jun 1, 2025 -
b5568
published
Jun 1, 2025 -
b5569
published
Jun 1, 2025 -
b5571
published
Jun 1, 2025 -
b5572
published
Jun 1, 2025 -
b5573
published
Jun 2, 2025 -
b5574
published
Jun 2, 2025 -
b5575
published
Jun 2, 2025
301 Pull requests merged by 80 people
-
mtmd : fix memory leak in mtmd_helper_eval_chunk_single
#13961 merged
Jun 2, 2025 -
"Fix: Handle mixed-case 'Power' strings in POWER CPU detection"
#13966 merged
Jun 2, 2025 -
sycl: quantize and reorder the input to q8_1 when reorder is enabled
#13826 merged
Jun 2, 2025 -
gguf: fix failure on version == 0
#13956 merged
Jun 1, 2025 -
convert : fix nomic-bert-moe mask token
#13757 merged
Jun 1, 2025 -
convert : fix vocab padding code for bert models
#13954 merged
Jun 1, 2025 -
ggml: check if non-native endian model is being loaded
#13943 merged
Jun 1, 2025 -
sync : ggml
#13953 merged
Jun 1, 2025 -
add easy-llama Python bindings to README
#13950 merged
Jun 1, 2025 -
parallel : fix n_junk == 0
#13952 merged
Jun 1, 2025 -
kv-cache : split implementation in separate sources
#13920 merged
Jun 1, 2025 -
threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling
#12995 merged
May 31, 2025 -
Note about necessity of having libcurl installed for standard build
#13945 merged
May 31, 2025 -
chat : allow unclosed thinking tags
#13931 merged
May 31, 2025 -
llama : deprecate explicit kv_self defrag/update calls
#13921 merged
May 31, 2025 -
llama : use n_swa + n_ubatch cells for SWA cache
#13833 merged
May 31, 2025 -
Replace alert and confirm with custom modals.
#13711 merged
May 31, 2025 -
llama : auto-batch preparation
#13845 merged
May 31, 2025 -
mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change)
#13917 merged
May 31, 2025 -
kv-cache : refactor + add llama_memory_state_i
#13746 merged
May 31, 2025 -
CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856)
#13895 merged
May 31, 2025 -
CUDA: fix typo in FlashAttention code
#13926 merged
May 30, 2025 -
sched : avoid changing cur_copy when a graph is already allocated
#13922 merged
May 30, 2025 -
parallel : increase the variability of the prompt lengths
#13927 merged
May 30, 2025 -
cuda : prevent using split buffers with 3d/4d matrices
#13919 merged
May 30, 2025 -
SYCL: Add mrope kernel
#13755 merged
May 30, 2025 -
sync : vendor
#13901 merged
May 30, 2025 -
convert : fix rwkv bos/eos token
#13844 merged
May 30, 2025 -
convert : allow partial update to the chkhsh pre-tokenizer list
#13847 merged
May 30, 2025 -
Add support for DistilBert
#13907 merged
May 30, 2025 -
model: minicpm should use llm_build_granite
#13911 merged
May 30, 2025 -
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture
#13890 merged
May 29, 2025 -
llama : add support for jina-reranker-v2
#13900 merged
May 29, 2025 -
gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method
#13561 merged
May 29, 2025 -
arm64: optimize q4_k_q8_k kernel with i8mm
#13886 merged
May 29, 2025 -
cmake: Factor out CPU architecture detection
#13883 merged
May 29, 2025 -
ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm
#13882 merged
May 29, 2025 -
tests : remove json.hpp from a test
#13880 merged
May 29, 2025 -
convert : workaround for AutoConfig dummy labels
#13881 merged
May 29, 2025 -
llama : add RobertaForSequenceClassification reranker support
#13875 merged
May 29, 2025 -
ggml: aarch64: Implement SVE F32 kernels for vector functions
#13843 merged
May 29, 2025 -
gguf/utility: return full content on size < 0
#13841 merged
May 28, 2025 -
llama : fix KV shift for qwen2vl
#13870 merged
May 28, 2025 -
mtmd : move helpers to dedicated library (⚠️ breaking change)
#13866 merged
May 28, 2025 -
ci: disable LLAMA_CURL for Linux cross-builds
#13871 merged
May 28, 2025 -
Add support for BertForSequenceClassification reranking
#13858 merged
May 28, 2025 -
convert: small addition to support LlamaModel
#13838 merged
May 28, 2025 -
convert : fix qwen omni conversion
#13859 merged
May 28, 2025 -
Change umlaut test
#11600 merged
May 28, 2025 -
CUDA: fix FA tg at long context for CC >= 8.9
#13852 merged
May 28, 2025 -
convert : fix tensor naming conflict for llama 4 vision
#13836 merged
May 28, 2025 -
[CANN]: Add SOC TYPE printing in cmake configuration processing
#13837 merged
May 28, 2025 -
opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm
#13787 merged
May 27, 2025 -
opencl: mark MUL_MAT as supporting non-contiguous tensors for f32
#13790 merged
May 27, 2025 -
vulkan: use timestamp queries for GGML_VULKAN_PERF
#13817 merged
May 27, 2025 -
cmake : add llama-cparams.cpp to build
#13832 merged
May 27, 2025 -
SYCL: add gelu_erf kernel
#13749 merged
May 27, 2025 -
sync : ggml
#13829 merged
May 27, 2025 -
ggml : add ggml_repeat_4d
#13824 merged
May 27, 2025 -
ggml : riscv: add xtheadvector support
#13720 merged
May 27, 2025 -
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output)
#13784 merged
May 27, 2025 -
docs: remove link for llama-cli function calling
#13810 merged
May 27, 2025 -
ggml-cpu: x86 feature detection is specific to x86
#13811 merged
May 27, 2025 -
ggml : allow CUDA graphs when using pipeline parallelism
#13814 merged
May 27, 2025 -
kv-cells : track min/max used cells and per-sequence positions
#13808 merged
May 27, 2025 -
sampling : make sure samplers return at least 1 token
#13822 merged
May 27, 2025 -
llama : validate seq id batch input
#13809 merged
May 27, 2025 -
server: --offline mode
#13804 merged
May 26, 2025 -
scripts : add option to compare commits in Debug
#13806 merged
May 26, 2025 -
cuda : avoid cuGetErrorString when not needed
#13791 merged
May 26, 2025 -
SYCL: Add non contiguous support in RMS_NORM and NORM kernels
#13611 merged
May 26, 2025 -
server: fix streaming crashes
#13786 merged
May 26, 2025 -
examples/training: Fix file name in README
#13803 merged
May 26, 2025 -
server : fix format of streamed tool call deltas (diff name, fix id location)
#13800 merged
May 26, 2025 -
server: fix regression on streamed non-chat completion w/ stops
#13785 merged
May 26, 2025 -
examples : allow extracting embeddings from decoder contexts
#13797 merged
May 26, 2025 -
llama : clarify deprecation message
#13794 merged
May 26, 2025 -
sycl: Add more debug prints
#13640 merged
May 26, 2025 -
vulkan: mark IM2COL as supporting non-contig
#13783 merged
May 26, 2025 -
[CANN]: add the basic supports of Flash Attention kernel
#13627 merged
May 26, 2025 -
server : add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)
#13771 merged
May 25, 2025 -
webui : bump max upload file size to 500MB
#13779 merged
May 25, 2025 -
tests : improve UGM tokenizer test coverage
#13773 merged
May 25, 2025 -
kv-cache : rework kv_cell
#13706 merged
May 25, 2025 -
Fix build on OpenBSD
#13541 merged
May 25, 2025 -
mtmd : add support for Qwen2-Audio and SeaLLM-Audio
#13760 merged
May 25, 2025 -
docs : add Moondream2 pre-quantized link
#13745 merged
May 25, 2025 -
server: fix/test add_generation_prompt param
#13770 merged
May 25, 2025 -
Qwen3 MoE should also work with tie_word_embeddings
#13768 merged
May 25, 2025 -
SYCL: Temporarily revert "sycl: simplify bin_bcast_kernel (#13383)"
#13752 merged
May 25, 2025 -
server : streaming of tool calls and thoughts when --jinja is on
#12379 merged
May 25, 2025 -
releases : bundle llvm omp library in windows release
#13763 merged
May 24, 2025 -
releases : enable openmp in windows cpu backend build
#13756 merged
May 24, 2025 -
ggml-cpu : set openmp wait time if not set
#13758 merged
May 24, 2025 -
Move GLM4 f32 attention fix to the correct function
#13750 merged
May 24, 2025 -
ggml : add ggml_gelu_erf() CUDA kernel
#13719 merged
May 24, 2025 -
vocab : fix ugm tokenizer precision
#13743 merged
May 24, 2025 -
CUDA: fix race condition in FA vector kernels
#13742 merged
May 24, 2025 -
ci : enable winget package updates
#13734 merged
May 23, 2025 -
ci : add winget package updater
#13732 merged
May 23, 2025 -
hparams : initialize arrays
#13728 merged
May 23, 2025 -
llama : allow custom list of swa_layers
#13726 merged
May 23, 2025 -
server : support audio input
#13714 merged
May 23, 2025 -
[CANN]Support OP MUL_MAT_ID Q8 && Q4
#13705 merged
May 23, 2025 -
ggml : fix the order of ggml_unary_op
#13718 merged
May 23, 2025 -
vulkan: support CPY from any type to itself
#13695 merged
May 23, 2025 -
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it
#13696 merged
May 23, 2025 -
use LOG_WARN to replace std::cerr
#13657 merged
May 23, 2025 -
release : fix windows hip release
#13707 merged
May 22, 2025 -
tts : fix n_ubatch + make WavTokenizer cache-less
#13713 merged
May 22, 2025 -
mtmd : add ultravox audio input
#13623 merged
May 22, 2025 -
common: Include torch package for s390x
#13699 merged
May 22, 2025 -
server : pad small embedding batches
#13692 merged
May 22, 2025 -
gguf-py : correct charsmap parameter typing
#13701 merged
May 22, 2025 -
sycl: Remove waits from async functions call
#13702 merged
May 22, 2025 -
SYCL: Avoid using SYCL-Graph for unsupported nodes
#13587 merged
May 22, 2025 -
opencl: Add support for multiple devices
#12622 merged
May 21, 2025 -
opencl: fix couple crashes
#12795 merged
May 21, 2025 -
releases : build CPU backend separately (windows)
#13642 merged
May 21, 2025 -
hparams : support models for which all layers use SWA
#13682 merged
May 21, 2025 -
server : improve error reporting
#13680 merged
May 21, 2025 -
convert : add qwen2vl support for unsloth merges
#13686 merged
May 21, 2025 -
examples : switch retrieval to llama_encode
#13685 merged
May 21, 2025 -
gguf-py : display the invalid gguf type
#13687 merged
May 21, 2025 -
ggml : add ggml_gelu_erf()
#13667 merged
May 21, 2025 -
Add the endpoints /api/tags and /api/chat
#13659 merged
May 21, 2025 -
server : fix first message identification
#13634 merged
May 21, 2025 -
kv-cache : simplify the interface
#13660 merged
May 21, 2025 -
model : disable SWA for Phi models
#13676 merged
May 21, 2025 -
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
#13647 merged
May 21, 2025 -
vulkan: small fixes
#13626 merged
May 20, 2025 -
mtmd-helper : bug fix to token batching in mtmd
#13650 merged
May 20, 2025 -
model : fix llama4 graph
#13663 merged
May 20, 2025 -
llama : remove llama_kv_cache_view API + remove deprecated
#13653 merged
May 20, 2025 -
CUDA: skip fully masked-out KV in FA vec kernel
#13584 merged
May 20, 2025 -
tests : avoid github urls due to throttling
#13654 merged
May 20, 2025 -
sycl: disable reorder for sycl mulmat
#13536 merged
May 20, 2025 -
Fix GLM4 incoherence with fp16 accumulators
#13639 merged
May 20, 2025 -
metal : fix typo in FA kernel comments
#13651 merged
May 20, 2025 -
kv-cache : add SWA support
#13194 merged
May 20, 2025 -
[CANN] Update CANN model support status
#13162 merged
May 20, 2025 -
sycl : Overcoming workaround for mmap() allocation on Windows
#13482 merged
May 20, 2025 -
added load_progress_callback to common_params
#13617 merged
May 19, 2025 -
Vulkan: Support fp32 accumulator in quantized matmul to fix GLM4-32B incoherence
#13607 merged
May 19, 2025 -
sycl : reviewing the backend documentation
#13544 merged
May 19, 2025 -
mtmd : add vision support for llama 4
#13282 merged
May 19, 2025 -
ci : upgraded oneAPI version in SYCL workflows and dockerfile
#13532 merged
May 19, 2025 -
sync : ggml
#13630 merged
May 19, 2025 -
fix: check model pointer validity before use
#13631 merged
May 19, 2025 -
[CANN]Support OP MUL_MAT_ID
#13042 merged
May 19, 2025 -
server : added --no-prefill-assistant flag
#13608 merged
May 17, 2025 -
fix: use the current build config for vulkan-shaders-gen
#13595 merged
May 17, 2025 -
parallel : add option for non-shared and larger prompts
#13598 merged
May 17, 2025 -
vulkan: move common FA code to flash_attn_base.comp
#13556 merged
May 17, 2025 -
vulkan: use scalar FA rather than coopmat2 when N==1
#13554 merged
May 17, 2025 -
Bump LLGuidance git tag so it compiles on GCC15
#13594 merged
May 16, 2025 -
server : do not return error when running out of context (with ctx shift disabled)
#13577 merged
May 16, 2025 -
webui : improve accessibility for visually impaired people
#13551 merged
May 16, 2025 -
readme : clarify the list of dependencies and their license
#13591 merged
May 16, 2025 -
releases : use arm version of curl for windows arm releases
#13592 merged
May 16, 2025 -
metal : add FA-vec kernel for head size 64
#13583 merged
May 16, 2025 -
llama : print hint when loading a model when no backends are loaded
#13589 merged
May 16, 2025 -
ci : add ppc64el to build-linux-cross
#13575 merged
May 16, 2025 -
sycl : fixed compilation warnings
#13582 merged
May 16, 2025 -
minja: sync
#13573 merged
May 15, 2025 -
gguf : use ggml log system
#13571 merged
May 15, 2025 -
gguf-py : Fix disconnect-before-connect crash on Kubuntu
#13569 merged
May 15, 2025 -
convert : fix conversion for llama 4
#13567 merged
May 15, 2025 -
sycl: simplify bin_bcast_kernel
#13383 merged
May 15, 2025 -
sycl : Implemented reorder Q4_K mmvq
#13109 merged
May 15, 2025 -
sycl: use oneDNN for matrices multiplication
#12972 merged
May 15, 2025 -
llama-bench : fix -ot with dl backends
#13563 merged
May 15, 2025 -
webui : handle PDF input (as text or image) + convert pasted long content to file
#13562 merged
May 15, 2025 -
fix: proper error handling for missing elements in messages array (OpenAI compatible backend)
#13540 merged
May 15, 2025 -
bench : handle decode errors
#13548 merged
May 15, 2025 -
server : inject date_string in llama 3.x template + fix date for firefunction v2
#12802 merged
May 15, 2025 -
kv-cache : fix out-of-bounds view during reserve graph
#13547 merged
May 14, 2025 -
arm64: optimize q6_k_q8_k kernel with i8mm
#13519 merged
May 14, 2025 -
common : add partial regex support
#12808 merged
May 14, 2025 -
editorconfig : fix trailing whitespace from #13542
#13546 merged
May 14, 2025 -
fix: crash when calling llama_state_get_size on a context without a KV cache
#13542 merged
May 14, 2025 -
CUDA: fix crash on large batch size for quant. MoE
#13537 merged
May 14, 2025 -
llama : fix quantize with dl backends
#13539 merged
May 14, 2025 -
CUDA: faster Deepseek FA, add Turing support
#13435 merged
May 14, 2025 -
Granite MoE NoPE fix
#13538 merged
May 14, 2025 -
server : passthrough the /models endpoint during loading
#13535 merged
May 14, 2025 -
server : fix cache_tokens bug with no cache_prompt
#13533 merged
May 14, 2025 -
cmake: simplify vulkan shader test logic
#13263 merged
May 14, 2025 -
vulkan: KHR_coopmat flash attention
#13506 merged
May 14, 2025 -
webui : use fflate for more deterministic gzip compress
#13525 merged
May 14, 2025 -
webui: Allow pasting file from clipboard
#13526 merged
May 14, 2025 -
docs: Update link to ggml-org in multimodal.md
#13513 merged
May 14, 2025 -
scripts : fix compare-llama-bench.py show parameter
#13514 merged
May 14, 2025 -
vulkan: workaround FA compile failures on macos
#13517 merged
May 14, 2025 -
quantize: improve pattern matching for allowed tensors
#13033 merged
May 13, 2025 -
clip : clip.h become private API (⚠️ breaking change)
#13510 merged
May 13, 2025 -
metal : use FA-vec kernel up to batch size 20
#13496 merged
May 13, 2025 -
metal : optimize multi-sequence FA vec kernel
#13493 merged
May 13, 2025 -
ggml-cpu: Update KleidiAI to v1.6 and fix include directives
#13509 merged
May 13, 2025 -
batched-bench : fix pp batch contents
#13492 merged
May 13, 2025 -
mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change)
#13460 merged
May 13, 2025 -
scripts : support arbitrary input file formats in compare-llama-bench.py
#13455 merged
May 13, 2025 -
Model: Granite MoE shared
#13269 merged
May 13, 2025 -
sync : ggml
#13502 merged
May 13, 2025 -
llama-bench : add defrag-thold, check for invalid ranges
#13487 merged
May 12, 2025 -
opencl: remove unnecessary assert for add
#13257 merged
May 12, 2025 -
clip : cap max image size 1024 for qwen vl model
#13478 merged
May 12, 2025 -
llama/ggml: add LLM training support
#10544 merged
May 12, 2025 -
context : fix state io for memory-less contexts
#13470 merged
May 12, 2025 -
Allow content null for tool call
#13477 merged
May 12, 2025 -
llama-bench : accept ranges for integer parameters
#13410 merged
May 12, 2025 -
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
#13053 merged
May 12, 2025 -
CUDA: fix misaligned synchronization in FA
#13469 merged
May 12, 2025 -
ggml : add mrope kernel for metal
#13457 merged
May 12, 2025 -
sycl: enable dpcpp nightly builds with oneMKL and oneDNN
#13406 merged
May 12, 2025 -
mtmd : use RMS norm for InternVL 3 38B and 78B mmproj
#13459 merged
May 11, 2025 -
tools : fix invalid free()
#13436 merged
May 11, 2025 -
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare
#13451 merged
May 11, 2025 -
CUDA: fix crash with partial offloading of MoE
#13439 merged
May 11, 2025 -
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B
#13386 merged
May 11, 2025 -
mtmd : support InternVL 3 38B and 78B mmproj
#13443 merged
May 11, 2025 -
mtmd : move helpers to dedicated file
#13442 merged
May 11, 2025 -
readme: Fix typo in InternVL model name
#13440 merged
May 10, 2025 -
CUDA: fix race conditions in FlashAttention kernels
#13438 merged
May 10, 2025 -
vocab : add ByteDance-Seed/Seed-Coder
#13423 merged
May 10, 2025 -
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
#13434 merged
May 10, 2025 -
server : update docs
#13432 merged
May 10, 2025 -
llguidance : init tokenizer slices
#13424 merged
May 10, 2025 -
ci: free_disk_space flag enabled for intel variant
#13426 merged
May 10, 2025 -
mtmd : support InternVL 2.5 and 3
#13422 merged
May 10, 2025 -
CUDA: fix FlashAttention on Turing
#13415 merged
May 10, 2025 -
arg : add env var to control mmproj
#13416 merged
May 10, 2025 -
vulkan: scalar flash attention implementation
#13324 merged
May 10, 2025 -
Use tagged version of llguidance that does not break the build
#13413 merged
May 9, 2025 -
server : vision support via libmtmd
#12898 merged
May 9, 2025 -
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs
#12858 merged
May 9, 2025 -
metal : optimize MoE for large batches
#13388 merged
May 9, 2025 -
CUDA: FA support for Deepseek (Ampere or newer)
#13306 merged
May 9, 2025 -
llama : do not crash if there is no CPU backend
#13395 merged
May 9, 2025 -
CUDA: fix crash on large batch size for MoE models
#13384 merged
May 9, 2025 -
Add --parse-special for enabling parsing of special tokens in imatrix calculation
#13389 merged
May 9, 2025 -
llama-run: add support for downloading models from ModelScope
#13370 merged
May 9, 2025 -
mtmd : fix batch_view for m-rope
#13397 merged
May 9, 2025 -
llama : one-off chat template fix for Mistral-Small-2503
#13398 merged
May 9, 2025 -
rpc : add rpc_msg_set_tensor_hash_req
#13353 merged
May 9, 2025 -
vulkan: Allow up to 4096 elements for mul_mat_id row_ids
#13326 merged
May 9, 2025 -
server : (webui) rename has_multimodal --> modalities
#13393 merged
May 9, 2025 -
ci : limit write permission to only the release step + fixes
#13392 merged
May 8, 2025 -
mtmd: Expose helper_decode_image_chunk
#13366 merged
May 8, 2025 -
server : (webui) fix a very small misalignment
#13387 merged
May 8, 2025 -
server : (webui) revamp the input area, plus many small UI improvements
#13365 merged
May 8, 2025 -
convert : support rope_scaling type and rope_type
#13349 merged
May 8, 2025 -
mtmd: Fix the calculation of n_tokens for smolvlm
#13381 merged
May 8, 2025 -
context : allow cache-less context for embeddings
#13108 merged
May 8, 2025 -
context : remove logits_all flag
#13284 merged
May 8, 2025 -
ci : move release workflow to a separate file
#13362 merged
May 8, 2025 -
llama : print size and type of overridden tensors
#13364 merged
May 8, 2025 -
sycl: addressing non-contiguous src1 mul_mats (nc and batched)
#13343 merged
May 8, 2025 -
docker : disable arm64 and intel images
#13356 merged
May 7, 2025 -
sync : ggml
#13355 merged
May 7, 2025 -
llama : deci : support ffn-free with attention
#13296 merged
May 7, 2025 -
common: Warn when we can't match samplers for a sampler sequence.
#13330 merged
May 7, 2025 -
musa: remove nrows_x in mul_mat_q_process_tile
#13325 merged
May 7, 2025 -
examples : remove infill
#13283 merged
May 7, 2025 -
Support tie embedding for chatglm models
#13328 merged
May 7, 2025 -
CUDA: build archs as virtual for GGML_NATIVE=OFF
#13135 merged
May 6, 2025 -
clip : refactor graph builder
#13321 merged
May 6, 2025 -
sampling: make top_n_sigma no-op at <=0 rather than <0
#13345 merged
May 6, 2025 -
sampling: Don't consider -infinity values in top_n_sigma
#13344 merged
May 6, 2025 -
cmake : remove arm64 msvc presets
#13342 merged
May 6, 2025 -
SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled
#13254 merged
May 6, 2025 -
llama : fix build_ffn without gate
#13336 merged
May 6, 2025 -
CUDA: fix bad asserts for partial offload
#13337 merged
May 6, 2025 -
convert : qwen2/3moe : set yarn metadata if present
#13331 merged
May 6, 2025 -
CUDA: fix --split-mode row for MMQ
#13323 merged
May 6, 2025 -
gguf-py : avoid requiring PySide6 for packaged scripts
#13036 merged
May 6, 2025 -
CUDA: fix logic for clearing padding with -ngl 0
#13320 merged
May 5, 2025 -
sampling: Integrate Top-nσ into main sampling chain (and add it to the server)
#13264 merged
May 5, 2025 -
Webui - change setText command from parent window to also send the message.
#13309 merged
May 5, 2025 -
mtmd : rename llava directory to mtmd
#13311 merged
May 5, 2025 -
clip : fix confused naming ffn_up and ffn_down
#13290 merged
May 5, 2025 -
convert : bailingmoe : set yarn metadata if present
#13312 merged
May 5, 2025 -
SYCL: Disable mul_mat kernels for noncontiguous tensor b
#13308 merged
May 5, 2025 -
mtmd : add C public API
#13184 merged
May 4, 2025 -
rpc : use backend registry, support dl backends
#13304 merged
May 4, 2025 -
ggml-cpu: Support Q3_K SIMD on s390x
#13301 merged
May 4, 2025 -
llava/mtmd : fixes to fully support dl backends
#13303 merged
May 4, 2025 -
llama : build windows releases with dl backends
#13220 merged
May 4, 2025 -
CUDA: fix race condition in MMQ stream-k fixup
#13299 merged
May 4, 2025 -
CUDA: fix race condition in MMQ ids_dst
#13294 merged
May 4, 2025 -
vulkan: Additional type support for unary, binary, and copy
#13266 merged
May 4, 2025 -
imatrix: fix oob writes if src1 is not contiguous
#13286 merged
May 3, 2025 -
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change)
#13259 merged
May 3, 2025 -
Llama-3_1-Nemotron-Ultra-253B-v1 support
#12843 merged
May 3, 2025 -
llama : move end-user examples to tools directory
#13249 merged
May 2, 2025 -
sync : ggml
#13268 merged
May 2, 2025 -
context : fix reorder logic
#13267 merged
May 2, 2025 -
PowerPC: Enable MMA for BF16 in llamafile_sgemm
#13148 merged
May 2, 2025 -
llama-model : support Qwen2 embedding models and pooling_mode_lasttoken
#13245 merged
May 2, 2025 -
convert : fix context length for nomic-embed-text-v2-moe
#13216 merged
May 2, 2025
81 Pull requests opened by 66 people
-
Fix Vulkan glslc invocation command lines
#13289 opened
May 3, 2025 -
cuda: refactored ssm_scan and use CUB
#13291 opened
May 4, 2025 -
Added dynamic context size. This is perfect for servers running llama models as a service.
#13295 opened
May 4, 2025 -
[Perf] [CPU] eliminate redundant memory access in group query attention
#13319 opened
May 5, 2025 -
add AMD Genoa
#13334 opened
May 6, 2025 -
Support Sp token Function Call Token Implementation
#13339 opened
May 6, 2025 -
Add mistral-chat-7b preset for llama-server
#13348 opened
May 7, 2025 -
python : bump transformers version
#13351 opened
May 7, 2025 -
common: add default reranker presets
#13352 opened
May 7, 2025 -
CUDA: update build CTK version to 12.8
#13360 opened
May 7, 2025 -
llama: Fix typos in multiple files
#13369 opened
May 8, 2025 -
gguf-py: Optimize `GGUFReader` read-only mode performance
#13378 opened
May 8, 2025 -
musa: restore MUSA graph settings in CMakeLists.txt
#13382 opened
May 8, 2025 -
arg : add model catalog
#13385 opened
May 8, 2025 -
grammar: handle misplaced special regex chars [*+?]
#13391 opened
May 8, 2025 -
server : PoC implementation of "interim" server
#13400 opened
May 9, 2025 -
Update README.md for using llama.cpp in Microsoft Word locally
#13401 opened
May 9, 2025 -
Break down main function in llama-server
#13425 opened
May 10, 2025 -
Webui dynamic config
#13429 opened
May 10, 2025 -
llama: Add configuration presets for chat and reranking servers
#13462 opened
May 12, 2025 -
Support Seed-Coder chat template
#13472 opened
May 12, 2025 -
docker : enable RPC for docker images
#13474 opened
May 12, 2025 -
feat(server): Add tool call support to WebUI (LLama Server)
#13501 opened
May 13, 2025 -
convert: Swap GLM4 EOS / EOT token
#13505 opened
May 13, 2025 -
webui: Add editing assistant messages (#11849)
#13522 opened
May 14, 2025 -
cuda: set cuda compiler path (#13527)
#13528 opened
May 14, 2025 -
[CUDA backend ONLY] Use just K-cache for MLA + FA: 47% saving on KV-cache size
#13529 opened
May 14, 2025 -
Granite Four
#13550 opened
May 14, 2025 -
Update python verions
#13574 opened
May 15, 2025 -
server : separate the notion of position and KV tokens, remove prompt truncation
#13576 opened
May 15, 2025 -
ggml : fix race-condition in ggml-rpc
#13600 opened
May 17, 2025 -
ggml : add memset_tensor for rpc
#13601 opened
May 17, 2025 -
scripts: update pyproject.toml - deprecated poetry config + support uv
#13615 opened
May 18, 2025 -
cuda: fix CMAKE_CUDA_COMPILER not found error (#13528)
#13625 opened
May 19, 2025 -
webui: Allow editing file attachments when editing messages.
#13645 opened
May 20, 2025 -
add GGML_USE_NUMA_MIGRATE feature to optimize cross NUMA op computation
#13649 opened
May 20, 2025 -
model : jina-embeddings-v3 support
#13693 opened
May 21, 2025 -
common/llama: align structures for reduce cacheline size on 64bit platforms
#13710 opened
May 22, 2025 -
remove templates from soft_max_f32_submitter to allow SYCL graph updates
#13724 opened
May 23, 2025 -
Move page cache via mbind to prevent cross-NUMA access
#13731 opened
May 23, 2025 -
SYCL: Implement few same quantized type copy kernels
#13739 opened
May 24, 2025 -
cmake : set `RPATH` to `$ORIGIN` on Linux (#13740)
#13741 opened
May 24, 2025 -
Add comprehensive test for llama_batch/sbatch/ubatch concepts
#13764 opened
May 24, 2025 -
ggml : add ggml_fill()
#13772 opened
May 25, 2025 -
server: args for draft model cache types (#11200)
#13782 opened
May 25, 2025 -
Add support for VK_EXT_debug_utils to add labels to Vulkan objects.
#13792 opened
May 26, 2025 -
Add OPT model support - Add OPT architecture support in C++ code - Im…
#13799 opened
May 26, 2025 -
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D
#13813 opened
May 26, 2025 -
ggml: improve ggml_backend_cuda_cpy_tensor_async
#13818 opened
May 27, 2025 -
Tokenize logging
#13821 opened
May 27, 2025 -
examples : support MiniCPM-V-2
#13828 opened
May 27, 2025 -
convert: add support for Japanese Bert model
#13830 opened
May 27, 2025 -
kv-cache : avoid modifying recurrent cells when setting inputs
#13834 opened
May 27, 2025 -
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat
#13840 opened
May 28, 2025 -
musa: enable fp16 mma (all) and cublas on qy2
#13842 opened
May 28, 2025 -
tests : add test-tokenizers-remote
#13846 opened
May 28, 2025 -
docs : add "Quick start" section for new users
#13862 opened
May 28, 2025 -
finetune.cpp command-line arg
#13873 opened
May 28, 2025 -
sycl: Add reorder to Q6_K mmvq implementation
#13885 opened
May 29, 2025 -
musa: extract ggml_cuda_mul_mat_batched_cublas_gemm_batched_ex
#13887 opened
May 29, 2025 -
[WIP] model: add new model minimax-text-01
#13889 opened
May 29, 2025 -
ggml-cpu : split arch-specific implementations
#13892 opened
May 29, 2025 -
Need to undefine "hz" on AIX
#13894 opened
May 29, 2025 -
ci(intel): venv for python & pip installation for intel docker
#13898 opened
May 29, 2025 -
convert: add eagle2 draft arch
#13908 opened
May 30, 2025 -
remove WIP since PR has been merged
#13912 opened
May 30, 2025 -
[Ascend NPU] Enable labeler
#13914 opened
May 30, 2025 -
[CANN]Support Acl Graph
#13915 opened
May 30, 2025 -
Add plamo2
#13930 opened
May 30, 2025 -
`chat`: improve llama 3.x handling of <|python_tag|> (+ allow --special combo)
#13932 opened
May 30, 2025 -
`server`: update deepseek reasoning format (pass reasoning_content as diffs)
#13933 opened
May 30, 2025 -
vulkan: automatically deduce size of push constants
#13936 opened
May 31, 2025 -
vulkan: fix warnings in perf logger querypool code
#13937 opened
May 31, 2025 -
chore: added badge and link to release
#13938 opened
May 31, 2025 -
opencl: add `backend_synchronize`
#13939 opened
May 31, 2025 -
llama : support multiple classifier outputs and labels
#13940 opened
May 31, 2025 -
ci: add LoongArch cross-compile build
#13944 opened
May 31, 2025 -
gemma : fix attn scale for 27B
#13951 opened
Jun 1, 2025 -
ci: Update windows-2019 to windows-2022
#13960 opened
Jun 1, 2025 -
server : use swa-full fo draft context
#13970 opened
Jun 2, 2025 -
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices
#13973 opened
Jun 2, 2025
236 Issues closed by 57 people
-
Misc. bug: memory leak in mtmd ? (mtmd_helper_eval_chunk_single)
#13958 closed
Jun 2, 2025 -
Misc. bug: rpc - Flash Attention Failure in Metal/CUDA RPC Mixed Environment
#12655 closed
Jun 2, 2025 -
gmake[2]: *** [tests/CMakeFiles/test-tokenizer-0.dir/build.make:107: bin/test-tokenizer-0] Error 1
#12998 closed
Jun 2, 2025 -
Eval bug: Segmentation fault when running gemma3-cli on Android
#13000 closed
Jun 2, 2025 -
Eval bug: why Gemma 3 model has run into CPU inference
#13004 closed
Jun 2, 2025 -
Eval bug: default system prompt in llama-server
#13948 closed
Jun 1, 2025 -
Eval bug: Quad P40 unable to run 70B models on recent releases
#12990 closed
Jun 1, 2025 -
Eval bug: Not support DeepSeek-R1-0528-GGUF-Q8_0
#13916 closed
May 31, 2025 -
mtmd: cmake: C API broken since last change, static linking always broken
#13902 closed
May 31, 2025 -
Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use
#13812 closed
May 31, 2025 -
CUDA illigal memory bug 75 fixed?
#13906 closed
May 31, 2025 -
Misc. bug: what(): Unexpected empty grammar stack after accepting piece: <unused32>
#13341 closed
May 31, 2025 -
Compile bug: gcc-11: error: unrecognized command-line option '-compress-mode=size'
#12325 closed
May 31, 2025 -
Eval bug: convert_hf_to_gguf.py AttributeError:
#12847 closed
May 31, 2025 -
Compile bug: FAILED: examples/llava/CMakeFiles/llava.dir/llava.cpp.obj
#12899 closed
May 31, 2025 -
Compile bug: how to enable opencl in termux
#12911 closed
May 31, 2025 -
Misc. bug: llama-server speculative decoding not as performant as llama-speculative-simple
#12968 closed
May 31, 2025 -
Feature Request: multi model cli tools: Convert submitted images to best size and format for model
#12981 closed
May 31, 2025 -
Feature Request: Make chat sessions possible with multi model cli tools
#12982 closed
May 31, 2025 -
Misc. bug: Potential memory leak in backend registry
#12986 closed
May 31, 2025 -
Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue
#13877 closed
May 30, 2025 -
`CUDA error: an illegal memory access was encountered` on DeepSeek-R1-0528
#13909 closed
May 30, 2025 -
CUDA error: an illegal memory access was encountered (with large prompts)
#13851 closed
May 30, 2025 -
Eval bug: "GGML_ASSERT(!(split && ne02 > 1)) failed" when loading DeepSeek-R1T with --split-mode row
#13372 closed
May 30, 2025 -
Feature Request: Splitting layers according to VRAM usage on multi GPUs setups
#12654 closed
May 30, 2025 -
Misc. bug: Excessive power draw on the second GPU in dual RTX 3090 setup when idle
#12958 closed
May 30, 2025 -
Why does /ggml/CMakeLists.txt add_subdirectory(examples)?
#12963 closed
May 30, 2025 -
Misc. bug: gguf-new-metadata and gguf-editor-gui changes all integer arrays to INT32
#13557 closed
May 29, 2025 -
Eval bug: stream with tool_call fix in b5478 crash in container and issues with calls from apps
#13766 closed
May 29, 2025 -
Misc. bug: ALL gguf models fail to run (no log, docker exit code 139),
#12205 closed
May 29, 2025 -
Eval bug: got exception: {"code":500,"message":"Unsupported param: echo","type":"server_error"}
#12591 closed
May 29, 2025 -
Compile bug: ggml-cuda/opt-step-adamw.cu error: identifier "__Poly8x8_t" is undefined on Jetson Orin AGX
#12826 closed
May 29, 2025 -
CUDA: implementation of mul_mat_id
#12859 closed
May 29, 2025 -
what *tool/framework* to use if testing performance of .gguf models
#12901 closed
May 29, 2025 -
Misc. bug: llama-bench --tensor-split handling is broken
#12917 closed
May 29, 2025 -
Compile bug: macro "DECL_FATTN_MMA_F16_CASE" requires 3 arguments, but only 2 given
#12921 closed
May 29, 2025 -
Misc. bug: llama-server "terminate called after throwing an instance of 'std::runtime_error'"
#12939 closed
May 29, 2025 -
Model conversion issue
#12941 closed
May 29, 2025 -
Eval bug: KV cache shifting does not work for Qwen2.5VL
#13865 closed
May 28, 2025 -
CI: build-linux-cross failing
#13869 closed
May 28, 2025 -
Eval bug: qwen2.5-vl related bugs
#13848 closed
May 28, 2025 -
Unable to deploy the fine-tuned qwen2.5-vl-7b using llama.cpp.
#13723 closed
May 28, 2025 -
Misc. bug: Streaming tool calls does not return "type": "function", unlike non-stream
#13798 closed
May 28, 2025 -
Feature Request: Free up VRAM when llama-server not in use
#11703 closed
May 28, 2025 -
Eval bug: ggml_vulkan: Device memory allocation of size N failed with ub > 4096 and c > 4096 and b > 4096
#12817 closed
May 28, 2025 -
Eval bug: ROCm error: CUBLAS_STATUS_INTERNAL_ERROR
#12878 closed
May 28, 2025 -
Misc. bug: gguf-my-repo doesn't work - [Errno 2] No such file or directory: './llama.cpp/llama-quantize'
#12925 closed
May 28, 2025 -
Misc. bug: The llama-server not read the "--keep" param that user input in the cli
#12927 closed
May 28, 2025 -
I ran into this issue while trying to convert Smollm2 and Qwen2.5
#13603 closed
May 27, 2025 -
Misc. bug: llama-mtmd-cli ignores multiple image input
#13704 closed
May 27, 2025 -
Large performance drop when using pipeline parallelism and layer splitting on multiple GPUs
#13751 closed
May 27, 2025 -
Eval bug: gemma3 getting stuck with no output when
#13715 closed
May 27, 2025 -
Misc. bug: llama-sampling.cpp:204: GGML_ASSERT(cur_p->size > 0) failed
#13405 closed
May 27, 2025 -
Compile bug: MTT S4000 compile error
#13819 closed
May 27, 2025 -
Misc. bug: Streaming with tools causes pydantic-ai to mess up tool name
#13774 closed
May 26, 2025 -
server: terminate called after throwing an instance of 'std::runtime_error'
#13780 closed
May 26, 2025 -
Eval bug: Output NAN when use Qwen3 embedding models with FP16
#13795 closed
May 26, 2025 -
Eval bug: GGML_ASSERT(ggml_vk_op_supports_incontiguous(op) || ggml_vk_dim01_contiguous(src0)) failed
#13597 closed
May 26, 2025 -
convert_hf_to_gguf.py does not work for QWen-7b-chat fine tuning with LoRa exported model.
#13789 closed
May 26, 2025 -
Eval bug: Mistral Small Multiomodal fails when used with the Vulkan backend
#13778 closed
May 26, 2025 -
Feature Request: NUMA-aware MoE Expert Allocation for Improved Performanc
#11333 closed
May 26, 2025 -
Eval bug: Accuracy is dropped when I convert model to gguf. Qwen2_VL_7B_Instruct
#12538 closed
May 26, 2025 -
Eval bug: Crash in trim method
#12710 closed
May 26, 2025 -
How to use *chat_template* with .gguf models ? (tokenizer_name not implemented)
#12897 closed
May 26, 2025 -
multiple_choice_score : task 17 does not fit in the context window
#12905 closed
May 26, 2025 -
Eval bug: GLM-Z1-9B-0414
#12946 closed
May 25, 2025 -
Misc. bug: Speed degradation in `bin-win-cpu-x64` compared to `bin-win-avx2-x64` on Intel Core i7-12700H
#13664 closed
May 25, 2025 -
Feature Request: moondream2 vlm support in mtmd
#13332 closed
May 25, 2025 -
Compile bug: ‘ggml_gelu_erf’ was not declared in this scope; did you mean ‘ggml_gelu’
#13744 closed
May 25, 2025 -
Feature Request: Support for Qwen2-VL
#9246 closed
May 25, 2025 -
Prompt eval is 5x slower than in Ollama and maxes out the CPU
#12237 closed
May 25, 2025 -
Feature Request: Slim Attention (lossless 2x reduction in KV cache size)
#12359 closed
May 25, 2025 -
Misc. bug: convert_hf_to_gguf.py fails to convert the model of architecture T5ForConditionalGeneration
#12862 closed
May 25, 2025 -
Eval bug: Assertion _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS while using a particular model
#12877 closed
May 25, 2025 -
Eval bug: moonshotai/Moonlight-16B-A3B-Instruct
#12880 closed
May 25, 2025 -
Eval bug: add support for https://huggingface.co/
#12884 closed
May 25, 2025 -
Misc. bug: llama-server token per second slow down sigificant after release b5450 (#13642)
#13735 closed
May 24, 2025 -
Eval bug: UGM tokenizer sometimes outputs wrong tokens/in the wrong order
#13725 closed
May 24, 2025 -
Compile bug: Build failure for Intel oneMKL on Windows
#12478 closed
May 24, 2025 -
Add support for gemma 3 in the server?
#12762 closed
May 24, 2025 -
CUDA performance bug when two cards are visible and only one is used
#12838 closed
May 24, 2025 -
Misc. bug: Overflow in Cast (
#13722 closed
May 23, 2025 -
Phi-4-mini reasoning CRASH!!! (Vulkan)
#13464 closed
May 23, 2025 -
OpenCL: Performance comparison depending on gpu_offloads
#12810 closed
May 23, 2025 -
Llama 4 convert_hf_to_gguf.py tokenizer error
#12819 closed
May 23, 2025 -
Misc. bug: HIP / ROCm memory allocation broken after release b5450
#13698 closed
May 22, 2025 -
Eval bug: `llama-tts` fails (abort) with longer lines
#13712 closed
May 22, 2025 -
GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed
#13689 closed
May 22, 2025 -
Eval bug: MUSA backend cause non-sense output on unsloth/deepseek-r1 quantized model
#12779 closed
May 22, 2025 -
Misc. bug: Metric names are invalid
#12803 closed
May 22, 2025 -
crash: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN")
#13688 closed
May 21, 2025 -
Eval bug: phi-4 crashes with new versions
#13665 closed
May 21, 2025 -
OpenCL: Add CPU fallback for unsupported operations
#13621 closed
May 21, 2025 -
Eval bug: Cannot run unsloth/deepseek-r1 2bit Model
#12778 closed
May 21, 2025 -
Qwen3 32B and 30B models are similar size, But there is 4x difference between the performance!?
#13652 closed
May 20, 2025 -
Eval bug: NVIDIA Jetson AGX Xavier CUDA Compatibility Issue with llama.cpp
#13629 closed
May 20, 2025 -
Eval bug: vulkan Llama cpp prefers shared memory over dedicated memory
#12748 closed
May 20, 2025 -
Compile bug: `binary-ops.cpp: error: invalid conversion`
#12765 closed
May 20, 2025 -
Cannot compile SYCL backend SYCL_LIBRARY=SYCL_LIBRARY - NOTFOUND as per documentation
#12696 closed
May 19, 2025 -
Eval bug: No output using llama-batched-bench
#13553 closed
May 19, 2025 -
Feature Request: when llama.cpp can support convert qwen2.5 VL 7B/72B model to gguf?
#11541 closed
May 19, 2025 -
Misc. bug: HIP backend performs poorly on AMD Ryzen AI MAX 395 (Strix Halo gfx1151)
#13565 closed
May 18, 2025 -
Misc. bug: llama_tokenize parse_special is ignored in newer versions
#12743 closed
May 18, 2025 -
Misc. bug: CUDA errors with multi-threaded use
#11804 closed
May 17, 2025 -
Compile bug: error: unknown target CPU 'apple-m2' on M2 Ultra
#13363 closed
May 17, 2025 -
Eval bug: ~~Q2_K and Q3_K~~ Q8_0 not working on Vulkan anymore on RX 5700XT
#10710 closed
May 17, 2025 -
Compile bug: ¿How to compile only one example?
#12661 closed
May 17, 2025 -
When will llama.cpp's vulkan provide support for Intel Arc's matrix core?
#12690 closed
May 17, 2025 -
webui: Make the Web UI more accessible for blind users
#13531 closed
May 16, 2025 -
Feature Request: Task Cancellation on Client Disconnection
#6421 closed
May 16, 2025 -
Eval bug: Qwen3 Q4_0 not working with SYCL
#13163 closed
May 16, 2025 -
Eval bug: Qwen3, failed to parse chat template (jinja)
#13178 closed
May 16, 2025 -
CI: editorconfig-checker appears to have made a false positive judgment on "Trailing whitespace"
#13374 closed
May 16, 2025 -
Misc. bug: llama-cli '--log-disable' parameter omits response
#11983 closed
May 16, 2025 -
llama cpp android gpu
#12462 closed
May 16, 2025 -
llama-gemma3-cli: output degeneration after repeated uses
#12499 closed
May 16, 2025 -
Feature Request: Add support for StarVector-8b/1b
#12666 closed
May 16, 2025 -
Eval bug: with -ub 8192 model llama-server insists running on GPU
#12675 closed
May 16, 2025 -
Feature Request: Method that counts the number of image tokens in LLAVA_API
#12689 closed
May 16, 2025 -
Eval bug: Qwerky 72B (rwkv6qwen2) failed to load with `--split-mode row` option
#12692 closed
May 16, 2025 -
Misc. bug: webui multimodal, image input is not supported by this server, server error 500
#13566 closed
May 15, 2025 -
Misc. bug: Web UI's download and delete chat button missing in b5392
#13564 closed
May 15, 2025 -
Misc. bug: In Windows, llama-bench does not recognize the -ot or --override-tensors parameter.
#13491 closed
May 15, 2025 -
Great work ! !
#13558 closed
May 15, 2025 -
Eval bug: nomic-embed-text-v2-moe GGML_ASSERT(pc_type == ...) failed
#13534 closed
May 15, 2025 -
Eval bug: bizarre Jinja bug when trying to fix Qwen3 tool calling
#13516 closed
May 15, 2025 -
Eval bug: Jinja not replacing `date_string`
#12729 closed
May 15, 2025 -
Eval bug: Segmentation fault when using llama-quantize
#13380 closed
May 14, 2025 -
Misc. bug: Llama-Quantize.exe broken on win11 since b5298 , but works on/earlier b5215
#13518 closed
May 14, 2025 -
server: Describing pictures with multi models seems to crash the model
#13480 closed
May 14, 2025 -
Question regarding the quantization dimension of the weight such as Q4_K format
#13377 closed
May 14, 2025 -
Eval bug: Qwen3 30B adds spaces to end of each line
#13508 closed
May 14, 2025 -
Compile bug: compile cuda backend error
#13527 closed
May 14, 2025 -
Compile bug: cuda backend compile error
#12893 closed
May 14, 2025 -
Misc. bug: Compute pipeline creation failed when using Flash Attention on macOS/Vulkan
#13450 closed
May 14, 2025 -
csm : implement Sesame-based conversation example
#12392 closed
May 14, 2025 -
Eval bug: llama-qwen2vl-cli --log-disable rather disables the response, not the log
#12407 closed
May 14, 2025 -
Misc. bug: Gibbersish output on AMD Ryzen 9 8945HS w/ Radeon 780M Graphics since commit: 3d82dbcbce2c
#12657 closed
May 14, 2025 -
Misc. bug: since b4800 llama-cli does not prompt and llama-bench shows no results
#13452 closed
May 13, 2025 -
What is the partial sum in `block_q8_1_mmq`, is it for reducing the quantization error during MMA?
#13504 closed
May 13, 2025 -
Misc. bug: can't convert finetuned gemma3 model
#13490 closed
May 13, 2025 -
Eval bug: Phi-4 mini in iOS with xcframework
#12232 closed
May 13, 2025 -
Feature Request: convert_hf_to_gguf.py to support model type Qwen2_5_VLForConditionalGeneration
#12642 closed
May 13, 2025 -
GGML_ASSERT(cur_p->size > 0) failed, or gibberish on DeepSeek V3 0324 (Q2_K_XL), CUDA + CPU
#13461 closed
May 12, 2025 -
Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_120'
#13271 closed
May 12, 2025 -
Eval bug: Qwen2.5-vl crashes during image recognition on AMD GPU (resolution 1242*881)
#13445 closed
May 12, 2025 -
Segfault when submitting image to ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
#13467 closed
May 12, 2025 -
Misc. bug: crashes when calling `llama_state_get_size` on a reranking model
#13463 closed
May 12, 2025 -
Tool call errors with `Expected 'content' to be a string or an array`
#13471 closed
May 12, 2025 -
Misc. bug: rpc-server crash without cache
#13185 closed
May 12, 2025 -
Compile bug: SYCL backend build fail on debug config
#12602 closed
May 12, 2025 -
Misc. bug:
#12623 closed
May 12, 2025 -
Eval bug: mmvq.cu:519: GGML_ASSERT(!src0->view_src) failed
#13437 closed
May 11, 2025 -
Feature Request: Allow disabling `offload_op` for backends by user
#13241 closed
May 11, 2025 -
Compile bug: MinGW32_64 Vulkan Shader
#13419 closed
May 11, 2025 -
Eval bug: run failed when run lora adapter(no merged) on android
#12592 closed
May 11, 2025 -
[New Bitnet Model Support Request] Deepgrove model Bonsai 0.5B - Add Channel Scales
#12598 closed
May 11, 2025 -
Misc. bug: Data check in examples/gguf
#12617 closed
May 11, 2025 -
Eval bug: b5335 break flash attention on 4070
#13430 closed
May 10, 2025 -
ByteDance-Seed/Seed-Coder unsupported?
#13421 closed
May 10, 2025 -
Eval bug: mtmd in server mode crashes on too big image
#13414 closed
May 10, 2025 -
Update server documentation with new mmproj configuration options
#13431 closed
May 10, 2025 -
Misc. bug: Intel container images keep getting `No space left on device` during CI Build
#13052 closed
May 10, 2025 -
Misc. bug: [SYCL] Unexpected "setvars.sh has already been run" warning
#13333 closed
May 10, 2025 -
Eval bug: the swiftui keeps saying the same thing
#12558 closed
May 10, 2025 -
Misc. bug: performance drop with 2x SYCL GPUs
#12575 closed
May 10, 2025 -
-ngl to load ·last n layers· to gpu
#12577 closed
May 10, 2025 -
Compile bug: vulkan-shaders-gen hangs when built with address sanitizers
#12581 closed
May 10, 2025 -
Qwen2.5-vl support and conversion?
#12584 closed
May 10, 2025 -
Eval bug: allocating 114296.55 MiB on device 0: cudaMalloc failed: out of memory
#12586 closed
May 10, 2025 -
server: Bring back multimodal support
#8010 closed
May 9, 2025 -
server : add support for file upload to the Web UI
#11611 closed
May 9, 2025 -
Compile bug: Build breaks with llguidance
#13412 closed
May 9, 2025 -
`CUDA error: invalid configuration argument` for MoEs - `--ubatch-size 8192` exceeds `INT_MAX`
#13376 closed
May 9, 2025 -
Eval bug: mtmd Qwen2.5VL 7B not seeing an image as expected
#13394 closed
May 9, 2025 -
Feature Request: Prefix assistant answer
#11536 closed
May 9, 2025 -
Misc. bug: auto scroll doesn't work in WebUI
#12362 closed
May 9, 2025 -
Eval bug: inference of 32B eats too much memory on ROCM HIP (5x AMD Radeon Instinct Mi50 (gfx906))
#12369 closed
May 9, 2025 -
Feature Request: allow mmap to take advantage of hugepage feature which has 10x speedup
#12444 closed
May 9, 2025 -
Misc. bug: Flash attention on Vulkan
#12526 closed
May 9, 2025 -
Eval bug: seemed it cannot convert theQwen2.5-VL-7B-Instruct, please help advice, Thank you.
#12534 closed
May 9, 2025 -
Eval bug: crash when pooling_type == LLAMA_POOLING_TYPE_MEAN
#12543 closed
May 9, 2025 -
Misc. bug: vulkan: performance regression after fd123cfead49eb32e386e26b8ef7a6d41554dda5
#12553 closed
May 9, 2025 -
Eval bug: Using llama-llava-clip-quantize-cli under CUDA backend conditions will encounter a crash.
#12564 closed
May 9, 2025 -
Misc. bug: The following tests FAILED: 23 - test-arg-parser (Subprocess aborted) main
#13371 closed
May 8, 2025 -
Potential memory allocation leak
#12531 closed
May 7, 2025 -
Eval bug: llama-server crash after 2 messages following commit 9070365
#13329 closed
May 6, 2025 -
Eval bug: -sm row causes wrong output
#13297 closed
May 6, 2025 -
Eval bug: input is too large to process. increase the physical batch size
#12295 closed
May 6, 2025 -
Eval bug: llama.swiftui Unexpectedly found nil while unwrapping an Optional value
#12510 closed
May 6, 2025 -
Misc. bug: test-backend-ops grad crash by GGML_ASSERT error
#12520 closed
May 6, 2025 -
Eval bug: DeepSeek-R1-UD-Q2_K_XL output broken
#13305 closed
May 5, 2025 -
Feature Request: YuE (music gen)
#11467 closed
May 5, 2025 -
Misc. bug: --no-context-shift OR --context-shift ?
#12038 closed
May 5, 2025 -
Eval bug: Gemma-3 vision don't work multilingual
#12351 closed
May 5, 2025 -
Feature Request: New sampling method that boosts reasoning performance - looks too good?
#12479 closed
May 5, 2025 -
Feature Request: deep/ recurrent processing like "thinking", but script based.
#12486 closed
May 5, 2025 -
Compile bug: Error build llama cpp on CUDA
#12491 closed
May 5, 2025 -
tts : add support for SparkTTS
#12495 closed
May 5, 2025 -
Feature Request: Add C api for mtmd
#13124 closed
May 4, 2025 -
Eval bug: b5237 broke Llama Scout
#13287 closed
May 4, 2025 -
Eval bug: Heavy nondeterminism in Qwen3 MoE (CUDA)
#13280 closed
May 4, 2025 -
Misc. bug: Buffer offset is not aligned on macOS / Intel / Vulkan
#10984 closed
May 4, 2025 -
Eval bug: Qwen3-30B-A3B-Q4_K_M: Vulkan ~10% slower than AVX2
#13217 closed
May 4, 2025 -
Research: Performance differences between Metal (macOS) and Vulkan (Linux)
#10982 closed
May 4, 2025 -
Bug tracker: (webui/experimental) Python interpreter via pyodide
#11762 closed
May 4, 2025 -
Eval bug: does llama.cpp support Intel AMX instruction? how to enable it
#12003 closed
May 4, 2025 -
Misc. bug: Using `-c -1` results in `n_ctx = 4294967295` or `n_ctx = 8`
#12414 closed
May 3, 2025 -
Eval bug: RK3588 Unexpected inf values cause garbled output(or core dump) in llama-cli
#12458 closed
May 3, 2025 -
Feature Request: Qwen2.5 0.5b OpenCL backend support
#12463 closed
May 3, 2025 -
Eval bug: MiniCPM-2B-128k convert_hf_to_gguf Missing the required key: rope_scaling
#12468 closed
May 3, 2025 -
Eval bug: Cannot convert nomic-embed-code to gguf
#13242 closed
May 2, 2025
125 Issues opened by 106 people
-
Misc. bug: llama-bench improper tensor split
#13972 opened
Jun 2, 2025 -
context shifting should be default option?
#13971 opened
Jun 2, 2025 -
make using shifting context easier.
#13969 opened
Jun 2, 2025 -
Eval bug: Unable to load the model on GPU
#13967 opened
Jun 2, 2025 -
Eval bug: llama.cpp crashes in string comparison when using a reasoning model for long periods of time
#13965 opened
Jun 2, 2025 -
Feature Request: WINA
#13964 opened
Jun 2, 2025 -
Misc. bug: Using draft model with Gemma producing error "get_logits_ith: invalid logits id 0"
#13963 opened
Jun 2, 2025 -
Eval bug: llama-mtmd-cli : option --image failed to load image
#13959 opened
Jun 1, 2025 -
Eval bug: llama-tts abort
#13955 opened
Jun 1, 2025 -
Feature Request: Regarding Hardcoded GGML Tensor Name Length Limit (GGML_MAX_NAME)
#13947 opened
May 31, 2025 -
Feature Request: Generate Image Embeddings with llama.cpp
#13913 opened
May 30, 2025 -
Android build on GPU not comparable with CPU?
#13910 opened
May 30, 2025 -
Compile bug: nvcc fatal : Unsupported gpu architecture 'compute_'
#13893 opened
May 29, 2025 -
Misc. bug: linux/arm64 does not exist for the server docker image
#13891 opened
May 29, 2025 -
Eval bug: std::runtime_error Invalid diff:
#13876 opened
May 28, 2025 -
Feature Request: Make the `/completion` endpoint in `llama-server` work with multimodal models
#13872 opened
May 28, 2025 -
Misc. bug: Reasoning content is not separated when streaming
#13867 opened
May 28, 2025 -
Automatic optimization of runtime parameters such as -ngl given memory constraints
#13860 opened
May 28, 2025 -
Feature Request: Optimize for Nvidia Jetson Series' truly Unified Memory Architecture
#13856 opened
May 28, 2025 -
Eval bug: Embeddings Always returned as non
#13854 opened
May 28, 2025 -
Feature Request: Set default of --numa to distribute
#13850 opened
May 28, 2025 -
Dequantize function: Row misalignment in dequantized tensors - only first column matches original
#13839 opened
May 28, 2025 -
Eval bug: Llama 4 Scout/Maverick crash when processing images with certain aspect ratio
#13827 opened
May 27, 2025 -
Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server
#13825 opened
May 27, 2025 -
Eval bug: seed seems to be locked to a single value 4294967295
#13823 opened
May 27, 2025 -
Eval bug: Cannot load Qwen3 ranking models
#13820 opened
May 27, 2025 -
ERROR:hf-to-gguf:Model MllamaForConditionalGeneration is not supported
#13805 opened
May 26, 2025 -
ERROR:hf-to-gguf:Model Qwen2_5_VLModel is not supported
#13802 opened
May 26, 2025 -
Compile bug: Vulkan Build Fails in Termux/Proot Due to Missing Cooperative Matrix Shader Variables
#13801 opened
May 26, 2025 -
SYCL fails to initialize unless iGPU is disabled (Intel Arc A770 + i5-9500)
#13775 opened
May 25, 2025 -
Misc. bug: Decreased success rate for tool calling
#13769 opened
May 25, 2025 -
Misc. bug: llama-cli.exe stopped working on Windows Server 10
#13767 opened
May 25, 2025 -
Misc. bug: vulkan prompt processing suddenly slows down once I reach a certain prompt size
#13765 opened
May 25, 2025 -
Misc. bug: segfault in test-gbnf-validator
#13762 opened
May 24, 2025 -
Feature Request: video support in mtmd-cli / server
#13754 opened
May 24, 2025 -
Feature Request: Add keep_alive function for llama-server
#13748 opened
May 24, 2025 -
Feature Request: --swa-extra parameter needed to restore speculative decode function with SWA
#13747 opened
May 24, 2025 -
Misc. bug: RUNPATH properties are not properly set
#13740 opened
May 24, 2025 -
Open-source dataset for low-bit quantization?
#13736 opened
May 24, 2025 -
Eval bug: Server and mtmd both crashing when starting Ultravox
#13727 opened
May 23, 2025 -
Feature Request: (webui) do not throw away the message if there is an error in the stream
#13709 opened
May 22, 2025 -
Eval bug: Server Returns Empty Responses Under High Load
#13703 opened
May 22, 2025 -
Misc. bug: ./llama-server API max_completion_tokens Parameter Not Working
#13700 opened
May 22, 2025 -
Eval bug: Qwen2.5-VL-7B-Instruct returns extremely inaccurate bbox coordinates
#13694 opened
May 21, 2025 -
Eval bug: std::regex to split the text
#13691 opened
May 21, 2025 -
Eval bug: swa_full = true is slower than false
#13683 opened
May 21, 2025 -
Feature Request: Falcon-H1
#13681 opened
May 21, 2025 -
devops/nix: `flake.lock` is badly outdated
#13679 opened
May 21, 2025 -
Misc. bug: AMX is not ready to be used!
#13678 opened
May 21, 2025 -
Eval bug: SYCL branch produces mul_mat bug when trying to run.
#13674 opened
May 21, 2025 -
Eval bug: Output garbled in dual-GPU environment
#13673 opened
May 21, 2025 -
Feature Request: Llama-bench improvement
#13671 opened
May 20, 2025 -
Feature Request: Procedure for reproducing test models
#13662 opened
May 20, 2025 -
Eval bug: Not splitting model across rows correctly
#13661 opened
May 20, 2025 -
Compile bug: GPU Detection Fails during cmake --build
#13636 opened
May 19, 2025 -
Feature Request: Support for Qwen with Parallel Scaling
#13632 opened
May 19, 2025 -
Can't quantize llama3 with an expanded tokenizer
#13628 opened
May 19, 2025 -
webui: First user prompt sometimes disappears after sending
#13622 opened
May 18, 2025 -
Misc. bug: batch in mtmd-cli.cpp not freed
#13620 opened
May 18, 2025 -
Feature Request: update README for ideal MoE tensor override calculation
#13616 opened
May 18, 2025 -
Compile bug: tools build failing
#13614 opened
May 18, 2025 -
llama_model_load: error loading model: error loading model vocabulary: std::bad_cast
#13613 opened
May 18, 2025 -
Misc. bug: --split-mode none ≠ --tensor-split 100,0,0 (all layers on GPU0)
#13612 opened
May 18, 2025 -
Eval bug: A100 GPU not working with CUDA 12.8 in llama.cpp
#13609 opened
May 17, 2025 -
Misc. bug: logit-bias doesn't seem to work
#13605 opened
May 17, 2025 -
Eval bug: GGUF Conversion from LLaVA 1.6 (LLaVA NeXT) doesn't work
#13593 opened
May 16, 2025 -
Feature Request: dynamic number of experts (hyperparam per request)
#13572 opened
May 15, 2025 -
Feature Request: Save Model Name in Conversation Chats (WebUI)
#13570 opened
May 15, 2025 -
Something wrong with llama-server? Slow vs llama-cli
#13560 opened
May 15, 2025 -
Misc. bug: missing messages in JSON export via llama-server web UI
#13552 opened
May 14, 2025 -
Misc. bug: Potential out-of-bounds access in rerank
#13549 opened
May 14, 2025 -
Misc. bug: -sm row results in gibberish output on HIP (ROCm 6.3.3)
#13545 opened
May 14, 2025 -
tutorials : list for llama.cpp
#13523 opened
May 14, 2025 -
Research: How to integrate VITA 1.5 for multi-modal GGUF deployment?
#13520 opened
May 14, 2025 -
Feature Request: Apple just released Fast-VLM, a very promising set of multimodal language models
#13512 opened
May 13, 2025 -
Misc. bug: llama-cli stopped starting in release b4191 (c9b00a7)
#13498 opened
May 13, 2025 -
kv-cache : improve defrag logic
#13497 opened
May 13, 2025 -
Eval bug: BGE-M3 Embedding model is not accessible
#13494 opened
May 13, 2025 -
Eval bug: I fine-tuned a gpt2 model with LoRA and saved it to a GGUF file, but it doesn't work properly
#13489 opened
May 12, 2025 -
Partial offload support for training
#13486 opened
May 12, 2025 -
LoRA training example
#13485 opened
May 12, 2025 -
web UI either doesn't scroll or jumps to the wrong element
#13479 opened
May 12, 2025 -
Eval bug: I cannot run llama 405b on CPU
#13475 opened
May 12, 2025 -
Why is mul_mat in ggml slower than in llama.cpp?
#13473 opened
May 12, 2025 -
How to start a gemma3 multimodal model service using llama-server
#13465 opened
May 12, 2025 -
Feature Request: add draft model in llama-bench and more.
#13456 opened
May 11, 2025 -
Eval bug: llama-mtmd-cli doesn't support system prompts
#13454 opened
May 11, 2025 -
Misc. bug: Illegal CUDA memory access in ggml_backend_cuda_cpy_tensor_async
#13449 opened
May 11, 2025 -
Drop support for sentencepiece
#13448 opened
May 11, 2025 -
Compile bug: ld returned 1 exit status (file bigger than 2gb)
#13446 opened
May 11, 2025 -
Eval bug: llama-speculative core dump with Qwen3, GGML_ASSERT(batch.n_tokens > 0) failed
#13433 opened
May 10, 2025 -
Misc. bug: The web UI of llama-server is not displaying correctly.
#13428 opened
May 10, 2025 -
Eval bug: Qwen3-30B-A3B-Q4_K_M: Slows down when using the \no_think mode.
#13427 opened
May 10, 2025 -
Differential mode for llama-bench + plotting code
#13408 opened
May 9, 2025 -
Eval bug: llama-cli, Qwen3 jinja template will break CLI multiturn conversation
#13404 opened
May 9, 2025 -
Eval bug: llama-cli, spurious token added to assistant response
#13402 opened
May 9, 2025 -
Misc. bug: Model not loaded on Android with NDK
#13399 opened
May 9, 2025 -
Misc. bug: invalid regex grammar causes segment violation
#13390 opened
May 8, 2025 -
Compile bug: ninja: build stopped: subcommand failed.
#13375 opened
May 8, 2025 -
Token Generation Speed Decline with GGUF Models on M3 Ultra
#13373 opened
May 8, 2025 -
(Discussion) Improve usability of llama-server
#13367 opened
May 7, 2025 -
Misc. bug: Extended swap/unswap times when loading large models on Apple Silicon
#13361 opened
May 7, 2025 -
Compile bug: clang-18.1.3 compile fails (vsetivli)
#13358 opened
May 7, 2025 -
Misc. bug: error in remote conversion for the new ServiceNow Nemotron 15B model
#13354 opened
May 7, 2025 -
Eval bug: Regex
#13347 opened
May 7, 2025 -
Eval bug: Custom model error.
#13318 opened
May 5, 2025 -
Feature Request: tensor split needs control over where CPU layers go
#13314 opened
May 5, 2025 -
Misc. bug: Compilation with OpenCL on latest build
#13300 opened
May 4, 2025 -
Eval bug: Can't run Qwen3-32B Q4_K_XL
#13298 opened
May 4, 2025 -
Misc. bug: -TS doesn't support more than ? Devices
#13293 opened
May 4, 2025 -
Compile bug: paths with spaces fail on Unix with Vulkan backend
#13288 opened
May 3, 2025 -
Misc. bug: Completions hang after CUDA error, but health endpoint reports all OK
#13281 opened
May 3, 2025 -
Misc. bug: llama-server webui overriding command line parameters
#13277 opened
May 3, 2025 -
Feature Request: Granite 4 Support
#13275 opened
May 2, 2025 -
Feature Request: add per-request "reasoning" options in llama-server
#13272 opened
May 2, 2025
133 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client
#13196 commented on
May 31, 2025 • 14 new comments -
[CANN] Simplify the environment variable setting for GGML_CANN_MEM_POOL and GGML_CANN_ASYNC_MODE
#13104 commented on
May 23, 2025 • 10 new comments -
llama : try loading tensors with pre-computed hashes
#13106 commented on
May 25, 2025 • 8 new comments -
Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (TMAC)
#13206 commented on
May 28, 2025 • 7 new comments -
Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method
#10181 commented on
May 15, 2025 • 4 new comments -
Support start strings, the opposite of stop tokens.
#13214 commented on
May 3, 2025 • 3 new comments -
(draft) tts: Orpheus support
#12487 commented on
May 18, 2025 • 2 new comments -
Feature Request: Installable package via winget
#8188 commented on
May 29, 2025 • 0 new comments -
Eval bug: Persistent <think> Tags in Qwen3-32B Output Despite enable_thinking: False and --reasoning-format none in llama.cpp
#13189 commented on
May 30, 2025 • 0 new comments -
Misc. bug: llama-parallel segmentation fault
#13172 commented on
May 30, 2025 • 0 new comments -
Eval bug: Unreadable output when using qwen2-vl model.
#13165 commented on
May 30, 2025 • 0 new comments -
Misc. bug: Shared libraries don't properly contain /common/ functions
#13156 commented on
May 30, 2025 • 0 new comments -
Compile bug: Vulkan Cross compile for arm64
#13068 commented on
May 30, 2025 • 0 new comments -
Eval bug: Can't utilize all 16 threads / 8 CPU cores for prompt processing when using llama-server; works fine with llama-cli
#13197 commented on
May 31, 2025 • 0 new comments -
Eval bug: SIGILL
#13161 commented on
May 31, 2025 • 0 new comments -
Misc. bug: xcframework does not contain support for Catalyst
#12751 commented on
May 31, 2025 • 0 new comments -
Misc. bug: terminate called after throwing an instance of 'vk::DeviceLostError'
#13248 commented on
Jun 1, 2025 • 0 new comments -
Eval bug: -sm row causes GGML_ASSERT fail in Llama 4 Scout
#13240 commented on
Jun 1, 2025 • 0 new comments -
Eval bug: llama-server stays in an unresponsive state - CUDA error: out of memory
#13085 commented on
Jun 1, 2025 • 0 new comments -
Misc. bug: (clip.cpp) q8_0 mmproj is broken on gemma 3
#13025 commented on
Jun 1, 2025 • 0 new comments -
Misc. bug: convert_hf_to_gguf.py: ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.SCB'
#12923 commented on
Jun 1, 2025 • 0 new comments -
Feature Request: (webui) Implement experimental features on the webui
#11662 commented on
Jun 1, 2025 • 0 new comments -
Misc. bug: The model's reasoning performance has significantly decreased despite using different versions of the same model architecture, identical parameters, and the same set of questions.
#12816 commented on
May 29, 2025 • 0 new comments -
Feature request: Graphical GGUF viewer
#6715 commented on
May 29, 2025 • 0 new comments -
Misc. bug: Flash Attention not working on CDNA3 ROCm 6.4 MI300
#13145 commented on
May 29, 2025 • 0 new comments -
Feature Request: Multimodal CLI tools: add the ability to specify an image in conversation mode, plus tab auto-completion for paths
#12983 commented on
May 27, 2025 • 0 new comments -
Feature Request: (webui) add import / export function for ALL conversations
#11718 commented on
May 27, 2025 • 0 new comments -
Support for Jamba (JambaForCausalLM)
#6372 commented on
May 27, 2025 • 0 new comments -
Support Hybrid Models
#12331 commented on
May 27, 2025 • 0 new comments -
Feature Request: Add convert.py support for the Qwen2.5-Omni-7B model
#12641 commented on
May 26, 2025 • 0 new comments -
Feature Request: Qwen2.5-Omni
#12673 commented on
May 26, 2025 • 0 new comments -
Misc. bug: [SERVER] Multiple slots, generation speed is degraded after each generation/slot used
#10860 commented on
May 26, 2025 • 0 new comments -
Compile bug: Linux with CUDA 12.6
#11696 commented on
May 26, 2025 • 0 new comments -
Enhancement: Improve ROCm performance on various quants (benchmarks included)
#11931 commented on
May 26, 2025 • 0 new comments -
Eval bug: CPU usage is abnormal when running deepseek-r1-671B-Q4_0 weights on an Atlas 800T A2 NPU device.
#11966 commented on
May 26, 2025 • 0 new comments -
Eval bug: Loading fails on Gemma 3:12b > llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
#12367 commented on
May 26, 2025 • 0 new comments -
Move gguf fuzzers to the llama.cpp repository
#11514 commented on
May 13, 2025 • 0 new comments -
llama : add Xiaomi Mimo (with proper MTP - multi-token prediction)
#13236 commented on
May 4, 2025 • 0 new comments -
Fix ChatGLMModel for glm-4-9b: cannot find tokenizer merges in model file
#13058 commented on
May 6, 2025 • 0 new comments -
quantize: Handle user-defined pruning of whole layers (blocks)
#13037 commented on
May 25, 2025 • 0 new comments -
gguf-py: byteswapping improvements
#12851 commented on
May 27, 2025 • 0 new comments -
convert : write tensors in parallel
#12837 commented on
Jun 2, 2025 • 0 new comments -
Support for OuteTTS 1.0
#12794 commented on
May 20, 2025 • 0 new comments -
Update llama-quant.cpp llama_tensor_get_type with DeepSeek friendly modifications
#12727 commented on
May 29, 2025 • 0 new comments -
imatrix: add option to display importance score statistics for a given imatrix file
#12718 commented on
May 26, 2025 • 0 new comments -
WIP: Add support for CogAgent
#12679 commented on
May 29, 2025 • 0 new comments -
tts : implement sesame CSM + Mimi decoder
#12648 commented on
May 16, 2025 • 0 new comments -
PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp
#12326 commented on
May 23, 2025 • 0 new comments -
vulkan: optimization proposals for coopmat1 mul_mm
#12260 commented on
May 10, 2025 • 0 new comments -
[WIP]backend: Integrating QNN (Qualcomm AI Engine Direct) as a dedicated backend for Qualcomm NPUs
#12063 commented on
May 27, 2025 • 0 new comments -
Supporting Velvet model
#11716 commented on
May 16, 2025 • 0 new comments -
Allow users to compile with any CUDA version using GitHub Actions
#10928 commented on
May 23, 2025 • 0 new comments -
Introduce Graph Profiler
#9659 commented on
May 15, 2025 • 0 new comments -
[Draft] Tensor Parallel support to llama.cpp
#9648 commented on
May 31, 2025 • 0 new comments -
llama : initial Mamba-2 support
#9126 commented on
May 30, 2025 • 0 new comments -
ggml: avoid rebuild of GGML graph for each token (#7456)
#8366 commented on
Jun 1, 2025 • 0 new comments -
`json`: unified properties order across optional & required
#8133 commented on
May 26, 2025 • 0 new comments -
Add PaliGemma Support
#7553 commented on
Jun 1, 2025 • 0 new comments -
Llama cpp low level python bindings
#1660 commented on
Jun 1, 2025 • 0 new comments -
Eval bug: microsoft/bitnet-b1.58-2B-4T-gguf
#12997 commented on
Jun 2, 2025 • 0 new comments -
Feature Request: s390x CI
#13243 commented on
Jun 2, 2025 • 0 new comments -
Slow token generation speed of Gemma 3 QAT Models
#13048 commented on
Jun 2, 2025 • 0 new comments -
Misc. bug: OpenCL: Issue with Adreno 610
#13115 commented on
Jun 2, 2025 • 0 new comments -
Eval bug: sentencepiece tokenizer generates incorrect tokens
#13256 commented on
Jun 2, 2025 • 0 new comments -
Misc. bug: the output file of llama-quantize is not in GGUF format
#13258 commented on
Jun 2, 2025 • 0 new comments -
Misc. bug: Server does not always cancel requests for disconnected connections
#13262 commented on
Jun 2, 2025 • 0 new comments -
Feature Request: Support Codestral Mamba
#8519 commented on
Jun 1, 2025 • 0 new comments -
Misc. bug: llama-quantize clobbers input file + crashes when output file matches
#12753 commented on
May 14, 2025 • 0 new comments -
Feature Request: resize an existing context
#11577 commented on
May 15, 2025 • 0 new comments -
Feature Proposal: Server Model Switching at Runtime
#13027 commented on
May 15, 2025 • 0 new comments -
Eval bug: Weight repacking for AVX2 block interleaving is very slow and NUMA unfriendly
#12759 commented on
May 16, 2025 • 0 new comments -
Eval bug: multimodal llama-gemma3-cli gives nonsensical outputs when used with Vulkan
#13046 commented on
May 16, 2025 • 0 new comments -
Feature Request: Interleaved sliding window attention support for gemma 2 and 3
#12637 commented on
May 16, 2025 • 0 new comments -
Misc. bug: ROCm images cannot be found
#11913 commented on
May 17, 2025 • 0 new comments -
llama : add CLI assistant
#10688 commented on
May 19, 2025 • 0 new comments -
Eval bug: Qwen3 30B A3B Q4_0 failed to run
#13168 commented on
May 19, 2025 • 0 new comments -
llama : combined beam search + grammar sampling strategy
#2923 commented on
May 19, 2025 • 0 new comments -
Eval bug: RWKV inference issue with llama-server
#13018 commented on
May 20, 2025 • 0 new comments -
Eval bug: OpenAI-incompatible image handling in server multimodal
#12947 commented on
May 20, 2025 • 0 new comments -
Eval bug: repeated output for llama-server
#12782 commented on
May 20, 2025 • 0 new comments -
changelog : `libllama` API
#9289 commented on
May 20, 2025 • 0 new comments -
Perplexity script for non-GGUF quantization
#13015 commented on
May 21, 2025 • 0 new comments -
Eval bug: LLaVa convert_image_encoder_to_gguf.py fails to byteswap v.head.ffn_up.bias tensor on Big-Endian system
#12863 commented on
May 23, 2025 • 0 new comments -
Misc. bug: CUDA error: device kernel image is invalid (Quadro RTX 8000)
#12717 commented on
May 3, 2025 • 0 new comments -
KV cache bug: llama-speculative and llama-server choose different kv cache quantization when cache quantization specified
#11200 commented on
May 3, 2025 • 0 new comments -
Feature Request: Support multimodal LLMs such as Qwen2.5-VL as embedding models
#13247 commented on
May 3, 2025 • 0 new comments -
Compile bug: Cannot convert from char8_t to char* in llama-chat.cpp
#12740 commented on
May 5, 2025 • 0 new comments -
Feature Request: Tensor parallelism (--split-mode row) over RPC
#13083 commented on
May 5, 2025 • 0 new comments -
Eval bug: Qwen3 30B A3B is slow with CUDA
#13211 commented on
May 5, 2025 • 0 new comments -
bug: ValueError: Architecture qwen3 not supported
#13157 commented on
May 6, 2025 • 0 new comments -
Eval bug: IQ2_M broken for mradermacher / Llama-4-Maverick-17B-128E-Instruct-GGUF
#12913 commented on
May 7, 2025 • 0 new comments -
Compile bug: Emulated Linux ARM64 CPU build fails
#10933 commented on
May 7, 2025 • 0 new comments -
Feature Request: Add Support for ModernBert
#11282 commented on
May 7, 2025 • 0 new comments -
[Tracker] Docker build fails on CI for arm64
#11888 commented on
May 7, 2025 • 0 new comments -
examples : add configuration presets
#10932 commented on
May 7, 2025 • 0 new comments -
Feature Request: (webui) read data from /props endpoint and use it on the webui
#11717 commented on
May 8, 2025 • 0 new comments -
Feature Request: allow setting jinja chat template from server webui
#11689 commented on
May 8, 2025 • 0 new comments -
Misc. bug: Qwen 3.0 "enable_thinking" parameter not working
#13160 commented on
May 8, 2025 • 0 new comments -
Feature Request: Qwen 2.5 VL
#11483 commented on
May 12, 2025 • 0 new comments -
Feature Request: XiaomiMiMo/MiMo-7B-RL
#13218 commented on
May 13, 2025 • 0 new comments -
Compile bug: There was an error while compiling support for the Vulkan backend
#12619 commented on
May 23, 2025 • 0 new comments -
Feature Request: Add support for Kokoro TTS
#11050 commented on
May 23, 2025 • 0 new comments -
Refactor: (clip.cpp) identify and regroup pre-processing strategies
#13077 commented on
May 24, 2025 • 0 new comments -
Model Repeats Nonsensical Output
#13066 commented on
May 24, 2025 • 0 new comments -
Misc. bug: The KV cache is sometimes truncated incorrectly when making v1/chat/completions API calls
#11970 commented on
May 24, 2025 • 0 new comments -
Misc. bug: Retrieval sample not decoding token successfully
#13102 commented on
May 25, 2025 • 0 new comments -
Compile bug: common.cuh(3): fatal error c1083 cannot open include file: "ggml.h" : No such file or directory
#13073 commented on
May 25, 2025 • 0 new comments -
Misc. bug: llama-cli (vulkan backend) output gibberish with old vulkan sdk
#13044 commented on
May 25, 2025 • 0 new comments -
Misc. bug: Vulkan performance depends on thread priority
#12976 commented on
May 25, 2025 • 0 new comments -
Eval bug: Gemma 3 extremely slow prompt processing when using quantized KV cache.
#12352 commented on
May 25, 2025 • 0 new comments -
Eval bug: EXAONE fails to run with quantized KV cache
#13121 commented on
May 26, 2025 • 0 new comments -
Feature Request: Kimi-Audio-7B
#13114 commented on
May 26, 2025 • 0 new comments -
Feature Request: define key bindings for quick deletion of the previous conversation.
#13111 commented on
May 26, 2025 • 0 new comments -
Eval: HIP: Llama-server multi-instance lockup
#13100 commented on
May 26, 2025 • 0 new comments -
Eval bug: Flash Attention not working with NVIDIA GeForce RTX 4060 Ti
#13092 commented on
May 26, 2025 • 0 new comments -
Feature Request: Add kv-quant fa kernel variants for head sizes other than 128
#12989 commented on
May 26, 2025 • 0 new comments -
Compile bug: llama.cpp-master/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:80:54: error: '_mm256_set_m128i' was not declared in this scope
#11385 commented on
May 21, 2025 • 0 new comments -
Feature Request: Support Jina V3 arch
#9585 commented on
May 21, 2025 • 0 new comments -
Error while converting a PEFT fine-tuned merged model to GGUF
#12494 commented on
May 21, 2025 • 0 new comments -
Feature Request: Mapping model name to LoRA config
#11031 commented on
May 21, 2025 • 0 new comments -
changelog : `llama-server` REST API
#9291 commented on
May 21, 2025 • 0 new comments -
Misc. bug: Inconsistent Vulkan segfault
#10528 commented on
May 21, 2025 • 0 new comments -
Eval bug: Error when load `bge-reranker-v2-gemma` model
#13041 commented on
May 22, 2025 • 0 new comments -
Feature Request: Ability to pack multiple GGUFs into a single one
#13028 commented on
May 22, 2025 • 0 new comments -
[Build] Some build options/definitions seem to be missing in ggml-base
#13017 commented on
May 22, 2025 • 0 new comments -
Compile bug: Prooted Debian in Droid Termux only
#12452 commented on
May 22, 2025 • 0 new comments -
Feature Request: support for image input in llama-server (and web ui)
#12792 commented on
May 22, 2025 • 0 new comments -
Feature Request: make Jina embeddings models available to convert to GGUF
#12327 commented on
May 22, 2025 • 0 new comments -
Eval bug: AttributeError: Moonlight-16B-A3B-Instruct - TikTokenTokenizer has no attribute vocab
#13072 commented on
May 23, 2025 • 0 new comments -
Doc. bug: docs/multimodal/gemma3.md need to be updated
#13064 commented on
May 23, 2025 • 0 new comments -
Compile bug: NVIDIA A800-SXM4-40GB ggml_cuda_init failed
#13059 commented on
May 23, 2025 • 0 new comments -
Eval bug: Vulkan: "Requested buffer size exceeds device memory allocation limit" even with `-ngl 0` when trying to run very large models
#13024 commented on
May 23, 2025 • 0 new comments -
Feature Request: Improve model load time when using the RPC backend
#12954 commented on
May 23, 2025 • 0 new comments