[pull] master from ggml-org:master by pull[bot] · Pull Request #475 · rpatil524/llama.cpp
[pull] master from ggml-org:master #475

Merged · merged 314 commits on Jul 15, 2025

Changes from 1 commit (of 314 commits)
ed52f36
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
ShanoToni Jun 12, 2025
c33fe8b
vocab : prevent heap overflow when vocab is too small (#14145)
ggerganov Jun 13, 2025
09cf2c7
cmake : Improve build-info.cpp generation (#14156)
ckastner Jun 13, 2025
c61285e
SYCL: Bump oneMath commit (#14152)
EwanC Jun 13, 2025
0889eba
sycl: Adding additional cpy dbg print output (#14034)
ShanoToni Jun 13, 2025
ffad043
server : fix SWA condition for full context reprocess (#14163)
ggerganov Jun 13, 2025
d714dad
pooling : make cls_b and cls_out_b optional (#14165)
huydt84 Jun 13, 2025
cc8d081
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
ckastner Jun 13, 2025
b7cc774
readme : remove survey link (#14168)
ggerganov Jun 13, 2025
60c6663
batch : rework llama_batch_allocr (#14153)
ggerganov Jun 13, 2025
26ff368
docs : Update multimodal.md (#14122)
ddpasa Jun 13, 2025
80709b7
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
ggerganov Jun 13, 2025
3cfbbdb
Merge commit from fork
GuyGoldenberg Jun 13, 2025
40643ed
sycl: fix docker image (#14144)
sgeor255 Jun 13, 2025
fb85a28
vocab : fix build (#14175)
ggerganov Jun 13, 2025
2e42be4
compare-llama-bench: add option to plot (#14169)
am17an Jun 14, 2025
3cb203c
llama-chat : Do not throw when tool parsing fails (#14012)
p1-0tr Jun 14, 2025
00ba772
docs : remove WIP since PR has been merged (#13912)
pepijndevos Jun 15, 2025
b9912ac
batch : auto-gen positions + verify multi-sequence input (#14177)
ggerganov Jun 15, 2025
c311ac6
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
ggerganov Jun 15, 2025
9ae4143
model : add dots.llm1 architecture support (#14044) (#14118)
Noeda Jun 15, 2025
5fce5f9
kv-cache : fix use-after-move of defrag info (#14189)
ggerganov Jun 15, 2025
2c2caa4
HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRON…
IMbackK Jun 15, 2025
e54b394
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
IMbackK Jun 15, 2025
30e5b01
quantize : change int to unsigned int for KV overrides (#14197)
EAddario Jun 15, 2025
cd355ed
server : When listening on a unix domain socket don't print http:// a…
ericcurtin Jun 15, 2025
d7da8dc
model : Add support for Arcee AI's upcoming AFM model (#14185)
bartowski1182 Jun 15, 2025
3555b30
ggml-cpu : rework weak alias on apple targets (#14146)
xctan Jun 16, 2025
c89c2d1
vulkan: mutex around vkQueueSubmit (#14127)
jeffbolznv Jun 16, 2025
4ad2436
gguf-py : allow key override when adding value to GGUFWriter (#14194)
huydt84 Jun 16, 2025
0bf49eb
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
bartowski1182 Jun 16, 2025
3ba0d84
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
chaxu01 Jun 16, 2025
d3e64b9
llama : rework embeddings logic (#14208)
ggerganov Jun 16, 2025
7d6d91b
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
IMbackK Jun 16, 2025
ad590be
model : add NeoBERT (#14164)
huydt84 Jun 16, 2025
0dbcabd
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
bandoti Jun 16, 2025
6adc3c3
llama : add thread safety test (#14035)
slaren Jun 16, 2025
89fea80
server : fix incorrect usage of llama_get_embeddings() (#14225)
ggerganov Jun 16, 2025
e434e69
common : suggest --jinja when autodetection fails (#14222)
CISC Jun 16, 2025
fe9d60e
musa: fix build warning (unused variable) (#14231)
yeahdongcn Jun 17, 2025
860a9e4
ggml-cpu : remove the weak alias trick (#14221)
xctan Jun 17, 2025
c465030
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
bandoti Jun 17, 2025
c2056ed
examples : include examples in msvc disable warn (ggml/1270)
danbev Jun 12, 2025
bbe98d2
ggml : remove unused ggml_context_container (ggml/1272)
danbev Jun 13, 2025
dd8e59f
ggml : disable warnings for tests when using MSVC (ggml/1273)
danbev Jun 13, 2025
d03172c
sync : ggml
ggerganov Jun 18, 2025
3865cff
convert : fix null head_dim AutoConfig regression (#14248)
CISC Jun 18, 2025
9540255
llama-chat : fix multiple system message for gemma, orion (#14246)
ngxson Jun 18, 2025
413977d
mtmd : refactor llava-uhd preprocessing logic (#14247)
ngxson Jun 18, 2025
ef03580
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258)
chaxu01 Jun 18, 2025
6231c5c
ggml-cpu: fix uncaught underscore terminators (#14023)
taronaeo Jun 18, 2025
50d2227
ggml-cpu: reduce asm calls for hsum (#14037)
taronaeo Jun 18, 2025
8d94713
docs: add s390x build documentation (#14264)
taronaeo Jun 18, 2025
ed3290a
metal : add mean kernel (#14267)
ggerganov Jun 19, 2025
edc4a29
memory : Hybrid recurrent cache (#13979)
gabe-l-hart Jun 19, 2025
10bb545
Vulkan: Set device max size for host memory to avoid OOM warning and …
0cc4m Jun 19, 2025
faed5a5
llamafile : support s390x SIMD instruction set (#14273)
taronaeo Jun 19, 2025
5fc7856
convert : fix remote option in Windows (#14100)
pqnet Jun 19, 2025
fffcce5
llama-bench : add --no-warmup flag (#14224) (#14270)
s2010 Jun 19, 2025
600e3e9
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
ShanoToni Jun 19, 2025
456af35
build : suppress gcc15 compile warnings (#14261)
fanyang89 Jun 19, 2025
d67341d
server : add server parameters for draft model cache type (#13782)
aa956 Jun 19, 2025
381174b
gguf-py : make sentencepiece optional (#14200)
Ahajha Jun 19, 2025
8f71d0f
ggml-cpu : remove unnecesary arm feature detection (#14281)
slaren Jun 19, 2025
9eaa51e
CUDA: add conv_2d_dw (#14265)
am17an Jun 20, 2025
4c9fdfb
ubatch : new splitting logic (#14217)
ggerganov Jun 20, 2025
812939a
model : more uniform output id handling (#14275)
ggerganov Jun 20, 2025
9230dbe
ggml: Update KleidiAI to v1.9.0 (#14277)
chaxu01 Jun 20, 2025
d27b3ca
ggml : fix repack work size for mul_mat_id (#14292)
ggerganov Jun 20, 2025
e28c1b9
cuda : synchronize graph capture and cublas handle destruction (#14288)
slaren Jun 20, 2025
88fc854
llama : improve sep token handling (#14272)
CISC Jun 20, 2025
6369be0
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
ckastner Jun 20, 2025
8308f98
sycl: add usage of enqueue_functions extension (#14244)
s-Nick Jun 20, 2025
dd6e6d0
vocab : prevent tokenizer overflow (#14301)
retr0reg Jun 20, 2025
22015b2
lint : remove trailing whitepace (#14304)
CISC Jun 20, 2025
c959f46
CUDA: add conv_2d_transpose (#14287)
am17an Jun 20, 2025
d860dd9
docs : fix the link to llama.h (#14293)
david20571015 Jun 20, 2025
b714767
Add `ggml_roll` (ggml/1274)
Acly Jun 18, 2025
06cbedf
sync : ggml
ggerganov Jun 20, 2025
b23fa0b
convert : fix Llama 4 conversion (#14311)
danielhanchen Jun 21, 2025
692e3cd
memory : rename interface to llama_memory_context_i (#14296)
ggerganov Jun 21, 2025
67ae531
metal : fix thread-safety (#14300)
ggerganov Jun 21, 2025
58cba76
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)
CISC Jun 21, 2025
bb16041
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (…
mtavenrath Jun 21, 2025
aa0ef5c
gguf-py : fix Qwen3-Embedding eos token (#14314)
CISC Jun 21, 2025
aa064b2
CUDA: add mean operation (#14313)
am17an Jun 22, 2025
40bfa04
common : use std::string_view now that we target c++17 (#14319)
CISC Jun 22, 2025
5d5c066
mtmd : fix Pixtral OOM with large images by capping image_size to 102…
yuiseki Jun 22, 2025
af3373f
HIP: enable vec fattn on RDNA4 (#14323)
IMbackK Jun 22, 2025
f1f5e82
examples : fix is_first logic for tokenization (#14329)
ggerganov Jun 22, 2025
66aba7a
run : avoid double tokenization (#14327)
retr0reg Jun 22, 2025
238005c
gguf-py : fix SpecialVocab parsing when post_processor is null (#14330)
CISC Jun 22, 2025
fa4a9f2
quantize : handle user-defined pruning of whole layers (blocks) (#13037)
EAddario Jun 22, 2025
3a9457d
vulkan: update windows SDK in CI (#14334)
jeffbolznv Jun 23, 2025
7b50d58
kv-cells : fix tracking of seq_pos (#14339)
ggerganov Jun 23, 2025
defe215
CUDA: mul_mat_v support for batch sizes > 1 (#14262)
JohannesGaessler Jun 23, 2025
72c6bc3
llama : better rwkv chat template and add missing `inputs.use_jinja` …
MollySophia Jun 23, 2025
bf2a99e
vulkan: update windows SDK in release.yml (#14344)
jeffbolznv Jun 23, 2025
ce82bd0
ci: add workflow for relocatable cmake package (#14346)
bandoti Jun 23, 2025
0142961
CUDA/HIP: optimize mmv paths taken for HIP devices (#14324)
IMbackK Jun 23, 2025
901e20b
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349)
bartowski1182 Jun 24, 2025
abf2410
main : honor --verbose-prompt on interactive prompts (#14350)
CISC Jun 24, 2025
1b809ce
server : move no API key doc to /health (#14352)
pnb Jun 24, 2025
c148cf1
cmake : use LLAMA_BUILD_NUMBER when defining LLAMA_INSTALL_VERSION (#…
mbaudier Jun 24, 2025
62af464
batch : fix check for empty sequences in memory (#14364)
ggerganov Jun 24, 2025
73e53dc
opencl: ref count `ggml_backend_opencl_context` and refactor profilin…
lhez Jun 24, 2025
2bf9d53
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices (#13973)
ShanoToni Jun 25, 2025
b193d53
ggml : do not output unprintable characters on GGUF load failure (#14…
CISC Jun 25, 2025
60ef23d
ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)
taronaeo Jun 25, 2025
716301d
musa: enable fp16 mma (all) and cublas on qy2 (#13842)
yeahdongcn Jun 26, 2025
bf5bcd0
docs: update s390x documentation + add faq (#14389)
taronaeo Jun 26, 2025
5783ae4
metal : batch rows copy in a single threadgroup (#14384)
ggerganov Jun 26, 2025
e8215db
metal : add special-case mat-vec mul for ne00 == 4 (#14385)
ggerganov Jun 26, 2025
b253462
llama : return mistral-v7-tekken as default template only (#14390)
CISC Jun 26, 2025
a01047b
cmake: regen vulkan shaders when shaders-gen sources change (#14398)
bandoti Jun 26, 2025
8846aac
model : gemma3n text-only (#14400)
ngxson Jun 26, 2025
f667f1e
convert : fix broken sentencepiece vocab (#14416)
CISC Jun 27, 2025
8d94219
ggml : add ggml_set_rows (#14274)
rgerganov Jun 27, 2025
4367806
recurrent : call balloc split_reset() in init_batch() (#14414)
ggerganov Jun 27, 2025
72babea
graph : make llm_graph_context destructor virtual (#14410)
ggerganov Jun 27, 2025
ceb1bf5
vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427)
jeffbolznv Jun 28, 2025
6609507
ci : fix windows build and release (#14431)
CISC Jun 28, 2025
b25e927
fix async_mode bug (#14432)
bachelor-dou Jun 28, 2025
566c16f
model : add support for ERNIE 4.5 0.3B model (#14408)
ownia Jun 28, 2025
00d5282
vulkan: lock accesses of pinned_memory vector (#14333)
jeffbolznv Jun 28, 2025
63a7bb3
vulkan: handle noncontig in the final case of ggml_vk_get_cpy_pipelin…
jeffbolznv Jun 28, 2025
27208bf
CUDA: add bf16 and f32 support to cublas_mul_mat_batched (#14361)
am17an Jun 28, 2025
bd9c981
vulkan: Add fusion support for RMS_NORM+MUL (#14366)
jeffbolznv Jun 29, 2025
a0535ff
ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)
CISC Jun 29, 2025
a5d1fb6
ggml : fix unmerged GGML_FPxx_TO_FPxx refactoring (#14443)
CISC Jun 29, 2025
f47c1d7
SYCL: disable faulty fp16 exp kernel (#14395)
qnixsynapse Jun 29, 2025
83790b0
server : fix appearance of the chats list context menu for Safari (#1…
rntk Jun 29, 2025
caf5681
server : support jinja extra template kwargs (Qwen3 enable_thinking f…
matteoserva Jun 29, 2025
e9b6350
scripts : make the shell scripts cross-platform (#14341)
vedranmiletic Jun 30, 2025
c839a2d
cmake : Remove redundant include path in CMakeLists.txt (#14452)
xiaobing318 Jun 30, 2025
eb3fa29
test-backend-ops : disable llama test (#14461)
slaren Jun 30, 2025
a7417f5
ggml-cpu: sycl: Re-enable exp f16 (#14462)
Rbiessy Jun 30, 2025
5dd942d
metal : disable fast-math for some cpy kernels (#14460)
ggerganov Jun 30, 2025
745f11f
memory : correctly handle failure in apply() (#14438)
ggerganov Jun 30, 2025
0a5a3b5
Add Conv2d for CPU (#14388)
am17an Jun 30, 2025
79b33b2
opencl : add GEGLU, REGLU, SWIGLU (#14456)
lhez Jul 1, 2025
497be7c
ggml-quants : rename best_mad to best_error (ggml/1283)
danbev Jun 24, 2025
431b2c2
ggml-cpu : "align corners" for bilinear upscale/downscale (ggml/1285)
Acly Jul 1, 2025
f61c05d
sync : ggml
ggerganov Jul 1, 2025
a6a4795
ggml : remove trailing whitespace (#0)
ggerganov Jul 1, 2025
eff5e45
add GELU_ERF (#14455)
CISC Jul 1, 2025
6a746cf
vulkan: Split large mul_mat_id to fit in shared memory (#14451)
jeffbolznv Jul 1, 2025
343b6e9
CANN: update aclnnGroupedMatmulV2 to aclnnGroupedMatmulV3 (#14411)
noemotiovon Jul 1, 2025
1b2aaf2
Add Vulkan images to docker.md (#14472)
xek Jul 1, 2025
de56944
ci : disable fast-math for Metal GHA CI (#14478)
ggerganov Jul 1, 2025
68b3cd6
ggml : Callback before abort (#14481)
ScaledLizard Jul 2, 2025
85841e1
github : add OpenCL backend to issue templates (#14492)
EZForever Jul 2, 2025
611ba4b
ci : add OpenCL to labeler workflow (#14496)
CISC Jul 2, 2025
603e43d
opencl : update upscale to support align corners (#14488)
lhez Jul 2, 2025
c8a4e47
opencl : skip empty nodes on cgraph compute (#14491)
EZForever Jul 2, 2025
d7f5f4e
simple-chat : fix context-exceeded condition (#14494)
ggerganov Jul 2, 2025
307e79d
opencl : fix possible buffer overflow in dump_tensor (#14490)
jeffzhou2000 Jul 2, 2025
ec68e84
ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435)
ggerganov Jun 27, 2025
8875523
vulkan: support softmax/FA batch and broadcast (#14449)
jeffbolznv Jul 1, 2025
12a81af
CUDA: broadcasting for FlashAttention mask (#14500)
JohannesGaessler Jul 2, 2025
55a1c5a
CUDA: add softmax broadcast (#14475)
am17an Jul 2, 2025
f3ed38d
Set RPATH to "@loader_path" / "$ORIGIN" to ensure executables and dyn…
rotemdan Jul 2, 2025
c46944a
ggml : add version function to get lib version (ggml/1286)
danbev Jul 2, 2025
e17991c
sync : ggml
ggerganov Jul 2, 2025
5d46bab
llama : initial Mamba-2 support (#9126)
compilade Jul 2, 2025
e75ba4c
gguf-py : add support for chat template jinja files (#14508)
CISC Jul 2, 2025
55c2646
CUDA: add dynamic shared mem to softmax, refactor general usage (#14497)
am17an Jul 2, 2025
d4cdd9c
ggml : remove kompute backend (#14501)
ggerganov Jul 3, 2025
9067487
ggml : fix FA mask dim 2 and 3 (#14505)
ggerganov Jul 3, 2025
a70c8a0
kv-cache : use ggml_set_rows (#14285)
ggerganov Jul 3, 2025
0c2ee38
convert : correct gemma 3n conversion (#14450)
ngxson Jul 3, 2025
7b63a71
Fix conditional enabling following arch checks for ggml-sycl (#14504)
s-Nick Jul 3, 2025
c8c4495
ggml: backward pass for split swiglu (#14483)
JohannesGaessler Jul 3, 2025
2b72bed
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
jeffbolznv Jul 3, 2025
bee2842
opencl : broadcast for soft_max (#14510)
lhez Jul 3, 2025
28657a8
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)
CISC Jul 3, 2025
499a8f5
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
luyhcsu Jul 4, 2025
c79184d
batch : add n_used count (#14512)
ggerganov Jul 4, 2025
7b50f7c
graph : prepare for 4D mask (#14515)
ggerganov Jul 4, 2025
67d1ef2
batch : add optional for sequential equal split (#14511)
ggerganov Jul 4, 2025
ef797db
metal : disable fast math in all quantize kernels (#14528)
ggerganov Jul 4, 2025
b81510a
test-backend-ops: add support for specifying output format (#14368)
yeahdongcn Jul 5, 2025
bac8bed
eval-callback : check for empty input (#14539)
ggerganov Jul 5, 2025
6681688
opencl: add GELU_ERF (#14476)
CISC Jul 5, 2025
ddef995
server : fix assistant prefilling when content is an array (#14360)
CISC Jul 5, 2025
a0374a6
vulkan: Handle updated FA dim2/3 definition (#14518)
jeffbolznv Jul 5, 2025
e592be1
vulkan: fix rms_norm+mul fusion (#14545)
jeffbolznv Jul 6, 2025
6491d6e
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
netrunnereve Jul 6, 2025
b9c3eef
CUDA: add bf16 and i32 to getrows (#14529)
am17an Jul 7, 2025
12f55c3
llama : remove ggml_cont where possible (#14568)
CISC Jul 7, 2025
e1a7059
llama : fix incorrect minicpm3 v_states shape (#14571)
CISC Jul 7, 2025
68155c6
musa: fix build warnings (unused variable) (#14561)
yeahdongcn Jul 7, 2025
75c91de
CUDA: add bilinear interpolation for upscale (#14563)
am17an Jul 8, 2025
4d0dcd4
cuda : fix rope with partial rotation and non-cont src (#14580)
ggerganov Jul 8, 2025
53903ae
vulkan: increase timeout for CI (#14574)
jeffbolznv Jul 8, 2025
8f22dc0
model : add hunyuan moe (#14425)
ngxson Jul 8, 2025
17a1f0d
server: Add ability to mount server at prefix (#14544)
oluwandabira Jul 8, 2025
b8eeb87
vulkan : fix rope with partial rotation and non-cont src (#14582)
jeffbolznv Jul 8, 2025
bb4f7a9
memory : fix broken batch splits for recurrent cache (#14575)
compilade Jul 8, 2025
0838286
model : add SmolLM3 (#14581)
ngxson Jul 8, 2025
699f439
model : fix hunyuan moe chat template (#14584)
stevenkuang-tencent Jul 8, 2025
6efcd65
vulkan: optimize flash attention split_k_reduce (#14554)
jeffbolznv Jul 8, 2025
20b7bf8
convert : fix smollm3 jinja template (#14586)
ngxson Jul 9, 2025
0465506
model : add support for Falcon-H1 family (#14534)
ibrahimkhadraoui Jul 9, 2025
1055545
llama : remove unintended whitespace (#14592)
CISC Jul 9, 2025
ffd59e7
model : add skt/A.X-4.0 model vocabulary (#14589)
Bing-su Jul 9, 2025
26a48ad
ggml : prevent integer overflow in gguf tensor size calculation (#14595)
Yuuoniy Jul 9, 2025
98bab63
ggml : add ggml_scale_bias (#14417)
ngxson Jul 9, 2025
4a5686d
llama : support Jamba hybrid Transformer-Mamba models (#7531)
compilade Jul 9, 2025
cb9178f
llama : remove llm_graph_input_one (#14603)
ngxson Jul 9, 2025
a57d1bc
cuda : support Falcon-H1 state size for SSM_SCAN (#14602)
compilade Jul 10, 2025
ac44eb6
cmake : llguidance build parser library only (#14608)
EZForever Jul 10, 2025
f9a867f
cmake : bump llguidance version to v1.0.1 (#14609)
EZForever Jul 10, 2025
435a6d1
llama : minor coding style fix for smollm3 (#14605)
ngxson Jul 10, 2025
704bb7a
SYCL: Initial set_rows kernel implementation (#14562)
qnixsynapse Jul 10, 2025
a457551
cmake : do not search for curl libraries by ourselves (#14613)
EZForever Jul 10, 2025
11ee0fe
Docs: script to auto-generate ggml operations docs (#14598)
am17an Jul 10, 2025
4bb625b
Smoldocling support (#14597)
ryan-mangeno Jul 10, 2025
0b88557
opencl: add `set_rows` for `f16` and `f32` (#14547)
lhez Jul 10, 2025
6bdda13
opencl: add tiled mul_mat_f16_f32 (#14535)
rmatif Jul 10, 2025
0aedae0
model : Granite Four (#13550)
gabe-l-hart Jul 11, 2025
576c82e
vocab : add midm-2.0 model pre-tokenizer (#14626)
Bing-su Jul 11, 2025
0d5375d
llama : move enum llama_vocab_pre_type to implementation (#14631)
ggerganov Jul 11, 2025
aaa088d
readme : add hot PRs (#14636)
ggerganov Jul 11, 2025
756aa10
HIP : Add HIP 7.0+ compatibility for hipBLAS compute types (#14634)
slojosic-amd Jul 11, 2025
f5e96b3
model : support LiquidAI LFM2 hybrid family (#14620)
tdakhran Jul 11, 2025
98197e5
vulkan: optimizations for deepseek prompt processing (#14555)
jeffbolznv Jul 12, 2025
b3ad3a0
vulkan: support SET_ROWS (#14587)
jeffbolznv Jul 12, 2025
0c1df14
server : fix pooled embedding output (#14645)
iamlemec Jul 12, 2025
3e303b1
vulkan : implement ggml_roll (ggml/1290)
Acly Jul 12, 2025
74bb294
vulkan : implement bilinear interpolation (ggml/1291)
Acly Jul 12, 2025
2155357
sync : ggml
ggerganov Jul 12, 2025
3120413
vulkan : remove unused vars (#0)
ggerganov Jul 12, 2025
8eff955
sync : ggml
ggerganov Jul 12, 2025
7de5c7c
CUDA: add set rows for f32 and f16 (#14551)
am17an Jul 12, 2025
67eade1
docs : add LFM2 to models section (#14650)
tdakhran Jul 12, 2025
c31e606
tests : cover lfm2 cases in test_ssm_conv (#14651)
tdakhran Jul 12, 2025
84b396e
cmake : Add CMake presets for Linux and GCC (#14656)
YavorGIvanov Jul 13, 2025
dcf7f2e
metal : Add missing unary ops Metal support (#14660)
YavorGIvanov Jul 13, 2025
05fec5b
ggml : add build-time message to remind about ggml_set_rows (#14661)
ggerganov Jul 13, 2025
e743cdd
cuda : add ELU support (#14657)
YavorGIvanov Jul 13, 2025
923e3ea
cuda : add set rows for bf16 (#14664)
CISC Jul 13, 2025
982e347
quantize : fix minor logic flaw in --tensor-type (#14572)
EAddario Jul 13, 2025
0d92267
llama : add jinja template for rwkv-world (#14665)
MollySophia Jul 13, 2025
65a3ebb
sycl: Batched mulmat rework for oneDNN dispatch (#14617)
ShanoToni Jul 14, 2025
0f4c6ec
SYCL: use 1D kernel for set_rows (#14618)
qnixsynapse Jul 14, 2025
494c589
scripts: benchmark for HTTP server throughput (#14668)
JohannesGaessler Jul 14, 2025
9c9e4fc
llama-context: add ability to get logits (#14672)
am17an Jul 14, 2025
55c509d
ggml : refactor llamafile_sgemm PPC code (#14673)
shalinib-ibm Jul 14, 2025
bdca383
sycl: Hotfix for non dnnl codepath (#14677)
ShanoToni Jul 14, 2025
kv-cache : fix unified::seq_rm to work with seq_id < 0 (ggml-org#13985)
ggml-ci
ggerganov authored Jun 4, 2025
commit e0e806f52ebcd0ee285c994fe8fd8b8787d2cb0a
23 changes: 19 additions & 4 deletions src/llama-kv-cache-unified.cpp
@@ -149,12 +149,27 @@ bool llama_kv_cache_unified::seq_rm(llama_seq_id seq_id, llama_pos p0, llama_pos
         p1 = std::numeric_limits<llama_pos>::max();
     }

-    for (uint32_t i = 0; i < cells.size(); ++i) {
-        if (!cells.pos_in(i, p0, p1)) {
-            continue;
+    if (seq_id >= 0) {
+        for (uint32_t i = 0; i < cells.size(); ++i) {
+            if (!cells.pos_in(i, p0, p1)) {
+                continue;
+            }
+
+            if (cells.seq_has(i, seq_id) && cells.seq_rm(i, seq_id)) {
+                if (new_head == cells.size()) {
+                    new_head = i;
+                }
+            }
         }
+    } else {
+        // match any sequence
+        for (uint32_t i = 0; i < cells.size(); ++i) {
+            if (!cells.pos_in(i, p0, p1)) {
+                continue;
+            }

+            cells.rm(i);
+
-        if (cells.seq_has(i, seq_id) && cells.seq_rm(i, seq_id)) {
             if (new_head == cells.size()) {
                 new_head = i;
             }
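The fix adds an explicit branch for seq_id < 0. Previously such calls fell through to cells.seq_has(i, seq_id), which can never match a negative id, so the removal silently became a no-op; the new else branch treats a negative seq_id as "match any sequence" and removes matching cells outright via cells.rm(i), with no per-sequence membership check. Below is a minimal caller-side sketch of the resulting semantics; the llama_kv_self_seq_rm wrapper name and signature are assumptions based on llama.h as of mid-2025, not part of this commit.

// Sketch only: demonstrates the "seq_id < 0 matches any sequence" behavior
// restored by this commit. llama_kv_self_seq_rm is assumed to be the public
// wrapper in llama.h (mid-2025); verify the name/signature against the
// current header before use.
#include "llama.h"

// Remove cached tokens with positions in [p0, p1) from ALL sequences.
// Negative p0/p1 mean "from position 0" / "to the end", mirroring the
// clamping at the top of llama_kv_cache_unified::seq_rm.
static void kv_rm_range_all(llama_context * ctx, llama_pos p0, llama_pos p1) {
    llama_kv_self_seq_rm(ctx, /*seq_id =*/ -1, p0, p1);
}

// Full KV cache reset: -1 for every argument matches any sequence over the
// whole position range.
static void kv_reset(llama_context * ctx) {
    llama_kv_self_seq_rm(ctx, -1, -1, -1);
}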