llama : initial Mamba-2 support by compilade · Pull Request #9126 · ggml-org/llama.cpp · GitHub

llama : initial Mamba-2 support #9126


Merged: 44 commits, Jul 2, 2025
1f0fea7
llama : initial Mamba-2 support
compilade Aug 1, 2024
dceff23
ggml : SIMD ggml_ssm_scan for Mamba-2
compilade Aug 19, 2024
2bfe9de
llama : support running Mamba-Codestral-7B-v0.1
compilade Aug 19, 2024
aff9692
llama : fix Mamba-2 conv state saving
compilade Aug 21, 2024
e04910d
llama : remove unused variable
compilade Aug 22, 2024
fa358e7
llama : add missing break
compilade Aug 22, 2024
38913dc
convert_hf : prefer SentencePiece tokenizer for Mamba-2 when present
compilade Aug 22, 2024
0e601ca
Merge branch 'master' into compilade/mamba2
compilade Sep 18, 2024
273e7a4
llama : avoid redundant state copy for Mamba 1 and 2
compilade Sep 30, 2024
7d6cb36
Merge branch 'master' into compilade/mamba2
compilade Oct 1, 2024
2c77d79
metal : attempt to adapt SSM_SCAN for Mamba-2
compilade Oct 2, 2024
87b97d0
metal : fix SSM_SCAN pipeline scope
compilade Oct 2, 2024
03d0e6e
metal : use log and exp instead of log1pf and expf in SSM_SCAN
compilade Oct 2, 2024
7a351ab
metal : remove unused arguments for SSM_SCAN
compilade Oct 2, 2024
8b15bc6
metal : add back n_seqs to SSM_SCAN args
compilade Oct 2, 2024
5b8ec2b
metal : fix SSM_SCAN state head offset
compilade Oct 2, 2024
62b09b3
metal : fix wrong number of tokens per sequence in SSM_SCAN
compilade Oct 3, 2024
038d958
Merge branch 'master' into compilade/mamba2
compilade Oct 12, 2024
805512a
ggml : remove unused fast broadcast path in GGML_MUL
compilade Oct 12, 2024
7d16e1b
Merge branch 'master' into compilade/mamba2
compilade Nov 1, 2024
3bc7103
ggml : avoid multiply by D in GGML_OP_SSM_SCAN
compilade Nov 4, 2024
8d8f065
Merge branch 'master' into compilade/mamba2
compilade Nov 4, 2024
b4e9c59
convert : fix flake8 lint
compilade Nov 4, 2024
1ee6c48
Merge branch 'master' into compilade/mamba2
compilade Nov 25, 2024
c9ecf62
Merge branch 'master' into compilade/mamba2
compilade Feb 26, 2025
35d06fa
Merge branch 'master' into compilade/mamba2
compilade May 1, 2025
cf4f0a4
metal : fix confusion between ; and ,
compilade May 1, 2025
6def5cd
metal : add missing args for nb references in ssm_scan_f32_group
compilade May 1, 2025
791998b
metal : single-user mamba2 inference works
compilade May 2, 2025
94c3d53
kv-cache : remove const_cast when setting inputs for s_copy
compilade May 2, 2025
929fe85
Merge branch 'master' into compilade/mamba2
compilade May 2, 2025
d55b0d0
convert : avoid AutoConfig for Mamba and Mamba2 hparams
compilade May 2, 2025
e94f393
kv-cache : allow context shift for recurrent models
compilade May 2, 2025
9864bfc
Merge branch 'master' into compilade/mamba2
compilade Jun 10, 2025
2fa5f2c
graph : fix recurrent state copies when avoiding copies
compilade Jun 11, 2025
757aa62
ggml : fix mamba2 ssm scan when compiled with SVE
compilade Jun 11, 2025
0b6f6be
ggml-cpu : reorder SVE FMA for consistency with other SIMD arches
compilade Jun 11, 2025
a42f239
Merge branch 'master' into compilade/mamba2
compilade Jun 19, 2025
f8c7cae
cuda : implement ssm scan for Mamba2
compilade May 15, 2025
830e554
Merge branch 'master' into compilade/mamba2
compilade Jun 19, 2025
afdb669
Merge branch 'master' into compilade/mamba2
compilade Jun 23, 2025
dc1d109
mamba : fix mismatched new and delete size for llm_build_mamba
compilade Jun 26, 2025
73de1fd
Merge branch 'master' into compilade/mamba2
compilade Jul 2, 2025
71bef66
cuda : graceful fallback for Mamba-1 models with weird embd size
compilade Jul 2, 2025
Merge branch 'master' into compilade/mamba2
compilade committed May 1, 2025
commit 35d06fac5af8f85903d6ffe14c53c16aad90dc73
10 changes: 5 additions & 5 deletions convert_hf_to_gguf.py
@@ -4175,7 +4175,6 @@ def set_gguf_parameters(self):
     _tok_embd = None

     def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
-
         output_name = self.format_tensor_name(gguf.MODEL_TENSOR.OUTPUT)
         tok_embd_name = self.format_tensor_name(gguf.MODEL_TENSOR.TOKEN_EMBD)

@@ -4185,6 +4184,7 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
             logger.debug("A_log --> A ==> " + new_name)
             data_torch = -torch.exp(data_torch)

+        # [4 1 8192 1] -> [4 8192 1 1]
         if self.match_model_tensor_name(new_name, gguf.MODEL_TENSOR.SSM_CONV1D, bid):
             data_torch = data_torch.squeeze()

@@ -4199,8 +4199,8 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
         return [(new_name, data_torch)]


-@Model.register("Mamba2ForCausalLM")
-class Mamba2Model(Model):
+@ModelBase.register("Mamba2ForCausalLM")
+class Mamba2Model(TextModel):
     model_arch = gguf.MODEL_ARCH.MAMBA2

     def set_vocab(self):
@@ -4284,8 +4284,8 @@ def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iter
         yield (new_name, data_torch)


-@Model.register("CohereForCausalLM")
-class CommandR2Model(Model):
+@ModelBase.register("CohereForCausalLM")
+class CommandR2Model(TextModel):
     model_arch = gguf.MODEL_ARCH.COMMAND_R

     def __init__(self, *args, **kwargs):
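
Note on the `register` decorator: the last two hunks only swap `Model` for `ModelBase` / `TextModel`, but the pattern itself may be unfamiliar. Below is a minimal sketch of a class-registry decorator of this kind. It is not the converter's actual code: the `_registry` dict, the `lookup` helper, and the empty class bodies are assumptions made for illustration; only the names `ModelBase`, `TextModel`, `Mamba2Model`, and the string `"Mamba2ForCausalLM"` come from the diff.

```python
class ModelBase:
    # Hypothetical registry mapping architecture names to converter classes.
    _registry = {}

    @classmethod
    def register(cls, *names):
        """Return a class decorator that records model_cls under each name."""
        def wrapper(model_cls):
            for name in names:
                cls._registry[name] = model_cls
            return model_cls  # the class itself is returned unchanged
        return wrapper

    @classmethod
    def lookup(cls, name):
        """Hypothetical helper: resolve an architecture name to its class."""
        try:
            return cls._registry[name]
        except KeyError:
            raise NotImplementedError(f"Architecture {name!r} not supported") from None


class TextModel(ModelBase):
    """Stand-in for the text-model base class named in the diff."""


@ModelBase.register("Mamba2ForCausalLM")
class Mamba2Model(TextModel):
    """Stand-in; the real class carries model_arch, set_vocab, and so on."""
```

With a registry like this, a converter can map the architecture string found in a checkpoint's config to the class that handles it, e.g. `ModelBase.lookup("Mamba2ForCausalLM")`.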
You are viewing a condensed version of this merge commit. You can view the full changes here.
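
Note on the tensor transforms in the second hunk: `-torch.exp(data_torch)` turns the stored `A_log` parameter into `A`, and `squeeze()` drops the singleton dimension of the conv1d weight. The sketch below mimics both in plain Python, without torch, so the shape comment can be checked by hand. It assumes the comment `[4 1 8192 1] -> [4 8192 1 1]` lists dimensions in ggml order (innermost first, padded to four entries) and that the torch-side shape is `(8192, 1, 4)`; all helper names are hypothetical.

```python
import math

def a_log_to_a(a_log):
    """Elementwise A = -exp(A_log) over a flat list of floats."""
    return [-math.exp(x) for x in a_log]

def squeezed_shape(shape):
    """Shape after dropping every size-1 dimension, as squeeze() does."""
    return tuple(d for d in shape if d != 1)

def ggml_ne(shape):
    """Assumed ggml view of a shape: reversed, padded with 1s to 4 dims."""
    ne = tuple(reversed(shape))
    return ne + (1,) * (4 - len(ne))

conv1d_shape = (8192, 1, 4)  # assumed torch-side shape of the conv1d weight
before = ggml_ne(conv1d_shape)                 # (4, 1, 8192, 1)
after = ggml_ne(squeezed_shape(conv1d_shape))  # (4, 8192, 1, 1)
```

Since `exp` is always positive, every entry of `A` computed this way is strictly negative.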