[Model] add dots1 by redmoe-moutain · Pull Request #38143 · huggingface/transformers

[Model] add dots1 #38143

Merged
merged 7 commits into from Jun 25, 2025

Conversation

redmoe-moutain
Contributor
@redmoe-moutain redmoe-moutain commented May 15, 2025

What does this PR do?

Support the dots.llm1 model by rednote-hilab

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@redmoe-moutain redmoe-moutain marked this pull request as ready for review May 20, 2025 09:24
@redmoe-moutain redmoe-moutain changed the title add dots1 [Model] add dots1 May 20, 2025
@redmoe-moutain redmoe-moutain marked this pull request as draft May 20, 2025 09:29
@redmoe-moutain redmoe-moutain marked this pull request as ready for review May 20, 2025 10:28
@redmoe-moutain
Contributor Author

@ArthurZucker Could you please take a look?

@Rocketknight1
Member
Rocketknight1 commented May 22, 2025

Hi @redmoe-moutain, is there an existing pre-trained dots1 model somewhere? We generally don't add architectures until we need them to support a significant model checkpoint.

@redmoe-moutain
Contributor Author

@Rocketknight1 We're rolling out the open-source models dots.llm1. You can check out the pretrained model here: https://huggingface.co/rednote-hilab/dots.llm1.base. The instruct version and a detailed report are coming soon.
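
For anyone who wants to try the checkpoint while reviewing, here is a minimal loading sketch; it assumes the standard `AutoModelForCausalLM` path this PR wires up, and the dtype and device placement are only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative: load the released base checkpoint with this PR checked out.
tokenizer = AutoTokenizer.from_pretrained("rednote-hilab/dots.llm1.base")
model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.llm1.base",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```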

@Rocketknight1
Member

Cool! In that case, @Cyrilvallez can you take the review?

Collaborator
@ArthurZucker ArthurZucker left a comment

Looks actually very nice! Thanks for the clean modular, I actually have a hard time believing it!

```python
)


class Dots1ModelTester:
```
Collaborator

Can you check test_modeling_llama? We have a new, simpler mixin for testing!
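
For reference, a rough sketch of the mixin-based style being suggested; the class names and import path mirror what test_modeling_llama looked like around this time and should be treated as assumptions, not the exact API:

```python
# Hypothetical sketch of the simpler tester mixin; names are assumptions.
from transformers import Dots1Config
from transformers.testing_utils import require_torch

from ...causal_lm_tester import CausalLMModelTest, CausalLMModelTester


class Dots1ModelTester(CausalLMModelTester):
    config_class = Dots1Config


@require_torch
class Dots1ModelTest(CausalLMModelTest):
    model_tester_class = Dots1ModelTester
```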

@@ -0,0 +1,192 @@
```python
from ...configuration_utils import PretrainedConfig
```
Collaborator

missing a license!

Comment on lines +102 to +127
"layers.*.mlp.experts.*.gate_proj": "local_colwise",
"layers.*.mlp.experts.*.up_proj": "local_colwise",
"layers.*.mlp.experts.*.down_proj": "local_rowwise",
"layers.*.mlp.experts.*": "local", # each expert is wrapped in a module list
"layers.*.mlp.shared_experts.gate_proj": "local_colwise",
"layers.*.mlp.shared_experts.up_proj": "local_colwise",
"layers.*.mlp.shared_experts.down_proj": "local_rowwise",
"layers.*.mlp.shared_experts": "local",
"layers.*.mlp.gate_proj": "local_colwise",
"layers.*.mlp.up_proj": "local_colwise",
Collaborator

quick q, did you test TP to make sure it works?

@redmoe-moutain
Contributor Author

@ArthurZucker Thank you for your insightful review. I've updated the license and testing as suggested.

While we've tested on PP, we haven't yet covered TP testing cases. Could you provide some examples of how we should approach TP? Any guidance would be greatly appreciated.

@redmoe-moutain
Contributor Author

@ArthurZucker Could you please take another look?

@Rocketknight1
Member

cc @Cyrilvallez for core maintainer review since Arthur is out!

Member
@Cyrilvallez Cyrilvallez left a comment

Amazingly simple modular! Super nice 🤗 Added a few comments, but this is truly almost ready to be shipped 👌
For TP, you can check out the example in the doc here. Let me know if something is still unclear!
You could add more integration tests as well, for example trying prompts beyond the sliding window as in Qwen3, but this is optional.
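
For the optional integration tests, a rough sketch of the kind of beyond-the-sliding-window check meant here (the prompt length and the assertion are placeholders; a real test would pin an expected output string):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch_accelerator, slow


@slow
@require_torch_accelerator
def test_generation_beyond_sliding_window():
    # Hypothetical: feed a prompt assumed to exceed the sliding window and
    # check that greedy generation still completes with the expected length.
    model_id = "rednote-hilab/dots.llm1.base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    long_prompt = "hello " * 8192
    inputs = tokenizer(long_prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    assert out.shape[1] == inputs.input_ids.shape[1] + 5
```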

@@ -0,0 +1,40 @@
```md
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
```
Member

It's 2025! 🤗

@@ -0,0 +1,27 @@
```python
# Copyright 2024 The HuggingFace Team. All rights reserved.
```
Member

same here haha

Comment on lines 74 to 76
```python
pretraining_tp (`int`, *optional*, defaults to 1):
    Experimental: tensor parallelism rank used during pretraining. This is necessary for exact reproducibility
    of pretraining results.
```
Member

This should be removed!

Comment on lines 89 to 90
```python
use_sliding_window (`bool`, *optional*, defaults to `False`):
    Whether to use sliding window attention.
```
Member

The best is to not have this arg, and simply check whether sliding_window is None instead, so this should be removed.
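
A minimal sketch of the suggested pattern; only `config.sliding_window` comes from the snippet above, and the surrounding attention-kwargs plumbing is hypothetical:

```python
# Sketch: derive the behavior from sliding_window itself instead of
# carrying a separate use_sliding_window flag.
if config.sliding_window is not None:
    # hypothetical call site; pass the window size to the attention kernel
    attn_kwargs["sliding_window"] = config.sliding_window
```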

Comment on lines +1 to +2
```python
from ...modeling_outputs import CausalLMOutputWithPast
from ...processing_utils import Unpack
```
Member

Missing a license here at the top

Comment on lines 11 to 14
```python
from ..llama.modeling_llama import (
    KwargsForCausalLM,
    LlamaRMSNorm,
)
```
Member

Let's import those 2 classes from Qwen3 instead! They are similar, and it's easier to follow if we import everything from the same model!
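
A sketch of the suggested change in the modular file; this assumes Qwen3's modeling module exposes both names, which follows from the "they are similar" reasoning above but is still an assumption:

```python
# Sketch: import both helpers from Qwen3 so everything comes from one model.
from ..qwen3.modeling_qwen3 import (
    KwargsForCausalLM,
    Qwen3RMSNorm,
)
```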

@@ -0,0 +1,144 @@
```python
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
```
Member

2025 as well!

```python
# greedy generation outputs
generated_ids = model.generate(input_ids, max_new_tokens=20, do_sample=False)
text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(text)
```
Member

Let's remove this print

@redmoe-moutain
Contributor Author

@Cyrilvallez Thanks for the review. It looks much cleaner now!

I followed the documentation to test TP with:

```python
model = AutoModelForCausalLM.from_pretrained("rednote-hilab/dots.llm1.inst", tp_plan="auto", torch_dtype=torch.bfloat16)
```

However, I encountered the following error:

```
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:            ^^^^^^^
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in inner
[rank0]:     result = forward_call(*args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/GitHub/transformers/src/transformers/models/dots1/modeling_dots1.py", line 308, in forward
[rank0]:     topk_indices, topk_weights = self.gate(hidden_states)
[rank0]:                                  ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1844, in _call_impl
[rank0]:     return inner()
[rank0]:            ^^^^^^^
[rank0]:   File "/mnt/miniconda3/envs/vllm312/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1803, in inner
[rank0]:     hook_result = hook(self, args, result)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/GitHub/transformers/src/transformers/integrations/tensor_parallel.py", line 329, in <lambda>
[rank0]:     module.register_forward_hook(lambda mod, inputs, outputs: output_fn(mod, outputs, device_mesh))
[rank0]:                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/GitHub/transformers/src/transformers/integrations/tensor_parallel.py", line 447, in _prepare_output_fn
[rank0]:     return outputs.to_local() if use_local_output else outputs
[rank0]:            ^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'tuple' object has no attribute 'to_local'
```

I modified it to apply `.to_local()` to each element when `outputs` is a tuple:

```python
@staticmethod
def _prepare_output_fn(output_layouts, use_local_output, mod, outputs, device_mesh):
    # Some modules (e.g. the MoE gate) return a tuple of tensors, so convert
    # each element back to a local tensor individually.
    if isinstance(outputs, tuple):
        return tuple(output.to_local() if use_local_output else output for output in outputs)
    return outputs.to_local() if use_local_output else outputs
```

After this change, it works as expected.
Let me know if you’d like me to open a separate issue or PR to discuss this further.

Thanks!

Collaborator
@ArthurZucker ArthurZucker left a comment

Not seeing the changes to tensor parallel here, but yes, let's open a different PR for that and merge this!


```md
## Overview

The `dots.llm1` model was proposed in the dots.llm1 technical report by the rednote-hilab team.
```
Collaborator

Can we add a hyperlink here?

Contributor Author

Sure, I've added the hyperlink.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker ArthurZucker merged commit 7503cb9 into huggingface:main Jun 25, 2025
18 checks passed
@ArthurZucker
Collaborator

Thanks for bearing with us, and kudos for the release!
