Add Qwen3 by yuanwu2017 · Pull Request #3229 · huggingface/text-generation-inference · GitHub
Add Qwen3 #3229


Merged

merged 9 commits into huggingface:main on May 23, 2025
Conversation

@yuanwu2017 (Contributor) commented May 16, 2025:

What does this PR do?

Enable the Qwen3 dense base models on the Gaudi platform.
Command:

model=Qwen/Qwen3-8B
#model=Qwen/Qwen3-32B
docker run -it  -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -v $PWD:/workspace \
   -e HABANA_VISIBLE_DEVICES=all \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e http_proxy=${http_proxy}     -e https_proxy=${https_proxy} -e no_proxy=${no_proxy} \
   -e TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e ATTENTION=paged \
   --cap-add=sys_nice \
   --ipc=host \
   $image --model-id $model \
   --max-input-length 1024 --max-total-tokens 2048 \
   --max-batch-prefill-tokens 8192 --max-batch-size 32 \
   --max-waiting-tokens 7 --waiting-served-ratio 1.2 --max-concurrent-requests 512
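Once the container above is up, the server can be smoke-tested against TGI's `/generate` endpoint. A minimal sketch: the `generate_body` helper name and the port value are ours (match whatever `-p $port:80` mapping you used); the payload shape follows TGI's `/generate` API.

```shell
# Build the JSON body for TGI's /generate endpoint.
# $1 = prompt, $2 = max_new_tokens
generate_body() {
    printf '{"inputs": "%s", "parameters": {"max_new_tokens": %d}}' "$1" "$2"
}

# Example (assumes the server from the docker command above is running):
# curl -s "http://localhost:${port}/generate" \
#     -H 'Content-Type: application/json' \
#     -d "$(generate_body 'What is Qwen3?' 64)"
```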

Run tests command (script from https://github.com/yuanwu2017/llm-dbg):

./run_tgi_benchmark.sh
Result:
model=Qwen/Qwen3-8B
[benchmark screenshot]
model=Qwen/Qwen3-32B
[benchmark screenshot]

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 marked this pull request as draft May 16, 2025 05:18
Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 marked this pull request as ready for review May 20, 2025 02:55
@yuanwu2017 (Author):

@regisss Please help review.

-if [[ "$*" == *"Llama-4"* ]]; then
-    echo 'ATTENTION=paged and Llama-4 detected'
+if [[ "$*" == *"Llama-4"* || "$*" == *"Qwen3"* ]]; then
+    echo 'ATTENTION=paged and Llama-4 or Qwen3 detected'
     pip install git+https://github.com/huggingface/transformers.git@29338949
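The hunk above widens the launcher's model-name substring match to cover Qwen3. The same check can be exercised in isolation; a sketch in which the `needs_patched_transformers` function name is ours, not from the TGI entrypoint:

```shell
# Mirror the launcher's model-name check: return success (0) when the
# model name contains "Llama-4" or "Qwen3", failure (1) otherwise.
needs_patched_transformers() {
    case "$1" in
        *Llama-4*|*Qwen3*) return 0 ;;
        *) return 1 ;;
    esac
}

for model in "Qwen/Qwen3-8B" "meta-llama/Llama-4-Scout" "meta-llama/Llama-3.1-8B"; do
    if needs_patched_transformers "$model"; then
        echo "$model: pinned transformers required"
    else
        echo "$model: stock transformers is fine"
    fi
done
```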
Collaborator:

Transformers v4.52 should be released today, let's wait for it and update this line?

Collaborator:

@yuanwu2017 We can use Transformers v4.52.2 now

@yuanwu2017 (Author):

Done.

@yuanwu2017 (Author) commented May 22, 2025:

Currently the Gaudi TGI cannot use the latest Transformers, because the latest release moves VideoInput into video_utils, while qwen2_5_vl.py still targets the old Transformers layout. If I run Llama-4 or Qwen3 with the latest Transformers, I need to change qwen2_5_vl.py; but if I run Llama 3 with Transformers 4.49, I cannot change qwen2_5_vl.py. Using 4.52.2 for all models would require removing optimum-habana, because it conflicts with the latest Transformers. So I think we need to pin transformers.git@29338949. After we remove OH, I will update to the latest Transformers.
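The VideoInput relocation described above is a moved-symbol problem: the same name lives at different module paths across Transformers releases. Until the pin is lifted, code that must run against both layouts can resolve the symbol with a fallback import. A sketch, in which the `resolve_symbol` helper name is ours; the two candidate paths are the new and old locations mentioned in this comment:

```python
import importlib

def resolve_symbol(candidates):
    """Return the first attribute that resolves from a list of
    (module_name, attribute_name) pairs, trying them in order."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"could not resolve any of: {candidates}")

# Illustrative use for the VideoInput move described above
# (newer Transformers location first, then the pre-4.52 one):
# VideoInput = resolve_symbol([
#     ("transformers.video_utils", "VideoInput"),
#     ("transformers.image_utils", "VideoInput"),
# ])
```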

@yuanwu2017 (Author):

Done.

Collaborator:

Sounds good

@yuanwu2017 (Author) commented May 20, 2025 via email

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
-if [[ "$*" == *"Llama-4"* ]]; then
-    echo 'ATTENTION=paged and Llama-4 detected'
+if [[ "$*" == *"Llama-4"* || "$*" == *"Qwen3"* ]]; then
+    echo 'ATTENTION=paged and Llama-4 or Qwen3 detected'
     pip install git+https://github.com/huggingface/transformers.git@29338949
Collaborator:

Sounds good

Signed-off-by: yuanwu <yuan.wu@intel.com>
@regisss (Collaborator) left a comment:

LGTM

@regisss regisss merged commit 1883a62 into huggingface:main May 23, 2025