Add Qwen3 by yuanwu2017 · Pull Request #3229 · huggingface/text-generation-inference · GitHub
Add Qwen3 #3229


Merged

merged 9 commits into huggingface:main on May 23, 2025
Conversation

@yuanwu2017 (Contributor) commented May 16, 2025:

What does this PR do?

Enable the Qwen3 dense base models on the Gaudi platform.
Command:

model=Qwen/Qwen3-8B
#model=Qwen/Qwen3-32B
docker run -it  -p $port:80 \
   --runtime=habana \
   -v $volume:/data \
   -v $PWD:/workspace \
   -e HABANA_VISIBLE_DEVICES=all \
   -e HUGGING_FACE_HUB_TOKEN=$hf_token \
   -e HUGGINGFACE_HUB_CACHE=/data/hub \
   -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
   -e http_proxy=${http_proxy}     -e https_proxy=${https_proxy} -e no_proxy=${no_proxy} \
   -e TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN=true \
   -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
   -e ENABLE_HPU_GRAPH=true \
   -e LIMIT_HPU_GRAPH=true \
   -e ATTENTION=paged \
   --cap-add=sys_nice \
   --ipc=host \
   $image --model-id $model \
   --max-input-length 1024 --max-total-tokens 2048 \
   --max-batch-prefill-tokens 8192 --max-batch-size 32 \
   --max-waiting-tokens 7 --waiting-served-ratio 1.2 --max-concurrent-requests 512
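Once the container above is up, the server can be smoke-tested against TGI's `/generate` endpoint. A minimal sketch: the `generate_body` helper name and the port value are ours (match whatever `-p $port:80` mapping you used); the payload shape follows TGI's `/generate` API.

```shell
# Build the JSON body for TGI's /generate endpoint.
# $1 = prompt, $2 = max_new_tokens
generate_body() {
    printf '{"inputs": "%s", "parameters": {"max_new_tokens": %d}}' "$1" "$2"
}

# Example (assumes the server from the docker command above is running):
# curl -s "http://localhost:${port}/generate" \
#     -H 'Content-Type: application/json' \
#     -d "$(generate_body 'What is Qwen3?' 64)"
```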

Run tests command (script from https://github.com/yuanwu2017/llm-dbg):

./run_tgi_benchmark.sh
Result:
model=Qwen/Qwen3-8B
[benchmark screenshot]
model=Qwen/Qwen3-32B
[benchmark screenshot]

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 marked this pull request as draft May 16, 2025 05:18
Signed-off-by: yuanwu <yuan.wu@intel.com>
@yuanwu2017 marked this pull request as ready for review May 20, 2025 02:55
@yuanwu2017 (Author):

@regisss Please help review.

-if [[ "$*" == *"Llama-4"* ]]; then
-    echo 'ATTENTION=paged and Llama-4 detected'
+if [[ "$*" == *"Llama-4"* || "$*" == *"Qwen3"* ]]; then
+    echo 'ATTENTION=paged and Llama-4 or Qwen3 detected'
     pip install git+https://github.com/huggingface/transformers.git@29338949
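The hunk above widens the launcher's model-name substring match to cover Qwen3. The same check can be exercised in isolation; a sketch in which the `needs_patched_transformers` function name is ours, not from the TGI entrypoint:

```shell
# Mirror the launcher's model-name check: return success (0) when the
# model name contains "Llama-4" or "Qwen3", failure (1) otherwise.
needs_patched_transformers() {
    case "$1" in
        *Llama-4*|*Qwen3*) return 0 ;;
        *) return 1 ;;
    esac
}

for model in "Qwen/Qwen3-8B" "meta-llama/Llama-4-Scout" "meta-llama/Llama-3.1-8B"; do
    if needs_patched_transformers "$model"; then
        echo "$model: pinned transformers required"
    else
        echo "$model: stock transformers is fine"
    fi
done
```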
Collaborator:

Transformers v4.52 should be released today, let's wait for it and update this line?

Collaborator:

@yuanwu2017 We can use Transformers v4.52.2 now

@yuanwu2017 (Author):

Done.

@yuanwu2017 (Author) commented May 22, 2025:

Currently the Gaudi TGI cannot use the latest Transformers, because the latest release moves VideoInput into video_utils, while qwen2_5_vl.py still targets the old Transformers layout. If I run Llama-4 or Qwen3 with the latest Transformers, I need to change qwen2_5_vl.py; but if I run Llama 3 with Transformers 4.49, I cannot change qwen2_5_vl.py. Using 4.52.2 for all models would require removing optimum-habana, because it conflicts with the latest Transformers. So I think we need to pin transformers.git@29338949. After we remove OH, I will update to the latest Transformers.
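The VideoInput relocation described above is a moved-symbol problem: the same name lives at different module paths across Transformers releases. Until the pin is lifted, code that must run against both layouts can resolve the symbol with a fallback import. A sketch, in which the `resolve_symbol` helper name is ours; the two candidate paths are the new and old locations mentioned in this comment:

```python
import importlib

def resolve_symbol(candidates):
    """Return the first attribute that resolves from a list of
    (module_name, attribute_name) pairs, trying them in order."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"could not resolve any of: {candidates}")

# Illustrative use for the VideoInput move described above
# (newer Transformers location first, then the pre-4.52 one):
# VideoInput = resolve_symbol([
#     ("transformers.video_utils", "VideoInput"),
#     ("transformers.image_utils", "VideoInput"),
# ])
```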

@yuanwu2017 (Author):

Done.

Collaborator:

Sounds good

@yuanwu2017 (Author) commented May 20, 2025 via email

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuan.wu@intel.com>
-if [[ "$*" == *"Llama-4"* ]]; then
-    echo 'ATTENTION=paged and Llama-4 detected'
+if [[ "$*" == *"Llama-4"* || "$*" == *"Qwen3"* ]]; then
+    echo 'ATTENTION=paged and Llama-4 or Qwen3 detected'
     pip install git+https://github.com/huggingface/transformers.git@29338949
Collaborator:

Sounds good

Signed-off-by: yuanwu <yuan.wu@intel.com>
@regisss (Collaborator) left a comment:

LGTM

@regisss regisss merged commit 1883a62 into huggingface:main May 23, 2025