Add TP CLI argument to multimodal inference examples #29301

faaany · 2025-11-24T09:05:39Z

Purpose

Currently, the example scripts hardcode tensor_parallel_size values for each model. Users running these examples on different hardware configurations (e.g., varying GPU memory) often encounter OOM errors and must edit the code by hand to adjust tensor parallelism. This change adds a --tensor-parallel-size (-tp) command-line argument so the value can be overridden without modifying the scripts.

Test Plan

# Override tensor parallel size for vision language model
python examples/offline_inference/vision_language_multi_image.py -m aria --tensor-parallel-size 2

# Use shorthand notation
python examples/offline_inference/audio_language.py -m qwen2_audio -tp 2

# If not specified, uses the model's default configuration
python examples/offline_inference/vision_language.py -m llama4

Test Result

Tested that the argument correctly overrides default settings
Verified backward compatibility (scripts work without the argument)
Confirmed help message displays correctly

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

mergify · 2025-11-24T09:06:14Z

Documentation preview: https://vllm--29301.org.readthedocs.build/en/29301/

gemini-code-assist

Code Review

This pull request adds a --tensor-parallel-size (-tp) command-line argument to the audio_language.py, vision_language.py, and vision_language_multi_image.py example scripts. This allows users to override the default tensor parallel size for the models. The changes are implemented correctly by updating the engine_args before initializing the LLM engine.
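The override pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `DemoEngineArgs`, `build_parser`, and `apply_tp_override` are hypothetical names, and the dataclass is only a stand-in for whatever engine arguments each example builds per model. The key assumption is that the flag defaults to `None`, so the model's own default is kept unless the user passes a value.

```python
import argparse
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DemoEngineArgs:
    # Hypothetical stand-in for the per-model engine arguments in the examples.
    model: str
    tensor_parallel_size: int = 1


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TP override sketch")
    parser.add_argument(
        "--tensor-parallel-size",
        "-tp",
        type=int,
        default=None,  # None means "keep the model's default"
        help="Override the model's default tensor parallel size.",
    )
    return parser


def apply_tp_override(
    engine_args: DemoEngineArgs, tp: Optional[int]
) -> DemoEngineArgs:
    # Leave the per-model default untouched when no override was given.
    if tp is None:
        return engine_args
    # Return a copy with only the tensor parallel size changed.
    return replace(engine_args, tensor_parallel_size=tp)


if __name__ == "__main__":
    args = build_parser().parse_args()
    defaults = DemoEngineArgs(model="demo-model", tensor_parallel_size=4)
    print(apply_tp_override(defaults, args.tensor_parallel_size))
```

Defaulting to `None` rather than a number is what keeps the scripts backward compatible: omitting the flag leaves each model's configured value in place.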

I've identified a potential issue regarding input validation. The newly added tensor-parallel-size argument is not checked for positivity. Providing a non-positive value could lead to a crash. I've added comments with suggestions to add validation for this argument in all three modified files to make the scripts more robust.
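One way such a check could look is a custom argparse type that rejects non-positive values at parse time. This is a sketch under assumptions, not the validation actually committed in this PR: `positive_int` and `build_parser` are hypothetical names, and the follow-up commit may have implemented the check differently.

```python
import argparse


def positive_int(value: str) -> int:
    # Hypothetical helper: convert to int and reject zero or negative values.
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected an integer, got {value!r}"
        ) from None
    if number < 1:
        raise argparse.ArgumentTypeError(
            f"must be a positive integer, got {number}"
        )
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TP validation sketch")
    parser.add_argument(
        "--tensor-parallel-size",
        "-tp",
        type=positive_int,  # validation runs while arguments are parsed
        default=None,
        help="Positive tensor parallel size; omit to keep the model default.",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this approach an invalid value makes argparse print a usage error and exit before any engine is initialized, so the same check does not need to be repeated after parsing.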

examples/offline_inference/audio_language.py

examples/offline_inference/vision_language.py

examples/offline_inference/vision_language_multi_image.py

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

faaany · 2025-11-24T09:20:36Z

cc @jikunshang @yma11

add tp as argument

7ce4dcd

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

faaany changed the title ~~add tp as argument~~ Add TP CLI argument to multimodal inference examples Nov 24, 2025

mergify bot added the documentation Improvements or additions to documentation label Nov 24, 2025

gemini-code-assist bot reviewed Nov 24, 2025

check negative values for tp

d6bf907

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
