
[Bug]: Qwen3-VL Inference Fails with MS-Swift (Current) + vLLM 0.11 #6040

@cheliu-computation

Description


Describe the bug (describe the bug and how to reproduce it; screenshots are helpful)

I am attempting to deploy Qwen3-VL models (specifically the 30B or 235B versions) using the MS-Swift inference pipeline with vLLM as the backend accelerator.

Running swift infer with these multimodal models fails immediately with an AttributeError (full traceback below). MS-Swift's vLLM integration monkey-patches the private helper vllm.engine.async_llm_engine._log_task_completion, which no longer exists in vLLM 0.11, presumably because that release removed the legacy V0 async engine internals. So the current MS-Swift integration layer appears to be out of sync with the vLLM 0.11 API.

Reproduction command:

```
swift infer Qwen/Qwen3-VL-30B-A3B-Instruct --infer_backend vllm [...]
```

Error (traceback excerpt):

```
File "/home/i-liuche/codes/swift39/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 766, in patch_remove_log
    async_llm_engine._origin_log_task_completion = async_llm_engine._log_task_completion
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'vllm.engine.async_llm_engine' has no attribute '_log_task_completion'
```
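For what it's worth, here is a minimal sketch of a version-tolerant guard, assuming patch_remove_log only needs to wrap the legacy logging helper when it exists. The module and attribute names come from the traceback; the replacement function is a hypothetical stand-in for whatever ms-swift actually substitutes:

```python
# Hypothetical sketch, not the actual ms-swift code: make the patch a
# no-op when the private helper has been removed (as in vLLM 0.11).
import vllm.engine.async_llm_engine as async_llm_engine


def patch_remove_log():
    # Per the traceback, this module still imports under vLLM 0.11 but
    # no longer defines `_log_task_completion`; skip patching in that case.
    origin = getattr(async_llm_engine, '_log_task_completion', None)
    if origin is None:
        return
    async_llm_engine._origin_log_task_completion = origin

    def _silent_log_task_completion(*args, **kwargs):
        # Drop the legacy task-completion log line entirely
        # (hypothetical stand-in for ms-swift's replacement).
        pass

    async_llm_engine._log_task_completion = _silent_log_task_completion
```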

Your hardware and system info (CUDA version, OS, GPU model, torch version, etc.)

Please confirm the exact versions you are using, or specify "Latest" if installed recently:

| Component | Version |
| --- | --- |
| MS-Swift | Current |
| vLLM | 0.11.0 |
| Python | 3.12 |
| GPU/Hardware | (e.g., H100 80GB) |
| CUDA Version | (e.g., 12.8) |
| PyTorch Version | (e.g., 2.8.0) |
