Insights: InternLM/lmdeploy
Overview
16 Pull requests merged by 8 people
- Skip dp dummy input forward (#3552, merged May 16, 2025)
- Support update params with dp=1 for the PyTorch engine (#3562, merged May 15, 2025)
- Simulate EPLB for benchmarking only (#3490, merged May 15, 2025)
- Fix dp>1, tp=1, ep=1 (#3555, merged May 15, 2025)
- Fix proxy server heartbeat (#3543, merged May 15, 2025)
- Ray safe exit (#3545, merged May 15, 2025)
- [bug fix] Fix PD disaggregation in DSV3 (#3547, merged May 15, 2025)
- [ascend] Set transdata dynamic shape to true (#3531, merged May 13, 2025)
- Support updating params for the PyTorch backend from the API server (#3535, merged May 13, 2025)
- Fix stop-words KV cache (#3494, merged May 12, 2025)
- Update two-microbatch (#3540, merged May 12, 2025)
- Start engine loop on server startup event (#3523, merged May 12, 2025)
- Update blocked-FP8 scale name (#3532, merged May 12, 2025)
- Allow the API server to be terminated through client requests (#3533, merged May 12, 2025)
- Ray nsys profiling support (#3448, merged May 12, 2025)
- Randomly pad input ids (#3530, merged May 12, 2025)
8 Pull requests opened by 7 people
- Refactor engine 2505 (#3541, opened May 12, 2025)
- Pipeline warmup (#3548, opened May 14, 2025)
- Unlock mutual exclusivity of the `tool-call-parser` and `reasoning-parser` arguments (#3550, opened May 14, 2025)
- Support AWQ for Qwen2.5-VL (#3559, opened May 15, 2025)
- [ci] Add test workflow for a 3090 machine (#3561, opened May 15, 2025)
- Support Qwen3 /think, /no_think, and the enable_thinking parameter (#3564, opened May 15, 2025)
- Support updating params for the TurboMind backend (#3566, opened May 16, 2025)
- Update llama.py (#3570, opened May 17, 2025)
7 Issues closed by 5 people
- [Bug] Unable to execute AWQ quantization of InternVL3 (#3563, closed May 16, 2025)
- [Feature] Is there a plan to support the sampling parameter n? Setting n currently has no effect; only one answer is ever returned (#3556, closed May 14, 2025)
- [Bug] The qwen3_32b_awq model does not work after offline convert, while direct online conversion works fine (#3518, closed May 14, 2025)
- Does VLM support enable_prefix_caching? (#3546, closed May 13, 2025)
- [Feature] Support launching multiple models at the same time (#3536, closed May 13, 2025)
- [Feature] Will TurboMind support expert parallelism? (#3539, closed May 13, 2025)
- Elegant way to set max-pixels for Qwen2.5-VL (#3472, closed May 12, 2025)
12 Issues opened by 10 people
- [Bug] Stress test memory error (#3571, opened May 17, 2025)
- [Feature] Request support for Qwen2.5-Omni-7B (#3569, opened May 17, 2025)
- [Bug] Service hangs with no response when starting on V100 (#3568, opened May 17, 2025)
- Results differ between transformers deployment and lmdeploy deployment (#3565, opened May 16, 2025)
- [Feature] Is there a plan to make the convert logic multi-process (it is currently multi-threaded and still saturates only a single core)? Hoping it can fully utilize all CPU cores (#3558, opened May 14, 2025)
- [Bug] When serving with TurboMind, too many concurrent requests cause OOM and the service kills itself; please check before OOM whether requests can be handled and, if not, process them one by one instead of crashing. Many thanks (#3557, opened May 14, 2025)
- [Bug] qwen3-awq internal error happened (#3554, opened May 14, 2025)
- Help with the TurboMind attention operator interface (#3553, opened May 14, 2025)
- [Bug] Does lmdeploy support multi-node deployment on Ascend machines via k8s? (#3551, opened May 14, 2025)
- [Bug] Installing with pip reinstalls the CPU version of torch (#3549, opened May 14, 2025)
- [Bug] RuntimeError: yaml-cpp: error at line 13, column 30: bad conversion (#3544, opened May 13, 2025)
- [Bug] Qwen3-235B-A22B-FP8 cannot be loaded using PyTorch (#3542, opened May 13, 2025)
18 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all Issues and Pull Requests with unresolved conversations.
- Add Gloo communication to turbomind (#3362, commented on May 12, 2025 • 3 new comments)
- [Feature] Metrics support (#3534, commented on May 15, 2025 • 0 new comments)
- [dlinfer][ascend] Set block size default to 128 for the 310P device (#3522, commented on May 12, 2025 • 0 new comments)
- Launch multiple API servers for dp > 1 (#3414, commented on May 16, 2025 • 0 new comments)
- Improve TurboMind's prefix cache (#3332, commented on May 12, 2025 • 0 new comments)
- [Bug] Ascend v0.7.2.post1: serving API speed test sporadically hangs (#3354, commented on May 16, 2025 • 0 new comments)
- qwen2_5-7b model quantization (#3501, commented on May 15, 2025 • 0 new comments)
- [Bug] triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help. (#2451, commented on May 15, 2025 • 0 new comments)
- [Bug] QwQ-32B error: argument tool-call-parser: not allowed with argument --reasoning-parser (#3431, commented on May 14, 2025 • 0 new comments)
- [Feature] Support `response_format` for `TurboMind` (#2753, commented on May 13, 2025 • 0 new comments)
- [Bug] Qwen2.5-VL-7B-Instruct-AWQ, Ascend 910B, inference bug (#3521, commented on May 13, 2025 • 0 new comments)
- [Bug] After LoRA fine-tuning InternVL2.5-8B and merging, inference fails with an error (#3524, commented on May 13, 2025 • 0 new comments)
- [Feature] TurboMind for Qwen2.5-VL (#3525, commented on May 13, 2025 • 0 new comments)
- [Bug] When deploying internvl-38b-mpo-awq with lmdeploy, calling the API from Dify raises an error; what causes this? (#3528, commented on May 13, 2025 • 0 new comments)
- [Bug] InternVL2.5 78B stuck during inference (#3529, commented on May 13, 2025 • 0 new comments)
- OOM when using InternVL2_5-1B-MPO (#3143, commented on May 13, 2025 • 0 new comments)
- [Feature] Will the TurboMind backend support guided_decoding? (#2771, commented on May 12, 2025 • 0 new comments)
- [Bug] Qwen3's enable_thinking parameter is not currently supported (#3511, commented on May 12, 2025 • 0 new comments)