Insights: InternLM/lmdeploy
Overview
16 Pull requests merged by 8 people
- Skip dp dummy input forward (#3552, merged May 16, 2025)
- Support update params with dp=1 for the PyTorch engine (#3562, merged May 15, 2025)
- Simulate EPLB for benchmarking only (#3490, merged May 15, 2025)
- Fix dp>1, tp=1, ep=1 (#3555, merged May 15, 2025)
- Fix proxy server heartbeat (#3543, merged May 15, 2025)
- Ray safe exit (#3545, merged May 15, 2025)
- [bug fix] Fix PD disaggregation in DSV3 (#3547, merged May 15, 2025)
- [ascend] Set transdata dynamic shape to true (#3531, merged May 13, 2025)
- Support updating params for the PyTorch backend from the API server (#3535, merged May 13, 2025)
- Fix stop-words KV cache (#3494, merged May 12, 2025)
- Update two-microbatch (#3540, merged May 12, 2025)
- Start engine loop on server startup event (#3523, merged May 12, 2025)
- Update blocked-FP8 scale name (#3532, merged May 12, 2025)
- Allow the API server to be terminated through client requests (#3533, merged May 12, 2025)
- Ray nsys profiling support (#3448, merged May 12, 2025)
- Randomly pad input ids (#3530, merged May 12, 2025)
8 Pull requests opened by 7 people
- Refactor engine 2505 (#3541, opened May 12, 2025)
- Pipeline warmup (#3548, opened May 14, 2025)
- Unlock mutual exclusivity of the `tool-call-parser` and `reasoning-parser` arguments (#3550, opened May 14, 2025)
- Support AWQ for Qwen2.5-VL (#3559, opened May 15, 2025)
- [ci] Add test workflow for a 3090 machine (#3561, opened May 15, 2025)
- Support Qwen3 /think, /no_think, and the enable_thinking parameter (#3564, opened May 15, 2025)
- Support updating params for the TurboMind backend (#3566, opened May 16, 2025)
- Update llama.py (#3570, opened May 17, 2025)
7 Issues closed by 5 people
- [Bug] Unable to execute AWQ quantization of InternVL3 (#3563, closed May 16, 2025)
- [Feature] Is there a plan to support the sampling parameter n? Setting n currently has no effect; only one answer is ever returned (#3556, closed May 14, 2025)
- [Bug] The qwen3_32b_awq model does not work after offline convert, while direct online conversion works fine (#3518, closed May 14, 2025)
- Does VLM support enable_prefix_caching? (#3546, closed May 13, 2025)
- [Feature] Support launching multiple models at the same time (#3536, closed May 13, 2025)
- [Feature] Will TurboMind support expert parallelism? (#3539, closed May 13, 2025)
- Elegant way to set max-pixels for Qwen2.5-VL (#3472, closed May 12, 2025)
12 Issues opened by 10 people
- [Bug] Stress test memory error (#3571, opened May 17, 2025)
- [Feature] Request support for Qwen2.5-Omni-7B (#3569, opened May 17, 2025)
- [Bug] Service hangs with no response when starting on V100 (#3568, opened May 17, 2025)
- Results differ between transformers deployment and lmdeploy deployment (#3565, opened May 16, 2025)
- [Feature] Is there a plan to make the convert logic multi-process (it is currently multi-threaded and still saturates only a single core)? Hoping it can fully utilize all CPU cores (#3558, opened May 14, 2025)
- [Bug] When serving with TurboMind, too many concurrent requests cause OOM and the service kills itself; please check before OOM whether requests can be handled and, if not, process them one by one instead of crashing. Many thanks (#3557, opened May 14, 2025)
- [Bug] qwen3-awq internal error happened (#3554, opened May 14, 2025)
- Help with the TurboMind attention operator interface (#3553, opened May 14, 2025)
- [Bug] Does lmdeploy support multi-node deployment on Ascend machines via k8s? (#3551, opened May 14, 2025)
- [Bug] Installing with pip reinstalls the CPU version of torch (#3549, opened May 14, 2025)
- [Bug] RuntimeError: yaml-cpp: error at line 13, column 30: bad conversion (#3544, opened May 13, 2025)
- [Bug] Qwen3-235B-A22B-FP8 cannot be loaded using PyTorch (#3542, opened May 13, 2025)
18 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all Issues and Pull Requests with unresolved conversations.
- Add Gloo communication to turbomind (#3362, commented on May 12, 2025 • 3 new comments)
- [Feature] Metrics support (#3534, commented on May 15, 2025 • 0 new comments)
- [dlinfer][ascend] Set block size default to 128 for the 310P device (#3522, commented on May 12, 2025 • 0 new comments)
- Launch multiple API servers for dp > 1 (#3414, commented on May 16, 2025 • 0 new comments)
- Improve TurboMind's prefix cache (#3332, commented on May 12, 2025 • 0 new comments)
- [Bug] Ascend v0.7.2.post1: serving API speed test sporadically hangs (#3354, commented on May 16, 2025 • 0 new comments)
- qwen2_5-7b model quantization (#3501, commented on May 15, 2025 • 0 new comments)
- [Bug] triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 108672, Hardware limit: 101376. Reducing block sizes or `num_stages` may help. (#2451, commented on May 15, 2025 • 0 new comments)
- [Bug] QwQ-32B error: argument tool-call-parser: not allowed with argument --reasoning-parser (#3431, commented on May 14, 2025 • 0 new comments)
- [Feature] Support `response_format` for `TurboMind` (#2753, commented on May 13, 2025 • 0 new comments)
- [Bug] Qwen2.5-VL-7B-Instruct-AWQ, Ascend 910B, inference bug (#3521, commented on May 13, 2025 • 0 new comments)
- [Bug] After LoRA fine-tuning InternVL2.5-8B and merging, inference fails with an error (#3524, commented on May 13, 2025 • 0 new comments)
- [Feature] TurboMind for Qwen2.5-VL (#3525, commented on May 13, 2025 • 0 new comments)
- [Bug] When deploying internvl-38b-mpo-awq with lmdeploy, calling the API from Dify raises an error; what causes this? (#3528, commented on May 13, 2025 • 0 new comments)
- [Bug] InternVL2.5 78B stuck during inference (#3529, commented on May 13, 2025 • 0 new comments)
- OOM when using InternVL2_5-1B-MPO (#3143, commented on May 13, 2025 • 0 new comments)
- [Feature] Will the TurboMind backend support guided_decoding? (#2771, commented on May 12, 2025 • 0 new comments)
- [Bug] Qwen3's enable_thinking parameter is not currently supported (#3511, commented on May 12, 2025 • 0 new comments)