Releases · InternLM/lmdeploy
v0.9.2
What's Changed
🚀 Features
- [Feature] metrics support by @CUHKSZzxy in #3534
- Relax FP8 TP requirement by @lzhangzz in #3697
- FA3 by @zhaochaoxing in #3623
- support qwen2/2.5-vl in turbomind by @irexyc in #3744
- feat: add pytorch_engine_qwen2_5vl_sm120 by @kolmogorov-quyet in #3750
- Internvl pt by @RunningLeon in #3765
- Improve internvl for turbomind engine by @lvhan028 in #3769
💥 Improvements
- Refactor linear by @grimoire in #3653
- remove python3.8 support and add python3.13 support by @lvhan028 in #3638
- refactor vl inputs split by @grimoire in #3699
- [Fix]: Replace mutable default with default_factory for scheduler_stats by @ConvolutedDog in #3730
- Fix the logic of calculating max_new_tokens and determining finish_reason by @lvhan028 in #3727
- Override HF config.json via CLI by @CUHKSZzxy in #3722
- feat(build): Integrate and build turbomind backend directly in setup.py by @windreamer in #3726
- Generate the benchmark output filename with given arguments by @lvhan028 in #3740
- Make loading llm without vlm as an option by @grimoire in #3745
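One item above (#3730) fixes a classic Python pitfall: a mutable default on a dataclass field is shared state across every instance. A minimal sketch of the pattern; the `SchedulerStats` class and its field name here are illustrative, not lmdeploy's actual `scheduler_stats` definition:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the pattern behind #3730; class and field names
# are hypothetical, not lmdeploy's actual scheduler_stats definition.
@dataclass
class SchedulerStats:
    # A bare mutable default (e.g. `finished_ids: list = []`) is rejected
    # by dataclasses precisely because it would be shared across instances;
    # default_factory builds a fresh list for each instance instead.
    finished_ids: list = field(default_factory=list)

a = SchedulerStats()
b = SchedulerStats()
a.finished_ids.append(1)
print(a.finished_ids, b.finished_ids)  # [1] [] -- no shared state
```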
🐞 Bug fixes
- add ray to ascend requirements by @sigma-plus in #3713
- fix accessing undefined attribute `seq_aux` of deepseek-r1-0528 by @lvhan028 in #3728
- [Fix]: Avoid quantize qk norm for qwen3 dense models by @taishan1994 in #3733
- fix py313 env creation failed when building lmdeploy-builder image by @lvhan028 in #3739
- [Fix]: kernel meta retrieval for SM7X does not work by @xiaoajie738 in #3746
- limit max_session_len by @grimoire in #3751
- fix internvl norm by @grimoire in #3756
- support qwen3 moe yarn and vlm hf_overrides by @grimoire in #3757
- [PD Disaggregation] fix double unshelf by @JimyMa in #3762
- fix(build): fix version parse regex to support post-release versions by @windreamer in #3764
- adapt transformers>=v4.52.0 to loading qwen2.5-vl with turbomind by @irexyc in #3771
- fix chat template with tool call by @RunningLeon in #3773
- fix vl nothink mode by @RunningLeon in #3776
📚 Documentations
- update reward model docs by @CUHKSZzxy in #3721
🌐 Other
- update twomicrobatch by @SHshenhao in #3651
- [CI]: Upgrade to py310 for ut by @RunningLeon in #3718
- [ci] update dailytest environment and scripts by @zhulinJulia24 in #3716
- Preliminary Blackwell (sm_120a, RTX 50 series) support by @lzhangzz in #3701
- [ci] add fp8 evaluation workflow by @zhulinJulia24 in #3729
- Add VRAM bandwidth utilization stat to attention test by @lzhangzz in #3731
- doc: fix dead links to MindX DL to recover CI. by @windreamer in #3741
- fix free cache in MPEngine branch by @JimyMa in #3670
- fix: make RelWithDebInfo default cmake build type by @windreamer in #3774
- bump version to v0.9.2 by @lvhan028 in #3770
New Contributors
- @sigma-plus made their first contribution in #3713
- @ConvolutedDog made their first contribution in #3730
- @windreamer made their first contribution in #3726
- @taishan1994 made their first contribution in #3733
- @xiaoajie738 made their first contribution in #3746
- @kolmogorov-quyet made their first contribution in #3750
Full Changelog: v0.9.1...v0.9.2
v0.9.1
What's Changed
🚀 Features
- feature: enable tool_call and reasoning_content parsing for qwen3 by @ywx217 in #3615
- Support Mooncake migration backend for PD disaggregation by @Risc-lt in #3620
- Support load fused moe weights by @RunningLeon in #3672
- Separate api_server and pytorch engine into different processors by @grimoire in #3627
- add reward model api by @CUHKSZzxy in #3665
💥 Improvements
- [ascend] import patch at initializing time by @JackWeiw in #3662
- [ascend] use custom transdata in python kernel by @JackWeiw in #3671
- move import transformers in patch by @grimoire in #3660
- set ray envs by @grimoire in #3643
- raise ImportError when enable ep and not install dlblas by @zhaochaoxing in #3636
- Reduce sampling memory usage by @lzhangzz in #3666
🐞 Bug fixes
- fix dockerfile by @lvhan028 in #3657
- Fix top-p only sampling with padded vocab size by @lzhangzz in #3661
- fix pt engine stop & cancel by @irexyc in #3681
- Fix convert bf16 to numpy by @RunningLeon in #3686
- disable torch.compile in cuda graph runner by @grimoire in #3691
- fix reward model api by @CUHKSZzxy in #3703
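The padded-vocab fix (#3661) touches a subtle detail of top-p (nucleus) sampling: engines often pad the vocabulary for kernel alignment, and padded slots must be excluded before the cumulative cutoff so they can never be sampled. A hedged, pure-Python sketch of the idea only; this is not lmdeploy's CUDA kernel, and the function name is illustrative:

```python
def top_p_filter(probs, top_p, real_vocab_size):
    """Keep the smallest set of real tokens whose cumulative probability
    reaches top_p. Illustrative sketch: slots at index >= real_vocab_size
    are padding and must never enter the nucleus."""
    # Drop padded slots first so they cannot be selected.
    items = [(p, i) for i, p in enumerate(probs) if i < real_vocab_size]
    items.sort(reverse=True)  # highest probability first
    kept, cum = [], 0.0
    for p, i in items:
        kept.append(i)
        cum += p
        if cum >= top_p:      # nucleus reached
            break
    return sorted(kept)

# Vocab padded from 4 real tokens to 6; padded slots hold probability 0.0.
probs = [0.5, 0.3, 0.15, 0.05, 0.0, 0.0]
print(top_p_filter(probs, 0.8, real_vocab_size=4))  # [0, 1]
```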
📚 Documentations
- add reward model documents by @CUHKSZzxy in #3706
🌐 Other
- upgrade torch and triton by @grimoire in #3677
- support do_preprocess=False for chat.completions by @irexyc in #3645
- [ci] change flash atten installation in pr test by @zhulinJulia24 in #3688
- fix profile_throughput.py by @irexyc in #3692
- fix profile_generation.py by @irexyc in #3707
- update dlblas version in dockerfile by @CUHKSZzxy in #3711
- bump version to v0.9.1 by @lvhan028 in #3685
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's Changed
🚀 Features
- LMDeploy Distserve by @JimyMa in #3304
- allow api server terminated through requests from clients by @RunningLeon in #3533
- support update params for pytorch backend from api server by @irexyc in #3535
- support eplb for Qwen3-MoE by @zhaochaoxing in #3582
- support update params for turbomind backend by @irexyc in #3566
- Quantize Qwen3 MoE bf16 model to fp8 model at runtime by @grimoire in #3631
- [Feat]: Support internvl3-8b-hf by @RunningLeon in #3633
- Add FP8 MoE for turbomind by @lzhangzz in #3601
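Runtime bf16-to-fp8 quantization (#3631) rests on per-block scaling: each weight block is scaled by its absolute maximum so values fit FP8's representable range (448 for the e4m3 format). A sketch of the idea under those assumptions; the function name is illustrative and real kernels store actual FP8 values rather than Python floats:

```python
def quantize_block_fp8(block, fp8_max=448.0):
    """Per-block absmax scaling as used by blockwise FP8 schemes.
    Illustrative sketch of the idea behind runtime bf16->fp8 quantization
    (#3631), not lmdeploy's implementation."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / fp8_max  # dequantize later via x_q * scale
    # Scale into the FP8 range and clamp; a real kernel would then cast
    # to an e4m3 storage type here.
    q = [max(-fp8_max, min(fp8_max, x / scale)) for x in block]
    return q, scale

q, scale = quantize_block_fp8([0.5, -2.0, 1.0, 0.25])
print(round(scale, 6), max(abs(v) for v in q))  # 0.004464 448.0
```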
💥 Improvements
- reduce ray memory usage by @grimoire in #3487
- use dlblas by @zhaochaoxing in #3489
- internlm3 dense fp8 by @CUHKSZzxy in #3527
- random pad input ids by @grimoire in #3530
- ray nsys profile support by @grimoire in #3448
- update blockedfp8 scale name by @CUHKSZzxy in #3532
- start engine loop on server startup event by @grimoire in #3523
- update two microbatch by @SHshenhao in #3540
- [ascend]set transdata dynamic shape true by @JackWeiw in #3531
- ray safe exit by @grimoire in #3545
- support update params with dp=1 for pytorch engine by @irexyc in #3562
- Skip dp dummy input forward by @grimoire in #3552
- Unlock mutual exclusivity of arguments `tool-call-parser` and `reasoning-parser` by @jingyibo123 in #3550
- perform torch.cuda.empty_cache() after conversion by @bltcn in #3570
- pipeline warmup by @irexyc in #3548
- Launch multiple api servers for dp > 1 by @RunningLeon in #3414
- support awq for Qwen2.5-VL by @RunningLeon in #3559
- support qwen3 /think & /no_think & enable_thinking parameter by @BUJIDAOVS in #3564
- Eplb by @zhaochaoxing in #3572
- Update benchmark by @lvhan028 in #3578
- block output when prefetch next forward inputs. by @grimoire in #3573
- support both eplb and microbatch simultaneously by @zhaochaoxing in #3591
- Add log_file and set loglevel in launch_servers by @RunningLeon in #3596
- sampling on the tokenizer's vocab by @grimoire in #3604
- update deepgemm version by @grimoire in #3606
- [Ascend] set default distributed backend as ray for ascend device by @JackWeiw in #3603
- Blocked fp8 tma by @grimoire in #3470
- [PD Disaggregation] Async migration by @JimyMa in #3610
- move dp loop to model agent by @grimoire in #3598
- update some logs of proxy_server and pt engine by @lvhan028 in #3621
- improve loading model performance by shuffling the weight files by @irexyc in #3625
- add benchmark scripts about pipeline api and inference engines according to the config file by @lvhan028 in #3622
🐞 Bug fixes
- [ascend] fix recompile on different rank by @jinminxi104 in #3513
- fix attention sm86 by @grimoire in #3519
- fix stopwords kv cache by @grimoire in #3494
- [bug fix] fix PD Disaggregation in DSV3 by @JimyMa in #3547
- fix proxy server heart beat by @irexyc in #3543
- fix dp>1 tp=1 ep=1 by @grimoire in #3555
- fix mixtral on new transformers by @grimoire in #3580
- [Fix]: reset step after eviction by @RunningLeon in #3589
- fix parsing dynamic rope param failed by @lvhan028 in #3575
- Fix batch infer for gemma3vl by @RunningLeon in #3592
- Fix symbol error when dlBLAS is not imported by @zhaochaoxing in #3597
- read distributed envs by @grimoire in #3600
- fix side-effect caused by PR 3590 by @lvhan028 in #3608
- fix bug in qwen2 by @LKJacky in #3614
- fix awq kernel by @grimoire in #3618
- fix flash mla interface by @grimoire in #3617
- add sampling_vocab_size by @irexyc in #3607
- fix for default quant by @grimoire in #3640
- Fix log file env in ray worker by @RunningLeon in #3624
- fix qwen3 chat template by @lvhan028 in #3641
- fix vlm runtime quant by @grimoire in #3644
- Fix 'Namespace' object has no attribute 'num_tokens_per_iter' when serving by gradio by @lvhan028 in #3647
- Synchronize weight processing by @lzhangzz in #3649
- Fix zero scale in fp8 quantization by @lzhangzz in #3652
🌐 Other
- update doc for ascend 300I Duo docker image by @jinminxi104 in #3526
- simulate EPLB for benchmark only by @lvhan028 in #3490
- [ci] add test workflow for 3090 machine by @zhulinJulia24 in #3561
- [ci] fix transformers version in prtest by @zhulinJulia24 in #3584
- [Misc] minor api_server and tm loader, and upgrade docformatter to resolve lint error by @lvhan028 in #3590
- [ci] add qwen3 models into testcase by @zhulinJulia24 in #3593
- update Dockerfile by @CUHKSZzxy in #3634
- check in lmdeploy-builder on cuda 12.4 and 12.8 platform by @lvhan028 in #3630
- fix blocked fp8 overflow by @grimoire in #3650
- Bump version to v0.9.0 by @lvhan028 in #3609
New Contributors
- @JimyMa made their first contribution in #3304
- @jingyibo123 made their first contribution in #3550
- @bltcn made their first contribution in #3570
- @BUJIDAOVS made their first contribution in #3564
- @LKJacky made their first contribution in #3614
Full Changelog: v0.8.0...v0.9.0
v0.8.0
What's Changed
🚀 Features
- Torch dp support by @grimoire in #3207
- Add deep gemm with tma pre allocated by @AllentDan in #3287
- Add mixed DP + TP by @lzhangzz in #3229
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305
- [ascend] support multi nodes on ascend device by @tangzhiyi11 in #3260
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend]support deepseekv2 by @yao-fengchen in #3206
- add deepep by @zhaochaoxing in #3313
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support all2all ep by @zhaochaoxing in #3370
- optimize ep in decoding stage by @zhaochaoxing in #3383
- Warmup deepgemm by @grimoire in #3387
- support Llama4 by @grimoire in #3408
- add twomicrobatch support by @SHshenhao in #3381
- Support phi4 mini by @RunningLeon in #3467
- [Dlinfer][Ascend] support 310P by @JackWeiw in #3484
- support qwen3 fp8 by @CUHKSZzxy in #3505
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
- refactor attn param by @irexyc in #3164
- Verbose log by @grimoire in #3329
- optimize mla, remove load `v` by @grimoire in #3334
- support dp decoding with cudagraph by @grimoire in #3311
- optimize quant-fp8 kernel by @grimoire in #3345
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Add AIOHTTP_TIMEOUT env var for proxy server by @AllentDan in #3355
- disable sync batch on dp eager mode by @grimoire in #3382
- fix for deepgemm update by @grimoire in #3380
- Add string before hash tokens in blocktrie by @RunningLeon in #3386
- optimize moe get sorted idx by @grimoire in #3356
- use half/bf16 lm_head output by @irexyc in #3213
- remove ep eager check by @grimoire in #3392
- Optimize ascend moe by @yao-fengchen in #3364
- optimize fp8 moe kernel by @grimoire in #3419
- ray async forward execute by @grimoire in #3443
- map internvl3 chat template to builtin chat template internvl2_5 by @lvhan028 in #3450
- Refactor turbomind (low-level abstractions) by @lzhangzz in #3423
- remove barely used code to improve maintenance by @lvhan028 in #3462
- optimize sm80 long context by @grimoire in #3465
- move partial_json_parser from "serve.txt" to "runtime.txt" by @lvhan028 in #3493
- support qwen3-dense models awq quantization by @lvhan028 in #3503
- Optimize MoE gate for Qwen3 by @lzhangzz in #3500
- Pass num_tokens_per_iter and max_prefill_iters params through in `lmdeploy serve api_server` by @josephrocca in #3504
- [Dlinfer][Ascend] Optimize performance of 310P device by @JackWeiw in #3486
- optimize longcontext decoding by @grimoire in #3510
- Support min_p in openai completions_v1 by @josephrocca in #3506
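min_p sampling (#3506) keeps only tokens whose probability is at least `min_p` times the most likely token's probability, a cutoff that adapts to how peaked the distribution is. A hedged pure-Python sketch of the filtering step; the function name is illustrative, and the server applies this to the distribution before sampling:

```python
def min_p_filter(probs, min_p):
    """Illustrative sketch of min_p filtering (#3506): keep token ids whose
    probability is at least min_p times the top token's probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# With min_p=0.2 and top prob 0.6, the cutoff is 0.12: tokens 2 and 3 drop.
print(min_p_filter([0.6, 0.25, 0.1, 0.05], 0.2))  # [0, 1]
```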
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- fix sliding window multi chat by @grimoire in #3302
- add `v` check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- set cmake policy minimum version as 3.5 by @lvhan028 in #3376
- fix dp cudagraph by @grimoire in #3372
- fix flashmla eagermode by @grimoire in #3375
- close engine after each benchmark-generation iter by @grimoire in #3269
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek by @ao-zz in #3358
- fix stopping criteria by @grimoire in #3384
- support List[dict] prompt input without do_preprocess by @irexyc in #3385
- add rayexecutor release timeout by @grimoire in #3403
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
- fix linting error by upgrade to ubuntu-latest by @lvhan028 in #3442
- fix awq tp for pytorch engine by @RunningLeon in #3435
- fix mllm testcase fail by @caikun-pjlab in #3458
- remove paged attention autotune by @grimoire in #3452
- Remove empty prompts in benchmark scripts by @lvhan028 in #3460
- failed to end session properly by @lvhan028 in #3471
- fix qwen2.5-vl chat template by @CUHKSZzxy in #3475
- Align forward arguments of deepgemm blockedf8 by @RunningLeon in #3474
- fix turbomind lib missing to link nccl by exporting nccl path by @lvhan028 in #3479
- fix dsvl2 no attr config error by @CUHKSZzxy in #3477
- fix flash attention crash on triton3.1.0 by @grimoire in #3478
- Fix disorder of ray execution by @RunningLeon in #3481
- update dockerfile by @CUHKSZzxy in #3482
- fix output logprobs by @irexyc in #3488
- Fix Qwen2MoE shared expert gate by @lzhangzz in #3491
- fix replicate kv for qwen3-moe by @grimoire in #3499
- fix sampling if data overflow after temperature penalty by @irexyc in #3508
📚 Documentations
- update qwen2.5-vl-32b docs by @CUHKSZzxy in #3446
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- [ci] add think function testcase by @zhulinJulia24 in #3299
- merge dev into main by @lvhan028 in #3348
- [ci] add vl models into pipeline interface testcase by @zhulinJulia24 in #3374
- merge dev to main branch by @lvhan028 in #3378
- opt experts memory and permute by @zhaochaoxing in #3390
- Revert "opt experts memory and permute" by @zhaochaoxing in #3406
- merge dev to main by @lvhan028 in #3400
- add Hopper GPU dockerfile by @CUHKSZzxy in #3415
- optimize internvit by @caikun-pjlab in #3433
- fix stop/bad words by @irexyc in #3492
- [ci] testcase bugfix and add more models into testcase by @zhulinJulia24 in #3463
- bump version to v0.8.0 by @lvhan028 in #3432
New Contributors
- @zhaochaoxing made their first contribution in #3313
- @ao-zz made their first contribution in #3358
- @wanfengcxz made their first contribution in #34...
v0.7.3
What's Changed
🚀 Features
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend]support deepseekv2 by @yao-fengchen in #3206
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support Llama4 by @grimoire in #3408
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
- optimize mla, remove load `v` by @grimoire in #3334
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Optimize ascend moe by @yao-fengchen in #3364
- find port by @grimoire in #3429
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- add `v` check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- support List[dict] prompt input without do_preprocess by @irexyc in #3385
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
📚 Documentations
- update ascend doc by @yao-fengchen in #3420
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- Optimize internvit by @caikun-pjlab in #3316
- bump version to v0.7.3 by @lvhan028 in #3416
New Contributors
- @wanfengcxz made their first contribution in #3417
- @caikun-pjlab made their first contribution in #3316
Full Changelog: v0.7.2...v0.7.3
v0.7.2.post1
What's Changed
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
🌐 Other
Full Changelog: v0.7.2...v0.7.2.post1
v0.7.2
What's Changed
🚀 Features
- [Feature] support qwen2.5-vl for pytorch engine by @CUHKSZzxy in #3194
- Support reward models by @lvhan028 in #3192
- Add collective communication kernels by @lzhangzz in #3163
- PytorchEngine multi-node support v2 by @grimoire in #3147
- Add flash mla by @AllentDan in #3218
- Add gemma3 implementation by @AllentDan in #3272
💥 Improvements
- remove update badwords by @grimoire in #3183
- default executor ray by @grimoire in #3210
- change ascend&camb default_batch_size to 256 by @jinminxi104 in #3251
- Tool reasoning parsers and streaming function call by @AllentDan in #3198
- remove torchelastic flag by @grimoire in #3242
- disable flashmla warning on sm<90 by @grimoire in #3271
🐞 Bug fixes
- Fix missing cli chat option by @lzhangzz in #3209
- [ascend] fix multi-card distributed inference failures by @tangzhiyi11 in #3215
- fix for small cache-max-entry-count by @grimoire in #3221
- [dlinfer] fix glm-4v graph mode on ascend by @jinminxi104 in #3235
- fix qwen2.5 pytorch engine dtype error on NPU by @tcye in #3247
- [Fix] failed to update the tokenizer's eos_token_id into stop_word list by @lvhan028 in #3257
- fix dsv3 gate scaling by @grimoire in #3263
- Fix the bug for reading dict error by @GxjGit in #3196
- Fix get ppl by @lvhan028 in #3268
📚 Documentations
- Specify lmdeploy version in benchmark guide by @lyj0309 in #3216
- [ascend] add Ascend docker image by @jinminxi104 in #3239
🌐 Other
- [ci] testcase refactoring by @zhulinJulia24 in #3151
- [ci] add testcase for native communicator by @zhulinJulia24 in #3217
- [ci] add volc evaluation testcase by @zhulinJulia24 in #3240
- [ci] remove v100 testconfig by @zhulinJulia24 in #3253
- add rdma dependencies into docker file by @CUHKSZzxy in #3262
- docs: update ascend docs for docker running by @CyCle1024 in #3266
- bump version to v0.7.2 by @lvhan028 in #3252
Full Changelog: v0.7.1...v0.7.2
v0.7.1
What's Changed
🚀 Features
- support release pipeline by @irexyc in #3069
- [feature] add dlinfer w8a8 support. by @Reinerzhou in #2988
- [maca] support deepseekv2 for maca backend. by @Reinerzhou in #2918
- [Feature] support deepseek-vl2 for pytorch engine by @CUHKSZzxy in #3149
💥 Improvements
- use weights iterator while loading by @RunningLeon in #2886
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
- Update benchmark script and user guide by @lvhan028 in #3110
- support eos_token list in turbomind by @irexyc in #3044
- Use aiohttp inside proxy server && add --disable-cache-status argument by @AllentDan in #3020
- Update runtime package dependencies by @zgjja in #3142
- Make turbomind support embedding inputs on GPU by @chengyuma in #3177
🐞 Bug fixes
- [dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
- fix error in interactive api by @lvhan028 in #3074
- fix sliding window mgr by @grimoire in #3068
- More arguments in api_client, update docstrings by @AllentDan in #3077
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
- Fix UT of deepseek chat template by @lvhan028 in #3125
- Fix internvl2.5 error after eviction by @grimoire in #3122
- Fix cogvlm and phi3vision by @RunningLeon in #3137
- [fix] fix vl gradio, use pipeline api and remove interactive chat by @irexyc in #3136
- fix the issue that stop_token may be less than defined in model.py by @irexyc in #3148
- fix typing by @lz1998 in #3153
- fix min length penalty by @irexyc in #3150
- fix default temperature value by @irexyc in #3166
- Use pad_token_id as image_token_id for vl models by @RunningLeon in #3158
- Fix tool call prompt for InternLM and Qwen by @AllentDan in #3156
- Update qwen2.py by @GxjGit in #3174
- fix temperature=0 by @grimoire in #3176
- fix blocked fp8 moe by @grimoire in #3181
- fix deepseekv2 has no attribute use_mla error by @CUHKSZzxy in #3188
- fix unstoppable chat by @lvhan028 in #3189
🌐 Other
- [ci] add internlm3 into testcase by @zhulinJulia24 in #3038
- add internlm3 to supported models by @lvhan028 in #3041
- update pre-commit config by @lvhan028 in #2683
- [maca] add cudagraph support on maca backend. by @Reinerzhou in #2834
- bump version to v0.7.0.post1 by @lvhan028 in #3076
- bump version to v0.7.0.post2 by @lvhan028 in #3094
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
- [ci] fix some fail in daily testcase by @zhulinJulia24 in #3134
- Bump version to v0.7.1 by @lvhan028 in #3178
New Contributors
- @Lychee-acaca made their first contribution in #3103
- @lz1998 made their first contribution in #3153
- @GxjGit made their first contribution in #3174
- @chengyuma made their first contribution in #3177
- @CUHKSZzxy made their first contribution in #3149
Full Changelog: v0.7.0...v0.7.1
v0.7.0.post3
What's Changed
💥 Improvements
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
🐞 Bug fixes
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
🌐 Other
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
New Contributors
- @Lychee-acaca made their first contribution in #3103
Full Changelog: v0.7.0.post2...v0.7.0.post3
v0.7.0.post2
What's Changed
💥 Improvements
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
🐞 Bug fixes
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
🌐 Other
Full Changelog: v0.7.0.post1...v0.7.0.post2