Releases · InternLM/lmdeploy
v0.9.2
What's Changed
🚀 Features
- [Feature] metrics support by @CUHKSZzxy in #3534
- Relax FP8 TP requirement by @lzhangzz in #3697
- FA3 by @zhaochaoxing in #3623
- support qwen2/2.5-vl in turbomind by @irexyc in #3744
- feat: add pytorch_engine_qwen2_5vl_sm120 by @kolmogorov-quyet in #3750
- Internvl pt by @RunningLeon in #3765
- Improve internvl for turbomind engine by @lvhan028 in #3769
💥 Improvements
- Refactor linear by @grimoire in #3653
- remove python3.8 support and add python3.13 support by @lvhan028 in #3638
- refactor vl inputs split by @grimoire in #3699
- [Fix]: Replace mutable default with default_factory for scheduler_stats by @ConvolutedDog in #3730
- Fix the logic of calculating max_new_tokens and determining finish_reason by @lvhan028 in #3727
- Override HF config.json via CLI by @CUHKSZzxy in #3722
- feat(build): Integrate and build turbomind backend directly in setup.py by @windreamer in #3726
- Generate the benchmark output filename with given arguments by @lvhan028 in #3740
- Make loading llm without vlm as an option by @grimoire in #3745
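One item above (#3730) fixes a classic Python pitfall: a mutable default on a dataclass field is shared state across every instance. A minimal sketch of the pattern; the `SchedulerStats` class and its field name here are illustrative, not lmdeploy's actual `scheduler_stats` definition:

```python
from dataclasses import dataclass, field

# Illustrative sketch of the pattern behind #3730; class and field names
# are hypothetical, not lmdeploy's actual scheduler_stats definition.
@dataclass
class SchedulerStats:
    # A bare mutable default (e.g. `finished_ids: list = []`) is rejected
    # by dataclasses precisely because it would be shared across instances;
    # default_factory builds a fresh list for each instance instead.
    finished_ids: list = field(default_factory=list)

a = SchedulerStats()
b = SchedulerStats()
a.finished_ids.append(1)
print(a.finished_ids, b.finished_ids)  # [1] [] -- no shared state
```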
🐞 Bug fixes
- add ray to ascend requirements by @sigma-plus in #3713
- fix accessing undefined attribute `seq_aux` of deepseek-r1-0528 by @lvhan028 in #3728
- [Fix]: Avoid quantize qk norm for qwen3 dense models by @taishan1994 in #3733
- fix py313 env creation failed when building lmdeploy-builder image by @lvhan028 in #3739
- [Fix]: kernel meta retrieval for SM7X does not work by @xiaoajie738 in #3746
- limit max_session_len by @grimoire in #3751
- fix internvl norm by @grimoire in #3756
- support qwen3 moe yarn and vlm hf_overrides by @grimoire in #3757
- [PD Disaggregation] fix double unshelf by @JimyMa in #3762
- fix(build): fix version parse regex to support post-release versions by @windreamer in #3764
- adapt transformers>=v4.52.0 to loading qwen2.5-vl with turbomind by @irexyc in #3771
- fix chat template with tool call by @RunningLeon in #3773
- fix vl nothink mode by @RunningLeon in #3776
📚 Documentations
- update reward model docs by @CUHKSZzxy in #3721
🌐 Other
- update twomicrobatch by @SHshenhao in #3651
- [CI]: Upgrade to py310 for ut by @RunningLeon in #3718
- [ci] update dailytest environment and scripts by @zhulinJulia24 in #3716
- Preliminary Blackwell (sm_120a, RTX 50 series) support by @lzhangzz in #3701
- [ci] add fp8 evaluation workflow by @zhulinJulia24 in #3729
- Add VRAM bandwidth utilization stat to attention test by @lzhangzz in #3731
- doc: fix dead links to MindX DL to recover CI. by @windreamer in #3741
- fix free cache in MPEngine branch by @JimyMa in #3670
- fix: make RelWithDebInfo default cmake build type by @windreamer in #3774
- bump version to v0.9.2 by @lvhan028 in #3770
New Contributors
- @sigma-plus made their first contribution in #3713
- @ConvolutedDog made their first contribution in #3730
- @windreamer made their first contribution in #3726
- @taishan1994 made their first contribution in #3733
- @xiaoajie738 made their first contribution in #3746
- @kolmogorov-quyet made their first contribution in #3750
Full Changelog: v0.9.1...v0.9.2
v0.9.1
What's Changed
🚀 Features
- feature: enable tool_call and reasoning_content parsing for qwen3 by @ywx217 in #3615
- Support Mooncake migration backend for PD disaggregation by @Risc-lt in #3620
- Support load fused moe weights by @RunningLeon in #3672
- Separate api_server and pytorch engine into different processors by @grimoire in #3627
- add reward model api by @CUHKSZzxy in #3665
💥 Improvements
- [ascend] import patch at initializing time by @JackWeiw in #3662
- [ascend] use custom transdata in python kernel by @JackWeiw in #3671
- move import transformers in patch by @grimoire in #3660
- set ray envs by @grimoire in #3643
- raise ImportError when enable ep and not install dlblas by @zhaochaoxing in #3636
- Reduce sampling memory usage by @lzhangzz in #3666
🐞 Bug fixes
- fix dockerfile by @lvhan028 in #3657
- Fix top-p only sampling with padded vocab size by @lzhangzz in #3661
- fix pt engine stop & cancel by @irexyc in #3681
- Fix convert bf16 to numpy by @RunningLeon in #3686
- disable torch.compile in cuda graph runner by @grimoire in #3691
- fix reward model api by @CUHKSZzxy in #3703
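The padded-vocab fix (#3661) touches a subtle detail of top-p (nucleus) sampling: engines often pad the vocabulary for kernel alignment, and padded slots must be excluded before the cumulative cutoff so they can never be sampled. A hedged, pure-Python sketch of the idea only; this is not lmdeploy's CUDA kernel, and the function name is illustrative:

```python
def top_p_filter(probs, top_p, real_vocab_size):
    """Keep the smallest set of real tokens whose cumulative probability
    reaches top_p. Illustrative sketch: slots at index >= real_vocab_size
    are padding and must never enter the nucleus."""
    # Drop padded slots first so they cannot be selected.
    items = [(p, i) for i, p in enumerate(probs) if i < real_vocab_size]
    items.sort(reverse=True)  # highest probability first
    kept, cum = [], 0.0
    for p, i in items:
        kept.append(i)
        cum += p
        if cum >= top_p:      # nucleus reached
            break
    return sorted(kept)

# Vocab padded from 4 real tokens to 6; padded slots hold probability 0.0.
probs = [0.5, 0.3, 0.15, 0.05, 0.0, 0.0]
print(top_p_filter(probs, 0.8, real_vocab_size=4))  # [0, 1]
```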
📚 Documentations
- add reward model documents by @CUHKSZzxy in #3706
🌐 Other
- upgrade torch and triton by @grimoire in #3677
- support do_preprocess=False for chat.completions by @irexyc in #3645
- [ci] change flash atten installation in pr test by @zhulinJulia24 in #3688
- fix profile_throughput.py by @irexyc in #3692
- fix profile_generation.py by @irexyc in #3707
- update dlblas version in dockerfile by @CUHKSZzxy in #3711
- bump version to v0.9.1 by @lvhan028 in #3685
Full Changelog: v0.9.0...v0.9.1
v0.9.0
What's Changed
🚀 Features
- LMDeploy Distserve by @JimyMa in #3304
- allow api server terminated through requests from clients by @RunningLeon in #3533
- support update params for pytorch backend from api server by @irexyc in #3535
- support eplb for Qwen3-MoE by @zhaochaoxing in #3582
- support update params for turbomind backend by @irexyc in #3566
- Quantize Qwen3 MoE bf16 model to fp8 model at runtime by @grimoire in #3631
- [Feat]: Support internvl3-8b-hf by @RunningLeon in #3633
- Add FP8 MoE for turbomind by @lzhangzz in #3601
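Runtime bf16-to-fp8 quantization (#3631) rests on per-block scaling: each weight block is scaled by its absolute maximum so values fit FP8's representable range (448 for the e4m3 format). A sketch of the idea under those assumptions; the function name is illustrative and real kernels store actual FP8 values rather than Python floats:

```python
def quantize_block_fp8(block, fp8_max=448.0):
    """Per-block absmax scaling as used by blockwise FP8 schemes.
    Illustrative sketch of the idea behind runtime bf16->fp8 quantization
    (#3631), not lmdeploy's implementation."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / fp8_max  # dequantize later via x_q * scale
    # Scale into the FP8 range and clamp; a real kernel would then cast
    # to an e4m3 storage type here.
    q = [max(-fp8_max, min(fp8_max, x / scale)) for x in block]
    return q, scale

q, scale = quantize_block_fp8([0.5, -2.0, 1.0, 0.25])
print(round(scale, 6), max(abs(v) for v in q))  # 0.004464 448.0
```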
💥 Improvements
- reduce ray memory usage by @grimoire in #3487
- use dlblas by @zhaochaoxing in #3489
- internlm3 dense fp8 by @CUHKSZzxy in #3527
- random pad input ids by @grimoire in #3530
- ray nsys profile support by @grimoire in #3448
- update blockedfp8 scale name by @CUHKSZzxy in #3532
- start engine loop on server startup event by @grimoire in #3523
- update two microbatch by @SHshenhao in #3540
- [ascend]set transdata dynamic shape true by @JackWeiw in #3531
- ray safe exit by @grimoire in #3545
- support update params with dp=1 for pytorch engine by @irexyc in #3562
- Skip dp dummy input forward by @grimoire in #3552
- Unlock mutual exclusivity of arguments `tool-call-parser` and `reasoning-parser` by @jingyibo123 in #3550
- perform torch.cuda.empty_cache() after conversion by @bltcn in #3570
- pipeline warmup by @irexyc in #3548
- Launch multiple api servers for dp > 1 by @RunningLeon in #3414
- support awq for Qwen2.5-VL by @RunningLeon in #3559
- support qwen3 /think & /no_think & enable_thinking parameter by @BUJIDAOVS in #3564
- Eplb by @zhaochaoxing in #3572
- Update benchmark by @lvhan028 in #3578
- block output when prefetch next forward inputs. by @grimoire in #3573
- support both eplb and microbatch simultaneously by @zhaochaoxing in #3591
- Add log_file and set loglevel in launch_servers by @RunningLeon in #3596
- sampling on the tokenizer's vocab by @grimoire in #3604
- update deepgemm version by @grimoire in #3606
- [Ascend] set default distributed backend as ray for ascend device by @JackWeiw in #3603
- Blocked fp8 tma by @grimoire in #3470
- [PD Disaggregation] Async migration by @JimyMa in #3610
- move dp loop to model agent by @grimoire in #3598
- update some logs of proxy_server and pt engine by @lvhan028 in #3621
- improve loading model performance by shuffling the weight files by @irexyc in #3625
- add benchmark scripts about pipeline api and inference engines according to the config file by @lvhan028 in #3622
🐞 Bug fixes
- [ascend] fix recompile on different rank by @jinminxi104 in #3513
- fix attention sm86 by @grimoire in #3519
- fix stopwords kv cache by @grimoire in #3494
- [bug fix] fix PD Disaggregation in DSV3 by @JimyMa in #3547
- fix proxy server heart beat by @irexyc in #3543
- fix dp>1 tp=1 ep=1 by @grimoire in #3555
- fix mixtral on new transformers by @grimoire in #3580
- [Fix]: reset step after eviction by @RunningLeon in #3589
- fix parsing dynamic rope param failed by @lvhan028 in #3575
- Fix batch infer for gemma3vl by @RunningLeon in #3592
- Fix symbol error when dlBLAS is not imported by @zhaochaoxing in #3597
- read distributed envs by @grimoire in #3600
- fix side-effect caused by PR 3590 by @lvhan028 in #3608
- fix bug in qwen2 by @LKJacky in #3614
- fix awq kernel by @grimoire in #3618
- fix flash mla interface by @grimoire in #3617
- add sampling_vocab_size by @irexyc in #3607
- fix for default quant by @grimoire in #3640
- Fix log file env in ray worker by @RunningLeon in #3624
- fix qwen3 chat template by @lvhan028 in #3641
- fix vlm runtime quant by @grimoire in #3644
- Fix 'Namespace' object has no attribute 'num_tokens_per_iter' when serving by gradio by @lvhan028 in #3647
- Synchronize weight processing by @lzhangzz in #3649
- Fix zero scale in fp8 quantization by @lzhangzz in #3652
🌐 Other
- update doc for ascend 300I Duo docker image by @jinminxi104 in #3526
- simulate EPLB for benchmark only by @lvhan028 in #3490
- [ci] add test workflow for 3090 machine by @zhulinJulia24 in #3561
- [ci] fix transformers version in prtest by @zhulinJulia24 in #3584
- [Misc] minor api_server and tm loader, and upgrade docformatter to resolve lint error by @lvhan028 in #3590
- [ci] add qwen3 models into testcase by @zhulinJulia24 in #3593
- update Dockerfile by @CUHKSZzxy in #3634
- check in lmdeploy-builder on cuda 12.4 and 12.8 platform by @lvhan028 in #3630
- fix blocked fp8 overflow by @grimoire in #3650
- Bump version to v0.9.0 by @lvhan028 in #3609
New Contributors
- @JimyMa made their first contribution in #3304
- @jingyibo123 made their first contribution in #3550
- @bltcn made their first contribution in #3570
- @BUJIDAOVS made their first contribution in #3564
- @LKJacky made their first contribution in #3614
Full Changelog: v0.8.0...v0.9.0
v0.8.0
What's Changed
🚀 Features
- Torch dp support by @grimoire in #3207
- Add deep gemm with tma pre allocated by @AllentDan in #3287
- Add mixed DP + TP by @lzhangzz in #3229
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305
- [ascend] support multi nodes on ascend device by @tangzhiyi11 in #3260
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend]support deepseekv2 by @yao-fengchen in #3206
- add deepep by @zhaochaoxing in #3313
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support all2all ep by @zhaochaoxing in #3370
- optimize ep in decoding stage by @zhaochaoxing in #3383
- Warmup deepgemm by @grimoire in #3387
- support Llama4 by @grimoire in #3408
- add twomicrobatch support by @SHshenhao in #3381
- Support phi4 mini by @RunningLeon in #3467
- [Dlinfer][Ascend] support 310P by @JackWeiw in #3484
- support qwen3 fp8 by @CUHKSZzxy in #3505
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
- refactor attn param by @irexyc in #3164
- Verbose log by @grimoire in #3329
- optimize mla, remove load `v` by @grimoire in #3334
- support dp decoding with cudagraph by @grimoire in #3311
- optimize quant-fp8 kernel by @grimoire in #3345
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Add AIOHTTP_TIMEOUT env var for proxy server by @AllentDan in #3355
- disable sync batch on dp eager mode by @grimoire in #3382
- fix for deepgemm update by @grimoire in #3380
- Add string before hash tokens in blocktrie by @RunningLeon in #3386
- optimize moe get sorted idx by @grimoire in #3356
- use half/bf16 lm_head output by @irexyc in #3213
- remove ep eager check by @grimoire in #3392
- Optimize ascend moe by @yao-fengchen in #3364
- optimize fp8 moe kernel by @grimoire in #3419
- ray async forward execute by @grimoire in #3443
- map internvl3 chat template to builtin chat template internvl2_5 by @lvhan028 in #3450
- Refactor turbomind (low-level abstractions) by @lzhangzz in #3423
- remove barely used code to improve maintenance by @lvhan028 in #3462
- optimize sm80 long context by @grimoire in #3465
- move partial_json_parser from "serve.txt" to "runtime.txt" by @lvhan028 in #3493
- support qwen3-dense models awq quantization by @lvhan028 in #3503
- Optimize MoE gate for Qwen3 by @lzhangzz in #3500
- Pass num_tokens_per_iter and max_prefill_iters params through in `lmdeploy serve api_server` by @josephrocca in #3504
- [Dlinfer][Ascend] Optimize performance of 310P device by @JackWeiw in #3486
- optimize longcontext decoding by @grimoire in #3510
- Support min_p in openai completions_v1 by @josephrocca in #3506
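min_p sampling (#3506) keeps only tokens whose probability is at least `min_p` times the most likely token's probability, a cutoff that adapts to how peaked the distribution is. A hedged pure-Python sketch of the filtering step; the function name is illustrative, and the server applies this to the distribution before sampling:

```python
def min_p_filter(probs, min_p):
    """Illustrative sketch of min_p filtering (#3506): keep token ids whose
    probability is at least min_p times the top token's probability."""
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

# With min_p=0.2 and top prob 0.6, the cutoff is 0.12: tokens 2 and 3 drop.
print(min_p_filter([0.6, 0.25, 0.1, 0.05], 0.2))  # [0, 1]
```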
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- fix sliding window multi chat by @grimoire in #3302
- add `v` check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- set cmake policy minimum version as 3.5 by @lvhan028 in #3376
- fix dp cudagraph by @grimoire in #3372
- fix flashmla eagermode by @grimoire in #3375
- close engine after each benchmark-generation iter by @grimoire in #3269
- [Fix] fix `image_token_id` error of qwen2-vl and deepseek by @ao-zz in #3358
- fix stopping criteria by @grimoire in #3384
- support List[dict] prompt input without do_preprocess by @irexyc in #3385
- add rayexecutor release timeout by @grimoire in #3403
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
- fix linting error by upgrade to ubuntu-latest by @lvhan028 in #3442
- fix awq tp for pytorch engine by @RunningLeon in #3435
- fix mllm testcase fail by @caikun-pjlab in #3458
- remove paged attention autotune by @grimoire in #3452
- Remove empty prompts in benchmark scripts by @lvhan028 in #3460
- failed to end session properly by @lvhan028 in #3471
- fix qwen2.5-vl chat template by @CUHKSZzxy in #3475
- Align forward arguments of deepgemm blockedf8 by @RunningLeon in #3474
- fix turbomind lib missing to link nccl by exporting nccl path by @lvhan028 in #3479
- fix dsvl2 no attr config error by @CUHKSZzxy in #3477
- fix flash attention crash on triton3.1.0 by @grimoire in #3478
- Fix disorder of ray execution by @RunningLeon in #3481
- update dockerfile by @CUHKSZzxy in #3482
- fix output logprobs by @irexyc in #3488
- Fix Qwen2MoE shared expert gate by @lzhangzz in #3491
- fix replicate kv for qwen3-moe by @grimoire in #3499
- fix sampling if data overflow after temperature penalty by @irexyc in #3508
📚 Documentations
- update qwen2.5-vl-32b docs by @CUHKSZzxy in #3446
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- [ci] add think function testcase by @zhulinJulia24 in #3299
- merge dev into main by @lvhan028 in #3348
- [ci] add vl models into pipeline interface testcase by @zhulinJulia24 in #3374
- merge dev to main branch by @lvhan028 in #3378
- opt experts memory and permute by @zhaochaoxing in #3390
- Revert "opt experts memory and permute" by @zhaochaoxing in #3406
- merge dev to main by @lvhan028 in #3400
- add Hopper GPU dockerfile by @CUHKSZzxy in #3415
- optimize internvit by @caikun-pjlab in #3433
- fix stop/bad words by @irexyc in #3492
- [ci] testcase bugfix and add more models into testcase by @zhulinJulia24 in #3463
- bump version to v0.8.0 by @lvhan028 in #3432
New Contributors
- @zhaochaoxing made their first contribution in #3313
- @ao-zz made their first contribution in #3358
- @wanfengcxz made their first contribution in #34...
v0.7.3
What's Changed
🚀 Features
- Add Qwen3 and Qwen3MoE by @lzhangzz in #3305
- [Feature] support qwen3 and qwen3-moe for pytorch engine by @CUHKSZzxy in #3315
- [ascend]support deepseekv2 by @yao-fengchen in #3206
- support ascend w8a8 graph_mode by @yao-fengchen in #3267
- support Llama4 by @grimoire in #3408
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
- optimize mla, remove load `v` by @grimoire in #3334
- refactor dlinfer rope by @yao-fengchen in #3326
- enable qwenvl2.5 graph mode on ascend by @jinminxi104 in #3367
- Optimize ascend moe by @yao-fengchen in #3364
- find port by @grimoire in #3429
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
- add `v` check by @grimoire in #3307
- Fix Qwen3MoE config parsing by @lzhangzz in #3336
- Fix finish reasons by @AllentDan in #3338
- remove think_end_token_id in streaming content by @AllentDan in #3327
- Fix the finish_reason by @AllentDan in #3350
- support List[dict] prompt input without do_preprocess by @irexyc in #3385
- fix tensor dispatch in dynamo by @wanfengcxz in #3417
📚 Documentations
- update ascend doc by @yao-fengchen in #3420
🌐 Other
- bump version to v0.7.2.post1 by @lvhan028 in #3298
- Optimize internvit by @caikun-pjlab in #3316
- bump version to v0.7.3 by @lvhan028 in #3416
New Contributors
- @wanfengcxz made their first contribution in #3417
- @caikun-pjlab made their first contribution in #3316
Full Changelog: v0.7.2...v0.7.3
v0.7.2.post1
What's Changed
💥 Improvements
- Add spaces_between_special_tokens to /v1/interactive and make compatible with empty text by @AllentDan in #3283
- add env var to control timeout by @CUHKSZzxy in #3291
🐞 Bug fixes
- fix activation grid oversize by @grimoire in #3282
- Set ensure_ascii=False for tool calling by @AllentDan in #3295
🌐 Other
Full Changelog: v0.7.2...v0.7.2.post1
v0.7.2
What's Changed
🚀 Features
- [Feature] support qwen2.5-vl for pytorch engine by @CUHKSZzxy in #3194
- Support reward models by @lvhan028 in #3192
- Add collective communication kernels by @lzhangzz in #3163
- PytorchEngine multi-node support v2 by @grimoire in #3147
- Add flash mla by @AllentDan in #3218
- Add gemma3 implementation by @AllentDan in #3272
💥 Improvements
- remove update badwords by @grimoire in #3183
- default executor ray by @grimoire in #3210
- change ascend&camb default_batch_size to 256 by @jinminxi104 in #3251
- Tool reasoning parsers and streaming function call by @AllentDan in #3198
- remove torchelastic flag by @grimoire in #3242
- disable flashmla warning on sm<90 by @grimoire in #3271
🐞 Bug fixes
- Fix missing cli chat option by @lzhangzz in #3209
- [ascend] fix multi-card distributed inference failures by @tangzhiyi11 in #3215
- fix for small cache-max-entry-count by @grimoire in #3221
- [dlinfer] fix glm-4v graph mode on ascend by @jinminxi104 in #3235
- fix qwen2.5 pytorch engine dtype error on NPU by @tcye in #3247
- [Fix] failed to update the tokenizer's eos_token_id into stop_word list by @lvhan028 in #3257
- fix dsv3 gate scaling by @grimoire in #3263
- Fix the bug for reading dict error by @GxjGit in #3196
- Fix get ppl by @lvhan028 in #3268
📚 Documentations
- Specify lmdeploy version in benchmark guide by @lyj0309 in #3216
- [ascend] add Ascend docker image by @jinminxi104 in #3239
🌐 Other
- [ci] testcase refactoring by @zhulinJulia24 in #3151
- [ci] add testcase for native communicator by @zhulinJulia24 in #3217
- [ci] add volc evaluation testcase by @zhulinJulia24 in #3240
- [ci] remove v100 testconfig by @zhulinJulia24 in #3253
- add rdma dependencies into docker file by @CUHKSZzxy in #3262
- docs: update ascend docs for docker running by @CyCle1024 in #3266
- bump version to v0.7.2 by @lvhan028 in #3252
Full Changelog: v0.7.1...v0.7.2
v0.7.1
What's Changed
🚀 Features
- support release pipeline by @irexyc in #3069
- [feature] add dlinfer w8a8 support. by @Reinerzhou in #2988
- [maca] support deepseekv2 for maca backend. by @Reinerzhou in #2918
- [Feature] support deepseek-vl2 for pytorch engine by @CUHKSZzxy in #3149
💥 Improvements
- use weights iterator while loading by @RunningLeon in #2886
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
- Update benchmark script and user guide by @lvhan028 in #3110
- support eos_token list in turbomind by @irexyc in #3044
- Use aiohttp inside proxy server && add --disable-cache-status argument by @AllentDan in #3020
- Update runtime package dependencies by @zgjja in #3142
- Make turbomind support embedding inputs on GPU by @chengyuma in #3177
🐞 Bug fixes
- [dlinfer] fix ascend qwen2_vl graph_mode by @yao-fengchen in #3045
- fix error in interactive api by @lvhan028 in #3074
- fix sliding window mgr by @grimoire in #3068
- More arguments in api_client, update docstrings by @AllentDan in #3077
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
- Fix UT of deepseek chat template by @lvhan028 in #3125
- Fix internvl2.5 error after eviction by @grimoire in #3122
- Fix cogvlm and phi3vision by @RunningLeon in #3137
- [fix] fix vl gradio, use pipeline api and remove interactive chat by @irexyc in #3136
- fix the issue that stop_token may be less than defined in model.py by @irexyc in #3148
- fix typing by @lz1998 in #3153
- fix min length penalty by @irexyc in #3150
- fix default temperature value by @irexyc in #3166
- Use pad_token_id as image_token_id for vl models by @RunningLeon in #3158
- Fix tool call prompt for InternLM and Qwen by @AllentDan in #3156
- Update qwen2.py by @GxjGit in #3174
- fix temperature=0 by @grimoire in #3176
- fix blocked fp8 moe by @grimoire in #3181
- fix deepseekv2 has no attribute use_mla error by @CUHKSZzxy in #3188
- fix unstoppable chat by @lvhan028 in #3189
🌐 Other
- [ci] add internlm3 into testcase by @zhulinJulia24 in #3038
- add internlm3 to supported models by @lvhan028 in #3041
- update pre-commit config by @lvhan028 in #2683
- [maca] add cudagraph support on maca backend. by @Reinerzhou in #2834
- bump version to v0.7.0.post1 by @lvhan028 in #3076
- bump version to v0.7.0.post2 by @lvhan028 in #3094
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
- [ci] fix some fail in daily testcase by @zhulinJulia24 in #3134
- Bump version to v0.7.1 by @lvhan028 in #3178
New Contributors
- @Lychee-acaca made their first contribution in #3103
- @lz1998 made their first contribution in #3153
- @GxjGit made their first contribution in #3174
- @chengyuma made their first contribution in #3177
- @CUHKSZzxy made their first contribution in #3149
Full Changelog: v0.7.0...v0.7.1
v0.7.0.post3
What's Changed
💥 Improvements
- Set max concurrent requests by @AllentDan in #2961
- remove logitswarper by @grimoire in #3109
🐞 Bug fixes
- fix user guide about cogvlm deployment by @lvhan028 in #3088
- fix positional argument by @lvhan028 in #3086
🌐 Other
- [Fix] fix the URL judgment problem in Windows by @Lychee-acaca in #3103
- bump version to v0.7.0.post3 by @lvhan028 in #3115
New Contributors
- @Lychee-acaca made their first contribution in #3103
Full Changelog: v0.7.0.post2...v0.7.0.post3
v0.7.0.post2
What's Changed
💥 Improvements
- Add deepseek-r1 chat template by @AllentDan in #3072
- Update tokenizer by @lvhan028 in #3061
🐞 Bug fixes
- Add system role to deepseek chat template by @AllentDan in #3031
- Fix xcomposer2d5 by @irexyc in #3087
🌐 Other
Full Changelog: v0.7.0.post1...v0.7.0.post2