@@ -53,59 +53,60 @@ The following tables detail the models supported by LMDeploy's TurboMind engine

## PyTorchEngine on CUDA Platform

- | Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
- | :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
- | Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
- | InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
- | InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
- | InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
- | InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
- | Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
- | ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
- | Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
- | YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
- | QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
- | QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
- | QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
- | QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
- | Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
- | QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
- | QWen2.5-VL | 3B - 72B | MLLM | Yes | No | No | No | No |
- | DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
- | DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
- | DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
- | DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
- | MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
- | MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
- | Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
- | Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
- | StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
- | Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
- | Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
- | CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
- | CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
- | LLaVA(1.5,1.6)<sup>\[2\]</sup> | 7B-34B | MLLM | No | No | No | No | No |
- | InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
- | InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
- | InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
- | Mono-InternVL<sup>\[1\]</sup> | 2B | MLLM | Yes | Yes | Yes | - | - |
- | ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
- | Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
- | Gemma3 | 1B-27B | MLLM | Yes | Yes | Yes | - | - |
- | GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
- | GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
- | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
- | Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
- | Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
- | Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
+ | Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
+ | :----------------------------: | :-------------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
+ | Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
+ | Llama4 | Scout, Maverick | MLLM | Yes | Yes | Yes | - | - |
+ | InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
+ | Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
+ | ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
+ | Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
+ | YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
+ | QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
+ | QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+ | Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+ | QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+ | QWen2.5-VL | 3B - 72B | MLLM | Yes | No | No | No | No |
+ | DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
+ | DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+ | DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+ | DeepSeek-VL2 | 3B - 27B | MLLM | Yes | No | No | No | No |
+ | MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
+ | MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
+ | Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
+ | Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
+ | StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
+ | Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+ | Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+ | CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+ | CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+ | LLaVA(1.5,1.6)<sup>\[2\]</sup> | 7B-34B | MLLM | No | No | No | No | No |
+ | InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+ | InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+ | InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+ | Mono-InternVL<sup>\[1\]</sup> | 2B | MLLM | Yes | Yes | Yes | - | - |
+ | ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+ | Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+ | Gemma3 | 1B-27B | MLLM | Yes | Yes | Yes | - | - |
+ | GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+ | GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+ | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+ | Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+ | Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+ | Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |

```{note}
* [1] Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
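
For reference, serving any model marked "Yes" in the matrix above with the PyTorch engine is done by passing a `PytorchEngineConfig` to LMDeploy's `pipeline` API. The following is a minimal sketch, not part of this diff; the `internlm/internlm2_5-7b-chat` checkpoint and the prompt are illustrative choices, while `pipeline`, `PytorchEngineConfig`, and `quant_policy` are LMDeploy's documented entry points.

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Select the PyTorch backend. quant_policy=8 enables KV INT8;
# use 4 for KV INT4 or 0 to disable KV cache quantization.
# Only valid for rows with "Yes" in the corresponding KV column.
backend_config = PytorchEngineConfig(
    tp=1,            # tensor parallelism degree
    quant_policy=8,  # KV INT8, supported by e.g. the InternLM2.5 row
)

# Any model from the table can be substituted here (example choice).
pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=backend_config)

response = pipe(['Please introduce the InternLM project.'])
print(response)
```

Note that W8A8 and W4A16 columns refer to weight-quantized checkpoints: a W4A16 row, for instance, expects an AWQ-quantized model (produced with `lmdeploy lite auto_awq`) rather than the original weights. Likewise, per note \[1\], Mono-InternVL should be run in BF16, e.g. by requesting `dtype='bfloat16'` if your LMDeploy version exposes that config field.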