update ascend doc (#3420) · InternLM/lmdeploy@a4c43b4 · GitHub

Commit a4c43b4: update ascend doc (#3420)

1 parent e057f52

File tree: 4 files changed (+60, -33 lines)


docs/en/get_started/ascend/get_started.md

Lines changed: 10 additions & 0 deletions
````diff
@@ -158,6 +158,16 @@ lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR --device npu
 
 Please check [supported_models](../../supported_models/supported_models.md) before use this feature.
 
+### w8a8 SMOOTH_QUANT
+
+Run the following commands to quantize weights on Atlas 800T A2.
+
+```bash
+lmdeploy lite smooth_quant $HF_MODEL --work-dir $WORK_DIR --device npu
+```
+
+Please check [supported_models](../../supported_models/supported_models.md) before use this feature.
+
 ### int8 KV-cache Quantization
 
 Ascend backend has supported offline int8 KV-cache Quantization on eager mode.
````
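Beyond the diff itself: the W8A8 weights written to `$WORK_DIR` can then be loaded through LMDeploy's PyTorch engine. A minimal sketch, assuming lmdeploy's `pipeline` API with `PytorchEngineConfig(device_type="ascend")`; the output directory name and prompt are illustrative, not taken from the commit:

```python
# Minimal sketch (not from the commit): load the W8A8 weights produced by
# `lmdeploy lite smooth_quant` and run inference on an Ascend NPU.
from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline(
        "./internlm2_5-7b-chat-w8a8",  # hypothetical $WORK_DIR from the step above
        backend_config=PytorchEngineConfig(
            tp=1,                      # tensor-parallel degree
            device_type="ascend",      # select the Ascend backend
        ),
    )
    print(pipe(["Hi, please introduce yourself."]))
```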

docs/en/supported_models/supported_models.md

Lines changed: 20 additions & 17 deletions
````diff
@@ -115,20 +115,23 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 
 ## PyTorchEngine on Huawei Ascend Platform
 
-| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W4A16(eager) |
-| :------------: | :------: | :--: | :--------------: | :--------------: | :----------: |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes |
-| Llama3 | 8B | LLM | Yes | Yes | Yes |
-| Llama3.1 | 8B | LLM | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes |
-| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes |
-| InternLM3 | 8B | LLM | Yes | Yes | Yes |
-| Mixtral | 8x7B | LLM | Yes | Yes | No |
-| QWen1.5-MoE | A2.7B | LLM | Yes | - | No |
-| QWen2(.5) | 7B | LLM | Yes | Yes | No |
-| QWen2-MoE | A14.57B | LLM | Yes | - | No |
-| DeepSeek-V2 | 16B | LLM | No | Yes | No |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes |
-| CogVLM2-chat | 19B | MLLM | Yes | No | - |
-| GLM4V | 9B | MLLM | Yes | No | - |
+| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) |
+| :------------: | :------: | :--: | :--------------: | :--------------: | :---------: | :----------: |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B | LLM | Yes | Yes | No | No |
+| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No |
+| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | - | - |
+| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - |
+| QWen2-MoE | A14.57B | LLM | Yes | - | No | No |
+| DeepSeek-V2 | 16B | LLM | No | Yes | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes |
+| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes |
+| CogVLM2-chat | 19B | MLLM | Yes | No | - | - |
+| GLM4V | 9B | MLLM | Yes | No | - | - |
````
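The eager/graph columns in this table correspond to the PyTorch engine's `eager_mode` switch. A hedged sketch of toggling it, assuming the `PytorchEngineConfig` fields documented in LMDeploy's Ascend guide; the model path is illustrative:

```python
# Sketch: choose eager vs. graph execution on Ascend. Per the table above,
# use eager_mode=True for models whose graph-mode column reads "No" or "-".
from lmdeploy import pipeline, PytorchEngineConfig

eager_cfg = PytorchEngineConfig(device_type="ascend", eager_mode=True)
graph_cfg = PytorchEngineConfig(device_type="ascend", eager_mode=False)

# Illustrative model choice; check the table for per-model support first.
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=graph_cfg)
print(pipe(["Shanghai is"]))
```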

docs/zh_cn/get_started/ascend/get_started.md

Lines changed: 10 additions & 0 deletions
````diff
@@ -154,6 +154,16 @@ lmdeploy lite auto_awq $HF_MODEL --work-dir $WORK_DIR --device npu
 
 Please refer to [supported models](../../supported_models/supported_models.md) for the list of supported models.
 
+### w8a8 SMOOTH_QUANT
+
+Run the following commands to perform W8A8 weight quantization on Atlas 800T A2.
+
+```bash
+lmdeploy lite smooth_quant $HF_MODEL --work-dir $WORK_DIR --device npu
+```
+
+Please refer to [supported models](../../supported_models/supported_models.md) for the list of supported models.
+
 ### int8 KV-cache Quantization
 
 The Ascend backend now supports offline int8 KV-cache quantization in eager mode.
````
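The closing context lines above mention offline int8 KV-cache quantization in eager mode. A hedged sketch of enabling an int8 KV cache at load time, assuming `quant_policy=8` is accepted by `PytorchEngineConfig` (an assumption about recent lmdeploy releases, not something this commit shows; the model path is illustrative):

```python
# Sketch: int8 KV cache (quant_policy=8) combined with eager mode on Ascend.
# quant_policy=8 here is an assumption about PytorchEngineConfig, not taken
# from this commit.
from lmdeploy import pipeline, PytorchEngineConfig

cfg = PytorchEngineConfig(device_type="ascend", eager_mode=True, quant_policy=8)
pipe = pipeline("internlm/internlm2_5-7b-chat", backend_config=cfg)
print(pipe(["Hi, please introduce yourself."]))
```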

docs/zh_cn/supported_models/supported_models.md

Lines changed: 20 additions & 16 deletions
````diff
@@ -115,19 +115,23 @@
 
 ## PyTorchEngine on Huawei Ascend Platform
 
-| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W4A16(eager) |
-| :------------: | :------: | :--: | :--------------: | :--------------: | :----------: |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes |
-| Llama3 | 8B | LLM | Yes | Yes | Yes |
-| Llama3.1 | 8B | LLM | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes |
-| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes |
-| Mixtral | 8x7B | LLM | Yes | Yes | No |
-| QWen1.5-MoE | A2.7B | LLM | Yes | - | No |
-| QWen2(.5) | 7B | LLM | Yes | Yes | No |
-| QWen2-MoE | A14.57B | LLM | Yes | - | No |
-| DeepSeek-V2 | 16B | LLM | No | Yes | No |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes |
-| CogVLM2-chat | 19B | MLLM | Yes | No | - |
-| GLM4V | 9B | MLLM | Yes | No | - |
+| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) |
+| :------------: | :------: | :--: | :--------------: | :--------------: | :---------: | :----------: |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B | LLM | Yes | Yes | No | No |
+| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No |
+| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | - | - |
+| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - |
+| QWen2-MoE | A14.57B | LLM | Yes | - | No | No |
+| DeepSeek-V2 | 16B | LLM | No | Yes | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes |
+| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes |
+| CogVLM2-chat | 19B | MLLM | Yes | No | - | - |
+| GLM4V | 9B | MLLM | Yes | No | - | - |
````
