fix supported model list of ascend graph mode (#2669) · InferenceNexus/lmdeploy@f5189ce · GitHub

Commit f5189ce

fix supported model list of ascend graph mode (InternLM#2669)
1 parent a41a2a2 commit f5189ce

File tree

3 files changed: +17 additions, −17 deletions

docs/en/get_started/ascend/get_started.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -49,7 +49,7 @@ For more information about running the Docker client on Ascend devices, please r
 ## Offline batch inference
 
 > \[!TIP\]
-> Graph mode has been supported on Atlas 800T A2. Currently, InternLM2-7B/LLaMa2-7B/Qwen2-7B are tested on graph mode.
+> Graph mode has been supported on Atlas 800T A2. Currently, LLaMa3-8B/LLaMa2-7B/Qwen2-7B are tested on graph mode.
 > Users can set `eager_mode=False` to enable graph mode, or, set `eager_mode=True` to disable graph mode.
 > (Please source `/usr/local/Ascend/nnal/atb/set_env.sh` before enabling graph mode)
 
```
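For orientation, the tip being edited corresponds to roughly the following offline-inference sketch with graph mode enabled on Ascend. It assumes lmdeploy's public `pipeline`/`PytorchEngineConfig` API; the model path and prompts are placeholders.

```python
# Minimal sketch, assuming lmdeploy's public pipeline API; the model path
# and prompts are placeholders. Run
# `source /usr/local/Ascend/nnal/atb/set_env.sh` in the shell first,
# as the tip above requires.
from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == '__main__':
    pipe = pipeline('/path/to/Llama-3-8B',  # placeholder model path
                    backend_config=PytorchEngineConfig(
                        tp=1,                # graph mode currently needs tp=1
                        device_type='ascend',
                        eager_mode=False))   # False enables graph mode
    print(pipe(['Hi, please introduce yourself', 'Shanghai is']))
```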

docs/zh_cn/get_started/ascend/get_started.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -49,7 +49,7 @@ docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-
 ## Offline batch inference
 
 > \[!TIP\]
-> Graph mode has been supported on Atlas 800T A2. Currently, InternLM2-7B/LLaMa2-7B/Qwen2-7B have been tested on a single device. Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode. (Please source `/usr/local/Ascend/nnal/atb/set_env.sh` before enabling graph mode.)
+> Graph mode has been supported on Atlas 800T A2. Currently, LLaMa3-8B/LLaMa2-7B/Qwen2-7B have been tested on a single device. Users can set `eager_mode=False` to enable graph mode, or set `eager_mode=True` to disable graph mode. (Please source `/usr/local/Ascend/nnal/atb/set_env.sh` before enabling graph mode.)
 
 ### LLM inference
 
```

lmdeploy/pytorch/backends/dlinfer/ascend/graph_runner.py

Lines changed: 15 additions & 15 deletions

```diff
@@ -22,6 +22,7 @@ def __init__(self, model: torch.nn.Module, model_config: ModelConfig,
         super().__init__(model, model_config, cache_config, backend_config,
                          device)
 
+        self.supported_model = ['Llama3-8B', 'Llama2-7B', 'Qwen2-7B']
         self.enable_graph = self.check_enable_graph()
         if self.enable_graph:
             import dlinfer.graph
@@ -44,21 +45,20 @@ def check_enable_graph(self):
                 "Graph mode of device_type 'ascend' only supports tp=1 "
                 'for now, fallback to eager mode', RuntimeWarning)
             return False
-        # model support
-        self.supported_model = {
-            'Llama2': 'LlamaConfig',
-            'InternLM2': 'InternLM2Config',
-            'Qwen2': 'Qwen2Config',
-        }
-        is_model_support = True
-        model_config_name = str(type(self.model_config.hf_config).__name__)
-        if model_config_name not in self.supported_model.values():
-            is_model_support = False
-        if not is_model_support:
-            warnings.warn(
-                "Graph mode of device_type 'ascend' only supports models: "
-                f"{', '.join(self.supported_model.keys())} when tp=1 for now",
-                RuntimeWarning)
+
+        warnings.warn(
+            '\n\n'
+            '**********************************************************\n'
+            '  The following models were tested in graph mode of\n'
+            "  device_type 'ascend' when tp=1:\n"
+            f"  {', '.join(self.supported_model)}\n"
+            '  Other LLaMa-like models may work in graph mode, please\n'
+            '  check the result yourself!\n'
+            '  If graph mode does not work correctly with your model,\n'
+            '  please use eager mode instead.\n'
+            '**********************************************************\n\n',
+            RuntimeWarning)
+
         return True
 
     def patch_kernels_custom_op(self):
```
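Net effect of this hunk: the hard check against config-class names is gone, graph mode stays enabled for any model, and the tested-model list is now surfaced only as an advisory `RuntimeWarning`. Below is a small sketch, not part of the commit, of how a user might capture that advisory when trying an untested LLaMa-like model; the model path is a placeholder, and `warnings.catch_warnings` is the standard-library mechanism.

```python
import warnings

from lmdeploy import pipeline, PytorchEngineConfig

# Record the advisory RuntimeWarning emitted while the engine initializes,
# then print it; useful when trying a model outside the tested list.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    pipe = pipeline('/path/to/llama-like-model',  # placeholder path
                    backend_config=PytorchEngineConfig(
                        tp=1,
                        device_type='ascend',
                        eager_mode=False))

for w in caught:
    if issubclass(w.category, RuntimeWarning):
        print(w.message)  # lists Llama3-8B, Llama2-7B, Qwen2-7B as tested
```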

0 commit comments