[Feature] metrics support by CUHKSZzxy · Pull Request #3534 · InternLM/lmdeploy

Merged · 99 commits · Jul 9, 2025
Commits (99)
f8b4000
metrics support prototype
CUHKSZzxy May 9, 2025
3e4fca9
Merge branch 'main' into metrics-support
CUHKSZzxy May 9, 2025
02c46ec
Merge branch 'main' into metrics-support
CUHKSZzxy May 12, 2025
9ae6a1b
fix wrong conflict resolve
CUHKSZzxy May 12, 2025
7904d3a
add GPU KV cache usage
CUHKSZzxy May 12, 2025
4a339c8
independent logger for each DP
CUHKSZzxy May 12, 2025
8c3ede1
fix gpu cache usage
CUHKSZzxy May 13, 2025
ddeec2e
Merge branch 'main' into metrics-support
CUHKSZzxy May 13, 2025
9229aa1
rename log stats
CUHKSZzxy May 13, 2025
862a708
fix
CUHKSZzxy May 13, 2025
74dc69a
update perf_counter and comments, some bug fix
CUHKSZzxy May 15, 2025
19d81d4
Merge branch 'main' into metrics-support
CUHKSZzxy May 15, 2025
b87f099
overwrite with main branch
CUHKSZzxy May 22, 2025
d9f8e5a
Merge branch 'main' into metrics-support
CUHKSZzxy May 22, 2025
0168eed
refactor
CUHKSZzxy May 22, 2025
d774cc3
cleanup
CUHKSZzxy May 22, 2025
08200e1
fix
CUHKSZzxy May 22, 2025
a4d0ac9
add runtime cuda prometheus_client
CUHKSZzxy May 22, 2025
150d562
fix
CUHKSZzxy May 23, 2025
1f80a8e
cleanup
CUHKSZzxy May 23, 2025
aed3eea
async log
CUHKSZzxy May 23, 2025
0931746
fix gen throughput calculation
CUHKSZzxy May 26, 2025
57f3f91
update max_model_len
CUHKSZzxy May 26, 2025
4bdf89f
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
83b7c60
fix running/waiting reqs calculations
CUHKSZzxy May 26, 2025
67366b1
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
9729f0d
fix pr test
CUHKSZzxy May 27, 2025
9c194ac
fix
CUHKSZzxy May 27, 2025
97ccdf3
fix pr test
CUHKSZzxy May 27, 2025
72d4274
update log level
CUHKSZzxy May 27, 2025
382c500
fix
CUHKSZzxy May 27, 2025
e224bc6
Merge branch 'main' into metrics-support
CUHKSZzxy May 29, 2025
0df0473
update
CUHKSZzxy May 29, 2025
47a07b6
add grafana support
CUHKSZzxy May 30, 2025
c354a7d
fix
CUHKSZzxy May 30, 2025
4bc27e0
update
CUHKSZzxy May 30, 2025
a132cc6
update
CUHKSZzxy May 30, 2025
2c1588d
simplify some logics
CUHKSZzxy May 30, 2025
22a1dc6
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 3, 2025
a59d9c3
fix lint
CUHKSZzxy Jun 3, 2025
d5f1bfe
fix lint
CUHKSZzxy Jun 3, 2025
8738bb2
refactor
CUHKSZzxy Jun 4, 2025
7bbb544
fix module init
CUHKSZzxy Jun 4, 2025
c4f0799
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 4, 2025
f13dae1
fix
CUHKSZzxy Jun 4, 2025
5c92c24
reuse status logger
CUHKSZzxy Jun 4, 2025
7974365
cleanup
CUHKSZzxy Jun 4, 2025
0f66854
rename
CUHKSZzxy Jun 4, 2025
1d19ccc
add docs
CUHKSZzxy Jun 5, 2025
91319c3
update docs
CUHKSZzxy Jun 5, 2025
c976d3d
update docs
CUHKSZzxy Jun 5, 2025
aab614a
update docs
CUHKSZzxy Jun 5, 2025
eb8971b
fix typo
CUHKSZzxy Jun 5, 2025
9e20aa7
decouple prometheus_client
CUHKSZzxy Jun 5, 2025
ec31b12
update docs
CUHKSZzxy Jun 5, 2025
8cee584
change log interval
CUHKSZzxy Jun 5, 2025
3099c8f
mp router
grimoire Jun 8, 2025
2ecadfa
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 9, 2025
b0f2087
minor fix
CUHKSZzxy Jun 9, 2025
1bdacbf
optimize
grimoire Jun 9, 2025
4691d74
better streaming
grimoire Jun 9, 2025
b8b5b3b
optimize streaming
grimoire Jun 9, 2025
a7476c3
Merge branch 'main' into mp-engine
grimoire Jun 9, 2025
362240a
close engine
grimoire Jun 9, 2025
899da60
safe exit
grimoire Jun 9, 2025
d49146a
support pd
grimoire Jun 10, 2025
a1e92a1
merge main
grimoire Jun 12, 2025
0be4fc8
fix loader
grimoire Jun 12, 2025
5f2939e
optimize
grimoire Jun 12, 2025
f428506
Merge branch 'main' into mp-engine
grimoire Jun 12, 2025
3285cc6
safe exit
grimoire Jun 12, 2025
3ff8d28
safe exit
grimoire Jun 12, 2025
8f32f52
refactor
CUHKSZzxy Jun 17, 2025
4838003
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 17, 2025
b92b7cc
clean
CUHKSZzxy Jun 17, 2025
63865bc
fix
CUHKSZzxy Jun 18, 2025
81f6653
optimize
CUHKSZzxy Jun 18, 2025
309880f
optimize
CUHKSZzxy Jun 18, 2025
252d7ed
rename
CUHKSZzxy Jun 18, 2025
e689f3b
remove unused metrics
CUHKSZzxy Jun 19, 2025
e241c30
inplace update
CUHKSZzxy Jun 19, 2025
a678a94
clean
CUHKSZzxy Jun 19, 2025
535fa98
async update
CUHKSZzxy Jun 20, 2025
95fd4a5
update
CUHKSZzxy Jun 20, 2025
4470eb3
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 20, 2025
da61d89
Merge branch 'pr-3627' into metrics-support
CUHKSZzxy Jun 23, 2025
892d5f0
optimize
CUHKSZzxy Jun 25, 2025
002e7cf
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 30, 2025
c121921
fix merge
CUHKSZzxy Jun 30, 2025
5daae5f
refactor for MP engine
CUHKSZzxy Jul 2, 2025
ab8b57a
optimize
CUHKSZzxy Jul 2, 2025
b660df6
Merge branch 'main' into metrics-support
CUHKSZzxy Jul 2, 2025
b7b86e1
fix prometheus, grafana
CUHKSZzxy Jul 2, 2025
149804c
raise exception
CUHKSZzxy Jul 3, 2025
6fd552b
fix docs
CUHKSZzxy Jul 7, 2025
1dcfd64
update DP>1 docs
CUHKSZzxy Jul 8, 2025
1435ac8
minor fix
CUHKSZzxy Jul 8, 2025
5e8b721
cleanup
CUHKSZzxy Jul 8, 2025
dccc223
add comments
CUHKSZzxy Jul 8, 2025
2 changes: 2 additions & 0 deletions .github/scripts/check_lmdeploy.py
@@ -18,6 +18,8 @@ def check_module_init(root: str):
continue
elif d.startswith('lmdeploy/lib'):
continue
elif d.startswith('lmdeploy/monitoring'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_models'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_python_backend'):
179 changes: 179 additions & 0 deletions docs/en/advance/metrics.md
@@ -0,0 +1,179 @@
# Production Metrics

LMDeploy exposes a set of metrics via Prometheus, and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- LMDeploy server running with metrics system enabled

## Usage (DP = 1)

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path according to your needs.
By default, the metrics endpoint will be available at `http://<lmdeploy_server_host>:23333/metrics`.
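To confirm the endpoint is live before wiring up the monitoring stack, you can query it directly (a quick check, assuming the default server port `23333`):

```bash
# Expect Prometheus text-format output: # HELP / # TYPE lines followed by counters and gauges
curl -s http://localhost:23333/metrics | head -n 20
```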

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add `-d` to run them detached in the background).

4. **Access the monitoring interfaces**

- Prometheus: Open your web browser and go to http://localhost:9090.

- Grafana: Open your web browser and go to http://localhost:3000.

5. **Log in to Grafana**

- Default Username: `admin`

- Default Password: `admin`. You will be prompted to change the password upon your first login.

6. **View the Dashboard**

The LMDeploy dashboard is pre-configured and should be available automatically.

## Usage (DP > 1)

1. **Start your LMDeploy server with metrics enabled**

As an example, we use the model `Qwen/Qwen2.5-7B-Instruct` with `DP=2, TP=2`. Start the service as follows:

```bash
# Proxy server
lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

# API server
LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
Qwen/Qwen2.5-7B-Instruct \
--backend pytorch \
--tp 2 \
--dp 2 \
--proxy-url http://0.0.0.0:8000 \
--nnodes 1 \
--node-rank 0 \
--enable-metrics
```

You should be able to see multiple API servers added to the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

For example, you may have the following API servers:

```
http://$host_ip:$api_server_port1

http://$host_ip:$api_server_port2
```
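To confirm that each DP rank exposes its own metrics, you can query each server directly (the ports below are placeholders; substitute the actual values from `proxy_config.json`):

```bash
# Every DP rank runs a separate API server, each with its own /metrics endpoint
curl -s http://$host_ip:$api_server_port1/metrics | head -n 5
curl -s http://$host_ip:$api_server_port2/metrics | head -n 5
```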

2. **Modify the Prometheus configuration**

When `DP > 1`, LMDeploy will launch one API server for each DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

> Note that you should use the actual host machine IP instead of `127.0.0.1` here, since LMDeploy starts the API servers with the actual host IP when `DP > 1`.

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '$host_ip:$api_server_port1' # <= Modify this
```

> **@RunningLeon** (Collaborator, Jul 8, 2025): can we config all dp server urls in here and show data in grafana board?
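Prometheus can indeed scrape all DP servers in a single job by listing every API server under `targets`; each server then appears with its own `instance` label, which Grafana panels can filter or aggregate. A sketch with placeholder ports (use the actual values from `proxy_config.json`):

```yaml
scrape_configs:
  - job_name: lmdeploy
    static_configs:
      - targets:
          - '$host_ip:$api_server_port1'  # DP rank 0
          - '$host_ip:$api_server_port2'  # DP rank 1
```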

3. **Navigate to the monitoring folder and perform the same steps as described above**

## Troubleshooting

1. **Port conflicts**

Check whether any services are occupying ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana). You can either stop the conflicting services or modify the config files as follows:

- Modify LMDeploy server port for Prometheus scrape

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify this LMDeploy server port (23333); it must match the running server port
```

- Modify Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify this Prometheus interface port 9090
isDefault: true
editable: false
```

- Modify Grafana port:

In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= Host:Container port mapping
```

2. **No data on the dashboard**

- Create traffic

Try sending some requests to the LMDeploy server to generate some traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, you should be able to see data on the dashboard.
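If traffic alone does not help, confirm that Prometheus is actually scraping the server. Prometheus lists its active scrape targets via its standard HTTP API (generic Prometheus behavior, not specific to LMDeploy):

```bash
# Each target should report "health": "up"; a "down" target usually means a wrong address or port in prometheus.yaml
curl -s http://localhost:9090/api/v1/targets
```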
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -104,6 +104,7 @@ Documentation
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
176 changes: 176 additions & 0 deletions docs/zh_cn/advance/metrics.md
@@ -0,0 +1,176 @@
# Production Metrics

LMDeploy exposes metrics via Prometheus and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- An LMDeploy server running with the metrics system enabled

## Usage (DP = 1)

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path as needed. By default, the metrics endpoint is available at `http://<lmdeploy_server_host>:23333/metrics`.

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add `-d` to run them detached in the background).

4. **Access the monitoring interfaces**

- Prometheus: open http://localhost:9090 in your browser.

- Grafana: open http://localhost:3000 in your browser.

5. **Log in to Grafana**

- Default username: `admin`

- Default password: `admin`. You will be prompted to change the password after the first login.

6. **View the dashboard**

The pre-configured LMDeploy dashboard loads automatically.

## Usage (DP > 1)

1. **Start your LMDeploy server with metrics enabled**

As an example, we use the model `Qwen/Qwen2.5-7B-Instruct` with `DP=2, TP=2`. Start the service as follows:

```bash
# Proxy server
lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

# API server
LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
Qwen/Qwen2.5-7B-Instruct \
--backend pytorch \
--tp 2 \
--dp 2 \
--proxy-url http://0.0.0.0:8000 \
--nnodes 1 \
--node-rank 0 \
--enable-metrics
```

You should see multiple API servers added to the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

For example, you may see the following API server addresses:

```
http://$host_ip:$api_server_port1

http://$host_ip:$api_server_port2
```

2. **Modify the Prometheus configuration**

When `DP > 1`, LMDeploy launches one API server for each DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

> Note: use the actual host machine IP instead of `127.0.0.1` here, since LMDeploy starts the API servers with the actual host IP when `DP > 1`.

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '$host_ip:$api_server_port1' # <= Modify this
```

3. **Navigate to the monitoring directory and follow the same steps as above**

## Troubleshooting

1. **Port conflicts**

Check whether any services are occupying ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana). You can either stop the conflicting services or modify the config files as follows:

- Modify the LMDeploy server port that Prometheus scrapes

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify the LMDeploy server port (23333) here; it must match the running server port
```

- Modify the Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify the Prometheus port (9090) here
isDefault: true
editable: false
```

- Modify the Grafana port

In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`:

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= 主机端口:容器端口映射
```

2. **No data on the dashboard**

Try sending some requests to the LMDeploy server to generate traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, the dashboard should display data.
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -105,6 +105,7 @@ LMDeploy 工具箱提供以下核心功能:
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
2 changes: 2 additions & 0 deletions lmdeploy/cli/serve.py
@@ -169,6 +169,7 @@ def add_parser_api_server():
ArgumentHelper.ep(pt_group)
ArgumentHelper.enable_microbatch(pt_group)
ArgumentHelper.enable_eplb(pt_group)
ArgumentHelper.enable_metrics(pt_group)
ArgumentHelper.role(pt_group)
ArgumentHelper.migration_backend(pt_group)
# multi-node serving args
@@ -333,6 +334,7 @@ def api_server(args):
max_prefill_token_num=args.max_prefill_token_num,
enable_microbatch=args.enable_microbatch,
enable_eplb=args.enable_eplb,
enable_metrics=args.enable_metrics,
role=EngineRole[args.role],
migration_backend=MigrationBackend[args.migration_backend],
model_format=args.model_format)