# [Feature] metrics support #3534
Merged
## Commits (99)
- f8b4000 metrics support prototype (CUHKSZzxy)
- 3e4fca9 Merge branch 'main' into metrics-support (CUHKSZzxy)
- 02c46ec Merge branch 'main' into metrics-support (CUHKSZzxy)
- 9ae6a1b fix wrong conflict resolve (CUHKSZzxy)
- 7904d3a add GPU KV cache usage (CUHKSZzxy)
- 4a339c8 independent logger for each DP (CUHKSZzxy)
- 8c3ede1 fix gpu cache usage (CUHKSZzxy)
- ddeec2e Merge branch 'main' into metrics-support (CUHKSZzxy)
- 9229aa1 rename log stats (CUHKSZzxy)
- 862a708 fix (CUHKSZzxy)
- 74dc69a update perf_counter and comments, some bug fix (CUHKSZzxy)
- 19d81d4 Merge branch 'main' into metrics-support (CUHKSZzxy)
- b87f099 overwrite with main branch (CUHKSZzxy)
- d9f8e5a Merge branch 'main' into metrics-support (CUHKSZzxy)
- 0168eed refactor (CUHKSZzxy)
- d774cc3 cleanup (CUHKSZzxy)
- 08200e1 fix (CUHKSZzxy)
- a4d0ac9 add runtime cuda prometheus_client (CUHKSZzxy)
- 150d562 fix (CUHKSZzxy)
- 1f80a8e cleanup (CUHKSZzxy)
- aed3eea async log (CUHKSZzxy)
- 0931746 fix gen throughput calculation (CUHKSZzxy)
- 57f3f91 update max_model_len (CUHKSZzxy)
- 4bdf89f Merge branch 'main' into metrics-support (CUHKSZzxy)
- 83b7c60 fix running/waiting reqs calculations (CUHKSZzxy)
- 67366b1 Merge branch 'main' into metrics-support (CUHKSZzxy)
- 9729f0d fix pr test (CUHKSZzxy)
- 9c194ac fix (CUHKSZzxy)
- 97ccdf3 fix pr test (CUHKSZzxy)
- 72d4274 update log level (CUHKSZzxy)
- 382c500 fix (CUHKSZzxy)
- e224bc6 Merge branch 'main' into metrics-support (CUHKSZzxy)
- 0df0473 update (CUHKSZzxy)
- 47a07b6 add grafana support (CUHKSZzxy)
- c354a7d fix (CUHKSZzxy)
- 4bc27e0 update (CUHKSZzxy)
- a132cc6 update (CUHKSZzxy)
- 2c1588d simplify some logics (CUHKSZzxy)
- 22a1dc6 Merge branch 'main' into metrics-support (CUHKSZzxy)
- a59d9c3 fix lint (CUHKSZzxy)
- d5f1bfe fix lint (CUHKSZzxy)
- 8738bb2 refactor (CUHKSZzxy)
- 7bbb544 fix module init (CUHKSZzxy)
- c4f0799 Merge branch 'main' into metrics-support (CUHKSZzxy)
- f13dae1 fix (CUHKSZzxy)
- 5c92c24 reuse status logger (CUHKSZzxy)
- 7974365 cleanup (CUHKSZzxy)
- 0f66854 rename (CUHKSZzxy)
- 1d19ccc add docs (CUHKSZzxy)
- 91319c3 update docs (CUHKSZzxy)
- c976d3d update docs (CUHKSZzxy)
- aab614a update docs (CUHKSZzxy)
- eb8971b fix typo (CUHKSZzxy)
- 9e20aa7 decouple prometheus_client (CUHKSZzxy)
- ec31b12 update docs (CUHKSZzxy)
- 8cee584 change log interval (CUHKSZzxy)
- 3099c8f mp router (grimoire)
- 2ecadfa Merge branch 'main' into metrics-support (CUHKSZzxy)
- b0f2087 minor fix (CUHKSZzxy)
- 1bdacbf optimize (grimoire)
- 4691d74 better streaming (grimoire)
- b8b5b3b optimize streaming (grimoire)
- a7476c3 Merge branch 'main' into mp-engine (grimoire)
- 362240a close engine (grimoire)
- 899da60 safe exit (grimoire)
- d49146a support pd (grimoire)
- a1e92a1 merge main (grimoire)
- 0be4fc8 fix loader (grimoire)
- 5f2939e optimize (grimoire)
- f428506 Merge branch 'main' into mp-engine (grimoire)
- 3285cc6 safe exit (grimoire)
- 3ff8d28 safe exit (grimoire)
- 8f32f52 refactor (CUHKSZzxy)
- 4838003 Merge branch 'main' into metrics-support (CUHKSZzxy)
- b92b7cc clean (CUHKSZzxy)
- 63865bc fix (CUHKSZzxy)
- 81f6653 optimize (CUHKSZzxy)
- 309880f optimize (CUHKSZzxy)
- 252d7ed rename (CUHKSZzxy)
- e689f3b remove unused metrics (CUHKSZzxy)
- e241c30 inplace update (CUHKSZzxy)
- a678a94 clean (CUHKSZzxy)
- 535fa98 async update (CUHKSZzxy)
- 95fd4a5 update (CUHKSZzxy)
- 4470eb3 Merge branch 'main' into metrics-support (CUHKSZzxy)
- da61d89 Merge branch 'pr-3627' into metrics-support (CUHKSZzxy)
- 892d5f0 optimize (CUHKSZzxy)
- 002e7cf Merge branch 'main' into metrics-support (CUHKSZzxy)
- c121921 fix merge (CUHKSZzxy)
- 5daae5f refactor for MP engine (CUHKSZzxy)
- ab8b57a optimize (CUHKSZzxy)
- b660df6 Merge branch 'main' into metrics-support (CUHKSZzxy)
- b7b86e1 fix prometheus, grafana (CUHKSZzxy)
- 149804c raise exception (CUHKSZzxy)
- 6fd552b fix docs (CUHKSZzxy)
- 1dcfd64 update DP>1 docs (CUHKSZzxy)
- 1435ac8 minor fix (CUHKSZzxy)
- 5e8b721 cleanup (CUHKSZzxy)
- dccc223 add comments (CUHKSZzxy)
**New file** (+179 lines)
# Production Metrics

LMDeploy exposes a set of metrics via Prometheus, and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed
- An LMDeploy server running with the metrics system enabled

## Usage (DP = 1)

1. **Start your LMDeploy server with metrics enabled**

   ```
   lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
   ```

   Replace the model path according to your needs. By default, the metrics endpoint is available at `http://<lmdeploy_server_host>:23333/metrics`.
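The endpoint returns plain text in the Prometheus exposition format, which can be inspected without the full monitoring stack. Below is a minimal stdlib sketch for fetching and parsing it; the URL assumes the default port above, and the exact metric names available depend on your LMDeploy version:

```python
# Sketch: fetch the /metrics endpoint and parse it into a dict.
# Simplification: label values containing spaces are not handled.
from urllib.request import urlopen


def parse_prometheus_text(text: str) -> dict:
    """Parse 'name{labels} value' lines into {sample_name: float}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        name_part, _, value = line.rpartition(" ")
        try:
            samples[name_part] = float(value)
        except ValueError:
            continue
    return samples


def fetch_metrics(url: str = "http://localhost:23333/metrics") -> dict:
    """Fetch and parse the metrics endpoint of a running server."""
    with urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

With a server running, `fetch_metrics()` returns a snapshot you can diff between calls to watch counters move.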
2. **Navigate to the monitoring directory**

   ```
   cd lmdeploy/monitoring
   ```

3. **Start the monitoring stack**

   ```
   docker compose up
   ```

   This command starts Prometheus and Grafana (add `-d` to run them in the background).

4. **Access the monitoring interfaces**

   - Prometheus: open your web browser and go to http://localhost:9090.
   - Grafana: open your web browser and go to http://localhost:3000.

5. **Log in to Grafana**

   - Default username: `admin`
   - Default password: `admin`. You will be prompted to change the password upon your first login.

6. **View the dashboard**

   The LMDeploy dashboard is pre-configured and should be available automatically.

## Usage (DP > 1)

1. **Start your LMDeploy server with metrics enabled**

   As an example, we use the model `Qwen/Qwen2.5-7B-Instruct` with `DP=2, TP=2`. Start the service as follows:

   ```bash
   # Proxy server
   lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

   # API server
   LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
   LMDEPLOY_DP_MASTER_PORT=29555 \
   lmdeploy serve api_server \
       Qwen/Qwen2.5-7B-Instruct \
       --backend pytorch \
       --tp 2 \
       --dp 2 \
       --proxy-url http://0.0.0.0:8000 \
       --nnodes 1 \
       --node-rank 0 \
       --enable-metrics
   ```

   You should see multiple API servers added to the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

   For example, you may have the following API servers:

   ```
   http://$host_ip:$api_server_port1
   http://$host_ip:$api_server_port2
   ```

2. **Modify the Prometheus configuration**

   When `DP > 1`, LMDeploy launches one API server per DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

   > Note that you should use the actual host machine IP instead of `127.0.0.1` here, since LMDeploy starts the API servers with the actual host IP when `DP > 1`.

   ```
   global:
     scrape_interval: 5s
     evaluation_interval: 30s

   scrape_configs:
     - job_name: lmdeploy
       static_configs:
         - targets:
             - '$host_ip:$api_server_port1' # <= Modify this
   ```
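To monitor all DP ranks at once rather than a single server, Prometheus also accepts multiple entries under `targets`; a sketch with placeholder ports:

```
scrape_configs:
  - job_name: lmdeploy
    static_configs:
      - targets:
          - '$host_ip:$api_server_port1'
          - '$host_ip:$api_server_port2'
```

Prometheus attaches each target's address as the `instance` label, so Grafana panels can filter by a single server or aggregate across all of them.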
3. **Navigate to the monitoring folder and perform the same steps as described above**

## Troubleshooting

1. **Port conflicts**

   Check whether any services are occupying ports `23333` (LMDeploy server port), `9090` (Prometheus port), or `3000` (Grafana port). You can either stop the conflicting services or modify the config files as follows:
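If `lsof` or `ss` is not at hand, a quick way to check whether a port is already taken is to try connecting to it. A stdlib sketch (the port list mirrors the defaults above):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0  # 0 means connect succeeded


for port, service in [(23333, "LMDeploy"), (9090, "Prometheus"), (3000, "Grafana")]:
    status = "in use" if port_in_use(port) else "free"
    print(f"{service} port {port}: {status}")
```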
   - Modify the LMDeploy server port that Prometheus scrapes

     In `lmdeploy/monitoring/prometheus.yaml`:

     ```
     global:
       scrape_interval: 5s
       evaluation_interval: 30s

     scrape_configs:
       - job_name: lmdeploy
         static_configs:
           - targets:
               - '127.0.0.1:23333' # <= Modify this port; it must match the running server port
     ```

   - Modify the Prometheus port

     In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`:

     ```
     apiVersion: 1
     datasources:
       - name: Prometheus
         type: prometheus
         access: proxy
         url: http://localhost:9090 # <= Modify this Prometheus port
         isDefault: true
         editable: false
     ```

   - Modify the Grafana port

     In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`.

     Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

     ```
     environment:
       - GF_AUTH_ANONYMOUS_ENABLED=true
       - GF_SERVER_HTTP_PORT=3090 # <= Add this line
     ```

     Option 2: Use port mapping.

     ```
     grafana:
       image: grafana/grafana:latest
       container_name: grafana
       ports:
         - "3090:3000" # <= host:container port mapping
     ```

2. **No data on the dashboard**

   - Create traffic

     Send some requests to the LMDeploy server to generate traffic:

     ```
     python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
     ```

     After refreshing, you should see data on the dashboard.
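If the benchmark script or the ShareGPT dataset is not available, a handful of ad-hoc requests against the server's OpenAI-compatible endpoint is enough to move the counters. A minimal stdlib sketch; the endpoint path follows the OpenAI chat-completions convention, and the URL, model name, and prompt are assumptions that must match your deployment:

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:23333/v1/chat/completions"  # default server port


def build_payload(prompt: str, model: str = "Qwen/Qwen2.5-7B-Instruct") -> bytes:
    """Assemble an OpenAI-style chat completion request body."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode("utf-8")


def send_one(prompt: str) -> dict:
    """POST a single chat request and return the decoded JSON response."""
    req = Request(API_URL, data=build_payload(prompt),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


# Example (requires a running server):
# for i in range(10):
#     send_one(f"Write a haiku about request {i}.")
```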
**New file** (+176 lines)
# Production Metrics

LMDeploy exposes monitoring metrics via Prometheus and provides a visualization interface via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed
- An LMDeploy server running with the metrics system enabled

## Usage (DP = 1)

1. **Start the LMDeploy server with metrics enabled**

   ```
   lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
   ```

   Replace the model path according to your needs. By default, the metrics endpoint is located at `http://<lmdeploy_server_host>:23333/metrics`.

2. **Navigate to the monitoring directory**

   ```
   cd lmdeploy/monitoring
   ```

3. **Start the monitoring stack**

   ```
   docker compose up
   ```

   This command starts Prometheus and Grafana (add `-d` to run them in the background).

4. **Access the monitoring interfaces**

   - Prometheus: visit http://localhost:9090 in your browser.
   - Grafana: visit http://localhost:3000 in your browser.

5. **Log in to Grafana**

   - Default username: `admin`
   - Default password: `admin` (you will be prompted to change it after the first login)

6. **View the dashboard**

   The pre-configured LMDeploy dashboard loads automatically.

## Usage (DP > 1)

1. **Start the LMDeploy server with metrics enabled**

   Taking the model `Qwen/Qwen2.5-7B-Instruct` as an example, start the service with `DP=2, TP=2`:

   ```bash
   # Proxy server
   lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

   # API server
   LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
   LMDEPLOY_DP_MASTER_PORT=29555 \
   lmdeploy serve api_server \
       Qwen/Qwen2.5-7B-Instruct \
       --backend pytorch \
       --tp 2 \
       --dp 2 \
       --proxy-url http://0.0.0.0:8000 \
       --nnodes 1 \
       --node-rank 0 \
       --enable-metrics
   ```

   You should see multiple API server instances in the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

   For example, you may see the following API server addresses:

   ```
   http://$host_ip:$api_server_port1
   http://$host_ip:$api_server_port2
   ```

2. **Modify the Prometheus configuration**

   When DP > 1, LMDeploy starts one API server per DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

   > Note: use the actual host machine IP instead of `127.0.0.1` here, because when DP > 1 LMDeploy starts the API servers with the actual host IP.

   ```
   global:
     scrape_interval: 5s
     evaluation_interval: 30s

   scrape_configs:
     - job_name: lmdeploy
       static_configs:
         - targets:
             - '$host_ip:$api_server_port1' # <= Modify this
   ```

3. **Navigate to the monitoring directory and perform the same steps as described above**

## Troubleshooting

1. **Port conflicts**

   Check whether ports `23333` (LMDeploy server port), `9090` (Prometheus port), or `3000` (Grafana port) are occupied. Either stop the conflicting services or modify the configuration files as follows:

   - Modify the LMDeploy server port that Prometheus scrapes

     In `lmdeploy/monitoring/prometheus.yaml`:

     ```
     global:
       scrape_interval: 5s
       evaluation_interval: 30s

     scrape_configs:
       - job_name: lmdeploy
         static_configs:
           - targets:
               - '127.0.0.1:23333' # <= Modify this port; it must match the running server port
     ```

   - Modify the Prometheus port

     In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`:

     ```
     apiVersion: 1
     datasources:
       - name: Prometheus
         type: prometheus
         access: proxy
         url: http://localhost:9090 # <= Modify this Prometheus port
         isDefault: true
         editable: false
     ```

   - Modify the Grafana port

     In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`.

     Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

     ```
     environment:
       - GF_AUTH_ANONYMOUS_ENABLED=true
       - GF_SERVER_HTTP_PORT=3090 # <= Add this line
     ```

     Option 2: Use port mapping.

     ```
     grafana:
       image: grafana/grafana:latest
       container_name: grafana
       ports:
         - "3090:3000" # <= host:container port mapping
     ```

2. **No data on the dashboard**

   Try sending requests to the LMDeploy server to generate traffic:

   ```
   python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
   ```

   After refreshing, the dashboard should display data.
**Review comment:** Can we configure all DP server URLs here and show the data on the Grafana dashboard?