[Feature] metrics support by CUHKSZzxy · Pull Request #3534 · InternLM/lmdeploy

Merged · 99 commits · Jul 9, 2025
Commits (99)
f8b4000
metrics support prototype
CUHKSZzxy May 9, 2025
3e4fca9
Merge branch 'main' into metrics-support
CUHKSZzxy May 9, 2025
02c46ec
Merge branch 'main' into metrics-support
CUHKSZzxy May 12, 2025
9ae6a1b
fix wrong conflict resolve
CUHKSZzxy May 12, 2025
7904d3a
add GPU KV cache usage
CUHKSZzxy May 12, 2025
4a339c8
independent logger for each DP
CUHKSZzxy May 12, 2025
8c3ede1
fix gpu cache usage
CUHKSZzxy May 13, 2025
ddeec2e
Merge branch 'main' into metrics-support
CUHKSZzxy May 13, 2025
9229aa1
rename log stats
CUHKSZzxy May 13, 2025
862a708
fix
CUHKSZzxy May 13, 2025
74dc69a
update perf_counter and comments, some bug fix
CUHKSZzxy May 15, 2025
19d81d4
Merge branch 'main' into metrics-support
CUHKSZzxy May 15, 2025
b87f099
overwrite with main branch
CUHKSZzxy May 22, 2025
d9f8e5a
Merge branch 'main' into metrics-support
CUHKSZzxy May 22, 2025
0168eed
refactor
CUHKSZzxy May 22, 2025
d774cc3
cleanup
CUHKSZzxy May 22, 2025
08200e1
fix
CUHKSZzxy May 22, 2025
a4d0ac9
add runtime cuda prometheus_client
CUHKSZzxy May 22, 2025
150d562
fix
CUHKSZzxy May 23, 2025
1f80a8e
cleanup
CUHKSZzxy May 23, 2025
aed3eea
async log
CUHKSZzxy May 23, 2025
0931746
fix gen throughput calculation
CUHKSZzxy May 26, 2025
57f3f91
update max_model_len
CUHKSZzxy May 26, 2025
4bdf89f
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
83b7c60
fix running/waiting reqs calculations
CUHKSZzxy May 26, 2025
67366b1
Merge branch 'main' into metrics-support
CUHKSZzxy May 26, 2025
9729f0d
fix pr test
CUHKSZzxy May 27, 2025
9c194ac
fix
CUHKSZzxy May 27, 2025
97ccdf3
fix pr test
CUHKSZzxy May 27, 2025
72d4274
update log level
CUHKSZzxy May 27, 2025
382c500
fix
CUHKSZzxy May 27, 2025
e224bc6
Merge branch 'main' into metrics-support
CUHKSZzxy May 29, 2025
0df0473
update
CUHKSZzxy May 29, 2025
47a07b6
add grafana support
CUHKSZzxy May 30, 2025
c354a7d
fix
CUHKSZzxy May 30, 2025
4bc27e0
update
CUHKSZzxy May 30, 2025
a132cc6
update
CUHKSZzxy May 30, 2025
2c1588d
simplify some logics
CUHKSZzxy May 30, 2025
22a1dc6
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 3, 2025
a59d9c3
fix lint
CUHKSZzxy Jun 3, 2025
d5f1bfe
fix lint
CUHKSZzxy Jun 3, 2025
8738bb2
refactor
CUHKSZzxy Jun 4, 2025
7bbb544
fix module init
CUHKSZzxy Jun 4, 2025
c4f0799
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 4, 2025
f13dae1
fix
CUHKSZzxy Jun 4, 2025
5c92c24
reuse status logger
CUHKSZzxy Jun 4, 2025
7974365
cleanup
CUHKSZzxy Jun 4, 2025
0f66854
rename
CUHKSZzxy Jun 4, 2025
1d19ccc
add docs
CUHKSZzxy Jun 5, 2025
91319c3
update docs
CUHKSZzxy Jun 5, 2025
c976d3d
update docs
CUHKSZzxy Jun 5, 2025
aab614a
update docs
CUHKSZzxy Jun 5, 2025
eb8971b
fix typo
CUHKSZzxy Jun 5, 2025
9e20aa7
decouple prometheus_client
CUHKSZzxy Jun 5, 2025
ec31b12
update docs
CUHKSZzxy Jun 5, 2025
8cee584
change log interval
CUHKSZzxy Jun 5, 2025
3099c8f
mp router
grimoire Jun 8, 2025
2ecadfa
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 9, 2025
b0f2087
minor fix
CUHKSZzxy Jun 9, 2025
1bdacbf
optimize
grimoire Jun 9, 2025
4691d74
better streaming
grimoire Jun 9, 2025
b8b5b3b
optimize streaming
grimoire Jun 9, 2025
a7476c3
Merge branch 'main' into mp-engine
grimoire Jun 9, 2025
362240a
close engine
grimoire Jun 9, 2025
899da60
safe exit
grimoire Jun 9, 2025
d49146a
support pd
grimoire Jun 10, 2025
a1e92a1
merge main
grimoire Jun 12, 2025
0be4fc8
fix loader
grimoire Jun 12, 2025
5f2939e
optimize
grimoire Jun 12, 2025
f428506
Merge branch 'main' into mp-engine
grimoire Jun 12, 2025
3285cc6
safe exit
grimoire Jun 12, 2025
3ff8d28
safe exit
grimoire Jun 12, 2025
8f32f52
refactor
CUHKSZzxy Jun 17, 2025
4838003
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 17, 2025
b92b7cc
clean
CUHKSZzxy Jun 17, 2025
63865bc
fix
CUHKSZzxy Jun 18, 2025
81f6653
optimize
CUHKSZzxy Jun 18, 2025
309880f
optimize
CUHKSZzxy Jun 18, 2025
252d7ed
rename
CUHKSZzxy Jun 18, 2025
e689f3b
remove unused metrics
CUHKSZzxy Jun 19, 2025
e241c30
inplace update
CUHKSZzxy Jun 19, 2025
a678a94
clean
CUHKSZzxy Jun 19, 2025
535fa98
async update
CUHKSZzxy Jun 20, 2025
95fd4a5
update
CUHKSZzxy Jun 20, 2025
4470eb3
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 20, 2025
da61d89
Merge branch 'pr-3627' into metrics-support
CUHKSZzxy Jun 23, 2025
892d5f0
optimize
CUHKSZzxy Jun 25, 2025
002e7cf
Merge branch 'main' into metrics-support
CUHKSZzxy Jun 30, 2025
c121921
fix merge
CUHKSZzxy Jun 30, 2025
5daae5f
refactor for MP engine
CUHKSZzxy Jul 2, 2025
ab8b57a
optimize
CUHKSZzxy Jul 2, 2025
b660df6
Merge branch 'main' into metrics-support
CUHKSZzxy Jul 2, 2025
b7b86e1
fix prometheus, grafana
CUHKSZzxy Jul 2, 2025
149804c
raise exception
CUHKSZzxy Jul 3, 2025
6fd552b
fix docs
CUHKSZzxy Jul 7, 2025
1dcfd64
update DP>1 docs
CUHKSZzxy Jul 8, 2025
1435ac8
minor fix
CUHKSZzxy Jul 8, 2025
5e8b721
cleanup
CUHKSZzxy Jul 8, 2025
dccc223
add comments
CUHKSZzxy Jul 8, 2025
2 changes: 2 additions & 0 deletions .github/scripts/check_lmdeploy.py
@@ -18,6 +18,8 @@ def check_module_init(root: str):
continue
elif d.startswith('lmdeploy/lib'):
continue
elif d.startswith('lmdeploy/monitoring'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_models'):
continue
elif d.startswith('lmdeploy/serve/turbomind/triton_python_backend'):
179 changes: 179 additions & 0 deletions docs/en/advance/metrics.md
@@ -0,0 +1,179 @@
# Production Metrics

LMDeploy exposes a set of metrics via Prometheus, and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- LMDeploy server running with metrics system enabled

## Usage (DP = 1)

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path according to your needs.
By default, the metrics endpoint will be available at `http://<lmdeploy_server_host>:23333/metrics`.
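To confirm the endpoint is live before wiring up the monitoring stack, you can query it directly (a quick check, assuming the default server port `23333`):

```bash
# Expect Prometheus text-format output: # HELP / # TYPE lines followed by counters and gauges
curl -s http://localhost:23333/metrics | head -n 20
```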

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add `-d` to run them detached in the background).

4. **Access the monitoring interfaces**

- Prometheus: Open your web browser and go to http://localhost:9090.

- Grafana: Open your web browser and go to http://localhost:3000.

5. **Log in to Grafana**

- Default Username: `admin`

- Default Password: `admin`. You will be prompted to change the password upon your first login.

6. **View the Dashboard**

The LMDeploy dashboard is pre-configured and should be available automatically.

## Usage (DP > 1)

1. **Start your LMDeploy server with metrics enabled**

As an example, we use the model `Qwen/Qwen2.5-7B-Instruct` with `DP=2, TP=2`. Start the service as follows:

```bash
# Proxy server
lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

# API server
LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
Qwen/Qwen2.5-7B-Instruct \
--backend pytorch \
--tp 2 \
--dp 2 \
--proxy-url http://0.0.0.0:8000 \
--nnodes 1 \
--node-rank 0 \
--enable-metrics
```

You should be able to see multiple API servers added to the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

For example, you may have the following API servers:

```
http://$host_ip:$api_server_port1

http://$host_ip:$api_server_port2
```
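To confirm that each DP rank exposes its own metrics, you can query each server directly (the ports below are placeholders; substitute the actual values from `proxy_config.json`):

```bash
# Every DP rank runs a separate API server, each with its own /metrics endpoint
curl -s http://$host_ip:$api_server_port1/metrics | head -n 5
curl -s http://$host_ip:$api_server_port2/metrics | head -n 5
```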

2. **Modify the Prometheus configuration**

When `DP > 1`, LMDeploy will launch one API server for each DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

> Note that you should use the actual host machine IP instead of `127.0.0.1` here, since LMDeploy starts the API servers with the actual host IP when `DP > 1`.

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '$host_ip:$api_server_port1' # <= Modify this
```

> **@RunningLeon** (Collaborator, Jul 8, 2025): can we config all dp server urls in here and show data in grafana board?
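Prometheus can indeed scrape all DP servers in a single job by listing every API server under `targets`; each server then appears with its own `instance` label, which Grafana panels can filter or aggregate. A sketch with placeholder ports (use the actual values from `proxy_config.json`):

```yaml
scrape_configs:
  - job_name: lmdeploy
    static_configs:
      - targets:
          - '$host_ip:$api_server_port1'  # DP rank 0
          - '$host_ip:$api_server_port2'  # DP rank 1
```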

3. **Navigate to the monitoring folder and perform the same steps as described above**

## Troubleshooting

1. **Port conflicts**

Check whether any services are occupying ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana). You can either stop the conflicting services or modify the config files as follows:

- Modify LMDeploy server port for Prometheus scrape

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify this LMDeploy server port (23333); it must match the running server port
```

- Modify Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify this Prometheus interface port 9090
isDefault: true
editable: false
```

- Modify Grafana port:

In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= Host:Container port mapping
```

2. **No data on the dashboard**

- Create traffic

Try sending some requests to the LMDeploy server to generate some traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, you should be able to see data on the dashboard.
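If traffic alone does not help, confirm that Prometheus is actually scraping the server. Prometheus lists its active scrape targets via its standard HTTP API (generic Prometheus behavior, not specific to LMDeploy):

```bash
# Each target should report "health": "up"; a "down" target usually means a wrong address or port in prometheus.yaml
curl -s http://localhost:9090/api/v1/targets
```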
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -104,6 +104,7 @@ Documentation
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
176 changes: 176 additions & 0 deletions docs/zh_cn/advance/metrics.md
@@ -0,0 +1,176 @@
# Production Metrics

LMDeploy exposes metrics via Prometheus and provides visualization via Grafana.

## Setup Guide

This section describes how to set up the monitoring stack (Prometheus + Grafana) provided in the `lmdeploy/monitoring` directory.

## Prerequisites

- [Docker](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) installed

- An LMDeploy server running with the metrics system enabled

## Usage (DP = 1)

1. **Start your LMDeploy server with metrics enabled**

```
lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --enable-metrics
```

Replace the model path as needed. By default, the metrics endpoint is available at `http://<lmdeploy_server_host>:23333/metrics`.

2. **Navigate to the monitoring directory**

```
cd lmdeploy/monitoring
```

3. **Start the monitoring stack**

```
docker compose up
```

This command starts Prometheus and Grafana (add `-d` to run them detached in the background).

4. **Access the monitoring interfaces**

- Prometheus: open http://localhost:9090 in your browser.

- Grafana: open http://localhost:3000 in your browser.

5. **Log in to Grafana**

- Default username: `admin`

- Default password: `admin`. You will be prompted to change the password after the first login.

6. **View the dashboard**

The pre-configured LMDeploy dashboard loads automatically.

## Usage (DP > 1)

1. **Start your LMDeploy server with metrics enabled**

As an example, we use the model `Qwen/Qwen2.5-7B-Instruct` with `DP=2, TP=2`. Start the service as follows:

```bash
# Proxy server
lmdeploy serve proxy --server-port 8000 --routing-strategy 'min_expected_latency' --serving-strategy Hybrid --log-level INFO

# API server
LMDEPLOY_DP_MASTER_ADDR=127.0.0.1 \
LMDEPLOY_DP_MASTER_PORT=29555 \
lmdeploy serve api_server \
Qwen/Qwen2.5-7B-Instruct \
--backend pytorch \
--tp 2 \
--dp 2 \
--proxy-url http://0.0.0.0:8000 \
--nnodes 1 \
--node-rank 0 \
--enable-metrics
```

You should see multiple API servers added to the proxy server list. Details can be found in `lmdeploy/serve/proxy/proxy_config.json`.

For example, you may see the following API server addresses:

```
http://$host_ip:$api_server_port1

http://$host_ip:$api_server_port2
```

2. **Modify the Prometheus configuration**

When `DP > 1`, LMDeploy launches one API server for each DP rank. If you want to monitor a specific API server, e.g. `http://$host_ip:$api_server_port1`, modify the configuration file `lmdeploy/monitoring/prometheus.yaml` as follows.

> Note: use the actual host machine IP instead of `127.0.0.1` here, since LMDeploy starts the API servers with the actual host IP when `DP > 1`.

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '$host_ip:$api_server_port1' # <= Modify this
```

3. **Navigate to the monitoring directory and follow the same steps as above**

## Troubleshooting

1. **Port conflicts**

Check whether any services are occupying ports `23333` (LMDeploy server), `9090` (Prometheus), or `3000` (Grafana). You can either stop the conflicting services or modify the config files as follows:

- Modify the LMDeploy server port that Prometheus scrapes

In `lmdeploy/monitoring/prometheus.yaml`

```
global:
scrape_interval: 5s
evaluation_interval: 30s

scrape_configs:
- job_name: lmdeploy
static_configs:
- targets:
- '127.0.0.1:23333' # <= Modify the LMDeploy server port (23333) here; it must match the running server port
```

- Modify the Prometheus port

In `lmdeploy/monitoring/grafana/datasources/datasource.yaml`

```
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://localhost:9090 # <= Modify the Prometheus port (9090) here
isDefault: true
editable: false
```

- Modify the Grafana port

In `lmdeploy/monitoring/docker-compose.yaml`, for example, change the port to `3090`:

Option 1: Add `GF_SERVER_HTTP_PORT` to the environment section.

```
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_SERVER_HTTP_PORT=3090 # <= Add this line
```

Option 2: Use port mapping.

```
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3090:3000" # <= 主机端口:容器端口映射
```

2. **No data on the dashboard**

Try sending some requests to the LMDeploy server to generate traffic:

```
python3 benchmark/profile_restful_api.py --backend lmdeploy --num-prompts 5000 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

After refreshing, the dashboard should display data.
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -105,6 +105,7 @@ LMDeploy 工具箱提供以下核心功能:
advance/structed_output.md
advance/pytorch_multinodes.md
advance/pytorch_profiling.md
advance/metrics.md

.. toctree::
:maxdepth: 1
2 changes: 2 additions & 0 deletions lmdeploy/cli/serve.py
@@ -169,6 +169,7 @@ def add_parser_api_server():
ArgumentHelper.ep(pt_group)
ArgumentHelper.enable_microbatch(pt_group)
ArgumentHelper.enable_eplb(pt_group)
ArgumentHelper.enable_metrics(pt_group)
ArgumentHelper.role(pt_group)
ArgumentHelper.migration_backend(pt_group)
# multi-node serving args
@@ -333,6 +334,7 @@ def api_server(args):
max_prefill_token_num=args.max_prefill_token_num,
enable_microbatch=args.enable_microbatch,
enable_eplb=args.enable_eplb,
enable_metrics=args.enable_metrics,
role=EngineRole[args.role],
migration_backend=MigrationBackend[args.migration_backend],
model_format=args.model_format)