[Feature] metrics support #3534

CUHKSZzxy · 2025-05-09T13:07:32Z

Objective

Align with the vLLM v1 metrics system and extend it where useful. The key points of alignment are:

Monotonic Timestamps:
-- Uses time.perf_counter() for interval calculations (monotonic, so intervals are not affected by system clock adjustments).
Metric Types:
-- Gauges: active requests, cache usage, etc.
-- Counters: token totals, request success / failure counts, etc.
-- Histograms: TTFT (Time-To-First-Token), TPOT (Time-Per-Output-Token, i.e. inter-token latency), end-to-end latency, etc.
Metrics Publishing:
-- CLI logging
-- Prometheus & Grafana
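
As a rough illustration of how the latency histograms listed above can be derived from monotonic timestamps, here is a minimal sketch. The RequestTimestamps class and its field names are hypothetical and are not taken from lmdeploy; only time.perf_counter() comes from the description above.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestTimestamps:
    """Hypothetical per-request record of monotonic timestamps, in seconds."""
    arrival: float
    token_times: List[float] = field(default_factory=list)

    def ttft(self) -> float:
        # Time-To-First-Token: first token time minus arrival time.
        return self.token_times[0] - self.arrival

    def inter_token_latencies(self) -> List[float]:
        # Gaps between consecutive output tokens.
        return [b - a for a, b in zip(self.token_times, self.token_times[1:])]

    def e2e_latency(self) -> float:
        # End-to-end latency: last token time minus arrival time.
        return self.token_times[-1] - self.arrival


# Usage: stamp events with the monotonic clock, compute intervals later.
req = RequestTimestamps(arrival=time.perf_counter())
for _ in range(3):
    req.token_times.append(time.perf_counter())
```

Because every value comes from the same monotonic clock within one process, the differences stay non-negative even if the wall clock is adjusted while a request is in flight.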

During the main loop and scheduling, we only record critical timestamps and events, without any further processing. Heavier metric calculations and metrics publishing run in separate coroutines to reduce the overhead on the main engine loop.
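
The split described above can be sketched with an asyncio.Queue: the hot loop only captures a timestamp and enqueues it, while a separate coroutine does the formatting. This is a minimal sketch under assumed names (stats_consumer, main_loop and the event tuple layout are hypothetical), not the code in this PR.

```python
import asyncio
import time
from typing import List, Optional, Tuple

Event = Tuple[str, float]  # (event name, perf_counter timestamp)


async def stats_consumer(queue: asyncio.Queue, sink: List[str]) -> None:
    """Drain events and do the heavier formatting away from the hot loop."""
    prev: Optional[float] = None
    while True:
        event = await queue.get()
        if event is None:  # sentinel: the producer is done
            break
        name, ts = event
        interval = 0.0 if prev is None else ts - prev
        prev = ts
        sink.append(f"{name}: +{interval * 1e3:.3f} ms")


async def main_loop(steps: int) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    sink: List[str] = []
    consumer = asyncio.create_task(stats_consumer(queue, sink))
    for step in range(steps):
        # Hot path: only capture a timestamp and enqueue it.
        queue.put_nowait((f"step{step}", time.perf_counter()))
        await asyncio.sleep(0)  # yield so the consumer can run
    queue.put_nowait(None)
    await consumer
    return sink


lines = asyncio.run(main_loop(3))
```

The producer never awaits the consumer while it is doing work, so slow formatting or publishing only delays the statistics, not the loop that records them.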

TODO

Use time.perf_counter()
Refactor to minimize the overhead of async engine generate() or engine _async_loop_main()
~~Expert information collections (deferred in another PR)~~
Grafana visualization
Refactor to reduce messy parameters (may use global context / separate registering module) and further abstractions.
Add user guide documents

Usage

Start the server with --enable-metrics

lmdeploy serve api_server models--Qwen--Qwen2.5-7B-Instruct --enable-metrics

Metrics Publishing - Logging
With --enable-metrics, key metrics (e.g., running / waiting requests, cache usage, token throughput) are printed to the terminal every 5 seconds.
Metrics Publishing - Prometheus & Grafana
-- Raw Metrics
Access the raw Prometheus metrics at http://localhost:23333/metrics/ .
You can also query the endpoint from the command line: curl http://localhost:23333/metrics/
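
For scripted checks, the raw output can also be read from Python. The sketch below uses only the standard library and a deliberately simplified parser (it ignores optional sample timestamps). The metric names in the sample text are made up for illustration and are not the names this PR exports; fetch_metrics assumes the server from the Usage section is running on port 23333.

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Parse simple `name{labels} value` sample lines, skipping comments."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Simplification: treat the last space-separated field as the value.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:23333/metrics/") -> Dict[str, float]:
    """Fetch and parse the endpoint; requires a running server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))


# Hypothetical sample in the Prometheus text exposition format.
sample = """\
# HELP demo_requests_running Hypothetical gauge, for illustration only.
# TYPE demo_requests_running gauge
demo_requests_running{model="demo"} 3.0
demo_tokens_total 1024.0
"""
parsed = parse_prometheus_text(sample)
```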

-- Prometheus Panel (WIP, user guide to be added)
Access the Prometheus panel via http://localhost:9090 (9090 is the current default port for the Prometheus panel)

-- Grafana Panel (WIP, user guide to be added)
Access the Grafana panel via http://localhost:3000 (3000 is the current default port for the Grafana panel)

Performance Impacts

Conclusion:

No obvious throughput degradation for Qwen-2.5-32B, minor degradation (1~2%) for Qwen-2.5-7B, and notable degradation (15% ~ 20%) for small models like Qwen-2.5-0.5B.

You may check the following figures for details. Benchmark settings: 1000 prompts, input len 1000, output len 1000.

Qwen-2.5-7B (TP1), without the metrics.
Qwen-2.5-7B (TP1), with the metrics.
Qwen-2.5-0.5B (TP1), without the metrics.
Qwen-2.5-0.5B (TP1), with the metrics.

Related Issues & PR

Issue 2638, Issue 2673, PR1423

lvhan028 · 2025-05-29T05:33:54Z

lmdeploy/serve/async_engine.py

@@ -302,6 +303,21 @@ def __init__(self,
        self.internal_thread = _EventLoopThread(daemon=True)
        self.limiter: asyncio.Semaphore = None

+        # build status loggers
+        # independent set for each DP rank, since monototic time differs for each process
+        # each set contains one cli logger and one prometheus logger


It might be cleaner to move the new code into a separate function

lvhan028 · 2025-05-29T07:27:11Z

lmdeploy/pytorch/engine/engine.py

+                # actual running requests
+                num_running_reqs = self.scheduler.num_locked()
+                # waiting to be scheduled or have been scheduled but not yet started execution
+                num_waiting_reqs = self.scheduler.num_waiting() + self.scheduler.num_running()


Does self.scheduler.num_running() refer to scheduled but not yet started?

yes, discussed and confirmed with yaoq.
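
To make the counting confirmed in this thread concrete, here is a small sketch. StubScheduler is a hypothetical stand-in that only mimics the three method names shown in the diff; it is not the lmdeploy scheduler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StubScheduler:
    """Hypothetical stand-in exposing the three counters used in the diff."""
    locked: int   # requests actually executing
    waiting: int  # requests not yet scheduled
    running: int  # requests scheduled but not yet started

    def num_locked(self) -> int:
        return self.locked

    def num_waiting(self) -> int:
        return self.waiting

    def num_running(self) -> int:
        return self.running


def request_gauges(scheduler: StubScheduler) -> Tuple[int, int]:
    """Return (num_running_reqs, num_waiting_reqs) as computed in the diff."""
    num_running_reqs = scheduler.num_locked()
    num_waiting_reqs = scheduler.num_waiting() + scheduler.num_running()
    return num_running_reqs, num_waiting_reqs


gauges = request_gauges(StubScheduler(locked=4, waiting=2, running=3))
```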

lvhan028 · 2025-05-29T07:29:07Z

lmdeploy/pytorch/engine/engine.py

@@ -997,6 +1062,16 @@ def __send_resps(step_outputs: List[InferOutput]):
            await self._await_forward_event(forward_event)
            __send_resps(resps)

+    async def _async_log_stats_task(self, log_que: asyncio.Queue):
+
+        while True:


Can this task be terminated normally?
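
One common way to let a `while True` consumer terminate cleanly is to cancel the task and handle asyncio.CancelledError inside it. The sketch below illustrates that pattern only; log_stats_task and demo are hypothetical names, and this is not necessarily how the PR resolves the question.

```python
import asyncio
from typing import List


async def log_stats_task(log_que: asyncio.Queue, out: List[str]) -> None:
    try:
        while True:
            item = await log_que.get()
            out.append(f"stats: {item}")
    except asyncio.CancelledError:
        # Flush whatever is still queued, then propagate the cancellation.
        while not log_que.empty():
            out.append(f"stats: {log_que.get_nowait()}")
        raise


async def demo() -> List[str]:
    que: asyncio.Queue = asyncio.Queue()
    out: List[str] = []
    task = asyncio.create_task(log_stats_task(que, out))
    que.put_nowait("a")
    await asyncio.sleep(0)  # let the task process "a"
    que.put_nowait("b")
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the task re-raises after flushing
    return out
```

Re-raising CancelledError after the flush keeps the task in the cancelled state, so the owner can tell a requested shutdown apart from a crash.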

grimoire · 2025-05-30T06:22:35Z

lmdeploy/pytorch/engine/engine.py

@@ -1149,14 +1227,19 @@ async def async_loop(self):
                forward_event, has_runable_event),
                                                   name='MainLoopPreprocessMessage')

+            # log task
+            logger.info('Starting async task MainLoopLogStats.')


do not create this task if metrics is disabled.

lvhan028 · 2025-06-02T08:21:06Z

You may want to merge the main branch to resolve the lint errors.

CUHKSZzxy added 2 commits May 9, 2025 20:38

metrics support prototype

f8b4000

Merge branch 'main' into metrics-support

3e4fca9

Conflicts: lmdeploy/messages.py lmdeploy/pytorch/engine/engine.py lmdeploy/pytorch/engine/engine_instance.py lmdeploy/pytorch/messages.py lmdeploy/pytorch/paging/scheduler.py

CUHKSZzxy added the WIP label May 9, 2025

CUHKSZzxy added 22 commits May 12, 2025 18:01

Merge branch 'main' into metrics-support

02c46ec

Conflicts: lmdeploy/serve/openai/api_server.py

fix wrong conflict resolve

9ae6a1b

add GPU KV cache usage

7904d3a

independent logger for each DP

4a339c8

fix gpu cache usage

8c3ede1

Merge branch 'main' into metrics-support

ddeec2e

rename log stats

9229aa1

fix

862a708

update perf_counter and comments, some bug fix

74dc69a

Merge branch 'main' into metrics-support

19d81d4

overwrite with main branch

b87f099

Merge branch 'main' into metrics-support

d9f8e5a

refactor

0168eed

cleanup

d774cc3

fix

08200e1

add runtime cuda prometheus_client

a4d0ac9

fix

150d562

cleanup

1f80a8e

async log

aed3eea

fix gen throughput calculation

0931746

update max_model_len

57f3f91

Merge branch 'main' into metrics-support

4bdf89f

CUHKSZzxy removed the WIP label May 26, 2025

CUHKSZzxy added 2 commits May 26, 2025 20:29

fix running/waiting reqs calculations

83b7c60

Merge branch 'main' into metrics-support

67366b1

CUHKSZzxy marked this pull request as ready for review May 26, 2025 13:24

fix pr test

9729f0d

CUHKSZzxy added 5 commits May 27, 2025 11:20

fix

9c194ac

fix pr test

97ccdf3

update log level

72d4274

fix

382c500

Merge branch 'main' into metrics-support

e224bc6

lvhan028 requested review from RunningLeon, grimoire and lvhan028 May 29, 2025 05:22

lvhan028 added the enhancement New feature or request label May 29, 2025

lvhan028 reviewed May 29, 2025

CUHKSZzxy added 5 commits May 29, 2025 18:54

update

0df0473

add grafana support

47a07b6

fix

c354a7d

update

4bc27e0

update

a132cc6

grimoire reviewed May 30, 2025

simplify some logics

2c1588d
