Fix adaptive metrics decay when provider metrics are not updated by SURYAS1306 · Pull Request #16048 · apache/dubbo · GitHub

@SURYAS1306
Contributor

What is the purpose of the change?

This PR fixes an issue in AdaptiveLoadBalance / AdaptiveMetrics where latency decay behaves incorrectly when provider metrics are not updated for a period of time.

Currently, when no new provider metrics arrive, getLoad() may repeatedly apply the penalty branch or aggressively right-shift lastLatency, which can result in stale or extreme values dominating the EWMA. This makes adaptive load balancing unstable, especially in low-QPS or intermittent-update scenarios.

This PR ensures that latency decays safely and progressively instead of collapsing or being stuck at penalty values.
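To make the "aggressively right-shift" point concrete, here is a small arithmetic sketch. It assumes, for illustration only, that the shift distance grows by one for every elapsed timeout window (`multiple = elapsed / timeout + 1`). The class name, method name, and numbers are hypothetical and are not copied from `AdaptiveMetrics`.

```java
// Illustrative arithmetic only (hypothetical names and values): shows how a right
// shift by the number of elapsed timeout windows drives a latency value toward zero.
class ShiftCollapseDemo {

    // Assumed rule for this sketch: one extra shift per elapsed timeout window.
    static long shiftedLatency(long lastLatency, long elapsedMillis, long timeoutMillis) {
        long multiple = elapsedMillis / timeoutMillis + 1;
        // Java only uses the low 6 bits of a long shift distance, so clamp to 63
        // to keep the result meaningful for very long gaps.
        return lastLatency >> Math.min(multiple, 63);
    }

    public static void main(String[] args) {
        long lastLatency = 800; // hypothetical last observed latency in ms
        long timeout = 1000;    // hypothetical timeout in ms
        for (long elapsed = 0; elapsed <= 10_000; elapsed += 2_000) {
            System.out.println("elapsed=" + elapsed + "ms -> lastLatency="
                    + shiftedLatency(lastLatency, elapsed, timeout));
        }
    }
}
```

With an 800 ms latency and a 1000 ms timeout, the value drops to 100 after two seconds and to 0 after ten, which is the kind of collapse the description refers to.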

Fixes #15810

What is changed?

1. Improved decay logic in AdaptiveMetrics#getLoad()

  • Prevents lastLatency from collapsing to zero.
  • Ensures decay happens smoothly when provider metrics are not refreshed.
  • Avoids repeatedly applying the penalty path when timestamps are equal.
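One way the three goals above could be met is sketched below. This is a hypothetical model written for illustration, not the code merged in this PR: `SafeDecaySketch`, `BETA`, `MIN_LATENCY`, and the halve-per-tick rule are all assumptions. The penalty is applied once per stale period, later ticks halve the latency, and a floor keeps it from reaching zero.

```java
// Hypothetical sketch, NOT the code merged in PR #16048.
class SafeDecaySketch {
    static final double BETA = 0.5;    // assumed EWMA smoothing factor
    static final long MIN_LATENCY = 1; // assumed floor so lastLatency never collapses to 0

    long lastLatency;
    double ewma;
    boolean penaltyApplied; // true once the penalty has been used in the current stale period

    SafeDecaySketch(long initialLatency) {
        this.lastLatency = initialLatency;
        this.ewma = initialLatency;
    }

    // Fresh provider metrics arrived: take the new sample and end the stale period.
    void onProviderUpdate(long latency) {
        lastLatency = Math.max(latency, MIN_LATENCY);
        penaltyApplied = false;
        updateEwma();
    }

    // An evaluation happened while provider metrics were not refreshed.
    void onStaleTick(long timeout) {
        if (!penaltyApplied) {
            // Apply the penalty once, instead of on every stale evaluation.
            lastLatency = timeout * 2L;
            penaltyApplied = true;
        } else {
            // Decay gently: halve once per tick, never below the floor.
            lastLatency = Math.max(lastLatency >> 1, MIN_LATENCY);
        }
        updateEwma();
    }

    private void updateEwma() {
        ewma = BETA * ewma + (1 - BETA) * lastLatency;
    }

    public static void main(String[] args) {
        SafeDecaySketch s = new SafeDecaySketch(100);
        for (int i = 0; i < 5; i++) {
            s.onStaleTick(1000);
            System.out.println("lastLatency=" + s.lastLatency + ", ewma=" + s.ewma);
        }
    }
}
```

Calling `onStaleTick(1000)` twice on a fresh instance yields 2000 (the penalty) and then 1000; further ticks keep halving down to the floor of 1, and `onProviderUpdate` resets the stale state.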

2. Added unit test

Added testAdaptiveMetricsDecayWithoutProviderUpdate

Verifies that when provider metrics are not updated:

  • latency decays over time
  • penalty value is not stuck
  • EWMA continues to evolve
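The three properties above can be phrased as simple predicates over values recorded across successive evaluations with no provider update in between. The helper below is a hypothetical plain-Java illustration (no JUnit, and not the test added in this PR); `DecayTraceCheck` and its method names are assumptions.

```java
// Hypothetical property checks for a trace recorded while provider metrics are stale.
class DecayTraceCheck {

    // "latency decays over time": the last recorded value is strictly below the peak.
    static boolean latencyDecays(long[] latencies) {
        if (latencies.length == 0) return false;
        long peak = Long.MIN_VALUE;
        for (long l : latencies) peak = Math.max(peak, l);
        return latencies[latencies.length - 1] < peak;
    }

    // "penalty value is not stuck": the penalty never appears on two consecutive evaluations.
    static boolean penaltyNotStuck(long[] latencies, long penalty) {
        for (int i = 1; i < latencies.length; i++) {
            if (latencies[i] == penalty && latencies[i - 1] == penalty) return false;
        }
        return true;
    }

    // "EWMA continues to evolve": every evaluation changes the EWMA value.
    static boolean ewmaEvolves(double[] ewmas) {
        for (int i = 1; i < ewmas.length; i++) {
            if (ewmas[i] == ewmas[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] decaying = {2000, 1000, 500, 250};
        long[] stuck = {2000, 2000, 2000, 2000};
        System.out.println("decaying: " + latencyDecays(decaying) + ", " + penaltyNotStuck(decaying, 2000));
        System.out.println("stuck:    " + latencyDecays(stuck) + ", " + penaltyNotStuck(stuck, 2000));
    }
}
```

A trace stuck at the penalty fails both latency checks, while a trace that applies the penalty once and then decays passes them.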

Why is this needed?

Adaptive load balancing relies on EWMA latency to reflect recent performance trends.

Without this fix:

  • old latency values can dominate indefinitely
  • penalty values may be repeatedly re-applied
  • low-traffic services become unfairly weighted
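The second point follows from basic EWMA arithmetic. With the standard update `ewma = beta * ewma + (1 - beta) * sample`, feeding the same sample s repeatedly gives s + beta^n * (E0 - s) after n steps, so a penalty that keeps being re-applied pulls the EWMA toward the penalty value and holds it there. The sketch assumes beta = 0.5 and illustrative latencies; `EwmaStuckDemo` is a hypothetical name.

```java
// Illustrative EWMA arithmetic (hypothetical names and values).
class EwmaStuckDemo {

    // Standard EWMA update: new = beta * old + (1 - beta) * sample.
    static double ewma(double previous, double sample, double beta) {
        return beta * previous + (1 - beta) * sample;
    }

    // Feeds the same sample n times, e.g. a penalty value that keeps being re-applied.
    static double feedRepeated(double start, double sample, double beta, int n) {
        double e = start;
        for (int i = 0; i < n; i++) {
            e = ewma(e, sample, beta);
        }
        return e;
    }

    public static void main(String[] args) {
        // Start from a healthy 50 ms EWMA and re-apply a 2000 ms penalty ten times.
        System.out.println(feedRepeated(50, 2000, 0.5, 10));
    }
}
```

Starting from 50 ms with a repeated 2000 ms penalty, ten updates give 2000 - 1950 / 2^10, roughly 1998.1, so the provider looks almost as slow as the penalty itself.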

This change makes adaptive load balancing more stable, realistic, and robust under real-world traffic patterns.

Verifying this change

  • Added new unit test covering the decay scenario
  • All tests pass locally:
mvn -pl dubbo-cluster -am test

Checklist

@codecov-commenter
codecov-commenter commented Jan 25, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 60.73%. Comparing base (f5d6436) to head (2113910).

Files with missing lines | Patch % | Lines
...ain/java/org/apache/dubbo/rpc/AdaptiveMetrics.java | 50.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                3.3   #16048      +/-   ##
============================================
- Coverage     60.75%   60.73%   -0.03%     
+ Complexity    11757    11752       -5     
============================================
  Files          1952     1952              
  Lines         89012    89012              
  Branches      13421    13421              
============================================
- Hits          54079    54059      -20     
- Misses        29367    29382      +15     
- Partials       5566     5571       +5     
Flag | Coverage Δ
integration-tests-java21 | 32.19% <0.00%> (+0.01%) ⬆️
integration-tests-java8 | 32.32% <0.00%> (-0.02%) ⬇️
samples-tests-java21 | 32.06% <0.00%> (-0.06%) ⬇️
samples-tests-java8 | 29.71% <0.00%> (-0.01%) ⬇️
unit-tests-java11 | 59.01% <50.00%> (-0.01%) ⬇️
unit-tests-java17 | 58.52% <50.00%> (+0.01%) ⬆️
unit-tests-java21 | 58.51% <50.00%> (-0.02%) ⬇️
unit-tests-java25 | 58.46% <50.00%> (-0.02%) ⬇️
unit-tests-java8 | 59.01% <50.00%> (-0.01%) ⬇️


@SURYAS1306
Contributor Author

Hi maintainers,

This PR fixes the adaptive metrics decay issue when provider metrics are not updated and adds a unit test covering the scenario.

All checks are green. I’d really appreciate your review when you have time.

Thanks!

@zrlw
Contributor
zrlw commented Jan 27, 2026

You'd better add a comparison test to ensure that applying this PR also performs well in high-QPS scenarios.

@SURYAS1306
Copy link
Contributor Author

Hi @zrlw , thanks for the suggestion.
That makes sense. I’ll add a comparison test to cover high-QPS scenarios and update the PR soon.

@SURYAS1306
Contributor Author

Hi @zrlw , thanks for the suggestion.
I’ve added a high-frequency-style test to cover the adaptive metrics decay behavior and verified it by running the dubbo-cluster module tests locally.
All tests are passing now. I’d appreciate it if you could take another look.
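A comparison along the lines requested above can be structured as a replay harness: run the same sequence of evaluations under two stale-decay policies and compare the resulting EWMA. The sketch assumes that under high QPS every evaluation sees a fresh provider sample, in which case any change confined to the stale path must produce identical results; under sparse updates the policies are expected to diverge. `QpsComparisonSketch`, `StalePolicy`, `BETA`, and both policies are hypothetical and not taken from the PR.

```java
// Hypothetical comparison harness (not the test added in the PR).
class QpsComparisonSketch {
    static final double BETA = 0.5; // assumed EWMA smoothing factor

    // Decides the latency to use when provider metrics were NOT refreshed.
    // staleTicks counts consecutive stale evaluations, starting at 1.
    interface StalePolicy {
        long decay(long lastLatency, long timeout, int staleTicks);
    }

    // Replays evaluations: fresh[i] == true means a new provider sample (samples[i])
    // arrived before evaluation i. Assumes equal lengths and fresh[0] == true.
    static double replay(StalePolicy policy, boolean[] fresh, long[] samples, long timeout) {
        long last = samples[0];
        double ewma = samples[0];
        int staleTicks = 0;
        for (int i = 0; i < fresh.length; i++) {
            if (fresh[i]) {
                last = samples[i];
                staleTicks = 0;
            } else {
                staleTicks++;
                last = policy.decay(last, timeout, staleTicks);
            }
            ewma = BETA * ewma + (1 - BETA) * last;
        }
        return ewma;
    }

    public static void main(String[] args) {
        long[] samples = {100, 120, 90, 110, 100};
        StalePolicy stuck = (last, timeout, n) -> timeout * 2;
        StalePolicy decaying = (last, timeout, n) -> n == 1 ? timeout * 2 : Math.max(last >> 1, 1);
        boolean[] highQps = {true, true, true, true, true};
        boolean[] lowQps = {true, false, false, false, false};
        System.out.println("high QPS: " + replay(stuck, highQps, samples, 1000)
                + " vs " + replay(decaying, highQps, samples, 1000));
        System.out.println("low QPS:  " + replay(stuck, lowQps, samples, 1000)
                + " vs " + replay(decaying, lowQps, samples, 1000));
    }
}
```

With these samples, both policies print 102.5 in the high-QPS replay, while in the low-QPS replay the stuck policy ends at 1881.25 (close to the 2000 ms penalty) and the decaying one at 506.25.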


Contributor
@zrlw zrlw left a comment


LGTM



Development

Successfully merging this pull request may close these issues.

[BUG] Questions about Dubbo Adaptive Load Balance

3 participants
