-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Makes getBackoffSleepMillis in ClusterCommandExecutor nondeterministic #3118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@yangbodong22011 @dengliming Your reviews are always appreciated. @walles @jensgreen You were first to introduce backoff. WDYT about this change? |
@adiamzn Hi, thanks for you PR, we don't have lock contention, just network queuing, so I'm not sure what your PR advantage is? Specifically:
Your PR may only be useful in eliminating traffic spikes, but not in speeding up the time to success. Is that what you're trying to do with this PR? |
@yangbodong22011 |
Can you share some diagrams or specific issues you encountered in production? |
@yangbodong22011 Thanks! |
Codecov Report
@@ Coverage Diff @@
## master #3118 +/- ##
============================================
- Coverage 66.55% 66.55% -0.01%
- Complexity 4386 4387 +1
============================================
Files 243 243
Lines 14225 14226 +1
Branches 851 851
============================================
Hits 9468 9468
- Misses 4387 4389 +2
+ Partials 370 369 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
@adiamzn OK, but you can test some data and compare the new algorithm you proposed with the old one to make everyone understand it better? |
@yangbodong22011 |
@adiamzn Thanks, I don't think a new PR is needed, we can just keep working on this PR and keep the stack for others to see. |
@yangbodong22011 @sazzad16 |
@yangbodong22011 @adiamzn I am thinking about going forward with this change. Before saying further, two equations: In original PR where backOff time was added, the time was calculated as A. But in my testing I found that few tests here and there would get timeout exceeded. So I was looking for something definitely smaller than A, and I went with B. Now in this PR, total sleep time is exptected to be half of total calculated backOff time. Contrary to current code where total sleep time is equals to total calculated time. WDYT about considering A again but with this change (jitter) as sleep time is expected to be half of A which is smaller than A. BTW, I have added tests of ClusterCommandExecutor in #3139 |
Currently sleepBackOffMillis is deterministic.
As can be seen by the benchmarks taken in this blog post, this out performed by backoff methods which add Jitter.
In the terminology of the blog post referenced, this PR introduces the "Full Jitter" method, with minimal code changes.