[20.10 backport] Fix possible overlapping IPs #42538
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
🍒 #42432
- What I did
Fix a possibility where overlapping IP addresses could exist as a result of the node failing to clean up its old loadbalancer IPs.
Fixes #40989
A node is no longer using its load balancer IP address when it no longer
has tasks that use the network that requires that load balancer. When
this occurs, the swarmkit manager will free that IP in IPAM, and may
reassign it.
When a task shuts down cleanly, it attempts removal of the networks it
uses, and if it is the last task using those networks, this removal
succeeds, and the load balancer IP is freed.
However, this behavior is absent if the container fails. Removal of the
networks is never attempted.
To address this issue, I amend the executor. Whenever a node load
balancer IP is removed or changed, that information is passed to the
executor by way of the Configure method. By keeping track of the set of
node NetworkAttachments from the previous call to Configure, we can
determine which, if any, have been removed or changed.
At first, this seems to create a race, by which a task can be attempting
to start and the network is removed right out from under it. However,
this is already addressed in the controller. The controller will attempt
to recreate missing networks before starting a task.
Signed-off-by: Drew Erny derny@mirantis.com
(cherry picked from commit 0d9b0ed)
Signed-off-by: Ameya Gawde agawde@mirantis.com