E543 [20.10 backport] Fix possible overlapping IPs by ameyag · Pull Request #42538 · moby/moby · GitHub
[go: up one dir, main page]

Skip to content

Conversation

ameyag
Copy link
Contributor
@ameyag ameyag commented Jun 18, 2021

🍒 #42432

- What I did

Fix a possibility where overlapping IP addresses could exist as a result of the node failing to clean up its old loadbalancer IPs.

Fixes #40989

A node is no longer using its load balancer IP address when it no longer
has tasks that use the network that requires that load balancer. When
this occurs, the swarmkit manager will free that IP in IPAM, and may
reassign it.

When a task shuts down cleanly, it attempts removal of the networks it
uses, and if it is the last task using those networks, this removal
succeeds, and the load balancer IP is freed.

However, this behavior is absent if the container fails. Removal of the
networks is never attempted.

To address this issue, I amend the executor. Whenever a node load
balancer IP is removed or changed, that information is passed to the
executor by way of the Configure method. By keeping track of the set of
node NetworkAttachments from the previous call to Configure, we can
determine which, if any, have been removed or changed.

At first, this seems to create a race, by which a task can be attempting
to start and the network is removed right out from under it. However,
this is already addressed in the controller. The controller will attempt
to recreate missing networks before starting a task.

Signed-off-by: Drew Erny derny@mirantis.com
(cherry picked from commit 0d9b0ed)
Signed-off-by: Ameya Gawde agawde@mirantis.com

A node is no longer using its load balancer IP address when it no longer
has tasks that use the network that requires that load balancer. When
this occurs, the swarmkit manager will free that IP in IPAM, and may
reaassign it.

When a task shuts down cleanly, it attempts removal of the networks it
uses, and if it is the last task using those networks, this removal
succeeds, and the load balancer IP is freed.

However, this behavior is absent if the container fails. Removal of the
networks is never attempted.

To address this issue, I amend the executor. Whenever a node load
balancer IP is removed or changed, that information is passedd to the
executor by way of the Configure method. By keeping track of the set of
node NetworkAttachments from the previous call to Configure, we can
determine which, if any, have been removed or changed.

At first, this seems to create a race, by which a task can be attempting
to start and the network is removed right out from under it. However,
this is already addressed in the controller. The controller will attempt
to recreate missing networks before starting a task.

Signed-off-by: Drew Erny <derny@mirantis.com>
(cherry picked from commit 0d9b0ed)
Signed-off-by: Ameya Gawde <agawde@mirantis.com>
Copy link
Member
@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

@dperny ptal

@AkihiroSuda AkihiroSuda merged commit 87e28a6 into moby:20.10 Jun 19, 2021
@ameyag ameyag deleted the 20.10-overlap-ip branch July 20, 2023 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0