Fix possible overlapping IPs #42432

dperny · 2021-05-27T18:21:43Z

- What I did

Fix a possibility where overlapping IP addresses could exist as a result of the node failing to clean up its old loadbalancer IPs.

Fixes #40989, supercedes #41062.
closes #41062

- How I did it

A node is no longer using its load balancer IP address when it no longer has tasks that use the network that requires that load balancer. When this occurs, the swarmkit manager will free that IP in IPAM, and may reassign it.

When a task shuts down cleanly, it attempts removal of the networks it uses, and if it is the last task using those networks, this removal will succeed:

https://github.com/dperny/docker/blob/abe0a10c6839ce301bc5a0e43bde4633af01f50d/daemon/cluster/executor/container/controller.go#L378-L384

However, this behavior is absent if the container fails. Removal of the network is never attempted.

To address this issue, I amend the executor. Whenever a node load balancer IP is removed or changed, that information is passed to the executor by way of the Configure method. By keeping track of the set of node NetworkAttachments from the previous call to Configure, we can determine which, if any, have been removed or changed.

At first, this seems to create a race, by which a task can be attempting to start and the network removed right out from under it. However, closer inspection of the code shows that this is already addressed in the controller:

https://github.com/dperny/docker/blob/abe0a10c6839ce301bc5a0e43bde4633af01f50d/daemon/cluster/executor/container/controller.go#L207-L225

The controller will attempt again to create the network as long as the network is missing.

- How to verify it

I'm not certain how to write a sane test for this. I've mostly verified it by logic.

- Description for the changelog

Fixed a bug where swarm could have duplicate IPs when the last task on a network on a node failed.

thaJeztah · 2021-05-27T19:56:00Z

@dperny could you update the commit message with your description?

thaJeztah · 2021-05-27T19:59:36Z

daemon/cluster/executor/container/executor.go

Do we need to check if e.nodeObj is not nil here? (I guess it would panic in that case)

Also wondering if there's concurrency here (if a (RW)Mutex is needed for accessing e.nodeObj (no idea, but putting it here as a comment 😅

Good catch on the nil-check. e.nodeObj will be nil on the first call to Configure. Incredibly embarrassed to have missed that.

I've added a comment explaining why nodeObj does not need to be guarded by a mutex.

thaJeztah · 2021-06-08T08:16:55Z

@dperny needs a rebase, now that libnetwork was removed into this repository

thaJeztah · 2021-06-08T08:17:11Z

daemon/cluster/executor/container/executor.go

Suggested change

"github.com/docker/libnetwork"

"github.com/docker/docker/libnetwork"

A node is no longer using its load balancer IP address when it no longer has tasks that use the network that requires that load balancer. When this occurs, the swarmkit manager will free that IP in IPAM, and may reaassign it. When a task shuts down cleanly, it attempts removal of the networks it uses, and if it is the last task using those networks, this removal succeeds, and the load balancer IP is freed. However, this behavior is absent if the container fails. Removal of the networks is never attempted. To address this issue, I amend the executor. Whenever a node load balancer IP is removed or changed, that information is passedd to the executor by way of the Configure method. By keeping track of the set of node NetworkAttachments from the previous call to Configure, we can determine which, if any, have been removed or changed. At first, this seems to create a race, by which a task can be attempting to start and the network is removed right out from under it. However, this is already addressed in the controller. The controller will attempt to recreate missing networks before starting a task. Signed-off-by: Drew Erny <derny@mirantis.com>

dperny · 2021-06-18T14:05:57Z

@thaJeztah I bumped this and changed the libnetwork import path. It should be good to go.

thaJeztah

LGTM

sorry, though I already approved ☺️

thaJeztah · 2021-06-18T15:21:09Z

@cpuguy83 @AkihiroSuda ptal

maricn · 2021-10-04T12:22:44Z

What's the relationship between this PR and #41011?
This PR supercedes #41062 that has a commit with comment "Bring over #41011 to make a integrated fix.". Is that integrated fix also incorporated in this PR and if so should the referenced #41011 and related tickets be closed?

AkihiroSuda added the area/swarm label May 27, 2021

thaJeztah added area/networking Networking status/2-code-review labels May 27, 2021

thaJeztah reviewed May 27, 2021

View reviewed changes

dperny force-pushed the fix-ip-overlap branch from abe0a10 to 838c57a Compare May 28, 2021 15:45

thaJeztah reviewed Jun 8, 2021

View reviewed changes

dperny force-pushed the fix-ip-overlap branch from 838c57a to 0d9b0ed Compare June 9, 2021 17:22

thaJeztah approved these changes Jun 18, 2021

View reviewed changes

AkihiroSuda approved these changes Jun 18, 2021

View reviewed changes

AkihiroSuda merged commit 7729ebf into moby:master Jun 18, 2021

ameyag mentioned this pull request Jun 18, 2021

[20.10 backport] Fix possible overlapping IPs #42538

Merged

thaJeztah added this to the 21.xx milestone Jul 12, 2021

thaJeztah mentioned this pull request Nov 11, 2021

Fix potential IP overlapping with node LB #40997

Closed

corhere mentioned this pull request May 9, 2025

Fix possible overlapping IPs when ingressNA == nil #49948

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix possible overlapping IPs #42432

Fix possible overlapping IPs #42432

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

	"github.com/docker/libnetwork"
	"github.com/docker/docker/libnetwork"

Fix possible overlapping IPs #42432

Fix possible overlapping IPs #42432

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!