8000 Fix for DNS name resolution after performing init with --force-new-cluster by kylewuolle · Pull Request #38626 · moby/moby · GitHub

Conversation

@kylewuolle (Contributor) commented Jan 23, 2019
  • What I did
    This fixes a problem where the agent on a controller is stopped when a node leaves a swarm and is never restarted. I've added
    a flag to the DaemonJoinsCluster method to indicate the case where a force init is being done. When that flag is set, the existing agent is cleaned up
    by setting the cluster provider to nil and waiting for the agent to stop. When the cluster provider is set again afterwards, the agent is set up properly. This PR
    fixes the following issue: Docker swarm overlay networking not working after --force-new-cluster docker/for-linux#495

  • How I did it
    Added a flag indicating that this is a force-new-cluster situation, in which case the agent should be cleaned up before the cluster provider is set.

  • How to verify it

  1. Using the following Dockerfile, build an image called demo on each node:
FROM ubuntu

RUN apt update
RUN apt install dnsutils -y

CMD /bin/bash -c "while true; do nslookup tasks.demo; sleep 2; done"
  2. Execute swarm init on one of the nodes.
  3. Create a network: docker network create --scope swarm --driver overlay --attachable test
  4. Create a service: docker service create --network test --mode global --name demo demo
  5. Verify that the tasks.demo endpoint resolves to two IP addresses: docker service logs demo
  6. Now execute docker swarm init --force-new-cluster on one of the nodes.
  7. Demote and remove the other node; also remove the service and network.
  8. Recreate the service and network on the remaining node.
  9. Have a third node join the remaining node.
  10. Previously, at this point node 3 would resolve tasks.demo to its own container's IP, but tasks.demo would not resolve on the first node. Also, the container on each node could not reach the container on the other node by its IP. With this fix in place, resolution works as expected and tasks.demo resolves to the respective IPs.
  • Description for the changelog
    Fix a problem with DNS resolution after performing a cluster init with the --force-new-cluster option set

  • A picture of a cute animal (not mandatory but encouraged)

…init flag the agent is cleaned up and recreated properly so that agent events are responded to. This was causing some networking issues around DNS resolution after performing a force init on a cluster.

Signed-off-by: Kyle Wuolle <kyle.wuolle@gmail.com>
// When forcing a new cluster, first clean up the existing agent,
// ensuring that a new one will be created and started
if forceNewCluster {
	daemon.setClusterProvider(nil)
}
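The cleanup pattern in the diff above can be sketched as a small, self-contained Go program. Note this is a simplified illustration, not the actual moby/moby code: the agent, daemon, setClusterProvider, and daemonJoinsCluster names here mirror the PR description but are stand-ins, and the real daemon manages far more state.

```go
package main

import (
	"fmt"
	"sync"
)

// agent is a hypothetical stand-in for the networking agent. Closing the
// stopped channel models the agent shutting down.
type agent struct {
	stopped chan struct{}
	once    sync.Once
}

func (a *agent) stop() {
	a.once.Do(func() { close(a.stopped) })
}

type daemon struct {
	mu    sync.Mutex
	agent *agent
}

// setClusterProvider models the behavior described in the PR: passing nil
// stops the current agent and waits for it to terminate; passing a real
// provider starts a fresh agent.
func (d *daemon) setClusterProvider(provider interface{}) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if provider == nil {
		if d.agent != nil {
			d.agent.stop()
			<-d.agent.stopped // wait for the agent to stop
			d.agent = nil
		}
		return
	}
	d.agent = &agent{stopped: make(chan struct{})}
}

// daemonJoinsCluster shows where the forceNewCluster flag hooks in: the old
// agent is torn down first so a new one is created for the new cluster.
func (d *daemon) daemonJoinsCluster(provider interface{}, forceNewCluster bool) {
	if forceNewCluster {
		d.setClusterProvider(nil)
	}
	d.setClusterProvider(provider)
}

func main() {
	d := &daemon{}
	d.daemonJoinsCluster("cluster-A", false)
	old := d.agent
	d.daemonJoinsCluster("cluster-B", true) // force-new-cluster path
	fmt.Println(old != d.agent)             // a fresh agent replaced the old one
}
```

Without the forceNewCluster branch, the stale agent from the old cluster would be left in place and never restarted, which is the root cause of the DNS failures described above.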
Contributor
I still see this as not super clean. If the Agent is stopped, I believe libnetwork need to nullify the previous cluster provider automatically. Is there any other case where the clusterProvider is being reused without being set?

Contributor Author

You're right there's another way that might be better. I've updated moby/libnetwork#2307. Now instead the cluster provider would be set to nil in agentClose.

@thaJeztah
Member

Looks like there's a linting issue;

19:40:46 daemon/daemon.go:1::warning: file is not gofmted with -s (gofmt)
19:40:46 daemon/daemon.go:1::warning: file is not goimported (goimports)
19:40:47 Build step 'Execute shell' marked build as failure

@coolljt0725
Contributor

ping @kylewuolle Jenkins failed

@thaJeztah
Member
thaJeztah commented Apr 1, 2019

@kylewuolle should this one be closed now that moby/libnetwork#2307 was merged (and will be vendored through #38983)?

Note that it was already included in Docker 18.09.4 docker-archive#169

@thaJeztah
Member

oops, it was actually not yet included in 18.09; cherry-picking now

@thaJeztah
Member

ping @kylewuolle is this still needed now that moby/libnetwork#2307 was merged?
/cc @arkodg

@thaJeztah thaJeztah added the kind/bugfix PR's that fix bugs label Oct 10, 2019
@caoyj1991

@thaJeztah @kylewuolle Has the fix been merged into Docker 18.09? And what can I do now?

@kylewuolle kylewuolle closed this Nov 6, 2019
@thaJeztah
Member

@caoyj1991 fix was backported in libnetwork through moby/libnetwork#2354, and included in Docker 18.09.6 through docker-archive#201 (should also be in Docker 19.03 and up)
