Fix for DNS name resolution after performing init with --force-new-cluster #38626
…init flag the agent is cleaned up and recreated properly so that agent events are responded to. This was causing some networking issues around DNS resolution after performing a force init on a cluster. Signed-off-by: Kyle Wuolle <kyle.wuolle@gmail.com>
// When forcing a new cluster, first clean up the existing agent
// ensuring that a new one will be created and started
if(forceNewCluster) {
	daemon.setClusterProvider(nil)
I still see this as not super clean. If the agent is stopped, I believe libnetwork needs to nullify the previous cluster provider automatically. Is there any other case where the clusterProvider is being reused without being set?
You're right, there's another way that might be better. I've updated moby/libnetwork#2307. Now the cluster provider is set to nil in agentClose instead.
Looks like there's a linting issue;
ping @kylewuolle Jenkins failed
@kylewoule should this one be closed now that moby/libnetwork#2307 was merged (and will be vendored through #38983)?
oops, it was actually not yet included in 18.09; cherry-picking now
ping @kylewuolle is this still needed now that moby/libnetwork#2307 was merged?
@thaJeztah @kylewuolle Has the fix been merged into Docker 18.09? And what can I do now?
@caoyj1991 fix was backported in libnetwork through moby/libnetwork#2354, and included in Docker 18.09.6 through docker-archive#201 (should also be in Docker 19.03 and up)
What I did
This fixes a problem where the agent on a controller is stopped when a node leaves a swarm and is never restarted. I've added a flag to the DaemonJoinsCluster method to indicate that a force init is being done. When that flag is set, the existing agent is cleaned up by setting the cluster provider to nil and waiting for the agent to stop. When the cluster provider is set afterwards, the agent is set up properly.
This PR fixes the following issue: "Docker swarm overlay networking not working after --force-new-cluster" (docker/for-linux#495).
How I did it
Added a flag indicating that this is a force new cluster situation and the agent should be cleaned up before setting the cluster provider.
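The cleanup step described above can be sketched in Go as follows. This is a simplified model, not the actual moby or libnetwork code: the `controller`, `swarmProvider`, `stopAgent`, and `joinCluster` names are hypothetical, and the sketch assumes that the controller skips agent setup when it is handed the same provider it already holds, which is what would leave a stopped agent stopped.

```go
package main

// clusterProvider is a hypothetical stand-in for the provider that the
// daemon hands to the network controller.
type clusterProvider interface {
	Name() string
}

// swarmProvider is a hypothetical provider implementation used for illustration.
type swarmProvider struct{ name string }

func (p *swarmProvider) Name() string { return p.name }

// controller is a heavily simplified, hypothetical model of a network
// controller that owns an agent.
type controller struct {
	provider     clusterProvider
	agentRunning bool
	agentStarts  int
}

// setClusterProvider models the assumed behavior: the agent is only
// (re)started when the provider actually changes.
func (c *controller) setClusterProvider(p clusterProvider) {
	if c.provider == p {
		return // same provider as before: no agent is started
	}
	c.provider = p
	if p == nil {
		c.agentRunning = false // clearing the provider tears the agent down
		return
	}
	c.agentRunning = true
	c.agentStarts++
}

// stopAgent models a node leaving the swarm: the agent stops, but the
// stale provider reference is kept.
func (c *controller) stopAgent() { c.agentRunning = false }

// joinCluster models the flag added in this PR: on a forced init the
// provider is cleared first so that setting it again restarts the agent.
func (c *controller) joinCluster(p clusterProvider, forceNewCluster bool) {
	if forceNewCluster {
		c.setClusterProvider(nil)
	}
	c.setClusterProvider(p)
}
```

In this model, calling `joinCluster(p, false)` after `stopAgent()` leaves the agent stopped because the provider is unchanged, while `joinCluster(p, true)` clears the provider first and so starts a fresh agent.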
How to verify it
docker network create --scope swarm --driver overlay --attachable test
docker service create --network test --mode global --name demo demo
Description for the changelog
Fix a problem with DNS resolution after performing a cluster init with the --force-new-cluster option set
A picture of a cute animal (not mandatory but encouraged)