-
Notifications
You must be signed in to change notification settings - Fork 238
wait for balancer to have some address available on gateway #76
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to see the timeout/context specified as an arg to NewGateway
rather than inside NewGateway- to allow whatever is consuming it to specify/propagate a timeout.
Then, we can place balancer.Get
inside a retry loop that can just keep retrying until the balancer resolves- again allowing whatever is consuming NewGateway
to determine how long they're happy to wait.
Thanks for the PR though! Be nice to have this integrated.
426f352
to
3a5dba1
Compare
I added a parameter to the NewGateway function, and passed it from the agent and health commands. Is it ok? or do you prefer another aproach? Perhaps the NewGateway parameters could be converted to the "WithOption" pattern, but I leave this for another PR :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks- I've made some more comments in-line to help.
Sorry for not explaining before but I'd like the context.Context
to be constructed by the commands and passed to NewGateway
rather than the wait duration (and add cli flags to set defaults).
Thanks again so much for adding this.
pkg/server/gateway.go
Outdated
@@ -45,7 +45,7 @@ type GatewayConfig struct { | |||
TLS *TLSConfig | |||
} | |||
|
|||
func NewGateway(address string, refresh time.Duration, caFile, certificateFile, keyFile string) (*KiamGateway, error) { | |||
func NewGateway(address string, refresh time.Duration, waitForBalancer time.Duration, caFile, certificateFile, keyFile string) (*KiamGateway, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we pass the context.Context
through as the first arg to NewGateway
, rather than the wait duration.
pkg/server/gateway.go
Outdated
@@ -80,6 +80,13 @@ func NewGateway(address string, refresh time.Duration, caFile, certificateFile, | |||
return nil, err | |||
} | |||
|
|||
ctx, cancel := context.WithTimeout(context.Background(), waitForBalancer) | |||
defer cancel() | |||
_, _, err = balancer.Get(ctx, grpc.BalancerGetOptions{BlockingWait: true}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass the context through then this doesn't need to create context.WithTimeout
it can just cancel according to the context it's provided.
cmd/agent/main.go
Outdated
@@ -129,7 +129,7 @@ func main() { | |||
config := http.NewConfig(opts.port) | |||
config.AllowIPQuery = opts.allowIPQuery | |||
|
|||
gateway, err := kiamserver.NewGateway(opts.serverAddress, opts.serverAddressRefresh, opts.caPath, opts.certificatePath, opts.keyPath) | |||
gateway, err := kiamserver.NewGateway(opts.serverAddress, opts.serverAddressRefresh, 50*time.Millisecond, opts.caPath, opts.certificatePath, opts.keyPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing the time we'll pass a context.Context
through. We should also add a CLI flag to control what the wait duration is.
cmd/health/main.go
Outdated
@@ -63,7 +63,7 @@ func main() { | |||
log.SetLevel(log.ErrorLevel) | |||
} | |||
|
|||
gateway, err := kiamserver.NewGateway(opts.serverAddress, opts.serverAddressRefresh, opts.caPath, opts.certificatePath, opts.keyPath) | |||
gateway, err := kiamserver.NewGateway(opts.serverAddress, opts.serverAddressRefresh, 50*time.Millisecond, opts.caPath, opts.certificatePath, opts.keyPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the agent's main, we should construct a context and pass it to NewGateway
and set the timeout on that context from a cli flag.
3a5dba1
to
5c78c6d
Compare
Ok, updated following your comments. Let me know if there is anything more to change. |
Thanks @sp-joseluis-ledesma! We run into this as well, this is great. |
Thanks @sp-joseluis-ledesma |
I've tagged it as |
@TattiQ only thing I can recommend for now is increasing the time limits on your health check and see if that helps- do liveness probes always fail or intermittently? Does it do the same if you specify the FQDN for the server? Could you perhaps open as a new issue and then we can track it? |
When running the health command, more often than not, the output is:
running with GRPC_GO_LOG_SEVERITY_LEVEL=info GRPC_GO_LOG_VERBOSITY_LEVEL=8 the problem is that the balancer has not been notifiied of the address before it tries to connect:
This PR adds a "wait" for the balancer to have address on the NewGateway function, so when it returns the KiamGateway we ensure there is some address in the balancer