kubernetes watch API is behaving oddly #1370
Comments
You should not expect a watch to run forever; you need to list/watch in a loop. The code is actually semi-thorny to get right, so you are probably better off using the
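The list/watch loop described above can be sketched roughly as follows. This is only an illustration of the pattern: `WatchSource`, `ListWatchLoopSketch`, and `fakeSource` are hypothetical names invented for this example and are not part of the Kubernetes Java client. The sketch assumes a watch stream simply ends when the server closes it, and it leaves out error handling such as re-listing when the stored resource version has expired.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an API client; NOT the Kubernetes Java client's API.
interface WatchSource {
    // Performs a full list and returns the resource version it reflects.
    long list();

    // Streams the versions of events newer than resourceVersion.
    // The iterator ends when the server closes the watch.
    Iterator<Long> watch(long resourceVersion);
}

class ListWatchLoopSketch {
    // Lists once, then re-opens the watch from the last seen version
    // every time the stream ends. maxWatches bounds the loop for the demo;
    // a real loop would run until shutdown.
    static List<Long> run(WatchSource source, int maxWatches) {
        List<Long> seen = new ArrayList<>();
        long version = source.list();
        for (int i = 0; i < maxWatches; i++) {
            Iterator<Long> events = source.watch(version);
            while (events.hasNext()) {
                version = events.next(); // remember where to resume
                seen.add(version);
            }
            // The stream ended without an error: loop around and watch again.
        }
        return seen;
    }

    // Fake source for the demo: each watch delivers at most two events, then closes.
    static WatchSource fakeSource(long latest) {
        return new WatchSource() {
            public long list() {
                return 0;
            }

            public Iterator<Long> watch(long resourceVersion) {
                List<Long> batch = new ArrayList<>();
                long end = Math.min(resourceVersion + 2, latest);
                for (long v = resourceVersion + 1; v <= end; v++) {
                    batch.add(v);
                }
                return batch.iterator();
            }
        };
    }

    public static void main(String[] args) {
        // prints [1, 2, 3, 4, 5]
        System.out.println(run(fakeSource(5), 4));
    }
}
```

The point of the sketch is that a watch ending is a normal event, not a failure: the caller keeps the last version it saw and resumes from there, so no events are lost between watches.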
thanks @brendandburns. I will try out Informer and will let you know. |
@brendandburns I am seeing similar behavior even with the Informer class. Let me know if I have configured something wrong here. It received events for the first few minutes and then suddenly stopped receiving changelog events.
Is it possible that your thread is throwing an exception? If you throw an uncaught exception inside the thread, the thread will terminate. I would try:

```java
public void run() {
    try {
        // your code here
    } catch (Throwable e) {
        e.printStackTrace();
    }
}
```

And see if any exceptions occur. Your code for using the informer looks correct.
i think so |
@brendandburns @yue9944882 I am not running this in a separate thread because
@brendandburns I have not changed any timeout variables. I see the default for listNamespacedConfigMapCall is set to 5 minutes.
Actually, in my case, the kube API server becomes unavailable once in a while. Do you think increasing the timeout will work?
Your code looks good. The informer will retry connecting to the kube-apiserver every 1 second if the server becomes unavailable, and the watch connection will be re-established once the server is up. java/examples/src/main/java/io/kubernetes/client/examples/InformerExample.java Lines 38 to 39 in a43fa93
Did you set the read timeout to infinite as the example above shows?
@yue9944882 Yes.
I tried changing the timeouts too. That didn't help. In my last run, I could see it working for hours, then it stopped receiving events. I started with debug mode on. I see neither exceptions nor errors.
I'm actually not sure you want an infinite timeout. In a flaky network, is it possible that something is not sending a TCP reset when a network connection is severed? I've seen situations where a TCP reset isn't sent and the system holds a TCP connection open, but there's no traffic flowing. I would actually set a non-infinite timeout (5 minutes?) and see if that fixes things.
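To illustrate why a finite read timeout helps with a silently stalled connection, here is a minimal sketch using only plain JDK sockets. It is not the Kubernetes client or its HTTP layer; `ReadTimeoutSketch` and `readWithTimeout` are names made up for this example. A local server accepts the connection and then never sends anything, which stands in for a connection that is still open but carries no traffic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {
    // Connects to a local server that accepts but never writes,
    // then reports how the blocking read ended.
    // timeoutMillis must be > 0; a value of 0 means "wait forever" for setSoTimeout.
    static String readWithTimeout(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            int b = in.read(); // blocks: the peer is connected but silent
            return b < 0 ? "closed" : "data";
        } catch (SocketTimeoutException e) {
            // The read gave up, so the caller can close and reconnect.
            return "timed out";
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "timed out" after roughly 200 ms
        System.out.println(readWithTimeout(200));
    }
}
```

With an infinite timeout the same read would block indefinitely and the caller would never learn that the connection had gone quiet, which matches the symptom of no exceptions and no events.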
@brendandburns Thanks for the suggestion. I will try it and let you know.
@sameer2800 were you able to resolve this? We are starting to see the same symptoms on AKS (Azure Kubernetes Service, K8S 1.18.8) for a CR instance. After about 5 minutes the informer stops seeing any updates (new/update/delete). We are running the 9.0.1 release. We updated to 10.0.1, but it made no difference. @brendandburns you suggested running with a read timeout that is not zero, but the 10.0.0 release was updated to disallow any read timeout other than zero. See this commit. Any other suggestions to try?
cc @yue9944882 See some related discussion here: @tony-clarke-amdocs for AKS specifically see the discussion here: I think we should: eventually: c) switch from Web Sockets to HTTP/2 and add health checks. |
@brendandburns @yue9944882 I noticed that the watch call sets the timeout to 5 minutes. See here. Given that we no longer see watch events after 5 minutes...I tend to think this is not a coincidence? |
@brendandburns @yue9944882 I think I have figured this out. The
We need to add the following to enable HTTP/2 and a ping interval.
With the above change the watch doesn't hang and all is good. Does it make sense that the
That change seems fine to me. |
I am running a Kubernetes watch on a ConfigMap list. The watch runs in a background thread, continuously looks for watch events, and updates my cache whenever an added or modified event arrives. What I am observing is that sometimes the watch events do not arrive at all.
This entire piece of code runs in a background thread. Once the code misses an event, I don't see any more watch events after that point. Is there any chance that the watch is being stopped? If so, how do I check its status?