Shutting down daemon on Linux is very slow (very large dataset, before upgrade) #13156
Hi,
However, 3.4 has meanwhile reached EOL, hence it's no longer supported. That said, I wouldn't see too high a risk in force-terminating the arangod process in this case. If you experience similar problems with more recent ArangoDB versions, please let us know.
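If you do decide to force-terminate, the usual pattern is to send SIGTERM, wait a bounded time, and only then send SIGKILL. Below is a minimal Python sketch of that escalation, assuming the script itself started the process (for example a local development arangod launched by hand). The function name stop_process and the grace period are illustrative choices, not anything ArangoDB provides. For a systemd-managed service, systemd performs the same escalation on its own, with the wait governed by the unit's TimeoutStopSec setting.

```python
import subprocess
import time


def stop_process(proc: subprocess.Popen, grace_seconds: float) -> tuple[float, bool]:
    """Hypothetical helper: SIGTERM first, SIGKILL if the process outlives the grace period.

    Returns (elapsed_seconds, force_killed).
    """
    start = time.monotonic()
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        # Give the process a chance to run its own orderly shutdown.
        proc.wait(timeout=grace_seconds)
        force_killed = False
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX systems; it cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        force_killed = True
    return time.monotonic() - start, force_killed
```

For example, stop_process(proc, grace_seconds=600) would allow ten minutes of orderly shutdown before escalating. Keep in mind that a SIGKILL skips the server's own shutdown work entirely, so the low-risk assessment above applies to this specific case only.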
Hi, it's true that 3.4.1 is fairly old, but on a local dev machine with the same dataset I had a 3.4.9 that I recently upgraded to 3.7.3, and the situation was the same: 1 hour to shut down. We'll see if it gets better next time, starting with 3.7. Thank you!
Closing this issue for now as 3.4 is EOL, but please re-open if you can reproduce it with 3.7. Thanks!
Hi, reopening this ticket because we just experienced a case where v3.7.3 took around 1 hour to shut down (restart). Context:
What we did:
What does "feature leases to be released" mean? Are the slow queries responsible for the shutdown time? I understand this might be dangerous if write queries are running, but is it possible to terminate them more forcefully when a shutdown is triggered? In a production context, a quick shutdown/restart is probably expected, as 1) triggering a shutdown is likely due to an emergency, and 2) a longer shutdown means more downtime for users. Of course this is just our case, and it might not be representative of the general case. Also, I'm just asking out of curiosity, as nobody should probably ever do that, but what happens if I …
Thanks a lot for any advice.
arangod.log
…
…
…
This problem still persists on the latest 3.8.2.
Can you run the server with
or the config
so we can better see what is holding up the shutdown?
On our side, the last time we shut down v3.8.0 to upgrade it to v3.8.3 (a few minutes ago), the process was quite quick even though the server was under heavy load: less than 1 minute.
One important note, though: we replaced the RAID10 of HDDs with a RAID10 of NVMe SSDs, so maybe there's a connection (compactions running in the background? transactions completing properly?…)
Seems fixed for a long time now.
Thanks for following up.
The issue is not fixed. See below for the last node restart in our cluster (version 3.11):
Indeed, it happens again with version
Important note: our setup is "leader-follower", which is no longer officially supported starting with version
Greetings,
@matcho: The important log line here is
It means that one AQL query is still running on the machine that is not reacting to the shutdown signal. Is it possible that some long-running AQL query was ongoing when the shutdown command was received?
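To picture what "waiting for feature leases to be released" amounts to, here is a deliberately simplified Python model of the behaviour described in this thread: each running query holds a lease, and shutdown blocks until the lease count drops to zero. The class LeaseTracker and the function slow_query are hypothetical; ArangoDB's actual C++ implementation is not shown in this thread and may work differently.

```python
import threading
import time


class LeaseTracker:
    """Hypothetical model: shutdown may only proceed once every lease is released."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active = 0

    def acquire(self) -> None:
        """A query starts and takes a lease."""
        with self._cond:
            self._active += 1

    def release(self) -> None:
        """A query finishes; wake up waiters when the last lease is gone."""
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()

    def active(self) -> int:
        with self._cond:
            return self._active

    def wait_for_release(self, timeout: float) -> bool:
        """Block until no leases remain; return False if the timeout expires first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)


def slow_query(tracker: LeaseTracker, seconds: float) -> None:
    """Stand-in for a long-running query that holds its lease until it completes."""
    try:
        time.sleep(seconds)
    finally:
        tracker.release()
```

In this model, the shutdown path simply calls wait_for_release and keeps waiting (and logging) while it returns False. A query that never checks for cancellation therefore delays shutdown for as long as it runs, which matches the symptom reported here.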
Thanks @jsteemann, yes it's definitely possible; I will have to check that next time. It helps a lot to know that "feature leases to be released" can be translated to "query still running" 🙂 Could it mean, then, that a long query has trouble terminating, i.e. being killed? Or that the shutdown procedure does not kill queries unless you send a second SIGTERM during the shutdown phase?
@matcho: I need to check whether the shutdown procedure tries to shut down running queries or not.
I attempt to shut down (actually restart) the server by typing … It seems that this effectively sends a SIGTERM, since arangod says …
@matcho: Alright, thanks, I will try to find out more.
@matcho: I think it is possible that some queries can still run to completion after a shutdown signal was received. Not all queries seem to be aborted when a shutdown signal is received, so the server may just wait for all already-started queries to complete. This can take an arbitrary amount of time, depending on what the queries actually do.
Thank you very much @jsteemann, that's great news! 🎉
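Given that, one possible workaround is to kill long-running AQL queries yourself right before stopping the service. Below is a Python sketch that uses ArangoDB's HTTP endpoints for listing running queries (GET /_db/{database}/_api/query/current, whose entries carry an id and a runTime in seconds) and for killing one (DELETE /_db/{database}/_api/query/{id}). The base URL, credentials, database name and runtime threshold are placeholders. Queries are listed per database, so you would repeat the call for every database. The thread does not confirm that this shortens the shutdown in this scenario, and killing write queries deserves the caution raised earlier in the thread.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def select_long_running(queries: list[dict], min_runtime_s: float) -> list[str]:
    """Return the ids of queries whose runTime (seconds) is at least min_runtime_s."""
    return [q["id"] for q in queries if q.get("runTime", 0.0) >= min_runtime_s]


def _request(method: str, url: str, user: str, password: str):
    """Send an HTTP request with basic auth and decode the JSON body, if any."""
    req = urllib.request.Request(url, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def kill_long_running_queries(base_url: str, database: str, user: str,
                              password: str, min_runtime_s: float) -> list[str]:
    """List running AQL queries in one database and kill those above the threshold."""
    api = f"{base_url}/_db/{urllib.parse.quote(database)}/_api/query"
    running = _request("GET", f"{api}/current", user, password)
    killed = []
    for query_id in select_long_running(running, min_runtime_s):
        try:
            _request("DELETE", f"{api}/{query_id}", user, password)
            killed.append(query_id)
        except urllib.error.HTTPError as err:
            # 404: the query finished between listing and killing it.
            if err.code != 404:
                raise
    return killed
```

A call such as kill_long_running_queries("http://127.0.0.1:8529", "_system", "root", "<password>", min_runtime_s=300), run just before systemctl stop arangodb3, would target only queries that have already run for five minutes or more. Queries that start after the listing are not covered.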
My Environment
Component, Query & Data
Affected feature:
Shutdown process
Size of your Dataset on disk:
1.4 TB
Steps to reproduce
systemctl stop arangodb3
Problem:
The server takes almost 1 hour to shut down, from the moment it receives "control-c" to the moment the systemctl command returns and the process is killed. This is penalizing when upgrading the ArangoDB package on a production server, which is the case here: we are updating from 3.4.1 to 3.7.3, and this leads to a longer-than-expected period of unavailability for our service. No big deal though, as we do not provide a critical service.
Expected result:
A faster shutdown process.
Our hypothesis is that it might be related to the dataset, which is very large: maybe something to do with writing persistent indexes to disk before shutting down?
Thank you very much,
Greetings,
Mathias