ArangoDB 2.7.3 consistently stops responding #1666
Comments
Thanks for the If the issue is reproducible, could you check next time whether the server still responds to the version API (a Can you also let me know what the pretty generic workload is and whether your workload contains AQL queries or server-side transactions that invoke graph selection or modification operations? This should help rule out potential reasons. It would also help to know if the server has enough RAM to hold the active collections completely in-memory. If it's limited on RAM, then operations may get slowed down, but if there's enough RAM, then this is out, too. |
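For anyone who wants to script the version check requested above, here is a minimal sketch in Python. It assumes the server listens on http://localhost:8529 without authentication, and it uses `/_api/collection` as a stand-in for a "real" request; both of those are assumptions for illustration, not something stated in this thread. The helper names `probe`, `classify` and `check_server` are hypothetical.

```python
import json
import urllib.error
import urllib.request


def probe(base_url, path, timeout=5.0):
    """Issue one GET request and return (state, detail)."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return "ok", json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        return "http_error", exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, reset, or no answer within the timeout.
        return "unreachable", str(exc)
    except ValueError:
        # The server answered with a body that is not valid JSON.
        return "bad_body", None


def classify(version_state, data_state):
    """Turn the two probe states into a short human-readable verdict."""
    if version_state != "ok":
        return "not responding"
    if data_state != "ok":
        return "alive but not serving requests"
    return "healthy"


def check_server(base_url="http://localhost:8529"):
    """Probe the version endpoint and one data endpoint (assumed paths)."""
    version_state, _ = probe(base_url, "/_api/version")
    data_state, _ = probe(base_url, "/_api/collection")
    return classify(version_state, data_state)
```

Calling `check_server()` against a hung instance should help separate "the process is completely stuck" from "the HTTP layer still answers but requests are not being served". The timeout matters here, because a hung server may accept the connection and then never reply.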
Hi Jan, First off, thank you for your reply :) Much appreciated! I have two different servers, in what I believe are two different scenarios. Both servers respond with the version number. Server A does not respond through arangosh and is no longer serving requests, but it still responds to the version endpoint.
Server B was in the same state as Server A, but after leaving it overnight I can now get a prompt in arangosh. However, everything returns a 500 Internal Server Error.
Our workload consists of mostly:
We do not use any server-side transactions - only AQL queries to do graph selections and modification. Is there any way to have ArangoDB log all our queries to the server? If not, I will try and dig into our software to see what queries are being called and used. Both servers have 64GB of RAM, and an arangodump of all our collections is around 12GB in size. Thanks for your help!
|
Hi Otto, thanks a lot for the explanation of your scenario. I've discussed with Jan. We might have an idea where the deadlock is caused. Jan will back-port the current deadlock detection to 2.7. But we have to do some more tests to avoid any unwanted side-effects. We will then release a new 2.7. Coming back to your other question: it is possible to log requests by specifying
However, that will not log the complete requests. Only the request path. Thanks |
Thank you Frank and Jan! I will try the latest beta of 2.8 in staging to see if I can reproduce this there. |
@ottoyiu if you can try out 2.8 beta, that would be very helpful, thanks |
I just crashed both servers the same way as 2.7. I'm going to log the requests and see if that'll help with the debugging.
Is there a way to use the 'queries' module to see a list of running/slow queries, while the server is in this state? |
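As a possible complement to the 'queries' module, the running and slow query lists are also exposed over HTTP. The sketch below is in Python and assumes the endpoints `/_api/query/current` and `/_api/query/slow`, a server on http://localhost:8529 without authentication, and result entries that carry `query` and `runTime` fields. Whether these endpoints still answer while the server is in the hung state is not established in this thread. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_queries(base_url, kind="current", timeout=5.0):
    """Fetch the list of running ("current") or slow ("slow") AQL queries."""
    if kind not in ("current", "slow"):
        raise ValueError("kind must be 'current' or 'slow'")
    url = "%s/_api/query/%s" % (base_url, kind)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def longest_running(queries, limit=10):
    """Return (runTime, query string) pairs, longest-running first."""
    ranked = sorted(queries, key=lambda q: q.get("runTime", 0), reverse=True)
    return [(q.get("runTime", 0), q.get("query", "")) for q in ranked[:limit]]
```

Something like `longest_running(fetch_queries("http://localhost:8529"))` would then show which queries have been running the longest, which may point at the statement involved in the hang.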
Thanks for testing. We have to investigate in more detail. When the server hangs and you do a top -H, do you see any arangodb thread that is still running (i.e. has a significant %cpu)? |
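If watching top -H by hand is awkward, the same per-thread information can be sampled from /proc on Linux. The following Python sketch reads `/proc/<pid>/task/<tid>/stat` twice and reports which threads accumulated CPU ticks in between. It is only an approximation of what top -H shows, and the function names are made up for this example.

```python
import os


def parse_task_stat(line):
    """Extract (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or ')'.
    name = line[line.index("(") + 1:line.rindex(")")]
    fields = line[line.rindex(")") + 2:].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are fields 14 and 15.
    utime, stime = int(fields[11]), int(fields[12])
    return name, utime + stime


def thread_cpu_ticks(pid):
    """Return {tid: (thread_name, cpu_ticks)} for every thread of the process."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as fh:
                result[int(tid)] = parse_task_stat(fh.read())
        except OSError:
            continue  # the thread exited between listdir() and open()
    return result


def busiest_threads(before, after, top=5):
    """Compare two samples and return (tick_delta, tid, name), busiest first."""
    deltas = []
    for tid, (name, ticks) in after.items():
        if tid in before:
            deltas.append((ticks - before[tid][1], tid, name))
    deltas.sort(reverse=True)
    return deltas[:top]
```

Taking one `thread_cpu_ticks(pid)` sample, sleeping a few seconds, taking a second sample, and passing both to `busiest_threads` gives a rough list of the threads that did any work in that window. In a true deadlock one would expect all deltas to be at or near zero.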
It doesn't seem so. Sometimes there's a short spike in CPU usage for one or two dispat_std threads, but no significant %cpu usage. Another one is the v8-gc thread, which runs from time to time and takes around 10% CPU.
|
and it seemed like the servers recovered... could it be a function of the deadlock timeouts introduced in 2.8? |
We have to investigate a bit more. Is there any chance that you could start the server with --log.level trace and send us [hackers (at) arangodb.org] the corresponding log file? |
Email sent. Thanks! |
Fixed after private conversation - thus progress is not documented here. |
Hello there,
We're running 2.7.3 and 2.6.12 on CentOS6 across four separate production environments, and on every single one of them we are able to consistently make ArangoDB stop responding with a pretty generic workload.
(Linux tat1234 2.6.32-573.el6.x86_64 #1 SMP Thu Jul 23 15:44:03 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux)
Attempting to 'arangosh' into the instance does not return a prompt, nor does the dashboard on port 8529 respond.
I did an
strace -p -f
on the arangod process and I'm not sure what I'm actually looking at, but it doesn't seem TOO out of the ordinary compared to a responding instance: http://hastebin.com/raw/erujuhewec
I was thinking it could be garbage collection behaviour, but the server doesn't respond even after leaving it for hours. The only thing that fixes it is restarting the daemon.
I'm clueless as to what steps are to be taken next. Help is much appreciated :)
Thanks in advance,
Otto