Possible memory leak / regression in 3.3+ #5579
Closed
@choppedpork

Description


My environment running ArangoDB

I'm using ArangoDB version:

  • 3.3.8 (subsequently upgraded to 3.3.10)

Mode:

  • Cluster

Storage-Engine:

  • rocksdb

On this operating system:

  • Linux
    • other: self-built Docker container using the latest Ubuntu package

this is an installation-related issue:

Hello,

Since upgrading to 3.3 we've noticed a rather drastic increase in the memory usage of the server nodes over time. I've had a look at some of the existing open tickets and this looks quite similar to #5414 - I've decided to open a separate issue and let you decide, though.

We're seeing this problem only on the server instances - both the agency and coordinator nodes have very stable memory usage. Our standard deployment model for Arango is 3x agency, 3x coordinator and 3x server. We're using the following config options (I've skipped ones that seem irrelevant to me, like directory locations, IP addresses etc. - let me know if you'd like the full list):

--server.authentication=false
--cluster.my-role PRIMARY
--log.level info
--javascript.v8-contexts 16
--javascript.v8-max-heap 3072
--server.storage-engine rocksdb
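
For context, the full DB-Server invocation ends up looking roughly like this (the endpoints and directory below are illustrative placeholders, not our actual values - those are among the options I skipped above):

arangod \
  --server.authentication=false \
  --cluster.my-role PRIMARY \
  --cluster.agency-endpoint tcp://agency-1:8531 \
  --server.endpoint tcp://0.0.0.0:8529 \
  --database.directory /var/lib/arangodb3 \
  --log.level info \
  --javascript.v8-contexts 16 \
  --javascript.v8-max-heap 3072 \
  --server.storage-engine rocksdb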

Here's a list of sysctls we're setting:

- { name: "vm.max_map_count", value: "262144" }
- { name: "vm.overcommit_memory", value: "0" } # arangodb recommend setting this to 2 but this causes a lot of issues bringing other containers up
- { name: "vm.zone_reclaim_mode", value: "0" }
- { name: "vm.swappiness", value: "1" }
- { name: "net.core.somaxconn", value: "65535" }

net.core.somaxconn is also set (same value) on the Docker container.
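
For reference, applied by hand on a host those sysctls would look something like the following (we actually set them through our provisioning, as per the YAML above):

sysctl -w vm.max_map_count=262144
sysctl -w vm.overcommit_memory=0
sysctl -w vm.zone_reclaim_mode=0
sysctl -w vm.swappiness=1
sysctl -w net.core.somaxconn=65535
# somaxconn is namespaced, so it's set again for the container, e.g.:
# docker run --sysctl net.core.somaxconn=65535 ...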

We're setting the transparent_hugepage defrag and enabled properties to never.
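
Concretely that amounts to something like the below (the exact mechanism on our hosts may differ, e.g. it may be done at boot rather than as a shell one-liner):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag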

We upgraded from 3.2.9 to 3.3.0 and have since used 3.3.3 and 3.3.4, and we're on 3.3.8 now.
Here's memory usage (RSS) for the server nodes in one of our environments, which got upgraded to 3.3.x around the 25th of April (note that this environment gets shut down every evening, hence the large gaps in the graph):

[screenshot: server node RSS since the 3.3.x upgrade around the 25th of April]

This is the above graph zoomed in to the last five working days:

[screenshot: the same RSS graph, zoomed in to the last five working days]
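
The RSS numbers in these graphs come from our monitoring; for a rough spot check on a host, something like the following gives comparable values (the process selection here is illustrative):

ps -o rss= -p "$(pgrep -f arangod | head -n 1)"              # RSS in KiB
grep VmRSS "/proc/$(pgrep -f arangod | head -n 1)/status"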

This is another environment, where the upgrade to 3.3.x happened on the 3rd of April. The change in the memory usage pattern on the 27th of April was caused by applying a Docker memory limit of 2g (sketched below the graph).

[screenshot: server node RSS in the second environment, before and after the 2g Docker memory limit]
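
For completeness, the 2g limit is just the standard Docker memory limit, applied roughly like this when starting the container (image tag and remaining flags are illustrative, since we build our own image):

docker run -d --memory=2g --memory-swap=2g arangodb/arangodb:3.3.10 ...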

The above environments have extremely light usage of Arango.

Here's one which gets used a bit more:
[screenshot: server node RSS in a more heavily used environment]

As a reference point of sorts, here's what it looks like when we run a load test against our application:
[screenshot: server node RSS during a load test, levelling off at 6.4GB]
The server nodes eventually tail off at 6.4GB and memory usage remains perfectly stable afterwards.

All of the above graphs were taken using the same settings I've mentioned earlier.

Let me know what other information would be useful to provide - I guess disabling statistics and/or Foxx queues would be something you might want us to try? If so, shall we disable both at once or try them one by one (and if so, in what order would you prefer)?
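
In case it helps, I assume disabling those would mean restarting the DB-Servers with something like the following options (please correct me if these aren't the right switches):

--server.statistics false
--foxx.queues false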

Thanks,
Simon

Edit: as part of the investigation I have upgraded from 3.3.8 to 3.3.10.

Metadata


Labels

1 Analyzing
3 OOM (System runs out of memory / resources)
