Simple write operations on one document periodically take too much time #18771


Closed
Encouse opened this issue Apr 26, 2023 · 2 comments

Encouse commented Apr 26, 2023

My Environment

  • ArangoDB Version: 3.10.4

  • Deployment Mode: Cluster

  • Deployment Strategy: Arango Starter

  • Configuration:
    arangodb_database_directory: "/var/lib/arangodb/"
    http__keep_alive_timeout: 15000
    cluster__default_replication_factor: 3
    cluster__system_replication_factor: 1
    query__cache_mode: "demand"
    query__tracking_with_bindvars: "false"
    rocksdb__pending_compactions_slowdown_trigger: 17179869184
    log__level: "trace"
    rocksdb__compaction_read_ahead_size: 12000
    rocksdb__max_parallel_compactions: 16
    rocksdb__max_subcompactions: 16

  • Infrastructure: 5 agents, 3 dbservers, and 3 coordinators on 8 VMs across 2 datacenters; each coordinator is paired with a dbserver on the same machine

  • Operating System: Linux version 5.4.0-137-generic (buildd@lcy02-amd64-009) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1))

  • Total RAM in your machine: 16 GB on each of the 3 dbserver/coordinator machines, 2 GB for the agency

  • Disks in use: SSD

  • Used Package: Ubuntu .deb

Component, Query & Data

Affected feature:
Query execution time (too slow)

AQL query (if applicable):
REPLACE "<key>" WITH {<new_doc_data>} IN testcol RETURN {old: OLD, new: NEW}

AQL explain and/or profile (if applicable):

Query String (79 chars, cacheable: false):
 REPLACE "13918858" WITH {a: 3, hash: 2} IN testcol2 RETURN {old: OLD, new: NEW}

Execution plan:
 Id   NodeType                    Site  Est.   Comment
  1   SingletonNode               COOR     1   * ROOT
  3   CalculationNode             COOR     1     - LET #4 = { "a" : 3, "hash" : 2 }   /* json expression */   /* const assignment */
  7   SingleRemoteOperationNode   COOR     1     - REPLACE { _key : "13918858" } WITH #4 IN testcol2
  5   CalculationNode             COOR     1     - LET #6 = { "old" : $OLD, "new" : $NEW }   /* simple expression */
  6   ReturnNode                  COOR     1     - RETURN #6

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
  7   primary   primary   testcol2     true     false    false      100.00 %   [ `_key` ]   [  ]            "13918858"

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-calculations-up-2
  3   optimize-cluster-single-document-operations

53 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00027

Write query options:
 Option                   Value
 waitForSync              false
 skipDocumentValidation   false
 keepNull                 true
 mergeObjects             true
 ignoreRevs               true
 isRestore                false
 ignoreErrors             false
 ignoreDocumentNotFound   false
 readCompleteInput        false
 consultAqlWriteFilter    false
 exclusive                false
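As a side note, a plan like the one above can also be retrieved programmatically. A minimal sketch using the python-arango driver, where the host, credentials, and database name are assumptions (explain does not execute the query):

from arango import ArangoClient

# Assumed coordinator endpoint and credentials (not from this report).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")

# Returns the execution plan as a dict, including plan nodes and the
# optimizer rules that were applied.
plan = db.aql.explain(
    'REPLACE "13918858" WITH {a: 3, hash: 2} IN testcol2 '
    'RETURN {old: OLD, new: NEW}'
)
print(plan["nodes"])
print(plan["rules"])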

Dataset:
Repeated REPLACE operations on a collection containing a single document

Size of your Dataset on disk:
Overall database size - 10 GB, size of collection < 10 MB

Replication Factor & Number of Shards (Cluster only):
Replication factor - 3, Number of Shards - 1 for each collection

Steps to reproduce

  1. Run the query above in a loop (roughly 100,000 times); a timing sketch follows below
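For illustration, a minimal reproduction sketch using the python-arango driver; the host, credentials, and database name are assumptions, while the query mirrors the profiled REPLACE above:

import time
from arango import ArangoClient

# Assumed coordinator endpoint and credentials (not from this report).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")

QUERY = """
REPLACE @key WITH { a: @i, hash: 2 } IN testcol2
RETURN { old: OLD, new: NEW }
"""

for i in range(100_000):
    start = time.monotonic()
    db.aql.execute(QUERY, bind_vars={"key": "13918858", "i": i})
    elapsed_ms = (time.monotonic() - start) * 1000
    # Flag executions that leave the expected 0-10 ms window.
    if elapsed_ms > 10:
        print(f"iteration {i}: REPLACE took {elapsed_ms:.1f} ms")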

Problem:
After some number of writes, all write operations on the collection become extremely slow (latency jumps from ~1 ms to 80 s)

Expected result:
Execution time of write operations on a single document stays within tight bounds (0-10 ms)

_We have two systems, one of which is ArangoDB-based. These systems must be kept in sync in terms of data, so we use a message queue to transport object changes from one to the other. On the ArangoDB-based service side there are many object updates (approx. 40k daily), which we perform using the query given in the bug description above. The main problem is that the REPLACE operation usually takes 0-10 ms, but at some point something breaks and it can hang for more than 30 s. As a result, consumer lag keeps growing and our services are permanently out of sync!

That's what it looks like, but on a collection with approx. 400,000 documents in it._
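For context, a minimal sketch of the consumer pattern described above; the queue client and message shape are hypothetical, and only the REPLACE query comes from this report. Because each message is applied synchronously, a single 30 s REPLACE stalls every message queued behind it:

# Hypothetical message shape: {"key": "...", "doc": {...}}.
def apply_change(db, message):
    # Same single-document REPLACE as in the description above; one slow
    # execution blocks the whole consumer and the lag grows.
    db.aql.execute(
        'REPLACE @key WITH @doc IN testcol RETURN { old: OLD, new: NEW }',
        bind_vars={"key": message["key"], "doc": message["doc"]},
    )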

dothebart (Contributor) commented

Hi,
the ArangoDB cluster is not designed to communicate across long network distances. Machines should be on the very same network, with the shortest possible connections between them.

If you need resilience against datacenter outages, datacenter-to-datacenter replication (https://www.arangodb.com/docs/stable/deployment-dc2dc.html) is the ArangoDB solution for that.

Encouse (Author) commented Apr 26, 2023

Thanks for your reply! I thought so but had some doubts; now it's clear. We'll try to move all the instances into one datacenter.

Encouse closed this as completed Apr 26, 2023