Simple write operations on one document periodically take too much time #18771


Closed
Encouse opened this issue Apr 26, 2023 · 2 comments

Encouse commented Apr 26, 2023

My Environment

  • ArangoDB Version: 3.10.4

  • Deployment Mode: Cluster

  • Deployment Strategy: Arango Starter

  • Configuration:
    arangodb_database_directory: "/var/lib/arangodb/"
    http__keep_alive_timeout: 15000
    cluster__default_replication_factor: 3
    cluster__system_replication_factor: 1
    query__cache_mode: "demand"
    query__tracking_with_bindvars: "false"
    rocksdb__pending_compactions_slowdown_trigger: 17179869184
    log__level: "trace"
    rocksdb__compaction_read_ahead_size: 12000
    rocksdb__max_parallel_compactions: 16
    rocksdb__max_subcompactions: 16

  • Infrastructure: 5 agents, 3 dbservers, and 3 coordinators on 8 VMs across 2 datacenters; each coordinator is paired with a dbserver on the same machine

  • Operating System: Linux version 5.4.0-137-generic (buildd@lcy02-amd64-009) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1))

  • Total RAM in your machine: 16 GB on each of the 3 dbserver/coordinator machines, 2 GB for the agency

  • Disks in use: SSD

  • Used Package: Ubuntu .deb

Component, Query & Data

Affected feature:
Query execution time (too slow)

AQL query (if applicable):
REPLACE "<key>" WITH {<new_doc_data>} IN testcol RETURN {old: OLD, new: NEW}

AQL explain and/or profile (if applicable):

Query String (79 chars, cacheable: false):
 REPLACE "13918858" WITH {a: 3, hash: 2} IN testcol2 RETURN {old: OLD, new: NEW}

Execution plan:
 Id   NodeType                    Site  Est.   Comment
  1   SingletonNode               COOR     1   * ROOT
  3   CalculationNode             COOR     1     - LET #4 = { "a" : 3, "hash" : 2 }   /* json expression */   /* const assignment */
  7   SingleRemoteOperationNode   COOR     1     - REPLACE { _key : "13918858" } WITH #4 IN testcol2
  5   CalculationNode             COOR     1     - LET #6 = { "old" : $OLD, "new" : $NEW }   /* simple expression */
  6   ReturnNode                  COOR     1     - RETURN #6

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
  7   primary   primary   testcol2     true     false    false      100.00 %   [ `_key` ]   [  ]            "13918858"

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-calculations-up-2
  3   optimize-cluster-single-document-operations

53 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00027

Write query options:
 Option                   Value
 waitForSync              false
 skipDocumentValidation   false
 keepNull                 true
 mergeObjects             true
 ignoreRevs               true
 isRestore                false
 ignoreErrors             false
 ignoreDocumentNotFound   false
 readCompleteInput        false
 consultAqlWriteFilter    false
 exclusive                false
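As a side note, a plan like the one above can also be retrieved programmatically. A minimal sketch using the python-arango driver, where the host, credentials, and database name are assumptions (explain does not execute the query):

from arango import ArangoClient

# Assumed coordinator endpoint and credentials (not from this report).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")

# Returns the execution plan as a dict, including plan nodes and the
# optimizer rules that were applied.
plan = db.aql.explain(
    'REPLACE "13918858" WITH {a: 3, hash: 2} IN testcol2 '
    'RETURN {old: OLD, new: NEW}'
)
print(plan["nodes"])
print(plan["rules"])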

Dataset:
Repeated REPLACE operations on a collection containing a single document

Size of your Dataset on disk:
Overall database size - 10 GB, size of collection < 10 MB

Replication Factor & Number of Shards (Cluster only):
Replication factor - 3, Number of Shards - 1 for each collection

Steps to reproduce

  1. Run the query above in a loop (roughly 100,000 times); a timing sketch follows below
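For illustration, a minimal reproduction sketch using the python-arango driver; the host, credentials, and database name are assumptions, while the query mirrors the profiled REPLACE above:

import time
from arango import ArangoClient

# Assumed coordinator endpoint and credentials (not from this report).
client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")

QUERY = """
REPLACE @key WITH { a: @i, hash: 2 } IN testcol2
RETURN { old: OLD, new: NEW }
"""

for i in range(100_000):
    start = time.monotonic()
    db.aql.execute(QUERY, bind_vars={"key": "13918858", "i": i})
    elapsed_ms = (time.monotonic() - start) * 1000
    # Flag executions that leave the expected 0-10 ms window.
    if elapsed_ms > 10:
        print(f"iteration {i}: REPLACE took {elapsed_ms:.1f} ms")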

Problem:
After some number of writes, all write operations on the collection become extremely slow (latency jumps from ~1 ms to 80 s)

Expected result:
Execution time of write operations on a single document stays within tight bounds (0-10 ms)

_We have two systems, one of which is ArangoDB-based. These systems must be kept in sync in terms of data, so we use a message queue to transport object changes from one to the other. On the ArangoDB-based service side there are many object updates (approx. 40k daily), which we perform using the query given in the bug description above. The main problem is that the REPLACE operation usually takes 0-10 ms, but at some point something breaks and it can hang for more than 30 s. As a result, consumer lag keeps growing and our services are permanently out of sync!

That's what it looks like, but on a collection with approx. 400,000 documents in it._
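For context, a minimal sketch of the consumer pattern described above; the queue client and message shape are hypothetical, and only the REPLACE query comes from this report. Because each message is applied synchronously, a single 30 s REPLACE stalls every message queued behind it:

# Hypothetical message shape: {"key": "...", "doc": {...}}.
def apply_change(db, message):
    # Same single-document REPLACE as in the description above; one slow
    # execution blocks the whole consumer and the lag grows.
    db.aql.execute(
        'REPLACE @key WITH @doc IN testcol RETURN { old: OLD, new: NEW }',
        bind_vars={"key": message["key"], "doc": message["doc"]},
    )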

dothebart (Contributor) commented

Hi,
the ArangoDB cluster is not designed to communicate across long network distances. Machines should be on the very same network, with the shortest possible connections between them.

If you need resilience against datacenter outages, datacenter-to-datacenter replication (https://www.arangodb.com/docs/stable/deployment-dc2dc.html) is the ArangoDB solution for that.

Encouse (Author) commented Apr 26, 2023

Thanks for your reply! I thought so but had some doubts; now it's clear. We'll try to move all the instances into one datacenter.

Encouse closed this as completed Apr 26, 2023