@@ -38,19 +38,24 @@ The main advantages of RocksDB are:
### Caveats

- RocksDB allows concurrent writes. However, when touching the same document a
- write conflict is raised. It is possible to exclusively lock collections when
- executing AQL. This will avoid write conflicts but also inhibits concurrent
- writes.
-
- Currently, another restriction is due to the transaction handling in
- RocksDB. Transactions are limited in total size. If you have a statement
- modifying a lot of documents it is necessary to commit data in-between. This will
- be done automatically for AQL by default. Transactions that get too big (in terms of
- number of operations involved or the total size of data modified by the transaction)
- will be committed automatically. Effectively this means that big user transactions
+ RocksDB allows concurrent writes. However, when touching the same document at
+ the same time, a write conflict is raised. It is possible to exclusively lock
+ collections when executing AQL. This avoids write conflicts, but also inhibits
+ concurrent writes.
+
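As a rough illustration of both behaviors, the sketch below first runs a plain AQL update, which may fail with a write conflict if another writer touches the same documents at the same time, and then runs the same update with the `exclusive` option set on the modification operation. The python-arango driver, the connection details, and the `test` collection are assumptions made for this example, not part of the original text.

```python
# Minimal sketch, assuming an ArangoDB server on localhost and an existing
# collection named "test"; python-arango is used as the client driver.
from arango import ArangoClient
from arango.exceptions import AQLQueryExecuteError

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")

try:
    # Concurrent writers touching the same documents can trigger a
    # write conflict (typically reported as error 1200) with RocksDB.
    db.aql.execute("""
        FOR doc IN test
          UPDATE doc WITH { counter: doc.counter + 1 } IN test
    """)
except AQLQueryExecuteError as exc:
    print(f"write conflict, retry the operation: {exc}")

# Locking the collection exclusively avoids write conflicts, but other
# writers have to wait until this query has finished.
db.aql.execute("""
    FOR doc IN test
      UPDATE doc WITH { counter: doc.counter + 1 } IN test
        OPTIONS { exclusive: true }
""")
```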
+ ArangoDB uses RocksDB's transactions to implement the ArangoDB transaction
+ handling. Therefore, the same restrictions apply for ArangoDB transactions when
+ using the RocksDB engine.
+
+ RocksDB imposes a limit on the transaction size. It is optimized to
+ handle small transactions very efficiently, but effectively limits
+ the total size of transactions. If you have an operation that modifies a lot of
+ documents, it is necessary to commit data in-between. This is done automatically
+ for AQL by default. Transactions that get too big (in terms of number of
+ operations involved or the total size of data modified by the transaction)
+ are committed automatically. Effectively, this means that big user transactions
are split into multiple smaller RocksDB transactions that are committed individually.
- The entire user transaction will not necessarily have ACID properties in this case.
+ The entire user transaction does not necessarily have ACID properties in this case.
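These thresholds can also be overridden for a single query via the `intermediateCommitCount` and `intermediateCommitSize` options of the AQL cursor API. The sketch below calls the HTTP cursor endpoint directly with the `requests` library; the server address, credentials, and the `test` collection are assumptions made for this example.

```python
# Minimal sketch using the HTTP cursor API directly; server address,
# credentials, and the "test" collection are assumptions for illustration.
import requests

query = """
    FOR doc IN test
      UPDATE doc WITH { migrated: true } IN test
"""

resp = requests.post(
    "http://localhost:8529/_db/_system/_api/cursor",
    auth=("root", ""),
    json={
        "query": query,
        "options": {
            # Commit the underlying RocksDB transaction after every
            # 10000 write operations ...
            "intermediateCommitCount": 10000,
            # ... or after 32 MiB of modified data, whichever comes first.
            "intermediateCommitSize": 32 * 1024 * 1024,
        },
    },
)
resp.raise_for_status()
```

Once such an intermediate commit has happened, the operations committed so far are not rolled back if a later part of the query fails; this is the loss of full ACID guarantees for big transactions described above.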
The threshold values for transaction sizes can be configured globally using the
startup options
@@ -76,18 +81,18 @@ a replication setup when Followers need to replay the same sequence of operation
on the Leader.

The individual RocksDB WAL files are about 64 MiB big by default.
- The size will always be proportionally sized to the value specified via
+ The size is always proportional to the value specified via
`--rocksdb.write-buffer-size`. The value specifies the amount of data to build
up in memory (backed by the unsorted WAL on disk) before converting it to a
sorted on-disk file.

Larger values can increase performance, especially during bulk loads.
Up to `--rocksdb.max-write-buffer-number` write buffers may be held in memory
at the same time, so you may wish to adjust this parameter to control memory
- usage. A larger write buffer will result in a longer recovery time the next
+ usage. A larger write buffer results in a longer recovery time the next
time the database is opened.
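As a back-of-the-envelope sketch of what these two options imply for peak write-buffer memory (the values are illustrative settings, not defaults):

```python
# Rough upper bound on the memory held in write buffers at the same time;
# the two values are example settings, not ArangoDB defaults.
write_buffer_size = 64 * 1024 * 1024   # --rocksdb.write-buffer-size (bytes)
max_write_buffer_number = 4            # --rocksdb.max-write-buffer-number

peak_write_buffer_memory = write_buffer_size * max_write_buffer_number
print(f"{peak_write_buffer_memory // 1024**2} MiB")  # prints: 256 MiB
```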

- The RocksDB WAL only contains committed transactions. This means you will never
+ The RocksDB WAL only contains committed transactions. This means you never
see partial transactions in the replication log, but it also means transactions
are tracked completely in-memory. In practice this causes RocksDB transaction
sizes to be limited; for more information, see the
@@ -102,7 +107,7 @@ found in:
- [blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/](https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/){:target="_blank"}

The basic idea is that data is organized in levels where each level is a factor
- larger than the previous. New data will reside in smaller levels while old data
+ larger than the previous. New data resides in smaller levels while old data
is moved down to the larger levels. This makes it possible to sustain a high rate of inserts
over an extended period. In principle it is possible that the different levels
reside on different storage media. The smaller ones on fast SSD, the larger ones
@@ -148,38 +153,24 @@ reaches the limit given by

--rocksdb.write-buffer-size

- it will converted to an SST file and inserted at level 0.
+ it is converted to an SST file and inserted at level 0.

- The following option controls the size of each level and the depth.
+ The following options control the size of each level and the depth:

--rocksdb.num-levels N

- Limits the number of levels to N . By default it is 7 and there is
+ It limits the number of levels to `N`. By default, it is `7` and there is
seldom a reason to change this. A new level is only opened if there is
too much data in the previous one.

--rocksdb.max-bytes-for-level-base B

- L0 will hold at most B bytes.
+ L0 holds at most `B` bytes.

--rocksdb.max-bytes-for-level-multiplier M

- Each level is at most M times as much bytes as the previous
- one. Therefore the maximum number of bytes-for-level L can be
- calculated as
+ Each level is at most `M` times as many bytes as the previous
+ one. Therefore, the maximum number of bytes for level `L` can be
+ calculated as follows:

max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
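As a small worked example of this formula (the two option values are illustrative, not ArangoDB defaults):

```python
# Worked example of the level-size formula above; the two option values
# are illustrative, not ArangoDB defaults.
def max_bytes_for_level(level, level_base, level_multiplier):
    # max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
    return level_base * level_multiplier ** (level - 1)

level_base = 256 * 1024 * 1024   # --rocksdb.max-bytes-for-level-base B
level_multiplier = 10            # --rocksdb.max-bytes-for-level-multiplier M

for level in range(1, 5):
    size = max_bytes_for_level(level, level_base, level_multiplier)
    print(f"L{level}: {size // 1024**2:,} MiB")
# L1: 256 MiB, L2: 2,560 MiB, L3: 25,600 MiB, L4: 256,000 MiB
```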
-
- ### Future
-
- RocksDB imposes a limit on the transaction size. It is optimized to
- handle small transactions very efficiently, but is effectively limiting
- the total size of transactions.
-
- ArangoDB currently uses RocksDB's transactions to implement the ArangoDB
- transaction handling. Therefore the same restrictions apply for ArangoDB
- transactions when using the RocksDB engine.
-
- We will improve this by introducing distributed transactions in a future
- version of ArangoDB. This will allow handling large transactions as a
- series of small RocksDB transactions and hence removing the size restriction.