@@ -8,9 +8,9 @@ Data Modeling and Operational Factors
=====================================

Designing the data model of your application is a crucial task that can make or
- break the performance of your application. A well-designed data model will
- allow you to write efficient AQL queries, increase throughput of CRUD operations
- and will make sure your data is distributed in the most effective way.
+ break the performance of your application. A well-designed data model
+ allows you to write efficient AQL queries and increase the throughput of CRUD
+ operations, and it makes sure your data is distributed in the most effective way.

Whether you design a new application with ArangoDB or port an existing one to
use ArangoDB, you should always analyze the (expected) data access patterns of
@@ -146,7 +146,7 @@ on `_from` and `_to` (for edge collections).
Should you decide to create an index, you should consider a few things:

- Indexes are a trade-off between storage space, maintenance cost and query speed.
- - Each new index will increase the amount of RAM and the amount of disk space needed.
+ - Each new index increases the amount of RAM and the amount of disk space needed.
- Indexes with [indexed array values](indexing-index-basics.html#indexing-array-values)
need an extra index entry per array entry.
- Adding indexes increases the write amplification, i.e. it negatively affects
@@ -217,7 +217,7 @@ you should consider a few different properties:
onto more than _N_ shards. Consider using multiple shard keys if one of your
values has a low cardinality.
- **Frequency**: Consider how often a given shard key value may appear in
- your data. Having a lot of documents with identical shard keys will lead
+ your data. Having a lot of documents with identical shard keys leads
to unevenly distributed data. Consider using multiple shard keys or a different
one that is more suitable.
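To illustrate the cardinality and frequency considerations above, the following Python sketch assigns documents to shards by hashing a shard key value. It is not ArangoDB's sharding code: `zlib.crc32` is only a stand-in for the cluster's internal hash function, and the helpers `shard_for` and `shard_distribution` as well as the sample values are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(value: str, num_shards: int) -> int:
    """Map a shard key value to a shard number.

    Assumption: zlib.crc32 stands in for ArangoDB's own hash function.
    """
    return zlib.crc32(value.encode("utf-8")) % num_shards


def shard_distribution(values: list[str], num_shards: int) -> list[int]:
    """Count how many documents land on each shard."""
    counts = Counter(shard_for(v, num_shards) for v in values)
    return [counts.get(shard, 0) for shard in range(num_shards)]


NUM_SHARDS = 8

# Low cardinality: only two distinct values, so at most two shards get data.
low_cardinality = ["active" if i % 2 else "inactive" for i in range(1000)]

# High cardinality: every document has its own shard key value.
high_cardinality = [f"user{i}" for i in range(1000)]

# Skewed frequency: 900 of 1000 documents share one value, hence one shard.
skewed = ["acme" if i < 900 else f"customer{i}" for i in range(1000)]

for name, values in [
    ("low cardinality", low_cardinality),
    ("high cardinality", high_cardinality),
    ("skewed frequency", skewed),
]:
    print(f"{name:17} {shard_distribution(values, NUM_SHARDS)}")
```

With only two distinct values, at most two of the eight shards ever receive data, whatever hash function is used, and a single dominant value concentrates most documents on one shard.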
@@ -232,7 +232,7 @@ for more information.
### SmartGraphs

SmartGraphs are an Enterprise Edition feature of ArangoDB. They enable you to
- manage graphs at scale, it will give a vast performance benefit for all graphs
+ manage graphs at scale. They provide a vast performance benefit for all graphs
sharded in an ArangoDB Cluster.

To add a SmartGraph, you need a SmartGraph attribute that partitions your
@@ -260,19 +260,49 @@ network as well as more copying work required inside the storage engine.
Consider some ways to minimize the required amount of storage space:

- - Explicitly set the `_key` field to a custom unique value.
- This enables you to store information in the `_key` field instead of another
- field inside the document. The `_key` value is always indexed, setting a
- custom value means you can use a shorter value than what would have been
- generated automatically.
- - Shorter field names will reduce the amount of space needed to store documents
- (this has no effect on index size). ArangoDB is schemaless and needs to store
- the document structure inside each document. Usually this is a small overhead
- compared to the overall document size.
+ - Use the `_key` attribute to give documents unique identifiers. The `_key`
+ attribute is always present in every document (including edges), and it
+ is always indexed. This means it is the best-suited attribute to store a unique
+ document identifier. Using the `_key` attribute is preferable to storing
+ document identifiers in another attribute and creating a unique index on it.
+ Some limitations apply; see [Document keys](data-modeling-naming-conventions-document-keys.html).
+ - Shorter field names reduce the amount of space needed to store documents.
+ ArangoDB is schema-free and needs to store the document structure inside
+ each document. Usually, this is a small overhead compared to the overall
+ document size. The field name length has no effect on index sizes.
- Combining many small related documents into one larger one can also
reduce overhead. Common fields can be stored once and indexes just need to
- store one entry. This will only be beneficial if the combined documents are
- regularly retrieved together and not just subsets.
+ store one entry. This is only beneficial if the combined documents are
+ regularly retrieved together and not just subsets of them.
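The effect of field name length on document size can be approximated with a short Python sketch. ArangoDB stores documents in its own binary format (VelocyPack), not as JSON text, so the JSON byte counts below are only a rough proxy; the helper `encoded_size`, the sample documents, and their field names are hypothetical.

```python
import json


def encoded_size(doc: dict) -> int:
    """Size of the document as compact UTF-8 JSON (a rough proxy only)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# The same data stored with descriptive versus abbreviated field names.
long_names = {
    "customerIdentifier": "c1042",
    "registrationTimestamp": 1700000000,
    "preferredCommunicationChannel": "email",
}
short_names = {"cid": "c1042", "reg": 1700000000, "chan": "email"}

# Every document repeats its field names, so the saving scales with the count.
num_docs = 1_000_000
saving_per_doc = encoded_size(long_names) - encoded_size(short_names)

print(f"long names:  {encoded_size(long_names)} bytes per document")
print(f"short names: {encoded_size(short_names)} bytes per document")
print(f"approx. saving for {num_docs:,} documents: {saving_per_doc * num_docs:,} bytes")
```

Whether such a saving matters depends on the overall document size; for large documents, the field names are usually a small share.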
+
+ Document Keys
+ -------------
+
+ - Explicitly set the `_key` attribute to a custom unique value.
+ This enables you to store information in the `_key` attribute instead of another
+ attribute inside the document. The `_key` attribute is always indexed, so using it is
+ preferable to storing the document identifiers in another attribute and
+ creating an extra index on it.
+
+ - Try to use short values for the `_key` attribute.
+ The `_key` values are used whenever a document is looked up by its primary
+ key, and shorter key values can improve the lookup performance and reduce
+ disk usage.
+
+ As the `_key` values are also used as foreign keys in the `_from` and `_to` attributes
+ of edges, the key length also matters for all graph operations. Again, shorter keys
+ can improve lookup performance here and reduce memory usage.
+
+ When using hash values as document keys, try to avoid long hash values such as
+ those generated by SHA256 (64 characters in the alphabet
+ `[0-9a-f]`) or SHA512 (128 characters in the alphabet `[0-9a-f]`). Shorter keys are
+ always preferable for performance.
+
+ - Try to avoid keys that are randomly distributed.
+ Keys that are randomly distributed are more expensive during larger insert
+ operations than keys that follow a mostly ascending sequential pattern, e.g.
+ `000001`, `000002`, and so on. The storage engine can process sequential keys
+ more efficiently on inserts than randomly distributed keys.
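The key length guidance can be checked with Python's standard `hashlib` module: a SHA256 hex digest is 64 characters long and a SHA512 hex digest is 128 characters long. The sketch below contrasts these, and a random UUID, with short, zero-padded, ascending keys in the style of `000001`, `000002`. The `SequentialKeyGenerator` class and its six-digit padding are hypothetical illustrations, not an ArangoDB API.

```python
import hashlib
import uuid


class SequentialKeyGenerator:
    """Hypothetical helper that yields short, ascending, zero-padded keys."""

    def __init__(self, width: int = 6, start: int = 1) -> None:
        self._width = width
        self._next = start

    def next_key(self) -> str:
        key = str(self._next).zfill(self._width)
        self._next += 1
        return key


payload = b"customer-1042"

sha256_key = hashlib.sha256(payload).hexdigest()  # 64 hex characters
sha512_key = hashlib.sha512(payload).hexdigest()  # 128 hex characters
random_key = str(uuid.uuid4())                    # 36 characters, random order

gen = SequentialKeyGenerator()
sequential_keys = [gen.next_key() for _ in range(3)]

print(len(sha256_key), len(sha512_key), len(random_key))
print(sequential_keys)
```

Zero padding keeps the string order of the keys identical to their numeric order only while the counter fits into the chosen width; once it overflows, for example from `999999` to `1000000`, the string order no longer matches.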
Storage Engine
--------------
@@ -281,7 +311,7 @@ Large documents and transactions may negatively impact the write performance
of the RocksDB storage engine.

- Consider a maximum size of 50-75 kB _per document_ as a good rule of thumb.
- This will allow you to maintain steady write throughput even under very high load.
+ This allows you to maintain steady write throughput even under very high load.
- Transactions are held in memory before they are committed.
This means that transactions have to be split if they become too big; see the
[limitations section](transactions-limitations.html#rocksdb-storage-engine).
@@ -330,11 +360,11 @@ is being flushed to storage, new writes can continue to the other write buffer.
The total amount of data to build up in all in-memory buffers when writing into ArangoDB.
You may wish to adjust this parameter to control memory usage.

- Setting this to a low value may limit the RAM that ArangoDB will use but may slow down
- write heavy workloads. Setting this to 0 will not limit the size of the write-buffers.
+ Setting this to a low value may limit the RAM that ArangoDB uses but may slow down
+ write-heavy workloads. Setting this to `0` does not limit the size of the write buffers.

`--rocksdb.level0-stop-trigger`

- When this many files accumulate in level-0, writes will be stopped to allow compaction to catch up.
+ When this many files accumulate in level-0, writes are stopped to allow compaction to catch up.
Setting this value very high may improve write throughput, but may lead to temporarily
degraded read performance.