improve documentation about document keys (#1261) · konsultaner/arangodb-docs@e547d18

Commit e547d18

jsteemann, joerg84, neunhoef, Simran-B, and nerpaula authored
improve documentation about document keys (arangodb#1261)
* improve documentation about document keys

* Update 3.11/data-modeling-operational-factors.md

Co-authored-by: Max Neunhöffer <max@arangodb.com>

* Update 3.10/data-modeling-operational-factors.md

Co-authored-by: Max Neunhöffer <max@arangodb.com>

* Update 3.9/data-modeling-operational-factors.md

Co-authored-by: Max Neunhöffer <max@arangodb.com>

* Review

---------

Co-authored-by: Joerg Schad <joerg.schad@gmail.com>
Co-authored-by: Max Neunhöffer <max@arangodb.com>
Co-authored-by: Simran Spiller <simran@arangodb.com>
Co-authored-by: Paula Mihu <97217318+nerpaula@users.noreply.github.com>
1 parent 8c61b7d commit e547d18

3 files changed: +153 -63 lines changed

3.10/data-modeling-operational-factors.md

Lines changed: 51 additions & 21 deletions
@@ -8,9 +8,9 @@ Data Modeling and Operational Factors
 =====================================

 Designing the data model of your application is a crucial task that can make or
-break the performance of your application. A well-designed data model will
-allow you to write efficient AQL queries, increase throughput of CRUD operations
-and will make sure your data is distributed in the most effective way.
+break the performance of your application. A well-designed data model
+allows you to write efficient AQL queries, increase throughput of CRUD operations,
+and makes sure your data is distributed in the most effective way.

 Whether you design a new application with ArangoDB or port an existing one to
 use ArangoDB, you should always analyze the (expected) data access patterns of
@@ -146,7 +146,7 @@ on `_from` and `_to` (for edge collections).
 Should you decide to create an index you should consider a few things:

 - Indexes are a trade-off between storage space, maintenance cost and query speed.
-- Each new index will increase the amount of RAM and the amount of disk space needed.
+- Each new index increases the amount of RAM and the amount of disk space needed.
 - Indexes with [indexed array values](indexing-index-basics.html#indexing-array-values)
 need an extra index entry per array entry
 - Adding indexes increases the write-amplification i.e. it negatively affects
@@ -217,7 +217,7 @@ you should consider a few different properties:
 onto more than _N_ shards. Consider using multiple shard keys, if one of your
 values has a low cardinality.
 - **Frequency**: Consider how often a given shard key value may appear in
-your data. Having a lot of documents with identical shard keys will lead
+your data. Having a lot of documents with identical shard keys leads
 to unevenly distributed data. Consider using multiple shard keys or a different
 one that is more suitable.

@@ -232,7 +232,7 @@ for more information.
 ### SmartGraphs

 SmartGraphs are an Enterprise Edition feature of ArangoDB. It enables you to
-manage graphs at scale, it will give a vast performance benefit for all graphs
+manage graphs at scale. It provides a vast performance benefit for all graphs
 sharded in an ArangoDB Cluster.

 To add a SmartGraph you need a SmartGraph attribute that partitions your
@@ -260,19 +260,49 @@ network as well as more copying work required inside the storage engine.

 Consider some ways to minimize the required amount of storage space:

-- Explicitly set the `_key` field to a custom unique value.
-This enables you to store information in the `_key` field instead of another
-field inside the document. The `_key` value is always indexed, setting a
-custom value means you can use a shorter value than what would have been
-generated automatically.
-- Shorter field names will reduce the amount of space needed to store documents
-(this has no effect on index size). ArangoDB is schemaless and needs to store
-the document structure inside each document. Usually this is a small overhead
-compared to the overall document size.
+- Use the `_key` attribute to give documents unique identifiers. The `_key`
+attribute is always present in every document (including edges), and it
+is always indexed. This means it is the best-suited attribute to store a unique
+document identifier. Using the `_key` attribute is preferable to storing
+document identifiers in another attribute and creating a unique index on it.
+Some limitations apply, see [Document keys](data-modeling-naming-conventions-document-keys.html).
+- Shorter field names reduce the amount of space needed to store documents.
+ArangoDB is schema-free and needs to store the document structure inside of
+each document. Usually, this is a small overhead compared to the overall
+document size. The field name length has no effect on index sizes.
 - Combining many small related documents into one larger one can also
 reduce overhead. Common fields can be stored once and indexes just need to
-store one entry. This will only be beneficial if the combined documents are
-regularly retrieved together and not just subsets.
+store one entry. This is only beneficial if the combined documents are
+regularly retrieved together and not just subsets of them.
+
+Document Keys
+-------------
+
+- Explicitly set the `_key` attribute to a custom unique value.
+This enables you to store information in the `_key` attribute instead of another
+attribute inside of the document. The `_key` attribute is always indexed, so it is
+preferable to storing the document identifiers in another attribute and
+creating an extra index on it.
+
+- Try to use short values for the `_key` attribute.
+The `_key` values are used whenever a document is looked up by its primary
+key, and shorter key values can improve the lookup performance and reduce the
+disk usage.
+
+As the `_key` values are also used as foreign keys in the `_from` and `_to` attributes
+of edges, the key length also matters for all graph operations. Again, shorter keys
+can improve lookup performance here and reduce memory usage.
+
+When using hash values as document keys, try to avoid long hash values such as
+generated by hash functions such as SHA256 (64 characters in the alphabet
+`[0-9a-f]`) or SHA512 (128 bytes in the alphabet `[0-9a-f]`). Smaller keys are
+always preferable for performance.
+
+- Try to avoid keys that are randomly distributed.
+Keys that are randomly distributed are more expensive during larger insert
+operations than keys that follow a mostly ascending sequential pattern, e.g.
+`000001`, `000002`, and so on. The storage engine can process sequential keys
+more efficiently on inserts than randomly distributed keys.

 Storage Engine
 --------------
@@ -281,7 +311,7 @@ Large documents and transactions may negatively impact the write performance
 of the RocksDB storage engine.

 - Consider a maximum size of 50-75 kB _per document_ as a good rule of thumb.
-This will allow you to maintain steady write throughput even under very high load.
+This allows you to maintain steady write throughput even under very high load.
 - Transactions are held in-memory before they are committed.
 This means that transactions have to be split if they become too big, see the
 [limitations section](transactions-limitations.html#rocksdb-storage-engine).
@@ -330,11 +360,11 @@ is being flushed to storage, new writes can continue to the other write buffer.
 The total amount of data to build up in all in-memory buffers when writing into ArangoDB.
 You may wish to adjust this parameter to control memory usage.

-Setting this to a low value may limit the RAM that ArangoDB will use but may slow down
-write heavy workloads. Setting this to 0 will not limit the size of the write-buffers.
+Setting this to a low value may limit the RAM that ArangoDB uses but may slow down
+write heavy workloads. Setting this to `0` does not limit the size of the write-buffers.

 `--rocksdb.level0-stop-trigger`

-When this many files accumulate in level-0, writes will be stopped to allow compaction to catch up.
+When this many files accumulate in level-0, writes are stopped to allow compaction to catch up.
 Setting this value very high may improve write throughput, but may lead to temporarily
 bad read performance.
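
Note on the new Document Keys section: its advice about key length and key ordering can be illustrated with a small sketch. The Python below is not part of the commit. `sequential_key` and `hash_key` are hypothetical helper names, and the six-digit width is an assumption taken from the `000001` example in the text. The sketch only compares key lengths and ordering; it does not talk to ArangoDB.

```python
import hashlib


def sequential_key(counter: int, width: int = 6) -> str:
    """Return a zero-padded, ascending key such as '000001' (hypothetical helper)."""
    if counter < 0 or counter >= 10 ** width:
        raise ValueError("counter does not fit into the chosen key width")
    return f"{counter:0{width}d}"


def hash_key(value: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of value, as it might be used for a hash-based key."""
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


# Sequential keys are short and already sorted in insertion order.
keys = [sequential_key(i) for i in range(1, 4)]

# Hex-encoded hash keys are much longer and effectively randomly distributed.
sha256_len = len(hash_key("user@example.com", "sha256"))
sha512_len = len(hash_key("user@example.com", "sha512"))

print(keys, sha256_len, sha512_len)
```

A hex-encoded SHA256 digest is 64 characters and a SHA512 digest is 128 characters, while the zero-padded counter stays at six characters and sorts in the order it was generated.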

3.11/data-modeling-operational-factors.md

Lines changed: 51 additions & 21 deletions
@@ -8,9 +8,9 @@ Data Modeling and Operational Factors
 =====================================

 Designing the data model of your application is a crucial task that can make or
-break the performance of your application. A well-designed data model will
-allow you to write efficient AQL queries, increase throughput of CRUD operations
-and will make sure your data is distributed in the most effective way.
+break the performance of your application. A well-designed data model
+allows you to write efficient AQL queries, increase throughput of CRUD operations,
+and makes sure your data is distributed in the most effective way.

 Whether you design a new application with ArangoDB or port an existing one to
 use ArangoDB, you should always analyze the (expected) data access patterns of
@@ -146,7 +146,7 @@ on `_from` and `_to` (for edge collections).
 Should you decide to create an index you should consider a few things:

 - Indexes are a trade-off between storage space, maintenance cost and query speed.
-- Each new index will increase the amount of RAM and the amount of disk space needed.
+- Each new index increases the amount of RAM and the amount of disk space needed.
 - Indexes with [indexed array values](indexing-index-basics.html#indexing-array-values)
 need an extra index entry per array entry
 - Adding indexes increases the write-amplification i.e. it negatively affects
@@ -217,7 +217,7 @@ you should consider a few different properties:
 onto more than _N_ shards. Consider using multiple shard keys, if one of your
 values has a low cardinality.
 - **Frequency**: Consider how often a given shard key value may appear in
-your data. Having a lot of documents with identical shard keys will lead
+your data. Having a lot of documents with identical shard keys leads
 to unevenly distributed data. Consider using multiple shard keys or a different
 one that is more suitable.

@@ -232,7 +232,7 @@ for more information.
 ### SmartGraphs

 SmartGraphs are an Enterprise Edition feature of ArangoDB. It enables you to
-manage graphs at scale, it will give a vast performance benefit for all graphs
+manage graphs at scale. It provides a vast performance benefit for all graphs
 sharded in an ArangoDB Cluster.

 To add a SmartGraph you need a SmartGraph attribute that partitions your
@@ -260,19 +260,49 @@ network as well as more copying work required inside the storage engine.

 Consider some ways to minimize the required amount of storage space:

-- Explicitly set the `_key` field to a custom unique value.
-This enables you to store information in the `_key` field instead of another
-field inside the document. The `_key` value is always indexed, setting a
-custom value means you can use a shorter value than what would have been
-generated automatically.
-- Shorter field names will reduce the amount of space needed to store documents
-(this has no effect on index size). ArangoDB is schemaless and needs to store
-the document structure inside each document. Usually this is a small overhead
-compared to the overall document size.
+- Use the `_key` attribute to give documents unique identifiers. The `_key`
+attribute is always present in every document (including edges), and it
+is always indexed. This means it is the best-suited attribute to store a unique
+document identifier. Using the `_key` attribute is preferable to storing
+document identifiers in another attribute and creating a unique index on it.
+Some limitations apply, see [Document keys](data-modeling-naming-conventions-document-keys.html).
+- Shorter field names reduce the amount of space needed to store documents.
+ArangoDB is schema-free and needs to store the document structure inside of
+each document. Usually, this is a small overhead compared to the overall
+document size. The field name length has no effect on index sizes.
 - Combining many small related documents into one larger one can also
 reduce overhead. Common fields can be stored once and indexes just need to
-store one entry. This will only be beneficial if the combined documents are
-regularly retrieved together and not just subsets.
+store one entry. This is only beneficial if the combined documents are
+regularly retrieved together and not just subsets of them.
+
+Document Keys
+-------------
+
+- Explicitly set the `_key` attribute to a custom unique value.
+This enables you to store information in the `_key` attribute instead of another
+attribute inside of the document. The `_key` attribute is always indexed, so it is
+preferable to storing the document identifiers in another attribute and
+creating an extra index on it.
+
+- Try to use short values for the `_key` attribute.
+The `_key` values are used whenever a document is looked up by its primary
+key, and shorter key values can improve the lookup performance and reduce the
+disk usage.
+
+As the `_key` values are also used as foreign keys in the `_from` and `_to` attributes
+of edges, the key length also matters for all graph operations. Again, shorter keys
+can improve lookup performance here and reduce memory usage.
+
+When using hash values as document keys, try to avoid long hash values such as
+generated by hash functions such as SHA256 (64 characters in the alphabet
+`[0-9a-f]`) or SHA512 (128 bytes in the alphabet `[0-9a-f]`). Smaller keys are
+always preferable for performance.
+
+- Try to avoid keys that are randomly distributed.
+Keys that are randomly distributed are more expensive during larger insert
+operations than keys that follow a mostly ascending sequential pattern, e.g.
+`000001`, `000002`, and so on. The storage engine can process sequential keys
+more efficiently on inserts than randomly distributed keys.

 Storage Engine
 --------------
@@ -281,7 +311,7 @@ Large documents and transactions may negatively impact the write performance
 of the RocksDB storage engine.

 - Consider a maximum size of 50-75 kB _per document_ as a good rule of thumb.
-This will allow you to maintain steady write throughput even under very high load.
+This allows you to maintain steady write throughput even under very high load.
 - Transactions are held in-memory before they are committed.
 This means that transactions have to be split if they become too big, see the
 [limitations section](transactions-limitations.html#rocksdb-storage-engine).
@@ -330,11 +360,11 @@ is being flushed to storage, new writes can continue to the other write buffer.
 The total amount of data to build up in all in-memory buffers when writing into ArangoDB.
 You may wish to adjust this parameter to control memory usage.

-Setting this to a low value may limit the RAM that ArangoDB will use but may slow down
-write heavy workloads. Setting this to 0 will not limit the size of the write-buffers.
+Setting this to a low value may limit the RAM that ArangoDB uses but may slow down
+write heavy workloads. Setting this to `0` does not limit the size of the write-buffers.

 `--rocksdb.level0-stop-trigger`

-When this many files accumulate in level-0, writes will be stopped to allow compaction to catch up.
+When this many files accumulate in level-0, writes are stopped to allow compaction to catch up.
 Setting this value very high may improve write throughput, but may lead to temporarily
 bad read performance.
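
Note on the storage guidance in these files: the 50-75 kB per-document rule of thumb and the remark about shorter field names could be checked on the client side before writing. The Python below is a rough sketch and not part of the commit. It measures the compact JSON encoding, which only approximates the stored size (ArangoDB stores documents in its binary VelocyPack format), and it assumes 1 kB = 1000 bytes. `approx_document_size` and `exceeds_rule_of_thumb` are hypothetical names.

```python
import json

# Upper end of the 50-75 kB rule of thumb; 1 kB = 1000 bytes is an assumption.
MAX_DOC_BYTES = 75 * 1000


def approx_document_size(doc: dict) -> int:
    """Approximate a document's size as the byte length of its compact JSON form."""
    encoded = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def exceeds_rule_of_thumb(doc: dict, limit: int = MAX_DOC_BYTES) -> bool:
    """Return True if the approximate size is above the chosen limit."""
    return approx_document_size(doc) > limit


small_doc = {"_key": "000001", "name": "Ada"}
large_doc = {"_key": "000002", "payload": "x" * 80_000}

# Field names are stored in every document, so shorter names shrink each copy.
long_names = {"customer_first_name": "Ada"}
short_names = {"fn": "Ada"}

print(exceeds_rule_of_thumb(small_doc), exceeds_rule_of_thumb(large_doc))
print(approx_document_size(long_names), approx_document_size(short_names))
```

A check like this only flags candidates for splitting or restructuring; whether a given document size actually hurts write throughput depends on the workload.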
