OneShard follow-up (#1254) · konsultaner/arangodb-docs@fde5bdb
Commit fde5bdb

Simran-B and nerpaula authored
OneShard follow-up (arangodb#1254)
* OneShard is per database (not collection), or cluster-wide
* OneShard links
* Unrelated unification between 3.10..3.11 + timelessness
* Shards are per-collection

Co-authored-by: Paula Mihu <97217318+nerpaula@users.noreply.github.com>
1 parent 435b69a commit fde5bdb

15 files changed: +101 −115 lines

3.10/aql/index.md

Lines changed: 2 additions & 2 deletions

@@ -34,5 +34,5 @@ It is a pure data manipulation language (DML), not a data definition language
 The syntax of AQL queries is different to SQL, even if some keywords overlap.
 Nevertheless, AQL should be easy to understand for anyone with an SQL background.
 
-For some example queries, please refer to the chapters
-[Data Queries](data-queries.html) and [AQL Query Patterns and Examples](examples.html).
+For example queries, see the [Data Queries](data-queries.html) and
+[Examples & Query Patterns](examples.html) chapters.
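
For context, a minimal arangosh sketch of the kind of query the linked chapters
cover (not part of this commit; the collection and attribute names are
placeholders):

    // AQL reads much like SQL's SELECT, but iterates explicitly
    db._query(`
      FOR user IN users
        FILTER user.age >= 21
        RETURN user.name
    `).toArray();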

3.10/architecture-deployment-modes-cluster-sharding.md

Lines changed: 1 addition & 2 deletions

@@ -75,8 +75,7 @@ replicas. This in turn implies, that a complete pristine replication would
 involve 10 shards which need to catch up with their leaders.
 
 Not all use cases require horizontal scalability. In such cases, consider the
-[OneShard](architecture-deployment-modes-cluster-architecture.html#oneshard)
-feature as alternative to flexible sharding.
+[OneShard](deployment-oneshard.html) feature as alternative to flexible sharding.
 
 Shard Keys
 ----------
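
For context, a minimal arangosh sketch of the flexible sharding that the changed
paragraph contrasts with OneShard (not part of this commit; the collection name
and shard key are placeholders):

    // Spread documents across 5 shards, partitioned by the "country" attribute
    db._create("customers", { numberOfShards: 5, shardKeys: ["country"] });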

3.10/architecture-storage-engines.md

Lines changed: 28 additions & 37 deletions

@@ -38,19 +38,24 @@ The main advantages of RocksDB are:
 
 ### Caveats
 
-RocksDB allows concurrent writes. However, when touching the same document a
-write conflict is raised. It is possible to exclusively lock collections when
-executing AQL. This will avoid write conflicts but also inhibits concurrent
-writes.
-
-Currently, another restriction is due to the transaction handling in
-RocksDB. Transactions are limited in total size. If you have a statement
-modifying a lot of documents it is necessary to commit data in-between. This will
-be done automatically for AQL by default. Transactions that get too big (in terms of
-number of operations involved or the total size of data modified by the transaction)
-will be committed automatically. Effectively this means that big user transactions
+RocksDB allows concurrent writes. However, when touching the same document at
+the same time, a write conflict is raised. It is possible to exclusively lock
+collections when executing AQL. This avoids write conflicts, but also inhibits
+concurrent writes.
+
+ArangoDB uses RocksDB's transactions to implement the ArangoDB transaction
+handling. Therefore, the same restrictions apply for ArangoDB transactions when
+using the RocksDB engine.
+
+RocksDB imposes a limit on the transaction size. It is optimized to
+handle small transactions very efficiently, but is effectively limiting
+the total size of transactions. If you have an operation that modifies a lot of
+documents, it is necessary to commit data in-between. This is done automatically
+for AQL by default. Transactions that get too big (in terms of number of
+operations involved or the total size of data modified by the transaction)
+are committed automatically. Effectively, this means that big user transactions
 are split into multiple smaller RocksDB transactions that are committed individually.
-The entire user transaction will not necessarily have ACID properties in this case.
+The entire user transaction does not necessarily have ACID properties in this case.
 
 The threshold values for transaction sizes can be configured globally using the
 startup options

@@ -76,18 +81,18 @@ a replication setup when Followers need to replay the same sequence of operation
 on the Leader.
 
 The individual RocksDB WAL files are per default about 64 MiB big.
-The size will always be proportionally sized to the value specified via
+The size is always proportionally sized to the value specified via
 `--rocksdb.write-buffer-size`. The value specifies the amount of data to build
 up in memory (backed by the unsorted WAL on disk) before converting it to a
 sorted on-disk file.
 
 Larger values can increase performance, especially during bulk loads.
 Up to `--rocksdb.max-write-buffer-number` write buffers may be held in memory
 at the same time, so you may wish to adjust this parameter to control memory
-usage. A larger write buffer will result in a longer recovery time the next
+usage. A larger write buffer results in a longer recovery time the next
 time the database is opened.
 
-The RocksDB WAL only contains committed transactions. This means you will never
+The RocksDB WAL only contains committed transactions. This means you never
 see partial transactions in the replication log, but it also means transactions
 are tracked completely in-memory. In practice this causes RocksDB transaction
 sizes to be limited, for more information see the

@@ -102,7 +107,7 @@ found in:
 - [blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/](https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/){:target="_blank"}
 
 The basic idea is that data is organized in levels were each level is a factor
-larger than the previous. New data will reside in smaller levels while old data
+larger than the previous. New data resides in smaller levels while old data
 is moved down to the larger levels. This allows to support high rate of inserts
 over an extended period. In principle it is possible that the different levels
 reside on different storage media. The smaller ones on fast SSD, the larger ones

@@ -148,38 +153,24 @@ reaches the limit given by
 
     --rocksdb.write-buffer-size
 
-it will converted to an SST file and inserted at level 0.
+it is converted to an SST file and inserted at level 0.
 
-The following option controls the size of each level and the depth.
+The following option controls the size of each level and the depth:
 
     --rocksdb.num-levels N
 
-Limits the number of levels to N. By default it is 7 and there is
+It limits the number of levels to `N`. By default, it is `7` and there is
 seldom a reason to change this. A new level is only opened if there is
 too much data in the previous one.
 
     --rocksdb.max-bytes-for-level-base B
 
-L0 will hold at most B bytes.
+L0 holds at most `B` bytes.
 
     --rocksdb.max-bytes-for-level-multiplier M
 
-Each level is at most M times as much bytes as the previous
-one. Therefore the maximum number of bytes-for-level L can be
-calculated as
+Each level is at most `M` times as much bytes as the previous
+one. Therefore the maximum number of bytes-for-level `L` can be
+calculated as follows:
 
     max-bytes-for-level-base * (max-bytes-for-level-multiplier ^ (L-1))
-
-### Future
-
-RocksDB imposes a limit on the transaction size. It is optimized to
-handle small transactions very efficiently, but is effectively limiting
-the total size of transactions.
-
-ArangoDB currently uses RocksDB's transactions to implement the ArangoDB
-transaction handling. Therefore the same restrictions apply for ArangoDB
-transactions when using the RocksDB engine.
-
-We will improve this by introducing distributed transactions in a future
-version of ArangoDB. This will allow handling large transactions as a
-series of small RocksDB transactions and hence removing the size restriction.
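
For context, a minimal sketch of the level-size formula above in arangosh
JavaScript (not part of this commit; it assumes the RocksDB defaults of a
256 MiB level base and a multiplier of 10, which a deployment may override):

    // Per the formula above: capacity of level L = base * multiplier^(L-1)
    function maxBytesForLevel(L, base = 256 * 1024 * 1024, multiplier = 10) {
      return base * Math.pow(multiplier, L - 1);
    }
    maxBytesForLevel(1); // 268435456 bytes (256 MiB)
    maxBytesForLevel(3); // 26843545600 bytes (25 GiB)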

3.10/data-modeling-graphs-from-rdf.md

Lines changed: 1 addition & 1 deletion

@@ -140,7 +140,7 @@ all of the serializations but it is a step that may effect how data is imported.
 ### Ontology, Taxonomy, Class Inheritance, and RDFS
 
 The final consideration is something that for many is the core of RDF and
-semantic data: *[Ontologies](https://www.w3.org/standards/semanticweb/ontology){:target="_blank"}*.
+semantic data: [Ontologies](https://www.w3.org/standards/semanticweb/ontology){:target="_blank"}.
 Not just ontologies but also class inheritance, and schema validation. One method
 would be add the ontology in a similar way to what has been suggested for the
 RDF graphs as ontologies are usually structured in the same way (or can be).

3.10/deployment-oneshard.md

Lines changed: 12 additions & 8 deletions

@@ -1,9 +1,8 @@
 ---
 layout: default
 description: >-
-  A OneShard deployment offers a practicable solution that enables significant
-  performance improvements by massively reducing cluster-internal communication
-  and allows running transactions with ACID guarantees on shard leaders
+  The OneShard feature offers a practicable solution that enables significantly
+  improved performance and transactional guarantees for cluster deployments
 ---
 # OneShard
 

@@ -12,12 +11,17 @@ description: >-
 
 {% include hint-ee-arangograph.md feature="The OneShard option" %}
 
-In an ArangoDB cluster, the OneShard deployment restricts collections to a
-single shard and places them on one DB-Server. This way, whole queries can be
-pushed to and executed on that server. The Coordinator only gets back the final
-result.
+The OneShard option for ArangoDB clusters restricts all collections of a
+database to a single shard and places them on one DB-Server node. This way,
+whole queries can be pushed to and executed on that server, massively reducing
+cluster-internal communication. The Coordinator only gets back the final result.
 
-This setup is highly recommended for most graph use cases and join-heavy queries.
+Queries are always limited to a single database, and with the data of a whole
+database on a single node, the OneShard option allows running transactions with
+ACID guarantees on shard leaders.
+
+A OneShard setup is highly recommended for most graph use cases and join-heavy
+queries.
 
 {% hint 'info' %}
 For graphs larger than what fits on a single DB-Server node, you can use the
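
For context, a minimal arangosh sketch of the per-database OneShard option
described above (not part of this commit; the database and collection names are
placeholders):

    // OneShard is enabled per database, at creation time
    db._createDatabase("shop", { sharding: "single" });
    db._useDatabase("shop");
    // All collections of this database share a single shard on one DB-Server
    db._create("orders");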

3.10/graphs-enterprise-graphs-getting-started.md

Lines changed: 9 additions & 9 deletions

@@ -26,8 +26,8 @@ and `arangoimport`.
 `arangoexport` allows you to export collections to formats like `JSON`, `JSONL`, or `CSV`.
 For this particular case, it is recommended to export data to `JSONL` format.
 Once the data is exported, you need to exclude
-the *_key* values from edges. The `enterprise-graph` module does not allow
-custom *_key* values on edges. This is necessary for the initial data replication
+the `_key` values from edges. The `enterprise-graph` module does not allow
+custom `_key` values on edges. This is necessary for the initial data replication
 when using `arangoimport` because these values are immutable.
 
 ### Migration by Example

@@ -107,7 +107,7 @@ After this step, the graph has been migrated.
 
 This example describes a scenario in which the collections names have changed,
 assuming that you have renamed `old_vertices` to `vertices`.
-For the vertex data this change is not relevant, the `_id` values will adjust automatically,
+For the vertex data this change is not relevant, the `_id` values are adjusted automatically,
 so you can import the data again, and just target the new collection name:
 
     arangoimport --collection vertices --file docOutput/old_vertices.jsonl

@@ -209,10 +209,10 @@ Compared to SmartGraphs, the option `isSmart: true` is required but the
 ### Add vertex collections
 
 The **collections must not exist** when creating the EnterpriseGraph. The EnterpriseGraph
-module will create them for you automatically to set up the sharding for all
+module creates them for you automatically to set up the sharding for all
 these collections correctly. If you create collections via the EnterpriseGraph
 module and remove them from the graph definition, then you may re-add them
-without trouble however, as they will have the correct sharding.
+without trouble however, as they have the correct sharding.
 
 {% arangoshexample examplevar="examplevar" script="script" result="result" %}
 @startDocuBlockInline enterpriseGraphCreateGraphHowTo2_cluster

@@ -257,13 +257,13 @@ correct sharding already).
 
 When creating a collection, you can decide whether it's a SatelliteCollection
 or not. For example, a vertex collection can be satellite as well.
-SatelliteCollections don't require sharding as the data will be distributed
+SatelliteCollections don't require sharding as the data is distributed
 globally on all DB-Servers. The `smartGraphAttribute` is also not required.
 
 In addition to the attributes you would set to create a EnterpriseGraph, there is an
 additional attribute `satellites` you can optionally set. It needs to be an array of
 one or more collection names. These names can be used in edge definitions
-(relations) and these collections will be created as SatelliteCollections.
+(relations) and these collections are created as SatelliteCollections.
 However, all vertex collections on one side of the relation have to be of
 the same type - either all satellite or all smart. This is because `_from`
 and `_to` can have different types based on the sharding pattern.

@@ -272,8 +272,8 @@ In this example, both vertex collections are created as SatelliteCollections.
 
 {% hint 'info' %}
 When providing a satellite collection that is not used in a relation,
-it will not be created. If you create the collection in a following
-request, only then the option will count.
+it is not created. If you create the collection in a following
+request, only then the option counts.
 {% endhint %}
 
 {% arangoshexample examplevar="examplevar" script="script" result="result" %}
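
For context, a minimal arangosh sketch of the EnterpriseGraph creation these
hunks document (not part of this commit; the graph and collection names are
placeholders):

    var graph_module = require("@arangodb/general-graph");
    var relation = graph_module._relation("edges", "vertices", "vertices");
    graph_module._create("myGraph", [relation], [], {
      isSmart: true,            // required for EnterpriseGraphs
      numberOfShards: 9,
      satellites: ["vertices"]  // listed collections become SatelliteCollections
    });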

3.10/highlights.md

Lines changed: 1 addition & 1 deletion

@@ -199,7 +199,7 @@ Also see [What's New in 3.7](release-notes-new-features37.html).
 
 **Enterprise Edition**
 
-- [**OneShard**](architecture-deployment-modes-cluster-architecture.html#oneshard)
+- [**OneShard**](deployment-oneshard.html)
   deployments offer a practicable solution that enables significant performance
   improvements by massively reducing cluster-internal communication. A database
   created with OneShard enabled is limited to a single DB-Server node but still

3.10/release-notes-new-features36.md

Lines changed: 1 addition & 2 deletions

@@ -579,8 +579,7 @@ only the final result. This can drastically reduce resource consumption and
 communication effort for the Coordinator.
 
 An entire cluster, selected databases or selected collections can be made
-eligible for the OneShard optimization. See
-[OneShard cluster architecture](deployment-oneshard.html)
+eligible for the OneShard optimization. See [OneShard cluster architecture](deployment-oneshard.html)
 for details and usage examples.
 
 HTTP API

3.11/administration-cluster.md

Lines changed: 1 addition & 1 deletion

@@ -189,7 +189,7 @@ vertex collections (the latter two require the *Enterprise Edition* of ArangoDB)
 Manually overriding the sharding strategy does not yet provide a
 benefit, but it may later in case other sharding strategies are added.
 
-The [OneShard](architecture-deployment-modes-cluster-architecture.html#oneshard)
+The [OneShard](deployment-oneshard.html)
 feature does not have its own sharding strategy, it uses `hash` instead.
 
 Moving/Rebalancing _shards_
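
For context, a minimal arangosh sketch of pinning a sharding strategy at
collection creation, as the changed paragraph discusses (not part of this
commit; the collection name is a placeholder):

    // The strategy is fixed at creation time; "hash" is the general-purpose choice
    db._create("coll", { shardingStrategy: "hash", numberOfShards: 3 });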

3.11/aql/index.md

Lines changed: 2 additions & 2 deletions

@@ -34,5 +34,5 @@ It is a pure data manipulation language (DML), not a data definition language
 The syntax of AQL queries is different to SQL, even if some keywords overlap.
 Nevertheless, AQL should be easy to understand for anyone with an SQL background.
 
-For some example queries, please refer to the chapters
-[Data Queries](data-queries.html) and [AQL query patterns and examples](examples.html).
+For example queries, see the [Data Queries](data-queries.html) and
+[Examples & Query Patterns](examples.html) chapters.
