add several configuration options for Pregel jobs (#1007) · arangodb/docs@76d000d · GitHub
This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Commit 76d000d
jsteemann, ansoboleva, and markuspf authored
add several configuration options for Pregel jobs (#1007)
* docs PR for arangodb/arangodb#16332
* Update 3.10/programs-arangod-pregel.md
  Co-authored-by: Markus Pfeiffer <markuspf@users.noreply.github.com>
* Update 3.10/release-notes-new-features310.md
  Co-authored-by: Markus Pfeiffer <markuspf@users.noreply.github.com>
* Update programs-arangod-pregel.md
* Update release-notes-new-features310.md
* Update release-notes-upgrading-changes310.md
* link to new options from Pregel limits
* Update graphs-pregel.md
* Update graphs-pregel.md

Co-authored-by: ansoboleva <93702078+ansoboleva@users.noreply.github.com>
Co-authored-by: Markus Pfeiffer <markuspf@users.noreply.github.com>
1 parent 685f567 commit 76d000d

File tree

5 files changed

+199
-11
lines changed

3.10/graphs-pregel.md

Lines changed: 27 additions & 11 deletions
@@ -460,28 +460,44 @@ const handle = pregel.start("slpa", "yourgraph", {maxGSS: 100, resultField: "com
460460
pregel.status(handle);
461461
```
462462

463-
Limits
463+
Limitations
464464
------
465465

466-
Pregel algorithms in ArangoDB will by default store temporary vertex and edge
467-
data in main memory. For large datasets this is going to cause problems, as
468-
servers may run out of memory while loading the data.
466+
Depending on configuration, Pregel algorithms in ArangoDB may store temporary
467+
vertex and edge data in main memory. For large datasets this may cause
468+
problems, as servers may run out of memory while loading the data.
469469

470470
To prevent servers from running out of memory while loading the dataset, a Pregel
471-
job can be started with the attribute `useMemoryMaps` set to `true`. This will
472-
make the algorithm use memory-mapped files as a backing storage in case of huge
471+
job can be started with the `useMemoryMaps` attribute set to `true`. This
472+
makes the algorithm use memory-mapped files as a backing storage in case of huge
473473
datasets. Falling back to memory-mapped files might make the computation
474474
disk-bound, but may be the only way to complete the computation at all.
475475

476+
Starting from ArangoDB 3.10, there is also a new startup option
477+
[`--pregel.memory-mapped-files`](programs-arangod-pregel.html#pregel-memory-mapped-files-usage), which controls
478+
whether Pregel jobs use memory-mapped files by default.
479+
Out of the box, this option is set to `true`.
480+
In this case, the computation can become disk-bound, and enough disk capacity
481+
must be available to hold the memory-mapped files for the Pregel jobs.
482+
483+
You can also configure the storage location for Pregel's memory-mapped files with
484+
the [`--pregel.memory-mapped-files-location-type`](programs-arangod-pregel.html#pregel-memory-mapped-files-storage-location-type)
485+
startup option.
486+
487+
The selected storage location should have enough capacity to hold all the
488+
memory-mapped files for the Pregel jobs that are running on an instance.
489+
Note that the memory-mapped files are removed when a Pregel job completes,
490+
and they do not need to be persisted across instance restarts.
491+
476492
Parts of the Pregel temporary results (aggregated messages) may also be
477-
stored in main memory, and currently the aggregation cannot fall back to
493+
stored in the main memory, and currently the aggregation cannot fall back to
478494
memory-mapped files. That means if an algorithm needs to store a lot of
479-
result messages temporarily, it may consume a lot of main memory.
495+
result messages temporarily, it may consume a lot of the main memory.
480496

481497
In general it is also recommended to set the `store` attribute of Pregel jobs
482-
to `true` to make a job store its value on disk and not just in main memory.
483-
This way the results are removed from main memory once a Pregel job completes.
498+
to `true` to make a job store its value on disk and not just in the main memory.
499+
This way the results are removed from the main memory once a Pregel job completes.
484500
If the `store` attribute is explicitly set to `false`, result sets of completed
485-
Pregel runs will not be removed from main memory until the result set is
501+
Pregel runs are not removed from the main memory until the result set is
486502
explicitly discarded by a call to the `cancel()` method
487503
(or a shutdown of the server).

3.10/programs-arangod-pregel.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
---
2+
layout: default
3+
---
4+
# ArangoDB Server Pregel Options
5+
6+
## Pregel job parallelism
7+
8+
Pregel jobs have configurable minimum, maximum, and default parallelism values.
9+
Administrators can use these parallelism options to set concurrency defaults and bounds
10+
for Pregel jobs on an instance level. Each individual Pregel job can set its own parallelism
11+
value using the job's `parallelism` option, but the job's effective parallelism is limited by
12+
the bounds defined by `--pregel.min-parallelism` and `--pregel.max-parallelism`.
13+
If a job does not set its `parallelism` value, it defaults to the parallelism value
14+
configured via `--pregel.parallelism`.
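
The interaction between a job's `parallelism` attribute and the three startup options can be sketched as follows. This is a minimal illustration of the rule described above, not ArangoDB's actual implementation: the function `effectiveParallelism` and the `serverOptions` object are hypothetical names, and applying the bounds to the fallback default as well is an assumption.

```javascript
// Hypothetical helper (not part of ArangoDB): illustrates how a job's
// parallelism could be resolved against the instance-level options.
function effectiveParallelism(jobParallelism, serverOptions) {
  const { minParallelism, maxParallelism, defaultParallelism } = serverOptions;
  // A job that does not set `parallelism` falls back to --pregel.parallelism.
  const requested =
    jobParallelism === undefined ? defaultParallelism : jobParallelism;
  // The requested value is limited by --pregel.min-parallelism and
  // --pregel.max-parallelism (assumed to apply to the fallback, too).
  return Math.min(Math.max(requested, minParallelism), maxParallelism);
}

// Example values chosen for illustration only.
const server = { minParallelism: 2, maxParallelism: 8, defaultParallelism: 4 };
console.log(effectiveParallelism(undefined, server)); // 4 (default is used)
console.log(effectiveParallelism(32, server)); // 8 (capped by the maximum)
console.log(effectiveParallelism(1, server)); // 2 (raised to the minimum)
```

In this sketch, a job can request any value, but the instance-level bounds always win.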
15+
16+
### Minimum parallelism
17+
18+
<small>Introduced in: v3.10.0</small>
19+
20+
`--pregel.min-parallelism`
21+
22+
Minimum parallelism usable in Pregel jobs. Defaults to `1`.
23+
Increasing the value of this option forces each Pregel job to run with at least this
24+
level of parallelism.
25+
26+
### Maximum parallelism
27+
28+
<small>Introduced in: v3.10.0</small>
29+
30+
`--pregel.max-parallelism`
31+
32+
Maximum parallelism usable in Pregel jobs. Defaults to the number of available cores.
33+
This option effectively limits the parallelism of each Pregel job to the specified value.
34+
35+
### Default parallelism
36+
37+
<small>Introduced in: v3.10.0</small>
38+
39+
`--pregel.parallelism`
40+
41+
Default parallelism to use in Pregel jobs. Defaults to the number of available cores
42+
divided by 4. The result will be limited to a value between 1 and 16.
43+
The default parallelism for a Pregel job is used only if the job does not set its
44+
`parallelism` attribute.
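
As a worked example of the default described above, the following sketch derives the default parallelism from a core count. The function name `defaultPregelParallelism` is hypothetical, and rounding the division down is an assumption, since the text does not state how fractional results are handled.

```javascript
// Hypothetical helper (not part of ArangoDB): number of available cores
// divided by 4, limited to a value between 1 and 16.
function defaultPregelParallelism(availableCores) {
  // Assumption: the division is rounded down to a whole number.
  const quarter = Math.floor(availableCores / 4);
  return Math.min(Math.max(quarter, 1), 16);
}

console.log(defaultPregelParallelism(2)); // 1 (lower limit applies)
console.log(defaultPregelParallelism(8)); // 2
console.log(defaultPregelParallelism(128)); // 16 (upper limit applies)
```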
45+
46+
## Pregel memory-mapped files
47+
48+
By default, Pregel stores its temporary data in memory-mapped files on disk.
49+
Storing temporary data in memory-mapped files rather than in RAM has the advantage of
50+
lowering the RAM usage, which reduces the likelihood of out-of-memory situations.
51+
However, storing the files on disk requires a certain disk capacity, so that instead of running out
52+
of RAM, it is possible to run out of disk space.
53+
54+
{% hint 'info' %}
55+
Please make sure to use a suitable storage location for Pregel's memory-mapped
56+
files.
57+
{% endhint %}
58+
59+
### Pregel memory-mapped files usage
60+
61+
<small>Introduced in: v3.10.0</small>
62+
63+
`--pregel.memory-mapped-files`
64+
65+
If set to `true`, Pregel jobs store their temporary data in disk-backed
66+
memory-mapped files. If set to `false`, the temporary data of Pregel jobs is buffered
67+
in RAM.
68+
The default value is `true`, meaning that memory-mapped files are used.
69+
You can override this option for each Pregel job by setting the `useMemoryMaps` attribute
70+
of the job.
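
The precedence between the startup option and the per-job attribute can be sketched like this. The function `usesMemoryMappedFiles` is a hypothetical name for illustration; only the `useMemoryMaps` attribute and the `--pregel.memory-mapped-files` option come from the text above.

```javascript
// Hypothetical helper (not part of ArangoDB): decides whether a job uses
// memory-mapped files, given the server-wide default and the job parameters.
function usesMemoryMappedFiles(serverDefault, jobParams = {}) {
  // An explicit per-job `useMemoryMaps` overrides --pregel.memory-mapped-files.
  return typeof jobParams.useMemoryMaps === "boolean"
    ? jobParams.useMemoryMaps
    : serverDefault;
}

// With the default server setting (true), jobs use memory-mapped files
// unless they opt out explicitly.
console.log(usesMemoryMappedFiles(true, { maxGSS: 100 })); // true
console.log(usesMemoryMappedFiles(true, { useMemoryMaps: false })); // false
console.log(usesMemoryMappedFiles(false, { useMemoryMaps: true })); // true
```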
71+
72+
### Pregel memory-mapped files storage location type
73+
74+
<small>Introduced in: v3.10.0</small>
75+
76+
`--pregel.memory-mapped-files-location-type`
77+
78+
This option configures the location for the memory-mapped files written by Pregel.
79+
This option is only meaningful if memory-mapped files are used.
80+
The option can have one of the following values:
81+
82+
- `temp-directory`: store memory-mapped files in the temporary directory,
83+
as configured via `--temp.path`. If `--temp.path` is not set, the
84+
system's temporary directory is used.
85+
- `database-directory`: store memory-mapped files in a separate directory
86+
underneath the database directory.
87+
- `custom`: use a custom directory location for memory-mapped files. The
88+
exact location must be set via the `--pregel.memory-mapped-files-custom-path`
89+
configuration parameter.
90+
91+
The default location for Pregel's memory-mapped files is the temporary directory
92+
(`temp-directory`), which may not provide enough capacity for larger Pregel jobs.
93+
It may be more sensible to configure a custom directory for memory-mapped files
94+
and provide the necessary disk space there (`custom`).
95+
Such a custom directory can be mounted on ephemeral storage, as the files are only
96+
needed temporarily. If a custom directory location is used, you need to specify
97+
the actual location via the `--pregel.memory-mapped-files-custom-path` parameter.
98+
99+
You can also use a subdirectory of the database directory
100+
as the storage location for the memory-mapped files (`database-directory`).
101+
The database directory often provides a lot of disk space capacity, but when
102+
Pregel's temporary files are stored in there too, it has to provide enough capacity
103+
to store both the regular database data and the Pregel files.
104+
105+
### Pregel memory-mapped files custom storage location
106+
107+
<small>Introduced in: v3.10.0</small>
108+
109+
`--pregel.memory-mapped-files-custom-path`
110+
111+
Specifies a custom directory location for Pregel's memory-mapped files.
112+
This setting can only be used if the option `--pregel.memory-mapped-files-location-type`
113+
is set to `custom`. When used, the option has to contain the storage directory
114+
location as an absolute path.
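
The constraints between the location type and the custom path can be sketched as a small validation routine that assembles the startup arguments. The function `memoryMappedFilesArgs` and the example path `/mnt/pregel-scratch` are hypothetical, and the absolute-path check assumes POSIX-style paths. The server may validate these options differently.

```javascript
// Hypothetical helper (not part of ArangoDB): builds the startup arguments
// for the storage location of Pregel's memory-mapped files.
function memoryMappedFilesArgs(locationType, customPath) {
  const allowed = ["temp-directory", "database-directory", "custom"];
  if (!allowed.includes(locationType)) {
    throw new Error(`unknown location type: ${locationType}`);
  }
  const args = ["--pregel.memory-mapped-files-location-type", locationType];
  if (locationType === "custom") {
    // Assumption: POSIX-style paths, where absolute paths start with "/".
    if (typeof customPath !== "string" || !customPath.startsWith("/")) {
      throw new Error("location type 'custom' requires an absolute path");
    }
    args.push("--pregel.memory-mapped-files-custom-path", customPath);
  } else if (customPath !== undefined) {
    // The custom path is only usable together with location type `custom`.
    throw new Error("a custom path requires location type 'custom'");
  }
  return args;
}

console.log(memoryMappedFilesArgs("custom", "/mnt/pregel-scratch").join(" "));
console.log(memoryMappedFilesArgs("temp-directory").join(" "));
```

The resulting list can be appended to an `arangod` invocation or translated into the equivalent configuration file entries.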

3.10/release-notes-new-features310.md

Lines changed: 23 additions & 0 deletions
@@ -265,6 +265,29 @@ deployments will use RangeDeletes regardless of the value of this option.
265265
Note that it is not guaranteed that all truncate operations will use a RangeDelete operation.
266266
For collections containing a low number of documents, the O(n) truncate method may still be used.
267267

268+
### Pregel configuration options
269+
270+
There are now several startup options to configure the parallelism of Pregel jobs:
271+
272+
- `--pregel.min-parallelism`: minimum parallelism usable in Pregel jobs.
273+
- `--pregel.max-parallelism`: maximum parallelism usable in Pregel jobs.
274+
- `--pregel.parallelism`: default parallelism to use in Pregel jobs.
275+
276+
Administrators can use these options to set concurrency defaults and bounds
277+
for Pregel jobs on an instance level.
278+
279+
There are also new startup options to configure the usage of memory-mapped files for Pregel
280+
temporary data:
281+
282+
- `--pregel.memory-mapped-files`: to specify whether to use memory-mapped files or RAM for
283+
storing temporary Pregel data.
284+
285+
- `--pregel.memory-mapped-files-location-type`: to set a location for memory-mapped
286+
files written by Pregel. This option is only meaningful if memory-mapped
287+
files are used.
288+
289+
For more information on the new options, please refer to [ArangoDB Server Pregel Options](programs-arangod-pregel.html).
290+
268291
Miscellaneous changes
269292
---------------------
270293

3.10/release-notes-upgrading-changes310.md

Lines changed: 33 additions & 0 deletions
@@ -85,6 +85,39 @@ It is possible to opt out of these changes and get back the memory and performan
85
of the previous versions by setting the `--rocksdb.cache-index-and-filter-blocks`
8686
and `--rocksdb.enforce-block-cache-size-limit` startup options to `false` on startup.
8787

88+
### Pregel options
89+
90+
Pregel jobs now have configurable minimum, maximum and default parallelism values. You can set them
91+
by the following startup options:
92+
93+
- `--pregel.min-parallelism`: minimum parallelism usable in Pregel jobs. Defaults to `1`.
94+
- `--pregel.max-parallelism`: maximum parallelism usable in Pregel jobs. Defaults to the
95+
number of available cores.
96+
- `--pregel.parallelism`: default parallelism to use in Pregel jobs. Defaults to the number
97+
of available cores divided by 4. The result will be clamped to a value between 1 and 16.
98+
99+
{% hint 'info' %}
100+
The default values of these options may differ from parallelism values effectively
101+
used by previous versions, so it is advised to explicitly set the desired parallelism
102+
values in ArangoDB 3.10.
103+
{% endhint %}
104+
105+
Pregel now also stores its temporary data in memory-mapped files on disk by default, whereas
106+
in previous versions the default behavior was to buffer it to RAM.
107+
Storing temporary data in memory-mapped files rather than in RAM has the advantage of lowering
108+
the RAM usage, which reduces the likelihood of out-of-memory situations.
109+
However, storing the files on disk requires disk capacity, so that instead of running out
110+
of RAM it is now possible to run out of disk space.
111+
112+
{% hint 'info' %}
113+
It is advised to set the storage location for Pregel's memory-mapped files explicitly
114+
in ArangoDB 3.10. The following startup options are available for the configuration of
115+
memory-mapped files: `--pregel.memory-mapped-files` and `--pregel.memory-mapped-files-location-type`.
116+
{% endhint %}
117+
118+
For more information on the new options, please refer to [ArangoDB Server Pregel Options](programs-arangod-pregel.html).
119+
120+
88121
Maximum Array / Object nesting
89122
------------------------------
90123

_data/3.10-manual.yml

Lines changed: 2 additions & 0 deletions
@@ -83,6 +83,8 @@
8383
href: programs-arangod-network.html
8484
- text: Nonce
8585
href: programs-arangod-nonce.html
86+
- text: Pregel
87+
href: programs-arangod-pregel.html
8688
- text: Query
8789
href: programs-arangod-query.html
8890
- text: Random
