DOC-145: Added documentation for APM-84 by cpjulia · Pull Request #1060 · arangodb/docs
This repository was archived by the owner on Dec 13, 2023. It is now read-only.

DOC-145: Added documentation for APM-84 #1060

Merged
merged 13 commits into from
Aug 3, 2022
38 changes: 38 additions & 0 deletions 3.10/aql/invocation-with-arangosh.md
@@ -242,6 +242,44 @@ In the ArangoDB Enterprise Edition there is an additional parameter:
into sync. The default value is 60.0 (seconds). When the max time has been reached
the query will be stopped.

Additional parameters for spilling data from the query onto disk
-----------------------------------------------------------------

Two additional parameters allow intermediate data from a query to be spilled
onto disk in order to decrease the memory usage.

Note: Spilling data from RAM onto disk is experimental and turned off by
default. It currently only has an effect for sorting, that is, for queries
that use the `SORT` keyword without `LIMIT`.
Also, the query results are still built up entirely in RAM on coordinators
and single servers for non-streaming queries. To avoid building up the entire
query result in RAM, use a streaming query.


- `spillOverThresholdNumRows`: Allows input data and intermediate results to
be spilled onto disk during the execution of a query once the number of rows
reaches the value of this parameter. This decreases the memory usage during
the query execution. In a query that iterates over a collection of documents,
each row is a document. In a query that iterates over temporary values
(e.g. `FOR i IN 1..100`), each row is one of these temporary values.
This parameter is experimental and is only taken into account if a directory
for storing the temporary data is provided with the server startup option
`--temp.intermediate-results-path`.
Default value: 5000000 rows.


- `spillOverThresholdMemoryUsage`: Allows input data and intermediate results
to be spilled onto disk during the execution of a query once the memory usage
reaches the value of this parameter, in bytes. This decreases the memory usage
during the query execution. This parameter is experimental and is only taken
into account if a directory for storing the temporary data is provided with
the server startup option
`--temp.intermediate-results-path`.
Default value: 128MB.
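
As a minimal sketch of how these two options might be passed from arangosh,
assuming the usual `db._query(query, bindVars, options)` call form. The
collection `products` and its `price` attribute are hypothetical, the
threshold values are arbitrary illustrations, and the server is assumed to
have been started with `--temp.intermediate-results-path`. The `typeof db`
guard only exists so that the snippet also loads outside of arangosh.

```javascript
// Query options for spilling to disk (experimental).
// The values are illustrative, not recommendations.
const spillOptions = {
  spillOverThresholdNumRows: 1000000,               // spill after 1 million rows
  spillOverThresholdMemoryUsage: 64 * 1024 * 1024,  // spill after 64 MiB
  stream: true                                      // avoid building the full result in RAM
};

// SORT without LIMIT, the only case the feature currently applies to.
// "products" and "price" are hypothetical names.
const query = "FOR doc IN products SORT doc.price RETURN doc";

// `db` is only defined inside arangosh.
if (typeof db !== "undefined") {
  const cursor = db._query(query, {}, spillOptions);
  while (cursor.hasNext()) {
    print(cursor.next());
  }
}
```

The `stream` option is included because non-streaming queries still build up
the entire result in RAM, regardless of the spillover thresholds.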


With _createStatement (ArangoStatement)
---------------------------------------

60 changes: 60 additions & 0 deletions 3.10/programs-arangod-query.md
@@ -314,3 +314,63 @@ query results cache, as queries on system collections are internal to ArangoDB a
only use space in the query results cache unnecessarily.

The default value is *false*.


## AQL Query with spilling input data to disk

With the startup options described below, queries can temporarily store input
and intermediate results on disk to decrease the memory usage once a certain
threshold is reached.

Note: This feature is experimental and turned off by default. The query results
are still built up entirely in RAM on coordinators and single servers for
non-streaming queries. To avoid building up the entire query result in RAM,
use a streaming query.

The threshold for starting to spill data onto disk is either the number of
rows in the query input or the amount of memory used in bytes. Both can be set
as query options.

The main startup option that must be provided for this feature to be active is

`--temp.intermediate-results-path`

This option specifies the path to a directory that is used for the temporary
storage of data. If no path is provided, spilling data onto disk is not
activated, and the following options have no effect.
The directory specified here must not be located underneath the instance's
database directory.


`--temp.intermediate-results-encryption-hardware-acceleration`

Use Intel intrinsics-based encryption, which requires a CPU with the AES-NI
instruction set. If turned off, OpenSSL is used, which may use
hardware-accelerated encryption, too.
Default: *true*.

`--temp.intermediate-results-capacity`

Maximum capacity, in bytes, to use for ephemeral, intermediate results, that
is, the maximum size allowed for the temporary storage described above.
Default: *0* (unlimited).

`--temp.intermediate-results-encryption`

Encrypt ephemeral, intermediate results on disk.
Default: *false*.


`--temp.intermediate-results-spillover-threshold-num-rows`

Number of result rows after which a spillover from RAM to disk happens.
Default: *5000000*.

`--temp.intermediate-results-spillover-threshold-memory-usage`

Memory usage, in bytes, after which a spillover from RAM to disk happens.
Default: *128MB*.
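
The activation rules described in this section can be sketched as a small
model. This is not ArangoDB's implementation: the function names `pick` and
`shouldSpill`, the argument shapes, the use of `>=` as the comparison, and the
idea that the `spillOverThresholdNumRows` and `spillOverThresholdMemoryUsage`
query options take precedence over the defaults are all assumptions made for
illustration. The defaults are taken from the text, reading 128MB as
128 * 1024 * 1024 bytes.

```javascript
// Documented defaults; "128MB" is read as 128 * 1024 * 1024 bytes (assumption).
const DEFAULT_NUM_ROWS = 5000000;
const DEFAULT_MEMORY_USAGE = 128 * 1024 * 1024;

// Hypothetical helper: use the query option if it is set, else the default.
function pick(value, fallback) {
  return value === undefined ? fallback : value;
}

// Hypothetical model, not ArangoDB's implementation.
//   server:  { intermediateResultsPath: string | undefined }
//   options: query options, may contain the two spillOverThreshold* values
//   state:   { rows: number, bytes: number } processed so far
function shouldSpill(server, options, state) {
  // Without a temporary directory the feature is not activated at all.
  if (!server.intermediateResultsPath) {
    return false;
  }
  const maxRows = pick(options.spillOverThresholdNumRows, DEFAULT_NUM_ROWS);
  const maxBytes = pick(options.spillOverThresholdMemoryUsage, DEFAULT_MEMORY_USAGE);
  // Either threshold being reached is assumed to trigger the spillover.
  return state.rows >= maxRows || state.bytes >= maxBytes;
}

const server = { intermediateResultsPath: "/var/tmp/arangodb-spill" }; // hypothetical path
console.log(shouldSpill(server, {}, { rows: 10, bytes: 0 }));                               // false
console.log(shouldSpill(server, { spillOverThresholdNumRows: 5 }, { rows: 10, bytes: 0 })); // true
console.log(shouldSpill({}, { spillOverThresholdNumRows: 5 }, { rows: 10, bytes: 0 }));     // false
```

The last call shows the dependency spelled out above: without a directory set
via `--temp.intermediate-results-path`, the thresholds have no effect.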






47 changes: 47 additions & 0 deletions 3.10/release-notes-new-features310.md
@@ -427,6 +427,53 @@ arangoexport \
...
```


Query changes for decreasing memory usage
-----------------------------------------

With the startup options described below, queries can temporarily store input
and intermediate results on disk to decrease the memory usage once a certain
threshold is reached.

Note: This feature is experimental and turned off by default. The query results
are still built up entirely in RAM on coordinators and single servers for
non-streaming queries. To avoid building up the entire query result in RAM,
use a streaming query. Also, the feature currently only takes effect for AQL
`SORT` operations without `LIMIT`.

The threshold for starting to spill data onto disk is either the number of
rows in the query input or the amount of memory used in bytes. They are set
with the query options `spillOverThresholdNumRows` and
`spillOverThresholdMemoryUsage`, respectively.

The main startup option that must be provided for this feature to be active is

`--temp.intermediate-results-path`: specifies the path to a directory that is
used for the temporary storage of data. If no path is provided, spilling data
onto disk is not activated, and the following options have no effect.
The directory specified here must not be located underneath the instance's
database directory.

`--temp.intermediate-results-encryption-hardware-acceleration`: use Intel
intrinsics-based encryption, which requires a CPU with the AES-NI instruction
set. If turned off, OpenSSL is used, which may use hardware-accelerated
encryption, too. Default: true.

`--temp.intermediate-results-capacity`: maximum capacity, in bytes, to use for
ephemeral, intermediate results. Default: 0 (unlimited).

`--temp.intermediate-results-encryption`: encrypt ephemeral, intermediate
results on disk. Default: false.

`--temp.intermediate-results-spillover-threshold-num-rows`: number of
result rows after which a spillover from RAM to disk happens.
Default: 5000000.

`--temp.intermediate-results-spillover-threshold-memory-usage`: memory
usage, in bytes, after which a spillover from RAM to disk happens.
Default: 128MB.
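
To illustrate how the startup options fit together, the following sketch
assembles an `arangod` command line from a plain object. The helper
`buildTempArgs`, the directory `/var/tmp/arangodb-spill`, and the capacity
value are hypothetical and only serve as an example. The option names are the
ones listed above. The snippet is plain JavaScript and only prints the
command line, it does not start a server.

```javascript
// Hypothetical helper: turns { "option-name": value } into
// ["--temp.option-name", "value", ...] for the arangod command line.
function buildTempArgs(settings) {
  const result = [];
  for (const [name, value] of Object.entries(settings)) {
    result.push("--temp." + name, String(value));
  }
  return result;
}

const arangodArgs = buildTempArgs({
  // Hypothetical directory; it must not be underneath the database directory.
  "intermediate-results-path": "/var/tmp/arangodb-spill",
  // Illustrative cap of 10 GiB for the temporary storage.
  "intermediate-results-capacity": 10 * 1024 * 1024 * 1024,
  // Encrypt the temporary data on disk.
  "intermediate-results-encryption": true
});

// Print the resulting command line.
console.log(["arangod"].concat(arangodArgs).join(" "));
```

Only `intermediate-results-path` is required for the feature to be active.
The other two entries are optional and shown for completeness.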



Internal changes
----------------
