8000 Add docs for multi-dimensional index type (#633) · arangodb/docs@d13595a · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Commit d13595a

Browse files
author
Lars Maier
authored
Add docs for multi-dimensional index type (#633)
No arangosh example(s) yet, but a refactor of the entire indexing section is planned anyway. Furthermore, remove mention of volatile skiplist indexes
1 parent c998c9a commit d13595a

9 files changed

+166
-7
lines changed

3.7/indexing-persistent.md

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent
184184
index is consulted).
185185

186186
To fix persistent indexes after a language change, delete and re-create them.
187-
Skiplist indexes are not affected, because they are not persisted and
188-
automatically rebuilt on every server start.

3.8/indexing-persistent.md

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent
184184
index is consulted).
185185

186186
To fix persistent indexes after a language change, delete and re-create them.
187-
Skiplist indexes are not affected, because they are not persisted and
188-
automatically rebuilt on every server start.

3.9/http/indexes-multi-dim.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
---
2+
layout: default
3+
---
4+
Working with multi-dimensional Indexes
5+
=======================================
6+
7+
{% docublock post_api_index_zkd %}

3.9/indexing-multi-dim.md

Lines changed: 135 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,135 @@
1+
---
2+
layout: default
3+
description: A multi dimensional index allows to efficiently intersect multiple range queries
4+
---
5+
# Multi-dimensional indexes
6+
7+
The multi-dimensional index type (also called ZKD) provided by ArangoDB can be
8+
used to efficiently intersect multiple range queries.
9+
10+
A multi-dimensional index is setup by setting the index type to `"zkd"`.
11+
The `fields` attribute describes which fields are used as dimensions.
12+
The value of each dimension has to be a numeric (double) value.
13+
14+
## Querying documents within a 3D box
15+
16+
Assume we have documents in a collection `points` of the form
17+
18+
```json
19+
{"x": 12.9, "y": -284.0, "z": 0.02}
20+
```
21+
22+
and we want to query all documents that contained within a box defined by
23+
`[x0, x1] * [y0, y1] * [z0, z1]`.
24+
25+
To do so one creates a multi-dimensional index on the attributes `x`, `y` and
26+
`z`, e.g. in _arangosh_:
27+
28+
```js
29+
db.collection.ensureIndex({
30+
type: "zkd",
31+
fields: ["x", "y", "z"],
32+
fieldValueTypes: "double"
33+
});
34+
```
35+
36+
Unlike for other indexes the order of the fields does not matter.
37+
38+
`fieldValueTypes` is required and the only allowed value is `"double"`.
39+
Future extensions of the index will allow other types.
40+
41+
Now we can use the index in a query:
42+
43+
```js
44+
FOR p IN points
45+
FILTER x0 <= p.x && p.x <= x1
46+
FILTER y0 <= p.y && p.y <= y1
47+
FILTER z0 <= p.z && p.z <= z1
48+
RETURN p
49+
```
50+
51+
## Possible range queries
52+
53+
Having an index on a set of fields does not require you to specify a full range
54+
for every field. For each field you can decide if you want to bound
55+
it from both sides, from one side only (i.e. only an upper or lower bound)
56+
or not bound it at all.
57+
58+
Furth A93C ermore you can use any comparison operator. The index supports `<=` and `>=`
59+
naturally, `==` will be translated to the bound `[c, c]`. Strict comparison
60+
is translated to their non-strict counterparts and a post-filter is inserted.
61+
62+
```js
63+
FOR p IN points
64+
FILTER 2 <= p.x && p.x < 9
65+
FILTER y0 >= 80
66+
FILTER p.z == 4
67+
RETURN p
68+
```
69+
70+
## Example Use Case
71+
72+
If you build a calendar using ArangoDB you could create a collection for each user
73+
that contains her appointments. The documents would roughly look as follows:
74+
75+
```json
76+
{
77+
"from": 345365,
78+
"to": 678934,
79+
"what": "Dentist",
80+
}
81+
```
82+
83+
`from`/`to` are the timestamps when an appointment starts/ends. Having an
84+
multi-dimensional index on the fields `["from", "to"]` allows you to query
85+
for all appointments within a given time range efficiently.
86+
87+
### Finding all appointments within a time range
88+
89+
Given a time range `[f, t]` we want to find all appointments `[from, to]` that
90+
are completely contained in `[f, t]`. Those appointments clearly satisfy the
91+
condition
92+
93+
```
94+
f <= from and to <= t
95+
```
96+
97+
Thus our query would be:
98+
99+
```js
100+
FOR app IN appointments
101+
FILTER f <= app.from
102+
FILTER app.to <= t
103+
RETURN app
104+
```
105+
106+
### Finding all appointments that intersect a time range
107+
108+
Given a time range `[f, t]` we want to find all appointments `[from, to]` that
109+
intersect `[f, t]`. Two intervals `[f, t]` and `[from, to]` intersect if
110+
and only if
111+
112+
```
113+
a_2 <= b_1 and a_1 <= b_2
114+
```
115+
116+
Thus our query would be:
117+
118+
```js
119+
FOR app IN appointments
120+
FILTER f <= app.to
121+
FILTER app.from <= t
122+
RETURN app
123+
```
124+
125+
## Limitations
126+
127+
Currently there are a few limitations:
128+
129+
- Using array expansions for attributes is not possible (e.g. `array[*].attr`)
130+
- The `sparse` property is not supported.
131+
- You can only index numeric values that are representable as IEEE-754 double.
132+
- A high number of dimensions (more than 5) can impact the performance considerably.
133+
- The performance can vary depending on the dataset. Densely packed points can
134+
lead to a high number of seeks. This behavior is typical for indexing using
135+
space filling curves.

3.9/indexing-persistent.md

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -184,5 +184,3 @@ the sort order of those which are returned can be wrong (whenever the persistent
184184
index is consulted).
185185

186186
To fix persistent indexes after a language change, delete and re-create them.
187-
Skiplist indexes are not affected, because they are not persisted and
188-
automatically rebuilt on every server start.

3.9/indexing-which-index.md

Lines changed: 19 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,24 @@ different usage scenarios:
9494
result TTL indexes will likely not be used for filtering and sort operations in user-land
9595
AQL queries.
9696

97+
- **multi-dimensional index** (ZKD): a multi dimensional index allows to
98+
efficiently intersect multiple range queries. Typical use cases are querying
99+
intervals that intersect a given point or interval. For example, if intervals
100+
are stored in documents like
101+
102+
```json
103+
{ "from": 12, "to": 45 }
104+
```
105+
106+
then you can create an index over `from, to` utilize it with this query:
107+
108+
```js
109+
FOR i IN intervals FILTER i.from <= t && t <= i.to RETURN i
110+
```
111+
112+
Currently only floating-point numbers (doubles) are supported as underlying
113+
type for each dimension.
114+
97115
- **Geo index**: the geo index provided by ArangoDB allows searching for documents
98116
within a radius around a two-dimensional earth coordinate (point), or to
EED3
99117
find documents with are closest to a point. Document coordinates can either
@@ -110,7 +128,7 @@ different usage scenarios:
110128
and a SORT or FILTER statement is used in conjunction with the distance
111129
function.
112130

113-
- **fulltext index**: a fulltext index can be used to index all words contained in
131+
- **fulltext index**: a fulltext index can be used to index all words contained in
114132
a specific attribute of all documents in a collection. Only words with a
115133
(specifiable) minimum length are indexed. Word tokenization is done using
116134
the word boundary analysis provided by libicu, which is taking into account

3.9/indexing.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,5 +16,6 @@ There are special sections for
1616
- [Persistent Indexes](indexing-persistent.html)
1717
- [TTL Indexes](indexing-ttl.html)
1818
- [Fulltext Indexes](indexing-fulltext.html)
19+
- [Multi-dimensional Indexes](indexing-multi-dim.html)
1920
- [Geo-spatial Indexes](indexing-geo.html)
2021
- [Vertex-centric Indexes](indexing-vertex-centric.html)

_data/3.9-http.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,8 @@
8282
href: indexes-persistent.html
8383
- text: TTL
8484
href: indexes-ttl.html
85+
- text: Multi-dimensional
86+
href: indexes-multi-dim.html
8587
- text: Geo-Spatial
8688
href: indexes-geo.html
8789
- text: Fulltext

_data/3.9-manual.yml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,8 @@
291291
href: indexing-ttl.html
292292
- text: Fulltext Indexes
293293
href: indexing-fulltext.html
294+
- text: Multi-dimensional Indexes
295+
href: indexing-multi-dim.html
294296
- text: Geo-spatial Indexes
295297
href: indexing-geo.html
296298
- text: Vertex Centric Indexes

0 commit comments

Comments
 (0)
0