@@ -6,115 +6,122 @@ this extra collection to create a result from two collections.
6
6
7
7
First of all, we'll query a few users together with their friends' ids. For that,
8
8
we'll use all *relations* that have a value of *friend* in their *type* attribute.
9
- Relationships are established by using the *from * and *to * attributes in the
10
- *relations* collection, which point to the *id * values in the *users* collection.
9
+ Relationships are established by using the *friendOf * and *thisUser * attributes in the
10
+ *relations* collection, which point to the *userId * values in the *users* collection.
11
11
12
12
!SUBSECTION Join tuples
13
13
14
- We'll start with a SQL-ish result set and return each tuple (user name, friend id )
14
+ We'll start with a SQL-ish result set and return each tuple (user name, friends userId )
15
15
10000
separately. The AQL query to generate such result is:
16
16
17
17
```js
18
- FOR u IN users
19
- FILTER u.active == true
20
- LIMIT 0, 4
21
- FOR f IN relations
22
- FILTER f.type == "friend" && f.from == u.id
23
- RETURN {
24
- "user" : u.name,
25
- "friendId" : f.to
18
+ FOR u IN users
19
+ FILTER u.active == true
20
+ LIMIT 0, 4
21
+ FOR f IN relations
22
+ FILTER f.type == "friend" && f.friendOf == u.userId
23
+ RETURN {
24
+ "user" : u.name,
25
+ "friendId" : f.thisUser
26
26
}
27
27
28
- [
29
- {
30
- "user" : "Abigail",
31
- "friendId" : 108
32
- },
33
- {
34
- "user" : "Abigail",
35
- "friendId" : 102
36
- },
37
- {
38
- "user" : "Abigail",
39
- "friendId" : 106
40
- },
41
- {
42
- "user" : "Fred",
43
- "friendId" : 209
44
- },
45
- {
46
- "user" : "Mary",
47
- "friendId" : 207
48
- },
49
- {
50
- "user" : "Mary",
51
- "friendId" : 104
52
- },
53
- {
54
- "user" : "Mariah",
55
- "friendId" : 203
56
- },
57
- {
58
- "user" : "Mariah",
59
- "friendId" : 205
60
- }
28
+ [
29
+ {
30
+ "user" : "Abigail",
31
+ "friendId" : 108
32
+ },
33
+ {
34
+ "user" : "Abigail",
35
+ "friendId" : 102
36
+ },
37
+ {
38
+ "user" : "Abigail",
39
+ "friendId" : 106
40
+ },
41
+ {
42
+ "user" : "Fred",
43
+ "friendId" : 209
44
+ },
45
+ {
46
+ "user" : "Mary",
47
+ "friendId" : 207
48
+ },
49
+ {
50
+ "user" : "Mary",
51
+ "friendId" : 104
52
+ },
53
+ {
54
+ "user" : "Mariah",
55
+ "friendId" : 203
56
+ },
57
+ {
58
+ "user" : "Mariah",
59
+ "friendId" : 205
60
+ }
61
61
]
62
62
```
63
63
64
+ We iterate over the collection users. Only the 'active' users will be examined.
65
+ For each of these users we will search for up to 4 friends. We locate friends
66
+ by comparing the *userId* of our current user with the *friendOf* attribute of the
<
8000
/code>
67
+ *relations* document. For each of those relations found we return the users name
68
+ and the userId of the friend.
69
+
70
+
64
71
!SUBSECTION Horizontal lists
65
72
66
73
67
- Note that in the above result, a user might be returned multiple times. This is the
74
+ Note that in the above result, a user can be returned multiple times. This is the
68
75
SQL way of returning data. If this is not desired, the friends' ids of each user
69
76
can be returned in a horizontal list. This will return each user at most once.
70
77
71
78
The AQL query for doing so is:
72
79
73
80
```js
74
- FOR u IN users
75
- FILTER u.active == true LIMIT 0, 4
76
- RETURN {
77
- "user" : u.name,
81
+ FOR u IN users
82
+ FILTER u.active == true LIMIT 0, 4
83
+ RETURN {
84
+ "user" : u.name,
78
85
"friendIds" : (
79
- FOR f IN relations
80
- FILTER f.from == u.id && f.type == "friend"
81
- RETURN f.to
86
+ FOR f IN relations
87
+ FILTER f.friendOf == u.userId && f.type == "friend"
88
+ RETURN f.thisUser
82
89
)
83
90
}
84
91
85
- [
86
- {
87
- "user" : "Abigail",
88
- "friendIds" : [
89
- 108,
90
- 102,
91
- 106
92
- ]
93
- },
94
- {
95
- "user" : "Fred",
96
- "friendIds" : [
97
- 209
98
- ]
99
- },
100
- {
101
- "user" : "Mary",
102
- "friendIds" : [
103
- 207,
104
- 104
105
- ]
106
- },
107
- {
108
- "user" : "Mariah",
109
- "friendIds" : [
110
- 203,
111
- 205
112
- ]
<
3E95
tr class="diff-line-row">113
- }
92
+ [
93
+ {
94
+ "user" : "Abigail",
95
+ "friendIds" : [
96
+ 108,
97
+ 102,
98
+ 106
99
+ ]
100
+ },
101
+ {
102
+ "user" : "Fred",
103
+ "friendIds" : [
104
+ 209
105
+ ]
106
+ },
107
+ {
108
+ "user" : "Mary",
109
+ "friendIds" : [
110
+ 207,
111
+ 104
112
+ ]
113
+ },
114
+ {
115
+ "user" : "Mariah",
116
+ "friendIds" : [
117
+ 203,
118
+ 205
119
+ ]
120
+ }
114
121
]
115
122
```
116
123
117
- In this query we are still iterating over the users in the *users* collection
124
+ In this query we are still iterating over the users in the *users* collection
118
125
and for each matching user we are executing a sub-query to create the matching
119
126
list of related users.
120
127
@@ -124,48 +131,112 @@ To not only return friend ids but also the names of friends, we could "join" the
124
131
*users* collection once more (something like a "self join"):
125
132
126
133
```js
127
- FOR u IN users
128
- FILTER u.active == true
129
- LIMIT 0, 4
130
- RETURN {
131
- "user" : u.name,
134
+ FOR u IN users
135
+ FILTER u.active == true
136
+ LIMIT 0, 4
137
+ RETURN {
138
+ "user" : u.name,
132
139
"friendIds" : (
133
- FOR f IN relations
134
- FILTER f.from == u.id && f.type == "friend"
135
- FOR u2 IN users
136
- FILTER f.to == u2.id
140
+ FOR f IN relations
141
+ FILTER f.friendOf == u.userId && f.type == "friend"
142
+ FOR u2 IN users
143
+ FILTER f.thisUser == u2.useId
137
144
RETURN u2.name
138
- )
139
- }
140
-
141
- [
142
- {
143
- "user" : "Abigail",
144
- "friendIds" : [
145
- "Jim",
146
- "Jacob",
147
- "Daniel"
148
- ]
149
- },
150
- {
151
- "user" : "Fred",
152
- "friendIds" : [
153
- "Mariah"
154
- ]
155
- },
156
- {
157
- "user" : "Mary",
158
- "friendIds" : [
159
- "Isabella",
160
- "Michael"
161
- ]
162
- },
163
- {
164
- "user" : "Mariah",
165
- "friendIds" : [
166
- "Madison",
167
- "Eva"
168
- ]
169
- }
145
+ )
146
+ }
147
+
148
+ [
149
+ {
150
+ "user" : "Abigail",
151
+ "friendIds" : [
152
+ "Jim",
153
+ "Jacob",
154
+ "Daniel"
155
+ ]
156
+ },
157
+ {
158
+ "user" : "Fred",
159
+ "friendIds" : [
160
+ "Mariah"
161
+ ]
162
+ },
163
+ {
164
+ "user" : "Mary",
165
+ "friendIds" : [
166
+ "Isabella",
167
+ "Michael"
168
+ ]
169
+ },
170
+ {
171
+ "user" : "Mariah",
172
+ "friendIds" : [
173
+ "Madison",
174
+ "Eva"
175
+ ]
176
+ }
177
+ ]
178
+ ```
179
+
180
+ This query will then again in term fetch the clear text name of the
181
+ friend from the users collection. So here we iterate the users collection,
182
+ and for each hit the relations collection, and for each hit once more the
183
+ users collection.
184
+
185
+ !SUBSECTION Outer joins
186
+
187
+ Lets find the lonely people in our database - those without friends.
188
+
189
+ ```js
190
+
191
+ FOR user IN users
192
+ LET friendList = (
193
+ FOR f IN relations
194
+ FILTER f.friendOf == u.userId
195
+ RETURN 1
196
+ )
197
+ FILTER LENGTH(friendList) == 0
198
+ RETURN { "user" : user.name }
199
+
200
+ [
201
+ {
202
+ "user" : "Abigail"
203
+ },
204
+ {
205
+ "user" : "Fred"
206
+ }
207
+ ]
208
+ ```
209
+
210
+ So, for each user we pick the list of her friends and count them. The ones where
211
+ count equals zero are the lonely people. Using *RETURN 1* in the sub-query
212
+ saves even more precious CPU cycles and gives the optimizer more alternatives.
213
+
214
+ !SUBSECTION Pitfalls
215
+
216
+ Since we're free of schematas, there is no way to tell the format of the
217
+ documents. So, if your documents don't contain an attribute, it defaults to
218
+ null. We can however check our data for accuracy like this:
219
+
220
+ ```
221
+ RETURN LENGTH(FOR u IN users FILTER u.userId == null RETURN 1)
222
+ [
223
+ 10000
224
+ ]
225
+ RETURN LENGTH(FOR f IN relations FILTER f.friendOf == null RETURN 1)
226
+ [
227
+ 10000
170
228
]
171
229
```
230
+
231
+ So that the above queries return 10k matches each, the result of i.e. the Join
232
+ tuples query will become 100.000.000 items large and will use much memory plus
233
+ compution time. So it is generaly a good idea to revalidate that the criteria
234
+ for your join conditions exist.
235
+
236
+ Using indices on the properties can speed up the operation significantly.
237
+ ou can use the explain helper to revalidate your query actualy uses them.
238
+
239
+ If you work with joins on edge collections you would typicaly aggregate over
240
+ the internal fields *_id*, *_from* and *_to* (where *_id* equals *userId*,
241
+ *_from* *friendOf* and *_to* would be *thisUser* in our examples). Arangodb
242
+ implicitely creates indices on them.
0 commit comments