SORT operation unexpectedly reduces documents count · Issue #20127 · arangodb/arangodb · GitHub
Closed
Encouse opened this issue Nov 13, 2023 · 4 comments
Encouse commented Nov 13, 2023

My Environment

  • ArangoDB Version: 3.10.4
  • Deployment Mode: Cluster
  • Deployment Strategy: ArangoDB Starter
  • Configuration: 5 Agents, 3 DB-Servers, 3 Coordinators
  • Infrastructure: own virtual machines, one machine per node
  • Operating System: Ubuntu 20.04
  • Total RAM in your machine: 16 GB for DB-Servers and Coordinators, 4 GB for Agents
  • Disks in use: SSD
  • Used Package: Debian or Ubuntu .deb

Component, Query & Data

Affected feature:
AQL query using web interface

AQL query (if applicable):

     WITH baremetal, virtual_machine, cable  FOR defaultDoc IN allObjectsView SEARCH defaultDoc.default.deleted.value == false  OPTIONS {collections: ['baremetal', 'virtual_machine', 'cable']} SORT defaultDoc.default.name.value LIMIT 3120, 10 RETURN 1

AQL explain and/or profile (if applicable):

EXPLAIN:
Query String (251 chars, cacheable: true):
WITH baremetal, virtual_machine, cable FOR defaultDoc IN allObjectsView SEARCH
defaultDoc.default.deleted.value == false OPTIONS {collections: ['baremetal', 'virtual_machine',
'cable']} SORT defaultDoc.default.name.value LIMIT 3120, 10 RETURN 1

Execution plan:
 Id   NodeType            Site      Est.   Comment
  1   SingletonNode       DBS          1   * ROOT
  2   EnumerateViewNode   DBS     263237   - FOR defaultDoc IN allObjectsView SEARCH (defaultDoc.default.deleted.value == false)   /* view query */
  3   CalculationNode     DBS     263237   - LET #1 = defaultDoc.default.name.value   /* attribute expression */
  4   SortNode            DBS     263237   - SORT #1 ASC   /* sorting strategy: constrained heap */
 12   LimitNode           DBS       3130   - LIMIT 0, 3130
 10   RemoteNode          COOR      3130   - REMOTE
 11   GatherNode          COOR      3130   - GATHER #1 ASC   /* parallel */
  5   LimitNode           COOR        10   - LIMIT 3120, 10
  6   CalculationNode     COOR        10   - LET #3 = 1   /* json expression */   /* const assignment */
  7   ReturnNode          COOR        10   - RETURN #3

Indexes used:
none

Optimization rules applied:
Id RuleName
1 handle-arangosearch-views
2 scatter-in-cluster
3 distribute-filtercalc-to-cluster
4 distribute-sort-to-cluster
5 remove-unnecessary-remote-scatter
6 sort-limit
7 parallelize-gather

53 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00060
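
The plan above splits the user's LIMIT 3120, 10 into two stages: a constrained-heap sort followed by LIMIT 0, 3130 on the DB-Server, and LIMIT 3120, 10 on the Coordinator. The Python sketch below illustrates that general idea only. The data is made up, the names `db_server_stage` and `coordinator_stage` are hypothetical, and `heapq.nsmallest` merely stands in for a constrained heap; none of this is ArangoDB's actual implementation.

```python
import heapq
import random


def db_server_stage(rows, offset, count):
    """Return the smallest offset + count rows in ascending order.

    This mirrors SORT plus the pushed-down LIMIT 0, offset + count:
    rows beyond that bound can never appear in the requested page.
    """
    return heapq.nsmallest(offset + count, rows)


def coordinator_stage(sorted_rows, offset, count):
    """Apply the original LIMIT offset, count to the pre-sorted rows."""
    return sorted_rows[offset:offset + count]


# Hypothetical data: 34149 distinct sort keys in random order
# (the row count is taken from the PROFILE output, the values are invented).
rows = random.sample(range(1_000_000), 34149)
offset, count = 3120, 10

pushed_down = db_server_stage(rows, offset, count)   # 3130 rows
page = coordinator_stage(pushed_down, offset, count)  # 10 rows
```

If both stages behave as intended, `page` is identical to sorting everything and slicing `[3120:3130]`, which is why the push-down is safe in principle.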

PROFILE:
Query String (251 chars, cacheable: false):
WITH baremetal, virtual_machine, cable FOR defaultDoc IN allObjectsView SEARCH
defaultDoc.default.deleted.value == false OPTIONS {collections: ['baremetal', 'virtual_machine',
'cable']} SORT defaultDoc.default.name.value LIMIT 3120, 10 RETURN 1

Execution plan:
 Id   NodeType            Site   Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode       DBS        2       2          0       0.00002   * ROOT
  2   EnumerateViewNode   DBS       35   34149          0       0.12129   - FOR defaultDoc IN allObjectsView SEARCH (defaultDoc.default.deleted.value == false)   /* view query */
  3   CalculationNode     DBS       35   34149          0       0.01864   - LET #1 = defaultDoc.default.name.value   /* attribute expression */
  4   SortNode            DBS        2    2000          0       0.01351   - SORT #1 ASC   /* sorting strategy: constrained heap */
 12   LimitNode           DBS        2    2000          0       0.00005   - LIMIT 0, 3130
 10   RemoteNode          COOR       6    2000          0       0.00288   - REMOTE
 11   GatherNode          COOR       3    2000          0       0.02103   - GATHER #1 ASC   /* parallel */
  5   LimitNode           COOR       3       0          0       0.00001   - LIMIT 3120, 10
  6   CalculationNode     COOR       3       0          0       0.00000   - LET #3 = 1   /* json expression */   /* const assignment */
  7   ReturnNode          COOR       3       0          0       0.00001   - RETURN #3

Indexes used:
none

Optimization rules applied:
Id RuleName
1 handle-arangosearch-views
2 scatter-in-cluster
3 distribute-filtercalc-to-cluster
4 distribute-sort-to-cluster
5 remove-unnecessary-remote-scatter
6 sort-limit
7 parallelize-gather

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Requests   Peak Mem [b]   Exec Time [s]
           0            0           0        34149               0 / 0          0          6        3047424         0.14013

Query Profile:
Query Stage Duration [s]
initializing 0.00001
parsing 0.00021
optimizing ast 0.00001
loading collections 0.00001
instantiating plan 0.00004
optimizing plan 0.00446
executing 0.13540
finalizing 0.00162
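
The PROFILE numbers can be checked with simple arithmetic: 2000 rows leave the SortNode, and the Coordinator's LimitNode then skips 3120 rows, so nothing remains. A minimal sketch, where `rows_returned` is a hypothetical helper and not part of any ArangoDB API:

```python
def rows_returned(available, offset, count):
    """Rows left after LIMIT offset, count is applied to `available` rows."""
    return max(0, min(count, available - offset))


# Observed: the SortNode emitted only 2000 rows.
observed = rows_returned(2000, 3120, 10)

# Per the plan: LIMIT 0, 3130 on the DB-Server should pass on 3130 rows.
planned = rows_returned(3130, 3120, 10)

print(observed, planned)  # 0 10
```

Any offset of 2000 or more yields an empty page under the observed cap, which matches the 0 items reported for the final LimitNode.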

Dataset:
I can't provide the dataset because it is private.

Size of your Dataset on disk:
3GB

Replication Factor & Number of Shards (Cluster only):
Replication factor - 3
Shards - 1

Steps to reproduce

  1. Execute a query on a search-alias View using SORT and LIMIT, with multiple collections listed in the SEARCH OPTIONS

Problem:
SORT limits the result to 2000 documents. If I remove SORT, the query behaves normally and works on the whole dataset provided by the search-alias View.

Expected result:
SORT does not limit the number of results.

@jsteemann
Contributor

Hi @Encouse , thanks for reporting this issue.
A few issues related to LIMIT have been fixed since 3.10.4. I can't say if your exact issue is covered by one of these fixes, but it would be nice if you try out the latest 3.10 release (3.10.11) and retry with that.
Thanks!

@Encouse
Author
Encouse commented Nov 14, 2023

Thanks, I'll check it out and leave additional comments!

@Encouse
Author
Encouse commented Nov 21, 2023

Now it works as expected, thanks!

Encouse closed this as completed Nov 21, 2023
@jsteemann
Contributor

Happy to read this!
Thanks for checking!
