8000 Collect aggregate average in arango cluster returns null · Issue #19803 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Collect aggregate average in arango cluster returns null #19803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
thomashassan35 opened this issue Sep 21, 2023 &middo 8000 t; 1 comment
Open

Collect aggregate average in arango cluster returns null #19803

thomashassan35 opened this issue Sep 21, 2023 · 1 comment
Labels
1 Bug 3 AQL Query language related 3 Cluster 3 Optimizer AQL query optimizer is involved

Comments

@thomashassan35
Copy link
thomashassan35 commented Sep 21, 2023

My Environment

  • ArangoDB Version: 3.10.8
  • Deployment Mode: Cluster
  • Deployment Strategy: ArangoDB Starter in Docker
  • Configuration: 3 cluster nodes
  • Infrastructure: own
  • Operating System: Ubuntu 20.04
  • Total RAM in your machine: 8Gb/node
  • Disks in use: SSD
  • Used Package: Docker

Component, Query & Data

Affected feature:

Aggregation query

AQL query (if applicable):

FOR alias_a IN avatar
  FILTER alias_a.domain == "http://www.example.com/"
  COLLECT AGGREGATE my_average = AVG(alias_a.value)
  LET aggregates = [{ "name":"average", "value":my_average}]
  FOR agg in aggregates return agg

Alternative query (same behaviour /error)

FOR alias_a IN avatar
  FILTER alias_a.domain == "http://www.example.com/" AND alias_a.value !=null
  COLLECT AGGREGATE my_average = AVG(alias_a.value) 
  LET aggregates = [{ "name":"average", "value":my_average}]
  FOR agg in aggregates return agg

AQL explain and/or profile (if applicable):

Query String (250 chars, cacheable: false):
 FOR alias_a IN avatar 
 FILTER alias_a.domain == "http://www.example.com/" AND alias_a.value !=null 
 COLLECT AGGREGATE my_average = AVG(alias_a.value)  LET aggregates = [{    "name":"average",    
 "value":my_average}] FOR agg in aggregates return agg

Execution plan:
 Id   NodeType            Site  Calls   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode       DBS       3       3          0       0.00001   * ROOT
 10   IndexNode           DBS       3       2          6       0.00028     - FOR alias_a IN avatar   /* persistent index scan, index scan + document lookup (projections: `value`), 3 shard(s) */    FILTER (alias_a.`value` != null)   /* early pruning */   
  5   CalculationNode     DBS       3       2          0       0.00001       - LET #6 = alias_a.`value`   /* attribute expression */   /* collections used: alias_a : avatar */
 16   CollectNode         DBS       3       3          0       0.00001       - COLLECT AGGREGATE #8 = AVERAGE_STEP1(#6)   /* sorted */
 14   RemoteNode          COOR      2       3          0       0.00022       - REMOTE
 15   GatherNode          COOR      2       3          0       0.00001       - GATHER   /* parallel, unsorted */
  6   CollectNode         COOR      2       1          0       0.00001       - COLLECT AGGREGATE my_average = AVERAGE_STEP2(#8)   /* sorted */
  7   CalculationNode     COOR      2       1          0       0.00001       - LET aggregates = [ { "name" : "average", "value" : my_average } ]   /* simple expression */
  8   EnumerateListNode   COOR      2       1          0       0.00001       - FOR agg IN aggregates   /* list iteration */
  9   ReturnNode          COOR      2       1          0       0.00001         - RETURN agg

Indexes used:
 By   Name                       Type         Collection   Unique   Sparse   Cache   Selectivity   Fields         Stored values   Ranges
 10   avatar_domain_hash_index   persistent   avatar       false    false    false        0.14 %   [ `domain` ]   [  ]            (alias_a.`domain` == "http://www.example.com/")

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   use-indexes
  6   remove-filter-covered-by-index
  7   scatter-in-cluster
  8   distribute-filtercalc-to-cluster
  9   move-filters-into-enumerate
 10   remove-unnecessary-remote-scatter
 11   collect-in-cluster
 12   reduce-extraction-to-projection
 13   parallelize-gather

Query Statistics:
 Writes Exec   Writes Ign   Scan Full   Scan Index   Cache Hits/Misses   Filtered   Requests   Peak Mem [b]   Exec Time [s]
           0            0           0            8               0 / 0          6          3          32768         0.00437

Query Profile:
 Query Stage           Duration [s]
 initializing               0.00000
 parsing                    0.00008
 optimizing ast             0.00001
 loading collections        0.00000
 instantiating plan         0.00004
 optimizing plan            0.00364
 executing                  0.00060
 finalizing                 0.00029

Dataset:

[
	{
		"_key": "2233",
		"_id": "avatar/2233",
		"_rev": "_gpdC9MW---",
		"domain": "http://www.example.com/",
		"value": 200
	},
	{
		"_key": "2245",
		"_id": "avatar/2245",
		"_rev": "_gpdDE4W---",
		"domain": "http://www.example.com/",
		"value": 100
	},
	{
		"_key": "2253",
		"_id": "avatar/2253",
		"_rev": "_gpdDMLu---",
		"domain": "http://www.example.com/",
		"value": null
	}
]

Size of your Dataset on disk:

N/A too small

Replication Factor & Number of Shards (Cluster only):

Only one collection (avatar)

Replication factor: 3      
s10010097 4183456 DBServer0001 DBServer0003 DBServer0002
s10010098 4183761 DBServer0001 DBServer0003 DBServer0002
s10010099 4183561 DBServer0001 DBServer0003 DBServer0002

Steps to reproduce

  1. load the small datasets in a collection named 'avatar'
  2. run the aggregation query

Problem:
the query returns a null value for the aggregation result
In single server the issue does not exist and the same dataset produces the expected

Expected result:
the query should ignore any null values and return the average between 100 and 200 -> 150
using MAX instead of average does work and produce the expected result (200).

@Simran-B
Copy link
Contributor

I'm able to reproduce the problem using 3 shards (but not with a single shard), even with an additional safeguard AVG(NOT_NULL(alias_a.value, 0)), in 3.10 but also 3.12.devel. Changing the attribute value from null to a number fixes it. You have an index on domain but the problem also occurs without it.

@Simran-B Simran-B added 1 Bug 3 AQL Query language related 3 Cluster 3 Optimizer AQL query optimizer is involved labels Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Bug 3 AQL Query language related 3 Cluster 3 Optimizer AQL query optimizer is involved
Projects
None yet
Development

No branches or pull requests

2 participants
0