8000 [Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] and multiple startvertex and LIMIT 1000 is very slow · Issue #9040 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

[Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] and multiple startvertex and LIMIT 1000 is very slow #9040

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pocketwalker opened this issue May 20, 2019 · 5 comments
Labels

Comments

@pocketwalker
Copy link
pocketwalker commented May 20, 2019

Pls do let me know if any further information is needed to help you understand my case:-)

AQL query (if applicable):
WITH mio_company, mio_people, mio_tag FOR greatstart IN ['mio_company/29fa8d1eb5c6951812eeac8a8930064a','mio_people/29fa8d1eb5c6951812eeac8a8930064a','mio_tag/29fa8d1eb5c6951812eeac8a8930064a','mio_company/b34b3ea1785359b0c20b1d3614024546','mio_people/b34b3ea1785359b0c20b1d3614024546','mio_tag/b34b3ea1785359b0c20b1d3614024546'] FOR v, e, p IN 1..2 ANY greatstart GRAPH 'mio_graph' FILTER e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FROM_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FAMILY_RELATIONS' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_BOARD_MEMBER_FROM_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_KEY_EXECUTIVE_FROM_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_PARENT_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_INVEST_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_SUPPLIER_TO_DISTRIBUTOR' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_CREDITOR_TO_BORROWER' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_VENDOR_TO_CLIENT' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_COMPETITOR_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_HOLDS_INVESTMENT_ARM' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_MERGED_ENTITY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_IS_KEY_MEMBER_OF_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_IS_SUPERVISORY_BOARD_MEMBER_OF_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_IS_BRANCH_OF_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_IS_SHAREHOLDER_OF_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_IS_LEGAL_PERSON_OF_COMPANY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_BELONGS_TO_INDUSTRY' OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_INDUSTRY_DOWNSTREAM_OF_INDUSTRY' FILTER v._id == 'mio_company/29fa8d1eb5c6951812eeac8a8930064a' OR v._id == 'mio_people/29fa8d1eb5c6951812eeac8a8930064a' OR v._id == 'mio_tag/29fa8d1eb5c6951812eeac8a8930064a' OR v._id == 'mio_company/b34b3ea1785359b0c20b1d3614024546' OR v._id == 'mio_people/b34b3ea1785359b0c20b1d3614024546' OR v._id == 'mio_tag/b34b3ea1785359b0c20b1d3614024546' LIMIT 1000 RETURN p

AQL explain (if applicable):
Query String:
WITH mio_company, mio_people, mio_tag
FOR greatstart IN
['mio_company/29fa8d1eb5c6951812eeac8a8930064a','mio_people/29fa8d1eb5c6951812eeac8a8930064a','mio_tag/29fa8d1eb5c6951812eeac8a8930064a','mio_company/b34b3ea1785359b0c20b1d3614024546','mio_people/b34b3ea1785359b0c20b1d3614024546','mio_tag/b34b3ea1785359b0c20b1d3614024546']
FOR v, e, p IN 1..2 ANY greatstart GRAPH 'mio_graph'
FILTER e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FROM_COMPANY'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FAMILY_RELATIONS'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_BOARD_MEMBER_FROM_COMPANY'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_KEY_EXECUTIVE_FROM_COMPANY'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_PARENT_COMPANY'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_INVEST_COMPANY'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_SUPPLIER_TO_DISTRIBUTOR'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_CREDITOR_TO_BORROWER'
OR e.type == 'KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_VENDOR_TO_CLIENT'
OR e.type =...

Execution plan:
Id NodeType Site Est. Comment
1 SingletonNode COOR 1 * ROOT
2 CalculationNode COOR 1 - LET #4 = [ "mio_company/29fa8d1eb5c6951812eeac8a8930064a", "mio_people/29fa8d1eb5c6951812eeac8a8930064a", "mio_tag/29fa8d1eb5c6951812eeac8a8930064a", "mio_company/b34b3ea1785359b0c20b1d3614024546", "mio_people/b34b3ea1785359b0c20b1d3614024546", "mio_tag/b34b3ea1785359b0c20b1d3614024546" ] /* json expression / / const assignment /
3 EnumerateListNode COOR 6 - FOR greatstart IN #4 /
list iteration /
4 TraversalNode COOR 36100 - FOR v /
vertex /, e / edge /, p / paths / IN 1..2 / min..maxPathDepth / ANY greatstart / startnode / GRAPH 'mio_graph'
12 CalculationNode COOR 36100 - LET #8 = ((e.type in SORTED_UNIQUE([ "KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FROM_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_FAMILY_RELATIONS", "KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_BOARD_MEMBER_FROM_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_PEOPLE_KEY_EXECUTIVE_FROM_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_PARENT_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_INVEST_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_SUPPLIER_TO_DISTRIBUTOR", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_CREDITOR_TO_BORROWER", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_VENDOR_TO_CLIENT", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_COMPETITOR_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_HOLDS_INVESTMENT_ARM", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_MERGED_ENTITY", "KNOWLEDGE_GRAPH_EDGE_TYPE_IS_KEY_MEMBER_OF_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_IS_SUPERVISORY_BOARD_MEMBER_OF_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_IS_BRANCH_OF_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_IS_SHAREHOLDER_OF_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_IS_LEGAL_PERSON_OF_COMPANY", "KNOWLEDGE_GRAPH_EDGE_TYPE_COMPANY_BELONGS_TO_INDUSTRY", "KNOWLEDGE_GRAPH_EDGE_TYPE_INDUSTRY_DOWNSTREAM_OF_INDUSTRY" ])) && (v._id in [ "mio_company/29fa8d1eb5c6951812eeac8a8930064a", "mio_people/29fa8d1eb5c6951812eeac8a8930064a", "mio_tag/29fa8d1eb5c6951812eeac8a8930064a", "mio_company/b34b3ea1785359b0c20b1d3614024546", "mio_people/b34b3ea1785359b0c20b1d3614024546", "mio_tag/b34b3ea1785359b0c20b1d3614024546" ])) /
simple expression */
6 FilterNode COOR 36100 - FILTER #8
9 LimitNode COOR 1000 - LIMIT 0, 1000
10 ReturnNode COOR 1000 - RETURN p

Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
4 edge mio_company_belongs_to_industry false false n/a [ _to ] base INBOUND
4 edge mio_company_belongs_to_industry false false n/a [ _from ] base OUTBOUND
4 edge mio_company_competitor_company false false n/a [ _to ] base INBOUND
4 edge mio_company_competitor_company false false n/a [ _from ] base OUTBOUND
4 edge mio_company_creditor_to_borrower false false n/a [ _to ] base INBOUND
4 edge mio_company_creditor_to_borrower false false n/a [ _from ] base OUTBOUND
4 edge mio_company_holds_investment_arm false false n/a [ _to ] base INBOUND
4 edge mio_company_holds_investment_arm false false n/a [ _from ] base OUTBOUND
4 edge mio_company_invest_company false false n/a [ _to ] base INBOUND
4 edge mio_company_invest_company false false n/a [ _from ] base OUTBOUND
4 edge mio_company_merged_entity false false n/a [ _to ] base INBOUND
4 edge mio_company_merged_entity false false n/a [ _from ] base OUTBOUND
4 edge mio_company_parent_company false false n/a [ _to ] base INBOUND
4 edge mio_company_parent_company false false n/a [ _from ] base OUTBOUND
4 edge mio_company_supplier_to_distributor false false n/a [ _to ] base INBOUND
4 edge mio_company_supplier_to_distributor false false n/a [ _from ] base OUTBOUND
4 edge mio_industry_downstream_of_industry false false n/a [ _to ] base INBOUND
4 edge mio_industry_downstream_of_industry false false n/a [ _from ] base OUTBOUND
4 edge mio_is_branch_of_company false false n/a [ _to ] base INBOUND
4 edge mio_is_branch_of_company false false n/a [ _from ] base OUTBOUND
4 edge mio_is_key_member_of_company false false n/a [ _to ] base INBOUND
4 edge mio_is_key_member_of_company false false n/a [ _from ] base OUTBOUND
4 edge mio_is_legal_person_of_company false false n/a [ _to ] base INBOUND
4 edge mio_is_legal_person_of_company false false n/a [ _from ] base OUTBOUND
4 edge mio_is_shareholder_of_company false false n/a [ _to ] base INBOUND
4 edge mio_is_shareholder_of_company false false n/a [ _from ] base OUTBOUND
4 edge mio_is_supervisory_board_member_of_company false false n/a [ _to ] base INBOUND
4 edge mio_is_supervisory_board_member_of_company false false n/a [ _from ] base OUTBOUND
4 edge mio_people_board_member_from_company false false n/a [ _to ] base INBOUND
4 edge mio_people_board_member_from_company false false n/a [ _from ] base OUTBOUND
4 edge mio_people_family_relations false false n/a [ _to ] base INBOUND
4 edge mio_people_family_relations false false n/a [ _from ] base OUTBOUND
4 edge mio_people_key_executive_from_company false false n/a [ _to ] base INBOUND
4 edge mio_people_key_executive_from_company false false n/a [ _from ] base OUTBOUND

Functions used:
Name Deterministic Cacheable Uses V8
SORTED_UNIQUE true true false

Traversals on graphs:
Id Depth Vertex collections Edge collections Options Filter conditions
4 1..2 mio_company, mio_people, mio_tag mio_company_belongs_to_industry, mio_company_competitor_company, mio_company_creditor_to_borrower, mio_company_holds_investment_arm, mio_company_invest_company, mio_company_merged_entity, mio_company_parent_company, mio_company_supplier_to_distributor, mio_industry_downstream_of_industry, mio_is_branch_of_company, mio_is_key_member_of_company, mio_is_legal_person_of_company, mio_is_shareholder_of_company, mio_is_supervisory_board_member_of_company, mio_people_board_member_from_company, mio_people_family_relations, mio_people_key_executive_from_company uniqueVertices: none, uniqueEdges: path

Optimization rules applied:
Id RuleName
1 move-calculations-up
2 move-filters-up
3 move-calculations-up-2
4 move-filters-up-2
5 replace-or-with-in
6 sort-in-values
7 fuse-filters

@pocketwalker pocketwalker changed the title [Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] is very slow [Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] and LIMIT 1000 is very slow May 20, 2019
@pocketwalker pocketwalker changed the title [Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] and LIMIT 1000 is very slow [Arangodb3.4 on 4-node cluster]My graph query with hop[1..2] and multiple startvertex and LIMIT 1000 is very slow May 20, 2019
@jsteemann
Copy link
Contributor

@pocketwalker:
There are a few things that come to my mind here:

  • the traversal uses 17 (or so) edge collections. For each vertex visited during the traversal, there will be 17 checks for connected vertices simply because there are 17 edge collections. Reducing the number of edge collections should help.
  • the query uses the ANY direction for the traversal, which means for each vertex there will be a check for incoming and outgoing edges. As there are 17 edge collections involved, this will mean 34 checks for connected edges for each vertex. This is more expensive than using simply INBOUND or OUTBOUND. By the way, it is possible to specify the direction for each involved edge collection seperately (https://www.arangodb.com/docs/stable/aql/graphs-traversals.html#working-with-collection-sets).
  • the filter condition is not executed during the traversal but afterwards. That means the entire traversal result will be built up first and only afterwards the filters get applied. It would be more efficient to have the filters executed during the traversal. Please check the filtering section in the manual for more information. Apart from that, I am not sure if I understand the filter conditions on the vertices. Are the filters on the vertices actually necessary given that these ids are also the start vertices' ids?
  • the traversal returns the full paths, which for a huge result set (size unknown to me) may mean returning a lot of data. You could try to make it return just the vertices or the edges, or a custom fraction of the path to see if that helps reducing the result set sizes (if size is a problem here).

@pocketwalker
Copy link
Author
pocketwalker commented May 20, 2019

@jsteemann thanks. I just made some changes on FILTERs and removed all edgeType checks, but it's still too slow.
Here's my query,
WITH mio_company, mio_people, mio_tag FOR greatstart IN ['mio_company/29fa8d1eb5c6951812eeac8a8930064a', 'mio_company/b34b3ea1785359b0c20b1d3614024546'] FOR v, e, p IN 1..2 ANY greatstart GRAPH 'mio_graph' FILTER p.vertices[0]._id == 'mio_company/29fa8d1eb5c6951812eeac8a8930064a' OR p.vertices[0]._id== 'mio_company/b34b3ea1785359b0c20b1d3614024546' FILTER p.vertices[2]._id == 'mio_company/29fa8d1eb5c6951812eeac8a8930064a' OR p.vertices[2]._id == 'mio_company/b34b3ea1785359b0c20b1d3614024546' LIMIT 1000 RETURN p
--Are the filters on the vertices actually necessary given that these ids are also the start vertices' ids? Yes if my understanding is correct. I want to get all paths whose v[0] and v[2] (1..2 relationship) are in the given nodeSet.
Any idea?

@jsteemann
Copy link
Contributor

@pocketwalker : none apart from the ones I summarized above.

@pocketwalker
Copy link
Author

Thanks for your reply! It can be closed now.

@dothebart dothebart added 2 Solved Resolution and removed Waiting User Reply labels Jun 3, 2019
@dothebart
Copy link
Contributor

Closing as solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants
0