Does _totalEdgesCount in src/arangod/Pregel/Conductor.cpp represents the total edge number in graph? · Issue #10899 · arangodb/arangodb · GitHub

Open
jiangyanwei2 opened this issue Jan 16, 2020 · 0 comments
Labels
1 Question 3 Pregel Graph processing

Comments

jiangyanwei2 commented Jan 16, 2020

My Environment

  • ArangoDB Version: 3.4.8
  • Storage Engine: RocksDB
  • Deployment Mode: Cluster, 3 nodes with 3 Agents, 3 DB-Servers and 3 Coordinators
  • Deployment Strategy: ArangoDB Starter in Docker
  • Configuration: default
  • Infrastructure: own
  • Operating System: CentOS
  • Total RAM in your machine: 128G
  • Disks in use: SSD

Size of your Dataset on disk:

one vertex collection: 374M
one edge collection: 37G

Dataset:

The dataset contains a single vertex collection called users with 41,652,230 documents like the following:

 {
    "_key": "12",
    "_id": "users/12",
    "_rev": "_Z4it3Eu--K"
  }

There is also a single edge collection, follow, which models the follower relationship and holds 1,468,365,182 documents like the following:

  {
    "_key": "6842768634",
    "_id": "follow/6842768634",
    "_from": "users/324",
    "_to": "users/20",
    "_rev": "_Z4FeNU---u",
    "vertex": 324
  }

The shard key is ["vertex"].
I confirmed that there are no invalid edges.

Replication Factor & Number of Shards (Cluster only):

Replication Factor 1
Shards 81

Problem:
When I run a Pregel algorithm, the status I receive is as follows:
[image: pregelissue]

The vertexCount is 41,652,230, which matches the vertex collection, but the edgeCount is 16,695,168, which is far less than the edge collection (1.4 billion edges).
Whichever Pregel algorithm I run, the edgeCount is the same. The logs are as follows:
[image: pregelissue1]
[image: pregelissue2]
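
To put the gap in numbers, here is a small sketch that uses only the figures quoted in this report. The variable names are illustrative, and the script does not query ArangoDB; it only does the arithmetic.

```python
# Figures quoted in this report (no database access involved).
vertex_count = 41_652_230        # documents in `users`, equal to Pregel's vertexCount
edge_docs = 1_468_365_182        # documents in the `follow` edge collection
pregel_edge_count = 16_695_168   # edgeCount reported in the Pregel status

# Share of the edge collection that Pregel reports having loaded.
fraction = pregel_edge_count / edge_docs

# Edge documents not reflected in the reported edgeCount.
missing = edge_docs - pregel_edge_count

# Average outgoing edges per vertex according to each source.
avg_degree_collection = edge_docs / vertex_count
avg_degree_pregel = pregel_edge_count / vertex_count

print(f"share of edges reported by Pregel: {fraction:.2%}")
print(f"edges unaccounted for: {missing:,}")
print(f"average out-degree (collection): {avg_degree_collection:.2f}")
print(f"average out-degree (Pregel):     {avg_degree_pregel:.2f}")
```

The reported edgeCount covers only a little over 1% of the edge documents, so the difference is far too large to be a rounding or sampling effect.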

So does the edgeCount parameter represent the total number of edges in the graph? If so, why is the number of edges in the graph so much smaller than the number in the edge collection? Did I do something wrong?

By the way, how can I get the total number of edges in the graph? I ran the following AQL query, but it timed out because the number of edges is too large.

AQL query (if applicable):

FOR i IN users
 LET ec = (
           FOR v,e,p IN 1..1 OUTBOUND i Graph "twitter"
                 RETURN DISTINCT(e)
          )
 RETURN COUNT(ec)
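
Note that this query yields one count per start vertex rather than a single total. The sketch below models that shape in memory with a hypothetical three-vertex toy graph; the data and the helper `outbound_edge_counts` are assumptions for illustration, not ArangoDB APIs. It also shows that summing the per-vertex counts gives the number of edge documents, provided every `_from` refers to an existing vertex.

```python
from collections import defaultdict

# Hypothetical toy data shaped like the `users` and `follow` collections.
users = ["users/1", "users/2", "users/3"]
follow = [
    {"_key": "a", "_from": "users/1", "_to": "users/2"},
    {"_key": "b", "_from": "users/1", "_to": "users/3"},
    {"_key": "c", "_from": "users/2", "_to": "users/3"},
]


def outbound_edge_counts(vertices, edges):
    """Return, for each vertex in order, its number of distinct outbound edges."""
    by_from = defaultdict(set)
    for edge in edges:
        by_from[edge["_from"]].add(edge["_key"])
    return [len(by_from[v]) for v in vertices]


# One number per start vertex, mirroring the per-vertex COUNT(ec) rows.
per_vertex = outbound_edge_counts(users, follow)

# Each edge has exactly one `_from`, so the counts sum to the edge total.
total = sum(per_vertex)

print(per_vertex)
print(total)
```

On the real dataset the query would therefore return about 41.6 million rows, which still have to be summed to obtain a total.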

AQL explain (if applicable):

Execution plan:
 Id   NodeType                  Site      Est.   Comment
  1   SingletonNode             DBS          1   * ROOT
  2   EnumerateCollectionNode   DBS   41652230     - FOR i IN users   /* full collection scan, 81 shard(s) */
 14   RemoteNode                COOR  41652230       - REMOTE
 15   GatherNode                COOR  41652230       - GATHER 
  8   SubqueryNode              COOR  41652230       - LET ec = ...   /* subquery */
  3   SingletonNode             COOR         1         * ROOT
 11   CalculationNode           COOR         1           - LET #15 = true   /* json expression */   /* const assignment */
  4   TraversalNode             COOR         9           - FOR v  /* vertex */, e  /* edge */ IN 1..1  /* min..maxPathDepth */ OUTBOUND i /* startnode */  GRAPH 'twitter'
  6   CollectNode               COOR         9             - COLLECT #11 = e   /* distinct */
  7   ReturnNode                COOR         9             - RETURN #15
  9   CalculationNode           COOR  41652230       - LET #13 = COUNT(ec)   /* simple expression */
 10   ReturnNode                COOR  41652230       - RETURN #13

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  4   edge   follow       false    false            n/a   [ `_from` ]   base OUTBOUND

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 COUNT   true            true        false  

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions
 4   1..1   users               follow            uniqueVertices: none, uniqueEdges: path                           

Optimization rules applied:
 Id   RuleName
  1   remove-unnecessary-calculations
  2   optimize-subqueries
  3   move-calculations-up-2
  4   optimize-traversals
  5   scatter-in-cluster
  6   remove-unnecessary-remote-scatter

Simran-B added the 3 Pregel Graph processing label on Jan 16, 2020