Does _totalEdgesCount in src/arangod/Pregel/Conductor.cpp represent the total edge number in graph? · Issue #10899 · arangodb/arangodb
Open
@jiangyanwei2

Description

My Environment

  • ArangoDB Version: 3.4.8
  • Storage Engine: RocksDB
  • Deployment Mode: Cluster, 3 nodes with 3 Agents, 3 DBServers and 3 Coordinators
  • Deployment Strategy: ArangoDB Starter in Docker
  • Configuration: default
  • Infrastructure: own
  • Operating System: CentOS
  • Total RAM in your machine: 128G
  • Disks in use: SSD

Size of your Dataset on disk:

one vertex collection: 374M
one edge collection: 37G

Dataset:

The dataset contains only one vertex collection, called users, with 41,652,230 documents like the following:

  {
    "_key": "12",
    "_id": "users/12",
    "_rev": "_Z4it3Eu--K"
  }

There is also only one edge collection, follow, which represents the follower relationship, with 1,468,365,182 documents like the following:

  {
    "_key": "6842768634",
    "_id": "follow/6842768634",
    "_from": "users/324",
    "_to": "users/20",
    "_rev": "_Z4FeNU---u",
    "vertex": 324
  }

The shard key is ["vertex"].
I confirmed that there are no invalid edges.

Replication Factor & Number of Shards (Cluster only):

Replication Factor 1
Shards 81

Problem:
When I run a Pregel algorithm, the status I receive is as follows:
[image: pregelissue]

The vertexCount is 41,652,230, which matches the vertex collection, but the edgeCount is 16,695,168, which is far less than the edge collection (about 1.4 billion edges).
Whatever Pregel algorithm I run, the edgeCount is the same. The logs are as follows:
[image: pregelissue1]
[image: pregelissue2]

So does the edgeCount parameter represent the total number of edges in the graph? If so, why is the number of edges in the graph so much smaller than in the edge collection? Did I do something wrong?

By the way, how can I get the total number of edges in the graph? I ran the following AQL query, but it timed out because the number of edges is too large.

AQL query (if applicable):

FOR i IN users
 LET ec = (
           FOR v,e,p IN 1..1 OUTBOUND i Graph "twitter"
                 RETURN DISTINCT(e)
          )
 RETURN COUNT(ec)
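
Note that the query above returns one count per start vertex (its outgoing edges), not a single total; the graph-wide figure would be the sum of those per-vertex counts. A minimal Python sketch of that relationship on a toy edge list follows. The data and the helper names `out_degrees` and `total_edges` are hypothetical and only illustrate the arithmetic; they are not part of ArangoDB.

```python
from collections import Counter

# Toy edge list as (_from, _to) pairs; hypothetical data, not the real dataset.
edges = [
    ("users/324", "users/20"),
    ("users/324", "users/12"),
    ("users/12", "users/20"),
]


def out_degrees(edge_list):
    """Count outgoing edges per start vertex (what the subquery yields per user)."""
    return Counter(src for src, _dst in edge_list)


def total_edges(edge_list):
    """Sum the per-vertex counts to obtain one graph-wide edge total."""
    return sum(out_degrees(edge_list).values())
```

Summing per-vertex counts only matches the edge collection size if every edge starts at a vertex that exists in users, which is the assumption stated earlier (no invalid edges).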

AQL explain (if applicable):

Execution plan:
 Id   NodeType                  Site      Est.   Comment
  1   SingletonNode             DBS          1   * ROOT
  2   EnumerateCollectionNode   DBS   41652230     - FOR i IN users   /* full collection scan, 81 shard(s) */
 14   RemoteNode                COOR  41652230       - REMOTE
 15   GatherNode                COOR  41652230       - GATHER 
  8   SubqueryNode              COOR  41652230       - LET ec = ...   /* subquery */
  3   SingletonNode             COOR         1         * ROOT
 11   CalculationNode           COOR         1           - LET #15 = true   /* json expression */   /* const assignment */
  4   TraversalNode             COOR         9           - FOR v  /* vertex */, e  /* edge */ IN 1..1  /* min..maxPathDepth */ OUTBOUND i /* startnode */  GRAPH 'twitter'
  6   CollectNode               COOR         9             - COLLECT #11 = e   /* distinct */
  7   ReturnNode                COOR         9             - RETURN #15
  9   CalculationNode           COOR  41652230       - LET #13 = COUNT(ec)   /* simple expression */
 10   ReturnNode                COOR  41652230       - RETURN #13

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  4   edge   follow       false    false            n/a   [ `_from` ]   base OUTBOUND

Functions used:
 Name    Deterministic   Cacheable   Uses V8
 COUNT   true            true        false  

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections  Options                                  Filter / Prune Conditions
 4   1..1   users               follow            uniqueVertices: none, uniqueEdges: path                           

Optimization rules applied:
 Id   RuleName
  1   remove-unnecessary-calculations
  2   optimize-subqueries
  3   move-calculations-up-2
  4   optimize-traversals
  5   scatter-in-cluster
  6   remove-unnecessary-remote-scatter
