Description
Is your feature request related to a problem? Please describe.
Applying any kind of Graph ML / Graph NN algorithm to real-world power-law graphs requires sampling, so we should implement sampling first.
Describe the solution you would like
Top level API:
graph.sampleEdges(strategy: EdgesSamplingStrategy, seed: Long): DataFrame
graph.sampleVertices(strategy: VerticesSamplingStrategy, seed: Long): DataFrame
EdgesSamplingStrategy and VerticesSamplingStrategy are traits and part of the public API.
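To make the proposed shape concrete, here is a minimal sketch of how a strategy trait and one implementation could fit together. It is not the GraphFrames implementation: to stay runnable without a Spark session, a plain `Seq[Edge]` stands in for the edges DataFrame, and `Edge`, `UniformEdges`, the `sample` method, and the free-standing `sampleEdges` helper are all hypothetical names chosen for illustration.

```scala
import scala.util.Random

object SamplingSketch {
  // Hypothetical stand-in for one row of the edges DataFrame.
  final case class Edge(src: Long, dst: Long, weight: Double = 1.0)

  // Assumed shape of the public trait: a strategy turns edges plus a seed
  // into a sampled subset of those edges.
  trait EdgesSamplingStrategy {
    def sample(edges: Seq[Edge], seed: Long): Seq[Edge]
  }

  // Simple random sampling: keep each edge independently with
  // probability `fraction`.
  final case class UniformEdges(fraction: Double) extends EdgesSamplingStrategy {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")

    def sample(edges: Seq[Edge], seed: Long): Seq[Edge] = {
      val rng = new Random(seed) // same seed => same sample
      edges.filter(_ => rng.nextDouble() < fraction)
    }
  }

  // Stand-in for graph.sampleEdges(strategy, seed): it only delegates,
  // so new strategies can be added without touching the graph API.
  def sampleEdges(edges: Seq[Edge], strategy: EdgesSamplingStrategy, seed: Long): Seq[Edge] =
    strategy.sample(edges, seed)
}
```

With this split, `sampleEdges(edges, UniformEdges(0.1), seed = 42L)` would keep roughly 10% of the edges, and users could plug in their own strategy by extending the trait.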
Batteries:
- simple random sampling
- weight-based sampling
- fixed-size sampling (like GraphSAGE)
- degree-based sampling
- context sampling (user provides a function (src, dst, edge) -> prob)
- ???
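Two of the strategies above can be sketched on plain Scala collections to pin down the intended semantics. This is an illustration under stated assumptions, not a proposed implementation: `Edge`, `fixedSizePerSource`, and `contextSample` are hypothetical names, a `Seq[Edge]` stands in for the edges DataFrame, and the fixed-size variant is assumed to mean "at most k out-edges per source vertex, chosen uniformly", which is one reading of GraphSAGE-style neighbor sampling.

```scala
import scala.util.Random

object BatteriesSketch {
  // Hypothetical stand-in for one row of the edges DataFrame.
  final case class Edge(src: Long, dst: Long, weight: Double = 1.0)

  // Fixed-size sampling: keep at most `k` out-edges per source vertex,
  // chosen uniformly at random among that vertex's out-edges.
  def fixedSizePerSource(edges: Seq[Edge], k: Int, seed: Long): Seq[Edge] = {
    val rng = new Random(seed)
    edges
      .groupBy(_.src)
      .toSeq
      .sortBy(_._1) // fix the group order so the result depends only on the seed
      .flatMap { case (_, outEdges) => rng.shuffle(outEdges).take(k) }
  }

  // Context sampling: the user supplies (src, dst, edge) => probability,
  // and each edge is kept independently with that probability.
  def contextSample(
      edges: Seq[Edge],
      prob: (Long, Long, Edge) => Double,
      seed: Long): Seq[Edge] = {
    val rng = new Random(seed)
    edges.filter(e => rng.nextDouble() < prob(e.src, e.dst, e))
  }
}
```

Weight-based sampling falls out of the context variant by passing a function that returns a probability derived from `edge.weight`. On a power-law graph, the fixed-size variant is the one that bounds the work per vertex, since a hub contributes at most `k` edges regardless of its degree.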
Component
- Scala Core Internal
- Scala API
- Spark Connect Plugin
- Infrastructure
- PySpark Classic
- PySpark Connect
Additional context
Without sampling it will be hard to implement any of:
- good approximate algorithms
- random walks on power-law graphs
- gcnn and any graph convolutions in general
Are you planning on creating a PR?
- I'm willing to make a pull-request