bug: Broadcast threshold cannot be set to `-1` in ConnectedComponents · Issue #733 · graphframes/graphframes · GitHub

bug: Broadcast threshold cannot be set to -1 in ConnectedComponents #733

@ottobricks

Describe the bug

Spark 3.5.5, graphframes 0.9.3
Calling setBroadcastThreshold(-1) on ConnectedComponents fails with the following error:

java.lang.IllegalArgumentException: requirement failed: Broadcast threshold must be non-negative but got -1.
  at scala.Predef$.require(Predef.scala:281)
  at org.graphframes.WithBroadcastThreshold.setBroadcastThreshold(mixins.scala:76)
  at org.graphframes.WithBroadcastThreshold.setBroadcastThreshold$(mixins.scala:75)
  at org.graphframes.lib.ConnectedComponents.setBroadcastThreshold(ConnectedComponents.scala:50)
  ... 47 elided

To Reproduce

Steps to reproduce the behavior:

  1. Start spark-shell with GraphFrames on the classpath:
spark-shell --packages io.graphframes:graphframes-spark3_2.12:0.9.3 \
    --conf spark.checkpoint.dir=/tmp/example-checkpoint
     Then run the following in the shell:
import org.graphframes.GraphFrame

val g = GraphFrame(
    spark.range(5).select("id"),
    spark.range(10)
        .selectExpr(
            "id",
            "id % 5 as src",
            "(id + 1) % 5 as dst"
        )
)

val results = (
    g.connectedComponents
        .setAlgorithm("graphframes")
        .setBroadcastThreshold(-1)
        .run()
)
  2. A java.lang.IllegalArgumentException is thrown.

The same happens with the Python API.

Expected behavior

According to the linked docs and the linked example, we should be able to set the broadcast threshold to -1.

System:

  • OS: MacOS 14.8.1
  • Python Version (if applied): Python 3.11.13
  • Spark / PySpark version: Spark 3.5.5 / PySpark 3.5.5
  • GraphFrames version: graphframes-0.9.3

Component

  • Scala Core Internal
  • Scala API
  • Spark Connect Plugin
  • PySpark Classic
  • PySpark Connect

Additional context

I could not find the message 'Broadcast threshold must be non-negative but got' in the repo, which makes me wonder where it's coming from, so it might not be a bug directly in graphframes. I'm going to continue with the default value for now and investigate this further soon.
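
For reference, here is a minimal sketch of how a message of this shape can arise, assuming the setter guards its argument with Scala's `require` (the `scala.Predef$.require` frame in the stack trace suggests this). The trait and class names below are hypothetical stand-ins, not the actual GraphFrames source, and the default value is only illustrative. Note that `require` prepends `requirement failed: ` to the message, so only the part after that prefix would appear as a literal in source code.

```scala
import scala.util.{Failure, Try}

object RequireSketch {
  // Hypothetical stand-in for the mixin named in the stack trace;
  // this is NOT the real org.graphframes.WithBroadcastThreshold.
  trait WithBroadcastThresholdSketch {
    private var broadcastThreshold: Int = 1000000 // assumed default, illustration only

    def setBroadcastThreshold(value: Int): this.type = {
      // Predef.require throws IllegalArgumentException("requirement failed: " + message)
      require(value >= 0, s"Broadcast threshold must be non-negative but got $value.")
      broadcastThreshold = value
      this
    }

    def getBroadcastThreshold: Int = broadcastThreshold
  }

  // Hypothetical algorithm class that mixes in the setter.
  class SketchAlgorithm extends WithBroadcastThresholdSketch

  // Returns the exception message if the setter rejects the value, None otherwise.
  def failureMessage(value: Int): Option[String] =
    Try(new SketchAlgorithm().setBroadcastThreshold(value)) match {
      case Failure(e: IllegalArgumentException) => Some(e.getMessage)
      case _                                    => None
    }

  def main(args: Array[String]): Unit = {
    // prints: requirement failed: Broadcast threshold must be non-negative but got -1.
    failureMessage(-1).foreach(println)
  }
}
```

If the real check looks like this, then any negative value, including -1, is rejected before the algorithm runs, which would match the behavior reported above.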

Are you planning on creating a PR?

  • I'm willing to make a pull-request
