bug: PySpark connectedComponents doesn't work · Issue #741 · graphframes/graphframes

@hutsulyk

Hi team! Thanks for maintaining GraphFrames again!

Describe the bug

I tried to switch from graphframes-latest==0.8.3 with graphframes:graphframes:0.8.4-spark3.5-s_2.12 to graphframes-py==0.9.3 with io.graphframes:graphframes-spark3_2.12:0.9.3, but
connectedComponents now fails in PySpark:

Py4JError: An error occurred while calling o233.setUseLabelsAsComponents. Trace:
py4j.Py4JException: Method setUseLabelsAsComponents([class java.lang.Boolean]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:1589)
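
The trace above says Py4J could not find setUseLabelsAsComponents on the JVM object, so one way to narrow this down might be to ask the JVM which of the setters it actually exposes. Below is a minimal sketch using Java reflection through Py4J. The helper name missing_jvm_methods and the list CC_SETTERS are made up for illustration; the setter names come from the traceback in this report, and _impl / _jvm_graph are private attributes seen in that traceback, so they may differ in other releases. It only compares method names, not parameter types.

```python
from typing import Iterable, List


def missing_jvm_methods(jvm_obj, expected: Iterable[str]) -> List[str]:
    """Return the names in `expected` that the JVM object does not expose.

    `jvm_obj` is any Py4J proxy. getClass().getMethods() is plain Java
    reflection and lists the public methods of the object's class.
    """
    available = {m.getName() for m in jvm_obj.getClass().getMethods()}
    return [name for name in expected if name not in available]


# Setter names as they appear in the traceback of this report.
CC_SETTERS = [
    "setAlgorithm",
    "setCheckpointInterval",
    "setBroadcastThreshold",
    "setUseLabelsAsComponents",
]

# Usage against a live session, where g is a GraphFrame (private attributes,
# taken from the traceback, so treat them as an assumption):
#   builder = g._impl._jvm_graph.connectedComponents()
#   print(missing_jvm_methods(builder, CC_SETTERS))
```

If the printed list contains setUseLabelsAsComponents, the JAR that Spark loaded does not provide that setter, which would suggest the Python wrapper and the JVM artifact are out of sync rather than a problem in the user code.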

To Reproduce

Steps to reproduce the behavior:

  1. Take the minimal example from the README.md Quick Start
  2. Run g.connectedComponents().show()
  3. The command fails with:
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
Cell In[5], line 22
     19 edges_df = spark.createDataFrame(edges, ["src", "dst", "relationship"])
     21 g = GraphFrame(nodes_df, edges_df)
---> 22 g.connectedComponents().show()

File /usr/local/lib/python3.10/dist-packages/graphframes/graphframe.py:283, in GraphFrame.connectedComponents(self, algorithm, checkpointInterval, broadcastThreshold, useLabelsAsComponents)
    261 def connectedComponents(
    262     self,
    263     algorithm: str = "graphframes",
   (...)
    266     useLabelsAsComponents: bool = False,
    267 ) -> DataFrame:
    268     """
    269     Computes the connected components of the graph.
    270 
   (...)
    281     :return: DataFrame with new vertices column "component"
    282     """
--> 283     return self._impl.connectedComponents(
    284         algorithm=algorithm,
    285         checkpointInterval=checkpointInterval,
    286         broadcastThreshold=broadcastThreshold,
    287         useLabelsAsComponents=useLabelsAsComponents,
    288     )

File /usr/local/lib/python3.10/dist-packages/graphframes/classic/graphframe.py:216, in GraphFrame.connectedComponents(self, algorithm, checkpointInterval, broadcastThreshold, useLabelsAsComponents)
    204 def connectedComponents(
    205     self,
    206     algorithm: str = "graphframes",
   (...)
    209     useLabelsAsComponents: bool = False,
    210 ) -> DataFrame:
    211     jdf = (
    212         self._jvm_graph.connectedComponents()
    213         .setAlgorithm(algorithm)
    214         .setCheckpointInterval(checkpointInterval)
    215         .setBroadcastThreshold(broadcastThreshold)
--> 216         .setUseLabelsAsComponents(useLabelsAsComponents)
    217         .run()
    218     )
    219     return DataFrame(jdf, self._spark)

File /usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/pyspark/errors/exceptions/captured.py:179, in capture_sql_exception.<locals>.deco(*a, **kw)
    177 def deco(*a: Any, **kw: Any) -> Any:
    178     try:
--> 179         return f(*a, **kw)
    180     except Py4JJavaError as e:
    181         converted = convert_exception(e.java_exception)

File /usr/local/lib/python3.10/dist-packages/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
    326         raise Py4JJavaError(
    327             "An error occurred while calling {0}{1}{2}.\n".
    328             format(target_id, ".", name), value)
    329     else:
--> 330         raise Py4JError(
    331             "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332             format(target_id, ".", name, value))
    333 else:
    334     raise Py4JError(
    335         "An error occurred while calling {0}{1}{2}".
    336         format(target_id, ".", name))

Py4JError: An error occurred while calling o233.setUseLabelsAsComponents. Trace:
py4j.Py4JException: Method setUseLabelsAsComponents([class java.lang.Boolean]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:1589)
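
For convenience, the steps above can be sketched as a standalone script. The node rows and column names match the traceback and the expected output in this report; the edge rows, the app name, and the checkpoint directory are assumptions for illustration and may differ from the actual README example. The Spark imports sit inside the function so the file can be loaded without a Spark installation.

```python
# Node rows match the expected output in this report. The edge rows are
# assumed: they only need to connect all three vertices.
NODES = [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)]
EDGES = [(1, 2, "friend"), (2, 3, "friend")]

# Maven coordinate from the "System" section of this report.
GRAPHFRAMES_JAR = "io.graphframes:graphframes-spark3_2.12:0.9.3"


def reproduce() -> None:
    # Imported lazily so that importing this module does not require Spark.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = (
        SparkSession.builder
        .appName("graphframes-741-repro")  # hypothetical app name
        .config("spark.jars.packages", GRAPHFRAMES_JAR)
        .getOrCreate()
    )
    # Assumed checkpoint location; connected components may need one
    # once the call gets past the failing setter.
    spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

    nodes_df = spark.createDataFrame(NODES, ["id", "name", "age"])
    edges_df = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])

    g = GraphFrame(nodes_df, edges_df)
    # This is the call that raises Py4JError in this report.
    g.connectedComponents().show()
```

Calling reproduce() in an environment with PySpark 3.5.5 and graphframes-py 0.9.3 installed should hit the same setUseLabelsAsComponents error if the problem is reproducible there.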

Expected behavior

connectedComponents returns the result DataFrame:

+---+-------+---+---------+
| id|   name|age|component|
+---+-------+---+---------+
|  1|  Alice| 30|        1|
|  2|    Bob| 25|        1|
|  3|Charlie| 35|        1|
+---+-------+---+---------+

System:

  • OS: Ubuntu 22.04
  • Python Version: Python 3.10
  • Spark / PySpark version: PySpark 3.5.5
  • GraphFrames version: graphframes-py 0.9.3 (with io.graphframes:graphframes-spark3_2.12:0.9.3)
