Hi team! Thanks for maintaining GraphFrames again!
Describe the bug
I tried to switch from graphframes-latest==0.8.3 and graphframes:graphframes:0.8.4-spark3.5-s_2.12 to graphframes-py==0.9.3 and io.graphframes:graphframes-spark3_2.12:0.9.3; however, connectedComponents fails in PySpark.
Py4JError: An error occurred while calling o233.setUseLabelsAsComponents. Trace:
py4j.Py4JException: Method setUseLabelsAsComponents([class java.lang.Boolean]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1589)
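For readers less familiar with Py4J, the message says that the JVM object behind the Python proxy has no method with that name and argument list. The sketch below pulls the method name and argument types out of the error line so they can be compared against the JAR's API. `parse_missing_method` is a hypothetical helper written for this illustration, not part of py4j or GraphFrames, and the regular expression only assumes the message shape shown in this report.

```python
import re

# Error line copied from the report above.
ERROR_LINE = (
    "py4j.Py4JException: Method "
    "setUseLabelsAsComponents([class java.lang.Boolean]) does not exist"
)

# Assumed message shape: "Method <name>([<arg types>]) does not exist".
_MISSING_METHOD = re.compile(r"Method (\w+)\(\[(.*)\]\) does not exist")


def parse_missing_method(line: str):
    """Return (method_name, [argument types]), or None if the line does not match."""
    match = _MISSING_METHOD.search(line)
    if match is None:
        return None
    name, raw_args = match.groups()
    # Drop the leading "class " that the message puts in front of each type.
    arg_types = [
        arg.strip().removeprefix("class ")
        for arg in raw_args.split(",")
        if arg.strip()
    ]
    return name, arg_types


print(parse_missing_method(ERROR_LINE))
```

Here the missing method is `setUseLabelsAsComponents` taking a single `java.lang.Boolean`, which matches the `useLabelsAsComponents: bool` argument passed from the Python side.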
To Reproduce
Steps to reproduce the behavior:
- Take a minimal example from README.md Quick Start
- Run g.connectedComponents().show()
- Command errors with
---------------------------------------------------------------------------
Py4JError Traceback (most recent call last)
Cell In[5], line 22
19 edges_df = spark.createDataFrame(edges, ["src", "dst", "relationship"])
21 g = GraphFrame(nodes_df, edges_df)
---> 22 g.connectedComponents().show()
File /usr/local/lib/python3.10/dist-packages/graphframes/graphframe.py:283, in GraphFrame.connectedComponents(self, algorithm, checkpointInterval, broadcastThreshold, useLabelsAsComponents)
261 def connectedComponents(
262 self,
263 algorithm: str = "graphframes",
(...)
266 useLabelsAsComponents: bool = False,
267 ) -> DataFrame:
268 """
269 Computes the connected components of the graph.
270
(...)
281 :return: DataFrame with new vertices column "component"
282 """
--> 283 return self._impl.connectedComponents(
284 algorithm=algorithm,
285 checkpointInterval=checkpointInterval,
286 broadcastThreshold=broadcastThreshold,
287 useLabelsAsComponents=useLabelsAsComponents,
288 )
File /usr/local/lib/python3.10/dist-packages/graphframes/classic/graphframe.py:216, in GraphFrame.connectedComponents(self, algorithm, checkpointInterval, broadcastThreshold, useLabelsAsComponents)
204 def connectedComponents(
205 self,
206 algorithm: str = "graphframes",
(...)
209 useLabelsAsComponents: bool = False,
210 ) -> DataFrame:
211 jdf = (
212 self._jvm_graph.connectedComponents()
213 .setAlgorithm(algorithm)
214 .setCheckpointInterval(checkpointInterval)
215 .setBroadcastThreshold(broadcastThreshold)
--> 216 .setUseLabelsAsComponents(useLabelsAsComponents)
217 .run()
218 )
219 return DataFrame(jdf, self._spark)
File /usr/local/lib/python3.10/dist-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File /opt/spark/python/pyspark/errors/exceptions/captured.py:179, in capture_sql_exception.<locals>.deco(*a, **kw)
177 def deco(*a: Any, **kw: Any) -> Any:
178 try:
--> 179 return f(*a, **kw)
180 except Py4JJavaError as e:
181 converted = convert_exception(e.java_exception)
File /usr/local/lib/python3.10/dist-packages/py4j/protocol.py:330, in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
329 else:
--> 330 raise Py4JError(
331 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
332 format(target_id, ".", name, value))
333 else:
334 raise Py4JError(
335 "An error occurred while calling {0}{1}{2}".
336 format(target_id, ".", name))
Py4JError: An error occurred while calling o233.setUseLabelsAsComponents. Trace:
py4j.Py4JException: Method setUseLabelsAsComponents([class java.lang.Boolean]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:1589)
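One possible way to confirm what the loaded JAR actually exposes is to ask the JVM object for its public methods through Java reflection, which Py4J proxies transparently. This is only a sketch: `jvm_method_names` and `missing_setters` are hypothetical helper names, and the access path `g._impl._jvm_graph` is inferred from the traceback above and relies on private attributes that may differ between releases.

```python
def jvm_method_names(java_object) -> set:
    """Names of the public methods the JVM reports for a Py4J-proxied object.

    Relies on standard Java reflection (Object.getClass, Class.getMethods,
    Method.getName) called through the Py4J proxy.
    """
    return {method.getName() for method in java_object.getClass().getMethods()}


def missing_setters(java_object, expected):
    """Return, sorted, the expected method names the JVM object does not provide."""
    available = jvm_method_names(java_object)
    return sorted(name for name in expected if name not in available)


# Setter names taken from the Python wrapper shown in the traceback.
EXPECTED_SETTERS = [
    "setAlgorithm",
    "setCheckpointInterval",
    "setBroadcastThreshold",
    "setUseLabelsAsComponents",
]

# With a live Spark session one would pass the builder from the traceback
# (assumption: these private attributes exist as the traceback suggests):
#   missing_setters(g._impl._jvm_graph.connectedComponents(), EXPECTED_SETTERS)
```

If the call lists `setUseLabelsAsComponents` as missing, the Python wrapper is calling a setter that the JVM class on the classpath does not define.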
Expected behavior
connectedComponents returns the result DataFrame:
+---+-------+---+---------+
| id|   name|age|component|
+---+-------+---+---------+
|  1|  Alice| 30|        1|
|  2|    Bob| 25|        1|
|  3|Charlie| 35|        1|
+---+-------+---+---------+
System:
- OS: Ubuntu 22.04
- Python Version: Python 3.10
- Spark / PySpark version: PySpark 3.5.5
- GraphFrames version: graphframes-py 0.9.3 (with io.graphframes:graphframes-spark3_2.12:0.9.3)
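To make the version pairing explicit, the sketch below splits the pip pin and the Maven coordinate quoted in this report and compares their version fields. `parse_pin`, `parse_coordinate`, and `MavenCoordinate` are hypothetical names used only for this illustration. The check covers the declared versions only; it cannot tell which JAR the running JVM has actually loaded.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    group: str
    artifact: str
    version: str


def parse_coordinate(text: str) -> MavenCoordinate:
    """Split a 'group:artifact:version' coordinate into its three parts."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {text!r}")
    return MavenCoordinate(*parts)


def parse_pin(text: str):
    """Split a pip pin of the form 'name==version' into (name, version)."""
    name, separator, version = text.partition("==")
    if not separator or not name or not version:
        raise ValueError(f"expected name==version, got {text!r}")
    return name, version


# Values copied from the report.
py_name, py_version = parse_pin("graphframes-py==0.9.3")
jar = parse_coordinate("io.graphframes:graphframes-spark3_2.12:0.9.3")

print(py_name, py_version, jar.artifact, jar.version)
print("declared versions match:", py_version == jar.version)
```

Both sides declare 0.9.3, so the pins themselves are consistent.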
SemyonSinchenko