Share one Spark cluster for all tests #1290
Draft implementation of sharing a single Spark cluster across all the tests. It is implemented as two ctest fixtures: one spawns a Spark master and the other spawns a Spark worker in the background.
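For context, here is a minimal sketch (not the PR's actual code) of the kind of setup/cleanup script the two fixtures could invoke; `SPARK_HOME`, the host, and the port are assumptions:

```python
#!/usr/bin/env python3
# Hypothetical sketch: a setup/cleanup script that ctest fixtures could run
# to start and stop the shared Spark standalone cluster in the background.
# SPARK_HOME, host, and port are assumptions, not the PR's actual values.
import os
import subprocess
import sys

SPARK_HOME = os.environ["SPARK_HOME"]
MASTER_URL = "spark://localhost:7077"  # default standalone master port


def start_cluster():
    # Both sbin scripts daemonize, so this returns once the master and the
    # worker are running in the background.
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "start-master.sh")], check=True)
    # The worker takes all cores of the machine by default; each test
    # application then limits itself via spark.cores.max (see below).
    subprocess.run(
        [os.path.join(SPARK_HOME, "sbin", "start-worker.sh"), MASTER_URL],
        check=True,
    )


def stop_cluster():
    # Counterpart for the cleanup fixture.
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "stop-worker.sh")], check=True)
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "stop-master.sh")], check=True)


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "stop":
        stop_cluster()
    else:
        start_cluster()
```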
According to https://spark.apache.org/docs/latest/spark-standalone.html#resource-scheduling, the Spark standalone cluster only supports a basic FIFO scheduling method across applications. As such, the only way to benefit from running multiple Spark applications concurrently is to give the Spark worker many cores (the more the better) and have each application claim only 2 cores, via the spark.cores.max config option.
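As an illustration of that last point, a test could cap its own Spark application roughly like this (a sketch; the app name and master URL are assumptions):

```python
# Sketch of a test-side SparkContext capped at 2 cores, assuming the
# fixture-started master listens on the default standalone port.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("distrdf-spark-test")     # hypothetical app name
    .setMaster("spark://localhost:7077")  # the shared standalone master
    .set("spark.cores.max", "2")          # FIFO scheduler: leave cores for other apps
)
sc = SparkContext(conf=conf)
```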
This is a draft PR, mainly to document the progress made during the ROOT hackathon in March 2025. Anecdotally, I see a tangible improvement in the best-case scenario on my laptop, running only ctest and otherwise idle:
- `test_all` suite in master: around 80 s.
- all `.py` test files run concurrently, each internally creating a SparkContext that connects to the system-process Spark worker: around 50 s.

The benefit is not yet completely clear, since these numbers would probably change heavily in the context of a real ROOT CI run. Specifically, creating a Spark worker process that takes all the cores of the CI machine leaves little to no room for the other ctests.