E538 Share one Spark cluster for all tests by vepadulano · Pull Request #1290 · root-project/roottest · GitHub

Conversation

vepadulano
Member

Draft implementation of sharing one Spark cluster across all the tests. It is implemented as two ctest fixtures: one spawns a Spark master and the other spawns a Spark worker in the background.
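Conceptually, the setup/teardown steps of the fixtures amount to starting the standalone Spark daemons in the background before the first test and stopping them after the last one. Below is a minimal Python sketch of those steps, not the actual fixture code in this PR (which uses ctest fixture properties); the master URL/port and the use of $SPARK_HOME are assumptions based on a standard Spark distribution:

```python
# Sketch of what the two fixtures do conceptually: start a standalone Spark
# master and one worker in the background, then tear them down once the
# tests that depend on them have finished. Assumes a standard Spark
# distribution under $SPARK_HOME; recent Spark releases ship the
# start-worker.sh/stop-worker.sh scripts used here.
import os
import subprocess

SPARK_HOME = os.environ["SPARK_HOME"]
MASTER_URL = "spark://localhost:7077"  # assumed default standalone master URL


def start_cluster():
    # Launch the master daemon (the script backgrounds itself).
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "start-master.sh")], check=True)
    # Launch one worker, with all available cores, attached to that master.
    subprocess.run(
        [os.path.join(SPARK_HOME, "sbin", "start-worker.sh"), MASTER_URL],
        check=True,
    )


def stop_cluster():
    # Tear the daemons down in reverse order.
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "stop-worker.sh")], check=True)
    subprocess.run([os.path.join(SPARK_HOME, "sbin", "stop-master.sh")], check=True)
```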

According to
https://spark.apache.org/docs/latest/spark-standalone.html#resource-scheduling, the Spark standalone cluster only supports a simple FIFO scheduler across applications. As such, the only way to benefit from running multiple Spark applications concurrently is to give the Spark worker many cores (the more the better) and have each application claim only a small share, e.g. 2 cores, via the spark.cores.max config option (see the sketch below).
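Each test application would then connect to the shared cluster while capping its own core usage, roughly as in the following sketch (the master URL and application name are illustrative, not taken from the PR):

```python
# Sketch of a test creating a SparkContext against the shared standalone
# cluster while limiting itself to 2 cores, so that several test
# applications can run on the same worker concurrently.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("roottest-distrdf-spark")  # illustrative app name
    .setMaster("spark://localhost:7077")   # assumed shared master URL
    .set("spark.cores.max", "2")           # cap the cores used by this application
)
sc = SparkContext(conf=conf)
try:
    # ... run the actual distributed test here ...
    pass
finally:
    sc.stop()
```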

This is a draft PR, intended to document the progress made during the ROOT hackathon in March 2025. Anecdotally, I see a tangible improvement in the best-case scenario on my laptop, running only ctest with the machine otherwise idle, with the following configurations:

  • one ctest run of the current test_all suite on master: around 80 s
  • multiple ctest runs, each executing a different .py test file concurrently, all internally creating a SparkContext that connects to the shared background Spark worker process: around 50 s.

The benefit is not yet completely clear, since these numbers would probably change significantly in the context of a real ROOT CI run. In particular, a Spark worker process that takes all the cores of the CI machine leaves little to no room for the other ctests.

@vepadulano vepadulano self-assigned this Mar 26, 2025