Description
Introduction
Other OSS projects such as `scipy` and `numpy` have made use of development CLI tools such as `runtests.py` and `dev.py` to provide a single, unified entry point into development tasks for contributors. I find these tools very helpful personally, and think they add a lot of value, especially for onboarding new contributors. Recently Pandas has also begun discussing and implementing its own development CLI tool. @noatamir has made a great writeup on some of the benefits and justifications for such a tool (see: pandas-dev/pandas#47700). I highly recommend taking the time to read their writeup.
Some of the benefits of such a tool include:
- Increased discoverability of tools, decreasing time spent jumping around documentation. This is especially helpful for those who may not read every single word of documentation (such is my bad habit 😅).
- Consistency for CI and development. We can establish a one-to-one mapping between CI checks and `dev.py` commands, e.g. `dev.py lint`, which could then enforce both `black` and `isort` (cf. RFC isort as linter and import sorter #22853).
- Ease of installation. Users could install whatever dependencies are necessary depending on what workflow they want enabled. A call to `dev.py doc build` could check for the presence of the necessary dependencies and install them if needed (or optionally prompt).
- Reduced barrier to entry. Quite frankly, having to run various commands for tasks, anywhere from linting to building C sources, can be overwhelming, especially to new developers.
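To make the dependency check concrete, here is a minimal sketch of how a command might detect missing packages before running a workflow. The `WORKFLOW_DEPENDENCIES` mapping, the workflow names, and the `missing_dependencies` helper are all hypothetical and only illustrate the idea; they are not taken from any existing `dev.py`.

```python
import importlib.util

# Hypothetical mapping from workflow name to the top-level modules it needs.
# The real list would come from the project's requirements files.
WORKFLOW_DEPENDENCIES = {
    "doc": ["sphinx", "numpydoc"],
    "lint": ["black", "isort"],
}


def missing_dependencies(workflow, requirements=WORKFLOW_DEPENDENCIES):
    """Return the modules required by `workflow` that cannot be imported."""
    return [
        name
        for name in requirements[workflow]
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None
    ]
```

A command such as `dev.py doc build` could call `missing_dependencies("doc")` first and then either install the result or prompt the user, depending on a flag.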
Recently `scipy` explored a new solution for developing a unified development CLI tool (cf. scipy/scipy#15489) based on `doit`. Pandas is also exploring a `doit`-based solution. There's been good feedback on this framework and I would propose the same.
Encompassed Actions
Some actions which would be wrapped by such a development CLI include:
- Dependency installation (could be per-workflow)
- Testing
- Linting
- Extension building
- Documentation building
- Pre-commit installation
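As a rough illustration of how a few of these actions could be declared, here is a sketch of a task file in the plain `doit` style, where each `task_*` function returns a dictionary of shell actions. The task names and the exact commands are assumptions chosen for illustration; this is not scipy's or pandas' actual implementation, and the real commands would need to match what CI runs.

```python
# dodo.py: hypothetical task file, discovered by `doit` in the repository root.


def task_lint():
    """Run the same style checks as CI (assumed here: black and isort)."""
    return {
        "actions": ["black --check .", "isort --check-only ."],
        "verbosity": 2,  # show the output of the tools in the terminal
    }


def task_test():
    """Run the test suite (assumed entry point: pytest on the package)."""
    return {"actions": ["pytest sklearn"], "verbosity": 2}


def task_precommit():
    """Install the git hooks defined in the pre-commit configuration."""
    return {"actions": ["pre-commit install"]}
```

With a file like this in place, `doit lint` would run both checks and stop with a non-zero exit code if either fails, which is what makes a one-to-one mapping with CI possible.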
Considerations
As with all things, the question is whether the benefits outweigh the maintenance cost. While projects like scipy with many complex workflows benefit the most from such a tool, personally I think anything to decrease the barrier to entry for scikit-learn development is worth considering.
If we do opt to build such a tool, we can make use of the work already done in scipy as well as the recently released separate `dev.py` package. Either way, the largest modification we'd need to make is to alter the `build` command to use `setuptools` instead of `meson`, though that shouldn't be too problematic.
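For a sense of scale, a setuptools-backed `build` command could be little more than a wrapper around an in-place `build_ext` call. The sketch below assumes the project is built from a `setup.py` in the repository root; the `build_command` and `build` helpers and the `parallel_jobs` parameter are hypothetical names, not part of any existing tool.

```python
import subprocess
import sys


def build_command(parallel_jobs=None):
    """Return the argv for an in-place extension build via setuptools."""
    cmd = [sys.executable, "setup.py", "build_ext", "--inplace"]
    if parallel_jobs is not None:
        # build_ext accepts -j/--parallel to compile extensions concurrently.
        cmd += ["-j", str(parallel_jobs)]
    return cmd


def build(parallel_jobs=None):
    """Run the build and raise CalledProcessError if compilation fails."""
    subprocess.run(build_command(parallel_jobs), check=True)
```

Keeping the command construction separate from its execution makes it easy to print the command in a dry-run mode or to test it without compiling anything.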