-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
RFC CLI Tool Proposal #24828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I still don't have a strong opinion on the matter. I would need to sleep on it. I would like to like the CLI proposal :) On the downside first, I would be under the impression that we are going to wrap up commands from Python tools, that we use in scikit-learn. In the end, it would be cumbersome for newcomers because they will have to learn a new set of commands for every project to which they want to contribute to. So why not directly learn, the
I assume that the comment of @jorisvandenbossche is valid for the 1-1 CI checks: pandas-dev/pandas#47700 (comment). For the linting, it could be OK but further than that, it would be too optimistic. But I see an upside: if we get a unified CLI for the pydata ecosystem and even if in scikit-learn, we would use a subset of the complex CLI required for SciPy or Pandas, then it starts to be interesting. A newcomer is learning a CLI interface allowing them to contribute to any pydata project. Also, we created in the past a |
I still prefer a I have compiled multiple projects, and I've never used their tool. As @glemaitre mentions, it's hard to learn a new CLI for each project one needs to compile/contribute to, whereas I know how I do recognize that there's a mismatch between what an experienced user like our core dev community can do and expects, and the average first time contributor to the project. These things complicate stuff, quite substantially. I used to understand our CI, and I did it the first time I looked at them. But I haven't followed the recent changes to our CI, and now I have absolutely no clue how we update dependencies in our CI, where all those different lock files come from, and what the CI does; and I think adding this CLI to the CI would complicate things even more. I know this sounds like I'm a grumpy old man, but not looking at the development for a few months, coming back, and noticing you don't understand a thing, is not a good thing, and I used to work quite a bit on our CI side. I'm not sure where things are documented, and the PRs people open to update deps don't mention how they're done. We have a discoverability issue there. I don't necessarily understand why we need new shiny things when the old tools work brilliantly (aka For new contributors:
If a new contributor comes, fixes a documentation issue, and runs a few custom CLI commands, the only thing they get out of it is that they can put An alternative for some of the issues we have, is to have bots which parse the output of our CI and comment on PRs with clear instructions on how a contributor needs to fix them. Example: conda-forge/skops-feedstock#1 (comment) cc @scikit-learn/core-devs Also cc @reshamas , @cmarmo , since I value their input in such matters as they might have a better idea of what our average contributors might need and want to have. |
+1. I prefer canonical
and
Yet, I am not against the creation of a |
I think I'm joining Adrin in the "grumpy person" camp. I'm not a fan of "hiding" what the actual commands are inside tools. It almost doesn't matter what they are, but "old" things like When I come to a new project (or return to one) I want to quickly see what I need to do to setup the dependencies, build the project, run the tests, etc. So the number one problem I have is quickly finding this information. And a lot of projects aren't great for this. Some of the worst are the ones that mention "Read CONTRIBUTING.md" in their README, and then in So I think improving the "time to discovery" is important. Above all else. Once that time is small, then I think using "vanilla" commands is better than custom stuff. If the docs say "Setup the development dependencies with So overall I think improving discoverability is important. For me this is a documentation task. And it should include things like "how to bump CI deps", maybe in a section for maintainers. It also means that you need to keep the complexity low enough that you can explain it in one or two sentences. Adding a layer on top to deal with the problem that "it is too difficult to explain" is IMHO trying to treat a patient by reducing their symptoms, not the actual illness. |
A situation where this or a |
edit I had mis-understood the proposal. Rewriting my comment (and sorry for the mis-understanding, that was a bit dense of me). Standardizing our CLIs / shell scriptsI think that the actual proposal above it to standardize the CLIs / shell scripts / Makefiles that we use. I'm not sure where I sit here. My thoughts:
I think that the good solution is to consider case-by-case settings: replacing elements of our toolchain as we go. Ideally the discussion should be case-by-case. I personally would be fine with having a general policy that we try to prefer Python code to shell. However, what really matters is that the current pool of active devs feel comfortable with changes and are happy about them. Making a CLI out of scikit-learn is out of scope.(not the proposal, but a possible misinterpretation). We are a Python library. We focus on doing high quality numerical algorithms and the tools around them that are necessary to use them to have valid statistical usage an operationalization. We should not embark on this. It will be a strong distraction (and we have plenty on our plate to satisfy the above mission), and it will lead to frustration from the users because we will likely develop half-baked answer (difficult to have a proper expressivity in a CLI given the complexity of pipelines + model evaluation). Finally, such projects can be developed outside scikit-learn. The policy these last years has been to develop new ideas outside, to avoid adding even more weight to an already overburden project. |
It's all in |
I have been tagged, I dare comment then... :) I believe adding layers to the development environment makes more difficult to solve the issues in the code. From the contributor point of view I agree with @adrinjalali comment:
I'd rather improve the documentation, not only the one for users but for maintainers too. In particular the PR checks could be better explained and backed by bots to explicit the errors, as said above: I am often doing the job of the bot ... :) ... I believe this is a very valuable information for contributors who don't always come from a background of software development and face for the first time the challenge of the integration. Finally, I believe that the backlog of pull requests and issues to be addressed is already quite big, without adding new workloads for the developers. |
Thanks everyone for your perspectives on this! To summarize (very very briefly), it seems the greatest concerns are:
As mentioned, I suppose the fundamental issue this would try to solve is the inherit difficulty of discoverability and operability of some tools, however there are preferable solutions:
I do not think that a CLI tool such as this harms a new contributor too much in their journey to master the underlying tools too much, but either way I think we've already arrived at an overall consensus wherein such a tool is not worth the cost, and indeed I agree that the alternative solutions are better implemented first. Thanks again for the great discussion :) |
I am coming back to this RFC to complete my point of view. After having used the |
Introduction
Other OSS projects such as
scipy
andnumpy
have made use of development CLI tools such asruntests.py
anddev.py
to provide a collected and singular entry point into development tasks for contributors. I find these tools very helpful personally, and think they add a large amount of value, especially for onboarding new contributors. Recently Pandas has also begun discussion and implementation of their own development CLI tool. @noatamir has made a great writeup on some of the benefits and justifications for such a tool (see: pandas-dev/pandas#47700). I highly recommend taking the time to read their writeup.Some of the benefits of such a tool include:
dev.py
commands, e.g.dev.py lint
which could then impose bothblack
andisort
(cf. RFC isort as linter and import sorter #22853)dev.py doc build
could check for the presence of the necessary dependencies and install if needed (or optionally prompt).Recently
scipy
explored a new solution for developing a unified development CLI tool (cf. scipy/scipy#15489) based ondo.it
. Pandas is also exploring ado.it
based solution. There's been good feedback on this framework and I would propose the same.Encompassed Actions
Some actions which would be wrapped by such a development CLI include:
Considerations
As with all things, the question is whether the benefits outweigh the maintenance cost. While projects like scipy with many complex workflows benefit the most from such a tool, personally I think anything to decrease the barrier to entry for scikit-learn development is worth considering.
If we do opt to build such a tool, we can make use of the work already done in scipy as well as the recently released separate
dev.py
package. Either way, the largest modification we'd need to make is alter thebuild
command to usesetuptools
instead ofmeson
, though that shouldn't be too problematic.The text was updated successfully, but these errors were encountered: