-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
RFC On the relative harm of cosmetic changes #11336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I should summarise: of 1122 files in the repo at 73b7d07:
(Sorry: being lazy about visualisation) Definition of type: def path_type(path):
if path.startswith('examples'):
return 'examples'
if '/externals/' in path:
return 'externals'
if '/tests/' in path:
return 'tests'
if path.endswith('.rst'):
return 'rst'
if not path.startswith('sklearn/'):
return 'other'
if any(path.endswith(ext) for ext in ['.c', '.h', '.cpp']):
return 'c'
if any(path.endswith(ext) for ext in ['.py', '.pxd', '.pyx', '.pxi']):
return 'py'
if any(path.endswith(ext) for ext in ['.jpg', '.csv', '.gz']):
return 'data'
return 'other' |
Wow this is quite an impressive endeavour! I am guessing that there is a bit of inaccuracy because the line numbers of the flake8 warnings change with history (or maybe this is just on a per-file level). I would still take that as a useful first-order estimate. Personally I have to admit I have a bit of a bias towards flake8ing all (or at least as much the consensus deems acceptable) of the code, sacrifice some old outstanding PRs and help the people that need it on the ongoing still alive PRs. Ideally it would be very nice to have some anecdotal evidence that sometimes conflicts (caused by flake8 or by some similar automated change) is the thing that prevents resuscitating a PR (or makes a PR die). Talking for myself, I have had some cases where resolving the conflicts was too painful (#10663 reviving #4807 stands to mind). I ended up rewriting the code, with a lot of copying and pasting from the PR. One of the arguments I have heard against flake8ing the code is that flake8 changes with time (PEP8 changes with time and is open to interpretation too) and that it's not something you do once and forget about it for the rest of your life. I definitely have seen cases where different versions of flake8 were giving different warnings but I would be (maybe too naively) optimistic that we can control this effect through ignoring flake8 warnings in setup.cfg or on a line-level through comments. Side-comment: if we decide to flake8 some parts of the code (e.g. tests may be a good candidate), there is no guarantee that flake8_diff.sh prevents flake8 errors to be reintroduced (mainly because the diff does not have enough context), so we will need to adapt our flake8 testing script. |
the line numbers are now out of date, but they reflect the "left hand side
of the diff", i.e. master
I might also note that lgtm.com checks for many key pyflakes issues (unused
import, etc), so the benefits here would be some uncontroversial pep8
fixes, and pytest updates. the benefit is that contributors won't try
fixing pep8 in feature PRs, and will have a good example to follow for
testing.
|
Very impressing analysis! This will be very useful for PR introducing cosmetic changes.
I'm still biased toward "yes", particularly for running nose2pytest on the test files, don't really have a strong opinion about flake. Also I was following with interest projects that started to use black (e.g. dask/dask-ml#237) but that's a whole another level of potential merge conflicts so probably not even worth considering here (even assuming the advantages were unanimous, which is not the case I think). |
On a related topic Django recently accepted an Enhancement Proposal to re-format the whole code base with black (https://github.com/django/deps/blob/master/accepted/0008-black.rst) if I understood that right. They "only" have 230 open PRs but still. |
The way Even without pre-commit hooks, if we can get contributors to run |
But black would be a process we apply to the whole codebase beforehand, not
just to contributions.
I have no real objection to this.
|
Although black's approach to indentation is very different to ours.
|
That is the blocker for using black. def hello_world(var_1, var_2, var_3,
var_4, ...):
pass while def hello_world(
var_1,
var_2,
var_3,
...):
pass When not working in |
I'd be OK with using black, the benefit seem to outweigh the downsides. Merge conflicts with existing PRs would be trivial to solve, contributors would just have to run black. |
black would help some, but there are still cases where conflicts would need to be resolved (e.g. if the same line was changed in 2 different ways). |
The issue is that for instance applying flake8 on the diff, after a certain point does not make code more PEP8 compatible -- it does not converge.
On,
So honestly I think we should just fix these 200 LoC (or which many are removed/added newlines), be done with it and remove ~160 lines of bash hacks in |
Or maybe we should just discuss using black in the next dev meeting.. |
I can't say I like black's style all the time... but I agree that it's an easy way out. Happy to discuss. |
For code style as far as can tell we have 3 possibilities, @jorisvandenbossche I see pandas has applied black last year pandas-dev/pandas#27076. Could you share your experience of it? Particularly with respect to managing the transition, whether it had an impact on merge conflicts for existing PR, as well as what impact in PR review and interaction with contributors it had. |
I am +1 for the black adoption. |
I think I'd vote 2, though I'd also be fine with 3. Completely agree with @jnothman :) |
Just an anecdotal remark in favor of black: I've written a bunch of PRs recently where I needed to do some global changes to the codebase (e.g. adding a strict_mode arguement to all the checks, or replacing Having black would have made my life much easier. Right now, I have to fix dozens (sometimes hundreds) of linting issues before I can push, otherwise I know the CI won't run most tests. It's a bit of a pain especially when you're just trying potential ideas. |
I think transitioning could look like:
I'm not yet sure about applying black to examples, where we'll often want to make the example visually clear. I'm thinking about the layout of a 2d matrix input specifically. |
That's an understandable point but probably is something that we are used to it. |
If we are doing this, I would try to push for a slightly higher line-length (maybe 100). I recall that @adrinjalali was not happy with changing the line length. |
Also Gaël I think. 100 would be a bit too large even on my laptop I think. I would prefer either the default (88) or 79.
Yeah, asking contributors to rely on pre-commit for code changes except for examples where they would need to manually do it can be confusing.
Sounds like a plan.
Yes, that would indeed be ideal. Something like,
we would need to think about how to make that a) maybe include manual validation in the beginning b) store state of what was migrated what wasn't c) make it more fault tolerant d) be aware that we probably won't be able to do it all in one run due to Github API limitations. The list of open PRs can be obtained with,
|
Just 2 cents here: I was heavy user of flake, pycodestyle, and other tools on my Python projects with many students working on them. I wanted that students learn good code style and that code is consistent as everyone was participating on just parts of it. The downside was that students spend so much time trying to fix those style issues because even if those tools detect issues, authors do not necessary know how to fix them and then do trial and error. Even more, some students never installed those tools locally and just used CI to check, so there was a crazy amount of commits trying to fix those and waiting for CI. With black, we can just focus on code and ignore the style. Students are happier and they also learn good code style by seeing what black fixes. Which eventually take a hold on them. Since then I also used go which also has a standard formatter. And it really makes life much easier. Even if you disagree with some style choices, at the end consistency is the most important and there is really no reason to spend time manually fixing those consistency issues. I also believe sklearn should get typing information, so black formatting of attributes makes sense in that context. |
Thanks. I think the general consensus at the last dev meeting was that,
while several of us would grieve our loss of aesthetic freedom, this would
be a valuable way to lower the bar to entry, and none of us would be
strongly against. I think what needs clarification is the path to
transition.
… |
Now that black has been adopted, am I wrong or this issue can be closed? |
Joel's script is a really nice way to measure impact for future potential changes. But I think the main issue is resolved. |
I can't remember where, but there was a recent PR offering cosmetic (e.g. PEP8) fixes, and we usually reject these PRs as causing merge conflicts with existing open PRs. @amueller wanted to know exactly how many PRs were affected by each change.
If we knew it would affect no or few open/active PRs, we might be keen to fix flake8 issues, or to modernise tests (pytest style).
Thus, using https://gist.github.com/jnothman/41a5e05c82c4508afa7bee3b493752dd, I have determined which lines are modified (or adjacent to modified lines) by which open PRs.
https://www.dropbox.com/s/jsf07i40npq5o5g/lines-modified-by-open-prs.zip?dl=0 shows the results with a file of results for PRs that can currently be merged into master, and a file of results for PRs that cannot currently be merged into master. Each file has three columns: PR#, file name, line#.
Looking at just the file names, the following files in master are never modified by open PRs:
with the following flake8 errors:
The following files are modified by a single open PR:
The following are modified by 2 open PRs:
What do we think of offering contributors to do cleanups of flake8 or removing
assert_{true,false,equal,dict_equal}
in these files?The text was updated successfully, but these errors were encountered: