-
-
Notifications
You must be signed in to change notification settings - Fork 11k
MAINT: refactor "for ... in range(len(" statements #19781
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f0b865a
to
5f6cd40
Compare
There are three patterns of replacement:
I think the first one is a clear win. I am not so excited about the other two. Is there a clear preference in our style guide? I think I would prefer to treat them as we do other code cleanups: only touch them if we need to do so for other reasons. |
IMO all three are overall wins, but won't comment on whether the wins are strong enough to justify the review time. From a quick skim I think this is some nice cleanup. |
When replacing with |
The motivation behind this PR is part style and performance. The ar = np.arange(10000, dtype=float)
%timeit for x in ar: _ = x
# 334 µs ± 75
8000
2 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for i, x in enumerate(ar): _ = x
# 552 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for a, b in zip(ar, ar): _, _ = a, b
# 730 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for i in range(len(ar)): _ = ar[i]
# 746 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) Where there are shorter iterations, the performance is about the same for all methods. Mark any edits that are less readable, and I'll cull them. |
Thanks, I was paying attention to this aspect, as zip will stop on the shortest iteration. But also |
numpy/core/records.py
Outdated
for i in range(len(arrayList)): | ||
_array[_names[i]] = arrayList[i] | ||
for name, item in zip(_names, arrayList): | ||
_array[name] = item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you merge the check in lines 667-671 into this loop so we iterate once over arrayList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, done. Also improved check error message to also show name of the field.
for i in range(len(x)): | ||
assert_almost_equal(y[i], y_r[i]) | ||
for yi, y_ri in zip(y, y_r): | ||
assert_almost_equal(yi, y_ri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be written more compactly without the loop as assert_almost_equal(y, y_r)
? What am I missing?
Edit: same thing in the following changes in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have been an oversight, as assert_almost_equal
takes array_like args, and seems to work with complex types. Done 3x.
numpy/distutils/misc_util.py
Outdated
args[i] = '"%s"' % (a) | ||
for idx, arg in enumerate(args): | ||
if ' ' in arg and arg[0] not in '"\'': | ||
args[idx] = f'"{arg}"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anywhere in numpy or scipy that uses this function. Can we deprecate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've backed out this change, and will follow-up in a separate issue/PR later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xref #19811
cb_rules.buildcallbacks(lst[i]) | ||
for item in lst: | ||
if '__user__' in item['name']: | ||
cb_rules.buildcallbacks(item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes here look like a nice cleanup
coefstr = '(%s + %sj)' % (fmt_float(real(coeffs[k])), | ||
fmt_float(imag(coeffs[k]))) | ||
coefstr = '(%s + %sj)' % (fmt_float(real(coeff)), | ||
fmt_float(imag(coeff))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks better now
OK, I had a deeper look. Only one of these looks out of place, the rest are either improvements or "doesn't matter". Some of the code can be cleaned up further. |
Yes, and that needs to continue to be the case, especially for the functions in the tests. That |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattip thanks, comments addressed in updated commit (via push -f
)
numpy/core/records.py
Outdated
for i in ran 8000 ge(len(arrayList)): | ||
_array[_names[i]] = arrayList[i] | ||
for name, item in zip(_names, arrayList): | ||
_array[name] = item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, done. Also improved check error message to also show name of the field.
for i in range(len(x)): | ||
assert_almost_equal(y[i], y_r[i]) | ||
for yi, y_ri in zip(y, y_r): | ||
assert_almost_equal(yi, y_ri) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Must have been an oversight, as assert_almost_equal
takes array_like args, and seems to work with complex types. Done 3x.
numpy/distutils/misc_util.py
Outdated
args[i] = '"%s"' % (a) | ||
for idx, arg in enumerate(args): | ||
if ' ' in arg and arg[0] not in '"\'': | ||
args[idx] = f'"{arg}"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've backed out this change, and will follow-up in a separate issue/PR later.
for i in range(len(desired)): | ||
assert_array_equal(res[i], desired[i]) | ||
for result_item, desired_item in zip(res, desired): | ||
assert_array_equal(result_item, desired_item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a if len(res) != len(desired): raise ...
or is it guarenteed that they will be equal length in all the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this function is not used, so it is now removed.
numpy/lib/tests/test_shape_base.py
Outdated
for i in range(len(desired)): | ||
assert_array_equal(res[i], desired[i]) | ||
for result_item, desired_item in zip(res, desired): | ||
assert_array_equal(result_item, desired_item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a if len(res) != len(desired): raise ... or is it guarenteed that they will be equal length in all the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a length check, with a comment for PEP 618 for Python 3.10, when these functions should be refactored to zip(..., strict=True)
.
It also turns out there was one failure (test_integer_split_2D_rows
), where the lengths were not equal, which is simple to fix in this PR.
This looks good now, just two small nits that if broken now they were broken before too. |
ff8c293
to
64f15a9
Compare
@mattip the main changes to review are in |
@@ -392,7 +392,7 @@ def test_integer_split_2D_rows(self): | |||
assert_(a.dtype.type is res[-1].dtype.type) | |||
|
|||
# Same thing for manual splits: | |||
res = array_split(a, [0, 1, 2], axis=0) | |||
res = array_split(a, [0, 1], axis=0) | |||
tgt = [np.zeros((0, 10)), np.array([np.arange(10)]), 8000 | |||
np.array([np.arange(10)])] | |||
compare_results(res, tgt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the alternative change is to keep this res
, and add a fourth array to tgt
.
Thanks @mwtoews for working through all this |
Refactor
for i in range(len(items))
tofor item in items
, which is generally regarded to be more Pythonic.Other instances refactored to use either
zip
orenumerate
.Cases were identified using
git grep " in range(len("
, although many other cases not suitable for refactoring.