8000 MAINT: refactor "for ... in range(len(" statements by mwtoews · Pull Request #19781 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

MAINT: refactor "for ... in range(len(" statements #19781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 1, 2021

Conversation

mwtoews
Copy link
Contributor
@mwtoews mwtoews commented Aug 30, 2021

Refactor for i in range(len(items)) to for item in items, which is generally regarded to be more Pythonic.

Other instances refactored to use either zip or enumerate.

Cases were identified using git grep " in range(len(", although many other cases not suitable for refactoring.

@mwtoews mwtoews force-pushed the foreach-item branch 2 times, most recently from f0b865a to 5f6cd40 Compare August 30, 2021 03:07
@mattip
Copy link
Member
mattip commented Aug 30, 2021

There are three patterns of replacement:

  • replace for i in range(len(x)): with for item in x:
  • replace for i in range(len(x)): with for i, item in enumerate(x):
  • replace for i in range(len(x)): with for itemx, itemy in zip(x, y):

I think the first one is a clear win. I am not so excited about the other two. Is there a clear preference in our style guide? I think I would prefer to treat them as we do other code cleanups: only touch them if we need to do so for other reasons.

@eric-wieser
Copy link
Member

IMO all three are overall wins, but won't comment on whether the wins are strong enough to justify the review time. From a quick skim I think this is some nice cleanup.

@rkern
Copy link
Member
rkern commented Aug 30, 2021

When replacing with zip(), especially in the test utilities, be sure to handle differing lengths correctly.

@mwtoews
Copy link
Contributor Author
mwtoews commented Aug 30, 2021

The motivation behind this PR is part style and performance. The for statement in Python is really a foreach statement, so it's not the same as C. Iterating over the simplest statements yields the best performance when scaled up (using Python 3.9.6):

ar = np.arange(10000, dtype=float)

%timeit for x in ar: _ = x
# 334 µs ± 75
8000
2 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for i, x in enumerate(ar): _ = x
# 552 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for a, b in zip(ar, ar): _, _ = a, b
# 730 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for i in range(len(ar)): _ = ar[i]
# 746 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Where there are shorter iterations, the performance is about the same for all methods.

Mark any edits that are less readable, and I'll cull them.

@mwtoews
Copy link
Contributor Author
mwtoews commented Aug 30, 2021

When replacing with zip(), especially in the test utilities, be sure to handle differing lengths correctly.

Thanks, I was paying attention to this aspect, as zip will stop on the shortest iteration. But also x[i], y[i] would have raised IndexError before if they were different lengths.

for i in range(len(arrayList)):
_array[_names[i]] = arrayList[i]
for name, item in zip(_names, arrayList):
_array[name] = item
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you merge the check in lines 667-671 into this loop so we iterate once over arrayList

Copy link
Contributor Author
@mwtoews mwtoews Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done. Also improved check error message to also show name of the field.

for i in range(len(x)):
assert_almost_equal(y[i], y_r[i])
for yi, y_ri in zip(y, y_r):
assert_almost_equal(yi, y_ri)
Copy link
Member
@mattip mattip Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be written more compactly without the loop as assert_almost_equal(y, y_r)? What am I missing?

Edit: same thing in the following changes in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must have been an oversight, as assert_almost_equal takes array_like args, and seems to work with complex types. Done 3x.

args[i] = '"%s"' % (a)
for idx, arg in enumerate(args):
if ' ' in arg and arg[0] not in '"\'':
args[idx] = f'"{arg}"'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere in numpy or scipy that uses this function. Can we deprecate it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've backed out this change, and will follow-up in a separate issue/PR later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xref #19811

cb_rules.buildcallbacks(lst[i])
for item in lst:
if '__user__' in item['name']:
cb_rules.buildcallbacks(item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here look like a nice cleanup

coefstr = '(%s + %sj)' % (fmt_float(real(coeffs[k])),
fmt_float(imag(coeffs[k])))
coefstr = '(%s + %sj)' % (fmt_float(real(coeff)),
fmt_float(imag(coeff)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks better now

@mattip
Copy link
Member
mattip commented Aug 30, 2021

OK, I had a deeper look. Only one of these looks out of place, the rest are either improvements or "doesn't matter". Some of the code can be cleaned up further.

< 8000 div class="avatar-parent-child TimelineItem-avatar d-none d-md-block"> @rkern
Copy link
Member
rkern commented Aug 30, 2021

Thanks, I was paying attention to this aspect, as zip will stop on the shortest iteration. But also x[i], y[i] would have raised IndexError before if they were different lengths.

Yes, and that needs to continue to be the case, especially for the functions in the tests. That IndexError is part of the test.

Copy link
Contributor Author
@mwtoews mwtoews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattip thanks, comments addressed in updated commit (via push -f)

for i in ran 8000 ge(len(arrayList)):
_array[_names[i]] = arrayList[i]
for name, item in zip(_names, arrayList):
_array[name] = item
Copy link
Contributor Author
@mwtoews mwtoews Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done. Also improved check error message to also show name of the field.

for i in range(len(x)):
assert_almost_equal(y[i], y_r[i])
for yi, y_ri in zip(y, y_r):
assert_almost_equal(yi, y_ri)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must have been an oversight, as assert_almost_equal takes array_like args, and seems to work with complex types. Done 3x.

args[i] = '"%s"' % (a)
for idx, arg in enumerate(args):
if ' ' in arg and arg[0] not in '"\'':
args[idx] = f'"{arg}"'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've backed out this change, and will follow-up in a separate issue/PR later.

for i in range(len(desired)):
assert_array_equal(res[i], desired[i])
for result_item, desired_item in zip(res, desired):
assert_array_equal(result_item, desired_item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a if len(res) != len(desired): raise ... or is it guarenteed that they will be equal length in all the tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this function is not used, so it is now removed.

for i in range(len(desired)):
assert_array_equal(res[i], desired[i])
for result_item, desired_item in zip(res, desired):
assert_array_equal(result_item, desired_item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a if len(res) != len(desired): raise ... or is it guarenteed that they will be equal length in all the tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a length check, with a comment for PEP 618 for Python 3.10, when these functions should be refactored to zip(..., strict=True).

It also turns out there was one failure (test_integer_split_2D_rows), where the lengths were not equal, which is simple to fix in this PR.

@mattip
Copy link
Member
mattip commented Aug 31, 2021

This looks good now, just two small nits that if broken now they were broken before too.

@mwtoews
Copy link
Contributor Author
mwtoews commented Sep 1, 2021

@mattip the main changes to review are in numpy/lib/tests/test_shape_base.py

@@ -392,7 +392,7 @@ def test_integer_split_2D_rows(self):
assert_(a.dtype.type is res[-1].dtype.type)

# Same thing for manual splits:
res = array_split(a, [0, 1, 2], axis=0)
res = array_split(a, [0, 1], axis=0)
tgt = [np.zeros((0, 10)), np.array([np.arange(10)]), 8000
np.array([np.arange(10)])]
compare_results(res, tgt)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alternative change is to keep this res, and add a fourth array to tgt.

@mattip mattip merged commit 6829957 into numpy:main Sep 1, 2021
@mattip
Copy link
Member
mattip commented Sep 1, 2021

Thanks @mwtoews for working through all this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0