MAINT: Simplify block implementation by j-towns · Pull Request #9667 · numpy/numpy · GitHub

MAINT: Simplify block implementation #9667


Merged (23 commits, Nov 12, 2017)
Changes from 1 commit
Fix python 3.4 sequence error
j-towns committed Nov 3, 2017
commit ad278f3bfea97636b18c8d699babdd02aa3cac5c
4 changes: 2 additions & 2 deletions numpy/core/shape_base.py
@@ -402,8 +402,8 @@ def format_index(index):
            )
        )
    elif type(arrays) is list and len(arrays) > 0:
-        indexes, arr_ndims = zip(*(_block_check_depths_match(arr, parent_index + [i])
-                                   for i, arr in enumerate(arrays)))
+        indexes, arr_ndims = zip(*[_block_check_depths_match(arr, parent_index + [i])
@eric-wieser (Member) commented Oct 31, 2017:
I suspect this is where things slow down. Can you just iterate over the full result in the loop below, and use arr_ndims.append? I don't expect the * here to be particularly fast.
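The suggestion above, iterating once and appending to two lists rather than transposing with zip(*...), might look like the following sketch. Names here are hypothetical: check_depths is a stand-in that only mimics the (index, ndim) return shape of _block_check_depths_match.

```python
import numpy as np

def check_depths(arr, parent_index):
    # Stand-in for _block_check_depths_match: returns the index path
    # of this element and its number of dimensions.
    return parent_index, np.ndim(arr)

def gather(arrays, parent_index=None):
    # Single pass over `arrays`, appending to two result lists
    # instead of building tuples and unpacking with zip(*...).
    parent_index = parent_index if parent_index is not None else []
    indexes = []
    arr_ndims = []
    for i, arr in enumerate(arrays):
        index, ndim = check_depths(arr, parent_index + [i])
        indexes.append(index)
        arr_ndims.append(ndim)
    return indexes, arr_ndims
```

For example, `gather([np.ones((2, 2)), np.zeros((2, 2))])` returns `([[0], [1]], [2, 2])`.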

@j-towns (Contributor, Author) replied:
I've implemented something that avoids the zip(*...) and pushed it. It seems a little better, though some things are still a little slower than 8a83a5f, probably because _nx.ndim is now called on every item at the bottom of the nesting, which I think we can't get around without reverting to the previous approach.

The two things that are really significantly slower are trivial use cases and are still very fast relative to other use cases anyway.

    before     after       ratio
  [8a83a5fc] [2e948722]
+   11.82μs    19.16μs      1.62  bench_shape_base.Block.time_no_lists(10)
+    9.87μs    15.62μs      1.58  bench_shape_base.Block.time_no_lists(1)
+  460.36μs   588.47μs      1.28  bench_shape_base.Block.time_3d(10)
+   14.19μs    16.98μs      1.20  bench_shape_base.Block.time_block_simple_row_wise(10)
+   54.07μs    64.21μs      1.19  bench_shape_base.Block.time_no_lists(100)
+   80.41μs    88.62μs      1.10  bench_shape_base.Block.time_3d(1)
-  277.39μs   256.95μs      0.93  bench_shape_base.Block.time_nested(100)
-   83.21μs    73.71μs      0.89  bench_shape_base.Block.time_block_simple_row_wise(100)
-   67.95μs    59.97μs      0.88  bench_shape_base.Block.time_block_complicated(10)
-   62.88μs    51.18μs      0.81  bench_shape_base.Block.time_block_complicated(1)
-   25.42μs    20.63μs      0.81  bench_shape_base.Block.time_block_simple_column_wise(1)
-     5.98s      4.77s      0.80  bench_shape_base.Block.time_3d(100)
-   86.93μs    67.27μs      0.77  bench_shape_base.Block.time_nested(10)

@j-towns (Contributor, Author) added:
I.e. in absolute terms there's unlikely to be much slowdown to users' code

@eric-wieser (Member) commented Nov 1, 2017:
> probably just because _nx.ndim is called on every item at the bottom of the nesting now

Wasn't this always the case? The difference is, you previously also called it on every intermediate result too - so if anything, you now call it less.

Does avoiding a generator comprehension speed things up? You could call _block_check_depths_match in the for loop directly, although I agree what you have is cleaner
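On the generator-vs-list-comprehension question raised above, a rough micro-benchmark (illustrative only, not from the PR; timings vary by interpreter and workload) could be written like this. When the whole result is consumed immediately, a list comprehension usually edges out a generator expression, because the generator pays a frame suspend/resume per item.

```python
import timeit

# Time building the same 100-element result via a generator expression
# materialised with list(), versus a direct list comprehension.
gen_time = timeit.timeit("list(x * 2 for x in range(100))", number=20000)
lst_time = timeit.timeit("[x * 2 for x in range(100)]", number=20000)
print(f"generator: {gen_time:.3f}s  list comp: {lst_time:.3f}s")
```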

@j-towns (Contributor, Author) replied:
I was calling atleast_nd on every level (including the bottom one), but ndim was only called for depth < max_depth I believe. So perhaps ndim, when called on scalars, is slower than our atleast_nd function (which is just a wrapper for np.array)?
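As background on the ndim-on-scalars point (a general fact about NumPy, not from the thread): np.ndim first tries the argument's .ndim attribute and, failing that, falls back to np.asarray(a).ndim, which constructs a temporary array for Python scalars and nested lists. A minimal illustration:

```python
import numpy as np

# Python float has no .ndim attribute, so np.ndim builds a temporary
# 0-d array via the asarray fallback.
print(np.ndim(3.0))               # 0

# NumPy scalars expose .ndim directly, so no temporary array is built.
print(np.ndim(np.float64(3.0)))   # 0

# Nested lists also go through the asarray fallback.
print(np.ndim([[1, 2], [3, 4]]))  # 2
```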

+                                   for i, arr in enumerate(arrays)])

first_index = indexes[0]
for i, index in enumerate(indexes):
Review comment (Member):
Having this overwrite the index parameter is confusing
