-
-
Notifications
You must be signed in to change notification settings - Fork 26k
MAINT: Prefer np.fill_diagonal
over diag_indices
#27965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just curious did this impact performance in one of your use case or you bumped into it by reading the code? |
It's a mix of both. I was bottlenecked by the cosine computation so I looked at the code and found that this change would give a small performance win without adding extra complexity: S = np.random.uniform(size=(20, 20))
%timeit np.fill_diagonal(S, 0.0)
# 1.14 µs ± 4.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
%timeit S[np.diag_indices_from(S)] = 0.0
# 9.3 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
For the record, I ran a quick benchmark, that seems to show that
import numpy as np
size_list = [10, 50, 100, 500, 1000, 5000, 10_000, 30_000]
for size in size_list:
arr = np.random.uniform(size=(size, size))
print(f'{size=}')
print(' fill_diagonal : ', end='')
%timeit np.fill_diagonal(arr, 0.)
print(' diag_indices_from: ', end='')
%timeit arr[np.diag_indices_from(arr)] = 0. |
Thanks for running more benchmarks!
|
What does this implement/fix? Explain your changes.
np.fill_diagonal
internally uses a faster implementation that never constructs the indices and uses simple slicing (ref numpy docs)