8000 MAINT: Prefer `np.fill_diagonal` over `diag_indices` by lgeiger · Pull Request #27965 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT: Prefer np.fill_diagonal over diag_indices #27965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 25, 2023

Conversation

lgeiger
Copy link
Contributor
@lgeiger lgeiger commented Dec 15, 2023

What does this implement/fix? Explain your changes.

np.fill_diagonal internally uses a faster implementation that never constructs the indices and uses simple slicing (ref numpy docs)

Copy link

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 05d89de. Link to the linter CI: here

@lesteve
Copy link
Member
lesteve commented Dec 19, 2023

Just curious did this impact performance in one of your use case or you bumped into it by reading the code?

@lgeiger
Copy link
Contributor Author
lgeiger commented Dec 19, 2023

Just curious did this impact performance in one of your use case or you bumped into it by reading the code?

It's a mix of both. I was bottlenecked by the cosine computation so I looked at the code and found that this change would give a small performance win without adding extra complexity:

S = np.random.uniform(size=(20, 20))

%timeit np.fill_diagonal(S, 0.0)
# 1.14 µs ± 4.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit S[np.diag_indices_from(S)] = 0.0
# 9.3 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lesteve lesteve added the Quick Review For PRs that are quick to review label Dec 19, 2023
@lesteve
Copy link
Member
lesteve commented Dec 19, 2023

For the record, I ran a quick benchmark, that seems to show that fill_diagonal is sometimes slightly slower than diag_indices_from, contrary to what the doc says, see for size=5000 or size=10000 below:

size=10
  fill_diagonal    : 615 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.22 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=50
  fill_diagonal    : 837 ns ± 25.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.5 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=100
  fill_diagonal    : 1.04 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.69 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=500
  fill_diagonal    : 2.86 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 8.67 µs ± 72.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=1000
  fill_diagonal    : 6.71 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 11.3 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=5000
  fill_diagonal    : 32.7 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 29.6 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=10000
  fill_diagonal    : 63.7 µs ± 423 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 56 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=30000
  fill_diagonal    : 390 µs ± 4.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
  diag_indices_from: 423 µs ± 6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
import numpy as np
size_list = [10, 50, 100, 500, 1000, 5000, 10_000, 30_000]
for size in size_list:
    arr = np.random.uniform(size=(size, size))

    print(f'{size=}')
    print('  fill_diagonal    : ', end='')
    %timeit np.fill_diagonal(arr, 0.)
    print('  diag_indices_from: ', end='')
    %timeit arr[np.diag_indices_from(arr)] = 0.

@lgeiger
Copy link
Contributor Author
lgeiger commented Dec 19, 2023

Thanks for running more benchmarks!
On my machine fill_diagonal is consistently faster (macOS x86, Python 3.11.6, Numpy 1.26.2). Though for larger arrays the benefits of this change are definitely less:

size=10
  fill_diagonal    : 1.18 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 10.7 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=50
  fill_diagonal    : 1.36 µs ± 36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 9.81 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=100
  fill_diagonal    : 1.64 µs ± 47.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 9.59 µs ± 232 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=500
  fill_diagonal    : 4.01 µs ± 30.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 11.2 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=1000
  fill_diagonal    : 7.25 µs ± 306 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 15.1 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=5000
  fill_diagonal    : 40.1 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 50.8 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=10000
  fill_diagonal    : 93.7 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 102 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=30000
  fill_diagonal    : 539 µs ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
  diag_indices_from: 601 µs ± 106 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@adrinjalali adrinjalali merged commit 0d4a88e into scikit-learn:main Dec 25, 2023
@lgeiger lgeiger deleted the fill_diagonal branch December 25, 2023 18:34
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:metrics Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0