MAINT: Prefer `np.fill_diagonal` over `diag_indices` #27965

lgeiger · 2023-12-15T13:26:31Z

What does this implement/fix? Explain your changes.

np.fill_diagonal internally uses a faster implementation that never constructs the indices and uses simple slicing (ref numpy docs)

github-actions · 2023-12-15T13:27:52Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: 05d89de. Link to the linter CI: here}

lesteve · 2023-12-19T09:26:33Z

Just curious did this impact performance in one of your use case or you bumped into it by reading the code?

lgeiger · 2023-12-19T10:28:20Z

Just curious did this impact performance in one of your use case or you bumped into it by reading the code?

It's a mix of both. I was bottlenecked by the cosine computation so I looked at the code and found that this change would give a small performance win without adding extra complexity:

S = np.random.uniform(size=(20, 20))

%timeit np.fill_diagonal(S, 0.0)
# 1.14 µs ± 4.47 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit S[np.diag_indices_from(S)] = 0.0
# 9.3 µs ± 103 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

lesteve

LGTM

lesteve · 2023-12-19T16:41:33Z

For the record, I ran a quick benchmark, that seems to show that fill_diagonal is sometimes slightly slower than diag_indices_from, contrary to what the doc says, see for size=5000 or size=10000 below:

size=10
  fill_diagonal    : 615 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.22 µs ± 117 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=50
  fill_diagonal    : 837 ns ± 25.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.5 µs ± 49.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=100
  fill_diagonal    : 1.04 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 6.69 µs ± 73.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=500
  fill_diagonal    : 2.86 µs ± 32.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 8.67 µs ± 72.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=1000
  fill_diagonal    : 6.71 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 11.3 µs ± 222 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=5000
  fill_diagonal    : 32.7 µs ± 501 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 29.6 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=10000
  fill_diagonal    : 63.7 µs ± 423 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 56 µs ± 3.81 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=30000
  fill_diagonal    : 390 µs ± 4.04 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
  diag_indices_from: 423 µs ± 6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

import numpy as np
size_list = [10, 50, 100, 500, 1000, 5000, 10_000, 30_000]
for size in size_list:
    arr = np.random.uniform(size=(size, size))

    print(f'{size=}')
    print('  fill_diagonal    : ', end='')
    %timeit np.fill_diagonal(arr, 0.)
    print('  diag_indices_from: ', end='')
    %timeit arr[np.diag_indices_from(arr)] = 0.

lgeiger · 2023-12-19T19:18:31Z

Thanks for running more benchmarks!
On my machine fill_diagonal is consistently faster (macOS x86, Python 3.11.6, Numpy 1.26.2). Though for larger arrays the benefits of this change are definitely less:

size=10
  fill_diagonal    : 1.18 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 10.7 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=50
  fill_diagonal    : 1.36 µs ± 36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
  diag_indices_from: 9.81 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=100
  fill_diagonal    : 1.64 µs ± 47.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 9.59 µs ± 232 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=500
  fill_diagonal    : 4.01 µs ± 30.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 11.2 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=1000
  fill_diagonal    : 7.25 µs ± 306 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
  diag_indices_from: 15.1 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
size=5000
  fill_diagonal    : 40.1 µs ± 2.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 50.8 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=10000
  fill_diagonal    : 93.7 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  diag_indices_from: 102 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
size=30000
  fill_diagonal    : 539 µs ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
  diag_indices_from: 601 µs ± 106 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

)

MAINT: Prefer np.fill_diagonal over diag_indices

05d89de

github-actions bot added the module:metrics label Dec 15, 2023

lesteve approved these changes Dec 19, 2023

View reviewed changes

lesteve added the Quick Review For PRs that are quick to review label Dec 19, 2023

adrinjalali approved these changes Dec 25, 2023

View reviewed changes

adrinjalali merged commit 0d4a88e into scikit-learn:main Dec 25, 2023

lgeiger deleted the fill_diagonal branch December 25, 2023 18:34

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Feb 10, 2024

MAINT: Prefer np.fill_diagonal over diag_indices (scikit-learn#27965

07b41d3

)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MAINT: Prefer `np.fill_diagonal` over `diag_indices` #27965

MAINT: Prefer `np.fill_diagonal` over `diag_indices` #27965

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MAINT: Prefer np.fill_diagonal over diag_indices #27965

MAINT: Prefer np.fill_diagonal over diag_indices #27965

Uh oh!

Conversation

What does this implement/fix? Explain your changes.

Uh oh!

✔️ Linting Passed

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MAINT: Prefer `np.fill_diagonal` over `diag_indices` #27965

MAINT: Prefer `np.fill_diagonal` over `diag_indices` #27965