Improve performance of .fold() by jturner314 · Pull Request #574 · rust-ndarray/ndarray · GitHub

Improve performance of .fold() #574

Merged
bluss merged 2 commits into rust-ndarray:master from jturner314:optimize-fold
Dec 15, 2018

Conversation

@jturner314
Member

This PR does two things that improve the performance of .fold() for ArrayBase:

  1. Specialize the process of finding the inner axis for 2-D arrays.
  2. Implement .fold() for the Axes iterator. This is an improvement for arrays with more than 2 axes.

For the function below with a 320x320 input array, this PR improves the performance by ~50%.

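use ndarray::{Array2, ArrayView2, Zip};

const RADIUS: usize = 1;
const WINDOW_SIZE: usize = 2 * RADIUS + 1;

// For each interior element, sum the squared differences between that element
// and every value in the WINDOW_SIZE x WINDOW_SIZE window centered on it.
fn sum_sq_diff_windows(data: ArrayView2<f64>) -> Array2<f64> {
    // The output is smaller by RADIUS on each side, since every window must fit inside `data`.
    let mut out = Array2::zeros((data.rows() - 2 * RADIUS, data.cols() - 2 * RADIUS));
    Zip::from(&mut out)
        .and(data.windows((WINDOW_SIZE, WINDOW_SIZE)))
        .apply(|out, window| {
            let center = window[(RADIUS, RADIUS)];
            // One .fold() call per window; this is the call whose cost the PR reduces.
            *out = window.fold(0., |acc, x| acc + (x - center).powi(2));
        });
    out
}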
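To make item 1 in the list above concrete, here is a minimal sketch of what "specializing the search for the inner axis" could look like. It is an illustration under assumptions, not the code in this PR: the function names `min_stride_axis_general` and `min_stride_axis`, the tie-breaking rule, and the use of plain slices for shape and strides are all hypothetical and chosen only so the example is self-contained.

```rust
/// General scan (hypothetical helper, not ndarray's code): index of the axis
/// with the smallest absolute stride among axes longer than 1. Ties go to the
/// later axis; if no axis is longer than 1, the last axis is returned.
fn min_stride_axis_general(shape: &[usize], strides: &[isize]) -> usize {
    assert_eq!(shape.len(), strides.len());
    assert!(!shape.is_empty(), "need at least one axis");
    let mut best = shape.len() - 1;
    let mut best_stride = usize::MAX;
    // Walk the axes from last to first so that ties keep the later axis.
    for axis in (0..shape.len()).rev() {
        let s = strides[axis].unsigned_abs();
        if shape[axis] > 1 && s < best_stride {
            best = axis;
            best_stride = s;
        }
    }
    best
}

/// Same answer as the general scan, with a 2-D fast path (assumed for
/// illustration) that needs only a single stride comparison and no loop.
fn min_stride_axis(shape: &[usize], strides: &[isize]) -> usize {
    match (shape, strides) {
        (&[len0, len1], &[s0, s1]) => {
            let axis0_wins =
                len0 > 1 && (len1 <= 1 || s0.unsigned_abs() < s1.unsigned_abs());
            if axis0_wins { 0 } else { 1 }
        }
        _ => min_stride_axis_general(shape, strides),
    }
}
```

For a 3x3 window taken from a row-major array with 320 columns, the shape is `[3, 3]` and the strides are `[320, 1]`, so `min_stride_axis(&[3, 3], &[320, 1])` picks axis 1 as the innermost loop. The fast path matters here because the choice is made once per `.fold()` call, and the benchmark makes one call per window.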

@bluss
Member
bluss commented Dec 15, 2018

Nice. So it's a win even in a 3x3 array?

@bluss bluss merged commit 03552e2 into rust-ndarray:master Dec 15, 2018
@jturner314
Member Author
jturner314 commented Dec 15, 2018

So it's a win even in a 3x3 array?

Yes, each window is 3x3, and this PR significantly improves the performance because it reduces the cost of each .fold() call. (It doesn't improve the iteration performance within .fold(); it reduces the cost of determining the iteration order.)
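The split between per-call setup and per-element iteration can be pictured with a small strided fold. This is only a sketch under assumptions: `View2` is a hypothetical stand-in for a 2-D view over a flat buffer (it is not ndarray's `ArrayView2`), and it handles only non-negative strides.

```rust
/// Hypothetical 2-D strided view over a flat buffer (illustration only).
struct View2<'a> {
    data: &'a [f64],
    offset: usize,
    shape: [usize; 2],
    strides: [usize; 2],
}

impl<'a> View2<'a> {
    fn fold<B>(&self, init: B, mut f: impl FnMut(B, &f64) -> B) -> B {
        // Setup, paid once per call: decide which axis is the innermost loop
        // so that memory is visited with the smallest stride.
        let (outer, inner) = if self.strides[0] < self.strides[1] {
            (1, 0)
        } else {
            (0, 1)
        };
        // Iteration, paid once per element: unaffected by how the order was chosen.
        let mut acc = init;
        for i in 0..self.shape[outer] {
            let start = self.offset + i * self.strides[outer];
            for j in 0..self.shape[inner] {
                acc = f(acc, &self.data[start + j * self.strides[inner]]);
            }
        }
        acc
    }
}
```

With only nine elements per window, the loop body runs nine times while the setup runs once, so the setup is a large share of each call. That is why a cheaper way of choosing the order shows up in the benchmark even though the inner loop is unchanged.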

@jturner314 jturner314 deleted the optimize-fold branch December 15, 2018 20:23
@bluss
Member
bluss commented Dec 15, 2018

Ah, of course. Thanks
