RFC: Comprehensive Plan for Restructuring Dimension Types · Issue #1506 · rust-ndarray/ndarray · GitHub

Open
akern40 opened this issue Apr 24, 2025 · 11 comments

@akern40
Collaborator
akern40 commented Apr 24, 2025

Summary

This RFC proposes a long-term plan for restructuring the library's core types and approach for handling shapes, strides, and dimensionalities. A successful implementation of new dimension types should accomplish the following goals:

  1. Make the library friendlier to newcomers and more ergonomic for power users by reducing confusion around dimension types
  2. Store strides as isize, not usize
  3. Maintain the ability to capture the dimensionality of an array at the type level
  4. Allow for capturing axis lengths at the type level, and therefore allow compile-time constant optimizations
  5. Support the move towards a "trait-based" library
  6. Reduce the number of "sealed" traits, thereby making the library more extensible
  7. Incorporate, as best as possible, const generics for dimensionality and fixed-length axes
  8. Provide a strong basis for other feature enhancements, such as optimized iteration orders and named axes

This RFC is not meant to propose all of these changes at once; rather, it lays out the overall vision and how we will build a smooth transition path. Each step in that path will require its own set of issues and pull requests.

An experimental implementation of this RFC (but not its integration into ndarray) can be found here (for the Layout trait and module definition) and here (for the rest of the traits and types). It includes types that implement the traits, additional design notes and rationale, and examples of advanced usage (such as a completely-constant two-by-two matrix shape).

cc @bluss @jturner314 @NewBornRustacean @lucascolley

Motivation

Clarity

The current handling of shape and strides in the library has been a point of confusion for users (#489, #367, this comment) and a point of complexity for maintainers (#283) since at least 2017. The dimension traits and types in ndarray are, at least for me, the most difficult part of the library to understand. I managed to write an entire RFC to revamp the array types themselves without truly understanding the current setup for dimension types. This is a testament both to the complexity of the implementation details and to the power of the current solution.

Features

In addition to complexity, the current approach limits some very promising and important optimizations and extensions.
As @grothesque points out in this comment on #1272, fixed-length axes are a critical step for many powerful optimizations. This is actually the issue that led me to work on ndarray in the first place: I wanted to write code that dealt with Nx3 and Nx3x3 state vectors and covariance matrices, I wanted that code to be as fast as possible, and I wanted it to be ergonomic to write.

Non-strided layouts (blocked, sparse, diagonal, Morton, etc.) are also of interest; see #321, #1500, and this comment.

Conceptual Explanation

I believe that the core driver of confusion around Dimension and its related types and traits is not complexity, but vocabulary. The word "dimension" describes a single size, extent, or measurement. The type Dimension describes the number of axes an array has, the "dimension" / length of each axis, the stride of each axis, and (sometimes) the index used to access the array; it's even used for helping with iteration. I believe this trait is simply overloaded, both conceptually and practically.

Users of the library also often deal with raw_dim(), a method which returns the shape as stored within the array. I believe this is emblematic of the vocabulary problem: "dim" here is ambiguous and not descriptive of the method's actual functionality.

In #519, @jturner314 lays out a plan for altering the Dimension trait to resolve the shape/strides usize/isize problem. However, discussion at the time suggested that the solution did not go far enough, and that const generics (which have since landed in stable Rust, albeit in a limited form) would likely change the design.

So this RFC takes a clean slate approach, inspired partially by the work done for C++ mdspan and by the excellent work by @fre-hu in mdarray. However, this proposal is not a direct copy of either approach and was developed independently to support ndarray specifically.

New Vocabulary

The basic conceit of this design is breaking up Dimension to give each concept its own trait:

  • Dimensionality, for the number of axes an array has
  • Shape, for the lengths of each axis
  • Strides, for the strides of each axis
  • Layout, for abstracting the memory order of an array
  • Strided: Layout, a layout specifically for strided arrays

The idea of an "index" - or multiple kinds of indices - likely deserves its own traits or types, but that is a question that can be resolved as the design matures, step-by-step.

Technical Details

Dimensionality

We start by breaking out the concept for "number of axes". I propose a trait, Dimensionality, that serves to unify both compile-time numbers of dimensions and runtime numbers of dimensions. Two alternatives deserve mentioning: const generics and typenum. We cannot use const generics universally because they are not yet powerful enough to express the dimensionality for operations like inserting an axis (as that would require N + 1 to be evaluated in a const context). We cannot use typenum universally because it does not support the dynamic-dimensional "escape hatch" that we need for runtime-only array dimensionalities; plus, its errors are very verbose and may impose extra difficulty in the use of ndarray. This leaves us with a third option: implement our own type-level numeric system, very akin to typenum, with good const generic support and a dynamic-dimensional escape hatch after a fixed number (e.g., 7 as it is right now or 12, to match the maximum tuple size).

pub trait Dimensionality:
    Copy
    + Eq
    + Debug
    + Send
    + Sync
    + DMax<D0, Output = Self>
    // ... other max relationships
    + DAdd<Self>
    // ... other add relationships
{
    /// The dimensionality as a constant usize, if it's not dynamic.
    const N: Option<usize>;

    type Smaller: Dimensionality;

    type Larger: Dimensionality;

    // And more associated types and methods, as needed
}
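
To make the fixed/dynamic split concrete, here is a minimal, compilable sketch with the trait bounds trimmed down. NDim matches the alias used later in this RFC, but DynDim, the trimmed trait, and the hand-written DAdd impls are illustrative assumptions, not the reference implementation:

use std::fmt::Debug;

/// A trimmed-down stand-in for the `Dimensionality` trait above.
pub trait Dimensionality: Copy + Eq + Debug + Send + Sync {
    /// The dimensionality as a constant usize, if it's not dynamic.
    const N: Option<usize>;
}

/// A compile-time-known number of dimensions.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct NDim<const N: usize>;

impl<const D: usize> Dimensionality for NDim<D> {
    const N: Option<usize> = Some(D);
}

/// The dynamic escape hatch: dimensionality known only at runtime.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DynDim;

impl Dimensionality for DynDim {
    const N: Option<usize> = None;
}

/// Type-level addition between dimensionalities.
pub trait DAdd<Rhs> {
    type Output;
}

// In practice these impls would be macro-generated pair by pair, since
// stable const generics cannot yet express `NDim<{N + M}>`.
impl DAdd<NDim<1>> for NDim<2> {
    type Output = NDim<3>;
}

// Anything added to a dynamic dimensionality stays dynamic.
impl<D> DAdd<D> for DynDim {
    type Output = DynDim;
}

fn main() {
    assert_eq!(<NDim<3> as Dimensionality>::N, Some(3));
    assert_eq!(<DynDim as Dimensionality>::N, None);
}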

A nice helper trait is Dimensioned, implemented on types that have a dimensionality (e.g., arrays, shapes, strides, vecs, etc.). It provides an associated type, Dimality (or maybe just NDim?), that describes how many dimensions the type has. This mostly saves us from writing type Dimality: Dimensionality on every one of the following traits.

Shape

The next step is to break out the concept of "length of each axis". Here is our first opportunity for an exciting feature enhancement: fixed-length axes, captured at compile time. A basic Shape trait can lay the foundation; for example:

pub trait Shape:
    Dimensioned + Index<usize, Output = usize> + Eq + Clone + Send + Sync + Debug
{
    /// Get the shape as a (possibly-borrowed) slice of `usize`.
    fn as_slice(&self) -> Cow<'_, [usize]>;

    /// Try to index the shape mutably, if `index` is a non-`const` axis.
    ///
    /// Types that implement `Shape` do not guarantee any sort of mutability; this method
    /// allows a non-panicking way to discover whether a given axis of a shape is mutable.
    fn try_index_mut(&mut self, index: usize) -> Result<&mut usize, ShapeStrideError<Self>>;

    /// Try to create an `ndim`-dimensional shape filled with `value`.
    ///
    /// This may fail if either the number of dimensions does not match the dimensionality
    /// of the shape, or if the shape has any `const` axis lengths.
    fn try_full(ndim: usize, value: usize) -> Result<Self, ShapeStrideError<Self>>;

    /// Convert this shape into a dynamic-dimensional shape.
    fn to_dyn(&self) -> DShape {
        self.as_slice().into()
    }

    /// Get the runtime dimensionality of the shape.
    ///
    /// Implementors of `Shape` must guarantee that this value will match [`Shape::Dimality`].
    /// Does that mean that this should be an unsafe trait?
    fn ndim(&self) -> usize {
        self.as_slice().len()
    }

    /// Get the number of elements that the array contains.
    fn size(&self) -> usize {
        self.as_slice().iter().product()
    }

    /// Get the number of elements that the array contains, checking for overflow.
    fn size_checked(&self) -> Option<usize> {
        self.as_slice()
            .iter()
            .try_fold(1_usize, |acc, &i| acc.checked_mul(i))
    }

    /// Try to turn this shape into a constant `N`-dimensional shape.
    fn try_to_nshape<const N: usize>(&self) -> Result<NShape<N>, ShapeStrideError<NShape<N>>> {
        if self.ndim() == N {
            let mut values = [0; N];
            values.copy_from_slice(&self.as_slice());
            Ok(values.into())
        } else {
            Err(ShapeStrideError::BadDimality(PhantomData, self.ndim()))
        }
    }

    // Additional methods...
}

I've included some familiar methods (size, ndim, size_checked) but also some that require explanation (a concrete sketch follows the list):

  • as_slice returns Cow because, critically, Shape does not guarantee that there is an underlying slice to be borrowed, since some axes may have fixed lengths and not be stored at runtime
  • try_index_mut, rather than just : IndexMut<...>, because Shape does not guarantee the mutability of any of its axis lengths
  • try_full, rather than just full, because some Shapes may not allow creation with arbitrary axis lengths (because their sizes are fixed)
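
To make the Cow and try_index_mut points concrete, here is a hypothetical shape type with one runtime axis and one const axis of length M (think Nx3). This is a sketch only: RowsBy is an invented name, and a plain string error stands in for ShapeStrideError:

use std::borrow::Cow;

/// An `N x M` shape where only the first axis length is stored at runtime.
#[derive(Clone, PartialEq, Eq, Debug)]
pub struct RowsBy<const M: usize> {
    rows: usize,
}

impl<const M: usize> RowsBy<M> {
    /// The const axis is not stored, so the full shape must be
    /// materialized on demand, hence `Cow::Owned`.
    pub fn as_slice(&self) -> Cow<'_, [usize]> {
        Cow::Owned(vec![self.rows, M])
    }

    /// Axis 0 is mutable at runtime; axis 1 has a `const` length and is not.
    pub fn try_index_mut(&mut self, index: usize) -> Result<&mut usize, &'static str> {
        match index {
            0 => Ok(&mut self.rows),
            1 => Err("axis 1 has a const length and cannot be mutated"),
            _ => Err("axis index out of bounds"),
        }
    }
}

fn main() {
    let mut shape = RowsBy::<3> { rows: 10 }; // an N x 3 shape with N = 10
    assert_eq!(shape.as_slice().as_ref(), &[10, 3]);
    *shape.try_index_mut(0).unwrap() = 20;
    assert!(shape.try_index_mut(1).is_err());
}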

The natural extension is a ShapeMut trait that brings us back to our familiar world of completely mutable shapes.

A fun note here: with the Dimensionality design, we could even create a trait (AxisMut<N>) that captures at the type-level that the Nth axis is mutable. I'm not sure this is particularly useful, but I think it's cool that it can be done.

Error Handling

I also included a ShapeStrideError in the above Shape definition; see the reference implementation (linked at the start) for more information, but it's just an enum that implements Display.

Stride

Of course, we'll also need a Strides trait; this is essentially an exact copy of Shape, except that the Index bound returns an isize, not a usize. There would also be an accompanying StridesMut for strides that are entirely mutable. Because stride manipulation is pretty central to libraries like ndarray or NumPy, I expect that the bare, immutable Strides will get a lot less usage than the immutable Shape.

I'm not sure whether Strides needs to support mixing const-valued strides with runtime strides; my initial prototypes don't bother. I can't think of many applications where fixing some subset of strides while allowing others to vary is helpful; but maybe we should! It avoids some complexity, though.
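
For concreteness, here is what the bare Strides trait might look like; the bounds are abbreviated, and this mirror of Shape is inferred from the description above rather than copied from the reference implementation:

use std::borrow::Cow;
use std::ops::Index;

pub trait Strides: Index<usize, Output = isize> + Eq + Clone {
    /// Get the strides as a (possibly-borrowed) slice of `isize`.
    fn as_slice(&self) -> Cow<'_, [isize]>;

    /// The runtime number of axes; must match the type-level dimensionality.
    fn ndim(&self) -> usize {
        self.as_slice().len()
    }
}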

Layout

This is where my design proposal really starts to depart from what ndarray does right now, and the second opportunity for new features. I'd like to propose a Layout trait that would serve as the main interface between an array's memory and its "layout".

Right now, ndarray exclusively supports dense, strided arrays, as the shape and strides are fields within the core types (as of the array reference PR, they sit within LayoutRef). However, moving to trait-based shapes and strides means that we'd have to go from Array<A, D> to Array<A, D, Sh, St> for Sh: Shape and St: Strides, and that wouldn't allow us to move into capabilities like block-organized matrices, sparse matrices, etc. Here's where the Layout trait comes in: like the C++ mdspan implementation, Layout is the gatekeeper between logical indices and memory access:

/// A trait capturing how an array is laid out in memory.
pub trait Layout: Dimensioned {
    /// The type of shape that the array uses.
    ///
    /// Must implement [`Shape`] and have the same dimensionality.
    type Shape: Shape<Dimality = Self::Dimality>;

    /// The index type that this layout uses; e.g., `[usize; N]`.
    ///
    /// Must have the same dimensionality.
    type Index: Dimensioned<Dimality = Self::Dimality>;

    /// Get the shape of the layout.
    ///
    /// If the implementing type does not carry a shape directly,
    /// one should be constructed and passed as [`Cow::Owned`].
    fn shape(&self) -> Cow<'_, Self::Shape>;

    /// Index into this layout with a multidimensional index.
    fn index(&self, idx: Self::Index) -> isize;

    // Shortcut methods, we could add more of these
    fn ndim(&self) -> usize {
        self.shape().ndim()
    }

    fn size(&self) -> usize {
        self.shape().size()
    }

    fn size_checked(&self) -> Option<usize> {
        self.shape().size_checked()
    }

    // Other methods, which will need to support iteration and other kinds of indexing...
}

A few things to note (a worked sketch of Layout::index follows this list):

  • Like Shape::as_slice, Layout::shape returns Cow since it does not require the storage of a type that implements Shape (hyper-optimizations, like fixed-size matrices, may have their shapes and strides defined completely at compile time)
  • Layout can't implement Index, as it requires computing the offset rather than just accessing it in memory; so we define Layout::index instead
  • I notionally include an Index associated type, but I'm not sure that's the way to go; or maybe it should have a default?
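
As a worked example of Layout::index, here is a hypothetical two-dimensional strided implementor with the trait machinery stripped away. The offset rule is the usual sum of index times stride, which also covers negative strides:

/// A 2-D strided layout; strides are in elements and may be negative.
pub struct StridedLayout2 {
    shape: [usize; 2],
    strides: [isize; 2],
}

impl StridedLayout2 {
    /// Map a logical index to an element offset from the array's base pointer.
    pub fn index(&self, idx: [usize; 2]) -> isize {
        debug_assert!(idx[0] < self.shape[0] && idx[1] < self.shape[1]);
        idx[0] as isize * self.strides[0] + idx[1] as isize * self.strides[1]
    }
}

fn main() {
    // A 3x4 row-major (C-order) layout has strides [4, 1].
    let layout = StridedLayout2 { shape: [3, 4], strides: [4, 1] };
    assert_eq!(layout.index([2, 1]), 2 * 4 + 1);

    // The same shape with the last axis reversed: the base pointer sits at
    // column 3, and offsets run backwards along that axis.
    let flipped = StridedLayout2 { shape: [3, 4], strides: [4, -1] };
    assert_eq!(flipped.index([0, 3]), -3);
}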

Finally, the Layout trait provides a unified interface and vocabulary for users and developers to talk and think about times when an array's shape and strides should be bundled together. The clearest example of this, in my opinion, is the creation of arrays that "look like" another array, in terms of memory order, e.g., a zeros_like function. I believe that doing something like zeros_layout(arr.layout()) gives users a clear way to indicate the shape and strides of a desired array.

Strided Layouts

If you're still reading this RFC, you may be so invested as to have noticed that Layout doesn't have anything about strides.
Instead, I propose we define an additional trait, Strided: Layout, like so:

pub trait Strided: Layout {
    type Strides: Strides<Dimality = Self::Dimality>;

    fn strides(&self) -> Cow<'_, Self::Strides>;

    // Probably other methods...
}

This leaves us a path forward for those non-dense and non-strided layouts in the future. We may want to consider restricting ArrayBase to L: Strided either temporarily or permanently as we make the transition to this new design; more on that later.
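
To illustrate the kind of non-strided layout this split leaves room for, consider a hypothetical diagonal layout: it stores only the n diagonal elements of an n x n matrix, so it has a shape and an index-to-offset rule but no meaningful per-axis strides. Note that this sketch returns Option where the trait above returns isize; how off-diagonal reads should behave is exactly the kind of question a real design would have to settle:

/// A layout for an `n x n` matrix that stores only its diagonal.
pub struct DiagonalLayout {
    n: usize,
}

impl DiagonalLayout {
    pub fn shape(&self) -> [usize; 2] {
        [self.n, self.n]
    }

    /// Only `(i, i)` maps to storage; off-diagonal entries have no offset.
    pub fn index(&self, idx: [usize; 2]) -> Option<isize> {
        (idx[0] == idx[1] && idx[0] < self.n).then(|| idx[0] as isize)
    }
}

fn main() {
    let diag = DiagonalLayout { n: 3 };
    assert_eq!(diag.index([1, 1]), Some(1));
    assert_eq!(diag.index([0, 2]), None);
}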

New Generics

Rolling up into Layout means we can now go back to our core ArrayBase (and co.) types using <A, L: Layout>. This would be a huge breaking change, and we should consider a) whether it's worth it and b) how we could make it as smooth as possible. However, it comes with a number of advantages:

  • The fewer generics the better, and keeping it at 2 would be nice
  • Trait bounds on associated types can be implicitly added, meaning that fn function(arr: &ArrayRef<A, L>) where L: Strided<Dimality = NDim<3>> {} specifies a 3-dimensional strided array
  • Type aliases can be easily constructed, such as type Array3<A> = Array<A, NLayout<3>> or type Matrix3x3<A> = Array<A, MatLayout<3, 3>>

Drawbacks

As best I can tell, there are three obvious drawbacks to this approach:

  1. Backwards Compatibility: I am proposing a framework that I believe is both more ergonomic and more powerful, but it will eventually come with major breaking changes.
  2. Complexity: There are significantly more traits here, with more flexibility. If we're not careful and/or we don't communicate clearly, this could end up being or looking more complex than the current Dimension approach.
  3. Performance: Ensuring that these are zero-cost abstractions will require careful benchmarking as we go; otherwise, we may end up erasing critical information or using pointer indirection where we weren't before. Similarly, concerns may come up around binary sizes due to monomorphization, LLVM optimizations, etc.

Rationale and Alternatives

I think the clearest alternative to this entire plan is to stick with Dimension and alter it along the lines of what was suggested in #519. As I said before, while this might help the usize/isize problem of shape and strides, I don't think it solves the larger issue of vocabulary, nor does it provide a path for new features. I strongly believe that the pain of breaking changes would be worth the value in ergonomics and features that this new proposal provides.

Transition Path

This is a giant proposal that includes major breaking changes. I wanted to get it all down to create a sort of "north star" that library developers can work together to achieve. To actually implement it, I think we'd have to do something like the following:

  • Prototype the core traits suggested here in a separate branch
  • Create compatibility / bridge traits that allow for smoother transitions between the current API and the new one
  • Introduce new APIs and deprecate old ones
  • Establish plans for versioning - when we want to introduce which breaking changes
  • Provide very clear documentation - see RFC: Website for Expanded Documentation #1499 for how we might do that effectively

The transition path itself needs to be fleshed out. However, I think we can start introducing these new traits and types, and start adapting the library. As we go, we'll be able to experiment with the best ways to transition the codebase.

Unresolved Questions

The largest unresolved question is clearly the transition path. However, the interplay of this design with indexing and iteration is still an unknown. I think more experimentation is needed to figure out how Layout can ergonomically support iteration, in particular.

Summary

This is already a long RFC, so I'll be brief: I propose a design that I think is both more ergonomic and more powerful than our current Dimension-based approach and provides users with "industry"-standard vocabulary (i.e., like NumPy or PyTorch or others). Smooth transition will be a challenge, and breaking changes will be involved, but I think it will be worth it.

@NewBornRustacean
Contributor
NewBornRustacean commented Apr 26, 2025

@akern40 what's up! thanks for opening this up🚀
I think this is cool, and have some questions/suggestions!

about the example repo

  • ndarray-design is really helpful for understanding what's on your mind. What if we had the repo as a sub-directory under rust-ndarray? Because this discussion will eventually bring dramatic breaking changes, it would be nice to have further discussion in the community, imo.

about Layout.

  • if I understand correctly, once we have LayoutMapping as an abstraction (Layout in the proposal), we could access the layout and modify iteration direction easily. I think this is kind of supporting logical layouts with maximum flexibility.
  • (might be a dumb/irrelevant question 🦖) if some functions require a physically contiguous array, we still need to copy (for example, select_nth_unstable in my previous PR Add partition (similar to numpy.partition) #1498). As far as I know, this kind of copy to a temp buffer is a quite common pattern in modern libs (like numpy, tensorflow, etc.), so is this a fundamental limit of layout abstraction?

useful materials to follow up

@lucascolley
Contributor

I don't have much to add :) but in case it is of interest, SciPy is now using mdspan for our special function scalar kernels at https://github.com/scipy/xsf/blob/4fff9b2cb2b5c31a0cf0b0f609d2699a5eeac53b/include/xsf/numpy.h#L68

@NewBornRustacean
Contributor

Instead, I'd suggest that the reference type allows for mutability of both data and shape. That means the following:

  • Almost all functionality can be implemented on the reference type.
  • It is transparent to function callers what is being changed: the value they pass in will be altered.
  • It is transparent to function writers what they are changing: whatever the reference type refers to.
  • It creates simple rules for function writers to follow: take in &ReferenceType to read the array, and &mut ReferenceType to write the array.

..and I agree with this (from the suggestion in ndarray-design)!
Maybe similar reasoning could be applied to this RFC (especially for the Layout) as well 😃

@akern40
Collaborator Author
akern40 commented Apr 27, 2025

@NewBornRustacean

what if we could have the repo as in a sub-directory under rust-ndarray

I keep ndarray-design separate mainly because it's helpful to work out what I "ideally" might want without also dealing with the naming conflicts, errors, etc. that might come from including the code in the ndarray repository. However, I do hope that a starting point for implementing this RFC would be to copy over a good segment of the code from that experimental repo.

we could access layout and modify iteration direction easily

That's the hope! In particular, provide a nice, unified API for it.

physically contiguous array... is this fundamental limits of layout abstraction

I don't think so! Array views should still be able to indicate whether they are contiguous, and you can always iterate over items to copy them back out. I'm not exactly sure what the concern would be, if you could elaborate?

@NewBornRustacean
Contributor

@akern40

physically contiguous array... is this fundamental limits of layout abstraction

I don't think so! Array views should still be able to indicate whether they are contiguous, and you can always iterate over items to copy them back out. I'm not exactly sure what the concern would be, if you could elaborate?

Ah, I see that my explanation was not very clear!
What I meant by 'fundamental limits of layout abstraction' was something like:
'Even if we have a good layout abstraction, wouldn’t we still need to either copy or adjust strides if we want to use functions that only work with contiguous arrays?'

Looking back at my question, it does seem like it’s less about the design of the layout itself.
Sorry if I caused any confusion!

@NewBornRustacean
Contributor

@akern40 and about this RFC itself, what if we versioned it? Like tagging a status: good-to-go, ongoing, etc.
I found this rfc workflow from sentry. I'm not sure whether this kind of process would be helpful; maybe it could be like a vaccination before we get breaking changes.

@grothesque
grothesque commented Apr 29, 2025

Wouldn’t a clean cut actually be less work (for ndarray contributors and users taken together) than evolving ndarray into something similar to mdarray? Providing a clean evolution path that accompanies users over a series of breaking changes (and, for a library’s users, following such a path) is a lot of work. Perhaps the alternative approach, where ndarray mostly remains the way it is, might actually be simpler for most people, while providing the additional benefit of a fresh design?

Or perhaps one could evolve (parts of) ndarray into a compatibility wrapper around mdarray?

In any case, I think that there is a lot of value in having a minimal core array crate. Minimal in the sense that it includes all that is necessary (in particular given Rust’s orphan rules), but nothing that can be provided by other crates. Mdarray comes close to this ideal. In our research group, we’ve been using mdarray for a while now (the projects are internal so far), and while bits are missing here and there, our impression is that the core design is very solid.

(Alternatively, if this design proposal is meant to improve on mdarray (or to address other needs), I would find it useful if these aspects could be made explicit in the proposal.)


I think that there are other ways in which ndarray could be improved upon, and a transition to mdarray could be a chance for that. The following is just to give a concrete example of what I have in mind:

I am currently exploring a builder-pattern-based approach to linear algebra algorithms that I hope will make it possible to

  • Preserve the full semantics (e.g. GEMM is able to accumulate into an array) while being easy to use -- all at minimum (often zero) runtime cost

  • Allow the use of various linear algebra algorithms via a common generic interface. (For example for matrix multiplication there are at least BLAS, BLIS, matrixmultiply, and faer. In addition inlined naive matrix multiplication is relevant for small fixed-size matrices.)

The basic idea is to be able to write something like

Blas.matmul(&a, &b).scale(3.0).new();
Blis.matmul(&b, &a).add_to(&mut c);

Here a, b, c are arrays that deref to mdarray slices. Blas and Blis are “contexts”, instances of types that implement a MatMul trait. These could be either zero-sized types, or they could carry some configuration (e.g. parameters like the degree of parallelization to be used).

(The above lines should compile to direct calls to GEMM.)
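
To show the shape of this pattern in code, here is a self-contained sketch of the context/builder idea with a naive backend standing in for BLAS/BLIS. All names are illustrative rather than taken from the actual prototype:

/// Backends implement a common trait; here, C := alpha * A * B + beta * C
/// for n x n row-major matrices stored as flat slices.
pub trait MatMul {
    fn gemm(&self, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64], n: usize);
}

/// A zero-sized "naive" backend; a BLAS-backed one would look the same
/// from the outside, or could carry configuration (threads, precision, ...).
pub struct Naive;

impl MatMul for Naive {
    fn gemm(&self, alpha: f64, a: &[f64], b: &[f64], beta: f64, c: &mut [f64], n: usize) {
        for i in 0..n {
            for j in 0..n {
                let mut acc = 0.0;
                for k in 0..n {
                    acc += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = alpha * acc + beta * c[i * n + j];
            }
        }
    }
}

/// The builder records operands and scaling; nothing runs until a
/// terminal call, so the whole chain lowers to one GEMM invocation.
pub struct MatMulBuilder<'a, B: MatMul> {
    backend: &'a B,
    a: &'a [f64],
    b: &'a [f64],
    n: usize,
    alpha: f64,
}

impl<'a, B: MatMul> MatMulBuilder<'a, B> {
    pub fn scale(mut self, alpha: f64) -> Self {
        self.alpha *= alpha;
        self
    }

    /// Terminal call: accumulate into an existing matrix (beta = 1).
    pub fn add_to(self, c: &mut [f64]) {
        self.backend.gemm(self.alpha, self.a, self.b, 1.0, c, self.n);
    }
}

pub fn matmul<'a, B: MatMul>(backend: &'a B, a: &'a [f64], b: &'a [f64], n: usize) -> MatMulBuilder<'a, B> {
    MatMulBuilder { backend, a, b, n, alpha: 1.0 }
}

fn main() {
    let (a, b) = (vec![1.0; 4], vec![2.0; 4]); // two 2x2 matrices
    let mut c = vec![0.0; 4];
    matmul(&Naive, &a, &b, 2).scale(3.0).add_to(&mut c);
    assert_eq!(c, vec![12.0; 4]); // each entry: 3 * (1*2 + 1*2) = 12
}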

@akern40
Collaborator Author
akern40 commented May 19, 2025

@grothesque I'm not opposed to the idea that mdarray should supersede ndarray, either by wrapping or by fiat; I agree that the core design is really, really well done. However, I can't unilaterally make that decision. I have three concerns with encouraging a move away from ndarray: first, ndarray itself struggles with maintainers, and fre-hu has previously said that mdarray is a hobby project 1. Second, moving to mdarray is already a non-trivial switch for library users. And third, there are still features (like broadcasting and parallelization) that, last I checked, mdarray does not have.

On the note of these builder patterns, can you help me understand the advantage of hard-coding the accelerator provider over opting into it via something like blas-src?

Footnotes

  1. Although maybe maintenance would be easier with mdarray's cleaner design?

@grothesque

@grothesque I'm not opposed to the idea that mdarray should supersede ndarray, either by wrapping or by fiat; I agree that the core design is really, really well done. However, I can't unilaterally make that decision.

Myself, I’d simply like to see wider adoption of Rust for numerical computing, because I think that the language has the potential to become a better replacement for C++/Fortran in that domain. (I do have professional interest in this, because we are in the process of shifting our computational theoretical physics group from C++ & Python to Rust & Python.)

I don’t really care which library gains wider adoption in the long run, but I do have a weakness for elegance, so I hope it will be a nice one. So far my impression is that mdarray indeed gets many things right for a “rusty” numerical array library, while on the other hand ndarray suffers from some design decisions that seem unfortunate in retrospect (like the ones you propose to fix; but also, IMHO, ndarray sticks a bit too closely to NumPy).

Did you check out mdarray’s “expressions”, for example? While in ndarray it is possible to perform elementwise operations on two arrays, and operands are reused if one gives up ownership (as one would expect from an idiomatic library), the expression &a + &b results in a new allocation. In mdarray it results in an “expression”, which is a lazy array. Such an expression can then be evaluated explicitly into a new allocation ((&a + &b).eval()), but it can also be written directly into an existing array (c.assign(&a + &b);).

I haven’t looked at the assembly output, but I would not be surprised if the Rust compiler was smart enough to optimize (&a + &b + &c).eval() as a single pass.

I have three concerns with encouraging a move away from ndarray: first, ndarray itself struggles with maintainers, and fre-hu has previously said that mdarray is a hobby project 1.

Indeed, but so does ndarray. Still, have a look at the discussions we had with @fre-hu in the mdarray issues. The library evolved quite a bit in recent months.

Myself, I haven’t yet contributed any code to mdarray, but I have been working on this BLAS binding prototype. I’m still a newcomer to Rust, but I have been spending a significant fraction of my work time on numeric Rust infrastructure, and I intend to continue doing so. Moreover, our team here will be reinforced with a full-time scientific programmer soon, who might also contribute.

Second, moving to mdarray is already a non-trivial switch for library users.

Adoption of a modernized ndarray would be non-trivial as well, and sticking with old-style ndarray has (opportunity) costs too.

Since overall manpower is limited, I’m just trying to see whether those who are currently investing time into this domain could perhaps agree on a common vision. I do not yet have clarity on what a good vision would be. One possibility is certainly modernizing ndarray (with mdarray serving as a prototype/inspiration). Another possibility is trying to move forward with several independent libraries, perhaps with some interoperability.

Here is another approach to consider:

  • Instead of modernizing ndarray via a series of incompatible changes, evolve it in a way that leaves the adaptation burden low for projects for which ndarray just works.

  • Try to add interoperability between ndarray and mdarray. For example, perhaps ndarray’s arrays could be made to dereference to mdarray’s slices? One could also consider adding array-converters that go both ways.

  • Work on the missing bits (and documentation) for mdarray, keeping in mind that the idea is that the core crate should remain minimal (which IMHO is an advantage compared to ndarray).

  • Work on an ecosystem of extensions, perhaps along the lines of what I started, perhaps with a different approach. In fact, multiple approaches are possible.

  • Recommend that projects that currently use ndarray, and that could profit from the improvements, gradually adopt mdarray.

As said, this is just one possibility and I do not know which one is best (I am all open to arguments for other approaches and against this one). But what I know is that if those people who have been active on ndarray/mdarray recently could agree on a common vision, this would be the biggest revival of multi-dimensional Rust libraries in a number of years. With a core library that has no overhead for small arrays, perhaps we could one day even envision unification with nalgebra.

And third, there are still features (like broadcasting and parallelization) that, last I checked, mdarray does not have.

I’m sure that great variants of both could be added to mdarray. With its “expressions”, mdarray might in fact have a better technical basis for broadcasting than ndarray, which uses views for that purpose. (I haven’t thought deeply about this.)

@fre-hu has been very careful with adding features to the core library, but I do not think that these two things are necessarily a lot of work.

By the way, my GEMM-interface opens a way of controlling the degree of parallelization (and other parameters) at zero added runtime cost.

On the note of these builder patterns, can you help me understand the advantage of hard-coding the accelerator provider over opting into it via something like blas-src?

Have a look at the API (i.e. the expressions that begin with bd.) in actual use:
https://github.com/grothesque/mdarray-linalg-prototype/blob/86b61d00fe0ef5a9556ef04009d4849c1ed314cb/tests/context.rs
(This is just a prototype. The names, but really everything, are up to discussion.)

The primary goal was for this API to express everything that GEMM can do

  • conveniently,
  • safely,
  • and without runtime overhead.

In contrast ndarray’s general_mat_mul does not seem to allow use of uninitialized memory.

The motivation for the backend-object based interface goes beyond being able to select between one of multiple backends as with blas-src:

  • In the same code one might want to multiply both very small matrices and big ones, and different backends might be appropriate in both cases.

  • While the example backend Blas is a zero-sized type, the backend can carry configuration. For example the kind and level of parallelization, or perhaps numerical precision (for lapack-like functionality).

  • The trait MatMul goes beyond BLAS. It could be implemented for scalar types that are integers, finite fields, arbitrary precision floating point numbers, quaternions...

@grothesque

On the theme of proliferation of multi-dimensional array packages for Rust, here is one that has seen rapid development (albeit by a single person, @ajz34) recently: https://github.com/RESTGroup/rstsr

@ajz34
ajz34 commented May 29, 2025

Hi devs!

This issue is a very important and useful proposal. I generally agree with the ideas raised by @akern40 and @grothesque.

I’m not a programmer and have only been using Rust for about a year. Based on my experience developing the RSTSR project, here are some of my thoughts and understandings regarding the current topic:

On the theme of proliferation of multi-dimensional array packages for Rust, here is one that has seen rapid development (albeit by a single person, @ajz34) recently: https://github.com/RESTGroup/rstsr

The RSTSR project essentially aims to provide a NumPy-like experience in Rust while supporting the development of the REST electronic structure program. However, this also means that—by design (and not just because I’m not an experienced developer)—we inherit some of NumPy’s shortcomings, including the lack of support for lazy evaluation.

The motivation behind RSTSR was to port some Python-implemented algorithms to Rust. These algorithms rely heavily on operations like reshape, broadcast, and basic slicing, with efficiency bounded by matmul (e.g., RI-CCSD). I believe that creating a NumPy-like library makes it easier to migrate such code while avoiding Python’s performance drawbacks and threading limitations in certain cases.

The current documentation that best represents RSTSR’s functionality is the NumPy-RSTSR Cheatsheet.
Right now, RSTSR is primarily focused on supporting electronic structure calculations (though as a n-dimensional tensor library). If it ends up being useful to other users, I’d be very happy—but I can’t guarantee it will meet most audiences' needs in the short term.

5. Support the move towards a "trait-based" library

This likely depends on which part is trait-based. In RSTSR, I decided to implement a trait for Shape (DimBaseAPI), with the Stride type bound to DimBaseAPI::Stride. However, for Layout, I opted for a pure struct implementation, using traits only to assist with method implementations. RSTSR’s Layout follows the Strided Layout concept mentioned by @akern40.

4. Allow for capturing axis lengths at the type level, and therefore allow compile-time constant optimizations

My understanding of this is that it refers to cases like NxMx3x3, where the 3x3 part is a compile-time fixed shape—not that the entire NxMx3x3 is a 4D array where the dimension count (4) is known at compile time.

From what I recall, nalgebra and dfdx support a similar feature, while candle and burn do not.

RSTSR has explicitly decided not to support this feature, for the following reasons:

  • I feel that introducing fixed shapes would significantly complicate the codebase. While these challenges might not be insurmountable, I don’t believe I (as a developer) can handle them well, nor would users find them easy to work with.
  • Compile-time shapes might benefit areas like computer vision, gaming, or fluid dynamics. However, in electronic structure calculations, fixed shapes are either rare, not a performance bottleneck, or can be manually unrolled where needed.
  • For some problems where fixed shapes do provide performance benefits (e.g., convolutional neural networks), manual optimization (e.g., tuning for CPU cache size) might outperform compiler auto-optimization. In such cases, we probably shouldn’t expect a general-purpose n-D array library to solve high-performance computing problems; specialized libraries (like Intel’s oneDNN or ONNX on CPU, cuDNN on CUDA) or hand-tuned code would be more appropriate.

This reasoning might be similar to why faer also avoids fixed shapes. In electronic structure calculations, tensors are often very large (multi-GB scale is common).

Thus, RSTSR has chosen an approach similar to ndarray’s current model: allowing compile-time dimensions but not compile-time shapes. However, users from other scientific computing domains might still find fixed-shape functionality valuable.

the core crate should remain minimal

RSTSR’s Layout is actually implemented in rstsr-common, outside the core crate. This crate does not involve tensors, but includes some lightweight manipulation and iteration routines for layouts. I’m not sure whether this is the best practice: separating out too many crates can also increase the developer’s burden.

(some texts translated by AI from my native language)
