Increase maximum number of array dimensions? #5744
Comments
This has been discussed before, but is not completely trivial. I'd point you to the mailing list archives, but my browser currently crashes on http://dir.gmane.org/gmane.comp.python.numeric.general. |
Mostly the reason we have a finite limit is laziness -- there are a number of places in the C code where a fixed limit means scratch buffers can live on the stack instead of being allocated dynamically; a dynamic solution is possible, but someone has to do the work. Given the current situation, where we do have a single static limit, I think we should hesitate to raise it just to cover long-tail use cases. |
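(For reference, the limit shows up immediately at array construction. A minimal illustration, assuming a NumPy build with the historical 32-dimension limit discussed in this thread:)

```python
import numpy as np

a = np.zeros((1,) * 32)   # fine: exactly 32 dimensions
print(a.ndim)             # 32

np.zeros((1,) * 33)       # one more dimension fails:
# ValueError: maximum supported dimension for an ndarray is 32, found 33
```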
See the following references:

- https://stackoverflow.com/questions/22747068/is-there-a-max-number-of-arguments-javascript-functions-can-accept
- https://bugs.webkit.org/show_bug.cgi?id=80797
- numpy/numpy#5744

Note that the maximum number of function arguments can vary from engine to engine. Here, we choose something of a lowest common denominator which may not be valid everywhere.
Hello again (5 years later)! We're running into this problem again, and it seems that in the meantime it's been noticed by others. @njsmith, I could try to implement a dynamic solution as you described, but I don't have a lot of experience with C and I have no experience with NumPy internals. @jcmgray, @mrocklin, @jrbourbeau, would this be useful for Dask beyond dask/dask#5595? |
I think the biggest/toughest chunk of work for relaxing either NPY_MAXDIMS or NPY_MAXARGS is probably the iterator code. I do not know whether the heap allocations would mean overhead, but I assume that can be made so low that it just doesn't matter. Or should we consider again simply bumping up NPY_MAXDIMS? |
I suspect that the original 32 came from … |
Yes, the maximum number of dimensions impacts dask arrays (at least those backed by numpy arrays). For example, with dask.array.empty:

In [1]: import dask.array as da
In [2]: x = da.empty((1,) * 33)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~/github/dask/dask/dask/array/utils.py in meta_from_array(x, ndim, dtype)
90 if ndim > x.ndim:
---> 91 meta = meta[(Ellipsis,) + tuple(None for _ in range(ndim - meta.ndim))]
92 meta = meta[tuple(slice(0, 0, None) for _ in range(meta.ndim))]
IndexError: invalid index to scalar variable.
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-2-021c43ce84db> in <module>
----> 1 x = da.empty((1,) * 33)
~/github/dask/dask/dask/array/wrap.py in wrap_func_shape_as_first_arg(func, *args, **kwargs)
70
71 dsk = dict(zip(keys, vals))
---> 72 return Array(dsk, name, chunks, dtype=dtype)
73
74
~/github/dask/dask/dask/array/core.py in __new__(cls, dask, name, chunks, dtype, meta, shape)
1053 raise ValueError(CHUNKS_NONE_ERROR_MESSAGE)
1054
-> 1055 self._meta = meta_from_array(meta, ndim=self.ndim, dtype=dtype)
1056
1057 for plugin in config.get("array_plugins", ()):
~/github/dask/dask/dask/array/utils.py in meta_from_array(x, ndim, dtype)
96 meta = meta.reshape((0,) * ndim)
97 except Exception:
---> 98 meta = np.empty((0,) * ndim, dtype=dtype or x.dtype)
99
100 if np.isscalar(meta):
ValueError: maximum supported dimension for an ndarray is 32, found 33

That said, it doesn't seem to be a limitation lots of users are running into (dask/dask#5595 is the only reported issue I'm aware of). Just out of curiosity, @wmayner, how many dimensions are you typically dealing with? Would the previous suggestion from @charris help? |
@jrbourbeau, we likely wouldn't need more than 64 dimensions for now, so if increasing to 64 is an option, @charris, that would be great. In case it's helpful to motivate the change, I'll describe our use case: we're representing the probabilities of state transitions over a single timestep in a discrete dynamical system. If the system has n elements, the transition probabilities naturally form a multidimensional array. Our approach is to use separate dimensions to index the state of each element at time t and at time t+1, so a system of n elements needs 2n dimensions. Increasing the limit to 64 would allow us to represent systems of up to 32 elements instead of 16, which would be a qualitatively significant improvement for our research questions. @njsmith's point about hesitating to increase the limit for long-tail cases is well taken, though. But if the overhead is very low, then increasing to 64 would be a simple stopgap solution until we can make the allocations dynamic. |
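(A minimal sketch of this layout, assuming binary elements, which is my assumption since the state space isn't spelled out above:)

```python
import numpy as np

n = 4  # number of binary elements; kept small so the example fits in memory
# One axis indexes the state of each element at time t, and one axis per element
# at time t+1, so the transition-probability array has 2 * n dimensions of length 2.
tpm = np.zeros((2,) * (2 * n))
print(tpm.shape, tpm.ndim)   # (2, 2, 2, 2, 2, 2, 2, 2) 8

# Under the historical 32-dimension limit, n = 16 (32 axes) is the largest system
# that fits; raising the limit to 64 would allow n = 32.
```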
@wmayner That sounds like something that could be more flexibly handled by some custom code. |
I would like to know what @jcmgray's reason was for opening the dask issue on the limited number of dimensions numpy/dask support, and whether that is related to memory size (something like 2**48), etc. |
Hi, yes, I have run up against this limit quite a few times. As some technical context: when simulating quantum systems it's natural to represent each subsystem as its own dimension (rather than manually 'vectorizing' them into a joint dimension). This is particularly true for tensor networks (basically graphs with an array at each node, with one dimension per edge incident to that node). In a very common case each dimension is of size 2, and I guess this is a key point: in principle a size-2 dimension only doubles the memory, so arrays with more than 32 dimensions can still be (borderline HPC) feasible. The problem is particularly compounded for einsum-style contractions, where the indices of all the operands count towards the limit. I should say, this isn't a blocker for me right now, and these borderline HPC cases are understandably, I guess, not the highest priority. |
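(To make the scaling concrete, here are my own back-of-the-envelope numbers, not figures from the thread: with size-2 indices, each extra dimension only doubles a dense array, which is why these cases sit right at borderline-HPC scale.)

```python
# Memory footprint of a single dense complex128 array whose axes all have length 2.
for ndim in (26, 32, 40, 48):
    nbytes = 2**ndim * 16           # complex128 = 16 bytes per element
    print(f"{ndim} dims of size 2 -> {nbytes / 2**30:,.0f} GiB")
# 26 dims of size 2 -> 1 GiB
# 32 dims of size 2 -> 64 GiB
# 40 dims of size 2 -> 16,384 GiB
# 48 dims of size 2 -> 4,194,304 GiB
```

(In a tensor network, an individual tensor typically carries only a subset of the indices, so the dimension count can become the binding constraint before memory does.)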
If I understand correctly, the shape of a ndarray as described would contain only zeros and ones, otherwise memory use would explode. This would be better served by a different data structure, and is not a sufficient reason to fundamentally change ndarray. Any proposal to make the number of dimensions flexible would need to be carefully benchmarked and should come with a cache of vectors that could be reused rather than repeatedly calling malloc/free. |
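(A purely conceptual sketch of the kind of reusable-buffer cache being suggested; the real thing would live in NumPy's C layer, but the idea is just to hand out and take back scratch buffers for per-dimension data instead of calling malloc/free on every operation:)

```python
class DimsBufferCache:
    """Toy cache of reusable scratch buffers for per-dimension data."""

    def __init__(self, max_cached=16):
        self._free = []
        self._max_cached = max_cached

    def acquire(self, ndim):
        # Reuse the first cached buffer that is large enough; allocate on a miss.
        for i, buf in enumerate(self._free):
            if len(buf) >= ndim:
                return self._free.pop(i)
        return [0] * ndim

    def release(self, buf):
        # Keep a bounded number of buffers around for reuse.
        if len(self._free) < self._max_cached:
            self._free.append(buf)


cache = DimsBufferCache()
scratch = cache.acquire(ndim=40)   # more entries than a fixed 32-slot stack buffer
# ... fill scratch with shape/stride values and use it ...
cache.release(scratch)
```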
What would be the cost of just bumping MAX_DIMS up to 64? |
If this was in reference to my use case: the arrays are generally filled with complex numbers, and yes, the memory does explode! To be clear, my view is just that a moderate increase in MAX_DIM, to e.g. 64, would be potentially useful. |
I've been running trade studies on how different models interact with each other. Each dimension in my results array is linked to a particular input parameter (whether that is a single value or a range of values). Since I am using multiprocessing pools to run the simulations in parallel, my problem with the 32-dimension limit arises when I return the data from the pool and reshape it to match the inputs (plus a couple of dimensions for outputs). So far my workaround has been to write different versions of the same functions to evaluate the effects of different combinations of inputs independently. I would also appreciate an update to this, even if just to 64 dimensions to start with. Thanks! |
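(A sketch of the reshape pattern being described; the parameter names and counts here are made up for illustration:)

```python
import itertools
import numpy as np

# Hypothetical sweep: one axis per swept input parameter, plus a trailing axis for outputs.
param_ranges = [np.linspace(0.0, 1.0, 3) for _ in range(8)]   # 8 input parameters
n_outputs = 4

# Results come back from the worker pool as a flat list, one entry per combination...
flat_results = [np.zeros(n_outputs) for _ in itertools.product(*param_ranges)]

# ...and are reshaped so each input parameter gets its own dimension.
shape = tuple(len(r) for r in param_ranges) + (n_outputs,)
results = np.asarray(flat_results).reshape(shape)
print(results.ndim)   # 9 here; with 30+ swept parameters this exceeds the 32-dimension limit
```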
To be clear, the "64 to start with" is exactly the thing that makes this harder to just decide on, for me at least... 64 is the absolute plausible limit at this time; I would personally prefer much less, e.g. 40 or 48, since both should be plenty in a 64-bit address space. |
Have you considered using xarray, which lets you name your dimensions, meaning you do not need to insert padding dimensions of length 1? |
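(A minimal sketch of what that buys you; whether it covers the heavier use cases above is a separate question:)

```python
import numpy as np
import xarray as xr

# Plain numpy: aligning two 1-D parameter grids requires inserting length-1 padding axes.
a = np.random.rand(3)[:, None]   # axis 0 = "temperature", padded axis for "pressure"
b = np.random.rand(4)[None, :]   # padded axis for "temperature", axis 1 = "pressure"
grid_np = a * b                  # shape (3, 4)

# xarray: dimensions are matched by name, so no padding dimensions are needed.
ta = xr.DataArray(np.random.rand(3), dims=["temperature"])
pb = xr.DataArray(np.random.rand(4), dims=["pressure"])
grid_xr = ta * pb
print(grid_xr.dims)              # ('temperature', 'pressure')
```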
As noted here, this does also currently limit the total number of indices involved in an einsum contraction:

import numpy as np
import string
d = 11
# total dims involved = 3 * d
# but max number of array dims = 2 * d
x = np.random.uniform(size=[2] * 2 * d)
y = np.random.uniform(size=[2] * 2 * d)
chars = string.ascii_lowercase + string.ascii_uppercase
eq = f"{chars[0 * d:2 * d]},{chars[1 * d:3 * d]}"
print(eq)
# abcdefghijklmnopqrstuv,lmnopqrstuvwxyzABCDEFG
# ........... ...........
# contracted
z = np.einsum(eq, x, y)
# ValueError: too many subscripts in einsum

(N.b. this is just the simplest illustrative equation; I know one could use … here.) The relevant check is in numpy/numpy/core/src/multiarray/einsum.c.src, lines 947 to 964 (at 4cba2d9).
(Possibly for this particular problem there are also fixes other than increasing the global maximum.) I do fairly regularly come across this limitation in real contractions! Currently I need to switch to e.g. … |
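(One workaround for this particular two-operand case, offered as my own suggestion rather than something from the thread: np.tensordot specifies the contracted axes positionally instead of by subscript letters, so it avoids the subscript-count limit as long as each individual array stays within the dimension limit:)

```python
import numpy as np

d = 11
x = np.random.uniform(size=[2] * 2 * d)
y = np.random.uniform(size=[2] * 2 * d)

# Contract the last d axes of x against the first d axes of y.
# Inputs and output each have 2 * d = 22 dimensions (within the 32 limit), even though
# the equivalent einsum call would need 3 * d = 33 distinct subscript labels.
z = np.tensordot(x, y, axes=(range(d, 2 * d), range(d)))
print(z.ndim)   # 22
# Caveat: the output axes are (uncontracted axes of x) followed by (uncontracted axes
# of y), which may be ordered differently from einsum's implicit alphabetical output.
```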
Like many in this thread, I frequently bump into this issue. In the end, I forked and just increased the maximum number of dimensions to 64. I just had to update some tests, but it works flawlessly.

@jcmgray, your code gives me no problem now, and I didn't need to change NPY_MAXARGS. I want to do a PR, but first wanted to ask if there is still any blocking issue. |
Is there any update regarding this? Is there willingness to accept @mofeing's or someone else's patches? Xarray is far from being a real alternative: even though I can use a DataArray's named dimensions to do without singleton dimensions, the underlying data is still an ndarray, so switching to xarray simply pushes the limit from 32 dimensions in total to 32 nontrivial dimensions. |
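(To spell that out with a minimal sketch:)

```python
import numpy as np
import xarray as xr

# xarray removes the need for length-1 padding axes, but the wrapped buffer is still
# an ndarray, so more than 32 *distinct* dimensions remains impossible:
data = np.zeros((2,) * 33)   # fails before xarray is even involved
arr = xr.DataArray(data, dims=[f"d{i}" for i in range(33)])
# ValueError: maximum supported dimension for an ndarray is 32, found 33
```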
This might be something to consider for a NumPy 2.0 release |
OK, there be problems and someone needs to step up to attack them. We can bump up NPY_MAXDIMS, but … |
@mattip doesn't like it, but this makes me wonder if it isn't a good first step to just delete (or rename) NPY_MAXDIMS in the public headers. The purpose would be only to remove the chicken-and-egg problem, by removing the promise that there is a fixed compile-time maximum downstream can rely on. Unless we don't care about downstream segfaulting for those, but even then :) |
@seberg, from your comment it seems that there are no hard technical blockers inside numpy itself, and the fear is of potentially breaking downstream libraries. If bumping up the number of dimensions is too risky, then maybe an alternative would be to create a different version of numpy that supports the 64 dimensions, kinda like how we can do … At the code level, the change could be guarded like this:

#ifndef NPY_MAXDIMS
#define NPY_MAXDIMS 32
#endif
#ifndef NPY_MAXARGS
#define NPY_MAXARGS 32
#endif

This keeps the old behaviour of the code, with NPY_MAXARGS defined and equal to 32 for downstream libraries, while for the 64 version we would just need to add an extra compile-time definition. If this is okay and nothing is expected to break, then I can send a PR with this solution. Note: this is blocking Cirq from supporting simulations for circuits with more than 32 qubits (quantumlib/Cirq#6031). |
That is the technical hard blocker. A special install doesn't work; it is the opposite problem from cupy: you would need a matching special build of every downstream library as well. I am OK with just saying some of scipy can only do 32 dims; going ahead fast would also mean some of NumPy may not (unless someone can promise to work on it). |
I think this is okay for our use case (cirq simulation). Is there a list of things within numpy that will break with 64 dimensions? |
The public iterator API/ABI for the "old style" iterator (https://github.com/numpy/numpy/blob/main/numpy/core/include/numpy/ndarraytypes.h#L1139) and the macros below it, mostly (there is a multiiter, and a few other iterators, although those matter less). |
After sleeping on this, maybe it is OK to error out if using the old-style iterator (i.e. …) with more than 32 dimensions. |
I don't think SciPy has a lot, maybe even just in … |
I get quite a few hits when doing … |
Is the "old style" iterator planned to be supported or dropped in future versions? If it will be dropped, then it doesn't make sense to make it support 64 dimensions. I think a first step would be to provide it through a special install, as I suggested earlier in #5744 (comment); this will allow the people who need this (e.g. the quantum computing community) to use and test the library and see if anything breaks. Eventually the change can be made permanent in numpy v2, as @mattip suggested in #5744 (comment). This way we unblock Cirq and similar libraries, allowing researchers to simulate quantum systems with more than 32 qubits, while giving other libraries time to update their code to support the 64 dims if they need to. What do you think? |
As @seberg stated previously, that is only the tip of the iceberg. You now need to rebuild special versions of all the other libraries that use the NumPy C API. |
My suggestion is the following: … |
Even if it is possible to ship wheels in such a way (and I doubt it is), you would double the number of wheels that a lot of downstream needs to build. I am pretty sure it's easier to do the full thing I said above (including that …). |
I built the library with 64 instead of 32 and it works fine:

>>> np.ones((1,)*64, dtype=np.complex128)
array([[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1.+0.j]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]])

Running the tests, most of them pass. Most of the few that fail seem to be asserting that the number of dimensions is <= 32, so they should be fine to modify. Failed tests: …
I think that having some downstream libraries support only 32 dimensions, not the full 64, is fine as a starting point, this in addition to replacing … |
It would be nice to at least start fixing the NumPy internals that will otherwise remain limited to 32 dimensions before we bump it, though. |
Cool. Can the large arrays be used in the rest of your daily work or do you need to also rebuild everything else? |
… wrapping it

This is to avoid the numpy limit on the number of dimensions (quantumlib/Cirq#6031). The 1D representation should only be used when the number of qubits is greater than the numpy limit on the number of dimensions (currently set to 32; see numpy/numpy#5744).

```py3
_, state_vector, _ = s.simulate_into_1d_array(c)
```

Fixes quantumlib/Cirq#6031.
Closing, as it's bumped on main and hopefully nobody runs into huge issues. However, I did not bother to try to fix all paths, because it's annoying, since the iterators rely on a certain ABI and are used both inside NumPy and publicly. So if you care about this, check the PR and try to plug the holes that fail... |
At my lab, we're working with arrays with very many dimensions. We've run up against the hardcoded limit of 32 dimensions for np.arrays. What is the rationale for this limit, and is it possible to increase it? Thanks!