-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
ENH: sparse: Add csr 1d sparse arrays and test_arithmetic1d #19833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b8b4f0d
to
1ab08c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve pending sorting out multiplication semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes made as per review:
- construct matmul to follow array api -- in particular to produce 1d or 0d arrays as appropriate. 1d arrays are sparse. 0d arrays are ndarrays.
- adjust tests for affected code
- adjusted and updated comments
- made code more readable, and logic more clear
77c8586
to
e817933
Compare
Which should sparse arrays return? While adding tests, I found a broken behavior for this PRs 1d sparse arrays when they appear on the right of This is a problem for the 2d case (spmatrix or sparray) also. So I'm not going to fix that here. To fix this we need a CSR @ CSC sparsetools function (we currently have CSR @ CSR). I have found an algorithm for that and will make a separate PR. We should be able to multiply sparse 1d arrays with length N while avoiding a csr column that requires an (N+1)-vector for |
Array scalars kinda duck-type 0-D arrays. The discussion at numpy/numpy#25542 (comment) clarifies that returning array scalars therefore does not actually break array API compatibility. NumPy may or may not remove scalars in the future; it'd be a big operation and if this happens it'll be another major release, so at least several years away. I'd say 0-D arrays are preferred for new code. For existing code, we should pay attention to API consistency as well - if there's 10 functions/methods and 9 already return scalars, then it's better for the 10th one to do the same. |
4f67011
to
92f3364
Compare
Merged! 🎉 |
Adding sparse array support within the csr code allows us to use the 2d sparsetools routines for csr with 1d sparse arrays. While 1d arrays do not have a compressed index, we can construct
indptr = np.array([0 nnz])
and use the sparsetools functionality as if the 1d array were a 2d array. Reshaping the result where needed works well. The other importance of csr format is that many features in other formats are funneled to csr because that handles most everything. So we should be able to start enabling some of the features for sparse arrays that are waiting for 1d support after this PR. I envision additional formats as a special feature rather than an essential part of the 1d support process. We are getting close to full functionality.This PR also introduces tests of 1d arithmetic taken from the test_base.py arithmetic tests for 2d, converted to 1d and modernized to use fixtures and pytest parametrize rather than subclassing test classes for the different formats. This PR includes the tests of arithmetic without including the tests of various data dtypes, as those are presumably tested well in 2d and the code is essentially the same for 1d and for 1xN 2d.
Indexing beyond integers (and tests of such) are not implemented for csr yet (or dok in #19715).
It is not clear whether it makes sense to use csc-1d. A 1d array is essentially a row. Using csc with this formulation means that indptr has N+1 elements. So, unless the indptr dtype is quit small, the array takes more RAM than a dense array. We could instead treat the csc-1d as a single column. But this breaks the usual symmetries between CSR and CSC. Converting between csr and csc would simply be a copy of the data and change of class. Another alternate implementation would be to make a 1d format that combines features of coo, csc, csr and leaves out support for 2d.
But I think this is likely to be the best way forward. So see what you think...
This PR is built on #19853 (base-1d). So it is independent of the dok-1d branch and the 3 PRs should be easier to review. Looking at it now will show files _base.py and test_common1d.py which are actually included in the base-1d PR.