-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FIX test_csr_polynomial_expansion_index_overflow on [scipy-dev] #30393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX test_csr_polynomial_expansion_index_overflow on [scipy-dev] #30393
Conversation
Note that I do not fully understand what's going on in EDIT: fixed the PR number. |
The PR number seems wrong |
I don't understand what's different from the previous version. Looking at the diff I feel like we're expecting the same thing as before. Can you expand a bit ? |
It's checking for greater or equal rather than just equal. With scipy/scipy#21824, |
This seems like a reasonable change in My general understanding is that With the following snippet: import numpy as np
import scipy
print(scipy.__version__)
from scipy import sparse
X_ind_32bit = sparse.csr_array([[1, 0, 0], [0, 1, 0], [0, 1, 0]])
X_ind_64bit = X_ind_32bit.copy()
X_ind_64bit.indptr = X_ind_64bit.indptr.astype(np.int64)
X_ind_64bit.indices = X_ind_64bit.indices.astype(np.int64)
print('hstack int32 int64:', sparse.hstack([X_ind_32bit, X_ind_64bit]).indptr.dtype)
print('hstack int64 int64:', sparse.hstack([X_ind_64bit, X_ind_64bit]).indptr.dtype) Recent scipy-dev version output:
scipy latest release 1.14.1 output:
I have to say I haven't managed to figure out how to adapt our test yet. The output indices of
We are kind of testing the fact that we don't really need int64 in this case since int32 indices would be enough, except that we end up |
On top of this (as if this wasn't complicated enough), the behaviour of
import numpy as np
import scipy
print(scipy.__version__)
from scipy import sparse
X_ind_32bit = sparse.csr_array([[1, 0, 0], [0, 1, 0], [0, 1, 0]])
X_matrix_ind_32bit = sparse.csr_matrix(X_ind_32bit.copy())
X_ind_64bit = X_ind_32bit.copy()
X_ind_64bit.indptr = X_ind_64bit.indptr.astype(np.int64)
X_ind_64bit.indices = X_ind_64bit.indices.astype(np.int64)
X_matrix_ind_64bit = sparse.csr_matrix(X_ind_64bit.copy())
print('array int32 + matrix int64:', sparse.hstack([X_ind_32bit, X_matrix_ind_64bit]).indptr.dtype)
print('array int64 + matrix int32:', sparse.hstack([X_ind_64bit, X_matrix_ind_32bit]).indptr.dtype)
print('matrix int64 + array int32:', sparse.hstack([X_matrix_ind_64bit, X_ind_32bit]).indptr.dtype)
print('matrix int32 + array int64:', sparse.hstack([X_matrix_ind_32bit, X_ind_64bit]).indptr.dtype) Output on scipy-dev:
There is the additional complication (maybe slight variation of same behaviour when using Python integers as indices) So then you look at the code and it seems like this is quite tricky to figure out what the output indices dtype will be: scikit-learn/sklearn/preprocessing/_polynomial.py Lines 448 to 466 in 8ded7f4
The test failure happen when |
Thanks for your investigation @lesteve. But in the end, do you think we can fix the test the way I did in this PR? Or shall we do something else? Or do you think the surprisingly asymmetric scipy behavior should be reported upstream? |
Looks like using int32 indices when we can when creating diff --git a/sklearn/preprocessing/tests/test_polynomial.py b/sklearn/preprocessing/tests/test_polynomial.py
index b97500d43e..29546dff30 100644
--- a/sklearn/preprocessing/tests/test_polynomial.py
+++ b/sklearn/preprocessing/tests/test_polynomial.py
@@ -1050,8 +1050,10 @@ def test_csr_polynomial_expansion_index_overflow(
`scipy.sparse.hstack`.
"""
data = [1.0]
- row = [0]
- col = [n_features - 1]
+ # Use int32 indices when we can
+ indices_dtype = np.int32 if n_features - 1 <= np.iinfo(np.int32).max else np.int64
+ row = np.array([0], dtype=indices_dtype)
+ col = np.array([n_features - 1], dtype=indices_dtype)
# First degree index
expected_indices = [ |
Let's do this, then. I will update the PR. |
I pushed some commits with the proposed change, let's see what the CI has to say about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is happy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 as well.
…it-learn#30393) Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
…it-learn#30393) Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Fixes #30315, #30377 and partially #30378.
We probably want to backport this to 1.6.X.