-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
TST Extend tests for scipy.sparse.*array
in test_glm.py
#27107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jjerphan what's a reasonable time increase for these tests? Also, am I missing any existing tests for sparse matrices here? |
Thanks for pointing this out. I would suggest not worrying much about this: we can add a setup of tests to only have As for your questions:
Code coverage does the absence of tests cases for sparse data with GLMs. Yet, adding more tests for GLM on sparse data would be appreciated. I do not have the bandwidth to look at that for now, but I might later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could introduce safe_sparse_hstack
/ safe_sparse_vstack
next to our safe_sparse_dot
.
I am not particularly found of the safe_
prefix but we could reuse it for the sake of consistency.
@@ -270,7 +273,10 @@ def test_glm_regression_hstacked_X(solver, fit_intercept, glm_dataset): | |||
|
|||
model = clone(model).set_params(**params) | |||
X = X[:, :-1] # remove intercept | |||
X = 0.5 * np.concatenate((X, X), axis=1) | |||
if sparse.issparse(X): | |||
X = np.multiply(sparse.hstack((X, X)), 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following looks simpler:
X = np. 8000 multiply(sparse.hstack((X, X)), 0.5) | |
X = 0.5 * sparse.hstack((X, X)) |
I am not sure if this is worth it or not. For most solvers, the numerical code is probably independent to the original data sparsity pattern. But there might be solvers (e.g. SAG / SAGA) who are actually specialized for sparse input datastructures (different solver branches). I have no strong opinion one way or another myself. Any opinion @lorentzenchr? |
I would definitely NOT extend |
Alright. Let's keep |
Reference Issues/PRs
Towards #27090
What does this implement/fix? Explain your changes.
Adds tests for csr_matrix and csr_array to test_glm.py
Any other comments?
This more than triples the run time of
test_glm.py
and is probably more extensive than is actually needed for this case.But I would appreciate guidance on the correct way to implement this. AFAICT there are no existing tests for sparse input to GLMs, even though they seem to work.