[MRG+1] Scaling a sparse matrix along axis 0 should accept a csc by default#5791
Conversation
|
I think this makes sense, and I agree to remove the superfluous copies as well. Could you time |
|
I don't think this would matter much in the unless done in a loop. But it is good to have for correctness. I do not give similar speeds between this branch and master. |
|
Well, I see only two places where a sparse matrix is used in test_data.py. So it definitely shouldn't matter in that. |
You have also removed a copy in the dense case. But, yeah, probably not worth a try as inputs in tests are tiny anyway. |
|
Do you have any more comments or can I update the PR description to +1? |
|
+1 from my side! |
|
LGTM. Thank you. Merging. |
[MRG+1] Scaling a sparse matrix along axis 0 should accept a csc by default
|
forgot to update |
There was a problem hiding this comment.
I did not write the initial code, but I think the idea is to convert it into a format where the column operations are sufficiently faster. I just maintained that convention because converting it to csr would be non-intuitive.
There was a problem hiding this comment.
Are you afraid that this will not be backward compatible?
There was a problem hiding this comment.
How much faster is converting to csc and multipying compared to not converting? ['csc', 'csr'] will not convert to CSR but will let CSR pass.
For scaling a csc_matrix along axis 0, one pass is made across
n_features.For scaling a csc_matrix along axis 0, two passes are made across
non_zero_valuesand a pass acrossn_features.So the csc_matrix should be accepted by default. This can be validated by this quick benchmark.