[WIP] ENH/FIX Support large sparse matrices #9678
Conversation
…eck_array and new function sparse_indices_check
thanks for this! ping @rth
Note I have made some changes to the title and description to conform with our conventions, and to mark that this resolves some issues (which allows github to automatically close them when this PR gets merged).
I am somewhat concerned that you have bitten off a bit more than you can chew. #9545 was an easy issue as long as you constrained it to fixing the SVM bug. You've tried to fix the broader issue, which will require more work. We encourage new contributors to take on small issues first so that they learn the ropes before we have to hold their hands through a large issue. If you do not feel comfortable making the following changes, perhaps you should scale this back to the smaller scope of fixing #9545:
- You have some flake8 failures. Please always ensure your changes conform to PEP8.
- The changes to check_array and check_X_y need tests in sklearn/utils/tests/test_validation.py, including assertions that appropriate errors are raised.
- Please extend check_estimator_sparse_data in sklearn/utils/estimator_checks.py to check that either large sparse matrices are accepted or that the correct error message is raised.
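A minimal sketch of the kind of test being requested. It simulates a "large" sparse matrix by upcasting the index arrays of a small CSR matrix to 64-bit (a common trick, since allocating 2**31 elements is impractical in a test), then asserts that a validation routine either accepts it or raises a clear error. The helper name check_large and its error message are illustrative stand-ins, not sklearn API:

```python
import numpy as np
import scipy.sparse as sp

def check_large(X, accept_large_sparse=False):
    # stand-in for the validation logic discussed in this PR
    if not accept_large_sparse and X.indices.dtype == np.int64:
        raise ValueError("Only sparse matrices with 32-bit integer indices "
                         "are accepted. Got int64 indices.")
    return X

# small CSR matrix whose index arrays are upcast to simulate large input
X = sp.random(10, 10, density=0.3, format='csr', random_state=0)
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)

try:
    check_large(X)                 # default: should refuse 64-bit indices
    raised = False
except ValueError as err:
    raised = "32-bit" in str(err)  # the error message should be explicit

assert raised
assert check_large(X, accept_large_sparse=True) is X
```

The same upcasting pattern can drive a parametrized test across CSR, CSC and COO formats.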
sklearn/decomposition/nmf.py
Outdated
@@ -970,8 +970,7 @@ def non_negative_factorization(X, W=None, H=None, n_components=None,
        Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix
        factorization with the beta-divergence. Neural Computation, 23(9).
    """
-    X = check_array(X, accept_sparse=('csr', 'csc'), dtype=float)
+    X = check_array(X, accept_sparse=('csr', 'csc'), dtype=float,accept_large_sparse=True)
PEP8: space after comma; line length should be at most 79 characters. See the Travis output for more PEP8 errors.
sklearn/decomposition/nmf.py
Outdated
@@ -1012,7 +1011,6 @@ def non_negative_factorization(X, W=None, H=None, n_components=None,

    l1_reg_W, l1_reg_H, l2_reg_W, l2_reg_H = _compute_regularization(
        alpha, l1_ratio, regularization)
Please try to avoid modifying unrelated code
sklearn/utils/validation.py
Outdated
@@ -499,14 +499,38 @@ def check_array(array, accept_sparse=False, dtype="numeric", order=None,
            msg = ("Data with input dtype %s was converted to %s%s."
                   % (dtype_orig, array.dtype, context))
            warnings.warn(msg, DataConversionWarning)

+    ## Indices Datatype regulation
+    sparse_indices_check(array,accept_large_sparse)
This should be moved above alongside _ensure_sparse_format
sklearn/svm/base.py
Outdated
@@ -875,6 +875,9 @@ def _fit_liblinear(X, y, C, fit_intercept, intercept_scaling, class_weight,
    libsvm_sparse.set_verbosity_wrap(verbose)
    liblinear.set_verbosity_wrap(verbose)

+    ##Liblinear doesnt support 64bit sparse matrix indices yet
+    sparse_indices_check(X)
Is there a reason not to use check_array or check_X_y in this case?
sklearn/utils/validation.py
Outdated
    return array


+def sparse_indices_check(array,accept_large_sparse=False):
This should probably be _check_large_sparse or something.
sklearn/utils/validation.py
Outdated
    if hasattr(array, "indices"):
        if hasattr(array.indices, "dtype"):
            if array.indices.dtype not in supported_indices:
                raise ValueError("CSR arrays accepts only 32 bit integers indexing. You provided %s bit indexing"
I think just "Sparse matrices with 64bit indices are not supported"
Because even 8-bit or other indices also crash; that's why I added the supported_indices array.
scipy.sparse should refuse to create <32bit integer indices. It only supports 32 and 64.
@jnothman I was able to create one with int8 and it gave a segmentation fault in TruncatedSVD, so I think we should allow only this index type.
sklearn/utils/validation.py
Outdated
    """
    if accept_large_sparse==False:
        if type(array) ==sp.csr.csr_matrix:
Use array.format.
We also need to check for: csc_matrix.indices; coo_matrix.row; dia_matrix.offsets.
And for completeness's sake: bsr_matrix.indices.
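An illustrative sketch (not sklearn code) of looking up the per-format index attributes the reviewer lists by array.format rather than type-testing for csr_matrix. COO's col attribute is included alongside row as an assumption for completeness, since both hold 32- or 64-bit coordinates in scipy:

```python
import scipy.sparse as sp

# which attributes hold integer index arrays, per sparse format
INDEX_ATTRS = {
    'csr': ('indices', 'indptr'),
    'csc': ('indices', 'indptr'),
    'bsr': ('indices', 'indptr'),
    'coo': ('row', 'col'),
    'dia': ('offsets',),
}

def index_dtypes(X):
    """Return the dtypes of all integer index arrays of a sparse matrix."""
    return [getattr(X, attr).dtype for attr in INDEX_ATTRS.get(X.format, ())]

X = sp.random(6, 6, density=0.4, format='csr', random_state=0)
assert len(index_dtypes(X)) == 2                    # indices and indptr
assert len(index_dtypes(X.asformat('dia'))) == 1    # offsets only
```

A validator can then reject the matrix if any of these dtypes is int64 and large sparse input is not accepted.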
will do for bsr_matrix directly
sklearn/utils/validation.py
Outdated
""" | ||
if accept_large_sparse==False: | ||
if type(array) ==sp.csr.csr_matrix: | ||
supported_indices=["int32"] |
I don't think we need the indirection of an additional variable here.
sklearn/utils/validation.py
Outdated
    if type(array) ==sp.csr.csr_matrix:
        supported_indices=["int32"]
        if hasattr(array, "indices"):
            if hasattr(array.indices, "dtype"):
Please use and where possible instead of nested ifs.
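A tiny illustration of this style point: a chained and short-circuits exactly like the nested ifs (each attribute access only happens once the previous checks passed), but reads flatter:

```python
import numpy as np
import scipy.sparse as sp

X = sp.random(5, 5, density=0.5, format='csr', random_state=0)

# nested version
flagged_nested = False
if hasattr(X, "indices"):
    if hasattr(X.indices, "dtype"):
        if X.indices.dtype == np.int64:
            flagged_nested = True

# flat version with `and` -- same short-circuit semantics
flagged_flat = (hasattr(X, "indices") and hasattr(X.indices, "dtype")
                and X.indices.dtype == np.int64)

assert flagged_nested == flagged_flat
```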
sklearn/linear_model/logistic.py
Outdated
@@ -1213,7 +1213,7 @@ def fit(self, X, y, sample_weight=None):
            _dtype = np.float64

        X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype,
-                        order="C")
+                        order="C",accept_large_sparse=True)
Note that this is solver dependent: as far as I could tell, newton-cg and lbfgs worked while sag failed.
Yes, it is solver dependent; shouldn't it work this way?
Well, maybe reflect that SAG doesn't accept those, with accept_large_sparse=(self.solver not in ['sag', 'saga'])? [need to check which ones work.]
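A sketch of this suggestion: make acceptance of 64-bit indexed sparse input depend on the solver. The set of solvers lacking 64-bit support ('sag', 'saga') is taken from the discussion above, not a verified list, and the fit() call is shown only as a comment since it depends on the surrounding class:

```python
# which solvers are assumed (per the discussion) to handle 64-bit indices
def large_sparse_ok(solver):
    return solver not in ('sag', 'saga')

# inside a hypothetical LogisticRegression.fit():
#   X, y = check_X_y(X, y, accept_sparse='csr', dtype=_dtype, order="C",
#                    accept_large_sparse=large_sparse_ok(self.solver))

assert large_sparse_ok('lbfgs')
assert large_sparse_ok('newton-cg')
assert not large_sparse_ok('sag')
assert not large_sparse_ok('saga')
```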
Thanks for this PR @kdhingra307 !
@jnothman I think this fixes #9545 and only starts to address the other two, so it might be too early to auto-close those.
@kdhingra307, are you still working on this? I'd really like to see this fixed in the next release.
…into sparse_indices_check Merging changes between fork and master:
Hi @kdhingra307, given your silence, we'll try to get someone else to complete this. Let us know if we are mistaken on that. The PR still suffers from a lot of PEP8 errors.
@jnothman I had an annual exam today (CAT for my postgrad); that's why I was not online for the last 2 months. I will be active from today.
Great, thanks!
@kdhingra307, we really need the SVM at least fixed. Are you continuing to work on this?
@kdhingra307 a polite ping. If you intend to continue working on this in the near future it is fine; otherwise I could try to get this PR complete.
sklearn/utils/validation.py
Outdated
    Only Int32 are supported for now

    """
    if accept_large_sparse is False and type(array) == sp.csr.csr_matrix:
We need to be checking CSC matrices and COO matrices also, at least. Why not just use isspmatrix?
will do this in next commit
@jnothman I checked almost all the algorithms with CSC and COO matrices; they passed all the test cases.
I analyzed why these matrices behave this way and found that during preprocessing, CSC and COO matrices are converted to CSR matrices by the _ensure_sparse_format function.
Since _check_large_sparse is called after _ensure_sparse_format, I don't see the need to handle the CSC and COO cases.
I will do an extensive check to find out whether any algorithm prioritizes COO/CSC over CSR; if you know of any, please share (it would ease the implementation).
I think it would be a good idea to just handle the other cases here, as _check_large_sparse can be and is being used as a standalone validation routine.
Please avoid is False. Just use if not accept_large_sparse.
sklearn/utils/validation.py
Outdated
raise ValueError("CSR arrays accepts only 32 bit integer indexing." | ||
" You provided % s bit indexing" | ||
% array.indices.dtype) | ||
if (hasattr(array, "indptr") and hasattr(array.indptr, "dtype") and |
what case does this handle? It's not tested.
Basically, if either indices or indptr is 64-bit, sparse will crash (hence the OR condition).
sklearn/utils/validation.py
Outdated
    supported_indices = ["int32"]
    if (hasattr(array, "indices") and hasattr(array.indices, "dtype") and
            array.indices.dtype not in supported_indices):
        raise ValueError("CSR arrays accepts only 32 bit integer indexing."
arrays -> matrices
And I think what you mean is something more like "Only sparse matrices with 32-bit integer indices are accepted. Got %s indices."
arrays -> matrices? I didn't understand.
I will change the exception message though.
The correct term is "CSR matrix" not "CSR array"
 def check_X_y(X, y, accept_sparse=False, dtype="numeric", order=None,
               copy=False, force_all_finite=True, ensure_2d=True,
               allow_nd=False, multi_output=False, ensure_min_samples=1,
               ensure_min_features=1, y_numeric=False,
-              warn_on_dtype=False, estimator=None):
+              warn_on_dtype=False, estimator=None, accept_large_sparse=False):
This needs documentation under Parameters. I also wonder whether this should just be some alteration to accept_sparse, e.g. having accept_sparse=['csr', 'csr64', ...] mean that a 32bit-indexed or a 64bit-indexed CSR is accepted.
I feel this should be a different variable altogether, because in later phases, when 64-bit indices are supported, fewer iterations would be required to fix it.
Btw, I would certainly not call this issue a "good first issue". In hindsight, it may be Moderate more than Easy. But I was also initially suggesting that we only fix it piece by piece: make sure SVMs don't crash, then build up from there. Your current general solution requires, in turn, a general test.
So yes, rather than removing the tests for that case, let's just throw an appropriate error in check_array. Thanks.
@jnothman done that in _ensure_sparse_format; it raises an error for large sparse indices on scipy versions <0.14
Completed. Do you think anything is missing, or is there any bigger scope that has yet to be dealt with?
sklearn/preprocessing/data.py
Outdated
@@ -1263,6 +1263,7 @@ class PolynomialFeatures(BaseEstimator, TransformerMixin):
    See :ref:`examples/linear_model/plot_polynomial_interpolation.py
    <sphx_glr_auto_examples_linear_model_plot_polynomial_interpolation.py>`
    """
+
I would like you to remove these blank lines. It is not our convention, and extra changes mean extra merge conflicts.
sklearn/utils/estimator_checks.py
Outdated
        raise
    except Exception:
        if "64" in matrix_format:
            raise AssertionError("Estimator % s doesn't seem to "
remove space between % and s?
sklearn/utils/estimator_checks.py
Outdated
                                 % (name, matrix_format))
        else:
            print("Estimator %s doesn't seem to fail gracefully on "
                  "sparse data: error message state explicitly that "
message *should state
"Estimator doesn't support 64-bit indices") | ||
elif X.getformat() in ["csc", "csr"]: | ||
if X.indices.dtype == "int64" or X.indptr.dtype == "int64": | ||
raise ValueError( |
codecov says this is never run. Why not?
@@ -427,6 +429,48 @@ def test_check_array_accept_sparse_no_exception():
    check_array(X_csr, accept_sparse=('csr',))


+def test_check_array_accept_large_sparse_no_exception():
+
rm blank line
sklearn/utils/validation.py
Outdated
            indices_datatype = getattr(X, key).dtype
            if (indices_datatype not in supported_indices):
                if not_supported_scipy_version:
                    raise ValueError("Scipy Version %s does not support large"
lowercase "version"
Perhaps scipy -> scipy.sparse
sklearn/utils/validation.py
Outdated
@@ -547,14 +555,44 @@ def check_array(array, accept_sparse=False, dtype="numeric", order=None,
            msg = ("Data with input dtype %s was converted to %s%s."
                   % (dtype_orig, array.dtype, context))
            warnings.warn(msg, DataConversionWarning)

+    # Indices Datatype regulation
+    _check_large_sparse(array, accept_large_sparse)
Is there a reason we are not just doing this in _ensure_sparse_format, called above?
There is no reason. I moved it there, but below some of the functions, because those functions may change the indices type of the spmatrix.
Really? Where do you mean? If it's possible to move it to _ensure_sparse_format I'd much rather it.
sklearn/utils/validation.py
Outdated
    Only Int32 are supported for now
    """

    if (not accept_large_sparse and sp.issparse(X)):
rm unnecessary parentheses
sklearn/utils/validation.py
Outdated
def _check_large_sparse(X, accept_large_sparse=False,
                        not_supported_scipy_version=False):
    """Indices Regulation for CSR Matrices.
    Only Int32 are supported for now
Not true
sklearn/utils/validation.py
Outdated
@@ -297,6 +300,10 @@ def _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy,
    if isinstance(accept_sparse, six.string_types):
        accept_sparse = [accept_sparse]

+    if LooseVersion(scipy_version) < '0.14.0':
I think you should do this check once when the module is loaded and store a LARGE_SPARSE_SUPPORTED global variable.
Do you understand what I mean here?
Despite the requested changes this is very good
@@ -345,7 +352,8 @@ def _ensure_no_complex_data(array):
 def check_array(array, accept_sparse=False, dtype="numeric", order=None,
                copy=False, force_all_finite=True, ensure_2d=True,
                allow_nd=False, ensure_min_samples=1, ensure_min_features=1,
-               warn_on_dtype=False, estimator=None):
+               warn_on_dtype=False, estimator=None,
+               accept_large_sparse=False):
I am not sure this should default to false, as that may break current behaviour in external uses of check_array
I know, this changes a lot of lines in the PR. Sorry I did not realise it sooner.
After those are fixed, please change the title to MRG. Please add an entry to the change log at
@jnothman I did all of the changes but just wanted to push them in different segments. I understood your LARGE_SPARSE_SUPPORTED global variable, but I do not know where and when to load it (if you can give some insight).
And I will push all the changes after this global initialization of LARGE_SPARSE_SUPPORTED.
You can do it right at the top of validation.py.
I'm not sure what you mean. Give an example?
…On 14 February 2018 at 06:10, Karan Dhingra ***@***.***> wrote:
@jnothman <https://github.com/jnothman> I have reduced the complexity of
the whole patch, I was allowing the case when someone purposefully enforced
the indices of 64bits but entries are way less.
But there is no purpose, as no one will do that practically, and we should also block that, I guess. What do you think?
@jnothman I have rewritten most of the code to allow any possible micro-optimization plus less work per loop. It is now also far easier to flip accept_large_sparse from False to True (basically you want the default value set to True; I just want to know if there is any particular or specific reason?). I will check the commit thoroughly once for any unnecessary changes (so that you don't have to).
I'm also +1 on having
@rth no issues, will migrate that change in the next commit.
I just mean that existing code relying on check_array letting large sparse matrices through would be broken if we rejected them by default. You've fixed up the cases in scikit-learn (although perhaps not in meta-estimators) but not in custom uses.
This pull request introduces 37 alerts and fixes 18 - view on lgtm.com
Comment posted by lgtm.com
(Don't worry about the lgtm.com alerts. It's a glitch on their side, and they're investigating.)
Yes, the recent changes look good. Just to change the default...
sklearn/utils/validation.py
Outdated
@@ -33,6 +33,12 @@
 # performance profiling.
 warnings.simplefilter('ignore', NonBLASDotWarning)

+# checking whether large sparse are supported by scipy or not
+if LooseVersion(scipy_version) >= '0.14.0':
+    LARGE_SPARSE_SUPPORTED = True
You can directly do = LooseV... >= ...0' without an if statement.
sklearn/utils/validation.py
Outdated
Only Int32 are supported for now
def _check_large_sparse(X, accept_large_sparse=False):
    """Indices Regulation of Sparse Matrices
    for estimators
Drop "for estimators". It's also for other things
@@ -30,6 +33,9 @@
 # performance profiling.
 warnings.simplefilter('ignore', NonBLASDotWarning)

+# checking whether large sparse are supported by scipy or not
In the comment let's say "large sparse" -> "sparse arrays with 64 bit indices"
This does look nice. A few more comments below. Please ping me when the default is changed.
    Parameters
    ----------
Remove whitespace (also below Returns)
@@ -403,6 +404,40 @@ def pairwise_estimator_convert_X(X, estimator, kernel=linear_kernel):
    return X


+def _generate_sparse_matrix(X_csr):
+    """Generate matrices in multiple formats for CSR,CSC and COO matrices
I know it's a private function, but this could be reformulated to: "Generate 32 and 64 bit indexed sparse matrices in CSR, CSC and COO format".
""" | ||
|
||
for sparse_format in ['dok', 'lil', 'dia', 'bsr', 'csr', 'csc', 'coo']: | ||
yield sparse_format, X_csr.asformat(sparse_format) |
Could we add X_csr.copy().asformat here (and on L430)? Just to be sure that we are not taking in e.g. a CSR matrix and returning it as-is when sparse_format="csr", which then gets modified in place by some estimator, so the next iteration with sparse_format="csc" would use this modified version.
There should be a copy=True kwarg iirc
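A sketch of the generator under discussion with the copy the reviewers ask for: asformat does take a copy keyword (scipy >= 1.0; this version requirement is my recollection, not from the thread), and copy=True guarantees each yielded matrix owns its data, so an estimator mutating one in place cannot leak into later iterations:

```python
import scipy.sparse as sp

def _generate_sparse_matrix(X_csr):
    """Yield (format, matrix) pairs; each matrix is an independent copy."""
    for fmt in ['dok', 'lil', 'dia', 'bsr', 'csr', 'csc', 'coo']:
        # copy=True: even when fmt == 'csr', a fresh copy is returned
        # instead of X_csr itself
        yield fmt, X_csr.asformat(fmt, copy=True)

X = sp.random(4, 4, density=0.5, format='csr', random_state=0)
formats = [fmt for fmt, _ in _generate_sparse_matrix(X)]
assert formats == ['dok', 'lil', 'dia', 'bsr', 'csr', 'csc', 'coo']

# the yielded CSR is a distinct object from the input
csr_copy = dict(_generate_sparse_matrix(X))['csr']
assert csr_copy is not X
```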
We'd really like to have this merged in the next release. @kdhingra307, are you with us? Perhaps a core dev should adopt this, otherwise...
Fixes #9545
Partially addresses #4149
Partially addresses #2969
The accept_large_sparse option is added in linear_model/base.py, linear_model/logistic.py and decomposition/truncated_svd, nmf, while sparse_indices_check directly blocks all calls to liblinear.