-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] apply sparse_threshold even if all columns are sparse #12304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] apply sparse_threshold even if all columns are sparse #12304
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the docstring and what's new
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
When the transformed output consists of all sparse or all dense data, | ||
the stacked result will be sparse or dense, respectively, and this | ||
keyword will be ignored. | ||
If the transformed output contains sparse matrices, it will be stacked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would "If the output of the different transformers contains sparse matrices, ..." be clearer? Because now if I start reading this, I would interpret "transformed output" as the output of the ColumnTransformer (so already stacked), and then the rest of the sentence reads a bit strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I wasn't happy with that wording either. I like yours.
as a sparse matrix if the density is lower than this value. Use | ||
``sparse_threshold=0`` to always return dense. When the transformed | ||
output consists of all dense data, the stacked result will be dense, | ||
and this keyword will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also lift this special case? (the case of all dense data)
In principle for the default of 0.3, this will always be the case anyway (you would need to set the threshold to something bigger than 1 to always have sparse)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're saying we should consider a way to force dense output into sparse? I think we can leave that feature till later, as currently threshold > 1 is not valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this case would be much rarer? When would you want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I would ever want that, I was just wondering for simplicity's sake (to not have special cases). But maybe it is so logical to keep all dense as dense, that actually strictly following the sparse_threshold would actually be less "normal".
So you can ignore my comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It would not have hurt to add some validation about sparse_threshold
being in [0, 1] and make it explicit in the docstring as currently it's not very clear (but that could be done in a different PR).
Waiting a few days before merging in case Joel has other comments on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though yes, validation would be nice
…se (scikit-learn#12304)" This reverts commit 1dc7cc0.
…se (scikit-learn#12304)" This reverts commit 1dc7cc0.
Fixes #12150