8000 [MRG] apply sparse_threshold even if all columns are sparse by amueller · Pull Request #12304 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] apply sparse_threshold even if all columns are sparse #12304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 14, 2018

Conversation

amueller
Copy link
Member
@amueller amueller commented Oct 5, 2018

Fixes #12150

@amueller amueller added this to the 0.20.1 milestone Oct 5, 2018
Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the docstring and what's new

@amueller
Copy link
Member Author

done

Copy link
Member
@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

When the transformed output consists of all sparse or all dense data,
the stacked result will be sparse or dense, respectively, and this
keyword will be ignored.
If the transformed output contains sparse matrices, it will be stacked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "If the output of the different transformers contains sparse matrices, ..." be clearer? Because now if I start reading this, I would interpret "transformed output" as the output of the ColumnTransformer (so already stacked), and then the rest of the sentence reads a bit strange.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I wasn't happy with that wording either. I like yours.

as a sparse matrix if the density is lower than this value. Use
``sparse_threshold=0`` to always return dense. When the transformed
output consists of all dense data, the stacked result will be dense,
and this keyword will be ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also lift this special case? (the case of all dense data)
In principle for the default of 0.3, this will always be the case anyway (you would need to set the threshold to something bigger than 1 to always have sparse)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying we should consider a way to force dense output into sparse? I think we can leave that feature till later, as currently threshold > 1 is not valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this case would be much rarer? When would you want that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I would ever want that, I was just wondering for simplicity's sake (to not have special cases). But maybe it is so logical to keep all dense as dense, that actually strictly following the sparse_threshold would actually be less "normal".
So you can ignore my comment :)

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It would not have hurt to add some validation about sparse_threshold being in [0, 1] and make it explicit in the docstring as currently it's not very clear (but that could be done in a different PR).

Waiting a few days before merging in case Joel has other comments on this.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though yes, validation would be nice

@jnothman jnothman merged commit 0ad8736 into scikit-learn:master Oct 14, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 15, 2018
anuragkapale pushed a commit to anuragkapale/scikit-learn that referenced this pull request Oct 23, 2018
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0