8000 Improve error message for sparse multilabel-indicator y in RandomForestClassifier by rushabh-v · Pull Request #15971 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Improve error message for sparse multilabel-indicator y in RandomForestClassifier #15971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Feb 10, 2020

Conversation

rushabh-v
Copy link
Contributor
@rushabh-v rushabh-v commented Dec 25, 2019

Fixes #15958 .

@rushabh-v rushabh-v changed the title edit the error message Improve error message for sparse multilabel-indicator y in RandomForestClassifier Dec 25, 2019
rth
rth previously requested changes Dec 26, 2019
Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

Can somebody please guide me to get rid of these failing codecov checks?

You need to add a unit tests that passes a sparse y and checks that the correct error message is raised with,

msg = "multilabel-indicator of type sparse is not supported"
with pytest.raises(ValueError, match=msg):
    est.fit(X, y)

@rushabh-v rushabh-v requested a review from rth December 27, 2019 01:28
@rushabh-v
Copy link
Contributor Author
rushabh-v commented Dec 27, 2019

I have changed the location of the condition because the call y = np.atleast_1d(y) in the fit function changes the instance of the sparse matrices. I think we should put the condition at the very beginning of the fit function. Are there any other suggestions @rth ?

@rushabh-v
Copy link
Contributor Author
rushabh-v commented Dec 28, 2019

Can we change value of the argement accept_sparse in the call y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)?

@jnothman
Copy link
Member

Sorry, you're right, this should be improved....

Can we change value of the argement accept_sparse in the call y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)?

That seems like the right thing to do.... Why do we say that we accept sparse y there?

DecisionTreeClassifier checks y and does not accept sparse....

@rushabh-v rushabh-v requested a review from jnothman December 29, 2019 05:21
@thomasjpfan
Copy link
Member

DecisionTreeClassifier checks y and does not accept sparse....

Looks like DecisionTreeClassifier use to accept it and was changed in 409c888

The forest was not updated to reflect this change.

@rushabh-v
Copy link
Contributor Author
rushabh-v commented Jan 9, 2020

Looks like DecisionTreeClassifier use to accept it and was changed in 409c888
The forest was not updated to reflect this change.

Then is this PR going to be merged or it should be closed now?

@rushabh-v rushabh-v closed this Jan 9, 2020
@rushabh-v rushabh-v reopened this Jan 9, 2020
@thomasjpfan
Copy link
Member

This PR is nice to have since it specifies y as the problem.

For the regular tree code, we do not do this explicit check and use the default check_array error message which is ambiguous about which array is causing the error.

@rushabh-v
Copy link
Contributor Author

This PR is nice to have since it specifies y as the problem.

For the regular tree code, we do not do this explicit check and use the default check_array error message which is ambiguous about which array is causing the error.

Oh, okay.

@rushabh-v
Copy link
Contributor Author

Can you merge this, please? @NicolasHug

@jnothman jnothman merged commit 3f0b6c0 into scikit-learn:master Feb 10, 2020
@jnothman
Copy link
Member

Thanks @rushabh-v

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RandomForestClassifier does not handle sparse multilabel-indicator y
5 participants
0