-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Helpers for managing different set of classes #8100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'll take this up. |
I think it will be a challenge, @dalmia, but go ahead. |
Thanks for the heads up @jnothman. :) |
@dalmia are you still working on this? |
@amueller No Sir, please retain the 'Need Contributor' label. |
Ok so in #6231 we talked about adding classes as a construction parameter, which I think might be the best solution. If we do that, what's the shape of |
Should we sort the It seems a bit strange to me that |
As a first go, let's require that they are sorted, rather than sorting them.
There are other things that make no sense to sort over. There are some
parameters that make no sense to warm_start over. Shrug.
…On 8 July 2017 at 05:55, Andreas Mueller ***@***.***> wrote:
Should we sort the classes that the user provides? That might be
surprising to the user, but not doing it might lead to subtle bugs as we
rely on that in some places, I think?
It seems a bit strange to me that get_params and set_params return / set
classes as searching over it is nonsensical, but I guess it's not the
worst....
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8100 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz67FEOB0wA9k0g6cvy3ayRIfh6iy-ks5sLo0egaJpZM4LTjY9>
.
|
Also see #8004 #1304 #1183 #7889 (comment) |
Should we have attributes reflect the additional classes in all cases? I think so, because otherwise it will be very confusing. @aarshayj points out that that requires changing |
yea i think we'll have to do this. for instance, if the user sets classes=[1,2,3] while y has only [1,2], then an SVM will return just 1 set of coefficients and 1d decision function because the library we use makes use of data only in y. but in our API, we need to supplement both coef and decision function to make them 3d. the returned values are for the positive class (2). so we can fill in 0s for class (3) and fill in class (1) as negative of the values for class (2). |
Yes, `classes_` should be identical to `classes` parameter when provided,
and coef et al should correspond.
…On 12 August 2017 at 06:45, Aarshay Jain ***@***.***> wrote:
yea i think we'll have to do this. for instance, if the user sets
classes=[1,2,3] while y has only [1,2], then an SVM will return just 1 set
of coefficients and 1d decision function because the library we use makes
use of data only in y. but in our API, we need to supplement both coef and
decision function to make them 3d.
the returned values are for the positive class (2). so we can fill in 0s
for class (3) and fill in class (1) as negative of the values for class (2).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8100 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz62tBq2WOg2bulX6J2tCX3_zQYTI8ks5sXL13gaJpZM4LTjY9>
.
|
So I'm not entirely certain this is the right solution any more, because of issues we face in the implementation. There are two or three main points:
If we used a wrapper/meta-estimator approach, these issues would not exist as we could wrap third-party estimators and not expose any internal data structures. I'm not sure if a wrapper would be the right solution to ensure consistent scoring, but it's probably possible. |
Hm for the third-party estimators, I guess we don't have to require |
In short, yes. Eventually. We need to maintain the We don't end up where we started in that we have a way for metaestimators to pass classes to supporting classifiers (and scorers), and that we remove the problem in A wrapper approach for meta-estimators interpreting the output of I do see your problems about making the models represent all the classes in a consistent way. I'm not yet sure what the solution is. |
I think that the idea of trying to have "classes_" everywhere, and supporting edge cases only if this attribute is present. It seems very explicit and powerful.
Wrapper approaches make the code harder to understand. Scikit-learn is useful for many people, and fosters a great community, because it's codebase is (reasonnably) easy to understand. Wrapper approaches would get us closer to the code design of Mayavi, which was beautiful and robust, but led to the project having not enough contributors.
|
It seems like half the bugs we've solved in the past couple of months surround problems of having different sets of classes, in the context of:
predict_proba
,decision_function
or metrics, or in normalising macro-averaged scores)partial_fit
whereclasses
are specified upfront, but then repeated calls need matching to those classeswarm_start
whereclasses_
from the first fit must be identical to the set of classes iny
in each call tofit()
These are all subtly different problems, but at the moment it seems like we're handling them on an ad-hoc (and too often a post-hoc) basis.
It would be amazing if someone could review these issues and identify where either API changes (
classes
as a constructor parameter to classifiers has been suggested) or helper utilities might help avoid these issues in the future.The text was updated successfully, but these errors were encountered: