WIP simplify naive Bayes parametrization #1525
Conversation
Making the common tests pass is a bit more tricky. I think this is a step forward in the interface, though.
Actually, I'm not so sure things have gotten consistent now; I think I'd favor the former for simplicity. Naive Bayes is a typical beginner's tool, so it should be simple. The latter would provide better API compatibility with other estimators, but NB's interpretation of …
I see your point about … I don't know what you mean by "prior taking …". That these are mutually exclusive is one of the reasons I think having a single parameter is more natural. One could also imagine …
Yes, but we could decouple them so that unrepresented classes get a weight of 1 and other classes can be up-/downweighted arbitrarily. We'd just have to fill in the ones and normalize, which seems to be what you're implying as well.
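A minimal sketch of that decoupling, assuming a hypothetical helper name: classes missing from the dict default to weight 1, and the result is normalized so it can double as a class prior.

```python
import numpy as np

def fill_and_normalize_weights(classes, class_weight):
    """Hypothetical helper: classes absent from `class_weight` get
    weight 1, then the weights are normalized to sum to one so they
    can serve as a class prior."""
    weights = np.array([class_weight.get(c, 1.0) for c in classes],
                       dtype=np.float64)
    return weights / weights.sum()

# e.g. upweight class 2, leave classes 0 and 1 at the default of 1
print(fill_and_normalize_weights([0, 1, 2], {2: 3.0}))
# -> [0.2 0.2 0.6]
```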
Ok, but then the meaning of the parameter would change a lot from the previous …
Ok, so you would keep the …
No, because I agree the API is too clumsy (and the code is more complicated than an NB estimator needs to be). Let me formulate a proposal:
This way, custom priors can still be given explicitly, but the parameter becomes much more flexible. Does that sound like something we could almost put in a docstring, aside from the [rationale]?
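A rough sketch of how such a unified parameter might be validated in `fit`; the helper name and branch order are assumptions for illustration, not the PR's actual code:

```python
import numpy as np

def _check_class_weight(class_weight, classes, y):
    """Hypothetical handler for a single class_weight parameter:
    None -> uniform prior, 'auto' -> empirical prior from y,
    dict -> partial override (missing classes default to 1),
    array -> explicit prior. Always normalized to sum to one."""
    n_classes = len(classes)
    if class_weight is None:
        prior = np.full(n_classes, 1.0 / n_classes)
    elif isinstance(class_weight, str) and class_weight == 'auto':
        counts = np.array([np.sum(y == c) for c in classes], dtype=float)
        prior = counts / counts.sum()
    elif isinstance(class_weight, dict):
        prior = np.array([class_weight.get(c, 1.0) for c in classes],
                         dtype=float)
        prior /= prior.sum()
    else:
        prior = np.asarray(class_weight, dtype=float)
        prior = prior / prior.sum()
    return prior
```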
It sounds a lot like my current docstring, doesn't it? ;) It sounds like a good solution to me. I must admit I haven't really thought about the algorithm, just shuffled the bits around. It actually sounds kinda simple the way you put it. Do you want to give it a shot or should I try?
Ok, it sounds more like I should have written the docstring, actually ;) but that was what I meant, I only put it less elegantly.
We can actually just reuse the function from …
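The truncated reference is presumably to a shared class-weight helper. In current scikit-learn that role is filled by `compute_class_weight`, where the old `'auto'` mode has since been renamed `'balanced'`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 1])
classes = np.array([0, 1])

# 'balanced' is the modern name for the old 'auto' mode: weights
# proportional to n_samples / (n_classes * count(c)).
print(compute_class_weight('balanced', classes=classes, y=y))
# -> [0.66666667 2.        ]
```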
Looks good!
Ok, I'll add some more tests for dictionaries and also for using lists in SVM and SGD.
Ok, so this doesn't seem to make much sense, maybe: I just unified two functions that did the opposite... I could still use the same function but take the inverse in the …
Hm, reusing the code is not so trivial given the different normalizations. Any ideas how to handle this better? Just leave the current parametrization?
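The normalization clash is easy to see in a small example: as a *prior*, `'auto'` should follow the empirical class frequencies, while as a *weight* it is (up to scaling) their inverse, so a shared helper has to take a reciprocal somewhere:

```python
import numpy as np

y = np.array([0, 0, 0, 1])
classes, counts = np.unique(y, return_counts=True)

# As a prior, 'auto' follows the empirical frequencies ...
prior = counts / counts.sum()                     # [0.75 0.25]
# ... while as a weight it is the inverse: rare classes get boosted.
weights = counts.sum() / (len(classes) * counts)  # [0.6667 2.0]
print(prior, weights)
```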
Closing this as it sucks.
After moving `class_prior` to `class_weight` as an `__init__` parameter in #1499, the parametrization was a bit weird. I now got rid of `fit_prior` and let `class_weight` be `'auto'`, `None` (= uniform) or an array. This is a bit more consistent with the other estimators. I'll try to make it pass the common tests now, and I guess I'll also allow dictionaries.
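For contrast, a sketch of the two-parameter API this PR set out to replace, next to the unified parameter it proposed; the `class_weight` spelling on `MultinomialNB` is the proposal, not a released API:

```python
from sklearn.naive_bayes import MultinomialNB

# The parametrization this PR targeted (scikit-learn still ships it):
# two interacting parameters.
clf = MultinomialNB(fit_prior=True)          # learn the prior from y
clf = MultinomialNB(class_prior=[0.5, 0.5])  # explicit prior

# Under the proposal, a single parameter would cover all the cases
# (hypothetical API, never merged):
#   MultinomialNB(class_weight=None)        # uniform prior
#   MultinomialNB(class_weight='auto')      # empirical prior from y
#   MultinomialNB(class_weight=[0.5, 0.5])  # explicit prior
```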