[MRG+1] Add partial_fit to GaussianNB #3324
Uses Chan/Golub/LeVeque update for model parameters. Does not implement sample weighting.
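For readers unfamiliar with the Chan/Golub/LeVeque update, the idea can be sketched in plain Python as follows. This is a minimal illustration for a single scalar feature, not the code in this PR: the name `merge_mean_variance` and the use of population variance (dividing by n) are assumptions made for the example.

```python
import statistics


def merge_mean_variance(n_old, mean_old, var_old, batch):
    """Fold a batch of scalars into a running (count, mean, variance).

    Hypothetical helper for illustration only. Variances are population
    variances (divide by n), which is an assumption of this sketch.
    """
    n_new = len(batch)
    if n_new == 0:
        return n_old, mean_old, var_old

    mean_new = sum(batch) / n_new
    var_new = sum((x - mean_new) ** 2 for x in batch) / n_new
    if n_old == 0:
        return n_new, mean_new, var_new

    n_total = n_old + n_new
    mean_total = (n_old * mean_old + n_new * mean_new) / n_total
    # Combine the two sums of squared deviations, plus a correction
    # term that accounts for the shift between the two means.
    ssd = (n_old * var_old + n_new * var_new
           + (n_old * n_new / n_total) * (mean_old - mean_new) ** 2)
    return n_total, mean_total, ssd / n_total


data = [1.0, 2.0, 4.0, 7.0, 11.0]
state = (0, 0.0, 0.0)
state = merge_mean_variance(*state, data[:2])   # first chunk
state = merge_mean_variance(*state, data[2:])   # second chunk
print(state, statistics.pvariance(data))
```

Feeding the data in two chunks should give the same count, mean, and variance as a single pass over all five values, up to floating-point rounding.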
new_theta, new_sigma = self.update_mean_variance(
    self.class_count_[i],
    self.theta_[i, :], self.sigma_[i, :],
    X_i)
this can be on one line
not quite, or it fails pep8. got it on two, though.
Besides that, looks clean to me.
@agramfort addressed all comments in c383e47. PTAL
@@ -106,13 +106,16 @@ class GaussianNB(BaseNB):

 Attributes
 ----------
-`class_prior_` : array, shape = [n_classes]
+`class_prior_` : array, shape (n_classes)
(n_classes) -> (n_classes,)
A 1-element tuple needs the trailing comma.
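A quick illustration of why the trailing comma matters, using only built-in Python. The variable names are made up for the example. NumPy reports the shape of a 1-D array as a 1-element tuple, which is why docstrings write it as `(n_classes,)`.

```python
n_classes = 3

not_a_tuple = (n_classes)   # parentheses only group: this is the int 3
one_tuple = (n_classes,)    # the trailing comma makes a 1-element tuple

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
```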
- clean up docstring formatting
- add class-level docstring for GaussianNB.class_count_
- make update_mean_variance private
- fix up variable names in update_mean_variance
y : array-like, shape (n_samples)
    Target values.

classes : array-like, shape = (n_classes)
extra =
fixed in f4df685
Previously triggered a copy of class_prior_ on partial_fit; now assign into existing array.
Fixed docstring stuff in 13db01f. Also fixed an inadvertent copy of
Coverage increased (+0.0%) when pulling 0a87d99199dea6adab131675ce1752f5916dd551 on ihaque:master into 877d471 on scikit-learn:master.
     Training vectors, where n_samples is the number of samples
     and n_features is the number of features.

-y : array-like, shape = [n_samples]
+y : array-like, shape = (n_samples,)
extra =
Same as above.
fixed in f4df685
Besides that, LGTM. Maybe @larsmans wants to have a look.

See Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:

    http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
I would like this reference, and a note pointing out that online fitting is possible at all, to appear in the constructor docstring.
Do you mean the class docstring, or the docstring to partial_fit? There's no __init__.
The class docstring. Sorry.
added in f4df685
OK, I think I addressed all pending comments from @larsmans and @agramfort in f4df685.
Travis error appears to be unrelated:
+1 for merge on my side.
bump -- any objections to merging?
Oh no,
Merged by rebase as e7e49a6. Thanks!
Good job!
Uses a method due to Chan, Golub, and LeVeque to perform online update of the model parameters in GaussianNB. Does not implement sample weighting.

Added tests to test_naive_bayes to ensure that partial_fit called on the entire toy set produces the same result as fit, and that it produces the same result even if the toy set is fitted in two parts.

TODO: fit could be a thin wrapper around partial_fit rather than duplicating code.
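The TODO and the test strategy can be illustrated with a small, self-contained sketch. It is not scikit-learn's implementation: `ToyGaussianStats` is a hypothetical class that tracks a per-class mean and population variance for one scalar feature, with `fit` written as a thin wrapper around `partial_fit`. Unlike the real estimator, it takes no `classes` argument.

```python
import math


class ToyGaussianStats:
    """Hypothetical per-class mean/variance tracker (illustration only)."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.count_ = {}  # class label -> number of samples seen
        self.mean_ = {}   # class label -> running mean
        self.var_ = {}    # class label -> running population variance

    def partial_fit(self, X, y):
        for label in set(y):
            batch = [x for x, t in zip(X, y) if t == label]
            n_b = len(batch)
            mean_b = sum(batch) / n_b
            var_b = sum((x - mean_b) ** 2 for x in batch) / n_b
            n_a = self.count_.get(label, 0)
            if n_a == 0:
                n, mean, var = n_b, mean_b, var_b
            else:
                # Pairwise merge of the old statistics with the new batch.
                mean_a, var_a = self.mean_[label], self.var_[label]
                n = n_a + n_b
                mean = (n_a * mean_a + n_b * mean_b) / n
                ssd = (n_a * var_a + n_b * var_b
                       + (n_a * n_b / n) * (mean_a - mean_b) ** 2)
                var = ssd / n
            self.count_[label] = n
            self.mean_[label] = mean
            self.var_[label] = var
        return self

    def fit(self, X, y):
        # Thin wrapper: discard previous state, then reuse partial_fit.
        self._reset()
        return self.partial_fit(X, y)


# Toy data made up for this example: one feature, two classes.
X = [-2.0, -1.0, -1.5, 1.0, 2.0, 3.0]
y = [1, 1, 1, 2, 2, 2]

full = ToyGaussianStats().fit(X, y)
halves = ToyGaussianStats()
halves.partial_fit(X[0::2], y[0::2]).partial_fit(X[1::2], y[1::2])

for label in (1, 2):
    assert math.isclose(full.mean_[label], halves.mean_[label])
    assert math.isclose(full.var_[label], halves.var_[label])
```

The final loop mirrors the checks described above: fitting on all the data at once and fitting in two parts should agree up to floating-point rounding.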