8000 [MRG+1] Add partial_fit to GaussianNB by ihaque · Pull Request #3324 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] Add partial_fit to GaussianNB #3324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

ihaque
Copy link
Contributor
@ihaque ihaque commented Jul 1, 2014

Uses a method due to Chan, Golub, and LeVeque to perform online update of the model parameters in GaussianNB. Does not implement sample weighting.

Added tests to test_naive_bayes to ensure that partial_fit called on the entire toy set produces the same result as fit, and that it produces the same result even if the toy set is fitted in two parts.

TODO: fit could be a thin wrapper around partial_fit rather than duplicating code.

Uses Chan/Golub/LeVeque update for model parameters. Does not implement
sample weighting.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling 69486a8 on ihaque:master into 877d471 on scikit-learn:master.

new_theta, new_sigma = self.update_mean_variance(
self.class_count_[i],
self.theta_[i, :], self.sigma_[i, :],
X_i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be on one line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite, or it fails pep8. got it on two, though.

@agramfort
Copy link
Member

besides looks clean to me.

@ihaque
Copy link
Contributor Author
ihaque commented Jul 1, 2014

@agramfort addressed all comments in c383e47. PTAL

@@ -106,13 +106,16 @@ class GaussianNB(BaseNB):

Attributes
----------
`class_prior_` : array, shape = [n_classes]
`class_prior_` : array, shape (n_classes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(n_classes) -> (n_classes,)

1 element tuple

 - clean up docstring formatting
 - add class-level docstring for GaussianNB.class_count_
 - make update_mean_variance private
 - fix up variable names in update_mean_variance
y : array-like, shape (n_samples)
Target values.

classes : array-like, shape = (n_classes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra =

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in f4df685

Previously triggered a copy of class_prior_ on partial_fit; now assign into
existing array.
@ihaque
Copy link
Contributor Author
ihaque commented Jul 1, 2014

Fixed docstring stuff in 13db01f. Also fixed an inadvertent copy of self.class_prior_ at the end of partial_fit.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling 0a87d99199dea6adab131675ce1752f5916dd551 on ihaque:master into 877d471 on scikit-learn:master.

Training vectors, where n_samples is the number of samples
and n_features is the number of features.

y : array-like, shape = [n_samples]
y : array-like, shape = (n_samples,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra =

same above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in f4df685

@agramfort
Copy link
Member

besides LGTM

maybe @larsmans wants to have a look.


See Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque:

http://i.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this reference, and a note pointing out that online fitting is possible at all, to appear in the constructor docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the class docstring, or the docstring to partial_fit? There's no __init__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class docstring. Sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in f4df685

@ihaque
Copy link
Contributor Author
ihaque commented Jul 1, 2014

OK, I think I addressed all pending comments from @larsmans and @agramfort in f4df685.

@ihaque
Copy link
Contributor Author
ihaque commented Jul 1, 2014

Travis error appears to be unrelated:

ERROR: sklearn.tests.test_common.test_regressors_int('OrthogonalMatchingPursuitCV', <class 'sklearn.linear_model.omp.OrthogonalMatchingPursuitCV'>, array([[-0.44836249, -0.47282444, -1.20608008, -0.27435163, -0.81468082,

@agramfort
Copy link
Member

+1 for merge on my side.

@ihaque
Copy link
Contributor Author
ihaque commented Jul 3, 2014

bump -- any objections to merging?

@larsmans
Copy link
Member
larsmans commented Jul 4, 2014

sample_weight support is not tested, AFAICT.

@larsmans larsmans changed the title Add partial_fit to GaussianNB [MRG+1] Add partial_fit to GaussianNB Jul 4, 2014
@larsmans
Copy link
Member
larsmans commented Jul 4, 2014

Oh no, sample_weight is not implemented. Sorry, my mistake. Merging.

@larsmans
Copy link
Member
larsmans commented Jul 4, 2014

Merged by rebase as e7e49a6. Thanks!

@larsmans larsmans closed this Jul 4, 2014
@GaelVaroquaux
Copy link
Member

Good job!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0