-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
MNT refactor naive Bayes tests #14064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The diff is currently quite large and hard to review because you both rename test functions and shuffle them somewhat.
For instance, test_discrete_prior
was previously L172 after test_gnb_pfit_wrong_nb_features
test. In this PR it is test_discretenb_prior
L197 after test_gnb_naive_bayes_scale_invariance
This will also make using git blame more difficult without good reason.
I would suggest you look at the diff on Github and try to reduce it somewhat, at least by not moving functions around.
labels = [GaussianNB().fit(f * X, y).predict(f * X) | ||
for f in [1E-10, 1, 1E10]] | ||
assert_array_equal(labels[0], labels[1]) | ||
assert_array_equal(labels[1], labels[2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why moving this test around in the file? It makes the diff hard to follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your remark. I didn't expect go get feedback so quickly :) (I commented in a review many of my changes).
The test you mention changed its position, because I moved the test_gnb_partial_fit
and test_gnb_naive_bayes_scale_invariance
to the other gnb tests.
Basically, I moved tests around to order them by the class that is tested. Right now (in master), it is a bit mixed, which makes it unnecessarily complicated to get an overview of what is tested and what not. E.g. gnb tests come in between discrete nb tests and tests are not always named in the same pattern, so by just reading the function name, it is not obvious what is tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[..]
This will also make using git blame more difficult without good reason.I would suggest you look at the diff on Github and try to reduce it somewhat, at least by not moving functions around.
Moving the functions was a central idea of this PR. Do you think it is more important to have a good git blame history or have the codebase well organized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git blame history is more useful for maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented most of my changes to point out what I did and why.
Basically, tests have been ordered by class and renamed to similar patterns.
At the beginning come gnb (gaussian nb) tests, then general / parametrized disctete nb tests. Followed by more specific tests for classes that belong to discrete nb. Finally alpha and accuracy tests.
@@ -169,126 +169,52 @@ def test_gnb_pfit_wrong_nb_features(): | |||
assert_raises(ValueError, clf.partial_fit, np.hstack((X, X)), y) | |||
|
|||
|
|||
def test_discrete_prior(): | |||
# Test whether class priors are properly set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved down to other discrete nb tests and renamed it to test_discretenb_prior
|
||
@pytest.mark.parametrize('kind', ('dense', 'sparse')) | ||
def test_mnnb(kind): | ||
# Test Multinomial Naive Bayes classification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved tests down to other mnnb tests
y_pred_log_proba = clf.predict_log_proba(X) | ||
assert_array_almost_equal(np.log(y_pred_proba), y_pred_log_proba, 8) | ||
def test_gnb_partial_fit(): | ||
clf = GaussianNB().fit(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved up to here
assert_array_almost_equal(y_pred_proba2, y_pred_proba) | ||
assert_array_almost_equal(y_pred_log_proba2, y_pred_log_proba) | ||
def test_gnb_naive_bayes_scale_invariance(): | ||
# Scaling the data should not change the prediction results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally test_naive_bayes_scale_invariance
. Moved up to other gnb tests and renamed
assert_array_almost_equal(y_pred_log_proba3, y_pred_log_proba) | ||
@pytest.mark.parametrize("cls", [MultinomialNB, BernoulliNB]) | ||
def test_discretenb_prior(cls): | ||
# Test whether class priors are properly set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originally test_discrete_prior
. Moved and renamed
|
||
def test_feature_log_prob_bnb(): | ||
# Test for issue #4268. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed to test_bnb_feature_log_prob
and moved to other bnb tests
# Fit Bernoulli NB w/ alpha = 1.0 | ||
clf = BernoulliNB(alpha=1.0) | ||
clf.fit(X, Y) | ||
def test_mnb_prior_unobserved_targets(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved test down to here.
@@ -581,6 +527,29 @@ def test_bnb(): | |||
assert_array_almost_equal(clf.predict_proba(X_test), predict_proba) | |||
|
|||
|
|||
def test_bnb_feature_log_prob(): | |||
# Test for issue #4268. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originally test_feature_log_prob_bnb
. Renamed and moved to here
@@ -647,15 +616,6 @@ def test_cnb(): | |||
assert_array_almost_equal(clf.feature_log_prob_, normed_weights) | |||
|
|||
|
|||
def test_naive_bayes_scale_invariance(): | |||
# Scaling the data should not change the prediction results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved up to other gnb tests and renamed to test_gnb_naive_bayes_scale_invariance
@@ -743,3 +703,37 @@ def test_alpha_vector(): | |||
expected_msg = ('alpha should be a scalar or a numpy array ' | |||
'with shape [n_features]') | |||
assert_raise_message(ValueError, expected_msg, m_nb.fit, X, y) | |||
|
|||
|
|||
def test_check_accuracy_on_digits(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved down from above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this.
I don't think git blame usage on tests is a big priority over a test file
whose coverage is easy to understand. git blame is often hard in any case.
Better to have git blame place blame on a commit named "MNT refactor tests"
than "ENH add categorical Naive Bayes"
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, thanks for the detailed comments!
As discussed in PR #12569, I make the refactored naive_bayes tests as a separate PR.
When I implemented the Categorical Naive Bayes classifier, it took me some time to understand the naive Bayes tests and get an overview of what is tested. Therefore, I started to refactor them in order to make them easier understandable and graspable.
Basically, I moved tests around to order them by the class that is tested. Right now (in master), it is a bit mixed, which makes it, in my opinion, unnecessarily complicated to get an overview of what is tested and what not. E.g. gnb tests come in between discrete nb tests and tests are not always named in the same pattern, so by just reading the function name, it is not obvious what is tested.
I commented many of my changes in a review below.