MNT refactor naive Bayes tests #14064

timbicker · 2019-06-11T09:49:39Z

As discussed in PR #12569, I make the refactored naive_bayes tests as a separate PR.

When I implemented the Categorical Naive Bayes classifier, it took me some time to understand the naive Bayes tests and get an overview of what is tested. Therefore, I started to refactor them in order to make them easier understandable and graspable.

Basically, I moved tests around to order them by the class that is tested. Right now (in master), it is a bit mixed, which makes it, in my opinion, unnecessarily complicated to get an overview of what is tested and what not. E.g. gnb tests come in between discrete nb tests and tests are not always named in the same pattern, so by just reading the function name, it is not obvious what is tested.

I commented many of my changes in a review below.

rth

Thanks! The diff is currently quite large and hard to review because you both rename test functions and shuffle them somewhat.

For instance, test_discrete_prior was previously L172 after test_gnb_pfit_wrong_nb_features test. In this PR it is test_discretenb_prior L197 after test_gnb_naive_bayes_scale_invariance

This will also make using git blame more difficult without good reason.

I would suggest you look at the diff on Github and try to reduce it somewhat, at least by not moving functions around.

rth · 2019-06-11T10:11:28Z

sklearn/tests/test_naive_bayes.py

-    labels = [GaussianNB().fit(f * X, y).predict(f * X)
-              for f in [1E-10, 1, 1E10]]
-    assert_array_equal(labels[0], labels[1])
-    assert_array_equal(labels[1], labels[2])


Why moving this test around in the file? It makes the diff hard to follow.

Thanks for your remark. I didn't expect go get feedback so quickly :) (I commented in a review many of my changes).

The test you mention changed its position, because I moved the test_gnb_partial_fit and test_gnb_naive_bayes_scale_invariance to the other gnb tests.

Basically, I moved tests around to order them by the class that is tested. Right now (in master), it is a bit mixed, which makes it unnecessarily complicated to get an overview of what is tested and what not. E.g. gnb tests come in between discrete nb tests and tests are not always named in the same pattern, so by just reading the function name, it is not obvious what is tested.

[..]
This will also make using git blame more difficult without good reason.

I would suggest you look at the diff on Github and try to reduce it somewhat, at least by not moving functions around.

Moving the functions was a central idea of this PR. Do you think it is more important to have a good git blame history or have the codebase well organized?

git blame history is more useful for maintenance.

timbicker

I commented most of my changes to point out what I did and why.
Basically, tests have been ordered by class and renamed to similar patterns.
At the beginning come gnb (gaussian nb) tests, then general / parametrized disctete nb tests. Followed by more specific tests for classes that belong to discrete nb. Finally alpha and accuracy tests.

timbicker · 2019-06-11T10:30:33Z

sklearn/tests/test_naive_bayes.py

@@ -169,126 +169,52 @@ def test_gnb_pfit_wrong_nb_features():
    assert_raises(ValueError, clf.partial_fit, np.hstack((X, X)), y)


-def test_discrete_prior():
-    # Test whether class priors are properly set.


moved down to other discrete nb tests and renamed it to test_discretenb_prior

timbicker · 2019-06-11T10:31:07Z

sklearn/tests/test_naive_bayes.py

-
-@pytest.mark.parametrize('kind', ('dense', 'sparse'))
-def test_mnnb(kind):
-    # Test Multinomial Naive Bayes classification.


moved tests down to other mnnb tests

timbicker · 2019-06-11T10:31:25Z

sklearn/tests/test_naive_bayes.py

-    y_pred_log_proba = clf.predict_log_proba(X)
-    assert_array_almost_equal(np.log(y_pred_proba), y_pred_log_proba, 8)
+def test_gnb_partial_fit():
+    clf = GaussianNB().fit(X, y)


moved up to here

timbicker · 2019-06-11T10:31:47Z

sklearn/tests/test_naive_bayes.py

-    assert_array_almost_equal(y_pred_proba2, y_pred_proba)
-    assert_array_almost_equal(y_pred_log_proba2, y_pred_log_proba)
+def test_gnb_naive_bayes_scale_invariance():
+    # Scaling the data should not change the prediction results


Originally test_naive_bayes_scale_invariance. Moved up to other gnb tests and renamed

timbicker · 2019-06-11T10:32:45Z

sklearn/tests/test_naive_bayes.py

-    assert_array_almost_equal(y_pred_log_proba3, y_pred_log_proba)
+@pytest.mark.parametrize("cls", [MultinomialNB, BernoulliNB])
+def test_discretenb_prior(cls):
+    # Test whether class priors are properly set.


originally test_discrete_prior. Moved and renamed

timbicker · 2019-06-11T10:37:34Z

sklearn/tests/test_naive_bayes.py


-def test_feature_log_prob_bnb():
-    # Test for issue #4268.


renamed to test_bnb_feature_log_prob and moved to other bnb tests

timbicker · 2019-06-11T10:37:59Z

sklearn/tests/test_naive_bayes.py

-    # Fit Bernoulli NB w/ alpha = 1.0
-    clf = BernoulliNB(alpha=1.0)
-    clf.fit(X, Y)
+def test_mnb_prior_unobserved_targets():


moved test down to here.

timbicker · 2019-06-11T10:38:24Z

sklearn/tests/test_naive_bayes.py

@@ -581,6 +527,29 @@ def test_bnb():
    assert_array_almost_equal(clf.predict_proba(X_test), predict_proba)


+def test_bnb_feature_log_prob():
+    # Test for issue #4268.


originally test_feature_log_prob_bnb. Renamed and moved to here

timbicker · 2019-06-11T10:38:41Z

sklearn/tests/test_naive_bayes.py

@@ -647,15 +616,6 @@ def test_cnb():
    assert_array_almost_equal(clf.feature_log_prob_, normed_weights)


-def test_naive_bayes_scale_invariance():
-    # Scaling the data should not change the prediction results


Moved up to other gnb tests and renamed to test_gnb_naive_bayes_scale_invariance

timbicker · 2019-06-11T10:39:00Z

sklearn/tests/test_naive_bayes.py

@@ -743,3 +703,37 @@ def test_alpha_vector():
    expected_msg = ('alpha should be a scalar or a numpy array '
                    'with shape [n_features]')
    assert_raise_message(ValueError, expected_msg, m_nb.fit, X, y)
+
+
+def test_check_accuracy_on_digits():


moved down from above

jnothman

Thanks for this.

jnothman · 2019-06-13T07:48:29Z

I don't think git blame usage on tests is a big priority over a test file whose coverage is easy to understand. git blame is often hard in any case. Better to have git blame place blame on a commit named "MNT refactor tests" than "ENH add categorical Naive Bayes"

rth

Fair enough, thanks for the detailed comments!

refactor tests

628cec1

timbicker changed the title ~~[MRG] refactor tests~~ [MRG] refactor naive Bayes tests Jun 11, 2019

rth reviewed Jun 11, 2019

View reviewed changes

timbicker commented Jun 11, 2019

View reviewed changes

jnothman approved these changes Jun 13, 2019

View reviewed changes

rth approved these changes Jun 13, 2019

View reviewed changes

rth changed the title ~~[MRG] refactor naive Bayes tests~~ MNT refactor naive Bayes tests Jun 13, 2019

rth merged commit d91b0f3 into scikit-learn:master Jun 13, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

MNT refactor naive Bayes tests (scikit-learn#14064)

9b47af5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MNT refactor naive Bayes tests #14064

MNT refactor naive Bayes tests #14064

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MNT refactor naive Bayes tests #14064

MNT refactor naive Bayes tests #14064

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!