MNT refactor naive Bayes tests by timbicker · Pull Request #14064 · scikit-learn/scikit-learn · GitHub

MNT refactor naive Bayes tests #14064


Merged: 1 commit merged into scikit-learn:master on Jun 13, 2019

Conversation

@timbicker (Contributor) commented on Jun 11, 2019:

As discussed in PR #12569, I am submitting the refactored naive_bayes tests as a separate PR.

When I implemented the Categorical Naive Bayes classifier, it took me some time to understand the naive Bayes tests and get an overview of what is tested. Therefore, I started to refactor them in order to make them easier to understand and navigate.

Basically, I moved tests around to order them by the class that is tested. Right now (in master), the order is a bit mixed, which makes it, in my opinion, unnecessarily complicated to get an overview of what is tested and what is not. For example, gnb tests appear in between discrete nb tests, and tests are not always named following the same pattern, so the function name alone does not make clear what is being tested.
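To illustrate the naming convention (a rough sketch with hypothetical test names, bodies, and toy data, not part of the actual diff): GaussianNB-specific checks get a test_gnb_ prefix, while checks shared by the discrete estimators get a test_discretenb_ prefix and are parametrized over the classes they apply to.

import numpy as np
import pytest
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Tiny toy dataset, for illustration only.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([0, 0, 1, 1])


def test_gnb_predicts_training_labels():
    # GaussianNB-specific check: prefixed with "test_gnb_".
    assert np.array_equal(GaussianNB().fit(X, y).predict(X), y)


@pytest.mark.parametrize("DiscreteNB", [MultinomialNB, BernoulliNB])
def test_discretenb_proba_rows_sum_to_one(DiscreteNB):
    # Check shared by the discrete estimators: prefixed with "test_discretenb_"
    # and parametrized over the classes it applies to.
    proba = DiscreteNB().fit(X, y).predict_proba(X)
    assert np.allclose(proba.sum(axis=1), 1.0)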

I have commented on many of my changes in a review below.

@timbicker changed the title from [MRG] refactor tests to [MRG] refactor naive Bayes tests on Jun 11, 2019
@rth (Member) left a comment:

Thanks! The diff is currently quite large and hard to review because you both rename test functions and shuffle them somewhat.

For instance, test_discrete_prior was previously at L172, after the test_gnb_pfit_wrong_nb_features test. In this PR it is test_discretenb_prior at L197, after test_gnb_naive_bayes_scale_invariance.

This will also make using git blame more difficult without good reason.

I would suggest you look at the diff on GitHub and try to reduce it somewhat, at least by not moving functions around.

    labels = [GaussianNB().fit(f * X, y).predict(f * X)
              for f in [1E-10, 1, 1E10]]
    assert_array_equal(labels[0], labels[1])
    assert_array_equal(labels[1], labels[2])
Member:

Why move this test around in the file? It makes the diff hard to follow.

@timbicker (Jun 11, 2019):

Thanks for your remark. I didn't expect to get feedback so quickly :) (I have commented on many of my changes in a review.)

The test you mention changed its position because I moved test_gnb_partial_fit and test_gnb_naive_bayes_scale_invariance up to the other gnb tests.

Basically, I moved tests around to order them by the class that is tested. Right now (in master), the order is a bit mixed, which makes it unnecessarily complicated to get an overview of what is tested and what is not. For example, gnb tests appear in between discrete nb tests, and tests are not always named following the same pattern, so the function name alone does not make clear what is being tested.

@timbicker:

> [..]
> This will also make using git blame more difficult without good reason.
>
> I would suggest you look at the diff on GitHub and try to reduce it somewhat, at least by not moving functions around.

Moving the functions around was a central idea of this PR. Do you think it is more important to have a good git blame history or to have the codebase well organized?

Member:

git blame history is more useful for maintenance.

@timbicker (Contributor, Author) left a comment:

I have commented on most of my changes to point out what I did and why.
Basically, tests have been ordered by the class they test and renamed to follow consistent patterns.
First come the gnb (Gaussian NB) tests, then the general / parametrized discrete NB tests, followed by more specific tests for the individual discrete NB classes, and finally the alpha and accuracy tests.
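Roughly, the resulting layout can be sketched as follows; the function names and decorators are taken from the diff excerpts in this PR, while the grouping comments and the skeleton form (bodies omitted) are only illustrative.

import pytest
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# GaussianNB tests first
def test_gnb_pfit_wrong_nb_features(): ...
def test_gnb_partial_fit(): ...
def test_gnb_naive_bayes_scale_invariance(): ...

# general / parametrized discrete NB tests
@pytest.mark.parametrize("cls", [MultinomialNB, BernoulliNB])
def test_discretenb_prior(cls): ...

# tests specific to individual discrete NB classes
@pytest.mark.parametrize('kind', ('dense', 'sparse'))
def test_mnnb(kind): ...
def test_mnb_prior_unobserved_targets(): ...
def test_bnb(): ...
def test_bnb_feature_log_prob(): ...
def test_cnb(): ...

# alpha handling and accuracy checks last
def test_alpha_vector(): ...
def test_check_accuracy_on_digits(): ...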

@@ -169,126 +169,52 @@ def test_gnb_pfit_wrong_nb_features():
    assert_raises(ValueError, clf.partial_fit, np.hstack((X, X)), y)


def test_discrete_prior():
    # Test whether class priors are properly set.
@timbicker:

moved down to other discrete nb tests and renamed it to test_discretenb_prior


@pytest.mark.parametrize('kind', ('dense', 'sparse'))
def test_mnnb(kind):
    # Test Multinomial Naive Bayes classification.
@timbicker:

moved tests down to other mnnb tests

    y_pred_log_proba = clf.predict_log_proba(X)
    assert_array_almost_equal(np.log(y_pred_proba), y_pred_log_proba, 8)


def test_gnb_partial_fit():
    clf = GaussianNB().fit(X, y)
@timbicker:

moved up to here

    assert_array_almost_equal(y_pred_proba2, y_pred_proba)
    assert_array_almost_equal(y_pred_log_proba2, y_pred_log_proba)


def test_gnb_naive_bayes_scale_invariance():
    # Scaling the data should not change the prediction results
@timbicker (Jun 11, 2019):

Originally test_naive_bayes_scale_invariance. Moved up to other gnb tests and renamed

    assert_array_almost_equal(y_pred_log_proba3, y_pred_log_proba)


@pytest.mark.parametrize("cls", [MultinomialNB, BernoulliNB])
def test_discretenb_prior(cls):
    # Test whether class priors are properly set.
@timbicker:

originally test_discrete_prior. Moved and renamed
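For illustration, here is a minimal self-contained sketch of what such a parametrized prior check could look like; the toy data, the _sketch suffix on the name, and the exact assertion are assumptions, not the actual test body.

import numpy as np
import pytest
from numpy.testing import assert_array_almost_equal
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Balanced two-class toy data (hypothetical).
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 1], [0, 1]])
y = np.array([0, 0, 0, 1, 1, 1])


@pytest.mark.parametrize("cls", [MultinomialNB, BernoulliNB])
def test_discretenb_prior_sketch(cls):
    # With balanced classes, the fitted class priors should both be 1/2.
    clf = cls().fit(X, y)
    assert_array_almost_equal(np.exp(clf.class_log_prior_), [0.5, 0.5])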


def test_feature_log_prob_bnb():
    # Test for issue #4268.
@timbicker:

renamed to test_bnb_feature_log_prob and moved to other bnb tests

    # Fit Bernoulli NB w/ alpha = 1.0
    clf = BernoulliNB(alpha=1.0)
    clf.fit(X, Y)


def test_mnb_prior_unobserved_targets():
@timbicker:

moved test down to here.

@@ -581,6 +527,29 @@ def test_bnb():
    assert_array_almost_equal(clf.predict_proba(X_test), predict_proba)


def test_bnb_feature_log_prob():
    # Test for issue #4268.
@timbicker:

originally test_feature_log_prob_bnb. Renamed and moved to here

@@ -647,15 +616,6 @@ def test_cnb():
    assert_array_almost_equal(clf.feature_log_prob_, normed_weights)


def test_naive_bayes_scale_invariance():
    # Scaling the data should not change the prediction results
@timbicker:

Moved up to other gnb tests and renamed to test_gnb_naive_bayes_scale_invariance

@@ -743,3 +703,37 @@ def test_alpha_vector():
    expected_msg = ('alpha should be a scalar or a numpy array '
                    'with shape [n_features]')
    assert_raise_message(ValueError, expected_msg, m_nb.fit, X, y)


def test_check_accuracy_on_digits():
@timbicker:

moved down from above

@jnothman (Member) left a comment:

Thanks for this.

@jnothman (Member) commented Jun 13, 2019 via email

@rth (Member) left a comment:

Fair enough, thanks for the detailed comments!

@rth changed the title from [MRG] refactor naive Bayes tests to MNT refactor naive Bayes tests on Jun 13, 2019
@rth merged commit d91b0f3 into scikit-learn:master on Jun 13, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request on Jul 12, 2019