ENH Adding variable `force_alpha` to classes in naive_bayes.py by Micky774 · Pull Request #22269 · scikit-learn/scikit-learn

Merged
merged 79 commits on Jul 22, 2022

Commits (showing changes from all 79 commits)
282a7dd
Adding variable alphaCorrection to classes in naive_bayes.py.
arka204 Mar 22, 2020
d78e17b
Splitting a few lines of code.
arka204 Mar 22, 2020
3b79637
Merge pull request #1 from scikit-learn/master
arka204 Apr 11, 2020
464dc37
Merge pull request #2 from scikit-learn/master
arka204 May 10, 2020
a4429bf
Fixing problems and adding tests.
arka204 May 21, 2020
cf35eb1
Updating naive_bayes.py.
arka204 May 21, 2020
15a658f
Merge pull request #3 from scikit-learn/master
arka204 May 22, 2020
ec786b3
Merge branch 'alpha-close-or-equal-0-update' into alpha-1
arka204 May 22, 2020
4606d85
Merge pull request #5 from arka204/alpha-1
arka204 May 22, 2020
f0debb6
Merge pull request #6 from arka204/alpha-close-or-equal-0-update
arka204 May 22, 2020
dcce4a8
Checking warnings in tests.
arka204 May 31, 2020
81d5f32
Merge branch 'alpha-close-or-equal-0-update' of https://github.com/ar…
arka204 May 31, 2020
43dfda5
Merge pull request #8 from arka204/alpha-close-or-equal-0-update
arka204 Jun 8, 2020
be0ebfe
Update v0.24.rst
arka204 Jun 8, 2020
2968400
Merge pull request #10 from scikit-learn/master
arka204 Jun 8, 2020
7e1b649
Merge branch 'Proposition-for-BernoulliNB-and-MultinomialNB-when-alph…
arka204 Jun 8, 2020
5782e6f
Merge pull request #11 from arka204/master-copy
arka204 Jun 8, 2020
a7337d2
Merge remote-tracking branch 'upstream/master' into Proposition-for-B…
Nov 10, 2020
2d16091
Fix merge
Nov 10, 2020
728b842
Merge remote-tracking branch 'upstream/master' into 10772-force-alpha
hongshaoyang Dec 20, 2020
a8209e0
Move whatsnew
hongshaoyang Dec 20, 2020
44c50fd
Merge remote-tracking branch 'upstream/master' into 10772-force-alpha
hongshaoyang Dec 21, 2020
ded1f6e
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jan 30, 2021
2a2d8f3
Apply suggestions from code review
hongshaoyang Feb 8, 2021
d8f784e
Remove extra line
hongshaoyang Feb 9, 2021
a3897f7
Flake8
hongshaoyang Feb 9, 2021
23c68dd
Apply suggestions from code review
hongshaoyang Feb 9, 2021
b1151d7
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang May 29, 2021
1d01c6c
Fix merge
hongshaoyang May 29, 2021
aa1d8de
use assert_warns_message
hongshaoyang May 29, 2021
203af9e
Apply suggestions from code review
hongshaoyang Jun 9, 2021
cc4fda7
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jun 9, 2021
91127bc
Fix wrong variable name
hongshaoyang Jun 9, 2021
c4d0736
Fix test to use "with pytest.warns" instead of assert_warns_message
hongshaoyang Jun 9, 2021
8964a16
Merge commit '0e7761cdc4f244adb4803f1a97f0a9fe4b365a99' into 10772-fo…
hongshaoyang Jun 23, 2021
e7a5f37
MAINT Adds target_version to black config (#20293)
thomasjpfan Jun 17, 2021
98c0c12
Black formatting
hongshaoyang Jun 23, 2021
2d9ab41
Merge remote-tracking branch 'upstream/main' into 10772-force-alpha
hongshaoyang Jun 23, 2021
16af708
Apply suggestions from code review
hongshaoyang Jun 23, 2021
180721f
Merge branch 'main' into 10772-force-alpha
Micky774 Jan 22, 2022
62d9f5b
Updated versioning and improved warning message
Micky774 Jan 22, 2022
9bc6f74
Updated docs to include literal value of `_ALPHA_MIN`
Micky774 Jan 22, 2022
0d6f2ce
Updated tests and improved `force_alpha` documentation
Micky774 Jan 22, 2022
28e34d9
Merge branch 'main' into 10772-force-alpha
Micky774 Feb 13, 2022
858c279
Merge branch 'main' into 10772-force-alpha
Micky774 Feb 17, 2022
39769e3
Merge branch 'main' into 10772-force-alpha
Micky774 Feb 25, 2022
61e5a28
Merge branch 'main' into 10772-force-alpha
Micky774 Feb 27, 2022
036fcac
Fixed changelog formatting
Micky774 Mar 6, 2022
635d403
Merge branch 'main' into 10772-force-alpha
Micky774 Mar 6, 2022
f2038d9
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 2, 2022
be9fb63
Apply suggestions from code review
Micky774 Jun 2, 2022
5c3c7c9
Fixed tests to avoid FutureWarning
Micky774 Jun 2, 2022
fdef471
Merge branch '10772-force-alpha' of https://github.com/Micky774/sciki…
Micky774 Jun 2, 2022
87a459a
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 6, 2022
77e0dbe
Update sklearn/tests/test_naive_bayes.py
Micky774 Jun 6, 2022
641036b
Merge branch '10772-force-alpha' of https://github.com/Micky774/sciki…
Micky774 Jun 6, 2022
0158331
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 7, 2022
5ddd625
Updated examples with `force_alpha=True`
Micky774 Jun 7, 2022
699ab83
Updated tests and code for double-deprecation cycle
Micky774 Jun 9, 2022
91b6f3c
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 9, 2022
41be745
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 15, 2022
89ddbf1
Simplification of validation
Micky774 Jun 15, 2022
63f9fd8
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 28, 2022
14a360f
Added private `_alpha` attribute for testing ease
Micky774 Jun 28, 2022
84743b6
Revert "Added private `_alpha` attribute for testing ease"
Micky774 Jun 30, 2022
c855059
Update sklearn/naive_bayes.py
Micky774 Jun 30, 2022
579f696
Merge branch 'main' into 10772-force-alpha
Micky774 Jun 30, 2022
782f10a
Filtered warnings in tests, fixed warning message
Micky774 Jun 30, 2022
30cf579
Merge branch 'main' into 10772-force-alpha
thomasjpfan Jul 4, 2022
97fa75a
Merge branch 'main' into 10772-force-alpha
Micky774 Jul 6, 2022
f6376ff
Made quieter warning and expanded test
Micky774 Jul 6, 2022
764fb58
Streamlined test
Micky774 Jul 6, 2022
615ebdb
Updated test
Micky774 Jul 6, 2022
f3462d2
Apply suggestions from code review
Micky774 Jul 6, 2022
5828e0e
Apply suggestions from code review
Micky774 Jul 20, 2022
02cb3db
Merge branch 'main' into 10772-force-alpha
Micky774 Jul 20, 2022
15b964b
Merge branch 'main' into 10772-force-alpha
Micky774 Jul 21, 2022
cb2f886
Addressed review feedback
Micky774 Jul 21, 2022
5715086
Apply suggestions from code review
Micky774 Jul 22, 2022
12 changes: 12 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -282,6 +282,18 @@ Changelog
:pr:`10805` by :user:`Mathias Andersen <MrMathias>` and
:pr:`23471` by :user:`Meekail Zain <micky774>`

:mod:`sklearn.naive_bayes`
..........................

- |Enhancement| A new parameter `force_alpha` was added to
:class:`naive_bayes.BernoulliNB`, :class:`naive_bayes.ComplementNB`,
:class:`naive_bayes.CategoricalNB`, and :class:`naive_bayes.MultinomialNB`,
allowing users to set the smoothing parameter `alpha` to any value greater
than or equal to 0, including values very close to 0, which were previously
clipped to `1e-10` automatically.
:pr:`16747` by :user:`arka204`,
:pr:`18805` by :user:`hongshaoyang`,
:pr:`22269` by :user:`Meekail Zain <micky774>`.
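
As a minimal sketch of the change (the tiny `alpha` here is arbitrary, and `force_alpha` is assumed available, i.e. a build containing this PR):

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    rng = np.random.RandomState(0)
    X = rng.randint(5, size=(6, 100))
    y = np.array([1, 2, 3, 4, 5, 6])

    # Default behavior: alpha below 1e-10 is clipped to 1e-10,
    # with a warning (plus the transitional FutureWarning).
    MultinomialNB(alpha=1e-12).fit(X, y)

    # New opt-in: alpha is used exactly as given, at the risk of
    # numerical errors for values extremely close to 0.
    MultinomialNB(alpha=1e-12, force_alpha=True).fit(X, y)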

Code and Documentation Contributors
-----------------------------------

149 changes: 122 additions & 27 deletions sklearn/naive_bayes.py
@@ -30,7 +30,7 @@
from .utils.multiclass import _check_partial_fit_first_call
from .utils.validation import check_is_fitted, check_non_negative
from .utils.validation import _check_sample_weight
from .utils._param_validation import Interval
from .utils._param_validation import Interval, Hidden, StrOptions

__all__ = [
"BernoulliNB",
@@ -549,12 +549,14 @@ class _BaseDiscreteNB(_BaseNB):
"alpha": [Interval(Real, 0, None, closed="left"), "array-like"],
"fit_prior": ["boolean"],
"class_prior": ["array-like", None],
"force_alpha": ["boolean", Hidden(StrOptions({"warn"}))],
}

def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
def __init__(self, alpha=1.0, fit_prior=True, class_prior=None, force_alpha="warn"):
self.alpha = alpha
self.fit_prior = fit_prior
self.class_prior = class_prior
self.force_alpha = force_alpha

@abstractmethod
def _count(self, X, Y):
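
Note: `Hidden(StrOptions({"warn"}))` registers the string "warn" as a valid but undocumented sentinel, so the transitional default stays out of the public docs and error messages. Conceptually it behaves like this hand-rolled sketch (`validate_force_alpha` is illustrative, not the real `_param_validation` machinery):

    def validate_force_alpha(value):
        # Bools are the public options; "warn" is the hidden sentinel
        # used as the default until the 1.4 switch to True.
        if isinstance(value, bool) or value == "warn":
            return value
        raise ValueError("force_alpha must be a bool.")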
@@ -622,22 +624,34 @@ def _check_alpha(self):
alpha = (
np.asarray(self.alpha) if not isinstance(self.alpha, Real) else self.alpha
)
alpha_min = np.min(alpha)
if isinstance(alpha, np.ndarray):
if not alpha.shape[0] == self.n_features_in_:
raise ValueError(
"When alpha is an array, it should contains `n_features`. "
f"Got {alpha.shape[0]} elements instead of {self.n_features_in_}."
)
# check that all alpha are positive
if np.min(alpha) < 0:
if alpha_min < 0:
raise ValueError("All values in alpha must be greater than 0.")
alpha_min = 1e-10
if np.min(alpha) < alpha_min:
alpha_lower_bound = 1e-10
# TODO(1.4): Replace w/ deprecation of self.force_alpha
# See gh #22269
_force_alpha = self.force_alpha
if _force_alpha == "warn" and alpha_min < alpha_lower_bound:
_force_alpha = False
warnings.warn(
"The default value for `force_alpha` will change to `True` in 1.4. To"
" suppress this warning, manually set the value of `force_alpha`.",
FutureWarning,
)
if alpha_min < alpha_lower_bound and not _force_alpha:
warnings.warn(
"alpha too small will result in numeric errors, setting alpha ="
f" {alpha_min:.1e}"
f" {alpha_lower_bound:.1e}. Use `force_alpha=True` to keep alpha"
" unchanged."
)
return np.maximum(alpha, alpha_min)
return np.maximum(alpha, alpha_lower_bound)
return alpha

def partial_fit(self, X, y, classes=None, sample_weight=None):
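
Stripped of the array handling, the updated `_check_alpha` reduces to the following standalone sketch (scalar `alpha` only; simplified from the diff above):

    import warnings

    _ALPHA_LOWER_BOUND = 1e-10

    def check_alpha(alpha, force_alpha="warn"):
        if alpha < 0:
            raise ValueError("All values in alpha must be greater than 0.")
        if force_alpha == "warn" and alpha < _ALPHA_LOWER_BOUND:
            # Transitional default: behave like False, but announce the
            # upcoming change of default to True in 1.4.
            force_alpha = False
            warnings.warn(
                "The default value for `force_alpha` will change to `True`"
                " in 1.4. To suppress this warning, manually set the value"
                " of `force_alpha`.",
                FutureWarning,
            )
        if alpha < _ALPHA_LOWER_BOUND and not force_alpha:
            # Old behavior: clip to the lower bound and warn.
            warnings.warn(
                "alpha too small will result in numeric errors, setting"
                " alpha = 1.0e-10. Use `force_alpha=True` to keep alpha"
                " unchanged."
            )
            return _ALPHA_LOWER_BOUND
        return alpha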
@@ -812,7 +826,16 @@ class MultinomialNB(_BaseDiscreteNB):
----------
alpha : float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True for no smoothing).

force_alpha : bool, default=False
If False and alpha is less than 1e-10, alpha is set to 1e-10.
If True, alpha is used unchanged, which may cause numerical errors
if alpha is too close to 0.

.. versionadded:: 1.2
.. deprecated:: 1.2
The default value of `force_alpha` will change to `True` in v1.4.

fit_prior : bool, default=True
Whether to learn class prior probabilities or not.
@@ -881,15 +904,22 @@ class MultinomialNB(_BaseDiscreteNB):
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf = MultinomialNB(force_alpha=True)
>>> clf.fit(X, y)
MultinomialNB()
MultinomialNB(force_alpha=True)
>>> print(clf.predict(X[2:3]))
[3]
"""

def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None):
super().__init__(alpha=alpha, fit_prior=fit_prior, class_prior=class_prior)
def __init__(
self, *, alpha=1.0, force_alpha="warn", fit_prior=True, class_prior=None
):
super().__init__(
alpha=alpha,
fit_prior=fit_prior,
class_prior=class_prior,
force_alpha=force_alpha,
)

def _more_tags(self):
return {"requires_positive_X": True}
@@ -928,7 +958,17 @@ class ComplementNB(_BaseDiscreteNB):
Parameters
----------
alpha : float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
Additive (Laplace/Lidstone) smoothing parameter
(set alpha=0 and force_alpha=True for no smoothing).

force_alpha : bool, default=False
If False and alpha is less than 1e-10, alpha is set to 1e-10.
If True, alpha is used unchanged, which may cause numerical errors
if alpha is too close to 0.

.. versionadded:: 1.2
.. deprecated:: 1.2
The default value of `force_alpha` will change to `True` in v1.4.

fit_prior : bool, default=True
Only used in edge case with a single class in the training set.
@@ -1005,9 +1045,9 @@ class ComplementNB(_BaseDiscreteNB):
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import ComplementNB
>>> clf = ComplementNB()
>>> clf = ComplementNB(force_alpha=True)
>>> clf.fit(X, y)
ComplementNB()
ComplementNB(force_alpha=True)
>>> print(clf.predict(X[2:3]))
[3]
"""
@@ -1017,8 +1057,21 @@
"norm": ["boolean"],
}

def __init__(self, *, alpha=1.0, fit_prior=True, class_prior=None, norm=False):
super().__init__(alpha=alpha, fit_prior=fit_prior, class_prior=class_prior)
def __init__(
self,
*,
alpha=1.0,
force_alpha="warn",
fit_prior=True,
class_prior=None,
norm=False,
):
super().__init__(
alpha=alpha,
force_alpha=force_alpha,
fit_prior=fit_prior,
class_prior=class_prior,
)
self.norm = norm

def _more_tags(self):
@@ -1064,7 +1117,16 @@ class BernoulliNB(_BaseDiscreteNB):
----------
alpha : float or array-like of shape (n_features,), default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True for no smoothing).

force_alpha : bool, default=False
If False and alpha is less than 1e-10, alpha is set to 1e-10.
If True, alpha is used unchanged, which may cause numerical errors
if alpha is too close to 0.

.. versionadded:: 1.2
.. deprecated:: 1.2
The default value of `force_alpha` will change to `True` in v1.4.

binarize : float or None, default=0.0
Threshold for binarizing (mapping to booleans) of sample features.
@@ -1144,9 +1206,9 @@ class BernoulliNB(_BaseDiscreteNB):
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf = BernoulliNB(force_alpha=True)
>>> clf.fit(X, Y)
BernoulliNB()
BernoulliNB(force_alpha=True)
>>> print(clf.predict(X[2:3]))
[3]
"""
@@ -1156,8 +1218,21 @@
"binarize": [None, Interval(Real, 0, None, closed="left")],
}

def __init__(self, *, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None):
super().__init__(alpha=alpha, fit_prior=fit_prior, class_prior=class_prior)
def __init__(
self,
*,
alpha=1.0,
force_alpha="warn",
binarize=0.0,
fit_prior=True,
class_prior=None,
):
super().__init__(
alpha=alpha,
fit_prior=fit_prior,
class_prior=class_prior,
force_alpha=force_alpha,
)
self.binarize = binarize

def _check_X(self, X):
@@ -1219,7 +1294,16 @@ class CategoricalNB(_BaseDiscreteNB):
----------
alpha : float, default=1.0
Additive (Laplace/Lidstone) smoothing parameter
(0 for no smoothing).
(set alpha=0 and force_alpha=True for no smoothing).

force_alpha : bool, default=False
If False and alpha is less than 1e-10, alpha is set to 1e-10.
If True, alpha is used unchanged, which may cause numerical errors
if alpha is too close to 0.

.. versionadded:: 1.2
.. deprecated:: 1.2
The default value of `force_alpha` will change to `True` in v1.4.

fit_prior : bool, default=True
Whether to learn class prior probabilities or not.
@@ -1301,9 +1385,9 @@ class CategoricalNB(_BaseDiscreteNB):
>>> X = rng.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import CategoricalNB
>>> clf = CategoricalNB()
>>> clf = CategoricalNB(force_alpha=True)
>>> clf.fit(X, y)
CategoricalNB()
CategoricalNB(force_alpha=True)
>>> print(clf.predict(X[2:3]))
[3]
"""
@@ -1319,9 +1403,20 @@
}

def __init__(
self, *, alpha=1.0, fit_prior=True, class_prior=None, min_categories=None
self,
*,
alpha=1.0,
force_alpha="warn",
fit_prior=True,
class_prior=None,
min_categories=None,
):
super().__init__(alpha=alpha, fit_prior=fit_prior, class_prior=class_prior)
super().__init__(
alpha=alpha,
force_alpha=force_alpha,
fit_prior=fit_prior,
class_prior=class_prior,
)
self.min_categories = min_categories

def fit(self, X, y, sample_weight=None):
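
In practice, downstream code silences the transitional FutureWarning by passing `force_alpha` explicitly; either value opts out of the "warn" default (a sketch, values arbitrary):

    from sklearn.naive_bayes import BernoulliNB, CategoricalNB

    # True: alpha is always used exactly as given.
    clf = BernoulliNB(alpha=1e-12, force_alpha=True)

    # False: keeps the old clipping behavior, without the FutureWarning
    # (the "alpha too small" warning can still fire at fit time).
    clf = CategoricalNB(alpha=1e-12, force_alpha=False)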
4 changes: 2 additions & 2 deletions sklearn/tests/test_calibration.py
@@ -71,7 +71,7 @@ def test_calibration(data, method, ensemble):
X_test, y_test = X[n_samples:], y[n_samples:]

# Naive-Bayes
clf = MultinomialNB().fit(X_train, y_train, sample_weight=sw_train)
clf = MultinomialNB(force_alpha=True).fit(X_train, y_train, sample_weight=sw_train)
prob_pos_clf = clf.predict_proba(X_test)[:, 1]

cal_clf = CalibratedClassifierCV(clf, cv=y.size + 1, ensemble=ensemble)
@@ -322,7 +322,7 @@ def test_calibration_prefit():
X_test, y_test = X[2 * n_samples :], y[2 * n_samples :]

# Naive-Bayes
clf = MultinomialNB()
clf = MultinomialNB(force_alpha=True)
# Check error if clf not prefit
unfit_clf = CalibratedClassifierCV(clf, cv="prefit")
with pytest.raises(NotFittedError):
8 changes: 8 additions & 0 deletions sklearn/tests/test_docstring_parameters.py
@@ -268,6 +268,14 @@ def test_fit_docstring_attributes(name, Estimator):
est.set_params(n_init="auto")

# TODO(1.4): TO BE REMOVED for 1.4 (avoid FutureWarning)
if Estimator.__name__ in (
"MultinomialNB",
"ComplementNB",
"BernoulliNB",
"CategoricalNB",
):
est.set_params(force_alpha=True)

if Estimator.__name__ == "QuantileRegressor":
solver = "highs" if sp_version >= parse_version("1.6.0") else "interior-point"
est.set_params(solver=solver)
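
The same opt-in works on an already-constructed estimator via `set_params`, which is what the hunk above relies on (a sketch):

    from sklearn.naive_bayes import ComplementNB

    est = ComplementNB()
    est.set_params(force_alpha=True)  # no FutureWarning at fit time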
3 changes: 3 additions & 0 deletions sklearn/tests/test_multiclass.py
@@ -41,6 +41,9 @@
from sklearn import datasets
from sklearn.datasets import load_breast_cancer

msg = "The default value for `force_alpha` will change"
pytestmark = pytest.mark.filterwarnings(f"ignore:{msg}:FutureWarning")

iris = datasets.load_iris()
rng = np.random.RandomState(0)
perm = rng.permutation(iris.target.size)
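
Rather than ignoring the warning module-wide as above, an individual test can also assert it explicitly (a sketch assuming pytest; the test name is illustrative, and the warning only fires when alpha is below 1e-10):

    import numpy as np
    import pytest
    from sklearn.naive_bayes import MultinomialNB

    def test_force_alpha_future_warning():
        X = np.random.RandomState(0).randint(5, size=(6, 100))
        y = np.arange(6)
        with pytest.warns(FutureWarning, match="force_alpha"):
            MultinomialNB(alpha=1e-12).fit(X, y)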