ENH add `only_non_negative` parameter to `_check_sample_weight` by simonandras · Pull Request #20880 · scikit-learn/scikit-learn · GitHub



Merged: 25 commits, merged on Sep 8, 2021
25 commits
d05db36
Adding informative error message for the negative weight case (linear…
simonandras Aug 28, 2021
7370c51
The negativity check is now in _check_sample_weight which can be turn…
simonandras Sep 1, 2021
8c07156
Adding parameter documentation to the changed _check_sample_weight an…
simonandras Sep 1, 2021
f0583d9
Making some lines shorter than 88 char to pass tests
simonandras Sep 2, 2021
b3216f8
Make one line documentation in two lines to pass test (it was too long)
simonandras Sep 2, 2021
bc914dc
Delete whitespace in blank line
simonandras Sep 2, 2021
1cbdcb4
Add a dot to end of sentence (in modified function documentation)
simonandras Sep 2, 2021
a0d6a70
formatting with black .
simonandras Sep 2, 2021
622a1c7
Update sklearn/utils/validation.py
simonandras Sep 2, 2021
72c05f9
Update sklearn/linear_model/_base.py
simonandras Sep 2, 2021
fb01e06
Update sklearn/utils/validation.py
simonandras Sep 2, 2021
4797539
Merge branch 'scikit-learn:main' into my-new-branch
simonandras Sep 2, 2021
a8c8dd7
Merge branch 'main' of https://github.com/simonandras/scikit-learn in…
simonandras Sep 2, 2021
5658d22
Merge branch 'my-new-branch' of https://github.com/simonandras/scikit…
simonandras Sep 2, 2021
da6c6fd
Solve the previous problems
simonandras Sep 2, 2021
a7d5f2d
reformatting with black .
simonandras Sep 2, 2021
b8741db
In the documentation of _check_sample_weight changing the 3 space ind…
simonandras Sep 2, 2021
ed20f5a
Adding test to test_validation.py to test the negative weight error i…
simonandras Sep 2, 2021
9c24e51
black .
simonandras Sep 2, 2021
3acb9b3
In LinearRegression in fit method in _check_sample_weight function ca…
simonandras Sep 2, 2021
95ac8bf
Added entry to the changelog (sklearn.utils). Also corrected 2 typo i…
simonandras Sep 3, 2021
acece97
Merge branch 'main' into my-new-branch
simonandras Sep 6, 2021
97cae25
Merge branch 'main' into my-new-branch
ogrisel Sep 7, 2021
12acd84
Merge branch 'scikit-learn:main' into my-new-branch
simonandras Sep 7, 2021
6864e9b
move whats new version
glemaitre Sep 8, 2021
17 changes: 14 additions & 3 deletions doc/whats_new/v1.1.rst
@@ -38,9 +38,20 @@ Changelog
:pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
where 123456 is the *pull request* number, not the issue number.


:mod:`sklearn.decomposition`
............................
:mod:`sklearn.utils`
....................

- |Enhancement| :func:`utils.validation._check_sample_weight` can perform a
non-negativity check on the sample weights. It can be turned on
using the only_non_negative bool parameter.
Estimators that check for non-negative weights are updated:
:func:`linear_model.LinearRegression` (here the previous
error message was misleading),
:func:`ensemble.AdaBoostClassifier`,
:func:`ensemble.AdaBoostRegressor`,
:func:`neighbors.KernelDensity`.
:pr:`20880` by :user:`Guillaume Lemaitre <glemaitre>`
and :user:`András Simon <simonandras>`.


Code and Documentation Contributors
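To make the changelog entry above concrete, here is a minimal usage sketch of the new keyword. It assumes a scikit-learn build carrying this PR; `_check_sample_weight` is a private helper, so the import path and signature are taken from the diff in this PR rather than from public API guarantees.

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.ones((4, 2))
sw = np.array([1.0, 2.0, -3.0, 4.0])

# Default behaviour is unchanged: only shape/dtype validation is performed.
validated = _check_sample_weight(sw, X)

# With only_non_negative=True, a negative entry now raises a ValueError.
try:
    _check_sample_weight(sw, X, only_non_negative=True)
except ValueError as exc:
    print(exc)  # Negative values in data passed to `sample_weight`
```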
8 changes: 4 additions & 4 deletions sklearn/ensemble/_weight_boosting.py
@@ -123,10 +123,10 @@ def fit(self, X, y, sample_weight=None):
y_numeric=is_regressor(self),
)

sample_weight = _check_sample_weight(sample_weight, X, np.float64, copy=True)
sample_weight = _check_sample_weight(
sample_weight, X, np.float64, copy=True, only_non_negative=True
)
sample_weight /= sample_weight.sum()
if np.any(sample_weight < 0):
raise ValueError("sample_weight cannot contain negative weights")

# Check parameters
self._validate_estimator()
@@ -136,7 +136,7 @@ def fit(self, X, y, sample_weight=None):
self.estimator_weights_ = np.zeros(self.n_estimators, dtype=np.float64)
self.estimator_errors_ = np.ones(self.n_estimators, dtype=np.float64)

# Initializion of the random number instance that will be used to
# Initialization of the random number instance that will be used to
# generate a seed at each iteration
random_state = check_random_state(self.random_state)

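A hedged usage sketch of the AdaBoost change above: the explicit `np.any(sample_weight < 0)` check is gone, and the rejection now happens inside `_check_sample_weight`. The toy data below is made up purely for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
sample_weight = np.array([1.0, 1.0, 1.0, -10.0])  # one negative weight

clf = AdaBoostClassifier(n_estimators=5)
try:
    clf.fit(X, y, sample_weight=sample_weight)
except ValueError as exc:
    # The message now comes from the shared non-negativity check:
    # "Negative values in data passed to `sample_weight`"
    print(exc)
```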
2 changes: 1 addition & 1 deletion sklearn/ensemble/tests/test_weight_boosting.py
@@ -576,6 +576,6 @@ def test_adaboost_negative_weight_error(model, X, y):
sample_weight = np.ones_like(y)
sample_weight[-1] = -10

err_msg = "sample_weight cannot contain negative weight"
err_msg = "Negative values in data passed to `sample_weight`"
with pytest.raises(ValueError, match=err_msg):
model.fit(X, y, sample_weight=sample_weight)
4 changes: 3 additions & 1 deletion sklearn/linear_model/_base.py
@@ -663,7 +663,9 @@ def fit(self, X, y, sample_weight=None):
)

if sample_weight is not None:
sample_weight = _check_sample_weight(sample_weight, X, dtype=X.dtype)
sample_weight = _check_sample_weight(
sample_weight, X, dtype=X.dtype, only_non_negative=True
)

X, y, X_offset, y_offset, X_scale = self._preprocess_data(
X,
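The same pattern for `LinearRegression`, where the changelog above notes that the previous error message was misleading. A sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])

try:
    LinearRegression().fit(X, y, sample_weight=np.array([1.0, 1.0, -1.0]))
except ValueError as exc:
    print(exc)  # Negative values in data passed to `sample_weight`
```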
6 changes: 3 additions & 3 deletions sklearn/neighbors/_kde.py
@@ -191,9 +191,9 @@ def fit(self, X, y=None, sample_weight=None):
X = self._validate_data(X, order="C", dtype=DTYPE)

if sample_weight is not None:
sample_weight = _check_sample_weight(sample_weight, X, DTYPE)
if sample_weight.min() <= 0:
raise ValueError("sample_weight must have positive values")
sample_weight = _check_sample_weight(
sample_weight, X, DTYPE, only_non_negative=True
)

kwargs = self.metric_params
if kwargs is None:
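And for `KernelDensity`: the old inline `sample_weight.min() <= 0` check (which also rejected zero weights) is replaced by the shared non-negativity check, so only strictly negative weights appear to be rejected here now. A minimal sketch mirroring the data in the existing test:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

data = np.reshape([1.0, 2.0, 3.0], (-1, 1))

kde = KernelDensity()
try:
    kde.fit(data, sample_weight=[0.1, -0.2, 0.3])
except ValueError as exc:
    print(exc)  # Negative values in data passed to `sample_weight`
```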
2 changes: 1 addition & 1 deletion sklearn/neighbors/tests/test_kde.py
@@ -209,7 +209,7 @@ def test_sample_weight_invalid():
data = np.reshape([1.0, 2.0, 3.0], (-1, 1))

sample_weight = [0.1, -0.2, 0.3]
expected_err = "sample_weight must have positive values"
expected_err = "Negative values in data passed to `sample_weight`"
with pytest.raises(ValueError, match=expected_err):
kde.fit(data, sample_weight=sample_weight)

10 changes: 9 additions & 1 deletion sklearn/utils/tests/test_validation.py
@@ -52,8 +52,8 @@
FLOAT_DTYPES,
_get_feature_names,
_check_feature_names_in,
_check_fit_params,
)
from sklearn.utils.validation import _check_fit_params
from sklearn.base import BaseEstimator
import sklearn

@@ -1253,6 +1253,14 @@ def test_check_sample_weight():
sample_weight = _check_sample_weight(None, X, dtype=X.dtype)
assert sample_weight.dtype == np.float64

# check negative weight when only_non_negative=True
X = np.ones((5, 2))
sample_weight = np.ones(_num_samples(X))
sample_weight[-1] = -10
err_msg = "Negative values in data passed to `sample_weight`"
with pytest.raises(ValueError, match=err_msg):
_check_sample_weight(sample_weight, X, only_non_negative=True)


@pytest.mark.parametrize("toarray", [np.array, sp.csr_matrix, sp.csc_matrix])
def test_allclose_dense_sparse_equals(toarray):
26 changes: 18 additions & 8 deletions sklearn/utils/validation.py
@@ -1492,7 +1492,9 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False):
return lambdas


def _check_sample_weight(sample_weight, X, dtype=None, copy=False):
def _check_sample_weight(
sample_weight, X, dtype=None, copy=False, only_non_negative=False
):
"""Validate sample weights.

Note that passing sample_weight=None will output an array of ones.
@@ -1503,25 +1505,30 @@ def _check_sample_weight(sample_weight, X, dtype=None, copy=False):
Parameters
----------
sample_weight : {ndarray, Number or None}, shape (n_samples,)
Input sample weights.
Input sample weights.

X : {ndarray, list, sparse matrix}
Input data.

only_non_negative : bool, default=False,
Whether or not the weights are expected to be non-negative.

.. versionadded:: 1.0

dtype : dtype, default=None
dtype of the validated `sample_weight`.
If None, and the input `sample_weight` is an array, the dtype of the
input is preserved; otherwise an array with the default numpy dtype
is be allocated. If `dtype` is not one of `float32`, `float64`,
`None`, the output will be of dtype `float64`.
dtype of the validated `sample_weight`.
If None, and the input `sample_weight` is an array, the dtype of the
input is preserved; otherwise an array with the default numpy dtype
is be allocated. If `dtype` is not one of `float32`, `float64`,
`None`, the output will be of dtype `float64`.

copy : bool, default=False
If True, a copy of sample_weight will be created.

Returns
-------
sample_weight : ndarray of shape (n_samples,)
Validated sample weight. It is guaranteed to be "C" contiguous.
Validated sample weight. It is guaranteed to be "C" contiguous.
"""
n_samples = _num_samples(X)

@@ -1553,6 +1560,9 @@ def _check_sample_weight(sample_weight, X, dtype=None, copy=False):
)
)

if only_non_negative:
check_non_negative(sample_weight, "`sample_weight`")

return sample_weight


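For context, the new branch in `_check_sample_weight` delegates to the existing helper `check_non_negative(X, whom)` from `sklearn.utils.validation`, which embeds `whom` in its error message; that is why the error-message assertions in this PR all match "Negative values in data passed to `sample_weight`". A small standalone sketch:

```python
import numpy as np
from sklearn.utils.validation import check_non_negative

check_non_negative(np.array([0.0, 1.0, 2.0]), "`sample_weight`")  # passes silently

try:
    check_non_negative(np.array([1.0, -1.0]), "`sample_weight`")
except ValueError as exc:
    print(exc)  # Negative values in data passed to `sample_weight`
```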