ENH Support sample weights in PartialDependenceDisplay.from_estimator by vitaliset · Pull Request #26644 · scikit-learn/scikit-learn · GitHub
ENH Support sample weights in PartialDependenceDisplay.from_estimator #26644

Merged: 5 commits merged on Jun 22, 2023.

9 changes: 5 additions & 4 deletions doc/whats_new/v1.3.rst
@@ -421,10 +421,11 @@ Changelog
 .........................
 
 - |Enhancement| Added support for `sample_weight` in
-:func:`inspection.partial_dependence`. This allows for weighted averaging when
-aggregating for each value of the grid we are making the inspection on. The
-option is only available when `method` is set to `brute`. :pr:`25209`
-by :user:`Carlo Lemos <vitaliset>`.
+:func:`inspection.partial_dependence` and
+:meth:`inspection.PartialDependenceDisplay.from_estimator`. This allows for
+weighted averaging when aggregating for each value of the grid we are making the
+inspection on. The option is only available when `method` is set to `brute`.
+:pr:`25209` and :pr:`26644` by :user:`Carlo Lemos <vitaliset>`.
 
 - |API| :func:`inspection.partial_dependence` returns a :class:`utils.Bunch` with
 new key: `grid_values`. The `values` key is deprecated in favor of `grid_values`
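The changelog entry describes weighted averaging over each grid value. Below is a minimal NumPy-only sketch of that idea for the `brute` method. It is not scikit-learn's internal code: the helper `weighted_partial_dependence`, the toy model, and the data are hypothetical.

For each grid value, the sketch overwrites the inspected column, predicts every row, and averages the predictions with `np.average`, which computes `sum(w * pred) / sum(w)` when weights are given.

```python
import numpy as np


def weighted_partial_dependence(predict, X, feature, grid, sample_weight=None):
    """Hypothetical brute-force partial dependence of `predict` on one column.

    For each grid value, overwrite column `feature` in a copy of X, predict
    every row, then average the predictions; when `sample_weight` is given the
    average is weighted, otherwise it is the plain mean.
    """
    X_eval = np.array(X, dtype=float)  # np.array copies by default
    averaged = []
    for value in grid:
        X_eval[:, feature] = value
        preds = predict(X_eval)
        averaged.append(np.average(preds, weights=sample_weight))
    return np.asarray(averaged)


# Toy model f(x) = x0 * x1: its PD on feature 0 at v is v * (weighted mean of x1)
def predict(X):
    return X[:, 0] * X[:, 1]


X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 6.0]])
grid = np.array([0.0, 1.0, 2.0])
w = np.array([1.0, 1.0, 2.0])

pd_unweighted = weighted_partial_dependence(predict, X, 0, grid)  # [0, 3, 6]
pd_weighted = weighted_partial_dependence(predict, X, 0, grid, sample_weight=w)  # [0, 3.75, 7.5]
```

Any constant weight vector reduces to the unweighted mean. That is the property the added test below checks against the real display.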
25 changes: 19 additions & 6 deletions sklearn/inspection/_plot/partial_dependence.py
@@ -86,8 +86,9 @@ class PartialDependenceDisplay:
 
 .. note::
 The fast ``method='recursion'`` option is only available for
-``kind='average'``. Plotting individual dependencies requires using
-the slower ``method='brute'`` option.
+`kind='average'` and `sample_weights=None`. Computing individual
+dependencies and doing weighted averages requires using the slower
+`method='brute'`.
 
 .. versionadded:: 0.24
 Add `kind` parameter with `'average'`, `'individual'`, and `'both'`
@@ -247,6 +248,7 @@ def from_estimator(
 X,
 features,
 *,
+sample_weight=None,
 categorical_features=None,
 feature_names=None,
 target=None,
@@ -337,6 +339,14 @@ def from_estimator(
 with `kind='average'`). Each tuple must be of size 2.
 If any entry is a string, then it must be in ``feature_names``.
 
+sample_weight : array-like of shape (n_samples,), default=None
+Sample weights are used to calculate weighted means when averaging the
+model output. If `None`, then samples are equally weighted. If
+`sample_weight` is not `None`, then `method` will be set to `'brute'`.
+Note that `sample_weight` is ignored for `kind='individual'`.
+
+.. versionadded:: 1.3
+
 categorical_features : array-like of shape (n_features,) or shape \
 (n_categorical_features,), dtype={bool, int, str}, default=None
 Indicates the categorical features.
@@ -409,7 +419,8 @@ def from_estimator(
 computationally intensive.
 
 - `'auto'`: the `'recursion'` is used for estimators that support it,
-and `'brute'` is used otherwise.
+and `'brute'` is used otherwise. If `sample_weight` is not `None`,
+then `'brute'` is used regardless of the estimator.
 
 Please see :ref:`this note <pdp_method_differences>` for
 differences between the `'brute'` and `'recursion'` method.
@@ -464,9 +475,10 @@ def from_estimator(
 - ``kind='average'`` results in the traditional PD plot;
 - ``kind='individual'`` results in the ICE plot.
 
-Note that the fast ``method='recursion'`` option is only available for
-``kind='average'``. Plotting individual dependencies requires using the
-slower ``method='brute'`` option.
+Note that the fast `method='recursion'` option is only available for
+`kind='average'` and `sample_weights=None`. Computing individual
+dependencies and doing weighted averages requires using the slower
+`method='brute'`.
 
 centered : bool, default=False
 If `True`, the ICE and PD lines will start at the origin of the
@@ -693,6 +705,7 @@ def from_estimator(
 estimator,
 X,
 fxs,
+sample_weight=sample_weight,
 feature_names=feature_names,
 categorical_features=categorical_features,
 response_method=response_method,
31 changes: 31 additions & 0 deletions sklearn/inspection/_plot/tests/test_plot_partial_dependence.py
@@ -1086,3 +1086,34 @@ def test_partial_dependence_display_kind_centered_interaction(
     )
 
     assert all([ln._y[0] == 0.0 for ln in disp.lines_.ravel() if ln is not None])
+
+
+def test_partial_dependence_display_with_constant_sample_weight(
+    pyplot,
+    clf_diabetes,
+    diabetes,
+):
+    """Check that the utilization of a constant sample weight maintains the
+    standard behavior.
+    """
+    disp = PartialDependenceDisplay.from_estimator(
+        clf_diabetes,
+        diabetes.data,
+        [0, 1],
+        kind="average",
+        method="brute",
+    )
+
+    sample_weight = np.ones_like(diabetes.target)
+    disp_sw = PartialDependenceDisplay.from_estimator(
+        clf_diabetes,
+        diabetes.data,
+        [0, 1],
+        sample_weight=sample_weight,
+        kind="average",
+        method="brute",
+    )
+
+    assert np.array_equal(
+        disp.pd_results[0]["average"], disp_sw.pd_results[0]["average"]
+    )