FEA Partial dependence plots (#12599) · jeremiedbb/scikit-learn@5135252 · GitHub

Commit 5135252

NicolasHug authored and jeremiedbb committed
FEA Partial dependence plots (scikit-learn#12599)
1 parent 8d4bffc commit 5135252

21 files changed: +1680, -265 lines

azure-pipelines.yml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ jobs:
 PYAMG_VERSION: '*'
 PILLOW_VERSION: '*'
 JOBLIB_VERSION: '*'
+MATPLOTLIB_VERSION: '*'
 COVERAGE: 'true'
 CHECK_PYTEST_SOFT_DEPENDENCY: 'true'
 TEST_DOCSTRINGS: 'true'

build_tools/azure/install.sh

Lines changed: 4 additions & 0 deletions
@@ -47,6 +47,10 @@ if [[ "$DISTRIB" == "conda" ]]; then
         TO_INSTALL="$TO_INSTALL pillow=$PILLOW_VERSION"
     fi
 
+    if [[ -n "$MATPLOTLIB_VERSION" ]]; then
+        TO_INSTALL="$TO_INSTALL matplotlib=$MATPLOTLIB_VERSION"
+    fi
+
     make_conda $TO_INSTALL
 
 elif [[ "$DISTRIB" == "ubuntu" ]]; then

doc/inspection.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+.. include:: includes/big_toc_css.rst
+
+.. _inspection:
+
+Inspection
+----------
+
+.. toctree::
+
+    modules/partial_dependence

doc/modules/classes.rst

Lines changed: 26 additions & 17 deletions
@@ -428,23 +428,6 @@ Samples generator
    :template: function.rst
 
 
-partial dependence
-------------------
-
-.. automodule:: sklearn.ensemble.partial_dependence
-   :no-members:
-   :no-inherited-members:
-
-.. currentmodule:: sklearn
-
-.. autosummary::
-   :toctree: generated/
-   :template: function.rst
-
-   ensemble.partial_dependence.partial_dependence
-   ensemble.partial_dependence.plot_partial_dependence
-
-
 .. _exceptions_ref:
 
 :mod:`sklearn.exceptions`: Exceptions and warnings
@@ -1230,6 +1213,25 @@ Model validation
    pipeline.make_union
 
 
+.. _inspection_ref:
+
+:mod:`sklearn.inspection`: inspection
+=====================================
+
+.. automodule:: sklearn.inspection
+   :no-members:
+   :no-inherited-members:
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   inspection.partial_dependence
+   inspection.plot_partial_dependence
+
+
 .. _preprocessing_ref:
 
 :mod:`sklearn.preprocessing`: Preprocessing and Normalization
@@ -1510,6 +1512,13 @@ To be removed in 0.23
    metrics.jaccard_similarity_score
    linear_model.logistic_regression_path
 
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   ensemble.partial_dependence.partial_dependence
+   ensemble.partial_dependence.plot_partial_dependence
+
 
 To be removed in 0.22
 ---------------------
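
For downstream users, this deprecation implies a straightforward migration. The sketch below is illustrative usage, not part of the commit; only the import paths and the label -> target rename are taken from this diff:

    # Old, deprecated import path (removal scheduled for 0.23):
    # from sklearn.ensemble.partial_dependence import plot_partial_dependence

    # New import path introduced by this commit:
    from sklearn.datasets import load_iris
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import plot_partial_dependence

    iris = load_iris()
    mc_clf = GradientBoostingClassifier(n_estimators=10,
                                        max_depth=1).fit(iris.data, iris.target)
    # The old keyword 'label' becomes 'target' in the new API:
    plot_partial_dependence(mc_clf, iris.data, [3, 2], target=0)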

doc/modules/ensemble.rst

Lines changed: 0 additions & 125 deletions
@@ -797,131 +797,6 @@ accessed via the ``feature_importances_`` property::
 
 * :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regression.py`
 
-.. currentmodule:: sklearn.ensemble.partial_dependence
-
-.. _partial_dependence:
-
-Partial dependence
-..................
-
-Partial dependence plots (PDP) show the dependence between the target response
-and a set of 'target' features, marginalizing over the
-values of all other features (the 'complement' features).
-Intuitively, we can interpret the partial dependence as the expected
-target response [1]_ as a function of the 'target' features [2]_.
-
-Due to the limits of human perception the size of the target feature
-set must be small (usually, one or two) thus the target features are
-usually chosen among the most important features.
-
-The Figure below shows four one-way and one two-way partial dependence plots
-for the California housing dataset:
-
-.. figure:: ../auto_examples/ensemble/images/sphx_glr_plot_partial_dependence_001.png
-   :target: ../auto_examples/ensemble/plot_partial_dependence.html
-   :align: center
-   :scale: 70
-
-One-way PDPs tell us about the interaction between the target
-response and the target feature (e.g. linear, non-linear).
-The upper left plot in the above Figure shows the effect of the
-median income in a district on the median house price; we can
-clearly see a linear relationship among them.
-
-PDPs with two target features show the
-interactions among the two features. For example, the two-variable PDP in the
-above Figure shows the dependence of median house price on joint
-values of house age and avg. occupants per household. We can clearly
-see an interaction between the two features:
-For an avg. occupancy greater than two, the house price is nearly independent
-of the house age, whereas for values less than two there is a strong dependence
-on age.
-
-The module :mod:`partial_dependence` provides a convenience function
-:func:`~sklearn.ensemble.partial_dependence.plot_partial_dependence`
-to create one-way and two-way partial dependence plots. In the below example
-we show how to create a grid of partial dependence plots: two one-way
-PDPs for the features ``0`` and ``1`` and a two-way PDP between the two
-features::
-
-    >>> from sklearn.datasets import make_hastie_10_2
-    >>> from sklearn.ensemble import GradientBoostingClassifier
-    >>> from sklearn.ensemble.partial_dependence import plot_partial_dependence
-
-    >>> X, y = make_hastie_10_2(random_state=0)
-    >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
-    ...     max_depth=1, random_state=0).fit(X, y)
-    >>> features = [0, 1, (0, 1)]
-    >>> fig, axs = plot_partial_dependence(clf, X, features) #doctest: +SKIP
-
-For multi-class models, you need to set the class label for which the
-PDPs should be created via the ``label`` argument::
-
-    >>> from sklearn.datasets import load_iris
-    >>> iris = load_iris()
-    >>> mc_clf = GradientBoostingClassifier(n_estimators=10,
-    ...     max_depth=1).fit(iris.data, iris.target)
-    >>> features = [3, 2, (3, 2)]
-    >>> fig, axs = plot_partial_dependence(mc_clf, X, features, label=0) #doctest: +SKIP
-
-If you need the raw values of the partial dependence function rather
-than the plots you can use the
-:func:`~sklearn.ensemble.partial_dependence.partial_dependence` function::
-
-    >>> from sklearn.ensemble.partial_dependence import partial_dependence
-
-    >>> pdp, axes = partial_dependence(clf, [0], X=X)
-    >>> pdp  # doctest: +ELLIPSIS
-    array([[ 2.46643157,  2.46643157, ...
-    >>> axes  # doctest: +ELLIPSIS
-    [array([-1.62497054, -1.59201391, ...
-
-The function requires either the argument ``grid`` which specifies the
-values of the target features on which the partial dependence function
-should be evaluated or the argument ``X`` which is a convenience mode
-for automatically creating ``grid`` from the training data. If ``X``
-is given, the ``axes`` value returned by the function gives the axis
-for each target feature.
-
-For each value of the 'target' features in the ``grid`` the partial
-dependence function need to marginalize the predictions of a tree over
-all possible values of the 'complement' features. In decision trees
-this function can be evaluated efficiently without reference to the
-training data. For each grid point a weighted tree traversal is
-performed: if a split node involves a 'target' feature, the
-corresponding left or right branch is followed, otherwise both
-branches are followed, each branch is weighted by the fraction of
-training samples that entered that branch. Finally, the partial
-dependence is given by a weighted average of all visited leaves. For
-tree ensembles the results of each individual tree are again
-averaged.
-
-.. rubric:: Footnotes
-
-.. [1] For classification with ``loss='deviance'`` the target
-   response is logit(p).
-
-.. [2] More precisely its the expectation of the target response after
-   accounting for the initial model; partial dependence plots
-   do not include the ``init`` model.
-
-.. topic:: Examples:
-
- * :ref:`sphx_glr_auto_examples_ensemble_plot_partial_dependence.py`
-
-
-.. topic:: References
-
- .. [F2001] J. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine",
-    The Annals of Statistics, Vol. 29, No. 5, 2001.
-
- .. [F1999] J. Friedman, "Stochastic Gradient Boosting", 1999
-
- .. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical Learning Ed. 2", Springer, 2009.
-
- .. [R2007] G. Ridgeway, "Generalized Boosted Models: A guide to the gbm package", 2007
-
-
 .. _voting_classifier:
 
 Voting Classifier
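
The weighted tree traversal removed here (and kept, slightly generalized, in the new inspection docs below) can be written out in a few lines. This is an illustrative sketch built on the public tree_ attributes of a fitted tree, not scikit-learn's internal implementation:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    def tree_partial_dependence(est, target_feature, grid_value):
        """Partial dependence of one fitted tree at one grid value."""
        t = est.tree_

        def visit(node, weight):
            if t.children_left[node] == -1:
                # Leaf: contribute its prediction, weighted by the fraction
                # of training samples routed here along 'complement' splits.
                return weight * t.value[node, 0, 0]
            left, right = t.children_left[node], t.children_right[node]
            if t.feature[node] == target_feature:
                # Split on the target feature: follow the branch the grid
                # value would take; the weight is unchanged.
                child = left if grid_value <= t.threshold[node] else right
                return visit(child, weight)
            # Split on a complement feature: follow both branches, each
            # weighted by its share of the training samples.
            total = t.weighted_n_node_samples[node]
            return (visit(left, weight * t.weighted_n_node_samples[left] / total)
                    + visit(right, weight * t.weighted_n_node_samples[right] / total))

        return visit(0, 1.0)

    # For a tree ensemble, average this quantity over the individual trees.
    X, y = make_regression(n_features=5, random_state=0)
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
    grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=5)
    pd_curve = [tree_partial_dependence(tree, 0, v) for v in grid]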

doc/modules/partial_dependence.rst

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
+
+.. _partial_dependence:
+
+========================
+Partial dependence plots
+========================
+
+.. currentmodule:: sklearn.inspection
+
+Partial dependence plots (PDP) show the dependence between the target
+response [1]_ and a set of 'target' features, marginalizing over the values
+of all other features (the 'complement' features). Intuitively, we can
+interpret the partial dependence as the expected target response as a
+function of the 'target' features.
+
+Due to the limits of human perception the size of the target feature set
+must be small (usually, one or two) thus the target features are usually
+chosen among the most important features.
+
+The figure below shows four one-way and one two-way partial dependence plots
+for the California housing dataset, with a :class:`GradientBoostingRegressor
+<sklearn.ensemble.GradientBoostingRegressor>`:
+
+.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
+   :target: ../auto_examples/inspection/plot_partial_dependence.html
+   :align: center
+   :scale: 70
+
+One-way PDPs tell us about the interaction between the target response and
+the target feature (e.g. linear, non-linear). The upper left plot in the
+above figure shows the effect of the median income in a district on the
+median house price; we can clearly see a linear relationship among them. Note
+that PDPs assume that the target features are independent from the complement
+features, and this assumption is often violated in practice.
+
+PDPs with two target features show the interactions among the two features.
+For example, the two-variable PDP in the above figure shows the dependence
+of median house price on joint values of house age and average occupants per
+household. We can clearly see an interaction between the two features: for
+an average occupancy greater than two, the house price is nearly independent of
+the house age, whereas for values less than 2 there is a strong dependence
+on age.
+
+The :mod:`sklearn.inspection` module provides a convenience function
+:func:`plot_partial_dependence` to create one-way and two-way partial
+dependence plots. In the below example we show how to create a grid of
+partial dependence plots: two one-way PDPs for the features ``0`` and ``1``
+and a two-way PDP between the two features::
+
+    >>> from sklearn.datasets import make_hastie_10_2
+    >>> from sklearn.ensemble import GradientBoostingClassifier
+    >>> from sklearn.inspection import plot_partial_dependence
+
+    >>> X, y = make_hastie_10_2(random_state=0)
+    >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
+    ...     max_depth=1, random_state=0).fit(X, y)
+    >>> features = [0, 1, (0, 1)]
+    >>> plot_partial_dependence(clf, X, features) #doctest: +SKIP
+
+You can access the newly created figure and Axes objects using ``plt.gcf()``
+and ``plt.gca()``.
+
+For multi-class classification, you need to set the class label for which
+the PDPs should be created via the ``target`` argument::
+
+    >>> from sklearn.datasets import load_iris
+    >>> iris = load_iris()
+    >>> mc_clf = GradientBoostingClassifier(n_estimators=10,
+    ...     max_depth=1).fit(iris.data, iris.target)
+    >>> features = [3, 2, (3, 2)]
+    >>> plot_partial_dependence(mc_clf, iris.data, features, target=0) #doctest: +SKIP
+
+The same parameter ``target`` is used to specify the target in multi-output
+regression settings.
+
+If you need the raw values of the partial dependence function rather than
+the plots, you can use the
+:func:`sklearn.inspection.partial_dependence` function::
+
+    >>> from sklearn.inspection import partial_dependence
+
+    >>> pdp, axes = partial_dependence(clf, X, [0])
+    >>> pdp  # doctest: +ELLIPSIS
+    array([[ 2.466...,  2.466..., ...
+    >>> axes  # doctest: +ELLIPSIS
+    [array([-1.624..., -1.592..., ...
+
+The values at which the partial dependence should be evaluated are directly
+generated from ``X``. For 2-way partial dependence, a 2D-grid of values is
+generated. The ``values`` field returned by
+:func:`sklearn.inspection.partial_dependence` gives the actual values
+used in the grid for each target feature. They also correspond to the axis
+of the plots.
+
+For each value of the 'target' features in the ``grid`` the partial
+dependence function needs to marginalize the predictions of the estimator
+over all possible values of the 'complement' features. With the ``'brute'``
+method, this is done by replacing every target feature value of ``X`` by those
+in the grid, and computing the average prediction.
+
+In decision trees this can be evaluated efficiently without reference to the
+training data (``'recursion'`` method). For each grid point a weighted tree
+traversal is performed: if a split node involves a 'target' feature, the
+corresponding left or right branch is followed, otherwise both branches are
+followed, each branch is weighted by the fraction of training samples that
+entered that branch. Finally, the partial dependence is given by a weighted
+average of all visited leaves. Note that with the ``'recursion'`` method,
+``X`` is only used to generate the grid, not to compute the averaged
+predictions. The averaged predictions will always be computed on the data with
+which the trees were trained.
+
+.. rubric:: Footnotes
+
+.. [1] For classification, the target response may be the probability of a
+   class (the positive class for binary classification), or the decision
+   function.
+
+.. topic:: Examples:
+
+ * :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py`
+
+.. topic:: References
+
+ .. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, `The Elements of
+    Statistical Learning <https://web.stanford.edu/~hastie/ElemStatLearn//>`_,
+    Second Edition, Section 10.13.2, Springer, 2009.
+
+ .. [Mol2019] C. Molnar, `Interpretable Machine Learning
+    <https://christophm.github.io/interpretable-ml-book/>`_, Section 5.1, 2019.
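
The 'brute' strategy added in this file is simple enough to sketch directly. The snippet below is an illustration only, reusing the clf setup from the doctests above; averaging the decision function is an assumption here, since the response actually used by sklearn.inspection.partial_dependence depends on the estimator:

    import numpy as np
    from sklearn.datasets import make_hastie_10_2
    from sklearn.ensemble import GradientBoostingClassifier

    def brute_partial_dependence(est, X, target_feature, grid):
        averaged = []
        for value in grid:
            X_eval = X.copy()
            # Replace every sample's target feature by the grid value...
            X_eval[:, target_feature] = value
            # ...then marginalize over the complement features by
            # averaging the model's response over all samples.
            averaged.append(est.decision_function(X_eval).mean())
        return np.asarray(averaged)

    X, y = make_hastie_10_2(random_state=0)
    clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                     max_depth=1, random_state=0).fit(X, y)
    grid = np.linspace(X[:, 0].min(), X[:, 0].max(), num=50)
    pd_curve = brute_partial_dependence(clf, X, 0, grid)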

doc/user_guide.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ User Guide
     supervised_learning.rst
     unsupervised_learning.rst
     model_selection.rst
+    inspection.rst
     data_transforms.rst
     Dataset loading utilities <datasets/index.rst>
     modules/computing.rst
