|
.. _partial_dependence:

========================
Partial dependence plots
========================

.. currentmodule:: sklearn.inspection

Partial dependence plots (PDP) show the dependence between the target
response [1]_ and a set of 'target' features, marginalizing over the values
of all other features (the 'complement' features). Intuitively, we can
interpret the partial dependence as the expected target response as a
function of the 'target' features (see [HTF2009]_ and [Mol2019]_).

Due to the limits of human perception, the size of the target feature set
must be small (usually one or two); the target features are therefore
usually chosen from among the most important features.

The figure below shows four one-way and one two-way partial dependence plots
for the California housing dataset, with a :class:`GradientBoostingRegressor
<sklearn.ensemble.GradientBoostingRegressor>`:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
   :target: ../auto_examples/inspection/plot_partial_dependence.html
   :align: center
   :scale: 70

One-way PDPs tell us about the relationship between the target response and
the target feature (e.g. linear or non-linear). The upper left plot in the
figure above shows the effect of the median income in a district on the
median house price; we can clearly see a linear relationship between them.
Note that PDPs assume that the target features are independent of the
complement features, an assumption that is often violated in practice.

PDPs with two target features show the interaction between the two features.
For example, the two-variable PDP in the figure above shows the dependence
of median house price on the joint values of house age and average occupants
per household. We can clearly see an interaction between the two features:
for an average occupancy greater than two, the house price is nearly
independent of the house age, whereas for values less than two there is a
strong dependence on age.

The :mod:`sklearn.inspection` module provides the convenience function
:func:`plot_partial_dependence` to create one-way and two-way partial
dependence plots. The example below shows how to create a grid of
partial dependence plots: two one-way PDPs for the features ``0`` and ``1``,
and a two-way PDP between the two features::

    >>> from sklearn.datasets import make_hastie_10_2
    >>> from sklearn.ensemble import GradientBoostingClassifier
    >>> from sklearn.inspection import plot_partial_dependence

    >>> X, y = make_hastie_10_2(random_state=0)
    >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    ...     max_depth=1, random_state=0).fit(X, y)
    >>> features = [0, 1, (0, 1)]
    >>> plot_partial_dependence(clf, X, features)  # doctest: +SKIP

You can access the newly created figure and Axes objects using ``plt.gcf()``
and ``plt.gca()``.
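
For example, to add an overall title to the figure created above (a minimal
sketch; ``plt.gcf()`` and ``Figure.suptitle`` are standard matplotlib
calls)::

    >>> import matplotlib.pyplot as plt  # doctest: +SKIP
    >>> fig = plt.gcf()  # doctest: +SKIP
    >>> fig.suptitle('Partial dependence plots')  # doctest: +SKIP
    >>> fig.subplots_adjust(top=0.9)  # leave room for the title  # doctest: +SKIP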

For multi-class classification, you need to specify the class label for
which the PDPs should be created via the ``target`` argument::

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> mc_clf = GradientBoostingClassifier(n_estimators=10,
    ...     max_depth=1).fit(iris.data, iris.target)
    >>> features = [3, 2, (3, 2)]
    >>> plot_partial_dependence(mc_clf, iris.data, features, target=0)  # doctest: +SKIP

The same ``target`` parameter is used to specify the target in multi-output
regression settings.
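
For instance, with a multi-output regressor (a minimal sketch;
:class:`~sklearn.ensemble.RandomForestRegressor` natively supports
multi-output targets, and ``target=1`` selects the second output)::

    >>> from sklearn.datasets import make_regression
    >>> from sklearn.ensemble import RandomForestRegressor
    >>> X_mo, y_mo = make_regression(n_targets=2, random_state=0)
    >>> reg = RandomForestRegressor(n_estimators=10,
    ...                             random_state=0).fit(X_mo, y_mo)
    >>> plot_partial_dependence(reg, X_mo, [0], target=1)  # doctest: +SKIP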

If you need the raw values of the partial dependence function rather than
the plots, you can use the
:func:`sklearn.inspection.partial_dependence` function::

    >>> from sklearn.inspection import partial_dependence

    >>> pdp, values = partial_dependence(clf, X, [0])
    >>> pdp  # doctest: +ELLIPSIS
    array([[ 2.466..., 2.466..., ...
    >>> values  # doctest: +ELLIPSIS
    [array([-1.624..., -1.592..., ...

The values at which the partial dependence is evaluated are generated
directly from ``X``. For 2-way partial dependence, a 2D grid of values is
generated. The second item returned by
:func:`sklearn.inspection.partial_dependence` (``values`` above) holds the
actual values used in the grid for each target feature; they also
correspond to the axes of the plots.
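
The grid can be tuned through the ``grid_resolution`` and ``percentiles``
parameters of :func:`partial_dependence`. For example, to evaluate the
partial dependence of feature ``0`` on a coarser grid spanning the 5th to
95th percentile of its values in ``X``::

    >>> pdp, values = partial_dependence(clf, X, [0], grid_resolution=20,
    ...                                  percentiles=(0.05, 0.95))
    >>> values[0].shape  # doctest: +SKIP
    (20,)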

For each value of the 'target' features in the grid, the partial dependence
function needs to marginalize the predictions of the estimator over all
possible values of the 'complement' features. With the ``'brute'`` method,
this is done by replacing every target feature value of ``X`` by those in
the grid and computing the average prediction.
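
Conceptually, the ``'brute'`` computation looks like the following sketch
(``brute_partial_dependence`` is a hypothetical helper, not part of
scikit-learn; for the classifier above the decision function is averaged,
while for a regressor one would average ``predict`` instead)::

    >>> import numpy as np
    >>> def brute_partial_dependence(est, X, feature, grid):
    ...     # For each grid value, overwrite the target feature column for
    ...     # every sample and average the model's response over the data.
    ...     averaged = []
    ...     for value in grid:
    ...         X_eval = X.copy()
    ...         X_eval[:, feature] = value
    ...         averaged.append(est.decision_function(X_eval).mean())
    ...     return np.array(averaged)
    >>> brute_partial_dependence(clf, X, 0, values[0])  # doctest: +SKIP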

For decision trees, partial dependence can be evaluated efficiently without
reference to the training data (the ``'recursion'`` method). For each grid
point, a weighted tree traversal is performed: if a split node involves a
'target' feature, the corresponding left or right branch is followed;
otherwise both branches are followed, each branch being weighted by the
fraction of training samples that entered it. Finally, the partial
dependence is given by a weighted average over all visited leaves. Note that
with the ``'recursion'`` method, ``X`` is only used to generate the grid,
not to compute the averaged predictions; the averaged predictions are
always computed on the data with which the trees were trained.
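
The following sketch illustrates the weighted traversal on a single fitted
:class:`~sklearn.tree.DecisionTreeRegressor` (``tree_pd`` is a hypothetical
helper, not the library's implementation; it relies only on the attributes
of the fitted ``tree_`` object)::

    >>> def tree_pd(tree, target_feature, grid_value, node=0, weight=1.0):
    ...     # Leaf: contribute its value, weighted by the fraction of
    ...     # training samples that reached it.
    ...     if tree.children_left[node] == -1:
    ...         return weight * tree.value[node][0, 0]
    ...     left, right = tree.children_left[node], tree.children_right[node]
    ...     if tree.feature[node] == target_feature:
    ...         # Split on the target feature: follow the branch the grid
    ...         # value would take.
    ...         child = left if grid_value <= tree.threshold[node] else right
    ...         return tree_pd(tree, target_feature, grid_value, child, weight)
    ...     # Split on a complement feature: follow both branches, weighted
    ...     # by the fraction of training samples in each.
    ...     total = tree.weighted_n_node_samples[node]
    ...     return (tree_pd(tree, target_feature, grid_value, left,
    ...                     weight * tree.weighted_n_node_samples[left] / total)
    ...             + tree_pd(tree, target_feature, grid_value, right,
    ...                       weight * tree.weighted_n_node_samples[right] / total))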

.. rubric:: Footnotes

.. [1] For classification, the target response may be the probability of a
   class (the positive class for binary classification), or the decision
   function.
|
.. topic:: Examples:

 * :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py`

.. topic:: References

 .. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, `The Elements of
    Statistical Learning <https://web.stanford.edu/~hastie/ElemStatLearn//>`_,
    Second Edition, Section 10.13.2, Springer, 2009.
|
 .. [Mol2019] C. Molnar, `Interpretable Machine Learning
    <https://christophm.github.io/interpretable-ml-book/>`_, Section 5.1, 2019.