|
.. _partial_dependence:

========================
Partial dependence plots
========================

.. currentmodule:: sklearn.inspection

Partial dependence plots (PDP) show the dependence between the target
response [1]_ and a set of 'target' features, marginalizing over the values
of all other features (the 'complement' features). Intuitively, we can
interpret the partial dependence as the expected target response as a
function of the 'target' features (see [HTF2009]_ and [Mol2019]_).

Due to the limits of human perception, the size of the target feature set
must be small (usually one or two); the target features are therefore
usually chosen from among the most important features.

The figure below shows four one-way and one two-way partial dependence plots
for the California housing dataset, with a :class:`GradientBoostingRegressor
<sklearn.ensemble.GradientBoostingRegressor>`:

.. figure:: ../auto_examples/inspection/images/sphx_glr_plot_partial_dependence_002.png
   :target: ../auto_examples/inspection/plot_partial_dependence.html
   :align: center
   :scale: 70

One-way PDPs tell us about the relationship between the target response and
the target feature (e.g. linear or non-linear). The upper left plot in the
figure above shows the effect of the median income in a district on the
median house price; we can clearly see a linear relationship between them.
Note that PDPs assume that the target features are independent of the
complement features, an assumption that is often violated in practice.

PDPs with two target features show the interaction between the two features.
For example, the two-variable PDP in the figure above shows the dependence
of median house price on the joint values of house age and average occupants
per household. We can clearly see an interaction between the two features:
for an average occupancy greater than two, the house price is nearly
independent of the house age, whereas for values less than two there is a
strong dependence on age.

The :mod:`sklearn.inspection` module provides the convenience function
:func:`plot_partial_dependence` to create one-way and two-way partial
dependence plots. The example below shows how to create a grid of
partial dependence plots: two one-way PDPs for the features ``0`` and ``1``,
and a two-way PDP between the two features::

    >>> from sklearn.datasets import make_hastie_10_2
    >>> from sklearn.ensemble import GradientBoostingClassifier
    >>> from sklearn.inspection import plot_partial_dependence

    >>> X, y = make_hastie_10_2(random_state=0)
    >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    ...     max_depth=1, random_state=0).fit(X, y)
    >>> features = [0, 1, (0, 1)]
    >>> plot_partial_dependence(clf, X, features)  # doctest: +SKIP

You can access the newly created figure and Axes objects using ``plt.gcf()``
and ``plt.gca()``.
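
For example, to add an overall title to the figure created above (a minimal
sketch; ``plt.gcf()`` and ``Figure.suptitle`` are standard matplotlib
calls)::

    >>> import matplotlib.pyplot as plt  # doctest: +SKIP
    >>> fig = plt.gcf()  # doctest: +SKIP
    >>> fig.suptitle('Partial dependence plots')  # doctest: +SKIP
    >>> fig.subplots_adjust(top=0.9)  # leave room for the title  # doctest: +SKIP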

For multi-class classification, you need to specify the class label for
which the PDPs should be created via the ``target`` argument::

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> mc_clf = GradientBoostingClassifier(n_estimators=10,
    ...     max_depth=1).fit(iris.data, iris.target)
    >>> features = [3, 2, (3, 2)]
    >>> plot_partial_dependence(mc_clf, iris.data, features, target=0)  # doctest: +SKIP

The same ``target`` parameter is used to specify the target in multi-output
regression settings.
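
For instance, with a multi-output regressor (a minimal sketch;
:class:`~sklearn.ensemble.RandomForestRegressor` natively supports
multi-output targets, and ``target=1`` selects the second output)::

    >>> from sklearn.datasets import make_regression
    >>> from sklearn.ensemble import RandomForestRegressor
    >>> X_mo, y_mo = make_regression(n_targets=2, random_state=0)
    >>> reg = RandomForestRegressor(n_estimators=10,
    ...                             random_state=0).fit(X_mo, y_mo)
    >>> plot_partial_dependence(reg, X_mo, [0], target=1)  # doctest: +SKIP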

If you need the raw values of the partial dependence function rather than
the plots, you can use the
:func:`sklearn.inspection.partial_dependence` function::

    >>> from sklearn.inspection import partial_dependence

    >>> pdp, values = partial_dependence(clf, X, [0])
    >>> pdp  # doctest: +ELLIPSIS
    array([[ 2.466..., 2.466..., ...
    >>> values  # doctest: +ELLIPSIS
    [array([-1.624..., -1.592..., ...

The values at which the partial dependence is evaluated are generated
directly from ``X``. For 2-way partial dependence, a 2D grid of values is
generated. The second item returned by
:func:`sklearn.inspection.partial_dependence` (``values`` above) holds the
actual values used in the grid for each target feature; they also
correspond to the axes of the plots.
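
The grid can be tuned through the ``grid_resolution`` and ``percentiles``
parameters of :func:`partial_dependence`. For example, to evaluate the
partial dependence of feature ``0`` on a coarser grid spanning the 5th to
95th percentile of its values in ``X``::

    >>> pdp, values = partial_dependence(clf, X, [0], grid_resolution=20,
    ...                                  percentiles=(0.05, 0.95))
    >>> values[0].shape  # doctest: +SKIP
    (20,)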

For each value of the 'target' features in the grid, the partial dependence
function needs to marginalize the predictions of the estimator over all
possible values of the 'complement' features. With the ``'brute'`` method,
this is done by replacing every target feature value of ``X`` by those in
the grid and computing the average prediction.
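
Conceptually, the ``'brute'`` computation looks like the following sketch
(``brute_partial_dependence`` is a hypothetical helper, not part of
scikit-learn; for the classifier above the decision function is averaged,
while for a regressor one would average ``predict`` instead)::

    >>> import numpy as np
    >>> def brute_partial_dependence(est, X, feature, grid):
    ...     # For each grid value, overwrite the target feature column for
    ...     # every sample and average the model's response over the data.
    ...     averaged = []
    ...     for value in grid:
    ...         X_eval = X.copy()
    ...         X_eval[:, feature] = value
    ...         averaged.append(est.decision_function(X_eval).mean())
    ...     return np.array(averaged)
    >>> brute_partial_dependence(clf, X, 0, values[0])  # doctest: +SKIP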

For decision trees, partial dependence can be evaluated efficiently without
reference to the training data (the ``'recursion'`` method). For each grid
point, a weighted tree traversal is performed: if a split node involves a
'target' feature, the corresponding left or right branch is followed;
otherwise both branches are followed, each branch being weighted by the
fraction of training samples that entered it. Finally, the partial
dependence is given by a weighted average over all visited leaves. Note that
with the ``'recursion'`` method, ``X`` is only used to generate the grid,
not to compute the averaged predictions; the averaged predictions are
always computed on the data with which the trees were trained.
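
The following sketch illustrates the weighted traversal on a single fitted
:class:`~sklearn.tree.DecisionTreeRegressor` (``tree_pd`` is a hypothetical
helper, not the library's implementation; it relies only on the attributes
of the fitted ``tree_`` object)::

    >>> def tree_pd(tree, target_feature, grid_value, node=0, weight=1.0):
    ...     # Leaf: contribute its value, weighted by the fraction of
    ...     # training samples that reached it.
    ...     if tree.children_left[node] == -1:
    ...         return weight * tree.value[node][0, 0]
    ...     left, right = tree.children_left[node], tree.children_right[node]
    ...     if tree.feature[node] == target_feature:
    ...         # Split on the target feature: follow the branch the grid
    ...         # value would take.
    ...         child = left if grid_value <= tree.threshold[node] else right
    ...         return tree_pd(tree, target_feature, grid_value, child, weight)
    ...     # Split on a complement feature: follow both branches, weighted
    ...     # by the fraction of training samples in each.
    ...     total = tree.weighted_n_node_samples[node]
    ...     return (tree_pd(tree, target_feature, grid_value, left,
    ...                     weight * tree.weighted_n_node_samples[left] / total)
    ...             + tree_pd(tree, target_feature, grid_value, right,
    ...                       weight * tree.weighted_n_node_samples[right] / total))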

.. rubric:: Footnotes

.. [1] For classification, the target response may be the probability of a
   class (the positive class for binary classification), or the decision
   function.
|
.. topic:: Examples:

 * :ref:`sphx_glr_auto_examples_inspection_plot_partial_dependence.py`

.. topic:: References

 .. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, `The Elements of
    Statistical Learning <https://web.stanford.edu/~hastie/ElemStatLearn//>`_,
    Second Edition, Section 10.13.2, Springer, 2009.
|
 .. [Mol2019] C. Molnar, `Interpretable Machine Learning
    <https://christophm.github.io/interpretable-ml-book/>`_, Section 5.1, 2019.