* :ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_regression.py`

.. currentmodule:: sklearn.ensemble.partial_dependence

.. _partial_dependence:

Partial dependence
..................

807
Partial dependence plots (PDP) show the dependence between the target response
and a set of 'target' features, marginalizing over the
values of all other features (the 'complement' features).
Intuitively, we can interpret the partial dependence as the expected
target response [1]_ as a function of the 'target' features [2]_.

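Written out, the partial dependence of a model :math:`f` on the target
feature set :math:`X_S` is the expectation of :math:`f` over the complement
features :math:`X_C`; a common estimate averages over the :math:`n` training
samples [F2001]_:

.. math::

    f_S(x_S) = \mathbf{E}_{X_C} \left[ f(x_S, X_C) \right]
             \approx \frac{1}{n} \sum_{i=1}^{n} f(x_S, x_C^{(i)})
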
813
Due to the limits of human perception, the size of the target feature
set must be small (usually one or two); thus the target features are
usually chosen from among the most important features.

The Figure below shows four one-way and one two-way partial dependence plots
for the California housing dataset:

820
.. figure:: ../auto_examples/ensemble/images/sphx_glr_plot_partial_dependence_001.png
   :target: ../auto_examples/ensemble/plot_partial_dependence.html
   :align: center
   :scale: 70

825
One-way PDPs tell us about the dependence between the target
response and the target feature (e.g. linear, non-linear).
The upper left plot in the above Figure shows the effect of the
median income in a district on the median house price; we can
clearly see a linear relationship between them.

PDPs with two target features show the
interaction between the two features. For example, the two-variable PDP in the
above Figure shows the dependence of median house price on joint
values of house age and avg. occupants per household. We can clearly
see an interaction between the two features:
for an avg. occupancy greater than two, the house price is nearly independent
of the house age, whereas for values less than two there is a strong dependence
on age.

840
The module :mod:`partial_dependence` provides a convenience function
:func:`~sklearn.ensemble.partial_dependence.plot_partial_dependence`
to create one-way and two-way partial dependence plots. In the below example
we show how to create a grid of partial dependence plots: two one-way
PDPs for the features ``0`` and ``1`` and a two-way PDP between the two
features::

    >>> from sklearn.datasets import make_hastie_10_2
    >>> from sklearn.ensemble import GradientBoostingClassifier
    >>> from sklearn.ensemble.partial_dependence import plot_partial_dependence

    >>> X, y = make_hastie_10_2(random_state=0)
    >>> clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    ...     max_depth=1, random_state=0).fit(X, y)
    >>> features = [0, 1, (0, 1)]
    >>> fig, axs = plot_partial_dependence(clf, X, features)  # doctest: +SKIP

857
For multi-class models, you need to set the class label for which the
PDPs should be created via the ``label`` argument::

    >>> from sklearn.datasets import load_iris
    >>> iris = load_iris()
    >>> mc_clf = GradientBoostingClassifier(n_estimators=10,
    ...     max_depth=1).fit(iris.data, iris.target)
    >>> features = [3, 2, (3, 2)]
    >>> fig, axs = plot_partial_dependence(mc_clf, iris.data, features, label=0)  # doctest: +SKIP

867
If you need the raw values of the partial dependence function rather
than the plots you can use the
:func:`~sklearn.ensemble.partial_dependence.partial_dependence` function::

    >>> from sklearn.ensemble.partial_dependence import partial_dependence

    >>> pdp, axes = partial_dependence(clf, [0], X=X)
    >>> pdp  # doctest: +ELLIPSIS
    array([[ 2.46643157,  2.46643157, ...
    >>> axes  # doctest: +ELLIPSIS
    [array([-1.62497054, -1.59201391, ...

879
The function requires either the argument ``grid``, which specifies the
values of the target features on which the partial dependence function
should be evaluated, or the argument ``X``, which is a convenience mode
for automatically creating ``grid`` from the training data. If ``X``
is given, the ``axes`` value returned by the function gives the axis
for each target feature.

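As an illustration of what the convenience mode does, the sketch below builds
such a ``grid`` from training data with NumPy; ``grid_from_X`` and its
``percentiles`` and ``grid_resolution`` parameters are hypothetical stand-ins
for this illustration, not the library's internals:

```python
import numpy as np

def grid_from_X(X, target_features, percentiles=(0.05, 0.95),
                grid_resolution=100):
    """Hypothetical helper: one evenly spaced axis per target feature,
    trimmed to the given percentiles of the training data."""
    axes = []
    for f in target_features:
        lo, hi = np.percentile(X[:, f], [100 * percentiles[0],
                                         100 * percentiles[1]])
        axes.append(np.linspace(lo, hi, grid_resolution))
    # the grid is the cartesian product of the per-feature axes
    grid = np.array(np.meshgrid(*axes)).T.reshape(-1, len(axes))
    return grid, axes

X = np.random.RandomState(0).normal(size=(500, 5))
grid, axes = grid_from_X(X, [0, 1], grid_resolution=10)
print(grid.shape)  # one row per (feature 0, feature 1) pair: (100, 2)
```

Trimming to percentiles avoids evaluating the model in the extreme tails of
the training distribution, where the estimate would rest on very few samples.
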
886
For each value of the 'target' features in the ``grid`` the partial
dependence function needs to marginalize the predictions of a tree over
all possible values of the 'complement' features. In decision trees
this function can be evaluated efficiently without reference to the
training data. For each grid point a weighted tree traversal is
performed: if a split node involves a 'target' feature, the
corresponding left or right branch is followed; otherwise both
branches are followed, each branch being weighted by the fraction of
training samples that entered that branch. Finally, the partial
dependence is given by a weighted average of all visited leaves. For
tree ensembles the results of each individual tree are again
averaged.

899
.. rubric:: Footnotes

.. [1] For classification with ``loss='deviance'`` the target
   response is logit(p).

.. [2] More precisely, it is the expectation of the target response after
   accounting for the initial model; partial dependence plots
   do not include the ``init`` model.

908
.. topic:: Examples:

 * :ref:`sphx_glr_auto_examples_ensemble_plot_partial_dependence.py`

.. topic:: References

 .. [F2001] J. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine",
    The Annals of Statistics, Vol. 29, No. 5, 2001.

 .. [F1999] J. Friedman, "Stochastic Gradient Boosting", 1999

 .. [HTF2009] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical Learning Ed. 2", Springer, 2009.

 .. [R2007] G. Ridgeway, "Generalized Boosted Models: A guide to the gbm package", 2007


.. _voting_classifier:

Voting Classifier