MNT Revert the deprecation of min_samples_leaf and min_weight_fraction_leaf (#11998) · scikit-learn/scikit-learn@79f5d14 · GitHub
Commit 79f5d14

jnothman authored and rth committed
MNT Revert the deprecation of min_samples_leaf and min_weight_fraction_leaf (#11998)
1 parent 121dd5a commit 79f5d14

12 files changed: +229 -349 lines changed

doc/modules/ensemble.rst

Lines changed: 3 additions & 2 deletions
@@ -218,7 +218,7 @@ setting ``oob_score=True``.
 The size of the model with the default parameters is :math:`O( M * N * log (N) )`,
 where :math:`M` is the number of trees and :math:`N` is the number of samples.
 In order to reduce the size of the model, you can change these parameters:
-``min_samples_split``, ``max_leaf_nodes`` and ``max_depth``.
+``min_samples_split``, ``max_leaf_nodes``, ``max_depth`` and ``min_samples_leaf``.

 Parallelization
 ---------------
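
For context, a minimal sketch (not part of this commit) of how the size-controlling parameters named above could be set on a forest. The toy dataset and the specific values are illustrative assumptions only:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)

# Constrain tree growth to keep the fitted model small; looser settings
# (deeper trees, smaller leaves) grow the model roughly as O(M * N * log(N)).
forest = RandomForestRegressor(
    n_estimators=100,
    min_samples_split=10,   # require at least 10 samples to consider a split
    min_samples_leaf=5,     # every leaf must keep at least 5 samples
    max_leaf_nodes=64,      # cap the number of leaves per tree
    max_depth=12,           # cap tree depth
    random_state=0,
).fit(X, y)
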
@@ -393,7 +393,8 @@ The number of weak learners is controlled by the parameter ``n_estimators``. The
 the final combination. By default, weak learners are decision stumps. Different
 weak learners can be specified through the ``base_estimator`` parameter.
 The main parameters to tune to obtain good results are ``n_estimators`` and
-the complexity of the base estimators (e.g., its depth ``max_depth``).
+the complexity of the base estimators (e.g., its depth ``max_depth`` or
+minimum required number of samples to consider a split ``min_samples_split``).

 .. topic:: Examples:
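
Again as a sketch only (not part of this commit): tuning ``n_estimators`` together with the complexity of the base estimator for AdaBoost. The dataset and parameter values are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# A slightly deeper base estimator with a higher min_samples_split trades
# per-learner complexity against the number of boosting rounds.
clf = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=2, min_samples_split=20),
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
).fit(X, y)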

doc/modules/tree.rst

Lines changed: 20 additions & 7 deletions
@@ -330,18 +330,31 @@ Tips on practical use
     for each additional level the tree grows to. Use ``max_depth`` to control
     the size of the tree to prevent overfitting.

-  * Use ``min_samples_split`` to control the number of samples at a leaf node.
-    A very small number will usually mean the tree will overfit, whereas a
-    large number will prevent the tree from learning the data. If the sample
-    size varies greatly, a float number can be used as percentage in this
-    parameter. Note that ``min_samples_split`` can create arbitrarily
-    small leaves.
+  * Use ``min_samples_split`` or ``min_samples_leaf`` to ensure that multiple
+    samples inform every decision in the tree, by controlling which splits will
+    be considered. A very small number will usually mean the tree will overfit,
+    whereas a large number will prevent the tree from learning the data. Try
+    ``min_samples_leaf=5`` as an initial value. If the sample size varies
+    greatly, a float number can be used as percentage in these two parameters.
+    While ``min_samples_split`` can create arbitrarily small leaves,
+    ``min_samples_leaf`` guarantees that each leaf has a minimum size, avoiding
+    low-variance, over-fit leaf nodes in regression problems. For
+    classification with few classes, ``min_samples_leaf=1`` is often the best
+    choice.

   * Balance your dataset before training to prevent the tree from being biased
     toward the classes that are dominant. Class balancing can be done by
     sampling an equal number of samples from each class, or preferably by
     normalizing the sum of the sample weights (``sample_weight``) for each
-    class to the same value.
+    class to the same value. Also note that weight-based pre-pruning criteria,
+    such as ``min_weight_fraction_leaf``, will then be less biased toward
+    dominant classes than criteria that are not aware of the sample weights,
+    like ``min_samples_leaf``.
+
+  * If the samples are weighted, it will be easier to optimize the tree
+    structure using weight-based pre-pruning criterion such as
+    ``min_weight_fraction_leaf``, which ensure that leaf nodes contain at least
+    a fraction of the overall sum of the sample weights.

   * All decision trees use ``np.float32`` arrays internally.
     If training data is not in this format, a copy of the dataset will be made.
doc/whats_new/v0.20.rst

Lines changed: 0 additions & 13 deletions
@@ -343,12 +343,6 @@ Support for Python 3.3 has been officially dropped.
   while mask does not allow this functionality.
   :issue:`9524` by :user:`Guillaume Lemaitre <glemaitre>`.

-- |API| The parameters ``min_samples_leaf`` and ``min_weight_fraction_leaf`` in
-  tree-based ensembles are deprecated and will be removed (fixed to 1 and 0
-  respectively) in version 0.22. These parameters were not effective for
-  regularization and at worst would produce bad splits. :issue:`10773` by
-  :user:`Bob Chen <lasagnaman>` and `Joel Nothman`_.
-
 - |Fix| :class:`ensemble.BaseBagging` where one could not deterministically
   reproduce ``fit`` result using the object attributes when ``random_state``
   is set. :issue:`9723` by :user:`Guillaume Lemaitre <glemaitre>`.

@@ -1035,13 +1029,6 @@ Support for Python 3.3 has been officially dropped.
   considered all samples to be of equal weight importance.
   :issue:`11464` by :user:`John Stott <JohnStott>`.

-- |API| The parameters ``min_samples_leaf`` and ``min_weight_fraction_leaf`` in
-  :class:`tree.DecisionTreeClassifier` and :class:`tree.DecisionTreeRegressor`
-  are deprecated and will be removed (fixed to 1 and 0 respectively) in version
-  0.22. These parameters were not effective for regularization and at worst
-  would produce bad splits. :issue:`10773` by :user:`Bob Chen <lasagnaman>`
-  and `Joel Nothman`_.
-

 :mod:`sklearn.utils`
 ....................

examples/ensemble/plot_adaboost_hastie_10_2.py

Lines changed: 2 additions & 2 deletions
@@ -43,11 +43,11 @@
 X_test, y_test = X[2000:], y[2000:]
 X_train, y_train = X[:2000], y[:2000]

-dt_stump = DecisionTreeClassifier(max_depth=1)
+dt_stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
 dt_stump.fit(X_train, y_train)
 dt_stump_err = 1.0 - dt_stump.score(X_test, y_test)

-dt = DecisionTreeClassifier(max_depth=9)
+dt = DecisionTreeClassifier(max_depth=9, min_samples_leaf=1)
 dt.fit(X_train, y_train)
 dt_err = 1.0 - dt.score(X_test, y_test)

examples/ensemble/plot_gradient_boosting_oob.py

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@

 # Fit classifier with out-of-bag estimates
 params = {'n_estimators': 1200, 'max_depth': 3, 'subsample': 0.5,
-          'learning_rate': 0.01, 'random_state': 3}
+          'learning_rate': 0.01, 'min_samples_leaf': 1, 'random_state': 3}
 clf = ensemble.GradientBoostingClassifier(**params)

 clf.fit(X_train, y_train)

examples/ensemble/plot_gradient_boosting_quantile.py

Lines changed: 2 additions & 1 deletion
@@ -41,7 +41,8 @@ def f(x):

 clf = GradientBoostingRegressor(loss='quantile', alpha=alpha,
                                 n_estimators=250, max_depth=3,
-                                learning_rate=.1, min_samples_split=9)
+                                learning_rate=.1, min_samples_leaf=9,
+                                min_samples_split=9)

 clf.fit(X, y)

0 commit comments
