[MRG+1] FIX/DOC Improve documentation regarding non-deterministic tree behaviour · scikit-learn/scikit-learn@cc3ce58
Commit cc3ce58
glemaitre authored and raghavrv committed
[MRG+1] FIX/DOC Improve documentation regarding non-deterministic tree behaviour (#8452)

* FIX/DOC Improve documentation regarding non-deterministic tree behaviour
* FIX correct max_features
1 parent e1ca40d commit cc3ce58

File tree

3 files changed: +55 −1 lines changed

sklearn/ensemble/forest.py (+18)

@@ -889,6 +889,15 @@ class labels (multi-output problem).
         was never left out during the bootstrap. In this case,
         `oob_decision_function_` might contain NaN.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data,
+    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
+    of the criterion is identical for several splits enumerated during the
+    search of the best split. To obtain a deterministic behaviour during
+    fitting, ``random_state`` has to be fixed.
+
     References
     ----------

@@ -1070,6 +1079,15 @@ class RandomForestRegressor(ForestRegressor):
     oob_prediction_ : array of shape = [n_samples]
         Prediction computed with out-of-bag estimate on the training set.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data,
+    ``max_features=n_features`` and ``bootstrap=False``, if the improvement
+    of the criterion is identical for several splits enumerated during the
+    search of the best split. To obtain a deterministic behaviour during
+    fitting, ``random_state`` has to be fixed.
+
     References
     ----------
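The reproducibility promised by the note above can be illustrated with a short sketch (not part of the commit; the data and parameter values are illustrative): with ``random_state`` fixed, repeated fits of a random forest break split ties identically and yield the same model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data; sizes and seeds here are arbitrary choices for the example.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Two forests fitted with the same fixed random_state draw the same
# bootstrap samples and feature permutations, so their trees - and
# therefore their predictions - are identical.
a = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
b = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)
assert np.array_equal(a.predict(X), b.predict(X))
```

Without ``random_state``, each fit draws fresh permutations, so equally good splits may be resolved differently between runs.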

sklearn/ensemble/gradient_boosting.py (+19 −1)

@@ -1384,6 +1384,14 @@ class GradientBoostingClassifier(BaseGradientBoosting, ClassifierMixin):
         The collection of fitted sub-estimators. ``loss_.K`` is 1 for binary
         classification, otherwise n_classes.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data and
+    ``max_features=n_features``, if the improvement of the criterion is
+    identical for several splits enumerated during the search of the best
+    split. To obtain a deterministic behaviour during fitting,
+    ``random_state`` has to be fixed.
 
     See also
     --------

@@ -1727,7 +1735,8 @@ class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
     warm_start : bool, default: False
         When set to ``True``, reuse the solution of the previous call to fit
         and add more estimators to the ensemble, otherwise, just erase the
-        previous solution.
+        p
+        revious solution.
 
     random_state : int, RandomState instance or None, optional (default=None)
         If int, random_state is the seed used by the random number generator;

@@ -1770,6 +1779,15 @@ class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
     estimators_ : ndarray of DecisionTreeRegressor, shape = [n_estimators, 1]
         The collection of fitted sub-estimators.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data and
+    ``max_features=n_features``, if the improvement of the criterion is
+    identical for several splits enumerated during the search of the best
+    split. To obtain a deterministic behaviour during fitting,
+    ``random_state`` has to be fixed.
+
     See also
     --------
     DecisionTreeRegressor, RandomForestRegressor
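The same determinism applies to gradient boosting; a minimal sketch (not part of the commit; data sizes and seeds are illustrative) shows that repeated fits with a fixed ``random_state`` produce numerically identical predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Toy regression data; parameter values are arbitrary for the example.
X, y = make_regression(n_samples=200, n_features=6, random_state=0)

# With random_state fixed, tie-breaking among splits with identical
# criterion improvement is reproducible, so the fitted ensembles match.
a = GradientBoostingRegressor(n_estimators=30, random_state=0).fit(X, y)
b = GradientBoostingRegressor(n_estimators=30, random_state=0).fit(X, y)
assert np.allclose(a.predict(X), b.predict(X))
```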

sklearn/tree/tree.py (+18)

@@ -629,6 +629,15 @@ class DecisionTreeClassifier(BaseDecisionTree, ClassifierMixin):
     tree_ : Tree object
         The underlying Tree object.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data and
+    ``max_features=n_features``, if the improvement of the criterion is
+    identical for several splits enumerated during the search of the best
+    split. To obtain a deterministic behaviour during fitting,
+    ``random_state`` has to be fixed.
+
     See also
     --------
     DecisionTreeRegressor

@@ -922,6 +931,15 @@ class DecisionTreeRegressor(BaseDecisionTree, RegressorMixin):
     tree_ : Tree object
         The underlying Tree object.
 
+    Notes
+    -----
+    The features are always randomly permuted at each split. Therefore,
+    the best found split may vary, even with the same training data and
+    ``max_features=n_features``, if the improvement of the criterion is
+    identical for several splits enumerated during the search of the best
+    split. To obtain a deterministic behaviour during fitting,
+    ``random_state`` has to be fixed.
+
     See also
     --------
     DecisionTreeClassifier
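For a single tree the effect can be checked directly on the fitted structure; a short sketch (not part of the commit; the dataset and seed are illustrative) compares the split features of two trees fitted with the same fixed ``random_state``.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Even with max_features equal to the number of features (4 for iris),
# feature order is permuted before each split; fixing random_state pins
# down how ties in criterion improvement are broken.
a = DecisionTreeClassifier(max_features=4, random_state=0).fit(X, y)
b = DecisionTreeClassifier(max_features=4, random_state=0).fit(X, y)

# Identical split features at every node implies identical tree structure.
assert np.array_equal(a.tree_.feature, b.tree_.feature)
```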

0 commit comments