DOC clarify feature importance calculation (#11521) · scikit-learn/scikit-learn@99f4fea

Commit 99f4fea

deniederhut authored and jnothman committed
DOC clarify feature importance calculation (#11521)
1 parent c1738a3 commit 99f4fea

1 file changed, +14 -3 lines changed


doc/modules/ensemble.rst

Lines changed: 14 additions & 3 deletions
@@ -260,11 +260,16 @@ respect to the predictability of the target variable. Features used at
 the top of the tree contribute to the final prediction decision of a
 larger fraction of the input samples. The **expected fraction of the
 samples** they contribute to can thus be used as an estimate of the
-**relative importance of the features**.
+**relative importance of the features**. In scikit-learn, the fraction of
+samples a feature contributes to is combined with the decrease in impurity
+from splitting them to create a normalized estimate of the predictive power
+of that feature.
 
-By **averaging** those expected activity rates over several randomized
+By **averaging** the estimates of predictive ability over several randomized
 trees one can **reduce the variance** of such an estimate and use it
-for feature selection.
+for feature selection. This is known as the mean decrease in impurity, or MDI.
+Refer to [L2014]_ for more information on MDI and feature importance
+evaluation with Random Forests.
 
 The following example shows a color-coded representation of the relative
 importances of each individual pixel for a face recognition task using
@@ -288,6 +293,12 @@ to the prediction function.
 
 .. _random_trees_embedding:
 
+.. topic:: References
+
+ .. [L2014] G. Louppe,
+         "Understanding Random Forests: From Theory to Practice",
+         PhD Thesis, U. of Liege, 2014.
+
 Totally Random Trees Embedding
 ------------------------------
 
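
The added documentation text describes MDI as the fraction of samples reaching a split combined with the impurity decrease of that split, normalized per tree and averaged over the forest. The sketch below is not part of this commit. It is an illustration that recomputes the quantity from the fitted trees' public ``tree_`` arrays and compares it with ``feature_importances_``. The helper name ``tree_mdi``, the synthetic dataset, and all parameter values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_mdi(estimator, n_features):
    """Hypothetical helper: normalized MDI for one fitted decision tree."""
    t = estimator.tree_
    importances = np.zeros(n_features)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node: no split, no impurity decrease
            continue
        # Sample-weighted impurity decrease produced by this split.
        decrease = (
            t.weighted_n_node_samples[node] * t.impurity[node]
            - t.weighted_n_node_samples[left] * t.impurity[left]
            - t.weighted_n_node_samples[right] * t.impurity[right]
        )
        importances[t.feature[node]] += decrease
    total = importances.sum()
    # Normalize so the importances of one tree sum to 1.
    return importances / total if total > 0 else importances


# Synthetic data; sizes and seeds are arbitrary choices for the sketch.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, random_state=0
)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Average the per-tree estimates, then renormalize across features.
per_tree = [tree_mdi(est, X.shape[1]) for est in forest.estimators_]
manual = np.mean(per_tree, axis=0)
manual = manual / manual.sum()

print(np.allclose(manual, forest.feature_importances_))
```

If the comparison holds, the manual average agrees with the forest's attribute up to floating-point error, which is consistent with the "normalized estimate" wording introduced in the diff above.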
