Improvement on Permutation importance example in release highlights · Issue #17313 · scikit-learn/scikit-learn · GitHub


Closed
venkyyuvy opened this issue May 23, 2020 · 2 comments · Fixed by #17331

Comments

@venkyyuvy
Contributor

Describe the issue linked to the documentation

When I look at the example given here, I was confused about why the feature names are not sorted with respect to their importances.

Suggest a potential alternative/fix

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(random_state=0, n_features=5,
                           n_informative=3)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0,
                                n_jobs=-1)

feature_names = np.array([f'x_{i}' for i in range(X.shape[1])])

fig, ax = plt.subplots()
# Sort the boxes (and their tick labels) by mean importance so the
# labels match the plotted features.
sorted_idx = result.importances_mean.argsort()
ax.boxplot(result.importances[sorted_idx].T,
           vert=False, labels=feature_names[sorted_idx])
ax.set_title("Permutation Importance of each feature")
ax.set_ylabel("Features")
fig.tight_layout()
plt.show()


Also, for clarity, maybe we can set n_redundant=0, emphasising that permutation_importance identifies the 3 informative features precisely.

X, y = make_classification(random_state=0, n_features=5,
                           n_informative=3, n_redundant=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0,
                                n_jobs=-1)

feature_names = np.array([f'x_{i}' for i in range(X.shape[1])])

fig, ax = plt.subplots()
sorted_idx = result.importances_mean.argsort()
ax.boxplot(result.importances[sorted_idx].T,
           vert=False, labels=feature_names[sorted_idx])
ax.set_title("Permutation Importance of each feature")
ax.set_ylabel("Features")
fig.tight_layout()
plt.show()


@jnothman
Member

I agree, the ticklabels are misleading. PR welcome.

I am happy with keeping the redundant features in, but could be persuaded otherwise.

@venkyyuvy
Contributor Author
venkyyuvy commented May 25, 2020

As you know, the results of permutation_importance suffer when features are correlated. Hence, for an intro example (when n_redundant != 0 we will have near-duplicates of the same feature, i.e. very high correlation), do we really have to showcase the case that exhibits this known drawback?
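To make the correlated-features point concrete, here is a minimal sketch of my own (not from the scikit-learn docs): using n_repeated=1 puts an exact copy of an informative feature into the dataset, so permuting either column alone leaves the model with the same signal through the copy, and the reported importance of each column understates how useful the underlying signal is.

```python
# Hypothetical sketch: a duplicated (perfectly correlated) feature
# dilutes permutation importance. Assumes scikit-learn is installed.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# One informative feature plus an exact repeated copy of it.
X, y = make_classification(random_state=0, n_features=2,
                           n_informative=1, n_redundant=0,
                           n_repeated=1, n_clusters_per_class=1)
rf = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)

# Permuting one column in isolation barely hurts the model, because the
# duplicate column still carries the same information, so the importance
# is split across (and diluted between) the two copies.
print(result.importances_mean)
```

Redundant features generated with n_redundant behave similarly, since they are linear combinations of the informative ones.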
