Cloning decision tree estimators breaks criterion objects · Issue #6420 · scikit-learn/scikit-learn · GitHub
Closed
@panisson

Description


I'm trying to implement different criteria for decision trees.
I've found that decision trees accept a Criterion object as the criterion parameter:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L335
so the easiest way to implement other criteria would be to subclass the tree._criterion.Criterion class.
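For context, the tree.py code linked above treats the two kinds of value differently inside fit: a string is looked up in an internal name-to-class table and turned into a brand-new criterion sized from the fitted n_outputs_ and n_classes_, while a Criterion object is used as passed in. The pure-Python sketch below only models that dispatch; ToyCriterion, ToyGini, ToyEntropy, TOY_CRITERIA_CLF and resolve_criterion are hypothetical stand-ins, not scikit-learn's Cython classes, and it assumes a single output.

```python
class ToyCriterion:
    """Hypothetical stand-in for a Cython criterion such as Gini."""

    def __init__(self, n_outputs, n_classes):
        self.n_outputs = n_outputs
        self.n_classes = list(n_classes)


class ToyGini(ToyCriterion):
    pass


class ToyEntropy(ToyCriterion):
    pass


# Hypothetical lookup table, modelled on the name-to-class mapping used by fit().
TOY_CRITERIA_CLF = {"gini": ToyGini, "entropy": ToyEntropy}


def resolve_criterion(criterion, y):
    """Return the criterion object one fit would use for labels y (single output)."""
    if isinstance(criterion, ToyCriterion):
        # An object is used exactly as passed in: no fresh instance is built.
        return criterion
    # A string yields a new instance sized from this fit's labels.
    n_classes = [len(set(y))]
    return TOY_CRITERIA_CLF[criterion](1, n_classes)


y = [0, 1, 1, 0, 2]
from_name = resolve_criterion("gini", y)
shared = ToyGini(1, [2])
print(type(from_name).__name__, from_name.n_classes)  # ToyGini [3]
print(resolve_criterion(shared, y) is shared)         # True
```

One consequence of this design is that a criterion object must already match the data it will be fitted on (here n_classes=[2]), because nothing rebuilds it per fit.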

The normal way to pass a criterion to a decision tree is by using its string name, and it works fine:

from sklearn import tree, model_selection, metrics, datasets
import numpy as np

X, y = datasets.make_classification(n_samples=1000, random_state=42)
cv = model_selection.KFold(n_splits=10, shuffle=True, random_state=43)

dtc = tree.DecisionTreeClassifier(criterion='gini', random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))

The mean score is 0.866.

However, if I use a Criterion object, it does not work anymore:

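# Gini is a private Cython class; n_classes must be an integer array of dtype np.intp
gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2], dtype=np.intp))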
dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))

The mean score is now 0.476.
Since cross_val_score clones the estimator for every fold, it seems that cloning the decision tree breaks the criterion object in some way, because this code is also not working:

from sklearn.base import clone

gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2], dtype=np.intp))

dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
scores = []
for train_idx, test_idx in cv.split(X, y):
    estimator = clone(dtc)
    estimator.fit(X[train_idx], y[train_idx])
    scores.append(metrics.accuracy_score(y[test_idx], estimator.predict(X[test_idx])))
print(np.mean(scores))

But if I reset the cloned estimator's criterion to the original object right after cloning, e.g.,

estimator.criterion = dtc.criterion

then the scores are back to normal.
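Some background on why resetting the attribute helps: sklearn.base.clone does not copy the estimator object itself; it reads get_params(deep=False) and builds a new estimator from those values, and any parameter value that is not itself an estimator is copied with copy.deepcopy. Unless a class defines __deepcopy__, deepcopy goes through its pickling hooks (__reduce_ex__/__reduce__), so state those hooks do not carry over is lost in the copy. The pure-Python sketch below illustrates that mechanism with hypothetical classes (ToyCriterion, ToyTree, toy_clone, clone_keeping_criterion); the deliberately lossy __reduce__ is an assumption for illustration, not a diagnosis of what actually goes wrong in Gini.

```python
import copy


class ToyCriterion:
    """Hypothetical stand-in for a criterion object (NOT sklearn's Gini)."""

    def __init__(self, n_outputs, n_classes):
        self.n_outputs = n_outputs
        self.n_classes = list(n_classes)

    def __reduce__(self):
        # Deliberately lossy for illustration: n_classes is not passed back
        # to the constructor, so copies are rebuilt with an empty list.
        return (type(self), (self.n_outputs, []))


class ToyTree:
    """Hypothetical estimator exposing a scikit-learn-style get_params()."""

    def __init__(self, criterion="gini", random_state=None):
        self.criterion = criterion
        self.random_state = random_state

    def get_params(self, deep=False):
        return {"criterion": self.criterion, "random_state": self.random_state}


def toy_clone(estimator):
    # Models the idea behind sklearn.base.clone: rebuild the estimator from
    # its constructor parameters, deep-copying each non-estimator value.
    params = {name: copy.deepcopy(value)
              for name, value in estimator.get_params(deep=False).items()}
    return type(estimator)(**params)


def clone_keeping_criterion(estimator):
    # The workaround from the issue: after cloning, point the copy back at
    # the original criterion object instead of its deep copy.
    new = toy_clone(estimator)
    new.criterion = estimator.criterion
    return new


original = ToyTree(criterion=ToyCriterion(1, [2]), random_state=42)
broken = toy_clone(original)
fixed = clone_keeping_criterion(original)
print(broken.criterion.n_classes)  # []
print(fixed.criterion.n_classes)   # [2]
```

Note that the workaround makes every clone share one criterion object, which only works as long as fits do not run concurrently on that shared object.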

I could not find where cloning breaks the criterion object; any help would be welcome.

Thanks, all, for your effort on this project; sklearn is really great!

Regards,
André

Labels: Bug, Moderate (anything that requires some knowledge of conventions and best practices)