Description
I'm trying to implement different criteria for decision trees.
I've found that decision trees can accept a Criterion object as the criterion parameter:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L335
The easiest way to implement other criteria would therefore be to subclass the tree._criterion.Criterion class.
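For reference, the built-in criteria are exposed (as a private implementation detail) as Cython extension types in that module; a quick check, assuming the module layout of the master branch linked above:
from sklearn.tree import _criterion
# Base extension type a new criterion would subclass (in Cython, if I understand
# the code correctly, since the hot-path methods are cdef and cannot be
# overridden from pure Python):
print(_criterion.Criterion)
# Built-in classification criteria:
print(_criterion.Gini, _criterion.Entropy)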
The normal way to pass a criterion to a decision tree is by using its string name, and it works fine:
from sklearn import tree, model_selection, metrics, datasets
import numpy as np
X, y = datasets.make_classification(n_samples=1000, random_state=42)
cv = model_selection.KFold(n_splits=10, shuffle=True, random_state=43)
dtc = tree.DecisionTreeClassifier(criterion='gini', random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))
The mean score is 0.866.
However, if I use a Criterion object, it does not work anymore:
gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
print(np.mean(model_selection.cross_val_score(dtc, X, y, cv=cv)))
The mean score is now 0.476.
It seems that cloning the decision tree breaks the criterion object in some way, because this code does not work either:
from sklearn.base import clone
gini = tree._criterion.Gini(n_outputs=1, n_classes=np.array([2]))
dtc = tree.DecisionTreeClassifier(criterion=gini, random_state=42)
scores = []
for train_idx, test_idx in cv.split(X, y):
    estimator = clone(dtc)
    estimator.fit(X[train_idx], y[train_idx])
    scores.append(metrics.accuracy_score(y[test_idx], estimator.predict(X[test_idx])))
print(np.mean(scores))
However, if I reset the criterion object of the estimator with, e.g.,
estimator.criterion = dtc.criterion
then the score values are back to normal.
I could not find where cloning breaks the criterion object; any help would be welcome.
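In case it helps narrow things down: as far as I can tell, clone() deep-copies every constructor parameter, so each cloned tree receives copy.deepcopy(gini) rather than the original object. A minimal check (just a sketch, assuming that behaviour of sklearn.base.clone) of whether the deep copy alone already degrades the fit:
import copy
# clone() ends up calling copy.deepcopy on non-estimator parameters,
# so this reproduces what each cloned tree receives as its criterion:
gini_copy = copy.deepcopy(gini)
dtc_copy = tree.DecisionTreeClassifier(criterion=gini_copy, random_state=42)
dtc_copy.fit(X, y)
# compare against a tree fitted with the original gini object
print(metrics.accuracy_score(y, dtc_copy.predict(X)))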
Thanks to all for your effort on this project; sklearn is really great!
regards
André