min_impurity_split parameter of GradientBoostingRegressor is not used #9514


Closed
CyrilLeMat opened this issue Aug 9, 2017 · 10 comments
CyrilLeMat commented Aug 9, 2017

Description

min_impurity_split parameter of GradientBoostingRegressor is not used

Steps/Code to Reproduce

Example:

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
clf2 = GradientBoostingRegressor(min_impurity_split="")
clf3 = GradientBoostingRegressor(min_impurity_split=None)

Expected Results

This example should raise an error:

ValueError: min_impurity_split must be greater than or equal to 0

Actual Results

No errors

Versions

Darwin-16.5.0-x86_64-i386-64bit
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
NumPy 1.13.1
SciPy 0.19.0
Scikit-Learn 0.18.2

@jrbourbeau (Contributor)

Hey @CyrilLeMat, you're right that GradientBoostingRegressor should raise an error when min_impurity_split is < 0. However, that error is raised only when you actually call the fit method of GradientBoostingRegressor (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L287).

For example,

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)

Doesn't raise an error, but

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
# Make up some training data
X = [[1, 2, 3], [4, 5, 6]]
y = [0, 1]
# Fit the regressor
clf.fit(X, y)

raises ValueError: min_impurity_split must be greater than or equal to 0.
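
For context, scikit-learn estimators validate their hyperparameters at fit time rather than in __init__, so that set_params and clone can freely modify parameters before fitting. The check in sklearn/tree/tree.py is roughly the following (a paraphrased sketch, not the verbatim source):

# Sketch of the check inside DecisionTreeRegressor.fit
if self.min_impurity_split < 0.:
    raise ValueError("min_impurity_split must be greater than "
                     "or equal to 0")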

amueller closed this as completed Aug 9, 2017
@CyrilLeMat (Author) commented Aug 10, 2017

Hi,
thanks for your answer.
However, your exact code doesn't raise any error on my side; it prints:

GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_split=-0.1,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=100,
             presort='auto', random_state=None, subsample=1.0, verbose=0,
             warm_start=False)

@jnothman (Member) commented Aug 10, 2017 via email

@CyrilLeMat (Author) commented Aug 10, 2017

Hi!

Yes, I did.

I found the problem in my installed version of the package: when the DecisionTreeRegressor instances are created, the argument min_impurity_split=self.min_impurity_split is not passed. It is present in master, though:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L771

Is this something newer than 0.18.2?

(edit: sorry, it's min_impurity_split, not min_samples_split)

@jrbourbeau (Contributor)

Good catch, I should have specified that I'm using 0.20.dev0 in my previous comment. It looks like min_impurity_split was added in PR #8007.

@amueller (Member)

@CyrilLeMat but if there is no parameter of that name, you get an error, right?

@jrbourbeau (Contributor) commented Aug 11, 2017

I just tried

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
# Make up some training data
X = [[1, 2, 3], [4, 5, 6]]
y = [0, 1]
# Fit the regressor
clf.fit(X, y)

with v0.18.2 and it doesn't throw an error. There is a min_impurity_split parameter for GradientBoostingRegressor in this version. However, during fitting, the min_impurity_split value from GradientBoostingRegressor is not passed to the DecisionTreeRegressor that is induced on the residuals (where the validation of min_impurity_split occurs). So the default value for min_impurity_split (1e-7) is used instead, leading to no error being thrown for an invalid min_impurity_split value in GradientBoostingRegressor.
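
One way to observe this from the outside is to inspect the fitted trees (assuming the 0.18.2 layout, where estimators_ is an array of DecisionTreeRegressor objects with one column for regression):

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
clf.fit([[1, 2, 3], [4, 5, 6]], [0, 1])
# The ensemble stores the invalid value...
print(clf.min_impurity_split)                    # -0.1
# ...but in 0.18.2 each fitted tree silently used the default instead
print(clf.estimators_[0, 0].min_impurity_split)  # 1e-07

In versions where the parameter is forwarded correctly, the fit call above raises the ValueError instead of ever reaching the second print.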

@jnothman (Member) commented Aug 13, 2017 via email

@jrbourbeau (Contributor)

Yeah, it's passed correctly in v0.19
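
For reference, the per-stage tree construction in gradient_boosting.py forwards the parameter roughly like this (a simplified sketch with a trimmed argument list, not the verbatim source; the min_impurity_split line is the one that was missing in 0.18.2):

# Simplified sketch of the tree built for each boosting stage
tree = DecisionTreeRegressor(
    criterion=self.criterion,
    max_depth=self.max_depth,
    min_samples_split=self.min_samples_split,
    min_samples_leaf=self.min_samples_leaf,
    min_impurity_split=self.min_impurity_split,  # not passed in 0.18.2
    max_features=self.max_features,
    random_state=random_state)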
