min_impurity_split parameter of GradientBoostingRegressor is not used #9514


Closed
CyrilLeMat opened this issue Aug 9, 2017 · 10 comments
CyrilLeMat commented Aug 9, 2017

Description

min_impurity_split parameter of GradientBoostingRegressor is not used

Steps/Code to Reproduce

Example:

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
clf2 = GradientBoostingRegressor(min_impurity_split="")
clf3 = GradientBoostingRegressor(min_impurity_split=None)

Expected Results

This example should raise an error:

ValueError: min_impurity_split must be greater than or equal to 0

Actual Results

No errors

Versions

Darwin-16.5.0-x86_64-i386-64bit
Python 3.6.1 (v3.6.1:69c0db5050, Mar 21 2017, 01:21:04)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
NumPy 1.13.1
SciPy 0.19.0
Scikit-Learn 0.18.2

@jrbourbeau (Contributor)

Hey @CyrilLeMat, you're right that GradientBoostingRegressor should raise an error when min_impurity_split is < 0. However, that error is raised only when you actually call the fit method of GradientBoostingRegressor (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/tree.py#L287).

For example,

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)

Doesn't raise an error, but

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
# Make up some training data
X = [[1, 2, 3], [4, 5, 6]]
y = [0, 1]
# Fit the regressor
clf.fit(X, y)

raises ValueError: min_impurity_split must be greater than or equal to 0.
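
For context, scikit-learn estimators validate their hyperparameters at fit time rather than in __init__, so that set_params and clone can freely modify parameters before fitting. The check in sklearn/tree/tree.py is roughly the following (a paraphrased sketch, not the verbatim source):

# Sketch of the check inside DecisionTreeRegressor.fit
if self.min_impurity_split < 0.:
    raise ValueError("min_impurity_split must be greater than "
                     "or equal to 0")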

amueller closed this as completed Aug 9, 2017
@CyrilLeMat (Author) commented Aug 10, 2017

Hi,
thanks for your answer.
However, your exact code doesn't raise any error on my side; it prints:

GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
             max_leaf_nodes=None, min_impurity_split=-0.1,
             min_samples_leaf=1, min_samples_split=2,
             min_weight_fraction_leaf=0.0, n_estimators=100,
             presort='auto', random_state=None, subsample=1.0, verbose=0,
             warm_start=False)

@jnothman (Member) commented Aug 10, 2017 via email

@CyrilLeMat (Author) commented Aug 10, 2017

Hi!

Yes, I did.

I found the problem in my installed version of the package: when the DecisionTreeRegressor instances are created, the argument min_impurity_split=self.min_impurity_split is not passed. It is present in master, though:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L771

Is this something newer than 0.18.2?

(edit: sorry, it's min_impurity_split, not min_samples_split)

@jrbourbeau (Contributor)

Good catch, I should have specified that I'm using 0.20.dev0 in my previous comment. It looks like min_impurity_split was added in PR #8007.

@amueller (Member)

@CyrilLeMat but if there is no parameter of that name, you get an error, right?

@jrbourbeau (Contributor) commented Aug 11, 2017

I just tried

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
# Make up some training data
X = [[1, 2, 3], [4, 5, 6]]
y = [0, 1]
# Fit the regressor
clf.fit(X, y)

with v0.18.2 and it doesn't throw an error. There is a min_impurity_split parameter for GradientBoostingRegressor in this version. However, during fitting, the min_impurity_split value from GradientBoostingRegressor is not passed to the DecisionTreeRegressor that is induced on the residuals (where the validation of min_impurity_split occurs). So the default value for min_impurity_split (1e-7) is used instead, leading to no error being thrown for an invalid min_impurity_split value in GradientBoostingRegressor.
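
One way to observe this from the outside is to inspect the fitted trees (assuming the 0.18.2 layout, where estimators_ is an array of DecisionTreeRegressor objects with one column for regression):

from sklearn.ensemble import GradientBoostingRegressor
clf = GradientBoostingRegressor(min_impurity_split=-0.1)
clf.fit([[1, 2, 3], [4, 5, 6]], [0, 1])
# The ensemble stores the invalid value...
print(clf.min_impurity_split)                    # -0.1
# ...but in 0.18.2 each fitted tree silently used the default instead
print(clf.estimators_[0, 0].min_impurity_split)  # 1e-07

In versions where the parameter is forwarded correctly, the fit call above raises the ValueError instead of ever reaching the second print.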

@jnothman (Member) commented Aug 13, 2017 via email

@jrbourbeau (Contributor)

Yeah, it's passed correctly in v0.19
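
For reference, the per-stage tree construction in gradient_boosting.py forwards the parameter roughly like this (a simplified sketch with a trimmed argument list, not the verbatim source; the min_impurity_split line is the one that was missing in 0.18.2):

# Simplified sketch of the tree built for each boosting stage
tree = DecisionTreeRegressor(
    criterion=self.criterion,
    max_depth=self.max_depth,
    min_samples_split=self.min_samples_split,
    min_samples_leaf=self.min_samples_leaf,
    min_impurity_split=self.min_impurity_split,  # not passed in 0.18.2
    max_features=self.max_features,
    random_state=random_state)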
