Pprett/gradient boosting #6
Conversation
@glouppe some of the tests fail due to numerical issues (an aftermath of changing the dtype). I fixed those, but I noticed a performance regression for the following benchmark::
it goes from::
to::
hmm... I think I hunted it down::
This is 4 times the usual timing due to `y` and `y_pred` having different dtypes.
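For illustration only, here is a minimal timing sketch of the kind of mixed-dtype overhead described above; it is not the original benchmark (which is not preserved in this thread), and the array size and repetition count are arbitrary assumptions:

```python
import numpy as np
from timeit import timeit

# Hypothetical data: float64 targets vs. float32 predictions (assumed sizes).
y = np.random.rand(1_000_000)                              # float64
y_pred32 = np.random.rand(1_000_000).astype(np.float32)    # float32
y_pred64 = y_pred32.astype(np.float64)                     # same dtype as y

# Same dtype: the subtraction runs without any implicit cast.
t_same = timeit(lambda: y - y_pred64, number=100)
# Mixed dtypes: y_pred32 has to be upcast to float64 on every call.
t_mixed = timeit(lambda: y - y_pred32, number=100)

print(f"same dtype:  {t_same:.3f} s")
print(f"mixed dtype: {t_mixed:.3f} s")
```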
    The error of the (best) split.
    For leaves `init_error == best_error`.

    - init_error : np.ndarray of float64
    + init_error : np.ndarray of DTYPE
Why should `init_error` or `best_error` have type DTYPE, which is the dtype of the data array? Either use np.float32 or np.float64. I tend to use np.float64 whenever possible (i.e. when memory consumption is not an issue).
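A minimal sketch of this suggestion, with `DTYPE`, the shapes, and the variable names assumed purely for illustration:

```python
import numpy as np

DTYPE = np.float32  # assumed dtype of the data array X in this sketch

n_samples, n_features, n_nodes = 100, 5, 15
X = np.empty((n_samples, n_features), dtype=DTYPE)

# Keep the per-node error bookkeeping in float64 regardless of X's dtype,
# trading a little memory for precision and uniform arithmetic.
init_error = np.zeros(n_nodes, dtype=np.float64)   # error before the split
best_error = np.zeros(n_nodes, dtype=np.float64)   # error of the best split
```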
wow... seems like 32bit floating point arithmetic in numpy is substantially slower than 64bit arithmetic::
vs 32bit::
it seems that ...
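The original timings are not preserved above; the snippet below is only a hypothetical sketch of how such a 64-bit vs. 32-bit comparison could be run (array size and repetition count are assumptions, and the outcome depends on the machine and NumPy version):

```python
import numpy as np
from timeit import timeit

a64 = np.random.rand(1_000_000)     # float64 operands
a32 = a64.astype(np.float32)        # float32 operands

t64 = timeit(lambda: np.sum(a64 * a64), number=200)
t32 = timeit(lambda: np.sum(a32 * a32), number=200)

print(f"float64: {t64:.3f} s")
print(f"float32: {t32:.3f} s")
```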
Wow, that's huge. I was not aware of this. Actually, my machine is 32-bit, which is the reason why I like to have the possibility of not using float64. I will have a deeper look at it tomorrow. I'll revert my changes if I come to no good solution.
It might be slower on 64-bit machines, but a 6-fold increase is too much.
Gilles, I just checked the other (regression) models in sklearn; it seems that only ...
Okay, I agree. I'll revert my changes tomorrow.
This reverts commit 3509e16.

Conflicts:
    sklearn/ensemble/gradient_boosting.py
    sklearn/tree/tree.py
I just pushed a revert commit.
@glouppe thanks - I updated ...
nitpick fixes, pep8 and fix math equations
Revised text classification chapter
This is my first bunch of commits regarding your PR.
I really like how you managed to remove the "terminal" mechanisms from the Tree code :)
My changes are the following:
Most of those do not actually concern the boosting module. I still have to review the `gradient_boosting.py` file in more depth (later today or tomorrow).