API proposal for losses · Issue #5044 · scikit-learn/scikit-learn · GitHub
API proposal for losses #5044

Closed
arogozhnikov opened this issue Jul 28, 2015 · 10 comments

@arogozhnikov

@glouppe @ogrisel

I was working on different modifications of gradient boosting with specific loss functions, and unfortunately I was not able to reuse scikit-learn's gradient boosting for my purposes.

My problems are not a reason to change anything in sklearn, but the following concept also resolves other issues (see below).

Loss function is an estimator

  1. A loss function has parameters.
    For instance, this is useful for different kinds of regularization.
  2. A loss function should be fitted: loss.fit(X, y, sample_weight=sample_weight)
    This means that the loss function has state (it may keep some useful information) or does some heavy precomputations.
  3. There are methods for computing the negative gradient, for updating tree leaves, etc. (see the sketch below).
  4. Since the loss is an estimator, it can be cloned.
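
A minimal sketch of what such an interface could look like (the method names init_predictions, negative_gradient and update_terminal_regions are illustrative assumptions, loosely modeled on sklearn's internal GB loss code and on hep_ml.losses, not an agreed API):

```python
import numpy as np
from sklearn.base import BaseEstimator


class FittedSquaredLoss(BaseEstimator):
    """Hypothetical loss-as-estimator; method names are illustrative only."""

    def __init__(self, l2_regularization=0.0):
        # 1. the loss has parameters (e.g. a loss-specific regularization)
        self.l2_regularization = l2_regularization

    def fit(self, X, y, sample_weight=None):
        # 2. the loss is fitted: it can keep state or do heavy precomputations
        self.baseline_ = np.average(y, weights=sample_weight)
        return self

    def init_predictions(self, X):
        # initial estimate used by the boosting loop
        return np.full(X.shape[0], self.baseline_)

    def negative_gradient(self, y, pred):
        # 3. pseudo-residuals that the next tree is fitted on
        return y - pred

    def update_terminal_regions(self, tree, X, y, pred):
        # 3. optional loss-specific leaf update; nothing to do for squared loss
        return tree

    # 4. because the class derives from BaseEstimator, sklearn.base.clone works
```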

Possible benefits

  1. Ranking algorithms implemented as loss functions
    • they require some initial work with ranks, which is done during fit
    • also, the name of the special column with ranks is passed as a parameter to the loss function, while the ranks themselves are obtained during fitting.
  2. My algorithm requires computing neighbors in some variables; this is done while fitting the loss function.
  3. Loss-specific regularizations as parameters.
  4. Flexibility: loss functions can be reused by other algorithms (e.g. I'm using them for pruning, but one could also build rankers on top of logistic regression with a modified loss function, which seems nice).

The loss function then becomes the main logic of the algorithm, but this is probably fine: there are different losses for different problems, and that is actually what makes GB so universal.

Implementation

If you're interested in implementation details, you're welcome to look at the hep_ml.losses sources; there is already an example of a ranking loss function.

@amueller
Member

What do you understand by a loss function?
When I hear "loss function", I think of just a formula, which is stateless and not estimated from data.

I don't understand 1, 2 and 4.

@arogozhnikov
Author

Hi, Andreas.

By loss function I mean the target of optimization, and in many cases this is indeed just a formula (MSE, logloss).

However, the idea of stepwise optimization in GB has been generalized many times; ranking is probably a good demonstration of this:

  • in most cases the loss involves some information from the dataset (are the samples from the same query?)
  • in many cases we cannot even write down an expression for the target, but we still use the GB scheme by providing some pseudo-residuals and building a tree on them (a sketch follows this list)
  • in some cases we also update the tree leaves, and sometimes these updates are very specific (see DirectRank as a limiting case)
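
To make this concrete, here is a rough sketch (not sklearn code) of a generic boosting loop driven entirely by such a loss object; it only needs pseudo-residuals and an optional leaf update, reusing the hypothetical method names from the sketch in the issue description:

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(loss, X, y, n_estimators=100, learning_rate=0.1,
                          sample_weight=None):
    """Generic boosting loop: every loss-specific step is delegated to `loss`."""
    loss = clone(loss).fit(X, y, sample_weight=sample_weight)
    pred = loss.init_predictions(X)
    trees = []
    for _ in range(n_estimators):
        # the loss only has to provide pseudo-residuals ...
        residuals = loss.negative_gradient(y, pred)
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        # ... and, optionally, a loss-specific update of the tree leaves
        loss.update_terminal_regions(tree, X, y, pred)
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return trees
```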

@glouppe
Contributor
glouppe commented Jul 29, 2015

Thanks for raising the issue @arogozhnikov ! In general, I agree we should improve the API of the loss functions used within our GBRT implementation. This is something I have been thinking about lately as well, and I agree we should make it easier for people to plug in new functions if they need to. Currently, doing that is quite cumbersome... I have no opinion yet on how best to do that, however -- I'll dig into what you have implemented for hep_ml.

CC: @pprett @ndawe @jmschrei who might have an opinion on this as well.

@arjoly
Member
arjoly commented Jul 29, 2015

I agree that the loss should be an estimator.

I have a private boosting implementation where the loss is a regressor. Its fit and predict methods provide the initial estimate of the gradient boosting algorithm. This fixes a lot of API issues caused by the fact that the initial estimator is a mix of inconsistent and non-API-compliant estimators.
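
A hedged sketch of what is meant here (illustrative, not the actual private code): the loss doubles as a small regressor whose fit/predict provide the initial estimate, e.g. the alpha-quantile for the pinball loss:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class QuantileLossAsRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical loss object whose fit/predict give the initial GB estimate."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha

    def fit(self, X, y, sample_weight=None):
        # the initial estimate minimizing the pinball loss is the alpha-quantile
        # (sample_weight is ignored here to keep the sketch short)
        self.init_value_ = np.quantile(y, self.alpha)
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.init_value_)

    def negative_gradient(self, y, pred):
        # pseudo-residuals of the pinball loss
        return np.where(y > pred, self.alpha, self.alpha - 1.0)
```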

However, @pprett previously raised that fixing this would break backward compatibility of serialization. That is not worth it if there is no benefit for direct users.

@jmschrei
Member

My thoughts are similar to @arjoly's. I am unsure how to handle the backwards-compatibility issue, however. I may have a more informative response when I get to that section of the code (still in _tree.pyx).

@arjoly
Member
arjoly commented Jul 29, 2015

As for backward compatibility of serialisation, it's better to have it, but in practice we make modifications if there are real benefits. Preserving serialisation between versions is a lot of trouble and a huge maintenance burden.

For backward compatibility in terms of interface (functions, classes, ...), the rule is to stay backward compatible for 2 versions to give users time to adapt. Deprecation warnings are raised in the meantime.

@glouppe
Contributor
glouppe commented Sep 8, 2015

Ping @jmschrei @arjoly : This is something to keep in mind with respect to #5212

@lorentzenchr
Member

Closing in favor of #15123, which is now implemented in #20567.

@arogozhnikov
Author

Thanks for working on deduplication of losses in sklearn @lorentzenchr.

However, the cases discussed in this issue are not covered by #15123.

I implemented a generic loss for GB in hep_ml a while ago; it covers ranking and other losses that use additional information in their formulation.

@lorentzenchr
Member

You can use

class BaseHistGradientBoosting(BaseEstimator, ABC):

and pass your own loss function in __init__. Note, however, that this is not part of the public API and may change at any time.
