Document the advantage of our new GBDT #15392
Closed
@qinhanmin2014

I think it will be useful to document the advantage of our new GBDT (perhaps along with some benchmarks), so that users know when to use it.

Some insights from Nicolas:

  • As Guillaume said, we fully control how they interact with the scikit-learn ecosystem (cross-validation, grid search, etc.). This isn't the case for the other libraries, which may only support part of it. For instance, I'm not sure the LightGBM estimators pass our checks
  • scikit-learn is arguably more popular than LightGBM or XGBoost alone, so the estimators have more exposure by being included here
  • The APIs are significantly different. For example I really doubt our API for categorical variables will be similar to that of LightGBM.
  • not everybody has a GPU, and the CPU implementation is still an order of magnitude faster than the other GBDT estimators that we have
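
To make the first point concrete, here is a minimal sketch of the histogram-based GBDT used with `cross_val_score` and `GridSearchCV`. The dataset, the parameter grid, and the small `max_iter` are illustrative assumptions, not a benchmark. The import fallback is needed because older releases exposed the estimator only behind an experimental flag.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score

try:
    # Recent scikit-learn releases export the estimator directly.
    from sklearn.ensemble import HistGradientBoostingClassifier
except ImportError:
    # Older releases required enabling the experimental feature first.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
    from sklearn.ensemble import HistGradientBoostingClassifier

# Synthetic data, chosen only for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = HistGradientBoostingClassifier(max_iter=50, random_state=0)

# Plain cross-validation works like it does for any other estimator.
scores = cross_val_score(clf, X, y, cv=3)

# Grid search over an assumed, deliberately tiny parameter grid.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [3, None]}
search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X, y)

print("mean CV accuracy:", scores.mean())
print("best parameters:", search.best_params_)
```

The same pattern should carry over to pipelines and other meta-estimators, since the estimator follows the usual `fit`/`predict` conventions.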

Though personally I'm still not persuaded, e.g.,

  • For the first reason, I think if we really care about interacting with sklearn, perhaps a better way is to collaborate with the existing GBDT libraries instead of writing a new one. Another possible way is to be more tolerant, e.g., flatten the predictions in voting.
  • For the second reason, if we only consider GBDT, then I think xgboost, lightgbm, and catboost are more famous than scikit-learn.
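
The "be more tolerant" idea could look roughly like the sketch below. It assumes the incompatibility is that some third-party regressors return predictions of shape `(n_samples, 1)` rather than `(n_samples,)`. The helper `average_predictions` is hypothetical and is not scikit-learn's voting implementation.

```python
import numpy as np


def average_predictions(predictions):
    """Average per-estimator predictions after flattening each to 1-D.

    Hypothetical helper: accepts arrays of shape (n_samples,) or
    (n_samples, 1) and returns an array of shape (n_samples,).
    """
    flat = [np.asarray(p).ravel() for p in predictions]
    return np.mean(flat, axis=0)


# One estimator returns a 1-D array, the other a column vector.
pred_a = np.array([1.0, 2.0, 3.0])
pred_b = np.array([[3.0], [4.0], [5.0]])

avg = average_predictions([pred_a, pred_b])
print(avg)  # [2. 3. 4.]
```

Without the `ravel` step, averaging a `(3,)` array with a `(3, 1)` array would either fail or broadcast into an unintended shape, which is the kind of mismatch the suggestion is meant to absorb.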
