I think it would be useful to document the advantages of our new GBDT (perhaps along with some benchmarks), so that users know when to use it.
Some insights from Nicolas:
- As Guillaume said, we can 100% control how these estimators interact with the scikit-learn ecosystem (cross-validation, grid search, etc.; see the first sketch after this list). This isn't the case for the other libraries, which may only support part of it. Typically, I'm not sure the LightGBM estimators pass our estimator checks.
- scikit-learn is arguably more popular than LightGBM or XGBoost alone, so the estimators have more exposure by being included here
- The APIs are significantly different. For example, I really doubt our API for categorical variables will be similar to LightGBM's.
- Not everybody has a GPU, and the CPU implementation is still an order of magnitude faster than the other GBDT estimators that we already have (see the timing sketch after this list).
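
To make the first point concrete, here is a minimal sketch of how the built-in estimator plugs into cross-validation, grid search, and the common estimator checks. It assumes a recent scikit-learn; on versions before 1.0 the class is experimental and needs `from sklearn.experimental import enable_hist_gradient_boosting` first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.utils.estimator_checks import check_estimator

X, y = make_classification(n_samples=1000, random_state=0)
clf = HistGradientBoostingClassifier(random_state=0)

# Cross-validation and grid search work out of the box.
print(cross_val_score(clf, X, y, cv=5).mean())
grid = GridSearchCV(clf, {"max_iter": [50, 100], "learning_rate": [0.05, 0.1]}, cv=3)
grid.fit(X, y)
print(grid.best_params_)

# Runs the common scikit-learn estimator checks against the estimator;
# raises if any check fails.
check_estimator(HistGradientBoostingClassifier())
```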
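And a rough timing sketch for the CPU-speed claim, comparing a single fit of the legacy `GradientBoostingClassifier` against `HistGradientBoostingClassifier` with default settings. The dataset size and parameters are arbitrary choices for illustration, not a rigorous benchmark.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              HistGradientBoostingClassifier)

X, y = make_classification(n_samples=50_000, n_features=20, random_state=0)

# Time a single fit of each estimator on the same synthetic data.
for Est in (GradientBoostingClassifier, HistGradientBoostingClassifier):
    tic = time.perf_counter()
    Est(random_state=0).fit(X, y)
    print(f"{Est.__name__}: {time.perf_counter() - tic:.1f}s")
```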
Though personally I'm still not persuaded; for example:
- Regarding the first reason: if we really care about interoperability with scikit-learn, perhaps a better way is to collaborate with the existing GBDT libraries instead of writing a new one (see the sketch after this list). Another possible way is to be more tolerant on our side, e.g., flattening the predictions in voting estimators.
- Regarding the second reason: if we only consider GBDTs, then I think XGBoost, LightGBM, and CatBoost are more famous than scikit-learn.
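
As a sketch of that "collaborate instead" alternative: LightGBM already ships a scikit-learn-compatible wrapper, `LGBMClassifier`, which works with `cross_val_score` out of the box (assuming the `lightgbm` package is installed), even if it may not pass every check in `sklearn.utils.estimator_checks`.

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)

# The wrapper follows the estimator API closely enough for
# cross-validation to just work.
print(cross_val_score(LGBMClassifier(random_state=0), X, y, cv=5).mean())
```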