Closed
Description
Hello,
The H2O ML framework supports an enum encoding scheme. It would be nice to have this in sklearn as well. As far as I know, no contribution has added this to sklearn's tree-based models.
This would be useful for handling categorical features without the curse-of-dimensionality issue of one-hot encoding and without implying any ordinality. It seems like a good way to find the best split on categorical features in tree-based models (e.g. random forest). LightGBM also implements this: see the section on the optimal split for categorical features in its documentation, which notes that a feature with k categories has 2^(k-1) - 1 possible partitions into two subsets.
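To make the 2^(k-1) - 1 count concrete, here is a minimal sketch that enumerates every way to split k distinct categories into two non-empty groups. The function name `binary_partitions` is hypothetical and not part of sklearn, H2O, or LightGBM. It only illustrates the size of the search space, not how any of those libraries actually search it.

```python
from itertools import combinations


def binary_partitions(categories):
    """Yield each split of distinct `categories` into two non-empty groups, once."""
    cats = list(categories)
    if len(cats) < 2:
        return  # fewer than two categories cannot be split
    first, rest = cats[0], cats[1:]
    # Pin the first category to the left group so that a split and its
    # mirror image (left and right swapped) are not counted twice.
    for r in range(len(rest)):  # r < len(rest) keeps the right group non-empty
        for extra in combinations(rest, r):
            left = {first, *extra}
            right = set(cats) - left
            yield left, right


if __name__ == "__main__":
    for k in range(2, 7):
        n_splits = sum(1 for _ in binary_partitions(range(k)))
        print(k, n_splits, 2 ** (k - 1) - 1)
```

The count roughly doubles with every extra category, which is why exhaustive enumeration is only practical for low-cardinality features.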
Does anyone have thoughts about this?