@@ -1077,10 +1077,13 @@ categorical features as continuous (ordinal), which happens for ordinal-encoded
categorical data, since categories are nominal quantities where order does not
matter.

- To enable categorical support, a boolean mask can be passed to the
- `categorical_features` parameter, indicating which feature is categorical. In
- the following, the first feature will be treated as categorical and the
- second feature as numerical::
+ There are several ways to use the native categorical feature support for those
+ estimators. The simplest way is to pass the training data as a `pandas.DataFrame`
+ where the categorical features are of type `category`.
+
+ Alternatively, it is possible to pass a boolean mask to the `categorical_features`
+ parameter, indicating which feature is categorical. In the following, the first
+ feature will be treated as categorical and the second feature as numerical::

  >>> gbdt = HistGradientBoostingClassifier(categorical_features=[True, False])
@@ -1089,10 +1092,18 @@ categorical features::

  >>> gbdt = HistGradientBoostingClassifier(categorical_features=[0])

- The cardinality of each categorical feature should be less than the `max_bins`
- parameter, and each categorical feature is expected to be encoded in
- `[0, max_bins - 1]`. To that end, it might be useful to pre-process the data
- with an :class:`~sklearn.preprocessing.OrdinalEncoder` as done in
+ Finally, one can pass a list of strings indicating the names of the categorical
+ features if the training data is passed as a dataframe with string column names::
+
+   >>> gbdt = HistGradientBoostingClassifier(categorical_features=['f0'])
+
+ In any case, the cardinality of each categorical feature should be less than
+ the `max_bins` parameter, and each categorical feature is expected to be
+ encoded in `[0, max_bins - 1]`.
+
+ If the original data does not already use a numerical encoding for the
+ categorical features, it can be pre-processed with an
+ :class:`~sklearn.preprocessing.OrdinalEncoder` as done in
:ref:`sphx_glr_auto_examples_ensemble_plot_gradient_boosting_categorical.py`.

If there are missing values during training, the missing values will be