@@ -667,6 +667,22 @@ constant-width bins. The 'quantile' strategy uses the quantiles values to have
667
667
equally populated bins in each feature. The 'kmeans' strategy defines bins based
668
668
on a k-means clustering procedure performed on each feature independently.
669
669
670
+ Be aware that one can specify custom bins by passing a callable defining the
671
+ discretization strategy to :class: `~sklearn.preprocessing.FunctionTransformer `.
672
+ For instance, we can use the Pandas function :func: `pandas.cut `::
673
+
674
+ >>> import pandas as pd
675
+ >>> import numpy as np
676
+ >>> bins = [0, 1, 13, 20, 60, np.inf]
677
+ >>> labels = ['infant', 'kid', 'teen', 'adult', 'senior citizen']
678
+ >>> transformer = preprocessing.FunctionTransformer(
679
+ ... pd.cut, kw_args={'bins': bins, 'labels': labels, 'retbins': False}
680
+ ... )
681
+ >>> X = np.array([0.2, 2, 15, 25, 97])
682
+ >>> transformer.fit_transform(X)
683
+ ['infant', 'kid', 'teen', 'adult', 'senior citizen']
684
+ Categories (5, object): ['infant' < 'kid' < 'teen' < 'adult' < 'senior citizen']
685
+
670
686
.. topic :: Examples:
671
687
672
688
* :ref: `sphx_glr_auto_examples_preprocessing_plot_discretization.py `
0 commit comments