ENH Allows multiclass target in `TargetEncoder` by lucyleeow · Pull Request #26674 · scikit-learn/scikit-learn · GitHub

Merged: 65 commits merged on Sep 7, 2023.

Commits:

- c8d35eb add multiclass fun, first draft (lucyleeow, Jun 23, 2023)
- 037c6bf black (lucyleeow, Jun 23, 2023)
- 14c6cf3 remove testing cruft (lucyleeow, Jun 23, 2023)
- e6b4364 Merge branch 'main' into target_encoder_multi (lucyleeow, Jun 23, 2023)
- 4fba5dd formatting (lucyleeow, Jun 23, 2023)
- 2c01358 formatting (lucyleeow, Jun 23, 2023)
- 6f7c98a reorder feature columns, group same features (lucyleeow, Jun 26, 2023)
- f773a96 fix transform_X_ordinal (lucyleeow, Jun 26, 2023)
- f9ced51 formatting (lucyleeow, Jun 26, 2023)
- 0b9dbc2 Merge branch 'main' into target_encoder_multi (lucyleeow, Jul 5, 2023)
- 41d86e8 fix target mean in cv (lucyleeow, Jul 5, 2023)
- 896dfbf Merge branch 'main' into target_encoder_multi (lucyleeow, Jul 17, 2023)
- 2f702e0 review (lucyleeow, Jul 18, 2023)
- ecd4342 Merge branch 'main' into target_encoder_multi (lucyleeow, Jul 18, 2023)
- 0365d3e typo (lucyleeow, Jul 18, 2023)
- 5bc120b fix multi_idx, remove reodering of encodings (lucyleeow, Jul 18, 2023)
- d2d37a0 formatting (lucyleeow, Jul 18, 2023)
- 6bae2c1 fix indicies, add transform multiclass fun (lucyleeow, Jul 19, 2023)
- 3358fca format (lucyleeow, Jul 19, 2023)
- 6d3be26 simplify test (lucyleeow, Jul 19, 2023)
- acec424 add comment (lucyleeow, Jul 19, 2023)
- 2b33d47 add _ to end of n_classes (lucyleeow, Jul 19, 2023)
- 96fedab test nitpicks (lucyleeow, Jul 19, 2023)
- ca46455 order X_out in place, use single transform X ordinal fun (lucyleeow, Jul 19, 2023)
- 68db78a add get_feature_names_out (lucyleeow, Jul 19, 2023)
- c2f8517 add changelog (lucyleeow, Jul 19, 2023)
- c7e9a0f format (lucyleeow, Jul 19, 2023)
- 0675f98 format (lucyleeow, Jul 19, 2023)
- e3ff49a amend doc wording (lucyleeow, Jul 19, 2023)
- 22bbd7c typo (lucyleeow, Jul 19, 2023)
- 41fad59 revert test (lucyleeow, Jul 19, 2023)
- 2fbd73a add back _learn_multiclass_encodings (lucyleeow, Jul 20, 2023)
- 2689857 carry through change in encoding order (lucyleeow, Jul 20, 2023)
- b234826 fix docstring (lucyleeow, Jul 20, 2023)
- 060cef8 minor update _learn_multiclass_encodings (lucyleeow, Jul 21, 2023)
- 5e361cd typos (lucyleeow, Jul 21, 2023)
- ae7ca8a add multiclass test (lucyleeow, Jul 21, 2023)
- 0add115 add multi to feature names out test (lucyleeow, Jul 21, 2023)
- 2488aed amend test_use_regression_target (lucyleeow, Jul 21, 2023)
- 712afee Merge branch 'main' into target_encoder_multi (lucyleeow, Jul 21, 2023)
- 6ec5950 update userguide and docstring (lucyleeow, Jul 21, 2023)
- 260db03 format (lucyleeow, Jul 21, 2023)
- 2362139 expand enc docstring (lucyleeow, Jul 21, 2023)
- f7d60fd doc fixes (lucyleeow, Jul 21, 2023)
- 36f43a5 format (lucyleeow, Jul 21, 2023)
- 58dca08 Merge branch 'main' into target_encoder_multi (lucyleeow, Aug 2, 2023)
- f22fae1 review (lucyleeow, Aug 2, 2023)
- 3465d1f review (lucyleeow, Aug 2, 2023)
- 142db3c remove binary tag (lucyleeow, Aug 2, 2023)
- dce2c56 update name off learn_encodings (lucyleeow, Aug 3, 2023)
- f99bfae format (lucyleeow, Aug 3, 2023)
- 2dca9dd revert test nitpicks (lucyleeow, Aug 8, 2023)
- a01b656 review (lucyleeow, Aug 8, 2023)
- 36b31ab Merge branch 'main' into target_encoder_multi (lucyleeow, Aug 9, 2023)
- 10b51d5 Merge branch 'main' into target_encoder_multi (lucyleeow, Aug 9, 2023)
- cdbf2ba review (lucyleeow, Sep 7, 2023)
- 2c2c23a Merge branch 'main' into target_encoder_multi (lucyleeow, Sep 7, 2023)
- 704b7b3 black (lucyleeow, Sep 7, 2023)
- f80dafe Merge branch 'main' into target_encoder_multi (ogrisel, Sep 7, 2023)
- 4cb3821 Fix merge mistake + simplify target parametrization in test (ogrisel, Sep 7, 2023)
- 53f8966 Assert rather than comment on expected assertion in test. (ogrisel, Sep 7, 2023)
- c18a4c0 Remove redundant merge artifact (ogrisel, Sep 7, 2023)
- 607cc1d Rename y_int to y_numeric in tests because it can be a flow for regre… (ogrisel, Sep 7, 2023)
- 33a381e More consistently named private method (ogrisel, Sep 7, 2023)
- 9fb9987 Use pytest to check for expected warnings when dealing with unique cl… (ogrisel, Sep 7, 2023)

13 changes: 13 additions & 0 deletions doc/modules/preprocessing.rst
@@ -903,6 +903,19 @@ computed as an empirical Bayes estimate: :math:`m=\sigma_i^2/\tau^2`, where
:math:`\sigma_i^2` is the variance of `y` with category :math:`i` and
:math:`\tau^2` is the global variance of `y`.

For multiclass classification targets, the formulation is similar to binary
classification, except that each category gets one encoding per class:

.. math::
    S_{ij} = \lambda_i\frac{n_{iY_j}}{n_i} + (1 - \lambda_i)\frac{n_{Y_j}}{n}

where :math:`S_{ij}` is the encoding for category :math:`i` and class :math:`j`,
:math:`n_{iY_j}` is the number of observations with :math:`Y=j` and category
:math:`i`, :math:`n_i` is the number of observations with category :math:`i`,
:math:`n_{Y_j}` is the number of observations with :math:`Y=j`, :math:`n` is the
number of observations, and :math:`\lambda_i` is a shrinkage factor for category
:math:`i`.
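As a rough illustration of the formula above (not the scikit-learn implementation), the following pure-Python sketch computes :math:`S_{ij}` for every category/class pair. It assumes a fixed smoothing value `m` with the shrinkage :math:`\lambda_i = n_i / (m + n_i)`; the helper name `multiclass_target_encoding` is hypothetical. It also skips the cross fitting that `TargetEncoder.fit_transform` performs.

```python
from collections import Counter


def multiclass_target_encoding(categories, y, m=1.0):
    """Sketch of S_ij = lambda_i * n_iYj / n_i + (1 - lambda_i) * n_Yj / n.

    Hypothetical helper (not part of scikit-learn). Assumes a fixed
    smoothing value ``m`` with lambda_i = n_i / (m + n_i).
    """
    n = len(y)
    n_cat = Counter(categories)              # n_i: observations per category
    n_cls = Counter(y)                       # n_{Y_j}: observations per class
    n_cat_cls = Counter(zip(categories, y))  # n_{iY_j}: per (category, class)

    encodings = {}
    for cat, n_i in n_cat.items():
        lam = n_i / (m + n_i)  # shrinkage factor lambda_i (assumed form)
        for cls, n_yj in n_cls.items():
            within = n_cat_cls[(cat, cls)] / n_i  # n_{iY_j} / n_i
            prior = n_yj / n                      # n_{Y_j} / n
            encodings[(cat, cls)] = lam * within + (1 - lam) * prior
    return encodings


cats = ["a", "a", "b", "b", "b", "c"]
y = [0, 1, 1, 2, 2, 0]
enc = multiclass_target_encoding(cats, y, m=1.0)
# Category "a": lambda = 2/3, within = 1/2, prior = 2/6, so S = 4/9.
print(round(enc[("a", 0)], 4))  # prints 0.4444
```

Because both terms are class proportions and this sketch uses a single :math:`\lambda_i` per category, each category's encodings sum to 1 across classes; with `m=0` the sketch reduces to the per-category class frequencies.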

For continuous targets, the formulation is similar to binary classification:

.. math::
6 changes: 6 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -178,6 +178,12 @@ Changelog
  to :ref:`metadata routing user guide <metadata_routing>`. :pr:`26789` by
  `Adrin Jalali`_.

:mod:`sklearn.preprocessing`
............................

- |Enhancement| :class:`preprocessing.TargetEncoder` now supports `target_type`
  'multiclass'. :pr:`26674` by :user:`Lucy Liu <lucyleeow>`.

:mod:`sklearn.model_selection`
..............................
