FIX missing_indices were calculated twice in OrdinalEncoder (#27017) · TamaraAtanasoska/scikit-learn@6ef7523

Commit 6ef7523
xuefeng-xu authored and thomasjpfan committed
FIX missing_indices were calculated twice in OrdinalEncoder (scikit-learn#27017)
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
1 parent c6777ae commit 6ef7523

2 files changed: +7 -7 lines changed

doc/whats_new/v1.4.rst

Lines changed: 4 additions & 0 deletions
@@ -167,6 +167,10 @@ Changelog
 :mod:`sklearn.preprocessing`
 ............................
 
+- |Efficiency| :class:`preprocessing.OrdinalEncoder` avoids calculating
+  missing indices twice to improve efficiency.
+  :pr:`27017` by `Xuefeng Xu <xuefeng-xu>`.
+
 - |Fix| :class:`preprocessing.OneHotEncoder` shows a more informative error message
   when `sparse_output=True` and the output is configured to be pandas.
   :pr:`26931` by `Thomas Fan`_.

sklearn/preprocessing/_encoders.py

Lines changed: 3 additions & 7 deletions
@@ -1508,15 +1508,11 @@ def fit(self, X, y=None):
                 if infrequent is not None:
                     cardinalities[feature_idx] -= len(infrequent)
 
-        # stores the missing indices per category
-        self._missing_indices = {}
+        # missing values are not considered part of the cardinality
+        # when considering unknown categories or encoded_missing_value
         for cat_idx, categories_for_idx in enumerate(self.categories_):
-            for i, cat in enumerate(categories_for_idx):
+            for cat in categories_for_idx:
                 if is_scalar_nan(cat):
-                    self._missing_indices[cat_idx] = i
-
-                    # missing values are not considered part of the cardinality
-                    # when considering unknown categories or encoded_missing_value
                     cardinalities[cat_idx] -= 1
                     continue
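
The loop left in place by this hunk can be illustrated outside of scikit-learn. The following is a minimal sketch, not the library's implementation: `adjust_cardinalities` is a hypothetical helper name, and `is_scalar_nan` here is a simplified stand-in for the utility imported in `_encoders.py`. It only shows that, once `_missing_indices` is no longer rebuilt here, the loop's remaining job is to subtract NaN entries from each feature's cardinality.

```python
import math
import numbers


def is_scalar_nan(x):
    # Simplified stand-in (assumption): True only for a real-valued NaN scalar.
    return (
        isinstance(x, numbers.Real)
        and not isinstance(x, bool)
        and math.isnan(x)
    )


def adjust_cardinalities(categories):
    # Hypothetical helper mirroring the loop kept by the diff above.
    # Start from the number of categories seen for each feature.
    cardinalities = [len(cats) for cats in categories]
    # Missing values are not counted as part of the cardinality.
    for cat_idx, categories_for_idx in enumerate(categories):
        for cat in categories_for_idx:
            if is_scalar_nan(cat):
                cardinalities[cat_idx] -= 1
                continue
    return cardinalities


# Feature 0 has a NaN category, feature 1 does not.
categories = [["a", "b", float("nan")], [1.0, 2.0]]
print(adjust_cardinalities(categories))  # [2, 2]
```

The index of the missing category is no longer recorded in this loop, so the plain `for cat in ...` iteration suffices and `enumerate` is not needed for the inner loop.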
