Revert "[MRG] ENH apply sparse_threshold even if all columns are spar… · xhluca/scikit-learn@fa3352e · GitHub

Commit fa3352e

Xing committed

Revert "[MRG] ENH apply sparse_threshold even if all columns are sparse (scikit-learn#12304)"

This reverts commit 1dc7cc0.

1 parent f5dd86c · commit fa3352e

3 files changed: +12 additions, -13 deletions

doc/whats_new/v0.20.rst

Lines changed: 0 additions & 4 deletions
@@ -34,10 +34,6 @@ Changelog
   columns with types not convertible to a numeric.
   :issue:`11912` by :user:`Adrin Jalali <adrinjalali>`.
 
-- |API| :class:`compose.ColumnTransformer` now applies the ``sparse_threshold``
-  even if all transformation results are sparse. :issue:`12304` by `Andreas
-  Müller`_.
-
 :mod:`sklearn.datasets`
 ............................
 

sklearn/compose/_column_transformer.py

Lines changed: 9 additions & 6 deletions
@@ -85,11 +85,12 @@ class ColumnTransformer(_BaseComposition, TransformerMixin):
         estimator must support `fit` and `transform`.
 
     sparse_threshold : float, default = 0.3
-        If the output of the different transfromers contains sparse matrices,
-        these will be stacked as a sparse matrix if the overall density is
-        lower than this value. Use ``sparse_threshold=0`` to always return
-        dense. When the transformed output consists of all dense data, the
-        stacked result will be dense, and this keyword will be ignored.
+        If the transformed output consists of a mix of sparse and dense data,
+        it will be stacked as a sparse matrix if the density is lower than this
+        value. Use ``sparse_threshold=0`` to always return dense.
+        When the transformed output consists of all sparse or all dense data,
+        the stacked result will be sparse or dense, respectively, and this
+        keyword will be ignored.
 
     n_jobs : int or None, optional (default=None)
         Number of jobs to run in parallel.
@@ -455,7 +456,9 @@ def fit_transform(self, X, y=None):
         Xs, transformers = zip(*result)
 
         # determine if concatenated output will be sparse or not
-        if any(sparse.issparse(X) for X in Xs):
+        if all(sparse.issparse(X) for X in Xs):
+            self.sparse_output_ = True
+        elif any(sparse.issparse(X) for X in Xs):
             nnz = sum(X.nnz if sparse.issparse(X) else X.size for X in Xs)
             total = sum(X.shape[0] * X.shape[1] if sparse.issparse(X)
                         else X.size for X in Xs)
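The restored decision can be sketched as a standalone function. This is a minimal sketch, not scikit-learn's code: the name `stacked_output_is_sparse` is hypothetical, and the final comparison `density < sparse_threshold` is an assumption taken from the docstring wording ("lower than this value"), because the hunk above is cut off before that comparison.

```python
import numpy as np
from scipy import sparse


def stacked_output_is_sparse(Xs, sparse_threshold=0.3):
    """Hypothetical helper mirroring the restored rule in fit_transform.

    Xs is a sequence of 2-D blocks (NumPy arrays or SciPy sparse matrices).
    """
    # All blocks sparse: keep the output sparse regardless of the threshold.
    if all(sparse.issparse(X) for X in Xs):
        return True
    # Mix of sparse and dense blocks: decide by overall density.
    if any(sparse.issparse(X) for X in Xs):
        # Same counting as the diff: every entry of a dense block counts.
        nnz = sum(X.nnz if sparse.issparse(X) else X.size for X in Xs)
        total = sum(X.shape[0] * X.shape[1] if sparse.issparse(X)
                    else X.size for X in Xs)
        density = nnz / total
        # Assumption: strict "lower than" comparison, per the docstring.
        return density < sparse_threshold
    # All blocks dense: the output is dense and the threshold is ignored.
    return False


eye = sparse.identity(2, format="csr")  # 2x2 sparse block, nnz = 2
dense = np.ones((2, 2))                 # 2x2 dense block, counted as 4

print(stacked_output_is_sparse([eye, eye], sparse_threshold=0.2))    # True
print(stacked_output_is_sparse([dense, eye], sparse_threshold=0.2))  # False
print(stacked_output_is_sparse([dense, eye], sparse_threshold=1))    # True
print(stacked_output_is_sparse([dense, dense], sparse_threshold=1))  # False
```

Under this rule, `sparse_threshold=0` makes mixed output dense (no density is below 0), but all-sparse output still stays sparse, which is exactly the case that the reverted commit had changed.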

sklearn/compose/tests/test_column_transformer.py

Lines changed: 3 additions & 3 deletions
@@ -402,13 +402,13 @@ def test_column_transformer_sparse_threshold():
     X_array = np.array([['a', 'b'], ['A', 'B']], dtype=object).T
     # above data has sparsity of 4 / 8 = 0.5
 
-    # apply threshold even if all sparse
+    # if all sparse, keep sparse (even if above threshold)
     col_trans = ColumnTransformer([('trans1', OneHotEncoder(), [0]),
                                    ('trans2', OneHotEncoder(), [1])],
                                   sparse_threshold=0.2)
     res = col_trans.fit_transform(X_array)
-    assert not sparse.issparse(res)
-    assert not col_trans.sparse_output_
+    assert sparse.issparse(res)
+    assert col_trans.sparse_output_
 
     # mixed -> sparsity of (4 + 2) / 8 = 0.75
     for thres in [0.75001, 1]:
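The figures in the test comments (which call the quantity "sparsity", although it is the fraction of stored entries, i.e. the density) can be checked by hand. A minimal sketch, assuming each one-hot-encoded column of the 2×2 `X_array` yields a 2×2 sparse block with one nonzero per row, and assuming the mixed case pairs one such sparse block with a dense 2×2 block; `block_density` is a hypothetical name that reuses the `nnz`/`total` expressions from the diff.

```python
import numpy as np
from scipy import sparse


def block_density(Xs):
    """Hypothetical helper: overall density with the counting used in the diff."""
    nnz = sum(X.nnz if sparse.issparse(X) else X.size for X in Xs)
    total = sum(X.shape[0] * X.shape[1] if sparse.issparse(X)
                else X.size for X in Xs)
    return nnz / total


# One-hot encoding a column with two distinct values over two rows gives a
# 2x2 block with exactly one nonzero per row (the identity pattern here).
onehot = sparse.csr_matrix(np.eye(2))
# The same values stored densely: X.size counts all 4 entries, zeros included.
dense_onehot = np.eye(2)

print(block_density([onehot, onehot]))        # 0.5  -> 4 / 8
print(block_density([dense_onehot, onehot]))  # 0.75 -> (4 + 2) / 8
```

With `sparse_threshold=0.2`, the all-sparse case (density 0.5) stays sparse only because the restored rule bypasses the threshold, while the mixed case (density 0.75) can become sparse only for thresholds strictly above 0.75, which is consistent with the test looping over `[0.75001, 1]`.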
