change ColumnTransformer input order by adrinjalali · Pull Request #12396 · scikit-learn/scikit-learn · GitHub


change ColumnTransformer input order #12396


Closed · wants to merge 41 commits

Commits
a5e94d8
change make_column_transformer input order
adrinjalali Oct 16, 2018
2e89237
add whats_new entry
adrinjalali Oct 16, 2018
8c6e0b6
fix user guide
adrinjalali Oct 16, 2018
167fa09
fix more tests
adrinjalali Oct 16, 2018
f0d8b01
don't accept a mix of input order
adrinjalali Oct 16, 2018
034fb61
check all given tuples, but don't accept a mixture
adrinjalali Oct 16, 2018
ed01669
change make_column_transformer input order
adrinjalali Oct 16, 2018
d574883
add whats_new entry
adrinjalali Oct 16, 2018
45e9f8b
fix user guide
adrinjalali Oct 16, 2018
2b4f957
fix more tests
adrinjalali Oct 16, 2018
97e4ff1
don't accept a mix of input order
adrinjalali Oct 16, 2018
2447956
check all given tuples, but don't accept a mixture
adrinjalali Oct 16, 2018
664f3bb
changing the order to column, transformer
adrinjalali Oct 22, 2018
e344d71
all tests pass
adrinjalali Oct 22, 2018
67a6e07
add DeprecatoinWarning and its test
adrinjalali Oct 22, 2018
ddd5de7
remove extra code
adrinjalali Oct 22, 2018
5750789
fix docstrings
adrinjalali Oct 22, 2018
76c10bd
merge conflicts
adrinjalali Oct 22, 2018
6a44a82
remove extra line
adrinjalali Oct 22, 2018
b4ee409
fix whats_new entry
adrinjalali Oct 22, 2018
c06d820
revert make_column_transformer change and apply Joel's comments
adrinjalali Oct 25, 2018
512b49a
fix make_column_transformer's docstring example
adrinjalali Oct 25, 2018
06608a8
add missing _validate_transformers to _validate_tuple_order
adrinjalali Oct 25, 2018
94e9c6b
restore the original tuple order in case the swap doesn't work
adrinjalali Oct 25, 2018
ff66fa8
remove unnecessary `pass`
adrinjalali Oct 25, 2018
958dd1f
fix some of the compose.rst examples
adrinjalali Oct 25, 2018
834f96b
add a test and fix whats_new entry
adrinjalali Oct 26, 2018
c2b80ec
fix example codes
adrinjalali Oct 28, 2018
6eb5f03
Merge branch 'master' into column_transformer_order
amueller Oct 29, 2018
008f6c7
improve the warning message
adrinjalali Oct 30, 2018
e710ecf
simplify test
adrinjalali Oct 31, 2018
3269769
fix flake8
adrinjalali Nov 6, 2018
7363526
explain the deprecation phase in the whats_new
adrinjalali Nov 7, 2018
581b300
further clarification in whats_new
adrinjalali Nov 7, 2018
ee58432
check transform for no warnings
adrinjalali Nov 7, 2018
cc95da1
raise a hard error in case the wrong tuple order is passed.
adrinjalali Nov 10, 2018
64adfd5
Merge remote-tracking branch 'upstream/master' into api/ct-order
adrinjalali Nov 12, 2018
d42f1b3
towards more backward compatibility
adrinjalali Nov 13, 2018
c9f983b
almost fully backward compatible
adrinjalali Nov 13, 2018
43a79cc
make_column_transformer uses the deprecated tuple order by default
adrinjalali Nov 13, 2018
faf1ade
warn in make_column_transformer
adrinjalali Nov 13, 2018
12 changes: 6 additions & 6 deletions doc/modules/compose.rst
@@ -418,8 +418,8 @@ By default, the remaining rating columns are ignored (``remainder='drop'``)::
>>> from sklearn.compose import ColumnTransformer
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> column_trans = ColumnTransformer(
-... [('city_category', CountVectorizer(analyzer=lambda x: [x]), 'city'),
-... ('title_bow', CountVectorizer(), 'title')],
+... [('city_category', 'city', CountVectorizer(analyzer=lambda x: [x])),
+... ('title_bow', 'title', CountVectorizer())],
... remainder='drop')

>>> column_trans.fit(X) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
@@ -458,8 +458,8 @@ We can keep the remaining rating columns by setting
transformation::

>>> column_trans = ColumnTransformer(
-... [('city_category', CountVectorizer(analyzer=lambda x: [x]), 'city'),
-... ('title_bow', CountVectorizer(), 'title')],
+... [('city_category', 'city', CountVectorizer(analyzer=lambda x: [x])),
+... ('title_bow', 'title', CountVectorizer())],
... remainder='passthrough')

>>> column_trans.fit_transform(X)
@@ -475,8 +475,8 @@ the transformation::

>>> from sklearn.preprocessing import MinMaxScaler
>>> column_trans = ColumnTransformer(
-... [('city_category', CountVectorizer(analyzer=lambda x: [x]), 'city'),
-... ('title_bow', CountVectorizer(), 'title')],
+... [('city_category', 'city', CountVectorizer(analyzer=lambda x: [x])),
+... ('title_bow', 'title', CountVectorizer())],
... remainder=MinMaxScaler())

>>> column_trans.fit_transform(X)[:, -2:]
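The three compose.rst hunks differ only in the `remainder` argument. As a rough illustration of what `remainder='drop'` versus `remainder='passthrough'` means under the proposed `(name, columns, transformer)` order, here is a toy pure-Python model. It is a sketch under stated assumptions, not scikit-learn's implementation: `apply_column_transformers` and `title_word_count` are hypothetical names, rows are plain dicts, a "transformer" is any function returning one feature list per row, and passing an estimator such as `MinMaxScaler()` as `remainder` is not modeled.

```python
def apply_column_transformers(rows, transformers, remainder="drop"):
    """Toy model of ColumnTransformer's remainder handling (hypothetical helper).

    rows: list of dicts mapping column name -> value.
    transformers: (name, column, func) triples, in the order this PR proposes;
    func maps one column's values to one feature list per row.
    """
    if remainder not in ("drop", "passthrough"):
        raise ValueError("this sketch only supports 'drop' and 'passthrough'")
    used = set()
    out = [[] for _ in rows]
    for _name, column, func in transformers:
        used.add(column)
        features = func([row[column] for row in rows])
        for out_row, feats in zip(out, features):
            out_row.extend(feats)
    if remainder == "passthrough" and rows:
        # Columns no transformer touched are appended after all transformer
        # outputs, in their original order.
        rest = [c for c in rows[0] if c not in used]
        for out_row, row in zip(out, rows):
            out_row.extend(row[c] for c in rest)
    return out


def title_word_count(values):
    # Hypothetical stand-in for CountVectorizer: one feature (word count) per row.
    return [[len(v.split())] for v in values]


X = [
    {"city": "London", "title": "His Last Bow", "expert_rating": 5, "user_rating": 4},
    {"city": "Paris", "title": "Les Miserables", "expert_rating": 3, "user_rating": 5},
]
spec = [("title_len", "title", title_word_count)]

print(apply_column_transformers(X, spec, remainder="drop"))
# [[3], [2]]
print(apply_column_transformers(X, spec, remainder="passthrough"))
# [[3, 'London', 5, 4], [2, 'Paris', 3, 5]]
```

With `'drop'` only the transformed `title` features survive; with `'passthrough'` the untouched columns are carried through unchanged at the end of each row.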
8 changes: 8 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -48,6 +48,14 @@ Changelog
even if all transformation results are sparse. :issue:`12304` by `Andreas
Müller`_.

+- |API| :class:`compose.ColumnTransformer` now expects
+`(name, columns, transformer)` instead of `(name, transformer, columns)`
+as the first argument to the constructor.
+The support for the `(name, transformer, columns)` order will be removed
+in v0.22, and until then, a `UserWarning` is raised if the deprecated
+input order is given.
+:issue:`12339` by :user:`Adrin Jalali <adrinjalali>`.
+
:mod:`sklearn.datasets`
............................

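The changelog entry describes a transition period in which both tuple orders are accepted and the deprecated one triggers a `UserWarning`, and the commit history mentions checking every tuple and rejecting a mixture of the two orders. A minimal sketch of that validation, assuming a simple duck-typing heuristic to tell a transformer from a column selection; `_looks_like_transformer`, `normalize_tuple_order` and `Identity` are hypothetical names, not the PR's code:

```python
import warnings


def _looks_like_transformer(obj):
    # Hypothetical heuristic: the special strings "drop"/"passthrough", or any
    # object with fit (or fit_transform) and transform methods.
    if isinstance(obj, str):
        return obj in ("drop", "passthrough")
    has_fit = hasattr(obj, "fit") or hasattr(obj, "fit_transform")
    return has_fit and hasattr(obj, "transform")


def normalize_tuple_order(transformers):
    """Return the tuples as (name, columns, transformer) triples.

    Every tuple is classified on its own; a list mixing both orders is
    rejected, and the deprecated (name, transformer, columns) order is
    accepted with a UserWarning and swapped.
    """
    orders = set()
    for name, second, third in transformers:
        second_is_t = _looks_like_transformer(second)
        third_is_t = _looks_like_transformer(third)
        if third_is_t and not second_is_t:
            orders.add("new")
        elif second_is_t and not third_is_t:
            orders.add("deprecated")
        else:
            raise ValueError(f"cannot tell columns from transformer in {name!r}")
    if len(orders) > 1:
        raise ValueError("all tuples must use the same order, got a mixture")
    if orders == {"deprecated"}:
        warnings.warn(
            "(name, transformer, columns) is deprecated; "
            "pass (name, columns, transformer) instead",
            UserWarning,
        )
        return [(name, columns, trans) for name, trans, columns in transformers]
    return list(transformers)


class Identity:
    # Minimal stand-in transformer for the usage example.
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


scaler = Identity()
new_style = [("num", ["age"], scaler), ("cat", ["sex"], "drop")]
old_style = [("num", scaler, ["age"]), ("cat", "drop", ["sex"])]

# The new order passes through unchanged; normalize_tuple_order(old_style)
# would emit a UserWarning and return a list equal to new_style.
print(normalize_tuple_order(new_style) == new_style)  # True
```

A heuristic like this is ambiguous when a column is literally named `'drop'` or `'passthrough'`, which illustrates why order detection during a deprecation period is fragile.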
10 changes: 5 additions & 5 deletions examples/compose/plot_column_transformer.py
@@ -93,19 +93,19 @@ def transform(self, posts):
('union', ColumnTransformer(
[
# Pulling features from the post's subject line (first column)
-('subject', TfidfVectorizer(min_df=50), 0),
+('subject', 0, TfidfVectorizer(min_df=50)),

# Pipeline for standard bag-of-words model for body (second column)
-('body_bow', Pipeline([
+('body_bow', 1, Pipeline([
('tfidf', TfidfVectorizer()),
('best', TruncatedSVD(n_components=50)),
-]), 1),
+])),

# Pipeline for pulling ad hoc features from post's body
-('body_stats', Pipeline([
+('body_stats', 1, Pipeline([
('stats', TextStats()), # returns a list of dicts
('vect', DictVectorizer()), # list of dicts -> feature matrix
-]), 1),
+])),
],

# weight components in ColumnTransformer
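In this example the columns are integer positions (0 for the subject line, 1 for the body), and column 1 feeds two different transformers. The sketch below models that, plus per-transformer output scaling in the spirit of ColumnTransformer's `transformer_weights` parameter (a dict mapping transformer name to a multiplicative weight). `column_stack`, `n_words` and `n_chars` are hypothetical stand-ins for the real vectorizer pipelines; this is an illustration, not scikit-learn's implementation.

```python
def column_stack(rows, transformers, transformer_weights=None):
    """Toy model, not scikit-learn's implementation.

    rows: list of tuples, one value per column position.
    transformers: (name, column_index, func) triples, in the order this PR
    proposes; func maps a column's values to one feature list per row.
    transformer_weights: optional {name: weight}; unlisted names get 1.0.
    """
    weights = transformer_weights or {}
    out = [[] for _ in rows]
    for name, col, func in transformers:
        weight = weights.get(name, 1.0)
        features = func([row[col] for row in rows])
        for out_row, feats in zip(out, features):
            # Scale this transformer's block before concatenating it.
            out_row.extend(weight * f for f in feats)
    return out


def n_words(texts):
    # Hypothetical feature: word count per text.
    return [[len(t.split())] for t in texts]


def n_chars(texts):
    # Hypothetical feature: character count per text.
    return [[len(t)] for t in texts]


posts = [("Re: hi", "hello there world"), ("Q", "why")]  # (subject, body)
spec = [
    ("subject", 0, n_words),
    ("body_words", 1, n_words),  # column 1 feeds two transformers
    ("body_chars", 1, n_chars),
]
weights = {"subject": 2.0, "body_chars": 0.5}

print(column_stack(posts, spec, weights))
# [[4.0, 3.0, 8.5], [2.0, 1.0, 1.5]]
```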
4 changes: 2 additions & 2 deletions examples/compose/plot_column_transformer_mixed_types.py
@@ -65,8 +65,8 @@

preprocessor = ColumnTransformer(
transformers=[
-('num', numeric_transformer, numeric_features),
-('cat', categorical_transformer, categorical_features)])
+('num', numeric_features, numeric_transformer),
+('cat', categorical_features, categorical_transformer)])

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.