[MRG + 1] ENH add check_inverse in FunctionTransformer by glemaitre · Pull Request #9399 · scikit-learn/scikit-learn · GitHub
[MRG + 1] ENH add check_inverse in FunctionTransformer #9399

Merged
27 commits
4659cb4
EHN add check_inverse in FunctionTransformer
glemaitre Jul 18, 2017
72d3c54
Add whats new entry and short narrative doc
glemaitre Jul 18, 2017
df07603
Sparse support
glemaitre Jul 18, 2017
9a5777c
better handle sparse data
glemaitre Jul 19, 2017
bd7ad2f
Address andreas comments
glemaitre Jul 21, 2017
5c1851b
PEP8
glemaitre Jul 21, 2017
4fd988c
Merge branch 'master' into check_inverse_function_transformer
glemaitre Jul 26, 2017
3a764a7
Absolute tolerance default
glemaitre Jul 26, 2017
586e8ca
DOC fix docstring
glemaitre Jul 27, 2017
43f876c
Remove random state and make check_inverse deterministic
glemaitre Jul 27, 2017
f3c0d10
FIX remove random_state from init
glemaitre Jul 31, 2017
7a19979
PEP8
glemaitre Jul 31, 2017
e59f493
DOC motivation for the inverse
glemaitre Aug 1, 2017
6cb5b5d
make check_inverse=True default with a warning
glemaitre Aug 2, 2017
72e2005
PEP8
glemaitre Aug 2, 2017
45e0cb3
Merge remote-tracking branch 'origin/master' into check_inverse_funct…
glemaitre Aug 2, 2017
afdeca7
FIX get back X from check_array
glemaitre Aug 2, 2017
e4045a1
Andread comments
glemaitre Aug 2, 2017
c8c23fa
Merge branch 'master' into check_inverse_function_transformer
glemaitre Aug 17, 2017
4276618
Update whats new
glemaitre Aug 17, 2017
0297a4a
remove blank line
glemaitre Aug 17, 2017
677cd2a
joel s comments
glemaitre Aug 17, 2017
cec6f53
no check if one of forward or inverse not provided
glemaitre Aug 17, 2017
5238a33
DOC fixes and example of filterwarnings
glemaitre Aug 18, 2017
31abd47
DOC fix warningfiltering
glemaitre Aug 22, 2017
4d31e52
Merge remote-tracking branch 'origin/master' into check_inverse_funct…
glemaitre Oct 25, 2017
65b134a
DOC fix merge error git
glemaitre Oct 25, 2017
9 changes: 9 additions & 0 deletions doc/modules/preprocessing.rst
@@ -610,6 +610,15 @@ a transformer that applies a log transformation in a pipeline, do::
array([[ 0. , 0.69314718],
[ 1.09861229, 1.38629436]])

You can ensure that ``func`` and ``inverse_func`` are the inverse of each other
[Review comment, Member] Please note that a warning is raised and can be turned into an error with: (give simplefilter example which tests message)

by setting ``check_inverse=True`` and calling ``fit`` before
``transform``. Please note that a warning is raised and can be turned into an
error with a ``filterwarnings``::

>>> import warnings
>>> warnings.filterwarnings("error", message=".*check_inverse*.",
... category=UserWarning, append=False)

For a full code example that demonstrates using a :class:`FunctionTransformer`
to do custom feature selection,
see :ref:`sphx_glr_auto_examples_preprocessing_plot_function_transformer.py`
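
The filter shown in this hunk can be tried without scikit-learn. The following is a minimal sketch using only the standard library: `fit_stub` is a hypothetical stand-in that emits the same warning text as the check added in this PR, and the pattern is written as `.*check_inverse.*`, which is an assumption about the intended regex. `warnings.filterwarnings` matches `message` against the start of the warning text, hence the leading `.*`.

```python
import warnings

# Warning text copied from the diff in this PR.
MESSAGE = ("The provided functions are not strictly inverse of each other. "
           "If you are sure you want to proceed regardless, set "
           "'check_inverse=False'.")


def fit_stub():
    """Hypothetical stand-in for FunctionTransformer.fit: only emits the warning."""
    warnings.warn(MESSAGE, UserWarning)


def raises_as_error():
    """Return True if the filter turns the warning into an exception."""
    with warnings.catch_warnings():  # restore the previous filters on exit
        warnings.filterwarnings("error", message=".*check_inverse.*",
                                category=UserWarning, append=False)
        try:
            fit_stub()
        except UserWarning:
            return True
    return False


print(raises_as_error())
```

Wrapping the filter in `warnings.catch_warnings()` keeps the change local, so other code still sees the warning as a warning.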
7 changes: 6 additions & 1 deletion doc/whats_new/v0.20.rst
@@ -40,7 +40,7 @@ Classifiers and regressors
- Added :class:`naive_bayes.ComplementNB`, which implements the Complement
Naive Bayes classifier described in Rennie et al. (2003).
By :user:`Michael A. Alcorn <airalcorn2>`.

Model evaluation

- Added the :func:`metrics.balanced_accuracy` metric and a corresponding
@@ -65,6 +65,11 @@ Classifiers and regressors
:class:`sklearn.naive_bayes.GaussianNB` to give a precise control over
variances calculation. :issue:`9681` by :user:`Dmitry Mottl <Mottl>`.

- A parameter ``check_inverse`` was added to :class:`FunctionTransformer`
to ensure that ``func`` and ``inverse_func`` are the inverse of each
other.
:issue:`9399` by :user:`Guillaume Lemaitre <glemaitre>`.

Model evaluation and meta-estimators

- A scorer based on :func:`metrics.brier_score_loss` is also available.
31 changes: 27 additions & 4 deletions sklearn/preprocessing/_function_transformer.py
@@ -2,6 +2,7 @@

 from ..base import BaseEstimator, TransformerMixin
 from ..utils import check_array
+from ..utils.testing import assert_allclose_dense_sparse
 from ..externals.six import string_types


@@ -19,8 +20,6 @@ class FunctionTransformer(BaseEstimator, TransformerMixin):
     function. This is useful for stateless transformations such as taking the
     log of frequencies, doing custom scaling, etc.

-    A FunctionTransformer will not do any checks on its function's output.
-
     Note: If a lambda is used as the function, then the resulting
     transformer will not be pickleable.

@@ -59,6 +58,13 @@ class FunctionTransformer(BaseEstimator, TransformerMixin):

         .. deprecated::0.19

+    check_inverse : bool, default=True
+        Whether to check that ``func`` followed by ``inverse_func`` leads to
+        the original inputs. It can be used for a sanity check, raising a
+        warning when the condition is not fulfilled.
+
+        .. versionadded:: 0.20
+
     kw_args : dict, optional
         Dictionary of additional keyword arguments to pass to func.

@@ -67,16 +73,30 @@ class FunctionTransformer(BaseEstimator, TransformerMixin):

     """
     def __init__(self, func=None, inverse_func=None, validate=True,
-                 accept_sparse=False, pass_y='deprecated',
+                 accept_sparse=False, pass_y='deprecated', check_inverse=True,
                  kw_args=None, inv_kw_args=None):
         self.func = func
         self.inverse_func = inverse_func
         self.validate = validate
         self.accept_sparse = accept_sparse
         self.pass_y = pass_y
+        self.check_inverse = check_inverse
         self.kw_args = kw_args
         self.inv_kw_args = inv_kw_args

+    def _check_inverse_transform(self, X):
+        """Check that func and inverse_func are the inverse."""
+        idx_selected = slice(None, None, max(1, X.shape[0] // 100))
+        try:
+            assert_allclose_dense_sparse(
+                X[idx_selected],
+                self.inverse_transform(self.transform(X[idx_selected])))
+        except AssertionError:
+            warnings.warn("The provided functions are not strictly"
+                          " inverse of each other. If you are sure you"
+                          " want to proceed regardless, set"
+                          " 'check_inverse=False'.", UserWarning)
+
     def fit(self, X, y=None):
         """Fit transformer by checking X.

@@ -92,7 +112,10 @@ def fit(self, X, y=None):
         self
         """
         if self.validate:
-            check_array(X, self.accept_sparse)
+            X = check_array(X, self.accept_sparse)
+        if (self.check_inverse and not (self.func is None or
+                                        self.inverse_func is None)):
+            self._check_inverse_transform(X)
         return self

     def transform(self, X, y='deprecated'):
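
The check added to `fit` can be sketched in plain Python. This is an illustration under stated assumptions, not the scikit-learn code: `sample_rows`, `round_trips`, and `maybe_check` are hypothetical names, rows are plain lists of floats, and `math.isclose` stands in for `assert_allclose_dense_sparse`, with tolerances chosen here only for illustration.

```python
import math
import warnings


def sample_rows(rows):
    """Keep every k-th row with k = max(1, n // 100), mirroring idx_selected."""
    step = max(1, len(rows) // 100)
    return rows[::step]


def round_trips(func, inverse_func, rows):
    """Return True if inverse_func(func(x)) is close to x on the sampled rows."""
    return all(
        math.isclose(inverse_func(func(x)), x, rel_tol=1e-7, abs_tol=1e-9)
        for row in sample_rows(rows)
        for x in row
    )


def maybe_check(func, inverse_func, rows, check_inverse=True):
    """Warn only when the check is enabled and both functions are provided."""
    if check_inverse and func is not None and inverse_func is not None:
        if not round_trips(func, inverse_func, rows):
            warnings.warn("The provided functions are not strictly inverse"
                          " of each other.", UserWarning)


X = [[1.0, 4.0], [9.0, 16.0]]
print(round_trips(math.expm1, math.log1p, X))  # a true inverse pair
print(round_trips(math.sqrt, round, X))        # rounding does not undo sqrt
print(len(sample_rows(list(range(1000)))))     # step is 10 for 1000 rows
```

Because the step is `max(1, n // 100)`, the check touches between 100 and 199 rows once the input has at least 100 rows, and every row otherwise, so its cost stays bounded on large inputs.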
46 changes: 44 additions & 2 deletions sklearn/preprocessing/tests/test_function_transformer.py
@@ -1,8 +1,10 @@
 import numpy as np
+from scipy import sparse

 from sklearn.preprocessing import FunctionTransformer
-from sklearn.utils.testing import assert_equal, assert_array_equal
-from sklearn.utils.testing import assert_warns_message
+from sklearn.utils.testing import (assert_equal, assert_array_equal,
+                                   assert_allclose_dense_sparse)
+from sklearn.utils.testing import assert_warns_message, assert_no_warnings


def _make_func(args_store, kwargs_store, func=lambda X, *a, **k: X):
@@ -126,3 +128,43 @@ def test_inverse_transform():
         F.inverse_transform(F.transform(X)),
         np.around(np.sqrt(X), decimals=3),
     )
+
+
+def test_check_inverse():
+    X_dense = np.array([1, 4, 9, 16], dtype=np.float64).reshape((2, 2))
+
+    X_list = [X_dense,
+              sparse.csr_matrix(X_dense),
+              sparse.csc_matrix(X_dense)]
+
+    for X in X_list:
+        if sparse.issparse(X):
+            accept_sparse = True
+        else:
+            accept_sparse = False
+        trans = FunctionTransformer(func=np.sqrt,
+                                    inverse_func=np.around,
+                                    accept_sparse=accept_sparse,
+                                    check_inverse=True)
+        assert_warns_message(UserWarning,
+                             "The provided functions are not strictly"
+                             " inverse of each other. If you are sure you"
+                             " want to proceed regardless, set"
+                             " 'check_inverse=False'.",
+                             trans.fit, X)
+
+        trans = FunctionTransformer(func=np.expm1,
+                                    inverse_func=np.log1p,
+                                    accept_sparse=accept_sparse,
+                                    check_inverse=True)
+        Xt = assert_no_warnings(trans.fit_transform, X)
+        assert_allclose_dense_sparse(X, trans.inverse_transform(Xt))
+
+    # check that we don't check inverse when one of the func or inverse is not
+    # provided.
+    trans = FunctionTransformer(func=np.expm1, inverse_func=None,
+                                check_inverse=True)
+    assert_no_warnings(trans.fit, X_dense)
+    trans = FunctionTransformer(func=None, inverse_func=np.expm1,
+                                check_inverse=True)
+    assert_no_warnings(trans.fit, X_dense)
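
`assert_warns_message` and `assert_no_warnings` used in the test above are scikit-learn testing helpers. A rough standard-library sketch of the same two assertions follows: `collect_warnings`, `noisy`, and `quiet` are hypothetical names, and the substring match is a simplification that may differ from the helpers' exact behaviour.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warnings recorded during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is suppressed
        result = func(*args, **kwargs)
    return result, caught


def noisy(x):
    """Hypothetical function that warns the way a failed inverse check would."""
    warnings.warn("The provided functions are not strictly inverse"
                  " of each other.", UserWarning)
    return x


def quiet(x):
    """Hypothetical function that emits no warning."""
    return x


_, caught = collect_warnings(noisy, 1)
# Counterpart of assert_warns_message: right category and expected text.
assert any(issubclass(w.category, UserWarning)
           and "not strictly inverse" in str(w.message) for w in caught)

result, caught = collect_warnings(quiet, 2)
# Counterpart of assert_no_warnings: nothing recorded, result passed through.
assert caught == [] and result == 2
```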