[MRG + 1] Remove "warn_on_dtype" from check_array by praths007 · Pull Request #13382 · scikit-learn/scikit-learn · GitHub

[MRG + 1] Remove "warn_on_dtype" from check_array #13382

Merged
54 commits
ba5fd14
Merge pull request #1 from scikit-learn/master
praths007 Mar 4, 2019
d9926ba
added datatype conversion for arrays whose boolean distance is to be …
praths007 Mar 4, 2019
760a88b
flake 8 changes
praths007 Mar 4, 2019
972d9e6
Merge pull request #2 from scikit-learn/master
praths007 Mar 13, 2019
9d0c75d
Merge branches 'master' and 'remove_warn_on_dtype_from_check_array' o…
praths007 Mar 13, 2019
9a73b7c
removed warn_on_dtype from check_pairwise_array
praths007 Mar 13, 2019
f2fc999
removed flake 8 issue
praths007 Mar 13, 2019
e9a6fec
restored extraneous flake 8 changes
praths007 Mar 18, 2019
6d45e73
deprecated warn_on_dtype acceptance in check_array
praths007 Mar 18, 2019
4dcfc93
flake 8 grief fix
praths007 Mar 18, 2019
80074cb
Merge pull request #3 from scikit-learn/master
praths007 Mar 18, 2019
0a615eb
Merge branch 'master' of https://github.com/praths007/scikit-learn in…
praths007 Mar 19, 2019
db005d3
removed uneccessary changes
praths007 Mar 19, 2019
187a2c9
incorporated review changes #1
praths007 Mar 25, 2019
9120535
Merge pull request #4 from scikit-learn/master
praths007 Mar 25, 2019
285e463
Merge branch 'master' of https://github.com/praths007/scikit-learn in…
praths007 Mar 25, 2019
6845199
remove obsolete merge changes
praths007 Mar 25, 2019
27b6fe6
changed ignore message
praths007 Mar 25, 2019
af3c07a
change deprecation warning message
praths007 Mar 25, 2019
6c17ca2
filter out deprecation warnings in assert warns and assert no warnings
praths007 Mar 25, 2019
801acd3
reverted `testing.py` changes
praths007 Mar 25, 2019
a918302
review changes #4
praths007 Mar 25, 2019
db58ec8
remove setting to None
praths007 Mar 25, 2019
4accc4e
review changes #5 (tweaked test cases)
praths007 Mar 26, 2019
35d969a
flake 8 changes
praths007 Mar 26, 2019
90380d4
updated test cases for deprecation
praths007 Mar 26, 2019
dde46ca
fixed failing tests
praths007 Mar 26, 2019
90472d2
flake 8
praths007 Mar 26, 2019
36267ba
flake 8#2
praths007 Mar 26, 2019
c3f1dcc
filter warnings
praths007 Mar 26, 2019
d3d276d
ignore warnings
praths007 Mar 27, 2019
498dcc2
warn_on_dtype False fix
praths007 Mar 27, 2019
7d178bb
logic changes
praths007 Mar 27, 2019
2e137b1
remove typo
praths007 Mar 27, 2019
7cc5671
remove warn_on_dtype from check_X_y
praths007 Mar 27, 2019
30196a8
updated docstring for circleci
praths007 Mar 27, 2019
91b23e5
added optional
praths007 Mar 27, 2019
e808afc
changes to tests
praths007 Mar 27, 2019
5e6a304
check_x_y changes
praths007 Mar 27, 2019
2fdd34c
changing False to None
praths007 Mar 27, 2019
97440aa
review changes
praths007 Mar 28, 2019
5eb8d17
fixing doc
praths007 Mar 28, 2019
660476e
fixing doc#2
praths007 Mar 28, 2019
ad611e2
removed unwanted examples
praths007 Mar 28, 2019
09627dc
Merge pull request #5 from scikit-learn/master
praths007 Mar 28, 2019
1567f89
Merge branch 'master' of https://github.com/praths007/scikit-learn in…
praths007 Mar 28, 2019
350ef5c
review changes
praths007 Mar 29, 2019
69981aa
restored test, updated changelog, modified docstring
praths007 Mar 29, 2019
62d2834
added .
praths007 Mar 29, 2019
1a39743
updated changelog
praths007 Mar 29, 2019
4943d1e
updated changelog
praths007 Mar 29, 2019
6e0b773
cosmetic changes
praths007 Apr 19, 2019
9f26b49
Merge pull request #6 from scikit-learn/master
praths007 Apr 19, 2019
57b1f04
Merge pull request #7 from praths007/master
praths007 Apr 19, 2019
9 changes: 9 additions & 0 deletions doc/whats_new/v0.21.rst
@@ -627,6 +627,15 @@ Support for Python 3.4 and below has been officially dropped.
   affects all ensemble methods using decision trees.
   :issue:`12344` by :user:`Adrin Jalali <adrinjalali>`.

+:mod:`sklearn.utils`
+....................
+
+- |API| Deprecated ``warn_on_dtype`` parameter from :func:`utils.check_array`
+  and :func:`utils.check_X_y`. Added explicit warning for dtype conversion
+  in :func:`check_pairwise_arrays` if the ``metric`` being passed is a
+  pairwise boolean metric.
+  :issue:`13382` by :user:`Prathmesh Savale <praths007>`.
+
 Multiple modules
 ................

13 changes: 9 additions & 4 deletions sklearn/metrics/pairwise.py
@@ -30,6 +30,7 @@
 from ..utils._joblib import effective_n_jobs

 from .pairwise_fast import _chi2_kernel_fast, _sparse_manhattan
+from ..exceptions import DataConversionWarning


 # Utility Functions
@@ -99,19 +100,18 @@ def check_pairwise_arrays(X, Y, precomputed=False, dtype=None):
     """
     X, Y, dtype_float = _return_float_dtype(X, Y)

-    warn_on_dtype = dtype is not None
     estimator = 'check_pairwise_arrays'
     if dtype is None:
         dtype = dtype_float

     if Y is X or Y is None:
         X = Y = check_array(X, accept_sparse='csr', dtype=dtype,
-                            warn_on_dtype=warn_on_dtype, estimator=estimator)
+                            estimator=estimator)
     else:
         X = check_array(X, accept_sparse='csr', dtype=dtype,
-                        warn_on_dtype=warn_on_dtype, estimator=estimator)
+                        estimator=estimator)
         Y = check_array(Y, accept_sparse='csr', dtype=dtype,
-                        warn_on_dtype=warn_on_dtype, estimator=estimator)
+                        estimator=estimator)

     if precomputed:
         if X.shape[1] != Y.shape[0]:
@@ -1421,6 +1421,11 @@ def pairwise_distances(X, Y=None, metric="euclidean", n_jobs=None, **kwds):
                             " support sparse matrices.")

         dtype = bool if metric in PAIRWISE_BOOLEAN_FUNCTIONS else None
+
+        if dtype == bool and (X.dtype != bool or Y.dtype != bool):
+            msg = "Data was converted to boolean for metric %s" % metric
+            warnings.warn(msg, DataConversionWarning)
+
         X, Y = check_pairwise_arrays(X, Y, dtype=dtype)

         # precompute data-derived metric params
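The check added to `pairwise_distances` reads `Y.dtype` unconditionally, although the signature allows `Y=None`; with `Y=None` it only avoids an `AttributeError` when `X.dtype != bool` short-circuits the `or`. A minimal sketch of a None-safe variant of the same check is shown below. It is not the code merged in this PR: `warn_if_cast_to_bool`, the local `DataConversionWarning` stand-in, and the `BOOLEAN_METRICS` subset are hypothetical names used only for illustration.

```python
import warnings


class DataConversionWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.DataConversionWarning."""


# Illustrative subset only; the real PAIRWISE_BOOLEAN_FUNCTIONS list lives in
# sklearn/metrics/pairwise.py and is not reproduced here.
BOOLEAN_METRICS = {"jaccard", "dice", "rogerstanimoto"}


def warn_if_cast_to_bool(X, Y=None, metric="euclidean"):
    """Warn when a boolean metric will force a non-boolean input to bool.

    X and Y are any objects exposing a ``dtype`` attribute (for example
    NumPy arrays). Returns True if a warning was emitted.
    """
    if metric not in BOOLEAN_METRICS:
        return False
    # Hypothetical guard: only inspect Y when the caller actually passed it.
    inputs = [X] if Y is None else [X, Y]
    if any(getattr(a, "dtype", None) != bool for a in inputs):
        warnings.warn(
            "Data was converted to boolean for metric %s" % metric,
            DataConversionWarning,
        )
        return True
    return False
```

With this shape, a boolean `X` combined with `Y=None` returns quietly instead of failing on the attribute lookup, and the warning text stays identical to the one the PR's test matches on.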
15 changes: 15 additions & 0 deletions sklearn/metrics/tests/test_pairwise.py
@@ -138,6 +138,21 @@ def test_pairwise_boolean_distance(metric):
             res[np.isnan(res)] = 0
             assert np.sum(res != 0) == 0

+    # non-boolean arrays are converted to boolean for boolean
+    # distance metrics with a data conversion warning
+    msg = "Data was converted to boolean for metric %s" % metric
+    with pytest.warns(DataConversionWarning, match=msg):
+        pairwise_distances(X, metric=metric)
+
+
+def test_no_data_conversion_warning():
+    # No warnings issued if metric is not a boolean distance function
+    rng = np.random.RandomState(0)
+    X = rng.randn(5, 4)
+    with pytest.warns(None) as records:
+        pairwise_distances(X, metric="minkowski")
+    assert len(records) == 0
+

 @pytest.mark.parametrize('func', [pairwise_distances, pairwise_kernels])
 def test_pairwise_precomputed(func):
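`test_no_data_conversion_warning` relies on `pytest.warns(None)` to record warnings and then asserts that none were seen. Newer pytest releases deprecate that idiom, so the sketch below shows the same "assert nothing was emitted" check using only the standard library. `collect_warnings`, `quiet`, and `noisy` are hypothetical helpers, not part of scikit-learn or of this PR.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warnings it emitted)."""
    with warnings.catch_warnings(record=True) as records:
        # Record everything, regardless of filters installed elsewhere.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, records


def quiet(x):
    """Stand-in for a call that should not warn."""
    return x * 2


def noisy(x):
    """Stand-in for a call that emits a conversion-style warning."""
    warnings.warn("Data was converted", UserWarning)
    return x * 2
```

A test would then assert `len(records) == 0` for the quiet path, mirroring the assertion in the diff.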
14 changes: 7 additions & 7 deletions sklearn/preprocessing/data.py
@@ -137,8 +137,8 @@ def scale(X, axis=0, with_mean=True, with_std=True, copy=True):

     """  # noqa
     X = check_array(X, accept_sparse='csc', copy=copy, ensure_2d=False,
-                    warn_on_dtype=False, estimator='the scale function',
-                    dtype=FLOAT_DTYPES, force_all_finite='allow-nan')
+                    estimator='the scale function', dtype=FLOAT_DTYPES,
+                    force_all_finite='allow-nan')
     if sparse.issparse(X):
         if with_mean:
             raise ValueError(
@@ -348,7 +348,7 @@ def partial_fit(self, X, y=None):
             raise TypeError("MinMaxScaler does no support sparse input. "
                             "You may consider to use MaxAbsScaler instead.")

-        X = check_array(X, copy=self.copy, warn_on_dtype=False,
+        X = check_array(X, copy=self.copy,
                         estimator=self, dtype=FLOAT_DTYPES,
                         force_all_finite="allow-nan")

@@ -468,7 +468,7 @@ def minmax_scale(X, feature_range=(0, 1), axis=0, copy=True):
     """  # noqa
     # Unlike the scaler object, this function allows 1d input.
     # If copy is required, it will be done inside the scaler object.
-    X = check_array(X, copy=False, ensure_2d=False, warn_on_dtype=False,
+    X = check_array(X, copy=False, ensure_2d=False,
                     dtype=FLOAT_DTYPES, force_all_finite='allow-nan')
     original_ndim = X.ndim

@@ -659,8 +659,8 @@ def partial_fit(self, X, y=None):
             Ignored
         """
         X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
-                        warn_on_dtype=False, estimator=self,
-                        dtype=FLOAT_DTYPES, force_all_finite='allow-nan')
+                        estimator=self, dtype=FLOAT_DTYPES,
+                        force_all_finite='allow-nan')

         # Even in the case of `with_mean=False`, we update the mean anyway
         # This is needed for the incremental computation of the var
@@ -753,7 +753,7 @@ def transform(self, X, copy=None):
         check_is_fitted(self, 'scale_')

         copy = copy if copy is not None else self.copy
-        X = check_array(X, accept_sparse='csr', copy=copy, warn_on_dtype=False,
+        X = check_array(X, accept_sparse='csr', copy=copy,
                         estimator=self, dtype=FLOAT_DTYPES,
                         force_all_finite='allow-nan')

41 changes: 30 additions & 11 deletions sklearn/utils/tests/test_validation.py
@@ -387,12 +387,15 @@ def test_check_array_dtype_warning():
         assert_equal(X_checked.dtype, np.float64)

     for X in float64_data:
-        X_checked = assert_no_warnings(check_array, X, dtype=np.float64,
-                                       accept_sparse=True, warn_on_dtype=True)
-        assert_equal(X_checked.dtype, np.float64)
-        X_checked = assert_no_warnings(check_array, X, dtype=np.float64,
-                                       accept_sparse=True, warn_on_dtype=False)
-        assert_equal(X_checked.dtype, np.float64)
+        with pytest.warns(None) as record:
+            warnings.simplefilter("ignore", DeprecationWarning)  # 0.23
+            X_checked = check_array(X, dtype=np.float64,
+                                    accept_sparse=True, warn_on_dtype=True)
+            assert_equal(X_checked.dtype, np.float64)
+            X_checked = check_array(X, dtype=np.float64,
+                                    accept_sparse=True, warn_on_dtype=False)
+            assert_equal(X_checked.dtype, np.float64)
+        assert len(record) == 0

     for X in float32_data:
         X_checked = assert_no_warnings(check_array, X,
@@ -417,6 +420,17 @@ def test_check_array_dtype_warning():
     assert_equal(X_checked.format, 'csr')


+def test_check_array_warn_on_dtype_deprecation():
+    X = np.asarray([[0.0], [1.0]])
+    Y = np.asarray([[2.0], [3.0]])
+    with pytest.warns(DeprecationWarning,
+                      match="'warn_on_dtype' is deprecated"):
+        check_array(X, warn_on_dtype=True)
+    with pytest.warns(DeprecationWarning,
+                      match="'warn_on_dtype' is deprecated"):
+        check_X_y(X, Y, warn_on_dtype=True)
+
+
 def test_check_array_accept_sparse_type_exception():
     X = [[1, 2], [3, 4]]
     X_csr = sp.csr_matrix(X)
@@ -690,8 +704,7 @@ def test_suppress_validation():
 def test_check_array_series():
     # regression test that check_array works on pandas Series
     pd = importorskip("pandas")
-    res = check_array(pd.Series([1, 2, 3]), ensure_2d=False,
-                      warn_on_dtype=True)
+    res = check_array(pd.Series([1, 2, 3]), ensure_2d=False)
     assert_array_equal(res, np.array([1, 2, 3]))

     # with categorical dtype (not a numpy dtype) (GH12699)
@@ -712,7 +725,10 @@ def test_check_dataframe_warns_on_dtype():
                  check_array, df, dtype=np.float64, warn_on_dtype=True)
     assert_warns(DataConversionWarning, check_array, df,
                  dtype='numeric', warn_on_dtype=True)
-    assert_no_warnings(check_array, df, dtype='object', warn_on_dtype=True)
+    with pytest.warns(None) as record:
+        warnings.simplefilter("ignore", DeprecationWarning)  # 0.23
+        check_array(df, dtype='object', warn_on_dtype=True)
+    assert len(record) == 0

     # Also check that it raises a warning for mixed dtypes in a DataFrame.
     df_mixed = pd.DataFrame([['1', 2, 3], ['4', 5, 6]])
@@ -728,8 +744,11 @@ def test_check_dataframe_warns_on_dtype():
     df_mixed_numeric = pd.DataFrame([[1., 2, 3], [4., 5, 6]])
     assert_warns(DataConversionWarning, check_array, df_mixed_numeric,
                  dtype='numeric', warn_on_dtype=True)
-    assert_no_warnings(check_array, df_mixed_numeric.astype(int),
-                       dtype='numeric', warn_on_dtype=True)
+    with pytest.warns(None) as record:
+        warnings.simplefilter("ignore", DeprecationWarning)  # 0.23
+        check_array(df_mixed_numeric.astype(int),
+                    dtype='numeric', warn_on_dtype=True)
+    assert len(record) == 0


 class DummyMemory:
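Several of the updated tests still pass `warn_on_dtype=True`, so they silence the new `DeprecationWarning` inside the recording block and then assert that nothing else was emitted. The sketch below reproduces that pattern with the standard library only. `legacy_call` and `non_deprecation_warnings` are hypothetical names; the filter ordering is the part that mirrors the tests.

```python
import warnings


def legacy_call(flag=None, lossy=False):
    """Hypothetical function with a deprecated ``flag`` parameter."""
    if flag is not None:
        warnings.warn("'flag' is deprecated", DeprecationWarning)
    if lossy:
        warnings.warn("data was converted", UserWarning)
    return 42


def non_deprecation_warnings(func, *args, **kwargs):
    """Run func and return every recorded warning except DeprecationWarning."""
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        # Added last, so this filter is consulted first and wins over "always".
        warnings.simplefilter("ignore", DeprecationWarning)
        func(*args, **kwargs)
    return records
```

Because `simplefilter` inserts at the front of the filter list, the `"ignore"` rule must be installed after `"always"`, which is the order the tests get by calling it inside the `with` block.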
25 changes: 20 additions & 5 deletions sklearn/utils/validation.py
@@ -332,7 +332,7 @@ def _ensure_no_complex_data(array):
 def check_array(array, accept_sparse=False, accept_large_sparse=True,
                 dtype="numeric", order=None, copy=False, force_all_finite=True,
                 ensure_2d=True, allow_nd=False, ensure_min_samples=1,
-                ensure_min_features=1, warn_on_dtype=False, estimator=None):
+                ensure_min_features=1, warn_on_dtype=None, estimator=None):

     """Input validation on an array, list, sparse matrix or similar.

@@ -407,19 +407,30 @@ def check_array(array, accept_sparse=False, accept_large_sparse=True,
         dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0
         disables this check.

-    warn_on_dtype : boolean (default=False)
+    warn_on_dtype : boolean or None, optional (default=None)
         Raise DataConversionWarning if the dtype of the input data structure
         does not match the requested dtype, causing a memory copy.

+        .. deprecated:: 0.21
+            ``warn_on_dtype`` is deprecated in version 0.21 and will be
+            removed in 0.23.
+
     estimator : str or estimator instance (default=None)
         If passed, include the name of the estimator in warning messages.

     Returns
     -------
     array_converted : object
         The converted and validated array.
-
     """
+    # warn_on_dtype deprecation
+    if warn_on_dtype is not None:
+        warnings.warn(
+            "'warn_on_dtype' is deprecated in version 0.21 and will be "
+            "removed in 0.23. Don't set `warn_on_dtype` to remove this "
+            "warning.",
+            DeprecationWarning)
+
     # store reference to original array to check if copy is needed when
     # function returns
     array_orig = array
@@ -590,7 +601,7 @@ def check_X_y(X, y, accept_sparse=False, accept_large_sparse=True,
               dtype="numeric", order=None, copy=False, force_all_finite=True,
               ensure_2d=True, allow_nd=False, multi_output=False,
               ensure_min_samples=1, ensure_min_features=1, y_numeric=False,
-              warn_on_dtype=False, estimator=None):
+              warn_on_dtype=None, estimator=None):
     """Input validation for standard estimators.

     Checks X and y for consistent length, enforces X to be 2D and y 1D. By
@@ -675,10 +686,14 @@ def check_X_y(X, y, accept_sparse=False, accept_large_sparse=True,
         it is converted to float64. Should only be used for regression
         algorithms.

-    warn_on_dtype : boolean (default=False)
+    warn_on_dtype : boolean or None, optional (default=None)
         Raise DataConversionWarning if the dtype of the input data structure
         does not match the requested dtype, causing a memory copy.

+        .. deprecated:: 0.21
+            ``warn_on_dtype`` is deprecated in version 0.21 and will be
+            removed in 0.23.
+
     estimator : str or estimator instance (default=None)
         If passed, include the name of the estimator in warning messages.

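The default for `warn_on_dtype` changes from `False` to `None` so that `check_array` can tell "not passed" apart from an explicit `False`; any explicit value, including `False`, now triggers the deprecation warning. That is also why the `warn_on_dtype=False` arguments were dropped from the preprocessing call sites. A minimal sketch of the sentinel pattern, using a hypothetical `validate` function rather than the real `check_array`:

```python
import warnings


def validate(data, warn_on_dtype=None):
    """Hypothetical validator using ``None`` as a 'not passed' sentinel."""
    if warn_on_dtype is not None:
        warnings.warn(
            "'warn_on_dtype' is deprecated in version 0.21 and will be "
            "removed in 0.23. Don't set `warn_on_dtype` to remove this "
            "warning.",
            DeprecationWarning,
            stacklevel=2,  # assumption, not in the diff: blame the caller
        )
    return list(data)


def emitted(**kwargs):
    """Return the warning categories raised by one call to validate()."""
    with warnings.catch_warnings(record=True) as records:
        warnings.simplefilter("always")
        validate([1, 2], **kwargs)
    return [r.category for r in records]
```

Calling `emitted()` with no arguments yields an empty list, while both `emitted(warn_on_dtype=True)` and `emitted(warn_on_dtype=False)` report a single `DeprecationWarning`.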