MAINT test globally setting output via context manager by glemaitre · Pull Request #24932 · scikit-learn/scikit-learn · GitHub

Merged: 15 commits, Nov 24, 2022

Conversation

@glemaitre (Member) commented Nov 15, 2022:

closes #24931
closes #24923

Requires #24930 to be merged to test all estimators.

This PR uses config_context to globally set the output container of estimators.

TODO:

@glemaitre glemaitre marked this pull request as draft November 15, 2022 11:29
@glemaitre glemaitre added this to the 1.2 milestone Nov 15, 2022
@glemaitre (Member Author):
While it is quite trivial to fix the issue for our own transformers, we might start breaking, without any warning, third-party transformers that contain nested transformers. This could be quite problematic.

@glemaitre glemaitre marked this pull request as ready for review November 17, 2022 14:27
@@ -707,7 +722,7 @@ def fit_transform(self, X, y=None):
             Xt, estimator = self._impute_one_feature(
                 Xt,
                 mask_missing_values,
-                feat_idx,
+                int(feat_idx),
glemaitre (Member Author):
We need to understand why feat_idx is not an integer here.

Member:

feat_idx comes out of _get_ordered_idx, where a few different things are done depending on the imputation_order. Do you know if the issue happens in general, or only for a specific imputation_order?

glemaitre (Member Author):

I assume that it was the default one. I will take a closer look at this function.

glemaitre (Member Author):

OK, the reason for getting a numpy array was the following workaround:

if hasattr(key, "shape"):
    # Work-around for indexing with read-only key in pandas
    # FIXME: solved in pandas 0.25
    key = np.asarray(key)
    key = key if key.flags.writeable else key.copy()

We can remove it.

glemaitre (Member Author):

np.int64(10) will expose a shape attribute :)
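This is easy to check: a numpy scalar passes the hasattr(key, "shape") test even though it is not array-like in the intended sense, which is why the scalar check matters:

```python
import numpy as np

i = np.int64(10)
print(hasattr(i, "shape"))  # True: numpy scalars expose a 0-d shape
print(i.shape)              # ()
print(np.isscalar(i))       # True, which is what the extra check catches
```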

@thomasjpfan (Member) left a comment:

Thank you for the PR!

Comment on lines 746 to 769
Xt[~mask_missing_values] = X[~mask_missing_values]
for feat_idx, feat_mask in enumerate(mask_missing_values.T):
Member:

Does this introduce a performance regression when everything is a ndarray?

glemaitre (Member Author):

It looks like a slowdown of about 1.5x.
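For context, here is a rough micro-benchmark sketch of the two assignment patterns being compared (the shapes and missing rate are hypothetical; this is not the benchmark actually run in the PR):

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50))
mask = rng.random(X.shape) < 0.1  # ~10% "missing" entries

def assign_vectorized(Xt, X, mask):
    # Single boolean-mask assignment over the whole 2D array.
    Xt[~mask] = X[~mask]

def assign_per_column(Xt, X, mask):
    # Column-by-column assignment, as in the looped version.
    for j, col_mask in enumerate(mask.T):
        Xt[~col_mask, j] = X[~col_mask, j]

Xt = np.zeros_like(X)
print("vectorized:", timeit.timeit(lambda: assign_vectorized(Xt, X, mask), number=20))
print("per column:", timeit.timeit(lambda: assign_per_column(Xt, X, mask), number=20))
```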

Member:

Could we add a disjunction based on whether mask_missing_values is a np.array or a pd.DataFrame?

Member:

Let's write a small helper function because the same pattern also appears in transform.

glemaitre (Member Author):

The problem is not mask_missing_values being an array or a dataframe: it is always a 2D numpy array here.

The problem is that _safe_indexing is not written to handle 2D arrays. Maybe we could make it possible to pass axis=None and, in that case, expect a 2D array of the same shape as the original one. It might only work for NumPy arrays though.

Member:

I made a distinction for whether Xt is a dataframe or not. Also, pandas has a method for this specific use case. I put the logic in a function named _assign_where.
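A minimal sketch of such a helper (hypothetical; the actual _assign_where in the PR may differ in its details), using DataFrame.mask for the pandas path:

```python
import numpy as np
import pandas as pd

def assign_where(X1, X2, cond):
    """Assign X2's values into X1 where cond is True, in place.

    Handles both pandas DataFrames and numpy ndarrays.
    """
    if hasattr(X1, "mask"):
        # DataFrame.mask replaces entries where cond is True with the
        # corresponding values from X2.
        X1.mask(cond=cond, other=X2, inplace=True)
    else:
        X1[cond] = X2[cond]
```

The hasattr check dispatches on the presence of the pandas mask method, so plain ndarrays fall through to boolean-mask assignment.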

@jjerphan (Member) left a comment:

LGTM. Thank you, @glemaitre.

I just have a few comments and remarks.


@@ -186,7 +186,9 @@ def _array_indexing(array, key, key_dtype, axis):

 def _pandas_indexing(X, key, key_dtype, axis):
     """Index a pandas dataframe or a series."""
-    if hasattr(key, "shape"):
+    if hasattr(key, "shape") and not np.isscalar(key):
Member:

Could we use _is_arraylike_not_scalar instead?

def _is_arraylike(x):
    """Returns whether the input is array-like."""
    return hasattr(x, "__len__") or hasattr(x, "shape") or hasattr(x, "__array__")

def _is_arraylike_not_scalar(array):
    """Return True if array is array-like and not a scalar."""
    return _is_arraylike(array) and not np.isscalar(array)

Member:

At first we could not, because a tuple is array-like and there is different logic for the tuple case right after. But it turns out that we can get rid of the pandas < 0.25 workaround, and there is no reason to make the key a list when it is a tuple and an array when it is not: we can always make it an array.

So in the end I think we can use _is_arraylike_not_scalar and simplify the whole logic. Done in a642ec9

@jeremiedbb (Member) left a comment:

@glemaitre I directly pushed the requested changes and simplified the logic of some parts. LGTM

@jeremiedbb (Member):

I think all the comments have been addressed. The performance regression has been avoided. Let's roll.
