MAINT unpack 0-dim NumPy array instead of implicit conversion by glemaitre · Pull Request #26345 · scikit-learn/scikit-learn

Merged

19 commits merged into scikit-learn:main on Jun 9, 2023

Conversation

@glemaitre (Member):

Solves the deprecation warning observed in #26154.

Unpack 0-dim arrays explicitly instead of relying on the implicit conversion, which is deprecated as of NumPy 1.25 and will raise an error in a future NumPy release.
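For illustration, a minimal sketch (not code from this PR) of the pattern being fixed:

import numpy as np

arr = np.array([3.14])

# Implicit conversion: emits a DeprecationWarning as of NumPy 1.25
# and is expected to raise an error in a future release.
x = float(arr)

# Explicit unpacking instead:
x = arr[0]       # index out the single element, or
x = arr.item()   # raises ValueError if arr has more than one element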

@adrinjalali (Member) left a comment:

In at least some of these cases, wouldn't it be better to have them as scalars rather than arrays in the first place? I'm not super comfortable with some of these hard-coded indexings; it feels like we might either be missing values, or that they should have been scalars in the first place.

@glemaitre (Member, Author):

Indeed, this is a good point. I think that for the tests, this is fine. I will look more closely at the LogisticRegressionCV.

@@ -506,6 +506,9 @@ def _logistic_regression_path(
                 w0 = np.concatenate([coef_.ravel(), intercept_])
             else:
                 w0 = coef_.ravel()
+            # n_iter_i is an array for each class. However, `target` is always encoded
+            # in {-1, 1}, so we only take the first element of n_iter_i.
+            n_iter_i = n_iter_i[0]
@glemaitre (Member, Author):

So here, this is really specific to liblinear. We should preserve the array because it will be useful in the multiclass case in other parts of the code.
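A rough sketch of what this means in practice (the shape is assumed from the comment in the diff above, not verified against liblinear internals):

import numpy as np

# liblinear reports one iteration count per class; for a binary target
# encoded in {-1, 1} only a single model is fit, so the array has one entry.
n_iter_i = np.array([12])

# Unpack explicitly instead of letting NumPy coerce the 1-element array later.
n_iter_i = n_iter_i[0]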

            else:
                alpha = (sorted_values[0] - sorted_values[1]) / (1 - sorted_values[1])
        else:
            # Regression tree or multi-output
            if isinstance(value, Iterable):
                # regression tree
@glemaitre (Member, Author):

I added the reason why we cannot unpack the value earlier.

@glemaitre (Member, Author) left a comment:

I modified the places and checked that it makes sense to unpack the value. I added a comment explaining why the unpacking is happening, so that the future us will not find it weird.

@glemaitre (Member, Author):

This is a codecov failure.

@adrinjalali (Member) left a comment:

Otherwise LGTM.

@@ -247,17 +248,21 @@ def get_color(self, value):
             color = list(self.colors["rgb"][np.argmax(value)])
             sorted_values = sorted(value, reverse=True)
             if len(sorted_values) == 1:
-                alpha = 0
+                alpha = 0.0
             else:
                 alpha = (sorted_values[0] - sorted_values[1]) / (1 - sorted_values[1])
         else:
             # Regression tree or multi-output
Member:

So here it's never multi-output? If it is, we shouldn't take the first value, should we?

@glemaitre (Member, Author):

Apparently, the multi-output case is when value is not an Iterable.

Member:

O_o then what is it? That seems very shaky. Is there another way we could check for multi-output? This right now seems very confusing.

@thomasjpfan (Member):

In this case, I prefer to unpack from the call-site:

if tree.n_outputs == 1:
    node_val = tree.value[node_id][0, :] / tree.weighted_n_node_samples[node_id]
    if tree.n_classes[0] == 1:
        # Regression
        node_val = tree.value[node_id][0, :]

and leave get_color alone.

@glemaitre (Member, Author):

So I left get_color alone :) I added a comment to differentiate the regression case from classification with a single class.

@thomasjpfan (Member) left a comment:

Usually I use ndarray.item to unpack 0-dim ndarrays into Python scalars. Using item is a little safer, because it will raise an error if the underlying array has more than one element.
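For illustration, a minimal sketch (not code from this PR) of the difference between the two unpacking styles:

import numpy as np

a = np.array([42.0])
x = a[0]       # 42.0 -- plain indexing works on a 1-element array
y = a.item()   # 42.0 -- explicit conversion to a Python scalar

b = np.array([1.0, 2.0])
z = b[0]       # 1.0 -- indexing silently returns the first element
# b.item()     # would raise ValueError: can only convert an array of size 1
#              # to a Python scalar, catching the bug early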


@jeremiedbb (Member) left a comment:

nitpicks

glemaitre and others added 3 commits June 9, 2023 10:33
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@glemaitre (Member, Author):

I used item instead of indexing, as suggested by @thomasjpfan.

@adrinjalali (Member) left a comment:

This looks much nicer now. Thanks @glemaitre

@jeremiedbb (Member) left a comment:

LGTM. I triggered a scipy dev build to check that we're not missing any of these.

@glemaitre (Member, Author):

The failures seem to be unrelated ones :)

@jeremiedbb (Member):

I checked that all remaining failures are unrelated to 0-dim arrays. Let's merge.

@jeremiedbb merged commit 9eea5b7 into scikit-learn:main on Jun 9, 2023
manudarmi pushed a commit to primait/scikit-learn that referenced this pull request Jun 12, 2023
…-learn#26345)

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…-learn#26345)

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>