MAINT unpack 0-dim NumPy array instead of implicit conversion #26345
Conversation
In at least some of these cases, wouldn't it be better to have them as scalars rather than arrays in the first place? I'm not super comfortable with some of these hard-coded indexings. It feels like we might either be missing values, or that they should have been scalars in the first place.
Indeed, this is a good point. I think that for the tests, this is fine. I will look more closely at the …
sklearn/linear_model/_logistic.py (Outdated)
@@ -506,6 +506,9 @@ def _logistic_regression_path(
             w0 = np.concatenate([coef_.ravel(), intercept_])
         else:
             w0 = coef_.ravel()
+        # n_iter_i is an array for each class. However, `target` is always encoded
+        # in {-1, 1}, so we only take the first element of n_iter_i.
+        n_iter_i = n_iter_i[0]
So here this is really specific to liblinear. We should preserve the array because it will be useful in the multiclass case in other parts of the code.
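For illustration (added here, not from the thread), a sketch of the shapes involved; the iteration counts are made up:

    import numpy as np

    # liblinear reports an array of iteration counts (made-up numbers).
    # In _logistic_regression_path the target is encoded in {-1, 1},
    # i.e. a single binary problem, hence the explicit unpacking above.
    n_iter_i = np.array([7])
    n_iter_i = n_iter_i[0]

    # Elsewhere, a one-vs-rest multiclass fit would yield one count per
    # class, which is why the array form is worth preserving.
    n_iter_ovr = np.array([7, 12, 9])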
sklearn/tree/_export.py (Outdated)
            else:
                alpha = (sorted_values[0] - sorted_values[1]) / (1 - sorted_values[1])
        else:
            # Regression tree or multi-output
            if isinstance(value, Iterable):
                # regression tree
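As an aside (my illustration, not from the PR), the alpha formula in this hunk measures how strongly the majority class dominates the runner-up, shown here with made-up class fractions:

    # Made-up class fractions for one node, sorted descending.
    sorted_values = sorted([0.7, 0.2, 0.1], reverse=True)

    # 0 when the top two classes tie, 1 when the node is pure.
    alpha = (sorted_values[0] - sorted_values[1]) / (1 - sorted_values[1])
    print(alpha)  # 0.625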
I added the reason why we cannot unpack the value earlier.
I modified the places and checked that it makes sense to unpack value there.
I added a comment explaining why the unpacking happens, so that our future selves will not find it weird.
This is a codecov failure.
Otherwise LGTM.
sklearn/tree/_export.py (Outdated)
@@ -247,17 +248,21 @@ def get_color(self, value):
            color = list(self.colors["rgb"][np.argmax(value)])
            sorted_values = sorted(value, reverse=True)
            if len(sorted_values) == 1:
-               alpha = 0
+               alpha = 0.0
            else:
                alpha = (sorted_values[0] - sorted_values[1]) / (1 - sorted_values[1])
        else:
            # Regression tree or multi-output
So here it's never multi-output? If it is, we shouldn't take the first value, should we?
Apparently, the multi-output case is when value is not an Iterable.
O_o Then what is it? That seems very shaky; is there another way we could check for multi-output? This right now seems very confusing.
In this case, I prefer to unpack from the call-site:
scikit-learn/sklearn/tree/_export.py, lines 277 to 281 in a2b8571:

    if tree.n_outputs == 1:
        node_val = tree.value[node_id][0, :] / tree.weighted_n_node_samples[node_id]
        if tree.n_classes[0] == 1:
            # Regression
            node_val = tree.value[node_id][0, :]
and leave get_color alone.
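For context (an illustration added here, not from the thread): Tree.value has shape (n_nodes, n_outputs, max_n_classes), which is why the [0, :] indexing above leaves a 1-D array for a single-output tree:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    tree = DecisionTreeRegressor(max_depth=1).fit(X, y).tree_

    print(tree.value.shape)          # (3, 1, 1): (n_nodes, n_outputs, n_classes)
    node_val = tree.value[0][0, :]   # root node, first output -> 1-D array
    print(node_val)                  # [0.5]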
So I left get_color alone :)
I added a comment to differentiate the regression case from the single-class classification case.
Usually I use ndarray.item to unpack 0-dim ndarrays into Python scalars. Using item is a little safer, because it raises an error if the underlying array has more than one element.
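A quick sketch of that safety difference (my example):

    import numpy as np

    a = np.array([1.5])
    print(a.item())   # 1.5, a plain Python float
    print(a[0])       # 1.5, but still a NumPy scalar (np.float64)

    b = np.array([1.5, 2.5])
    print(b[0])       # silently returns the first element
    b.item()          # ValueError: can only convert an array of size 1
                      # to a Python scalar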
nitpicks
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
I used ndarray.item as suggested.
This looks much nicer now. Thanks @glemaitre
LGTM. I triggered a scipy-dev build to check that we're not missing any of these.
The remaining failures seem to be other, unrelated ones :)
I checked that all remaining failures are unrelated to 0-dim arrays. Let's merge.
Solves the deprecation warning observed in #26154.
Unpack 0-dim arrays explicitly instead of making an implicit conversion, which NumPy 1.25 deprecates and a future release will turn into an error.
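A minimal reproduction of the deprecation (my illustration, assuming NumPy >= 1.25):

    import numpy as np

    a = np.array([3.0])   # size-1, 1-D array

    float(a)              # DeprecationWarning under NumPy >= 1.25; a future
                          # release will raise a TypeError instead
    float(a[0])           # explicit unpacking: no warning
    a.item()              # also fine, and errors if a.size != 1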