Refactor the fix for MyPy errors due to NumPy v2.2.0 by nabenabe0928 · Pull Request #5853 · optuna/optuna · GitHub

Refactor the fix for MyPy errors due to NumPy v2.2.0 #5853


Merged
merged 14 commits into optuna:master from refactor-numpy-2.2.0-mypy
Dec 18, 2024

Conversation

@nabenabe0928 (Contributor) commented Dec 10, 2024

Motivation

This PR refactors the misleading type annotations that were introduced as a hotfix to keep the CI checks passing.

Description of the changes

  • Eliminate type annotations that do not match the actual behavior

Note

These changes can probably be removed after the following issue is resolved.
numpy/numpy#27957

@@ -103,7 +103,7 @@ def __call__(self, study: Study, population: list[FrozenTrial]) -> list[FrozenTr

def _generate_default_reference_point(
n_objectives: int, dividing_parameter: int = 3
) -> np.ndarray[tuple[int, int], np.dtype[np.float64]]:
@nabenabe0928 (Contributor, Author) commented:

Technically speaking, this annotation is stricter and correct, but I removed it for consistency with the rest of the codebase.
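As a rough illustration of what such a shape-aware annotation expresses, here is a minimal, hypothetical sketch (the function name and body are illustrative, not Optuna's actual implementation): the return type pins the array to 2-D float64, which the runtime values do satisfy.

```python
import numpy as np


def generate_reference_points(
    n_objectives: int, dividing_parameter: int = 3
) -> "np.ndarray[tuple[int, int], np.dtype[np.float64]]":
    # Placeholder body: a 2-D grid of float64 values, one row per point.
    n_points = dividing_parameter
    return np.linspace(0.0, 1.0, n_points * n_objectives).reshape(
        n_points, n_objectives
    )


points = generate_reference_points(n_objectives=2)
print(points.shape)  # (3, 2)
print(points.dtype)  # float64
```

The annotation is written as a string so it stays purely static; at runtime only the shape and dtype of the returned array matter.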

@@ -329,14 +329,15 @@ def _warn_and_convert_inf(

def _get_constraint_vals_and_feasibility(
study: Study, trials: list[FrozenTrial]
) -> tuple[np.ndarray, np.bool | np.ndarray]:
@nabenabe0928 (Contributor, Author) commented:

The second return value is definitely not a scalar boolean, so I removed this annotation to avoid any confusion.
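A minimal sketch of why the feasibility value is an array rather than a scalar bool: per-trial feasibility is naturally a row-wise reduction over the constraint matrix (the variable names here are illustrative, not the actual Optuna helper).

```python
import numpy as np

# One row per trial, one column per constraint; a trial is feasible
# when all of its constraint values are non-positive.
constraint_vals = np.array([[-1.0, 0.0], [0.5, -0.2], [-0.1, -0.3]])
is_feasible = np.all(constraint_vals <= 0.0, axis=1)

print(type(is_feasible))  # <class 'numpy.ndarray'>
print(is_feasible)        # [ True False  True]
```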

contribs = np.prod(diff_of_loss_vals_and_ref_point, axis=-1)
selected_indices = np.zeros(subset_size, dtype=int)
selected_vecs = np.empty((subset_size, n_objectives))
indices: np.ndarray[tuple[int, ...], np.dtype[Any]] = np.arange(
@nabenabe0928 (Contributor, Author) commented:

This annotation is also incorrect, because indices is always a 1-D array.
Since the annotation claims that indices can be an N-D array, which is not the case, I removed it.
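The 1-D claim is easy to confirm at runtime: np.arange always produces a one-dimensional array, so a shape annotation of tuple[int, ...] (any dimensionality) is looser than the actual behavior.

```python
import numpy as np

indices = np.arange(5)
print(indices.ndim)   # 1
print(indices.shape)  # (5,)
```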

@@ -208,7 +208,8 @@ def local_search_mixed(
# TODO(kAIto47802): Think of a better way to handle this.
lengthscales = 1 / np.sqrt(inverse_squared_lengthscales[continuous_indices])

discrete_indices = np.where(steps > 0)[0]
# NOTE(nabenabe): MyPy Redefinition for NumPy v2.2.0. (Cast signed int to int)
discrete_indices = np.where(steps > 0)[0].astype(int)
@nabenabe0928 (Contributor, Author) commented:

By casting the NumPy signed integer type to a plain int, we can avoid the issue, although this loosens the typing.
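A rough sketch of what the cast does at runtime (illustrative arrays, not the actual local_search_mixed code): np.where returns signed index arrays, and .astype(int) converts them to NumPy's default integer dtype without changing the values, which is what sidesteps the stub mismatch.

```python
import numpy as np

steps = np.array([0.0, 0.5, 0.0, 1.0])
raw_indices = np.where(steps > 0)[0]
discrete_indices = np.where(steps > 0)[0].astype(int)

print(raw_indices.dtype.kind)     # 'i' (signed integer)
print(discrete_indices.tolist())  # [1, 3]
```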

Comment on lines 130 to 132
nondominated_indices: np.ndarray[tuple[int, ...], np.dtype[np.signedinteger[Any]]] = np.arange(
n_trials
)

Comment on lines 137 to 140
# NOTE(nabenabe0928): Ignore typing for a temporal solution to NumPy v2.2.0 weird typing.
nondominated_indices = nondominated_indices[
nondominated_and_not_top
] # type: ignore[assignment]
@nabenabe0928 (Contributor, Author) commented:

I could not find a good way to tell NumPy that the result of this vectorized indexing will be a 1-D integer array, so I am ignoring the typing here for now.
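For reference, the runtime behavior that the stubs fail to capture can be checked directly. This is a minimal illustration, not the Optuna code: boolean-mask indexing of a 1-D integer array always yields another 1-D integer array.

```python
import numpy as np

nondominated_indices = np.arange(6)
mask = nondominated_indices % 2 == 0  # stand-in for nondominated_and_not_top
filtered = nondominated_indices[mask]

print(filtered.ndim)      # 1
print(filtered.tolist())  # [0, 2, 4]
```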

@nabenabe0928 (Contributor, Author) commented:

@HideakiImamura
@HideakiImamura (Member) commented:

@porink0424 @y0z Could you review this PR?

@porink0424 (Collaborator) left a comment:

I left a few comments, PTAL.

Comment on lines 95 to 106
(n_solutions, n_objectives) = rank_i_loss_vals.shape
assert isinstance(n_solutions, int), "MyPy Redefinition for NumPy v2.2.0."
contribs = np.prod(diff_of_loss_vals_and_ref_point, axis=-1)
selected_indices = np.zeros(subset_size, dtype=int)
selected_vecs = np.empty((subset_size, n_objectives))
indices: np.ndarray[tuple[int, ...], np.dtype[Any]] = np.arange(
rank_i_loss_vals.shape[0], dtype=int
)
indices = np.arange(n_solutions)
for k in range(subset_size):
max_index = int(np.argmax(contribs))
selected_indices[k] = indices[max_index]
selected_vecs[k] = rank_i_loss_vals[max_index].copy()
keep = np.ones(contribs.size, dtype=bool)
assert n_solutions - k > 0
keep = np.ones(n_solutions - k, dtype=bool)
@porink0424 (Collaborator) commented:

The implementation code itself has changed, and I'm concerned it might result in different behavior.

This seems to go beyond the scope of the PR, which is meant to fix static analysis issues with mypy, so would it be possible to adjust it to stay within the scope?

@nabenabe0928 (Contributor, Author) commented:

Done!

Comment on lines 129 to 131
# TODO(nabenabe): Replace with the following once Python 3.8 is dropped.
# nondominated_indices: np.ndarray[tuple[int], np.dtype[np.signedinteger]] = ...
nondominated_indices = np.arange(n_trials)
@porink0424 (Collaborator) commented:

The original type np.ndarray[tuple[int, ...], np.dtype[np.signedinteger[Any]]] is not incorrect, just loosely typed, so I think that leaving a TODO comment would be better than using a type ignore statement.

@nabenabe0928 (Contributor, Author) commented:

Done!

@porink0424 added the code-fix label ("Change that does not change the behavior, such as code refactoring.") on Dec 13, 2024
@y0z (Member) left a comment:

LGTM

@y0z y0z removed their assignment Dec 13, 2024
@porink0424 (Collaborator) left a comment:

It seems that the refactor goes beyond addressing the errors reported by mypy.

Comment on lines 95 to 96
(n_solutions, n_objectives) = rank_i_loss_vals.shape
n_solutions = int(n_solutions) # MyPy Redefinition for NumPy v2.2.0.
@porink0424 (Collaborator) commented:

As I mentioned in the previous review, I still have some concerns due to my limited understanding of the logic:

  • Is it safe to remove the assertion comparing subset_size and rank_i_indices.size?
  • Are rank_i_loss_vals.shape[1] and reference_point.size guaranteed to be the same value?

@nabenabe0928 (Contributor, Author) commented:

Is it safe to remove the assertion comparing subset_size and rank_i_indices.size?

Yes, it is. We had assert n_solutions - k > 0 inside the for loop, but since that line has now been removed, I reverted the change.

The original assertion was assert subset_size < n_solutions.
What we asserted instead was n_solutions - k > 0, i.e. k < n_solutions, which over all loop iterations amounts to checking subset_size - 1 < n_solutions.
The only case not covered is subset_size < n_solutions itself, but because of the following lines:

if rank_i_indices.size == subset_size:
return rank_i_indices

we had already ruled out subset_size == n_solutions. Combining subset_size - 1 < n_solutions (i.e. subset_size <= n_solutions) with subset_size != n_solutions gives subset_size < n_solutions.

This recovers the original assertion.
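The argument above can be sketched with a minimal, hypothetical version of the selection routine (the function and its body are illustrative, not Optuna's actual _solve_hssp): the early return handles the equal-size case, so past the guard the loop-invariant assertion always holds.

```python
import numpy as np


def select_subset(rank_i_indices: np.ndarray, subset_size: int) -> np.ndarray:
    if rank_i_indices.size == subset_size:
        # subset_size == n_solutions: nothing to choose, return everything.
        return rank_i_indices
    n_solutions = rank_i_indices.size
    # Past the guard (assuming subset_size <= n_solutions at the call site),
    # subset_size < n_solutions must hold, so n_solutions - k > 0 in the loop.
    assert subset_size < n_solutions
    selected = np.empty(subset_size, dtype=int)
    remaining = rank_i_indices.copy()
    for k in range(subset_size):
        assert n_solutions - k > 0  # implied by the guard above
        selected[k] = remaining[0]  # placeholder for the real selection rule
        remaining = remaining[1:]
    return selected


print(select_subset(np.arange(5), 2).tolist())  # [0, 1]
```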

Are rank_i_loss_vals.shape[1] and reference_point.size guaranteed to be the same value?

Yes, it is guaranteed.

At the code level, it is guaranteed because _solve_hssp is only called here:

worst_point = np.max(rank_i_lvals, axis=0)
reference_point = np.maximum(1.1 * worst_point, 0.9 * worst_point)

But most importantly, _solve_hssp is, roughly speaking, a sorting algorithm over an array of shape (n_solutions, n_objectives) given a reference point of shape (n_objectives,).
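The shape relationship at the call site can be checked with a small sketch (illustrative values): the reference point comes from a column-wise max over the loss values, so its size always equals n_objectives, the second dimension of the loss-value matrix.

```python
import numpy as np

rank_i_loss_vals = np.array([[1.0, 4.0], [3.0, 2.0], [2.0, 5.0]])
worst_point = np.max(rank_i_loss_vals, axis=0)  # shape (n_objectives,)
reference_point = np.maximum(1.1 * worst_point, 0.9 * worst_point)

print(rank_i_loss_vals.shape[1])  # 2
print(reference_point.size)       # 2
```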

Comment on lines 105 to 106
assert n_solutions - k > 0
keep = np.ones(n_solutions - k, dtype=bool)
@porink0424 (Collaborator) commented:

Is it possible to use contribs.size as it is in the original logic?

@nabenabe0928 (Contributor, Author) commented:

I tweaked the code and managed to keep contribs.size as in the original logic.
Interestingly, though, NumPy's size inference turns out to be very sensitive to the cast information.

For example, if we replace indices = np.arange(n_solutions) with indices = np.arange(int(n_solutions)), contribs.size no longer passes the type check.

In any case, your concern should now be resolved.

@y0z (Member) left a comment:

LGTM

@porink0424 (Collaborator) left a comment:

Thank you for the detailed explanation and prompt fix.
LGTM 👍

@porink0424 porink0424 removed their assignment Dec 18, 2024
@y0z y0z merged commit 945f856 into optuna:master Dec 18, 2024
14 checks passed
@y0z y0z added this to the v4.2.0 milestone Dec 18, 2024
@nabenabe0928 nabenabe0928 deleted the refactor-numpy-2.2.0-mypy branch June 5, 2025 06:44