8000 MNT: improved check_dimension type inference, added warnings, tests by QuentinSoubeyran · Pull Request #1067 · scikit-optimize/scikit-optimize · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Feb 28, 2024. It is now read-only.

MNT: improved check_dimension type inference, added warnings, tests #1067

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

QuentinSoubeyran
Copy link
Contributor
@QuentinSoubeyran QuentinSoubeyran commented Sep 30, 2021

Fixes:

This PR fixes skopt.space.check_dimension to actually follow its own documentation:

  • list always lead to Categorical object
  • tuple lead to Integer or Real object, as inferred by the bound types
  • numpy arrays are treated as lists, and documented
  • For now, the old behaviour of check_dimension is retained, and a warning is raised if the improved tightened inference would be different or fail.

The new behaviour would fix:

  • there is no way to short hand a Categorical of two numbers, as lists also go through the inference (despite the documentation mentioning it only for tuples), which returns an Integer or Real object
  • Mixing int and float bounds may lead to an Integer object storing a float in one of its bounds, leading to crashes when generated points fall outside the float bound due to rounding
  • The documentation doesn't clarify how an Integer is distinguished from a Real, sometime surprising users who used ints but meant a Real object

@pep8speaks
Copy link
pep8speaks commented Sep 30, 2021

Hello @QuentinSoubeyran! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-18 20:30:55 UTC

@QuentinSoubeyran QuentinSoubeyran force-pushed the fix-check_dimension branch 4 times, most recently from 6fe0e96 to 55cefce Compare October 1, 2021 10:52
@QuentinSoubeyran QuentinSoubeyran changed the title [WIP] improved check_dimension type inference, added warnings, tests [MRG] improved check_dimension type inference, added warnings, tests Oct 1, 2021
@QuentinSoubeyran
Copy link
Contributor Author

@kernc @glouppe This is ready for review

def _check_dimension(dimension, transform=None):
if isinstance(dimension, Dimension):
return dimension
elif isinstance(dimension, (list, np.ndarray)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that a tuple of strings is no longer supported. Why does it need to be a list?

Copy link
Contributor Author
@QuentinSoubeyran QuentinSoubeyran Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to tighten the interface to prevent user errors leading to valid, but unwanted, dimension objects. This is most surely inspired by my uses of type-checkers in python, as the above makes the signature of check_dimension not depend on the actual values contained in the passed sequence:

  • list becomes a Categorical
  • tuple[int, int] (and the 3- and 4-long forms) become Integer
  • tuple[int, float], tuple[float, int] and tuple[float, float] (and corresponding 3 and 4-long forms) become Real

It was also to avoid an incorrect tuple silently becoming a Categorical, e.g.

  • (0, 1, "uniform", base) where base="10" was no parsed from a data source, silently returning a Categorical instead of the intended Integer
  • (0, 1, *params) where params=(5, "log-uniform", 2) because the semantic/content of params is wrong, silently falling back to a Categorical
  • [True, False] being inferred to an Integer (though the old implementation has a dedicated branch to avoid that).

We could make it so that tuples that fail the Integer/Real inference default to a Categorical as before. This would also allow to use any Iterable, like so:

if isinstance(dimension, tuple) and 2 <= len(dimension) <= 4:
    low, high, *args = dimension
    if (not args or isinstance(args[0], str)) and (len(args) < 2 or isinstance(args[1], int)):
        if isinstance(low, numbers.Integral) and isinstance(high,  numbers.Integral):
              return Integer(...)
        elif isinstance(low, numbers.Real) and isinstance(high, numbers.Real):
            return Real(...)
# not an else, fallback
if isinstance(dimension, Iterable):
    return Categorical(...)
raise ValueError(...)

Do you prefer that other type-tightness ?

Comment on lines 113 to 120
if (isinstance(low, numbers.Integral)
and isinstance(high, numbers.Integral)):
return Integer(int(low), int(high), *args, transform=transform)
elif isinstance(low, numbers.Real) and isinstance(high, numbers.Real):
return Real(float(low), float(high), *args, transform=transform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low and high are already int (or float). Why cast?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The float cast is because one of the boundaries might be an int, and a Real dimension asks for two float in its doc.
The int cast was to support other integral types like bool or np.int32.

Since int is not a sublcass of float, I think the float cast is necessary to respect Real's interface. I'm not too sure about the int cast, I mostly did it for symmetry.

Comment on lines +144 to +148
elif all(isinstance(dim, numbers.Integral) for dim in dimension):
return Integer(*map(int, dimension), transform=transform)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, why this map()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@QuentinSoubeyran
Copy link
Contributor Author

@kernc I made tuples fall back to Categorical dimensions, with a warning if the tuple seems to be wrong. Is this better ?

@QuentinSoubeyran QuentinSoubeyran changed the title [MRG] improved check_dimension type inference, added warnings, tests MNT: improved check_dimension type inference, added warnings, tests Oct 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0