API for non continuous inputs #59

glouppe · 2016-04-20T05:28:34Z

At the moment, input values are assumed to live within a bounded continuous range. We should think about an API for specifying integer and symbolic values as well, and about what the consequences would be for the algorithms we have implemented so far.

betatim · 2016-04-20T05:40:09Z

These are two existing ways of specifying such things, hyperopt and GridSearch:

from hyperopt import hp

space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]

It would be good to adopt an existing one if it isn't too awkward for us. We could reuse some of the code for the sampling etc and as a user I don't have to get used to yet another slightly different way of specifying choices.
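To make the GridSearch-style format concrete, here is a minimal sketch of how a list of dicts expands into candidate settings. `expand_grid` is a hypothetical helper written for illustration (scikit-learn ships its own `ParameterGrid` for this); it only uses the standard library.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield every parameter combination from a list of dict grids.

    Hypothetical helper for illustration only. Each dict is expanded
    independently, so keys may differ between dicts.
    """
    for grid in param_grid:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            yield dict(zip(names, values))


param_grid = [
    {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
    {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

candidates = list(expand_grid(param_grid))
```

Every entry is a finite list of values, so this format says nothing about ranges or scales on its own.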

glouppe · 2016-04-20T05:43:14Z

Good idea indeed. I would be in favour of following the scikit-learn API as closely as possible. I am not sure it really fits though, as we would lack a way to specify the scale.

betatim · 2016-04-20T07:11:31Z

An idea:

param_grid = [
  {'C': [1, 10, 100, 1000],  # categorical C
   'kernel': ['linear']},
  {'C': skopt.range(1, 1000),  # continuous C
   'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

If you pass a list/tuple we assume it is categorical; to specify a range you have to use skopt.range (better name?). We could also flip it around, depending on which is the more common use case.

param_grid = [
  {'C': [1, 10, 100, 1000],  # categorical C because len() > 2
   'kernel': ['linear']},
  {'C': [1, 1000],  # continuous C because len() == 2
   # categorical gamma, special notation to distinguish a 2-element categorical
   'gamma': skopt.categorical(0.001, 0.0001),
   # single-element or multi-element list of non-numerical values -> categorical
   'kernel': ['rbf'],
   'something_categorical': ['hello', 'world']},
]

glouppe · 2016-04-20T07:18:25Z

Hmmm, why not. Maybe take the same as in https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions but with some syntax shortcuts? (a list == hp.choice, a pair == hp.uniform)

MechCoder · 2016-04-20T21:04:07Z

We are not limiting ourselves to ML hyperparameters, right? Why not something simple like

gp_minimize(bounds, discrete, log_scale)

where discrete and log_scale are boolean masks or integer arrays as returned by np.where. What are the disadvantages of such an interface? I think we should keep the interface as similar to SciPy as possible, rather than to sklearn.
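A rough sketch of what the mask-based interface could mean for sampling, assuming `discrete` and `log_scale` are per-dimension boolean lists. `sample_point` is a hypothetical helper, not an existing function, and it uses plain Python rather than NumPy to stay self-contained.

```python
import math
import random


def sample_point(bounds, discrete, log_scale, rng):
    """Draw one point from a box; masks are per-dimension booleans (assumed)."""
    point = []
    for (low, high), is_discrete, is_log in zip(bounds, discrete, log_scale):
        if is_log:
            # Sample uniformly in log space, then map back and clamp
            # to guard against floating-point overshoot at the edges.
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
            if is_discrete:
                value = round(value)
            value = min(max(value, low), high)
        elif is_discrete:
            value = rng.randint(low, high)  # inclusive on both ends
        else:
            value = rng.uniform(low, high)
        point.append(value)
    return point


bounds = [(1e-3, 1e3), (1, 20), (0.0, 1.0)]
discrete = [False, True, False]
log_scale = [True, False, False]

point = sample_point(bounds, discrete, log_scale, random.Random(0))
```

The three parallel sequences have to stay aligned by position, which is the main thing a single search-space object would avoid.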

MechCoder · 2016-04-20T21:26:48Z

We will not be able to support true categorical parameters, such as an SVM kernel, but we should advise users to encode those manually and supply the lower and upper bounds themselves rather than complicate the API for this specific use case.

MechCoder · 2016-04-21T02:42:34Z

While working on #62 I realized that categorical is not the same as discrete.

For instance, knowing that a DecisionTree with max_depth=100 gives a certain CV score, we know that max_depth=99 is not going to change much. However, knowing the CV score of an SVM with an RBF kernel (encoded as 1) does not tell us anything about a linear kernel (encoded as 0). How are we going to handle that!?
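One common way to express this difference is through the encoding: integer codes impose an ordering and a distance on the categories, while a one-hot encoding keeps all categories equally far apart. The sketch below only illustrates that point; the third kernel `'poly'` is added for the example and the thread does not say which encoding the project would use.

```python
import math


def one_hot(value, categories):
    """Encode a category as a 0/1 vector with a single 1."""
    return [1.0 if value == c else 0.0 for c in categories]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# 'poly' is an extra category assumed for illustration.
kernels = ['linear', 'rbf', 'poly']

# Integer codes: linear=0, rbf=1, poly=2.
ordinal = {k: [float(i)] for i, k in enumerate(kernels)}
d_ord_linear_rbf = euclidean(ordinal['linear'], ordinal['rbf'])
d_ord_linear_poly = euclidean(ordinal['linear'], ordinal['poly'])

# One-hot codes: every pair of distinct kernels is equally distant.
d_hot_linear_rbf = euclidean(one_hot('linear', kernels), one_hot('rbf', kernels))
d_hot_linear_poly = euclidean(one_hot('linear', kernels), one_hot('poly', kernels))
```

With integer codes, a distance-based model would treat `'rbf'` as closer to `'linear'` than `'poly'` is, which is an artifact of the numbering. For max_depth that kind of closeness is exactly what we want.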

betatim · 2016-04-21T11:24:32Z

I'd like to be able to support something like the SVM RBF use case. This means supporting the use case where the value of a variable A determines what other variables are "valid". As an example: if A=1 then there are parameters B and C to be set but if A=2 then there are parameters C and D to set.

I like having one "object" that describes the whole search space instead of splitting it up/having to specify masks. At least I'd hope we can find a way to write down the whole search space that is easy for humans to write down without making a mistake. So for me finding a good syntax is the top priority.
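As a minimal sketch of the A/B/C/D example, a list of dicts can act as the single object describing a conditional space: each dict is one branch, and the value of A decides which other parameters exist. The candidate values and the helper `sample_conditional` are made up for illustration, and picking branches uniformly is an assumption.

```python
import random

# Each dict is one branch of the space (values are illustrative only).
space = [
    {'A': [1], 'B': [0.1, 1.0], 'C': ['x', 'y']},
    {'A': [2], 'C': ['x', 'y'], 'D': [10, 20, 30]},
]


def sample_conditional(space, rng):
    """Pick a branch uniformly, then pick one value per parameter in it."""
    branch = rng.choice(space)
    return {name: rng.choice(values) for name, values in branch.items()}


rng = random.Random(0)
configs = [sample_conditional(space, rng) for _ in range(50)]
```

A sampled configuration never mixes parameters from different branches, so B only appears when A=1 and D only when A=2.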

MechCoder · 2016-04-21T13:20:12Z

I was just afraid that it was becoming a bit too verbose, but I can't think of a better option. How do you propose to handle the distinction between categorical and discrete?

betatim · 2016-04-24T16:53:35Z

Could we use:

skopt.categorical(0.001, 0.0001)  # categorical with two choices
(10, 100)  # integer values in the range 10 - 100
(10., 100.)  # continuous values in the range 10 - 100
(1, 2, 4, 8)  # categorical
("hello", "world")  # categorical

Would that work?
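A sketch of how these rules could be turned into code, to check that they are unambiguous. `Categorical` stands in for the proposed `skopt.categorical` and `infer_kind` is a hypothetical helper; neither is an existing API. The proposal does not say what a mixed pair such as `(10, 100.)` should mean, so the sketch arbitrarily treats it as categorical.

```python
import random


class Categorical:
    """Hypothetical stand-in for the proposed skopt.categorical marker."""

    def __init__(self, *choices):
        self.choices = tuple(choices)


def infer_kind(dim):
    """Classify a dimension as 'integer', 'real' or 'categorical'."""
    if isinstance(dim, Categorical):
        return 'categorical'
    if len(dim) == 2:
        # bool is a subclass of int, so exclude it explicitly.
        if all(isinstance(v, int) and not isinstance(v, bool) for v in dim):
            return 'integer'
        if all(isinstance(v, float) for v in dim):
            return 'real'
    # Longer tuples, strings and mixed pairs (an assumption) are categorical.
    return 'categorical'


def sample(dim, rng):
    """Draw one value according to the inferred kind."""
    kind = infer_kind(dim)
    if kind == 'integer':
        return rng.randint(dim[0], dim[1])
    if kind == 'real':
        return rng.uniform(dim[0], dim[1])
    choices = dim.choices if isinstance(dim, Categorical) else dim
    return rng.choice(choices)


dimensions = [
    Categorical(0.001, 0.0001),
    (10, 100),
    (10., 100.),
    (1, 2, 4, 8),
    ("hello", "world"),
]
kinds = [infer_kind(d) for d in dimensions]
values = [sample(d, random.Random(0)) for d in dimensions]
```

The one subtle case is that `(10, 100)` and `(10., 100.)` differ only in the type of their entries, so a missing decimal point silently changes the kind of dimension.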


glouppe added API Major labels Apr 20, 2016

betatim mentioned this issue Apr 25, 2016

[WIP] API for specifying discrete and categorical variables #70

Merged

3 tasks

MechCoder closed this as completed Jun 10, 2016

holgern added a commit that referenced this issue Feb 26, 2020

Merge pull request #59 from scikit-optimize/master

c8ba0e8

merge upstream
