API for non continuous inputs · Issue #59 · scikit-optimize/scikit-optimize · GitHub

This repository was archived by the owner on Feb 28, 2024. It is now read-only.

API for non continuous inputs #59

Closed
glouppe opened this issue Apr 20, 2016 · 10 comments

Comments

@glouppe
Member
glouppe commented Apr 20, 2016

At the moment, input values are assumed to live within a bounded continuous range. We should think about an API on how to specify integer and symbolic values as well, and what would be the consequences for the algorithms we implemented so far.

@betatim
Member
betatim commented Apr 20, 2016

These are two existing ways of specifying such things, hyperopt and GridSearch:

# hyperopt: nested search-space expression
from hyperopt import hp

space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# scikit-learn GridSearchCV: list of parameter grids
param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
]

It would be good to adopt an existing one if it isn't too awkward for us. We could reuse some of the code for the sampling etc and as a user I don't have to get used to yet another slightly different way of specifying choices.

@glouppe
Member Author
glouppe commented Apr 20, 2016

Good idea indeed. I would be in favour of following as closely as possible elements from the scikit-learn API. I am not sure if it really fits though, as we would lack a way to specify the scale.

@betatim
Member
betatim commented Apr 20, 2016

An idea:

param_grid = [
  {'C': [1, 10, 100, 1000], # categorical C
   'kernel': ['linear']},
  {'C': skopt.range(1,1000), # continuous C
   'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]

If you pass a list/tuple we assume it is categorical; to specify a range you have to use skopt.range (better name?). We could also flip it around, depending on which is the more common use case.

param_grid = [
  {'C': [1, 10, 100, 1000], # categorical C because len()>2
   'kernel': ['linear']},
  {'C': [1,1000], # continuous C because len()==2
   # categorical gamma, special notation to distinguish 2 element categorical
   'gamma': skopt.categorical(0.001, 0.0001),
   # single element or multi element list of non numerical -> categorical
   'kernel': ['rbf'],
   'something_categorical': ['hello', 'world']},
 ]

@glouppe
Member Author
glouppe commented Apr 20, 2016

Hmmm, why not. Maybe take the same as in https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions but with some syntax shortcuts? (a list == hp.choice, a pair == hp.uniform)

@MechCoder
Member

We are not limiting ourselves to ML hyperparameters, right? Why not something simplistic like

gp_minimize(bounds, discrete, log_scale)

where discrete and log_scale are boolean masks or integer arrays as returned by np.where. What are the disadvantages of such an interface? I think we should keep the interface as similar to SciPy as possible rather than sklearn
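A minimal sketch of what such a mask-based interface could look like, assuming a hypothetical helper `sample_point` and made-up bounds; none of this is existing scikit-optimize code. It only shows how `discrete` and `log_scale` masks would be consumed per dimension, and how the same information looks as an index array from `np.where`.

```python
import numpy as np

# Hypothetical inputs following the proposed
# gp_minimize(bounds, discrete, log_scale) signature.
bounds = [(1e-3, 1e3), (1, 100), (0.0, 1.0)]
discrete = np.array([False, True, False])   # boolean mask form
log_scale = np.array([True, False, False])


def sample_point(bounds, discrete, log_scale, rng):
    """Draw one random point that respects both masks (illustrative only)."""
    point = []
    for (low, high), is_int, is_log in zip(bounds, discrete, log_scale):
        if is_log:
            # Sample uniformly in log space, then map back.
            value = np.exp(rng.uniform(np.log(low), np.log(high)))
        else:
            value = rng.uniform(low, high)
        if is_int:
            # Round to the nearest integer and keep it inside the bounds.
            value = int(np.clip(np.rint(value), low, high))
        else:
            value = float(value)
        point.append(value)
    return point


rng = np.random.default_rng(0)
x = sample_point(bounds, discrete, log_scale, rng)

# The same mask expressed as integer indices, as returned by np.where.
discrete_idx = np.where(discrete)[0]
```

One visible limitation of this form is that it has no slot for a parameter whose values are unordered labels, such as a kernel name.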

@MechCoder
Member
MechCoder commented Apr 20, 2016

We will not be able to support true categorical parameters, like an SVM kernel for instance, but we should advise users to manually provide and get the lower and upper bounds rather than complicate the API for this specific use case.

@MechCoder
Member

While working on #62 I realize that categorical is not the same as discrete.

For instance, knowing that a DecisionTree with max_depth=100 gives a certain cv score, we know that max_depth=99 is not going to change much. However, knowing the cv score about a SVM RBF kernel (encoded as 1) does not tell anything about a linear kernel (encoded as 0). How are we going to handle that!?
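To make the difference concrete, here is a small sketch comparing an ordinal encoding with a one-hot encoding of kernel names. One-hot encoding is just one common option and is an assumption here, not something this thread settled on; the kernel list and the `dist` helper are illustrative.

```python
import numpy as np

kernels = ["linear", "rbf", "poly"]

# Ordinal encoding imposes an order, so some kernels look "closer" than others.
ordinal = {name: float(i) for i, name in enumerate(kernels)}

# One-hot encoding places every pair of distinct categories equally far apart.
identity = np.eye(len(kernels))
one_hot = {name: identity[i] for i, name in enumerate(kernels)}


def dist(a, b):
    """Euclidean distance between two encoded values."""
    return float(np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b)))


d_ord_1 = dist(ordinal["linear"], ordinal["rbf"])    # 1.0
d_ord_2 = dist(ordinal["linear"], ordinal["poly"])   # 2.0
d_hot_1 = dist(one_hot["linear"], one_hot["rbf"])
d_hot_2 = dist(one_hot["linear"], one_hot["poly"])   # same as d_hot_1
```

With the ordinal encoding a distance-based model would treat "linear" as nearer to "rbf" than to "poly", which is meaningless for categories; for max_depth that notion of nearness is exactly what we want.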

@betatim
Member
betatim commented Apr 21, 2016

I'd like to be able to support something like the SVM RBF use case. This means supporting the use case where the value of a variable A determines what other variables are "valid". As an example: if A=1 then there are parameters B and C to be set but if A=2 then there are parameters C and D to set.

I like having one "object" that describes the whole search space instead of splitting it up/having to specify masks. At least I'd hope we can find a way to write down the whole search space that is easy for humans to write down without making a mistake. So for me finding a good syntax is the top priority.
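As a rough sketch of a single object describing such a conditional space, assuming a plain nested dict and a hypothetical `sample` helper (the parameter ranges are invented for illustration):

```python
import random

# Hypothetical description: the value of A decides which parameters are active.
conditional_space = {
    "A": {
        1: {"B": (0.0, 1.0), "C": (10.0, 100.0)},
        2: {"C": (10.0, 100.0), "D": (-5.0, 5.0)},
    }
}


def sample(space, rng):
    """Pick a value for A, then sample only the parameters valid for it."""
    a = rng.choice(sorted(space["A"]))
    point = {"A": a}
    for name, (low, high) in space["A"][a].items():
        point[name] = rng.uniform(low, high)
    return point


rng = random.Random(0)
point = sample(conditional_space, rng)
```

Every sampled point contains only the parameters that are valid for its value of A, so B and D never appear together.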

@MechCoder
Member

I was just afraid that it was becoming a bit too verbose, but I don't find a better option. How do you propose to handle the distinction between categorical and discrete?

@betatim
Member
betatim commented Apr 24, 2016

Could we use:

skopt.categorical(0.001, 0.0001) # two choices
(10, 100) # integer values in the range 10 - 100
(10., 100.) # continuous values in the range 10 - 100
(1, 2, 4, 8) # categorical
("hello", "world") # category

Would that work?
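A minimal sketch of how these shortcuts could be told apart, assuming a hypothetical `Categorical` class standing in for the proposed skopt.categorical and a hypothetical `infer_dimension` function. Treating a mixed pair such as (10, 100.) as continuous is a choice made for this sketch only.

```python
from numbers import Integral, Real


class Categorical:
    """Hypothetical stand-in for the proposed skopt.categorical(...)."""

    def __init__(self, *choices):
        self.choices = choices


def _is_int(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Integral) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, Real) and not isinstance(value, bool)


def infer_dimension(spec):
    """Map a shorthand spec to a (kind, values) pair."""
    if isinstance(spec, Categorical):
        return ("categorical", spec.choices)
    if len(spec) == 2:
        low, high = spec
        if _is_int(low) and _is_int(high):
            return ("integer", (low, high))
        if _is_number(low) and _is_number(high):
            return ("real", (float(low), float(high)))
    # More than two elements, or non-numeric values: categorical.
    return ("categorical", tuple(spec))
```

The explicit `Categorical` wrapper is what keeps a two-element numeric choice from being read as a range.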
