grid_search: feeding parameters to scorer functions #8158
Comments
Also see #4632. What other use-cases are there apart from sample_weight?

Good question. A list of use-cases would be:

Thanks for pointing out the related thread #4632. Answering your last point on that thread, I believe a
Parameters which, unlike sample_weight, are independent of the data sub-sample can be provided using make_scorer.
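(For illustration, a minimal sketch of that approach: a data-independent constant is bound to the metric through make_scorer keyword arguments. The metric penalized_accuracy and the penalty value are made up for the example.)

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def penalized_accuracy(y_true, y_pred, penalty=0.0):
    # toy metric: plain accuracy minus a fixed, fold-independent penalty
    return np.mean(y_true == y_pred) - penalty

X, y = make_classification(random_state=0)

# The extra keyword argument does not depend on the fold,
# so it can simply be baked into the scorer.
scorer = make_scorer(penalized_accuracy, penalty=0.05)

search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1, 10]},
                      scoring=scorer, cv=3)
search.fit(X, y)
```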
Fair point, I guess only the use of data-dependent arguments in cross-validation would require an evolution of sklearn's code. Then I guess we are back to the first issue I opened (#8156): what would be the best way to allow for validation sample weights in grid_search?

I do not feel qualified to express an opinion on whether implementing sample props (#4497) should be done before tackling this issue, or whether the proposal I made above (or something else entirely) would be a way to go in the meanwhile.
Distributing the samples is not a real issue. The API issue, even if limited to sample_weight, is about making it possible to switch on or off whether sample_weight is passed to scorers. But ideally, this should be consistent with how we handle other things like sample_weight. It may also be ideal to support passing sample_weight to scoring, but not to the estimator to be fit (though I admit this can be solved hackily by wrapping the estimator). So some way of routing additional parameters to fit is really what we're talking about.
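(For illustration, the "wrapping the estimator" hack mentioned above might look roughly like the sketch below: a wrapper whose fit accepts sample_weight but deliberately discards it, so that weights would only ever influence scoring. IgnoreSampleWeight and the toy data are made up for this example.)

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

class IgnoreSampleWeight(BaseEstimator):
    """Hypothetical wrapper: fit accepts sample_weight but drops it,
    so the wrapped estimator is always fit unweighted."""
    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # sample_weight is intentionally ignored here
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

X, y = make_classification(random_state=0)
w = np.random.RandomState(0).rand(len(X))
# The weights are accepted but have no effect on the fit itself.
IgnoreSampleWeight(LogisticRegression()).fit(X, y, sample_weight=w)
```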
I would be interested in this. I am running a wrapper using Keras, and it shows a progress bar even when I set verbose=0.

Sample code:

```python
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras import layers

def configure(nodes=10):
    model = Sequential()
    model.add(layers.Dense(nodes))
    model.add(layers.Dense(2, activation='softmax'))
    # optimizer/loss added so the snippet actually compiles
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    return model

hyperparams = {'nodes': [20, 40, 60, 80, 100, 128]}

# data and labels are the user's training arrays
GridSearchCV(estimator=KerasClassifier(configure),
             param_grid=hyperparams).fit(data, labels, verbose=0)
```

Output: the Keras progress bars are still printed.

The reason I think the problem is _fit_and_score: notice how kwargs are passed before and after.
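(As an aside, a workaround that is often suggested elsewhere, given here as an assumption rather than something confirmed in this thread: pass verbose=0 to the KerasClassifier constructor so it is stored among the wrapper's sk_params and forwarded to every internal fit call. configure, hyperparams, data and labels are reused from the snippet above.)

```python
# Assumption: KerasClassifier forwards constructor kwargs that match the
# signature of Sequential.fit (such as verbose) to each internal fit call,
# so the progress bar is silenced for every grid point.
clf = KerasClassifier(build_fn=configure, verbose=0)
GridSearchCV(estimator=clf, param_grid=hyperparams).fit(data, labels)
```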
I just got a task where I have to implement a custom metric that needs additional data per sample, like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

X, y = make_classification()
k = np.random.rand(len(X))  # additional data that's required to calculate the custom metric

def my_metric(y, y_pred, k):
    return np.average(y*y_pred*k)  # just an example...
```

Using this in a GridSearchCV:

```python
gridsrch = GridSearchCV(estimator=LogisticRegression(),
                        param_grid={"C": [0.1, 1, 10]},
                        scoring=make_scorer(my_metric, k=k),  # passing k to the scorer...
                        cv=3)
gridsrch.fit(X, y)
```

... fails, because the scorer receives the full-length k while y and y_pred only cover the validation fold, so the shapes do not match.

I actually found a workaround by using pandas data frames / series and abusing their indexes... the following works as expected:

```python
import pandas as pd

# convert X, y and k to pandas data frames / series
df_X = pd.DataFrame(X)
df_y = pd.Series(y)
df_k = pd.Series(k)

def my_metric(y, y_pred, k):
    # abuse y's index to grab the right elements from k
    return np.average(y*y_pred*k.loc[y.index])

gridsrch = GridSearchCV(estimator=LogisticRegression(),
                        param_grid={"C": [0.1, 1, 10]},
                        scoring=make_scorer(my_metric, k=df_k),
                        cv=3)
gridsrch.fit(df_X, df_y)
```

... but it'd be great if this also worked without pandas data frames somehow.
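(One pandas-free alternative, sketched here as an assumption rather than taken from this thread: bypass GridSearchCV's scoring machinery and slice k with the same validation indices in a manual loop. manual_grid_search is a made-up helper; X, y, k and LogisticRegression are reused from the snippets above.)

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def manual_grid_search(estimator, C_values, X, y, k, n_splits=3):
    """Hypothetical helper: evaluate each C by hand so that k can be
    sliced with the same validation indices as y."""
    cv = StratifiedKFold(n_splits=n_splits)
    results = {}
    for C in C_values:
        fold_scores = []
        for train_idx, test_idx in cv.split(X, y):
            est = clone(estimator).set_params(C=C)
            est.fit(X[train_idx], y[train_idx])
            y_pred = est.predict(X[test_idx])
            # same toy metric as above, with k sliced to the validation fold
            fold_scores.append(np.average(y[test_idx] * y_pred * k[test_idx]))
        results[C] = np.mean(fold_scores)
    return results

scores_by_C = manual_grid_search(LogisticRegression(), [0.1, 1, 10], X, y, k)
```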
Description

GridSearchCV and RandomizedSearchCV do not allow for passing parameters to the scorer function. This could be made possible by adding an extra scorer_params argument, similar to the fit_params argument.

For consistency with fit_params, special care would have to be paid to sample weights. Weights fed through fit_params are currently correctly distributed across training folds. Similarly, weights fed through scorer_params would have to be distributed across validation folds.

Is adding this feature under consideration? If not, should I give it a try?

(Nota: follow-up issue to the closed #8156. I believed the change in focus required a new thread.)
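(To make the proposal concrete, a sketch of how the argument might look if it mirrored fit_params. scorer_params does not exist in scikit-learn, it is the feature being requested here, and my_weighted_metric and w are placeholders, so this snippet is illustrative only and will not run against any released version.)

```python
# Hypothetical API sketch: scorer_params is the *proposed* argument,
# not an existing scikit-learn feature.
search = GridSearchCV(
    estimator=LogisticRegression(),
    param_grid={"C": [0.1, 1, 10]},
    scoring=make_scorer(my_weighted_metric),  # placeholder metric accepting sample_weight
    fit_params={"sample_weight": w},          # today: sliced along the training folds
    scorer_params={"sample_weight": w},       # proposed: sliced along the validation folds
    cv=3,
)
search.fit(X, y)
```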