-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[WIP] Sample properties #4696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Sample properties #4696
Conversation
ba9cc83
to
e7f0509
Compare
ping @GaelVaroquaux @ogrisel @jnothman @vene @agramfort |
And also feedback for the issues. |
|
I would at least not use abbreviations, and go instead for something like Another thing which is not entirely clear for me. Why couldnt we simply put these properties into |
The handling of In a way, this is mostly renaming One thing that is not nice about After doing this, I am not entirely convinced any more that adding new api is better than what we have + |
We could use the convention that any fit_param starting by |
That idea is appealing and I will have to think a little more about it. One On 12 May 2015 at 11:12, Mathieu Blondel notifications@github.com wrote:
|
if we use sample_props column names should be weight, group which is not
that awkward.
|
The problem with sample_props is that support in a method for a particular On 12 May 2015 at 17:05, Alexandre Gramfort notifications@github.com
|
what do you suggest then? **kwargs are not an option
|
|
sample_props doesn't seem fundamentally very different from fit_params. It's a dict which contains additional data to be used for fitting. It's different from a **kwargs: the name fit_params appears in the doc string. We just need a rule to decide when the additional data passed in fit_params needs to be sliced or not for cross validation. Above, I suggested that any data whose key is prefixed by sample_ needs to be sliced. I didn't completely get why we can't use data frames as input to fit_params.
ok point taken. Now the fit_params could be used in transform for
example. So fit_ is not necessarily a good name...
|
Good point for the name... |
@mblondel the problem is the way that cross_val_score(GridSearchCV(SGDClassifier(), params), fit_params={'sample_weights':sample_weights}) because then In that way I disagree with
I would say
|
Btw, maybe to make my goals more clear. I want these two things to work: cross_val_score(GridSearchCV(SGDClassifier(), params), i_dont_care_what_its_called={'sample_weights':sample_weights}) and cross_val_score(GridSearchCV(SGDClassifier(), params, cv=LeaveOneLabelOut()), i_dont_care_what_its_called={'labels': groups}, cv=LeaveOneLabelOut()) |
I think this remains the biggest basic design issue. It may deserve discussion at the sprint. It is tricky, and I suspect some backwards compatibility issues will have to be ignored. |
Do we have a list of use-cases? do we want a SLEP? As I pointed out above, the current sample_weight support is lacking. What other applications do we have? the cross-validation groups? Anything else? |
Shot at #4497.
Todo:
Issues
len
ofdict
s behaves different from recarray and dataframes. Not supporting dicts might lead to a cleaner interface, but not sure.