Automatically set 'V' parameter to cov(X) when using mahalanobis distance metric · Issue #6915 · scikit-learn/scikit-learn

Open · jln-ho opened this issue Jun 21, 2016 · 7 comments

Comments

jln-ho commented Jun 21, 2016

Would it be considered bad design to automatically set the 'V' parameter for the mahalanobis distance metric to cov(X) for all estimators that (possibly) use this metric?

I'm asking because it would facilitate the use of the mahalanobis distance in cross-validation scenarios in general (because one wouldn't have to set the V parameter over and over again for each fold), and more specifically when it's used with GridSearchCV (because, the way things are now, it won't work with a KNeighborsClassifier using the mahalanobis distance, for example).

On the other hand, adding code to many estimators' fit() methods just to fix this minor issue seems a bit messy to me. Could there be a more elegant way? I'd be glad to take suggestions and follow them up with a PR.
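For illustration, here is roughly the manual workflow the proposal would make unnecessary — a minimal sketch, assuming the covariance is estimated from each training split (the cross-validation setup and variable names are only illustrative, and the newer sklearn.model_selection import path is used):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# With the current API, V has to be recomputed and passed in explicitly for every fold.
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    V = np.cov(X[train_idx], rowvar=False)  # covariance of the training split
    clf = KNeighborsClassifier(metric="mahalanobis", metric_params={"V": V})
    clf.fit(X[train_idx], y[train_idx])
    print(clf.score(X[test_idx], y[test_idx]))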

amueller (Member) commented Oct 7, 2016

What do you mean by "V" parameter?

amueller (Member) commented Oct 13, 2016

ah, this is for the mahalanobis in the trees, I guess... actually the scipy one requires VI (the inverse of the covariance matrix) as a parameter. Can you give an example of the grid-search you want to do @jln-ho ?
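For reference, a minimal sketch of the SciPy parameterization mentioned above (the data here is purely illustrative):

import numpy as np
from scipy.spatial.distance import cdist

X = np.random.RandomState(0).randn(20, 4)
VI = np.linalg.inv(np.cov(X, rowvar=False))  # SciPy expects the *inverse* covariance matrix
D = cdist(X, X, metric="mahalanobis", VI=VI)  # pairwise Mahalanobis distances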

jln-ho (Author) commented Oct 17, 2016

Yes, sorry, VI is what I meant. Here's an example of how I would want to auto-tune the n_neighbors parameter for a KNeighborsClassifier using GridSearchCV:

from sklearn.grid_search import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = GridSearchCV(estimator=KNeighborsClassifier(),
                   param_grid={"metric": ["mahalanobis"], "n_neighbors": [3, 5, 7]})
clf.fit(iris.data, iris.target)

This produces an error because the VI parameter is not set. Even if you set it on an instance of KNeighborsClassifier before passing it into GridSearchCV, it won't work, because GridSearchCV creates new instances of the classifier for each iteration.
If you change "mahalanobis" to e.g. "minkowski", it works, because the minkowski distance does not require any additional parameters.

GaelVaroquaux (Member) commented Oct 17, 2016 via email

jln-ho (Author) commented Oct 17, 2016

Yes, I'd be glad. Don't know exactly how long it will take, though. It should be an easy fix so if anyone wants to jump in for me, go right ahead! I probably won't start working on this before the weekend anyway.

amueller (Member) commented

I don't understand why passing it to KNeighborsClassifier doesn't work:

import numpy as np
from sklearn.grid_search import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = GridSearchCV(estimator=KNeighborsClassifier(metric_params={'V': np.eye(4)}),
                   param_grid={"metric": ["mahalanobis"], "n_neighbors": [3, 5, 7]})
clf.fit(iris.data, iris.target)

amueller (Member) commented

If you want a V that depends on the particular split of the data, that's trickier: if V should be computed dynamically from the training part of each cross-validation split, I'm not sure how to do that directly.
For that you could do

make_pipeline(PCA(whiten=True), KNeighborsClassifier())

though. That should be what you want, right?
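A sketch of what that suggestion could look like end to end — assuming the newer sklearn.model_selection import path, with the kneighborsclassifier__n_neighbors parameter name following make_pipeline's automatic step naming:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

iris = load_iris()

# PCA(whiten=True) is refit on the training part of each CV split, so plain
# Euclidean distance in the whitened space plays the role of a Mahalanobis
# distance with a per-fold covariance estimate.
pipe = make_pipeline(PCA(whiten=True), KNeighborsClassifier())
clf = GridSearchCV(pipe,
                   param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 7]})
clf.fit(iris.data, iris.target)
print(clf.best_params_)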
