Automatically set 'V' parameter to cov(X) when using mahalanobis distance metric #6915
Comments
What do you mean by "V" parameter? |
ah, this is for the mahalanobis in the trees, I guess... actually the scipy one requires VI (the inverse of the covariance matrix) as a parameter. Can you give an example of the grid-search you want to do @jln-ho ? |
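For reference, a minimal sketch of what passing VI to the scipy function looks like. The random data and the variable names are assumptions made for illustration; only `scipy.spatial.distance.mahalanobis(u, v, VI)` and the NumPy calls are real API.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

# Illustrative data only: 50 samples with 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

# V is the sample covariance of the features; VI is its inverse.
V = np.cov(X, rowvar=False)
VI = np.linalg.inv(V)

# scipy expects the inverse covariance matrix as the third argument.
d_scipy = mahalanobis(X[0], X[1], VI)

# The same distance written out by hand: sqrt((u - v)^T VI (u - v)).
diff = X[0] - X[1]
d_manual = float(np.sqrt(diff @ VI @ diff))

print(d_scipy, d_manual)
```

Since VI is computed from X, it changes whenever the training data changes, which is why a fixed value does not fit naturally into cross-validation.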
Yes, sorry, VI is what I meant. Here's an example of how I would want to auto-tune the [...]
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
# No V/VI is supplied for the mahalanobis metric here
clf = GridSearchCV(estimator=KNeighborsClassifier(),
                   param_grid={"metric": ["mahalanobis"], "n_neighbors": [3, 5, 7]})
clf.fit(iris.data, iris.target)
This produces an error because the [...] |
+1. Do you want to do a pull request?
|
Yes, I'd be glad. Don't know exactly how long it will take, though. It should be an easy fix so if anyone wants to jump in for me, go right ahead! I probably won't start working on this before the weekend anyway. |
I don't understand why passing it to [...]
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
iris = load_iris()
# A fixed V (identity, 4 features) passed through metric_params
clf = GridSearchCV(estimator=KNeighborsClassifier(metric_params={'V': np.eye(4)}),
                   param_grid={"metric": ["mahalanobis"], "n_neighbors": [3, 5, 7]})
clf.fit(iris.data, iris.target)
|
If you want a V that depends on the particular split of the data, that's more tricky. You could use make_pipeline(PCA(whiten=True), KNeighborsClassifier()) though. That should be what you want, right? |
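A sketch of how the whitening pipeline could be grid-searched, assuming a recent scikit-learn where `GridSearchCV` lives in `sklearn.model_selection`. The step name `kneighborsclassifier` is the one `make_pipeline` generates automatically. The check at the end is only an illustration that Euclidean distance after `PCA(whiten=True)` matches the Mahalanobis distance with V = cov(X) when all components are kept.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

iris = load_iris()

# PCA is refit on the training part of every CV split, so the whitening
# (and therefore the implied covariance) follows the data in each fold.
pipe = make_pipeline(PCA(whiten=True), KNeighborsClassifier())
clf = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__n_neighbors": [3, 5, 7]},
)
clf.fit(iris.data, iris.target)
print(clf.best_params_)

# Illustration: compare the two distances for one pair of samples.
X = iris.data
Z = PCA(whiten=True).fit_transform(X)
VI = np.linalg.inv(np.cov(X, rowvar=False))
diff = X[0] - X[50]
d_mahalanobis = float(np.sqrt(diff @ VI @ diff))
d_whitened = float(np.linalg.norm(Z[0] - Z[50]))
print(d_mahalanobis, d_whitened)
```

This avoids touching any estimator's fit(), at the cost of transforming the data instead of configuring the metric.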
Would it be considered bad design to automatically set the 'V' parameter for the mahalanobis distance metric to cov(X) for all estimators that (possibly) use this metric?
I'm asking because it would facilitate the use of the mahalanobis distance in cross-validation scenarios in general (because one wouldn't have to set the V parameter over and over again for each fold), and more specifically with GridSearchCV (because as things are now, it won't work with a KNeighborsClassifier using the mahalanobis distance, for example).
On the other hand, adding code to many estimators' fit() methods just to fix this minor issue seems a bit messy to me. Could there be a more elegant way? I'd be glad to take suggestions and follow them up with a PR.
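One possible alternative to changing each estimator is a small wrapper that estimates the inverse covariance from whatever data it is fitted on. The sketch below is only an illustration: `MahalanobisKNN` is a hypothetical class, not part of scikit-learn, and it assumes a recent scikit-learn version where `metric_params={"VI": ...}` is accepted for the mahalanobis metric.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


class MahalanobisKNN(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: k-NN whose VI is recomputed on every fit."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Inverse covariance of the training data seen in this fit only;
        # pinv is used so a singular covariance does not raise.
        VI = np.linalg.pinv(np.cov(X, rowvar=False))
        self.knn_ = KNeighborsClassifier(
            n_neighbors=self.n_neighbors,
            metric="mahalanobis",
            metric_params={"VI": VI},
        )
        self.knn_.fit(X, y)
        self.classes_ = self.knn_.classes_
        return self

    def predict(self, X):
        return self.knn_.predict(np.asarray(X, dtype=float))


iris = load_iris()
search = GridSearchCV(MahalanobisKNN(), param_grid={"n_neighbors": [3, 5, 7]})
search.fit(iris.data, iris.target)
print(search.best_params_, search.best_score_)
```

Because GridSearchCV clones and refits the wrapper on each training split, the covariance is estimated per fold without the caller setting it by hand.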