Local outlier factor gives incorrect results · Issue #9874 · scikit-learn/scikit-learn · GitHub
Local outlier factor gives incorrect results #9874

Closed
jeroneandrews-sony opened this issue Oct 5, 2017 · 5 comments
Comments

@jeroneandrews-sony
jeroneandrews-sony commented Oct 5, 2017

Steps/Code to Reproduce

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

c = np.array([[0, 0], [0, 1], [1, 1], [3, 0]])
k = 2  # number of neighbours
clf = LocalOutlierFactor(n_neighbors=k, n_jobs=-1, algorithm='brute', metric='manhattan').fit(c)
Z = clf._decision_function(c)
print(-Z)
[ 0.875  0.875  0.875  1.5  ]

However, if you work it out by hand (assuming a sample can't be its own 1st nearest neighbour), the result should be [0.875, 1.333, 0.875, 2.0]. So I think there is something wrong with the LOF implementation. Could somebody confirm either way?
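For reference, the hand calculation can be sketched in NumPy. This is a minimal re-derivation of the LOF definition (k-distance, reachability distance, local reachability density, then LOF) with k = 2 and the Manhattan metric, excluding each sample from its own neighbourhood; it is not scikit-learn's implementation.

```python
import numpy as np

# The four points from the report above.
X = np.array([[0, 0], [0, 1], [1, 1], [3, 0]], dtype=float)
k = 2

# Pairwise Manhattan distances, with the diagonal masked out so a
# point cannot be its own nearest neighbour.
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)
np.fill_diagonal(D, np.inf)

# k nearest neighbours and the k-distance of each point.
nbrs = np.argsort(D, axis=1)[:, :k]
d_nbrs = np.take_along_axis(D, nbrs, axis=1)
k_dist = d_nbrs[:, -1]

# Local reachability density: lrd(p) = 1 / mean_o max(k_dist(o), d(p, o)).
reach = np.maximum(k_dist[nbrs], d_nbrs)
lrd = 1.0 / reach.mean(axis=1)

# LOF(p) = mean over neighbours o of lrd(o) / lrd(p).
lof = lrd[nbrs].mean(axis=1) / lrd
print(lof)  # [0.875, 1.333..., 0.875, 2.0]
```

This reproduces the hand-computed values [0.875, 1.333, 0.875, 2.0].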

@jeroneandrews-sony
Author
jeroneandrews-sony commented Oct 6, 2017

I have also just compared the scikit-learn result with the R package 'Rlof', which gives:

c = matrix(c(0,0,0,1,1,1,3,0),nrow=4, ncol=2,byrow=TRUE)
> lof(c, 2, method="manhattan")
[1] 0.875000 1.333333 0.875000 2.000000

So there is definitely something wrong with the scikit-learn implementation.

@albertcthomas
Contributor
albertcthomas commented Oct 6, 2017

The LOFs of the training samples are given by clf.negative_outlier_factor_, which is the negation of the output you obtain with Rlof.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

c = np.array([[0, 0], [0, 1], [1, 1], [3, 0]])
k = 2  # number of neighbours
clf = LocalOutlierFactor(n_neighbors=k, n_jobs=-1, algorithm='brute', metric='manhattan').fit(c)
Z = clf.negative_outlier_factor_
print(-Z)
[ 0.875  1.33333333  0.875  2. ]

The private _decision_function is meant to be used on test samples.
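(As an aside: in scikit-learn 0.20 and later, scoring genuinely new test samples is done by fitting with novelty=True, which exposes the public score_samples, decision_function and predict methods. A minimal sketch, using the same training data; the test points are illustrative, not from the thread.)

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

c = np.array([[0, 0], [0, 1], [1, 1], [3, 0]])

# novelty=True makes the estimator score unseen samples instead of
# the training set itself.
clf = LocalOutlierFactor(n_neighbors=2, algorithm='brute',
                         metric='manhattan', novelty=True).fit(c)

X_new = np.array([[0.5, 0.5], [5, 5]])
print(-clf.score_samples(X_new))  # LOF of each new point w.r.t. the training set
print(clf.predict(X_new))         # +1 = inlier, -1 = outlier
```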

@ngoix
Contributor
ngoix commented Oct 6, 2017

@jeroneandrews have you tried using the fit_predict (public) method? LOF's predict (private) method is meant to extend the use of LOF to new data, and thus considers new data as "duplicates" if they already appear in the training set.
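A minimal sketch of that suggestion on the same data (with the default contamination): fit_predict labels the training samples directly, flagging the point whose LOF exceeds the threshold.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

c = np.array([[0, 0], [0, 1], [1, 1], [3, 0]])

clf = LocalOutlierFactor(n_neighbors=2, algorithm='brute', metric='manhattan')
labels = clf.fit_predict(c)  # -1 flags outliers among the training samples
print(labels)                # [ 1  1  1 -1]
```

Only the last point, (3, 0), is flagged, consistent with its LOF of 2.0.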

@ngoix
Contributor
ngoix commented Oct 6, 2017

...predict, or more precisely the private decision_function method in your case.
edit: As @albertcthomas pointed out, clf.negative_outlier_factor_ gives the correct answer.

@jeroneandrews-sony
Author

@albertcthomas @ngoix
Thank you for your responses, I missed negative_outlier_factor_ in the documentation.
