BUG: clamping is implemented incorrectly in sklearn.semi_supervised.LabelSpreading #5774
I just noticed that the definition of clamping in `LabelSpreading` looks wrong. It is easy to see that only the corrected option will actually clamp the old labels. Someone else should independently verify the mathematics and the documentation here as well.
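To make the claim above concrete, here is a small numerical check (my own sketch, not from the original report; the names `propagated`, `buggy`, and `corrected` are mine): take a single labeled sample whose propagated neighborhood votes unanimously favor the other class, and compare the two blending rules.

```python
import numpy as np

alpha = 0.2
y_init = np.array([1.0, 0.0])      # labeled sample, initially class 0
propagated = np.array([0.0, 1.0])  # neighbors unanimously vote for class 1

# Update as currently implemented for a *labeled* sample (weights swapped):
buggy = 1.0 * propagated + (1 - alpha) * y_init        # -> [0.8, 1.0]

# Corrected update for a labeled sample:
corrected = alpha * propagated + (1 - alpha) * y_init  # -> [0.8, 0.2]

# The swapped weights let the labeled sample flip to class 1;
# only the corrected blend keeps the initial label dominant.
print(buggy.argmax(), corrected.argmax())  # prints: 1 0
```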
Indeed. Please have a look at #3550, too. I am a bit too busy at the moment to follow up on this.
Just a small update to help people who may wander onto this issue: I am currently maintaining a version of the semi-supervised learning module forked from scikit-learn. I intend to create a pull request if and when I have a good enough test case to prove things one way or the other.
@musically-ut it would be great if you could provide a PR.
@musically-ut I have done some tests with your code and it seems to be correctly clamping the labels. Do you have plans to submit a PR soon?
@boechat107 Great to know, and good to have some independent verification. :) Just to be clear, I am assuming that you tested the code from my fork.

The primary problem before I make a PR is coming up with good tests. The current tests work with the current code, but it is not clear that they are obviously correct (and, indeed, my code produces different results on some of the test cases). I spent a while trying to come up with test cases by hand-solving the problems, but couldn't find examples good enough to be obviously correct and which cover all the corners.

Beyond that, there are a few problems with the underlying theory of the implementation which I have fixed in my fork. E.g. the PhD thesis referenced does not consider directed k-NN graphs (only undirected versions), while the implementation uses the directed version. I suspect that this was only an oversight. However, this is a major change and I will not be comfortable including it in sklearn without another expert looking at it.
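On the directed-vs-undirected point: the nearest-neighbor relation is not symmetric, so a k-NN connectivity graph is directed in general. Here is a sketch of one common symmetrization (keep an edge between `i` and `j` if either direction is present); the data and parameter values are made up for illustration and are not from this thread.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Random points, for illustration only.
X = np.random.RandomState(0).rand(30, 2)

# k-NN connectivity graph: W[i, j] = 1 iff j is among i's 3 nearest neighbors.
W = kneighbors_graph(X, n_neighbors=3, mode='connectivity')

# The neighbor relation is not symmetric, so W is a *directed* graph in general.
print("asymmetric entries:", (W != W.T).nnz)

# One common way to obtain the undirected graph assumed by the theory:
# connect i and j if either is among the other's neighbors.
W_sym = W.maximum(W.T)
```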
@musically-ut Yes, I used the code from the repo you maintain. I didn't notice this detail about the graph, and I think you are correct. Your implementation seems to fix this too, assuming it is another bug. But what do you think about fixing one issue at a time? It seems clear that you fixed the label clamping (I'm going to check). I think I'll have some more days to work on this, and I could help with something, like some test cases. @amueller, would you like to give us some comments? The PR #3758 looks inactive...
Changes to fix scikit-learn#5774 (label clamping)
Fixed in #9239
The code which does clamping in `sklearn.semi_supervised.LabelSpreading` appears to be incorrect. It does the following:

- if the `i`-th sample is labeled, then: `y_new[i] = 1.0 * M * y_old[i] + (1 - alpha) * y_init[i]`
- if the `i`-th sample is unlabeled, then: `y_new[i] = alpha * M * y_old[i] + 0.0`

This is clearly incorrect. The correct way to do this is:

- if the `i`-th sample is labeled, then: `y_new[i] = alpha * M * y_old[i] + (1 - alpha) * y_init[i]`
- if the `i`-th sample is unlabeled, then: `y_new[i] = 1.0 * M * y_old[i] + 0.0`
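The corrected update rules can be sketched as a full iteration. This is my own illustrative implementation, not scikit-learn's code: the function name `propagate` and its signature are made up, and `M` is assumed to be a row-normalized graph matrix.

```python
import numpy as np

def propagate(M, y_init, labeled_mask, alpha, max_iter=1000, tol=1e-6):
    """Iterate the corrected clamped update until convergence (a sketch).

    M            : (n, n) row-normalized graph/similarity matrix
    y_init       : (n, k) initial label distributions (all-zero rows if unlabeled)
    labeled_mask : (n,) boolean, True where the sample carries a known label
    alpha        : clamping factor in (0, 1); smaller alpha clamps harder
    """
    y = y_init.copy()
    for _ in range(max_iter):
        y_new = M @ y
        # Labeled samples: blend propagated labels with the initial labels.
        y_new[labeled_mask] = (alpha * y_new[labeled_mask]
                               + (1 - alpha) * y_init[labeled_mask])
        # Unlabeled samples keep the propagated labels unchanged.
        if np.abs(y_new - y).max() < tol:
            return y_new
        y = y_new
    return y

# Tiny chain graph 0 -- 1 -- 2, row-normalized.
M = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
y_init = np.array([[1.0, 0.0],   # node 0 labeled class 0
                   [0.0, 0.0],   # node 1 unlabeled
                   [0.0, 1.0]])  # node 2 labeled class 1
y = propagate(M, y_init, np.array([True, False, True]), alpha=0.2)
print(y)  # labeled rows stay close to their initial labels
```

At the fixed point the labeled rows converge to `[0.9, 0.1]` and `[0.1, 0.9]`: most of the initial label mass is retained, which is exactly what clamping is supposed to guarantee.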
The fix is relatively simple:
I can create a PR for this but am not sure what kind of test cases I should add to avoid a regression, if any.
Test case:
With the fix in place, it takes only 6 iterations.
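As one possible regression test (my own sketch, not the test case referenced in this issue; the data and parameter values are made up), one could check that on well-separated clusters the labeled seeds keep their labels and the unlabeled points inherit the label of their cluster:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Two well-separated clusters; one labeled seed per cluster, rest unlabeled (-1).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])

model = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2, max_iter=1000)
model.fit(X, y)

# With correct clamping, the transduced labels should be [0, 0, 0, 1, 1, 1].
print(model.transduction_)
```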