[BUG] LabelPropagation should use undirected graphs for knn kernel. #8008
Comments
So is the issue of non-symmetry restricted to ties? I wonder if we should have a more consistent method for handling ties in KNN.
@jnothman No, not really. Consider the following example [figure omitted]: here the kNN graph is asymmetric because the lone point does not have any edges pointed to it. Was that your confusion? Does this issue make sense now?
Nice drawing :) [Or you can think of points on a line at `np.arange(n) ** 2`; then the nearest neighbor is always the point to the left.]
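To make that concrete, here is a minimal sketch (using scikit-learn's `kneighbors_graph`; the toy data and variable names are illustrative) showing that the 1-NN graph on such points is not symmetric:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Points at 0, 1, 4, 9, 16, 25: every point except the first is closest
# to its left neighbor, so some edges have no reverse edge.
X = (np.arange(6) ** 2).reshape(-1, 1)
A = kneighbors_graph(X, n_neighbors=1, mode='connectivity')

# E.g. the edge 2 -> 1 exists, but 1 -> 2 does not (1's neighbor is 0).
print((A != A.T).nnz > 0)  # True: the kNN graph is directed/asymmetric
```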
So any ideas about how to fix the problem, apart from the crude solution I have implemented? I suspect that the call to `_get_kernel` with `y != None` will also need the same fix.
Of course, thanks for the illustration.
Also, fix/update the tests. Fixes scikit-learn#8008.
Could you please summarise here what possible (ideally not hacky) options we have for undirected KNN kernels? Thanks a lot.
A kNN graph is a directed graph and non-symmetric in general. If we need symmetry, #6727 is a very reasonable solution, but not a unique or natural one. But it implements a new kernel. I don't think it's a matter of hacky or not.
Another comment is that symmetry does not imply reachability. Imagine two well-separated k-clusters, with all the available labels in one of them, under a symmetrized kNN kernel. The label-free cluster would remain unlabeled, AFAIK.
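A quick sketch of that point (the cluster layout and the `maximum` symmetrization are illustrative assumptions): even after symmetrizing, the graph has two connected components, so labels can never reach the second cluster:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

rng = np.random.RandomState(0)
# Two tight clusters, far apart.
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 100.0])

A = kneighbors_graph(X, n_neighbors=3, mode='connectivity')
A_sym = A.maximum(A.T)  # one common way to symmetrize

n_components, labels = connected_components(A_sym, directed=False)
print(n_components)  # 2 -- labels in one cluster cannot reach the other
```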
Yes, the make-the-kNN-matrix-symmetric solution is a bit of a hack. It doesn't guarantee connectedness and does not have any theoretical justification (as far as I can tell from searching). This is a paper which takes a more methodical approach: http://ieeexplore.ieee.org/abstract/document/4358958/?part=1

Wang, Fei, and Changshui Zhang. "Label propagation through linear neighborhoods." IEEE Transactions on Knowledge and Data Engineering 20.1 (2008): 55-67.

The underlying idea behind the paper is to first create a k-NN graph, then infer the *weight* of each edge by minimizing the linear-reconstruction error:

`sum((x[i] - sum(w[i][j] * x[j] for j in kNN(i, G)))**2 for i in Vertex(G))`

such that `sum(w[i][j] for j in kNN(i, G)) == 1` for all `i`. After that, it uses a LabelPropagation-like approach to propagate the labels. I'm not very sure how to solve the constrained LP problem efficiently to create the kernel.
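As a rough sketch of that weight computation (this is the standard LLE-style construction, not necessarily the paper's exact algorithm; the `reg` term is an assumption added for numerical stability):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def linear_reconstruction_weights(X, k=5, reg=1e-3):
    """Weights W[i, j] that best reconstruct each x[i] from its k
    nearest neighbors, constrained to sum to 1 per row (LLE-style)."""
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = idx[i, 1:]                   # drop the point itself
        Z = X[nbrs] - X[i]                  # neighbors centered on x[i]
        G = Z @ Z.T                         # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)  # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()            # enforce sum(w[i]) == 1
    return W
```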
More generally, for semi-supervised learning on directed graphs, this is one of the key references: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/SDG.pdf

Zhou, Denny, Thomas Hofmann, and Bernhard Schölkopf. "Semi-supervised learning on directed graphs." Advances in Neural Information Processing Systems. 2005.

Their algorithm uses one extra parameter `lambda` to balance the weighing of transduction over hubs and authorities, but otherwise can be reduced to the LabelSpreading algorithm with an altered `S` matrix in polynomial time (it requires figuring out the in-degree and out-degree of each node).

I'll add more references as and when I find them.
If the goal was to have a kernel that can be represented by a sparse matrix, is connected, and is symmetric, wouldn't the minimum spanning tree fit the bill? The relation with kNN is tenuous at best, though. Actually, there is a paper about kNN and MST by Arefin et al., but it's behind a paywall.
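A hedged sketch of that suggestion (my own construction, not code from the thread): SciPy's minimum spanning tree yields a sparse, connected graph on n - 1 edges, which can then be symmetrized:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from sklearn.metrics import pairwise_distances

X = np.random.RandomState(0).randn(30, 2)  # toy data
D = pairwise_distances(X)                  # dense distance matrix
T = minimum_spanning_tree(D)               # sparse; one direction per edge
T_sym = T.maximum(T.T)                     # symmetric and connected
```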
This is with reference to the PR #6727 and is discussed in the comments in #5774.
Description
The current implementation of
LabelPropagation
, insemi_supervised
module, creates directed k-NN graphs when theknn
kernel is chosen.One way to fix it is as shown in musically-ut/semi_supervised.
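For concreteness, a minimal sketch of the symmetrization fix (the function name is hypothetical, and this is not necessarily the exact change in musically-ut/semi_supervised):

```python
from sklearn.neighbors import kneighbors_graph

def undirected_knn_affinity(X, n_neighbors):
    A = kneighbors_graph(X, n_neighbors, mode='connectivity')
    # Keep an edge whenever it exists in either direction, making the
    # affinity matrix symmetric (i.e. the graph undirected).
    return A.maximum(A.T)
```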
Steps to Reproduce / Expected Results / Actual Results

This is part of the reason I didn't submit it as a separate issue: it is difficult to come up with test cases which are obviously correct or incorrect.
However, none of the papers about semi-supervised learning cited in #6727 considers directed graphs. Using directed graphs has some obvious problems, like rendering parts of the graph completely unreachable, thus restricting the propagation of labels to flow only from roots to descendants.