[BUG] LabelPropagation should use undirected graphs for knn kernel. #8008

Open · musically-ut opened this issue Dec 7, 2016 · 10 comments
@musically-ut (Contributor)

This is with reference to PR #6727 and is discussed in the comments on #5774.

Description

The current implementation of LabelPropagation in the semi_supervised module creates directed k-NN graphs when the knn kernel is chosen.

One way to fix it is as shown in musically-ut/semi_supervised.
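For illustration, here is a minimal sketch of one such symmetrization, taking the union of the directed edges (the construction and names are mine, not necessarily what the linked fix does):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).rand(20, 2)
A = kneighbors_graph(X, n_neighbors=3, mode="connectivity")

# The raw kNN graph is directed: an edge i -> j does not imply j -> i.
print("asymmetric entries:", (A - A.T).nnz)

# Symmetrize by keeping an edge if it exists in either direction.
A_sym = A.maximum(A.T)
print("after symmetrizing:", (A_sym - A_sym.T).nnz)  # 0
```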

Steps to Reproduce / Expected Results / Actual Results

This is part of the reason I didn't submit it as a separate issue: it is difficult to come up with test cases that are obviously correct or incorrect.

However, none of the papers on semi-supervised learning cited in #6727 considers directed graphs. Using directed graphs has some obvious problems, such as rendering parts of the graph completely unreachable and thus restricting labels to propagate only from roots to their descendants.

@jnothman (Member) commented Dec 8, 2016

So is the issue of non-symmetry restricted to ties? I wonder if we should have a more consistent method for handling ties in KNN.

@musically-ut (Contributor, Author) commented Dec 8, 2016

@jnothman No, not really. Consider the following example (k = 2):

[Figure: an asymmetric kNN graph with a lone point]

Here the kNN graph is asymmetric because the lone point does not have any edges pointing to it.

Was that your confusion? Does this issue make sense now?

@amueller (Member) commented Dec 8, 2016

nice drawing :) [or you can think of points on a line at np.arange(n)**2; then the nearest neighbor is always the point to the left]
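A quick check of that construction (a sketch of my own, using kneighbors_graph):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

# Points at 0, 1, 4, 9, ...: for i >= 1 the gap to the left neighbour
# (2i - 1) is smaller than the gap to the right (2i + 1), so every 1-NN
# edge except the first one points to the left.
X = (np.arange(8) ** 2).reshape(-1, 1)
A = kneighbors_graph(X, n_neighbors=1, mode="connectivity")

print(A.toarray().astype(int))
# The graph is directed: the last point has an outgoing edge but no
# incoming one, so the matrix cannot be symmetric.
print("symmetric:", (A - A.T).nnz == 0)  # False
```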

@musically-ut (Contributor, Author)

So any ideas about how to fix the problem, apart from the crude solution I have implemented?

I suspect that the call to _get_kernel with y != None (which is used here) will also need the same fix.

@jnothman (Member) commented Dec 9, 2016 via email

musically-ut added a commit to musically-ut/scikit-learn that referenced this issue Jul 23, 2017
Also, fix/update the tests.
Fixes scikit-learn#8008.
musically-ut added a commit to musically-ut/scikit-learn that referenced this issue Jul 28, 2017
Also, fix/update the tests.
Fixes scikit-learn#8008.
@jnothman (Member) commented Jul 29, 2017

Could you please summarise here what possible (ideally not hacky) options we have for undirected KNN kernels? Thanks a lot.

@piccolbo commented Sep 1, 2017

A kNN graph is a directed graph and non-symmetric in general. If we need symmetry, #6727 is a very reasonable solution, but it is neither unique nor natural: it effectively implements a new kernel. I don't think it's a matter of being hacky or not.

@piccolbo commented Sep 1, 2017

Another comment is that symmetry does not imply reachability. Imagine two well-separated clusters, each with more than k points, with the available labels all in one of them and a symmetrized kNN kernel. The label-free cluster should remain unlabeled, AFAIK.
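A quick sketch of that scenario (the construction is mine; connected_components is from SciPy):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

# Two well-separated blobs, each much larger than k.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 100.0])

A = kneighbors_graph(X, n_neighbors=2, mode="connectivity")
A_sym = A.maximum(A.T)  # symmetrized kNN graph

# Still two components: labels placed in one blob can never reach the other.
n_components, _ = connected_components(A_sym, directed=False)
print(n_components)  # 2
```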

@musically-ut (Contributor, Author)

Yes, the make-the-kNN-matrix-symmetric solution is a bit of a hack. It doesn't guarantee connectedness and doesn't have any theoretical justification (as far as I have been able to find).

Here is a paper that takes a more methodical approach: http://ieeexplore.ieee.org/abstract/document/4358958/?part=1

Wang, Fei, and Changshui Zhang. "Label propagation through linear neighborhoods." IEEE Transactions on Knowledge and Data Engineering 20.1 (2008): 55-67.

The underlying idea of the paper is to first create a k-NN graph and then infer the weight of each edge by minimizing the linear-reconstruction error:

sum( (x[i] - sum(w[i][j] * x[j] for j in kNN(i, G)))**2 for i in Vertex(G) )

such that sum(w[i][j] for j in kNN(i, G)) == 1 (with w[i][j] >= 0) for all i. After that, it uses a LabelPropagation-like approach to propagate the labels. I'm not sure how to solve the constrained optimization problem efficiently to create the kernel.
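If the w[i][j] >= 0 constraint is dropped, the equality-constrained problem has a closed form per point, exactly as in LLE's weight step. A rough sketch under that simplification (all names are mine; the paper instead solves a small QP per point to keep the weights nonnegative):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lnp_weights(X, k=5, reg=1e-3):
    """Reconstruction weights: min ||x_i - sum_j w_ij x_j||^2, sum_j w_ij = 1."""
    n = X.shape[0]
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = idx[i, 1:]                    # drop the point itself
        Z = X[nbrs] - X[i]                   # neighbours centred on x_i
        G = Z @ Z.T                          # local Gram matrix
        G += reg * np.trace(G) * np.eye(k)   # regularise for numerical stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()             # enforce the sum-to-one constraint
    return W
```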

More generally, for semi-supervised learning on directed graphs, this is one of the key references: https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/SDG.pdf

Zhou, Denny, Thomas Hofmann, and Bernhard Schölkopf. "Semi-supervised learning on directed graphs." Advances in neural information processing systems. 2005.

Their algorithm uses one extra parameter, lambda, to balance the weighting of transduction between hubs and authorities, but otherwise it can be reduced to the LabelSpreading algorithm with an altered S matrix in polynomial time (this requires computing the in-degree and out-degree of each node).
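As I read it, such an altered S matrix can be sketched roughly as follows (the names and normalization details here are my assumptions, not taken verbatim from the paper): two vertices are "hub-similar" if they link to common vertices and "authority-similar" if they are linked from common vertices, with lambda trading the two off.

```python
import numpy as np

def hub_authority_S(A, lam=0.5):
    """Hedged sketch of a symmetric similarity matrix for a directed graph,
    in the spirit of Zhou, Hofmann & Schoelkopf (2005)."""
    A = np.asarray(A, dtype=float)            # dense directed adjacency matrix
    d_in = np.maximum(A.sum(axis=0), 1e-12)   # in-degrees
    d_out = np.maximum(A.sum(axis=1), 1e-12)  # out-degrees

    W_hub = A @ np.diag(1.0 / d_in) @ A.T     # co-linking, weighted by 1/d_in
    W_auth = A.T @ np.diag(1.0 / d_out) @ A   # co-linked, weighted by 1/d_out
    W = lam * W_hub + (1.0 - lam) * W_auth    # lambda trades hubs vs. authorities

    d = np.maximum(W.sum(axis=1), 1e-12)
    D = np.diag(1.0 / np.sqrt(d))
    return D @ W @ D                          # symmetric normalisation, as in LabelSpreading
```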

I'll add more references as and when I find them.

@piccolbo commented Sep 12, 2017 via email
