[MRG+1] Fix semi_supervised #9239
Conversation
Could you please summarise what makes this WIP: what is unresolved?
…On 28 Jun 2017 4:57 pm, "Utkarsh Upadhyay" wrote:
Closes #3550, #5774, #3758, #6727 and deprecates #9192.
Pinging @jnothman, @boechat107, @MechCoder. Am I forgetting someone?
Commit Summary
- Files for my dev environment with Docker
- Fixing label clamping (alpha=0 for hard clamping)
- Deprecating alpha, fixing its value to zero
- Correct way to deprecate alpha for LabelPropagation
- Detailed info for LabelSpreading's alpha parameter
- Minor changes in the deprecation message
- Improving "deprecated" doc string and raising DeprecationWarning
- Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha
- Removal of my development files
- Using sphinx's "deprecated" tag (jnothman's suggestion)
- Deprecation warning: stating that the alpha's value will be ignored
- Use __init__ with alpha=None
- Merge branch 'master' into lpalpha
- Update what's new
- Merge pull request #2 from jnothman/lpalpha
- Merge branch 'master' into issue-5774
- Try fix RuntimeWarning in test_alpha_deprecation
- DOC Indent deprecation details
- DOC wording
- Update docs
- Change to the one true implementation.
- Add sanity-checked impl. of Label{Propagation,Spreading}
- Raise ValueError if alpha is invalid in LabelSpreading.
- Add a normalizing step before clamping to LabelPropagation.
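The commits above revolve around one clamping step. As a hedged sketch (an illustrative reimplementation, not the library code), one iteration of the propagation update with hard clamping (LabelPropagation, whose deprecated alpha is ignored) versus soft clamping (LabelSpreading) might look like:

```python
import numpy as np

def propagate_step(graph_matrix, label_distributions, y_static,
                   unlabeled, alpha=None):
    """One illustrative iteration of the label propagation update.

    alpha=None mimics LabelPropagation (hard clamping: labeled rows are
    reset to their original labels); 0 < alpha < 1 mimics LabelSpreading,
    assuming y_static already carries the (1 - alpha) * Y0 term.
    """
    new_dist = graph_matrix @ label_distributions
    if alpha is None:
        # normalize each row back to a probability distribution
        new_dist /= new_dist.sum(axis=1, keepdims=True)
        # hard clamp: keep propagated values only for unlabeled points
        return np.where(unlabeled[:, np.newaxis], new_dist, y_static)
    # soft clamp: blend propagated values with the static labels
    return alpha * new_dist + y_static
```

The function name and signature are invented for illustration; the real `fit` loop carries extra bookkeeping (convergence checks, sparse support).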
File Changes
- M doc/modules/label_propagation.rst (4)
- M doc/whats_new.rst (5)
- M examples/semi_supervised/plot_label_propagation_structure.py (2)
- M sklearn/semi_supervised/label_propagation.py (61)
- M sklearn/semi_supervised/tests/test_label_propagation.py (66)
Patch Links:
- https://github.com/scikit-learn/scikit-learn/pull/9239.patch
- https://github.com/scikit-learn/scikit-learn/pull/9239.diff
Just making Travis happy, I think. Overall, this is a piecemeal effort to fix the semi_supervised module, as some other issues will remain.
I'm fixing the flake8 failures as we speak.
The failure on AppVeyor was unrelated to the current changes.
Can you please merge master? That should get rid of the AppVeyor error.
The previous way was breaking the test sklearn.tests.test_common.test_all_estimators
Based on the original paper.
This solution isn't great, but it sets the correct value for alpha without violating the restrictions imposed by the tests.
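The local-alpha trick described in that commit can be sketched as follows. This is a hypothetical minimal class, not the actual sklearn code; the attribute `alpha_used_` is invented for illustration:

```python
import warnings

class LabelPropagationSketch:
    """Sketch of the deprecation pattern: accept alpha in __init__
    (default None) but ignore it in fit, warning if a value was passed."""

    def __init__(self, alpha=None):
        # stored untouched so get_params/set_params round-trips cleanly,
        # which is what the common estimator tests check
        self.alpha = alpha

    def fit(self, X, y):
        if self.alpha is not None:
            warnings.warn(
                "alpha is deprecated for LabelPropagation and its value "
                "is ignored (hard clamping, alpha=0).",
                DeprecationWarning)
        alpha = 0  # the local value the algorithm actually uses
        self.alpha_used_ = alpha
        return self
```

This keeps the constructor a pure parameter store, as the common tests require, while the algorithm still runs with the fixed value.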
Updated; awaiting review.
@@ -264,13 +280,21 @@ def fit(self, X, y):
                l_previous = self.label_distributions_
                self.label_distributions_ = safe_sparse_dot(
                    graph_matrix, self.label_distributions_)

                if alpha == 'propagation':
I think this should be self.variant
Gah!
I wish we had a test to see if normalization was working. :(
@@ -264,13 +280,21 @@ def fit(self, X, y):
                l_previous = self.label_distributions_
                self.label_distributions_ = safe_sparse_dot(
                    graph_matrix, self.label_distributions_)

                if alpha == 'propagation':
                    # LabelPropagation
and i think this comment is then redundant
So is there some way to invent a graph where you get very different normalisation factors for different rows, so that it substantially affects the propagation?
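One way to construct such a graph (a hedged toy example, not the non-regression test the PR eventually used) is an affinity matrix whose row sums differ by a large factor, so skipping the per-row normalization visibly skews the propagated distributions:

```python
import numpy as np

# Toy affinity matrix whose row sums differ: node 0 is strongly
# connected to both neighbours, nodes 1 and 2 mostly to node 0.
# Without per-row normalization, the high-degree row dominates.
W = np.array([[0.0, 5.0, 5.0],
              [5.0, 0.0, 0.1],
              [5.0, 0.1, 0.0]])
dist = np.array([[0.9, 0.1],
                 [0.5, 0.5],
                 [0.1, 0.9]])

propagated = W @ dist
row_sums = propagated.sum(axis=1, keepdims=True)
normalized = propagated / row_sums
```

After dividing by the row sums, every row is a proper distribution again, while the raw row sums differ by roughly a factor of two, so the normalization measurably changes the result.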
doc/whats_new.rst
Outdated
- Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement
  ``LabelPropagation`` and ``LabelSpreading`` as done in the referenced
  papers. :class:`semi_supervised.LabelPropagation` now always does hard
  clamping. Its ``alpha`` parameter now defaults to ``None`` and is
I think you should say that "alpha has no effect and is deprecated to be ...".
👍
                if self.variant == 'propagation':
                    # LabelPropagation
                    clamp_weights = np.ones((n_samples, 1))
                    clamp_weights[~unlabeled, 0] = 0.0
This should be the same as clamp_weights = unlabeled.reshape(-1, 1)
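The suggested equivalence can be checked directly; this is a standalone NumPy sketch, not code from the PR:

```python
import numpy as np

# Building clamp weights by zeroing labeled rows of a ones-column
# equals reshaping the boolean mask and casting it to float.
unlabeled = np.array([True, False, True, False])
n_samples = unlabeled.shape[0]

clamp_weights = np.ones((n_samples, 1))
clamp_weights[~unlabeled, 0] = 0.0

alternative = unlabeled.reshape(-1, 1).astype(float)
```

The two arrays are elementwise identical, so the reshape form saves an allocation and a fancy-indexing pass.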
                    clamp_weights[~unlabeled, 0] = 0.0
                else:
                    # LabelSpreading
                    clamp_weights = alpha * np.ones((n_samples, 1))
just use clamp_weights = alpha
Have removed clamp_weights completely in favor of keeping only alpha.
                normalizer = np.sum(
                    self.label_distributions_, axis=1)[:, np.newaxis]
                self.label_distributions_ /= normalizer

                # clamp
                self.label_distributions_ = np.multiply(
                    clamp_weights, self.label_distributions_) + y_static
Hmm. In the label prop case, we could just be using np.where(unlabeled, self.label_distributions_, y_static), unless I'm mistaken.
Yes, np.where(unlabeled[:, np.newaxis], self.label_distributions_, y_static), to be more precise. Have factored out unlabeled[:, np.newaxis] in the actual implementation.
Err ... the Travis failures, this time, seem to be unrelated to this PR.
@@ -350,6 +373,14 @@ class LabelPropagation(BaseLabelPropagation):
    LabelSpreading : Alternate label propagation strategy more robust to noise
    """

    variant = 'propagation'
Please make this a private attribute _variant. Otherwise it could be considered public API.
@jnothman I think we can merge once this is made private. Ok on your side?
You can ignore tests failing with "HTTP Error 500" on mldata.org.
Updated. PS: I don't remember whether merely pushing commits sends out a notification e-mail to subscribers by default on GitHub (I vaguely recall turning off those notifications for myself somewhere). In case it does, please tell me, and I'm sorry about the noise coming from these "Updated" comments.
It's fine, let's wait for @jnothman's feedback and CI results.
Yes, I'm +1 for merge, because it would be good to finally have this fixed in the release. But it would be nice if we had a non-regression test for that normalisation issue. Want to think about it in the background and try to propose something, @musically-ut?
I will be offline soon. @jnothman, I'll let you handle the end and merging of this PR with @musically-ut :)
Thanks and well done, @musically-ut, @boechat107, @MechCoder and all.
AWESOME! thank you so much to everybody working on this!
* Files for my dev environment with Docker
* Fixing label clamping (alpha=0 for hard clamping)
* Deprecating alpha, fixing its value to zero
* Correct way to deprecate alpha for LabelPropagation (the previous way was breaking the test sklearn.tests.test_common.test_all_estimators)
* Detailed info for LabelSpreading's alpha parameter (based on the original paper)
* Minor changes in the deprecation message
* Improving "deprecated" doc string and raising DeprecationWarning
* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha (this solution isn't great, but it sets the correct value for alpha without violating the restrictions imposed by the tests)
* Removal of my development files
* Using sphinx's "deprecated" tag (jnothman's suggestion)
* Deprecation warning: stating that the alpha's value will be ignored
* Use __init__ with alpha=None
* Update what's new
* Try fix RuntimeWarning in test_alpha_deprecation
* DOC Indent deprecation details
* DOC wording
* Update docs
* Change to the one true implementation.
* Add sanity-checked impl. of Label{Propagation,Spreading}
* Raise ValueError if alpha is invalid in LabelSpreading.
* Add a normalizing step before clamping to LabelPropagation.
* Fix flake8 errors.
* Remove duplicate imports.
* DOC Update What's New.
* Specify alpha's value in the error.
* Tidy up tests. Add a test and add references, where needed.
* Add comment to non-regression test.
* Fix documentation.
* Move check for alpha into fit from __init__.
* Fix corner case of LabelSpreading with alpha=None.
* alpha -> self.variant
* Make Whats_new more explicit.
* Simplify impl. of Label{Propagation,Spreading}.
* variant -> _variant.