8000 [MRG+1] Fix semi_supervised by musically-ut · Pull Request #9239 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] Fix semi_supervised #9239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jul 3, 2017

Conversation

musically-ut
Copy link
Contributor

Closes #3550, #5774, #3758, #6727 and deprecates #9192.

Pinging @jnothman, @boechat107, @MechCoder. Am I forgetting someone?

@jnothman
Copy link
Member
jnothman commented Jun 29, 2017 via email

@musically-ut
Copy link
Contributor Author
musically-ut commented Jun 29, 2017

Just making Travis happy, I think.

Overall, this is a piecemeal effort to fix semi_supervised module as some other issues will remain.
Nevertheless, I believe it is best to fix these first and then take care of the rest in smaller, more contained, PRs:

I'm fixing the flake8 failures as we speak.

@musically-ut musically-ut changed the title [WIP] Fix semi supervised Fix semi supervised Jun 29, 2017
@ogrisel ogrisel added this to the 0.20 milestone Jun 29, 2017
@ogrisel ogrisel modified the milestones: 0.19, 0.20 Jun 29, 2017
@musically-ut
Copy link
Contributor Author

The failure on appveyor was unrelated to the current changes.

@musically-ut musically-ut changed the title Fix semi supervised [MRG] Fix semi_supervised Jun 29, 2017
@amueller
Copy link
Member

can you please merge master? that should get rid of the appveyor error.

@musically-ut
Copy link
Contributor Author

Updated; awaiting review.

@@ -264,13 +280,21 @@ def fit(self, X, y):
l_previous = self.label_distributions_
self.label_distributions_ = safe_sparse_dot(
graph_matrix, self.label_distributions_)

if alpha == 'propagation':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be self.variant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah!

I wish we had a test to see if normalization was working. :(

@@ -264,13 +280,21 @@ def fit(self, X, y):
l_previous = self.label_distributions_
self.label_distributions_ = safe_sparse_dot(
graph_matrix, self.label_distributions_)

if alpha == 'propagation':
# LabelPropagation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and i think this comment is then redundant

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is there some way to invent a graph where you get very different normalisation factors for different rows, so that that substantially affects the propagation?

- Fix :class:`semi_supervised.BaseLabelPropagation` to correctly implement
``LabelPropagation`` and ``LabelSpreading`` as done in the referenced
papers. :class:`semi_supervised.LabelPropagation` now always does hard
clamping. Its ``alpha`` parameter now defaults to ``None`` and is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should say that "alpha has no effect and is deprecated to be ...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if self.variant == 'propagation':
# LabelPropagation
clamp_weights = np.ones((n_samples, 1))
clamp_weights[~unlabeled, 0] = 0.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the same as clamp_weights = unlabelled.reshape(-1, 1)

clamp_weights[~unlabeled, 0] = 0.0
else:
# LabelSpreading
clamp_weights = alpha * np.ones((n_samples, 1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use clamp_weights = alpha

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed clamp_weights completely in favor of keeping only alpha.

normalizer = np.sum(
self.label_distributions_, axis=1)[:, np.newaxis]
self.label_distributions_ /= normalizer

# clamp
self.label_distributions_ = np.multiply(
clamp_weights, self.label_distributions_) + y_static
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. In the label prop case, we could just be using np.where(unlabelled, self.label_distributions, y_static) unless I'm mistaken

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, np.where(unlabeled[:, np.newaxis], self.label_distributions_, y_static) to be more precise. Have factored out unlabeled[:, np.newaxis] in the actual implementation.

@musically-ut
Copy link
Contributor Author

Err ... the Travis failures, this time, seem to be unrelated to this PR.

@@ -350,6 +373,14 @@ class LabelPropagation(BaseLabelPropagation):
LabelSpreading : Alternate label propagation strategy more robust to noise
"""

variant = 'propagation'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this a private attribute _variant. Otherwise it could be consider public API.

@jnothman I think we can merge once this is made private. Ok on your side?

@ogrisel
Copy link
Member
ogrisel commented Jul 3, 2017

You can ignore tests failing with "HTTP Error 500" on mldata.org.

@musically-ut
Copy link
Contributor Author

Updated.

PS: I don't remember whether merely pushing commits sends out an notification e-mail to subscribers by default or not on GitHub (I vaguely recall turning off those notifications for myself somewhere). In case it does, please tell me and I'm sorry about the noise coming from these "Updated" comments.

@ogrisel
Copy link
Member
ogrisel commented Jul 3, 2017

It's fine, let's wait for @jnothman's feedback and CI results.

@jnothman
Copy link
Member
jnothman commented Jul 3, 2017

Yes, I'm +1 for merge, because it would be good to finally have this fixed in the release. But it would be nice if we had a non-regression test for that normalisation issue. Want to think about it in the background and try propose something, @musically-ut?

@ogrisel
Copy link
Member
ogrisel commented Jul 3, 2017

I will be offline soon. @jnothman I let you handle the end and merging of this PR with @musically-ut :)

@musically-ut
Copy link
Contributor Author

👍

I am going to continue proposing some changes after the underlying algorithm is fixed (e.g. #8008), and will keep trying to come up with better tests in the background.

@ogrisel Thanks for the review!

@jnothman
Copy link
Member
jnothman commented Jul 3, 2017

Thanks and well done, @musically-ut, @boechat107, @MechCoder and all.

@amueller
Copy link
Member
amueller commented Jul 5, 2017

AWESOME! thank you so much to everybody working on this!

massich pushed a commit to massich/scikit-learn that referenced this pull request Jul 13, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is invalid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* Files for my dev environment with Docker

* Fixing label clamping (alpha=0 for hard clamping)

* Deprecating alpha, fixing its value to zero

* Correct way to deprecate alpha for LabelPropagation

The previous way was breaking the test
sklearn.tests.test_common.test_all_estimators

* Detailed info for LabelSpreading's alpha parameter

Based on the original paper.

* Minor changes in the deprecation message

* Improving "deprecated" doc string and raising DeprecationWarning

* Using a local "alpha" in "fit" to deprecate LabelPropagation's alpha

This solution isn't great, but it sets the correct value for alpha
without violating the restrictions imposed by the tests.

* Removal of my development files

* Using sphinx's "deprecated" tag (jnothman's suggestion)

* Deprecation warning: stating that the alpha's value will be ignored

* Use __init__ with alpha=None

* Update what's new

* Try fix RuntimeWarning in test_alpha_deprecation

* DOC Indent deprecation details

* DOC wording

* Update docs

* Change to the one true implementation.

* Add sanity-checked impl. of Label{Propagation,Spreading}

* Raise ValueError if alpha is inv
179B
alid in LabelSpreading.

* Add a normalizing step before clamping to LabelPropagation.

* Fix flake8 errors.

* Remove duplicate imports.

* DOC Update What's New.

* Specify alpha's value in the error.

* Tidy up tests.

Add a test and add references, where needed.

* Add comment to non-regression test.

* Fix documentation.

* Move check for alpha into fit from __init__.

* Fix corner case of LabelSpreading with alpha=None.

* alpha -> self.variant

* Make Whats_new more explicit.

* Simplify impl. of Label{Propagation,Spreading}.

* variant -> _variant.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0