[MRG+1] Fixing a bug where entropy included labeled items by mdezube · Pull Request #8150 · scikit-learn/scikit-learn

Merged (8 commits) on Jan 28, 2017

Conversation

@mdezube (Contributor) commented Jan 3, 2017:

This bug causes the loop never to converge, no matter how high the iteration count, since it includes already-labeled items in the set of items it will label in the next iteration.

This can readily be seen by adding a "print unlabeled_indices.shape" statement and noticing it stops shrinking after only 1/3 of the data is labeled.

Note I'm not sure how to rebuild the documentation and the .py and .ipynb files linked to here, is that done automatically?

@jnothman (Member) left a comment:

Otherwise, I think this is the right thing to do.

-    uncertainty_index = uncertainty_index = np.argsort(pred_entropies)[-5:]
+    uncertainty_index = np.argsort(pred_entropies)[::-1]
+    uncertainty_index = uncertainty_index[
+        np.in1d(uncertainty_index, unlabeled_indices)][:5]
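The replacement selection logic can be sketched in isolation. In this sketch, `pred_entropies` and `unlabeled_indices` are made-up stand-ins for the example's variables:

```python
import numpy as np

# Made-up stand-ins for the example's per-sample entropies and the
# indices of samples that have not yet been given a label.
rng = np.random.RandomState(0)
pred_entropies = rng.rand(20)
unlabeled_indices = np.arange(10, 20)

# sort every index from most to least uncertain ...
uncertainty_index = np.argsort(pred_entropies)[::-1]
# ... then keep only the still-unlabeled ones, capped at five
uncertainty_index = uncertainty_index[
    np.in1d(uncertainty_index, unlabeled_indices)][:5]
```

Unlike the original `[-5:]` slice, nothing already labeled can be selected again.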
Member:

Sorry, this comment didn't submit:
Is it possible this expression will return fewer than 5 results? Does this need to be handled below?

mdezube (author):

It wouldn't cause an error, but we might as well break the loop earlier. Also I fixed the charts to handle if the iteration count is increased.
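The filtered selection can indeed return fewer than five indices once the unlabeled pool runs low. A minimal illustration with made-up numbers:

```python
import numpy as np

pred_entropies = np.array([0.5, 0.1, 0.9, 0.3])
unlabeled_indices = np.array([2])  # only one sample left unlabeled

uncertainty_index = np.argsort(pred_entropies)[::-1]
uncertainty_index = uncertainty_index[
    np.in1d(uncertainty_index, unlabeled_indices)][:5]
# only one index survives the filter, so later code must not assume five
```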

@tguillemot (Contributor) commented Jan 5, 2017:

Travis is not happy because of PEP8 problems.
Can you have a look, @mdezube?

@mdezube (author) commented Jan 5, 2017:

I took a look and it's due to line length. But most of the file doesn't conform to the specified line length anyway according to PEP 8, so I didn't think it was worth fixing specifically the lines I changed. I tend to always favor consistency throughout a file in situations like these.

Up to you if you want me to fix it, take a look at the file and let me know what your preference is.

@tguillemot (Contributor):

@mdezube I understand your point of view.

Indeed, some files in sklearn don't conform to PEP8.
Since it takes a lot of time to correct them all, we added the Travis check to verify that new lines are PEP8 conformant. If all new PRs are PEP8 conformant, things will not get worse :).

A PR is most welcome if you want to correct some PEP8 problems in sklearn, but for now it will be easier to only fix the lines you modified :). Thx!

    f.text(.05, (1 - (i + 1) * .183),
           "model %d\n\nfit with\n%d labels" % ((i + 1), i * 5 + 10), size=10)

    # if running for more than 5 iterations visualiaze the gain only on the first 5.
Contributor:

If running for more than 5 iterations visualize the gain is only on the first 5

mdezube (author):

Done. I omitted the capitalization though since no other comments in the file are capitalized.

    sub.set_title('predict: %i\ntrue: %i' % (
        lp_model.transduction_[image_index], y[image_index]), size=10)
    sub.axis('off')

    # if running for more than 5 iterations visualiaze the gain only on the first 5.
Contributor:

If running for more than 5 iterations visualize the gain is only on the first 5

mdezube (author):

Done. I omitted the capitalization though since no other comments in the file are capitalized.

@tguillemot (Contributor):

Still some PEP8 problems.
Can you have a look, @mdezube?

@mdezube (author) commented Jan 17, 2017:

@tguillemot I trimmed the whitespace per the PEP8 warning. There's still a warning about my lines being too long, but I aimed for consistency throughout the file, since all of the existing lines follow a 100-character limit, not 80. If you'd rather they all be 80 I'm happy to do this in a follow-up PR, but I feel that should be a single change instead of being lumped in with this functionality fix.


    # keep track of indices that we get labels for
    delete_indices = np.array([])

    f.text(.05, (1 - (i + 1) * .183),
           "model %d\n\nfit with\n%d labels" % ((i + 1), i * 5 + 10), size=10)
    # if running for more than 5 iterations, visualize the gain on only the first 5
Contributor:

# for more than 5 iterations visualize the gain only on the first 5

mdezube (author):

Done.

    # if running for more than 5 iterations, visualize the gain on only the first 5
    if i < 5:
        f.text(.05, (1 - (i + 1) * .183),
               "model %d\n\nfit with\n%d labels" % ((i + 1), i * 5 + 10), size=10)
Contributor:

f.text(.05, (1 - (i + 1) * .183),
       "model %d\n\nfit with\n%d labels" % 
       (i + 1, i * 5 + 10), size=10)

mdezube (author):

Done.

    sub.set_title('predict: %i\ntrue: %i' % (
        lp_model.transduction_[image_index], y[image_index]), size=10)
    sub.axis('off')
    # if running for more than 5 iterations, visualize the gain on only the first 5
Contributor:

# for more than 5 iterations visualize the gain only on the first 5

mdezube (author):

Done.

@@ -69,29 +72,35 @@
     pred_entropies = stats.distributions.entropy(
         lp_model.label_distributions_.T)

-    # select five digit examples that the classifier is most uncertain about
-    uncertainty_index = uncertainty_index = np.argsort(pred_entropies)[-5:]
+    # select up to five digit examples that the classifier is most uncertain about
Contributor:

# select up to 5 digit examples that the classifier is most uncertain about

mdezube (author):

Done.
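For context, scipy's entropy function reduces each column of a 2-D input, which is why the example transposes `label_distributions_` so that one entropy value is produced per sample. A small sketch with made-up probabilities, calling `scipy.stats.entropy` directly (the example reaches the same function via the `stats.distributions` alias):

```python
import numpy as np
from scipy import stats

# rows = samples, columns = class probabilities (made-up values)
label_distributions = np.array([[0.9, 0.1],   # confident sample
                                [0.5, 0.5]])  # maximally uncertain sample

# transpose so entropy is computed per sample (scipy reduces along axis 0)
pred_entropies = stats.entropy(label_distributions.T)
# the uncertain sample's entropy is ln(2); the confident one's is lower
```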

@mdezube (author) commented Jan 21, 2017:

Comments addressed!

@jnothman (Member) left a comment:

Otherwise LGTM

    f.text(.05, (1 - (i + 1) * .183),
           "model %d\n\nfit with\n%d labels" % ((i + 1), i * 5 + 10), size=10)

    # for more than 5 iterations, visualize the gain only on the first 5
    if i < 5:
Member:

I'm not sure this is necessary. Perhaps if we want the number of iterations to look like a parameter the user might play with, we should pull it out into a named variable.

mdezube (author):

Might as well extract it into a variable :) See the latest changes.
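The resulting loop shape might look like the following sketch. The `max_iterations` name matches the variable introduced in this change, but everything else (the fitting step, the index bookkeeping) is elided or purely illustrative:

```python
import numpy as np

max_iterations = 5  # the named variable the review asked for; free to change
unlabeled_indices = np.arange(330)  # stand-in for the initially unlabeled set

for i in range(max_iterations):
    if len(unlabeled_indices) == 0:
        # nothing left to label: break early rather than iterate uselessly
        break
    # ... fit the model and compute per-sample entropies here (elided) ...
    picked = unlabeled_indices[:5]  # placeholder for the entropy-based choice
    unlabeled_indices = np.setdiff1d(unlabeled_indices, picked)
```

With the selection restricted to unlabeled items, the pool shrinks every iteration and the early break covers the case where it empties before `max_iterations` is reached.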

@jnothman jnothman changed the title [MRG] Fixing a bug where entropy included labeled items [MRG+1] Fixing a bug where entropy included labeled items Jan 22, 2017
Makes it more obvious that `max_iterations` can freely be changed
@jnothman (Member) left a comment:

Yes, that's clearer.

@tguillemot (Contributor):

LGTM.

Both the AppVeyor and Circle failures seem unrelated.

@mdezube (author) commented Jan 28, 2017:

Now that this is approved, is there a way for me to merge it in? Or do I just wait?

@jnothman (Member):

Yes, I'm okay to merge this.

@jnothman (Member):

Thanks

@jnothman jnothman merged commit 1b0ec1b into scikit-learn:master Jan 28, 2017
@mdezube (author) commented Jan 29, 2017:

Thanks Joel! Quick question: does this change automatically get picked up into the .ipynb file that includes this demo? And should I expect it to go out in v0.19?

@jnothman (Member) commented Jan 29, 2017 via email.

@tguillemot (Contributor):

Thanks @mdezube

@mdezube (author) commented Jan 30, 2017 via email.

sergeyf pushed a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017
@Przemo10 Przemo10 mentioned this pull request Mar 17, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017