[BUG] Label propagation sometimes produces label_distributions that contain NaN. #9292
Comments
Is this on the current master? We've changed LabelSpreading a lot.
Ping @musically-ut.
On 8 July 2017 at 04:24, Alexandros Papadopoulos wrote:
Description
An "invalid value encountered in true_divide" error is raised when calling fit on LabelSpreading.
The variable normalizer in label_propagation.py:291 contains some zero values, and because of that, the division self.label_distributions_ /= normalizer produces NaN.
Maybe there is a connection to #8008? On other datasets, increasing the n_neighbors parameter above its default value made the issue disappear.
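A minimal NumPy sketch of the failure mode described above, assuming the label distributions have already converged and one sample's row is all zero. The array values and the helper name division_raises are illustrative and are not taken from label_propagation.py; the sketch only shows why a zero entry in normalizer yields NaN (0/0), and why the numpy.seterr(all='raise') call in the reproduction script turns that into a FloatingPointError.

```python
import numpy as np

# Illustrative converged label distributions: 3 samples x 2 classes.
# The last sample received no label mass, so its row is all zero.
label_distributions = np.array([
    [0.8, 0.2],
    [0.1, 0.9],
    [0.0, 0.0],
])

# Per-row sums, shaped (n_samples, 1) so the division broadcasts across classes.
normalizer = np.sum(label_distributions, axis=1)[:, np.newaxis]

# With invalid-value errors ignored, 0/0 silently becomes NaN for the empty row.
with np.errstate(invalid="ignore"):
    normalized = label_distributions / normalizer


def division_raises():
    """Return True if the same division raises when all floating-point errors are fatal."""
    with np.errstate(all="raise"):
        try:
            label_distributions / normalizer
        except FloatingPointError:
            return True
    return False


print(normalized[2])      # [nan nan]
print(division_raises())  # True
```

Only the all-zero row is affected; the other rows normalize normally.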
Steps/Code to Reproduce
from sklearn.datasets import fetch_mldata
from sklearn.semi_supervised import label_propagation
import numpy
numpy.seterr(all='raise')
mnist = fetch_mldata('MNIST original', data_home="./tmp")
X = mnist.data[1:10000]
y = mnist.target[1:10000]
# Use only 300 labeled examples
y[300:] = -1
lp_model = label_propagation.LabelSpreading(kernel='knn', n_neighbors=7, n_jobs=-1)
lp_model.fit(X,y)
Expected Results
No error is thrown.
Actual Results
File "reproduce.py", line 16, in <module>
lp_model.fit(X,y)
File "...anaconda3/envs/ssl-py3/lib/python3.6/site-packages/sklearn/semi_supervised/label_propagation.py", line 291, in fit
self.label_distributions_ /= normalizer
FloatingPointError: invalid value encountered in true_divide
Versions
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.13.0
SciPy 0.19.0
Scikit-Learn 0.19.dev0
Can this be reproduced with the …
Yes, I believe it is on the current master; that is, it contains the latest pull request for the label_propagation module. In the above code, changing the kernel to …
Changing …
How might we avoid that underflow while calculating the kernel?
I think detecting underflows in the kernel is a separate issue. Is there an example of any method in … In this particular case, I think underflows can be safely set to zero; it would just mean that certain nodes are not connected to each other. We can use … Does that sound reasonable?
There appear to be at least a couple of cases where we explicitly ignore underflow.
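One way to read the underflow suggestion above, sketched with NumPy only: compute a dense RBF affinity and wrap the exponential in np.errstate(under="ignore"), so pairs that are too far apart underflow to exactly 0.0 (that is, become unconnected) instead of raising when a caller has set numpy.seterr(all='raise'). The function name rbf_affinity is hypothetical, the brute-force distance computation only suits small inputs, and this is not the scikit-learn kernel code; which helper the comment actually proposed is not recoverable from the thread.

```python
import numpy as np


def rbf_affinity(X, gamma):
    """Dense RBF affinity exp(-gamma * ||x_i - x_j||^2); underflowing entries become 0.0."""
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting (O(n^2 * d) memory).
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Distant pairs underflow to 0.0: treat them as "not connected" instead of
    # letting a global numpy.seterr(all='raise') turn the underflow into an error.
    with np.errstate(under="ignore"):
        return np.exp(-gamma * sq_dists)


# Usage: two nearby points and one far-away point, with strict error handling.
with np.errstate(all="raise"):
    K = rbf_affinity([[0.0], [0.1], [100.0]], gamma=20.0)

print(K[0, 2])  # 0.0: the far point is disconnected from the others
```

The design choice here is to let the graph lose edges rather than fail. As the comments below point out, that can leave whole groups of points with no path to any labeled point, which is how all-zero rows in normalizer arise.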
Yes, your solution to underflow seems reasonable, though it is not the same as this issue.
Hmm, I've been thinking about edge cases here. If we end up with a connected component that does not have any labeled nodes within it, then the normalization will produce … I think it is reasonable to give the output …
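To make this edge case concrete, here is a small sketch of a label-spreading-style update F <- alpha * S @ F + (1 - alpha) * Y on a graph with two connected components, only one of which contains labeled nodes. The function spread_labels, the symmetric normalization, and the parameter values are assumptions for illustration, not the scikit-learn implementation. Because S has no entries linking the two components and Y is zero on the unlabeled one, that component's rows of F stay exactly zero however many iterations run, so their row sums (the normalizer) are 0.

```python
import numpy as np


def spread_labels(W, y, n_classes, alpha=0.2, n_iter=30):
    """Hypothetical minimal label spreading: iterate F <- alpha * S @ F + (1 - alpha) * Y."""
    W = np.array(W, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)
    # Symmetric normalization S = D^-1/2 W D^-1/2; isolated nodes get 0 instead of 1/0.
    inv_sqrt = np.zeros_like(degree)
    has_edges = degree > 0
    inv_sqrt[has_edges] = 1.0 / np.sqrt(degree[has_edges])
    S = inv_sqrt[:, np.newaxis] * W * inv_sqrt[np.newaxis, :]
    # One-hot seeds: labeled rows get a 1 in their class column; unlabeled (-1) rows stay 0.
    y = np.asarray(y)
    labeled = y >= 0
    Y = np.zeros((len(y), n_classes))
    Y[np.flatnonzero(labeled), y[labeled]] = 1.0
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y
    return F


# Component A = nodes 0-2 (nodes 0 and 2 are labeled); component B = nodes 3-4 (no labels).
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
y = np.array([0, -1, 1, -1, -1])
F = spread_labels(W, y, n_classes=2)
print(F.sum(axis=1))  # the last two entries are exactly 0.0
```

Normalizing rows 3 and 4 of F would then divide 0 by 0, matching the error in the original report.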
I believe this is exactly the issue I am facing. It agrees with the observation that increasing the …
I have replicated this issue when instantiating the LabelSpreading model with the default parameter values, i.e., LabelSpreading(). When I instead instantiate it with LabelSpreading(gamma=0.25, max_iter=5), the error is not thrown. Even LabelSpreading(gamma=0, max_iter=1) works fine; only leaving those parameters at their default values triggers the issue.
Since some other people have also faced this issue (personal communication), I think it would be best to take the following steps:
How does that sound?
Is 1/n more reasonable than the empirical distribution of known labels?
Any comments about whether to show a warning as above or whether to throw an error (making the point moot) in case …
Well, I don't think a failure to infer labels on some points should break the whole fit. I would raise a UserWarning and assign arbitrary labels.
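A minimal sketch of the behaviour proposed in the last few comments: normalize rows that have label mass as usual, emit a UserWarning for rows that have none, and fill those rows with either the uniform 1/n distribution or the empirical distribution of the known labels (the two options raised above). The function name normalize_with_fallback and its fallback argument are hypothetical; this is not scikit-learn API.

```python
import warnings

import numpy as np


def normalize_with_fallback(F, y=None, fallback="uniform"):
    """Row-normalize label distributions F; rows with zero mass get a fallback distribution.

    fallback="uniform": every class gets 1 / n_classes.
    fallback="prior": empirical class frequencies of the labeled entries of y
    (labels assumed to be 0..n_classes-1, with -1 marking unlabeled samples).
    """
    F = np.asarray(F, dtype=float)
    n_classes = F.shape[1]
    row_sums = F.sum(axis=1, keepdims=True)
    empty = row_sums[:, 0] == 0
    out = np.empty_like(F)
    # Normal case: divide only rows whose normalizer is nonzero, so no 0/0 occurs.
    out[~empty] = F[~empty] / row_sums[~empty]
    if empty.any():
        warnings.warn(
            f"{int(empty.sum())} sample(s) received no label mass; "
            f"assigning a {fallback!r} distribution instead.",
            UserWarning,
        )
        if fallback == "uniform":
            out[empty] = 1.0 / n_classes
        elif fallback == "prior":
            labels = np.asarray(y)
            known = labels[labels >= 0].astype(int)
            counts = np.bincount(known, minlength=n_classes)
            out[empty] = counts / counts.sum()
        else:
            raise ValueError(f"unknown fallback: {fallback!r}")
    return out


# Usage: the second row has no label mass.
F = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([0, 0, 0, 1, -1])
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    print(normalize_with_fallback(F))                       # both rows become [0.5, 0.5]
    print(normalize_with_fallback(F, y, fallback="prior"))  # second row becomes [0.75, 0.25]
```

Throwing an error instead, the other option discussed above, would only require replacing the warnings.warn call with a raise.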
I'm also having serious problems with this bug. Has there been any progress towards fixing it or finding workarounds?
I am just wondering if this issue has been fixed. Any updates? Thanks!
Description
An "invalid value encountered in true_divide" error is raised when calling fit on LabelSpreading.
After convergence, the label distribution for some samples is all zero, so the variable normalizer in label_propagation.py:291 contains some zero values, causing the division self.label_distributions_ /= normalizer to produce NaN.
Maybe there is a connection to #8008? On other datasets, increasing the n_neighbors parameter above its default value made the issue disappear.