Description
Reading this StackOverflow question led me to check the code in naive_bayes.py where the priors are checked.
I did not check the whole method and what is internally assumed about these priors, but:
    if priors.sum() != 1.0:
        raise ValueError('The sum of the priors should be 1.')
obviously calls for trouble, like in the example in the above SO-post.
import numpy as np
priors = np.array([0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0])
my_sum = np.sum(priors)
print('my_sum: ', my_sum)
print('naive: ', my_sum == 1.0)
print('safe: ', np.isclose(my_sum, 1.0))
#('my_sum: ', 1.0000000000000002)
#('naive: ', False)
#('safe: ', True)
Steps/Code to Reproduce
Just take the official GaussianNB example and use the numbers above.
Expected Results
Safe fp-math comparison OR internal correction when input-sum is expected to be close to 1.
Using np.isclose() is a 5-second change, but without checking the remaining code (which I did not), I don't know whether this could cause errors at a later stage.
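To make the two options above concrete, here is a minimal sketch of a tolerant check with an optional renormalization step. The helper name `check_priors` and its `normalize` flag are hypothetical and are not part of scikit-learn; the tolerance is simply the default of np.isclose(), which may or may not be what the maintainers would choose.

```python
import numpy as np


def check_priors(priors, normalize=False):
    """Hypothetical helper for illustration; not scikit-learn's actual code."""
    priors = np.asarray(priors, dtype=float)
    total = priors.sum()
    # Tolerant comparison instead of the exact `total != 1.0` test.
    if not np.isclose(total, 1.0):
        raise ValueError('The sum of the priors should be 1.')
    # Optional "internal correction": rescale by the computed sum so that
    # later code works with priors whose sum is as close to 1 as possible.
    return priors / total if normalize else priors


priors = [0.08, 0.14, 0.03, 0.16, 0.11, 0.16, 0.07, 0.14, 0.11, 0.0]
checked = check_priors(priors, normalize=True)  # accepted, no ValueError
```

Whether the rescaling is needed depends on what the rest of the method assumes about the priors, which I have not verified.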
Actual Results
ValueError('The sum of the priors should be 1.')
Versions
Current master: d8c363f296948a9171ac8a5d69f79dcb56589335.
Further remarks:
numpy.random.sample() actually takes the safer approach too (though without using np.isclose()), as seen here.