-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Fixed adapting tolerance value according to data type in manifold MDS #11167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/manifold/mds.py
Outdated
@@ -67,7 +67,15 @@ def _smacof_single(dissimilarities, metric=True, n_components=2, init=None, | |||
n_iter : int | |||
The number of iterations corresponding to the best stress. | |||
""" | |||
dissimilarities = check_symmetric(dissimilarities, raise_exception=True) | |||
if dissimilarities.dtype == np.float16: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these values? Why not dtype.eps? And input validation should probably not be this far down the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Celelibi suggested using these values, so I tried using these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using the build-in eps or some factor of that would probably be more principled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! I will try using eps rather than hard coding these values.
Tests are still failing. Let us know if you need help. |
@jnothman I am facing two issues in this PR.
|
Yes, don't worry about float16 for now.
I would've expected eps may be too small, but I've not looked into the
context.
|
Indeed, 1e-5 might make sense for float32 if 1e-10 is used for float64.
But I suspect that what we're actually seeing here is a bug in
euclidean_distances when working with float32, not merely an issue with the
tol of check_symmetric. Really, we need to fix #9354 before we can work on
this PR with euclidean distance as a test case.
|
In my tests, 1e-5 might be too large with float32 in some cases with the current implementation of >>> a = np.array([[0.1015854999423027, 0.9949246048927307], [0.1018471047282219, 0.9947648644447327]], dtype=np.float32)
>>> euclidean_distances(a)
array([[0. , 0.00024414],
[0. , 0. ]], dtype=float32) |
yes, let's put this on hold as i suspect the issue is not here but in
euclidean_distances which was naively adapted to 32bit for the latest
release
|
Fixed by #13554, I think. We can reopen if the issue persists. |
#### Reference Issues
Fixes #11159 .
check_symmetric: adapt tolerance to data type
#### What does this implement/fix? Explain your changes.
In manifold MDS set the tolerance values for the check_symmetric function as per the data type of input.
#### Any other comments?
Added a test.