Description
There is a test checking that MDS does not raise an error when run in parallel (n_jobs > 1):
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/tests/test_mds.py#L55-L61
That test should also assert that the result is comparable to the result obtained with n_jobs = 1.
Steps/Code to Reproduce
Proposed test:
def test_MDS_parallel():
    sim = np.array([[0, 5, 3, 4],
                    [5, 0, 2, 2],
                    [3, 2, 0, 1],
                    [4, 2, 1, 0]])
    # Same random_state in both runs so the embeddings are comparable.
    mds_clf = mds.MDS(metric=False, n_jobs=1, n_init=4, random_state=0,
                      dissimilarity="precomputed")
    # fit_transform returns the embedding; fit returns the estimator itself.
    result1 = mds_clf.fit_transform(sim)
    # TODO: if sim is modified by MDS.fit then set it back before the next test
    mds_clf = mds.MDS(metric=False, n_jobs=4, n_init=4, random_state=0,
                      dissimilarity="precomputed")
    result2 = mds_clf.fit_transform(sim)
    assert_array_almost_equal(result1, result2, decimal=3)
... as a replacement for
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/tests/test_mds.py#L55-L61
The smacof algorithm involves an iterative search from n_init random initial solutions. In the parallel case, each job searches from its own newly defined seed. In the non-parallel case, the same random_state is reused, so every run continues from a single starting seed. These two approaches don't produce the same results. The non-parallel case must be modified to search from increasing seeds in order to be consistent with the parallel case. That would give consistent output when switching between the two n_jobs settings, and it would allow the new test to pass.
See
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/mds.py#L248-L271
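To illustrate why the two code paths diverge, here is a toy sketch using only the standard library. It is not the scikit-learn code: toy_single_run, serial_shared_generator and per_run_seeds are hypothetical names, and the "run" just draws a random starting configuration instead of performing SMACOF iterations. It only shows that reusing one generator across runs and giving each run its own seed lead to different starting points, and that per-run seeds make the result independent of execution order.

```python
import random


def toy_single_run(rng, n_points=4):
    """Hypothetical stand-in for one SMACOF run: draw a random start."""
    return [round(rng.random(), 6) for _ in range(n_points)]


def serial_shared_generator(seed, n_init):
    # One generator is reused, so run k starts from wherever
    # run k-1 left the generator state.
    rng = random.Random(seed)
    return [toy_single_run(rng) for _ in range(n_init)]


def per_run_seeds(seed, n_init, order=None):
    # Draw one seed per run up front; each run gets its own generator.
    rng = random.Random(seed)
    seeds = [rng.randrange(2**31 - 1) for _ in range(n_init)]
    # 'order' mimics jobs finishing in an arbitrary order.
    order = range(n_init) if order is None else order
    results = {}
    for k in order:
        results[k] = toy_single_run(random.Random(seeds[k]))
    return [results[k] for k in range(n_init)]


shared = serial_shared_generator(0, n_init=4)
seeded = per_run_seeds(0, n_init=4)
seeded_reversed = per_run_seeds(0, n_init=4, order=[3, 2, 1, 0])

# Per-run seeds give the same starts regardless of execution order.
print(seeded == seeded_reversed)
# The shared-generator path starts from different configurations.
print(shared == seeded)
```

If both the serial and the parallel branch drew per-run seeds in this way, the same random_state would give the same set of starting configurations regardless of n_jobs.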