Improve test for MDS parallel processing case - requires update in MDS · Issue #10119 · scikit-learn/scikit-learn

Open
jsoutherland opened this issue Nov 13, 2017 · 2 comments

@jsoutherland

Description

There is a test that checks that MDS does not raise an error when run with parallel processing (n_jobs > 1):
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/tests/test_mds.py#L55-L61

That test should also assert that the result is comparable to the result obtained with n_jobs = 1.

Steps/Code to Reproduce

Proposed test:

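def test_MDS_parallel():
    # Relies on np, mds and assert_array_almost_equal being available in test_mds.py.
    sim = np.array([[0, 5, 3, 4],
                    [5, 0, 2, 2],
                    [3, 2, 0, 1],
                    [4, 2, 1, 0]])
    # Use the same random_state in both runs so the results can be compared at all.
    mds_clf = mds.MDS(metric=False, n_jobs=1, n_init=4, random_state=0,
                      dissimilarity="precomputed")
    # fit() returns the estimator itself, so compare the embeddings from fit_transform().
    result1 = mds_clf.fit_transform(sim)

    # TODO: if sim is modified by MDS.fit then set it back before the next test

    mds_clf = mds.MDS(metric=False, n_jobs=4, n_init=4, random_state=0,
                      dissimilarity="precomputed")
    result2 = mds_clf.fit_transform(sim)

    assert_array_almost_equal(result1, result2, decimal=3)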

... as a replacement for
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/tests/test_mds.py#L55-L61

The SMACOF algorithm runs an iterative search from each of n_init random initial solutions. In the parallel case, each job searches from its own newly drawn seed. In the non-parallel case, a single random_state is reused, so every run continues the random stream of one starting seed. These two approaches don't produce the same results. The non-parallel case must be modified to search from per-run seeds in order to be consistent with the parallel case. That would give consistent output when switching between the two n_jobs settings and would also allow this new test to pass.

See
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/mds.py#L248-L271
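
To illustrate the seeding difference described above, here is a minimal NumPy-only sketch. It is not the scikit-learn code: `random_start`, `starts_shared_stream` and `starts_per_run_seeds` are hypothetical helpers, and the "start" is just a random array standing in for the initial configuration of one SMACOF run. It assumes the parallel path draws one integer seed per run from the top-level random state, while the serial path passes the same random state object to every run.

```python
import numpy as np


def random_start(rng, n_samples=4, n_components=2):
    # Hypothetical stand-in for the random initial configuration of one run.
    return rng.uniform(size=(n_samples, n_components))


def starts_shared_stream(seed, n_init):
    # Serial-style seeding (assumed): one RandomState is reused, so every
    # run continues the stream where the previous run left off.
    rng = np.random.RandomState(seed)
    return [random_start(rng) for _ in range(n_init)]


def starts_per_run_seeds(seed, n_init):
    # Parallel-style seeding (assumed): draw one integer seed per run up
    # front, then give each run its own fresh RandomState.
    rng = np.random.RandomState(seed)
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    return [random_start(np.random.RandomState(s)) for s in seeds]


serial = starts_shared_stream(0, n_init=4)
parallel = starts_per_run_seeds(0, n_init=4)

# True only if both strategies produce identical starting points.
same = all(np.allclose(a, b) for a, b in zip(serial, parallel))
print("identical starts:", same)
```

Both strategies are deterministic for a fixed top-level seed, but they consume the random stream differently, so the starting points (and hence the best of n_init runs) need not agree. Using the per-run-seed strategy in both code paths is one way to make the output independent of n_jobs.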

Expected Results

Actual Results

Versions

@anurag03

@jsoutherland I want to work on this issue.

@jsoutherland

@anurag03 go for it - thanks!
