Mds extend #6222

webdrone · 2016-01-24T21:27:30Z

Extended MDS object to include a transform method for out-of-sample points as described here:
http://papers.nips.cc/paper/2461-out-of-sample-extensions-for-lle-isomap-mds-eigenmaps-and-spectral-clustering.pdf

SMACOF algorithm is used in the non-extendible case but not for the extendible case, which requires eigen-decomposition of (dis)similarity matrix.

…valent to PCA transfor m. The method introduces errors as new points are projected, compared to a new projection o f all points. See Bengio, Yoshua, et al. "Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering." Advances in neural information processing systems 16 (2004): 177-184.

webdrone · 2016-01-25T11:17:30Z

I will try and resolve the issues causing failed tests. In the meantime, this should fix issue #2887 to some extend.

webdrone · 2016-01-26T16:58:07Z

Closing until #4485 gets addressed, since this can be implemented on top of that.

kingjr · 2016-08-16T12:05:01Z

@webdrone is there any progress on this issue and is #4485 absolutely necessary?

webdrone · 2016-08-16T17:21:14Z

@kingjr, no progress unfortunately -- I did not have much time to look into how to create proper tests.

#4485 is not absolutely necessary. One might circumvent it by adding various cases in the code for if the MDS version being used is extensible or not (if we want to keep the SMACOF algorithm as default, since it has unique options that can be set). However, this alternative (committed here) is messy and an inelegant solution if the default is to change to an SVD solution regardless, as discussed in #4485.

amueller · 2016-08-26T20:14:26Z

@webdrone I think if you want to move forward with this, maybe take over #4485 (I had some comments there that were not resolved) so we can get that merged, and then work on this?

jsoutherland · 2017-09-14T14:50:06Z

@webdrone @amueller Hi - It looks like this was abandoned and I don't see it in master. I wanted to check if either of you know if this was continued in another PR? Thanks

webdrone · 2017-09-14T17:48:13Z

@jsoutherland @amueller

I don't know if this was continued elsewhere, I haven't worked on it further since I created this.

I don't know how to write tests, and these days I don't really have the time to take over more issues. Issue #4485 is now older than 2 years and still not merged for some reason.

I use sklearn daily, especially the GP stuff and every now and then I find inefficient implementations but seem unable to create new forks of the repo to submit new pull requests. This might also be due to my limited knowledge of git(hub) but it seems like as long as I have one fork, I cannot have another?

jnothman · 2017-09-15T04:05:46Z

you can create multiple branches on one fork for that purpose.

…

On 15 Sep 2017 3:48 am, "webdrone" ***@***.***> wrote: @jsoutherland <https://github.com/jsoutherland> @amueller <https://github.com/amueller> I don't know if this was continued elsewhere, I haven't worked on it further since I created this. I don't know how to write tests, and these days I don't really have the time to take over more issues. Issue #4485 <#4485> is now older than 2 years and still not merged for some reason. I use sklearn daily, especially the GP stuff and every now and then I find inefficient implementations but seem unable to create new forks of the repo to submit new pull requests. This might also be due to my limited knowledge of git(hub) but it seems like as long as I have one fork, I cannot have another? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#6222 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz6ycAE0TWurgpYFV-zMxE9BooTHEXks5siWbigaJpZM4HLPCP> .

jsoutherland · 2017-09-15T11:33:30Z

@webdrone I think this is a good feature and I need it. I haven't contributed to this project before but will be happy to figure out what tests and formatting are needed to finish the PR.

If it ends up being cleaner for me to start a new PR is that ok? The alternative is for me to fork webdrone, pull from sklearn upstream, rebase, finish edits, and then PR against webdrone. I don't know the sklearn organizers' style but they may ask for a PR with a more clean set of commits.

I'm trying to balance workflow complexity and history cleanness with preserving your original commits/PR.

lesteve · 2017-09-15T11:43:14Z

You can get this PR branch like this:

git fetch -fu upstream refs/pull/6222/head:the-name-of-your-branch-goes-here
git checkout the-name-of-your-branch-goes-here

work on this branch, commit and then open a new PR when you think it is ready. In the new PR you create mention explicitly that you are taking over this PR and add "Closes #old_pr_number_goes_here" in the description (not the title) so that the old PR gets automatically closed when the new one is merged.

lesteve · 2017-09-15T11:45:36Z

@jsoutherland you probably want to read the discussion in this PR and in #4485 to be aware what the problems were.

webdrone · 2017-09-15T12:06:35Z

@jsoutherland I have no problem if you want to pull afresh. Take my code from this commit if you want. I guess the cleanest thing to do would be implementing SVD priority and MDS-extend in one fell PR.

jsoutherland · 2017-09-25T16:41:50Z

@webdrone Thank you - I will try to finish both soon.

jsoutherland · 2017-09-26T12:34:15Z

@webdrone @lesteve @NelleV @jnothman I have been working with this branch and #4485.

It seems that this PR is in a good working state except for two documentation merge conflicts. It runs very quickly and produces good results on the manifold example (extendible=True):

vs (extendible=False):

#4485 was incomplete, is failing several tests (because it introduces complex numbers) and there has been some debate against fixing #1453 (at least with SVD as the default) which was one blocking decision in #4485.

It seems like this branch should not be held up by the others. #1453 is no longer agreed on as a good default behavior. If it is decided that defaulting to SVD is the right thing to do after this branch is merged in then we should be able to achieve that in the spirit of #4485 (but with this formulation).

jsoutherland · 2017-09-26T16:38:11Z

There are some tests failing and I could use guidance on the best way to resolve them. Now that MDS is capable of acting as a transformer, a lot of general purpose tests are applied. MDS is by default not capable of using transform() ... it must first be configured with extendible=True. Because of this, 5 tests fail with similar results:

======================================================================
ERROR: /home/josh/got/jsoutherland/scikit-learn/sklearn/tests/test_common.py.test_non_meta_estimators:check_estimators_dtypes(MDS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/josh/anaconda2/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/testing.py", line 789, in __call__
    return self.check(*args, **kwargs)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/testing.py", line 309, in wrapper
    return fn(*args, **kwargs)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/estimator_checks.py", line 865, in check_estimators_dtypes
    getattr(estimator, method)(X_train)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/manifold/mds.py", line 475, in transform
    raise ValueError("Method only available if extendible is True.")
Valu
A618
eError: Method only available if extendible is True.

There are many possible ways to fix this:

Making extendible=True the default is probably not what we want.
We could skip the test results.
We could stop returning a ValueError...
There may be a way to configure MDS to be extendible during the test

Are there any suggestions on how to best fix these errors?

Thanks

jsoutherland · 2017-09-27T01:09:34Z

I started a new open PR to finish this. See #9834

webdrone added 2 commits January 24, 2016 21:23

Added test for extendible MDS.

aab1d93

webdrone mentioned this pull request Jan 25, 2016

[MRG] Issue 1453: MDS fall back to SVD when possible #4485

Closed

webdrone closed this Jan 26, 2016

jnothman added Need Contributor Stalled labels Sep 15, 2017

jsoutherland mentioned this pull request Sep 25, 2017

[WIP] Mds jsoutherland/scikit-learn#1

Open

jsoutherland mentioned this pull request Sep 26, 2017

[WIP] MDS extend jsoutherland/scikit-learn#2

Closed

jsoutherland mentioned this pull request Sep 26, 2017

[MRG] Extending MDS to new data #9834

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Mds extend #6222

Mds extend #6222

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Mds extend #6222

Mds extend #6222

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!