-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Mds extend #6222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mds extend #6222
Conversation
…valent to PCA transfor m. The method introduces errors as new points are projected, compared to a new projection o f all points. See Bengio, Yoshua, et al. "Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering." Advances in neural information processing systems 16 (2004): 177-184.
I will try and resolve the issues causing failed tests. In the meantime, this should fix issue #2887 to some extend. |
Closing until #4485 gets addressed, since this can be implemented on top of that. |
@kingjr, no progress unfortunately -- I did not have much time to look into how to create proper tests. #4485 is not absolutely necessary. One might circumvent it by adding various cases in the code for if the MDS version being used is extensible or not (if we want to keep the SMACOF algorithm as default, since it has unique options that can be set). However, this alternative (committed here) is messy and an inelegant solution if the default is to change to an SVD solution regardless, as discussed in #4485. |
I don't know if this was continued elsewhere, I haven't worked on it further since I created this. I don't know how to write tests, and these days I don't really have the time to take over more issues. Issue #4485 is now older than 2 years and still not merged for some reason. I use sklearn daily, especially the GP stuff and every now and then I find inefficient implementations but seem unable to create new forks of the repo to submit new pull requests. This might also be due to my limited knowledge of git(hub) but it seems like as long as I have one fork, I cannot have another? |
you can create multiple branches on one fork for that purpose.
…On 15 Sep 2017 3:48 am, "webdrone" ***@***.***> wrote:
@jsoutherland <https://github.com/jsoutherland> @amueller
<https://github.com/amueller>
I don't know if this was continued elsewhere, I haven't worked on it
further since I created this.
I don't know how to write tests, and these days I don't really have the
time to take over more issues. Issue #4485
<#4485> is now older
than 2 years and still not merged for some reason.
I use sklearn daily, especially the GP stuff and every now and then I find
inefficient implementations but seem unable to create new forks of the repo
to submit new pull requests. This might also be due to my limited knowledge
of git(hub) but it seems like as long as I have one fork, I cannot have
another?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#6222 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6ycAE0TWurgpYFV-zMxE9BooTHEXks5siWbigaJpZM4HLPCP>
.
|
@webdrone I think this is a good feature and I need it. I haven't contributed to this project before but will be happy to figure out what tests and formatting are needed to finish the PR. If it ends up being cleaner for me to start a new PR is that ok? The alternative is for me to fork webdrone, pull from sklearn upstream, rebase, finish edits, and then PR against webdrone. I don't know the sklearn organizers' style but they may ask for a PR with a more c 8000 lean set of commits. I'm trying to balance workflow complexity and history cleanness with preserving your original commits/PR. |
You can get this PR branch like this:
work on this branch, commit and then open a new PR when you think it is ready. In the new PR you create mention explicitly that you are taking over this PR and add "Closes #old_pr_number_goes_here" in the description (not the title) so that the old PR gets automatically closed when the new one is merged. |
@jsoutherland you probably want to read the discussion in this PR and in #4485 to be aware what the problems were. |
@jsoutherland I have no problem if you want to pull afresh. Take my code from this commit if you want. I guess the cleanest thing to do would be implementing SVD priority and MDS-extend in one fell PR. |
@webdrone Thank you - I will try to finish both soon. |
@webdrone @lesteve @NelleV @jnothman I have been working with this branch and #4485. It seems that this PR is in a good working state except for two documentation merge conflicts. It runs very quickly and produces good results on the manifold example ( #4485 was incomplete, is failing several tests (because it introduces complex numbers) and there has been some debate against fixing #1453 (at least with SVD as the default) which was one blocking decision in #4485. It seems like this branch should not be held up by the others. #1453 is no longer agreed on as a good default behavior. If it is decided that defaulting to SVD is the right thing to do after this branch is merged in then we should be able to achieve that in the spirit of #4485 (but with this formulation). |
There are some tests failing and I could use guidance on the best way to resolve them. Now that MDS is capable of acting as a transformer, a lot of general purpose tests are applied. MDS is by default not capable of using
There are many possible ways to fix this:
Are there any suggestions on how to best fix these errors? Thanks |
I started a new open PR to finish this. See #9834 |
Extended MDS object to include a transform method for out-of-sample points as described here:
http://papers.nips.cc/paper/2461-out-of-sample-extensions-for-lle-isomap-mds-eigenmaps-and-spectral-clustering.pdf
SMACOF algorithm is used in the non-extendible case but not for the extendible case, which requires eigen-decomposition of (dis)similarity matrix.