8000 Mds extend by webdrone · Pull Request #6222 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Mds extend #6222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Mds extend #6222

wants to merge 2 commits into from

Conversation

webdrone
Copy link

Extended MDS object to include a transform method for out-of-sample points as described here:
http://papers.nips.cc/paper/2461-out-of-sample-extensions-for-lle-isomap-mds-eigenmaps-and-spectral-clustering.pdf

SMACOF algorithm is used in the non-extendible case but not for the extendible case, which requires eigen-decomposition of (dis)similarity matrix.

…valent to PCA transfor

m. The method introduces errors as new points are projected, compared to a new projection o
f all points. See Bengio, Yoshua, et al. "Out-of-sample extensions for lle, isomap, mds, eigenmaps, and spectral clustering." Advances in neural information processing systems 16 (2004): 177-184.
@webdrone
Copy link
Author

I will try and resolve the issues causing failed tests. In the meantime, this should fix issue #2887 to some extend.

@webdrone
Copy link
Author

Closing until #4485 gets addressed, since this can be implemented on top of that.

@webdrone webdrone closed this 8000 Jan 26, 2016
@kingjr
Copy link
Contributor
kingjr commented Aug 16, 2016

@webdrone is there any progress on this issue and is #4485 absolutely necessary?

@webdrone
Copy link
Author

@kingjr, no progress unfortunately -- I did not have much time to look into how to create proper tests.

#4485 is not absolutely necessary. One might circumvent it by adding various cases in the code for if the MDS version being used is extensible or not (if we want to keep the SMACOF algorithm as default, since it has unique options that can be set). However, this alternative (committed here) is messy and an inelegant solution if the default is to change to an SVD solution regardless, as discussed in #4485.

@amueller
Copy link
Member
amueller commented Aug 26, 2016

@webdrone I think if you want to move forward with this, maybe take over #4485 (I had some comments there that were not resolved) so we can get that merged, and then work on this?

@jsoutherland
Copy link

@webdrone @amueller Hi - It looks like this was abandoned and I don't see it in master. I wanted to check if either of you know if this was continued in another PR? Thanks

@webdrone
Copy link
Author

@jsoutherland @amueller

I don't know if this was continued elsewhere, I haven't worked on it further since I created this.

I don't know how to write tests, and these days I don't really have the time to take over more issues. Issue #4485 is now older than 2 years and still not merged for some reason.

I use sklearn daily, especially the GP stuff and every now and then I find inefficient implementations but seem unable to create new forks of the repo to submit new pull requests. This might also be due to my limited knowledge of git(hub) but it seems like as long as I have one fork, I cannot have another?

@jnothman
Copy link
Member
jnothman commented Sep 15, 2017 via email

@jsoutherland
Copy link

@webdrone I think this is a good feature and I need it. I haven't contributed to this project before but will be happy to figure out what tests and formatting are needed to finish the PR.

If it ends up being cleaner for me to start a new PR is that ok? The alternative is for me to fork webdrone, pull from sklearn upstream, rebase, finish edits, and then PR against webdrone. I don't know the sklearn organizers' style but they may ask for a PR with a more c 8000 lean set of commits.

I'm trying to balance workflow complexity and history cleanness with preserving your original commits/PR.

@lesteve
Copy link
Member
lesteve commented Sep 15, 2017

You can get this PR branch like this:

git fetch -fu upstream refs/pull/6222/head:the-name-of-your-branch-goes-here
git checkout the-name-of-your-branch-goes-here

work on this branch, commit and then open a new PR when you think it is ready. In the new PR you create mention explicitly that you are taking over this PR and add "Closes #old_pr_number_goes_here" in the description (not the title) so that the old PR gets automatically closed when the new one is merged.

@lesteve
Copy link
Member
lesteve commented Sep 15, 2017

@jsoutherland you probably want to read the discussion in this PR and in #4485 to be aware what the problems were.

@webdrone
Copy link
Author

@jsoutherland I have no problem if you want to pull afresh. Take my code from this commit if you want. I guess the cleanest thing to do would be implementing SVD priority and MDS-extend in one fell PR.

@jsoutherland
Copy link

@webdrone Thank you - I will try to finish both soon.

@jsoutherland
Copy link

@webdrone @lesteve @NelleV @jnothman I have been working with this branch and #4485.

It seems that this PR is in a good working state except for two documentation merge conflicts. It runs very quickly and produces good results on the manifold example (extendible=True):
manifold

vs (extendible=False):
manifold_orig

#4485 was incomplete, is failing several tests (because it introduces complex numbers) and there has been some debate against fixing #1453 (at least with SVD as the default) which was one blocking decision in #4485.

It seems like this branch should not be held up by the others. #1453 is no longer agreed on as a good default behavior. If it is decided that defaulting to SVD is the right thing to do after this branch is merged in then we should be able to achieve that in the spirit of #4485 (but with this formulation).

@jsoutherland
Copy link

There are some tests failing and I could use guidance on the best way to resolve them. Now that MDS is capable of acting as a transformer, a lot of general purpose tests are applied. MDS is by default not capable of using transform() ... it must first be configured with extendible=True. Because of this, 5 tests fail with similar results:

======================================================================
ERROR: /home/josh/got/jsoutherland/scikit-learn/sklearn/tests/test_common.py.test_non_meta_estimators:check_estimators_dtypes(MDS)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/josh/anaconda2/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/testing.py", line 789, in __call__
    return self.check(*args, **kwargs)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/testing.py", line 309, in wrapper
    return fn(*args, **kwargs)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/utils/estimator_checks.py", line 865, in check_estimators_dtypes
    getattr(estimator, method)(X_train)
  File "/home/josh/got/jsoutherland/scikit-learn/sklearn/manifold/mds.py", line 475, in transform
    raise ValueError("Method only available if extendible is True.")
ValueError: Method only available if extendible is True.

There are many possible ways to fix this:

  1. Making extendible=True the default is probably not what we want.
  2. We could skip the test results.
  3. We could stop returning a ValueError...
  4. There may be a way to configure MDS to be extendible during the test

Are there any suggestions on how to best fix these errors?

Thanks

@jsoutherland
Copy link

I started a new open PR to finish this. See #9834

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0