-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix TSNE init as NumPy array #7208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fix TSNE init as NumPy array #7208
Conversation
I didn't realize the incompatibility of 'basestring' with Python 3. I'll fix it so it's compatible. |
@@ -305,11 +305,14 @@ def test_non_square_precomputed_distances(): | |||
|
|||
|
|||
def test_init_not_available(): | |||
# 'init' must be 'pca' or 'random'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just edit the comment instead of removing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
Initialization of embedding. Possible options are 'random' and 'pca'. | ||
PCA initialization cannot be used with precomputed distances and is | ||
usually more globally stable than random initialization. | ||
init : string or NumPy array, optional (default: "random") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you use the more consistent numpy array
instead of NumPy
array? Also it would be nice to specify as numpy array of shape [n_samples, n_components]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, fixed for consistency/clarification
@@ -707,7 +710,7 @@ def _fit(self, X, skip_num_points=0): | |||
raise ValueError("n_iter should be at least 200") | |||
|
|||
if self.metric == "precomputed": | |||
if self.init == 'pca': | |||
if not isinstance(self.init, np.ndarray) and self.init == 'pca': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer:
if isinstance(self.init, string_types) and self.init == 'pca':
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had this in a previous commit, however one of the builds failed because it couldn't import six.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer:
if isinstance(self.init, string_types) and self.init == 'pca':
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't, that's why! I'll go back to using 'string_types' and get it from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed change
assert_raises_regexp(ValueError, m, TSNE, init="not available") | ||
|
||
|
||
def test_init_ndarray(): | ||
tsne = TSNE(init=np.zeros((100, 2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also do a fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the test I put in the comment below looks good. If so, I'll push it in so you can merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please just add a tsne.fit here and assess at least the size of the fitted attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, do you think it's also worthwhile to check the case of init with ndarray and metric='precomputed'? (one change in this PR involves fixing a FutureWarning in this case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just added another test, should be good now.
besides +1 for merge |
Per my question a few comments up, I had written a test that reads:
I'll just push this one. Is this okay? |
thanks @b-carter |
* fix tsne init numpy array type checking and existing test * adds test for tsne init numpy array type checking * reorder TSNE elif statements for string options first for increased readability * string type checking with python 3 compatibility * restore removed comment * avoid use of six module and check string and unicode manually * use lowercase spelling of numpy for consistency * parens for better consistency * fix init as numpy array throwing FutureWarning * simplify logic for init validity checking * use string_types for string checking * improve init TSNE with ndarray test to check fit * add test for init TSNE with ndarray and metric precomputed
Reference Issue
Fixes #7204
What does this implement/fix? Explain your changes.
Fixes discrepancy between doc and type checking for the 'init' parameter in TSNE. NumPy array is now a possible value for 'init'. Type check and docstring are updated to reflect this change.
Updates existing test to reflect change in error msg.
Adds new test to validate NumPy array as a possible 'init' value.
Any other comments?
Made one other small readability change by reordering elif options to show string options before NumPy array check.