-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Partial centroid #13033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial centroid #13033
Conversation
tests are failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
self.centroids_ = np.empty((n_classes, n_features), dtype=np.float64) | ||
# Number of clusters in each class. | ||
nk = np.zeros(n_classes) | ||
old_nk = deepcopy(self.nk_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use self.nk_.copy()
rather than deepcopy.
# Number of clusters in each class. | ||
nk = np.zeros(n_classes) | ||
old_nk = deepcopy(self.nk_) | ||
old_centroids = deepcopy(self.true_centroids_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
@@ -110,53 +169,90 @@ def fit(self, X, y): | |||
check_classification_targets(y) | |||
|
|||
n_samples, n_features = X.shape | |||
|
|||
if _refit or _check_partial_fit_first_call(self, classes): | |||
self.true_classes_ = classes = np.asarray(classes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please document the new attributes
y2 = [-1, -1, 1, -1, 1, 1] | ||
clf = NearestCentroid() | ||
clf.partial_fit(X2[:3], y2[:3], classes=[-1, 1]) | ||
assert_array_equal(clf.predict(T), [-1, 1, 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is identical to true_result, so what have we shown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least check that the centroids differ from the complete fit?
clf = NearestCentroid() | ||
clf.partial_fit(X[:3], y[:3], classes=[-1, 1]) | ||
clf.partial_fit(X[3:], y[3:]) | ||
assert_array_equal(clf.predict(T), true_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also check that the centroids match using fit
on the full dataset?
# Ensure that the shrinking is correct. | ||
# The expected result is calculated by R (pamr), | ||
# which is implemented by the author of the original paper. | ||
# (One need to modify the code to output the new centroid in pamr.predict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modify which code? I don't understand this. You could just show the R code used to derive these numbers.
@@ -219,19 +226,33 @@ def _partial_fit(self, X, y, classes=None, _refit=False): | |||
self.centroids_ = self.true_centroids_[self.nk_ != 0] | |||
|
|||
if self.shrink_threshold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend moving the two "deepcopy" calls on lines 192, 193 into the "if self.shrink_threshold" conditional on line 228. This way you only compute them when needed.
@@ -177,6 +177,10 @@ def _partial_fit(self, X, y, classes=None, _refit=False): | |||
# Number of clusters in each class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user, I find the warning on lines 214-217 to be quite annoying. This code was present in the original implementation, but it might make sense to move it to the "init" instead so the user only gets the warning once. Or we could move it to the "if _refit or _check_partial_fit_first_call(self, classes)" conditional on line 173. Previously, we only received the warning when calling "fit", which isn't too often. But "partial_fit" is meant to be called often on small batches of data, and this warning gets printed every time, which is redundant and tiresome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having it only appear on the first partial_fit is reasonable.
@Robinspecteur are you still working on this? |
From what I can tell, this is really close to being finished.
…On Tue, Aug 6, 2019 at 8:52 AM Andreas Mueller ***@***.***> wrote:
@Robinspecteur <https://github.com/Robinspecteur> are you still working
on this?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#13033?email_source=notifications&email_token=AAGO26MUKZR3JTKDKCFI7UTQDGM5HA5CNFSM4GRW7JMKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3VTA4Q#issuecomment-518729842>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAGO26P64FCUNFZ3Y5KGOXLQDGM5HANCNFSM4GRW7JMA>
.
|
@dirkpadfield good to know. The question is mainly if there's a point for maintainers to spend more time reviewing at this point, and/or if we should open this up for someone else to complete. There's some failing tests and your comments haven't been addressed yet :) |
Closing as superseded by #19262. |
Reference Issues/PRs
#12952
What does this implement/fix? Explain your changes.
Adds a partial_fit method for the NearestCentroid
Any other comments?