-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] iforest chunks for score_samples #13283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ngoix please put the PR title to MRG when ready on your side |
tests fix flake8 fix ci
@@ -325,3 +326,26 @@ def test_behaviour_param(): | |||
clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train) | |||
assert_array_equal(clf1.decision_function([[2., 2.]]), | |||
clf2.decision_function([[2., 2.]])) | |||
|
|||
|
|||
# mock get_chunk_n_rows to actually test more than one chunk (here one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any idea how to merge these 2 tests ? they are the same, just with a different mocked chunk size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pytest's monkeypatch fixture rather than unittest.mock alone?
But for this to be a reasonable test, you also need to test that the Mock has been called (or does patch do that?)
@agramfort this is ready for reviews |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update what’s new needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looking good
sklearn/ensemble/iforest.py
Outdated
" be removed in 0.22.", DeprecationWarning) | ||
return self._threshold_ | ||
|
||
def _compute_chunked_score_samples(self, X, working_memory=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not for testing, why do we need the working_melody parameter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the working_memory parameter here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not useful indeed, thanks
@@ -325,3 +326,26 @@ def test_behaviour_param(): | |||
clf2 = IsolationForest(behaviour='new', contamination='auto').fit(X_train) | |||
assert_array_equal(clf1.decision_function([[2., 2.]]), | |||
clf2.decision_function([[2., 2.]])) | |||
|
|||
|
|||
# mock get_chunk_n_rows to actually test more than one chunk (here one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use pytest's monkeypatch fixture rather than unittest.mock alone?
But for this to be a reasonable test, you also need to test that the Mock has been called (or does patch do that?)
Tests failing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise lgtm
sklearn/ensemble/iforest.py
Outdated
" be removed in 0.22.", DeprecationWarning) | ||
return self._threshold_ | ||
|
||
def _compute_chunked_score_samples(self, X, working_memory=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the working_memory parameter here?
doc/whats_new/v0.21.rst
Outdated
@@ -144,6 +144,9 @@ Support for Python 3.4 and below has been officially dropped. | |||
by avoiding keeping in memory each tree prediction. :issue:`13260` by | |||
`Nicolas Goix`_. | |||
|
|||
- |Efficiency| :class:`ensemble.IsolationForest` now uses chunks of data at | |||
prediction step. :issue:`13283` by `Nicolas Goix`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps say something like "capping the memory usage"
THanks @ngoix! |
…utation (scikit-learn#13283)" This reverts commit bce9351.
…utation (scikit-learn#13283)" This reverts commit bce9351.
fixes #12040
is currently based on top of #13260 (so do not review yet)
needs to be rebased on #13251