[MRG + 1] add partial_fit to multioutput module #8054
Conversation
Tests are failing. Also, please use …
Sorry about the typo. And hmm, can you explain more about using …?
Do a …
I see your point. More effort is needed: the current implementation doesn't account for reuse of self.estimators_, and a simple clone passed to the Parallel jobs is why the test cases are failing.
9ac08c2 to 7b42573 (compare)
@amueller can I have another comment?
class MultiOutputEstimator(six.with_metaclass(ABCMeta, BaseEstimator)):

    def __init__(self, estimator, n_jobs=1):
        self.estimator = estimator
        self.n_jobs = n_jobs

    @if_delegate_has_method('estimator')
Hi, while you are at it, can you please add the @if_delegate_has_method decorator for the fit and predict functions too? Thanks.
Sure thing
No, we do not need if_delegate_has_method for fit. It is required for every estimator. I think in this context we are dealing with predictors and we do not need if_delegate_has_method for predict.
We also do not need the scope of the PR to be expanded unnecessarily. Throwing in a decorator means that decoration needs testing, for instance.
Sorry for the unnecessary work, @yupbank
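For context, a minimal sketch (an illustrative toy class, not the PR's implementation) of what the decorator buys on partial_fit only, assuming if_delegate_has_method from sklearn.utils.metaestimators as it existed at the time: fit must exist on every estimator, while partial_fit should only be advertised when the wrapped estimator has it.

    # Illustrative only: hasattr(meta, 'partial_fit') mirrors the wrapped estimator.
    from sklearn.base import BaseEstimator, clone
    from sklearn.utils.metaestimators import if_delegate_has_method


    class ToyMetaEstimator(BaseEstimator):
        def __init__(self, estimator):
            self.estimator = estimator

        def fit(self, X, y):
            # always present; no delegation guard needed
            self.estimator_ = clone(self.estimator).fit(X, y)
            return self

        @if_delegate_has_method('estimator')
        def partial_fit(self, X, y):
            # only "exists" when self.estimator itself has partial_fit
            if not hasattr(self, 'estimator_'):
                self.estimator_ = clone(self.estimator)
            self.estimator_.partial_fit(X, y)
            return self

With this, hasattr(ToyMetaEstimator(SGDClassifier()), 'partial_fit') is True while hasattr(ToyMetaEstimator(RandomForestClassifier()), 'partial_fit') is False, which is the delegation behaviour being discussed.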
@@ -83,8 +164,10 @@ def fit(self, X, y, sample_weight=None):
            raise ValueError("Underlying regressor does not support"
A small nitpick: could you replace regressor with estimator, since it is the base class? Thanks again!
Thanks and will do
Thanks, this was my mistake :)
        else:
            estimator = copy.copy(estimator)

        if sample_weight is not None:
I was wondering if estimator.partial_fit(X, y, classes=classes, sample_weight=sample_weight) would suffice, since the partial_fit function of the base estimator would handle it appropriately. Let me know if it sounds okay!
Since the regressor and classifier share the same interface, the choice of argument for classes can't be neglected. As for sample_weight, I might need to check whether it is universal for SGD estimators.
The fit in BaseSGD only takes X, y, so I think I will keep this to make sure all the SGD estimators use this code.
Hi, thanks for looking into it. I understand that the classes argument is not applicable for regressors. But in case someone inadvertently passes a classes argument with a regressor, the estimator would throw an error, right? Perhaps we can have separate partial_fit functions in the subclasses MultiOutputRegressor and MultiOutputClassifier? Just my 2c.
Also, BaseSGD has an abstract definition for fit, which is implemented in the subclasses BaseSGDClassifier and BaseSGDRegressor, right?
Please do let me know what you think. Thanks.
Hi, sorry, I missed the partial_fit function you have added to MultiOutputRegressor. And since it is a helper function, I get the reason for separate code paths based on classes. Sorry for the noise.
classes is required for classifiers' partial_fit, but I think we have no requirement that sample_weight be supported.
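To make the agreed behaviour concrete, here is a rough per-output helper in the spirit of the PR (the name _partial_fit_estimator and the exact signature are illustrative, not necessarily the merged code): classes is forwarded only when given, and sample_weight only when the caller passed one, so regressors and classifiers can share the same path.

    from sklearn.base import clone


    def _partial_fit_estimator(estimator, X, y, classes=None,
                               sample_weight=None, first_time=True):
        # Incrementally fit one per-output estimator on a single target column.
        if first_time:
            estimator = clone(estimator)
        if classes is not None and sample_weight is not None:
            estimator.partial_fit(X, y, classes=classes,
                                  sample_weight=sample_weight)
        elif classes is not None:
            estimator.partial_fit(X, y, classes=classes)
        elif sample_weight is not None:
            estimator.partial_fit(X, y, sample_weight=sample_weight)
        else:
            estimator.partial_fit(X, y)
        return estimator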
    # train the multi_target_linear and also get the predictions.
    half_index = int(X.shape[0] / 2)
    multi_target_linear.partial_fit(
        X[:half_index], y[:half_index], classes=classes)
Should also check:
- sample weight
- that passing classes=None raises an appropriate error

(see the sketch of such tests just below)
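A rough sketch of the two extra checks being requested, assuming an SGDClassifier-based setup like the PR's other tests; the data construction and test names here are illustrative, and the error message is the one asserted in the PR's own test further down.

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.utils.testing import assert_raises_regex

    # deterministic toy data: two outputs, every class present in each column
    rng = np.random.RandomState(0)
    X = rng.normal(size=(60, 3))
    y = np.column_stack([np.tile([0, 1], 30), np.tile([0, 1, 2], 20)])
    classes = [np.unique(y[:, i]) for i in range(y.shape[1])]


    def test_partial_fit_accepts_sample_weight():
        clf = MultiOutputClassifier(SGDClassifier(random_state=0))
        clf.partial_fit(X, y, classes=classes,
                        sample_weight=np.ones(X.shape[0]))


    def test_partial_fit_without_classes_raises():
        clf = MultiOutputClassifier(SGDClassifier(random_state=0))
        assert_raises_regex(ValueError,
                            "classes must be passed on the first call",
                            clf.partial_fit, X, y)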
    half_index = 25
    for n in range(3):
        sgr = SGDRegressor(random_state=0)
        sgr.partial_fit(X_train[:half_index], y_train[:half_index, n])
You also need to test sample_weight
            Data.

        y : (sparse) array-like, shape (n_samples, n_outputs)
            Multi-output targets. An indicator matrix turns on multilabel
Multilabel is not appropriate here.
    def __init__(self, estimator, n_jobs=1):
        super(MultiOutputRegressor, self).__init__(estimator, n_jobs)

    def partial_fit(self, X, y, sample_weight=None):
        """ Fit linear model with Stochastic Gradient Descent..
Huh?
will update
@@ -68,7 +148,8 @@ def fit(self, X, y, sample_weight=None):
        """

        if not hasattr(self.estimator, "fit"):
            raise ValueError("The base estimator should implement a fit method")
            raise ValueError(
These cosmetic fixes make it much harder to review your work.
        classes : array, shape (n_classes, n_outputs)
            Classes across all calls to partial_fit.
            Can be obtained by via
            `[np.unique(y[:, i]) for i in range(y.shape[1])]`, where y is the
This is only going to work if there are the same number of classes in each output.
The code in the documentation is going to work even with a different number of classes in each output.
But yeah, I need to change https://github.com/scikit-learn/scikit-learn/pull/8054/files#diff-66150694b846268fe58229e40db8b909R122 into classes[i] if classes is not None else None.

    import numpy as np

    y = np.array([[0, 1, 2, 0], [0, 1, 0, 1], [1, 0, 1, 0]]).T
    # y is:
    # array([[0, 0, 1],
    #        [1, 1, 0],
    #        [2, 0, 1],
    #        [0, 1, 0]])

    [np.unique(y[:, i]) for i in range(y.shape[1])]
    # [array([0, 1, 2]), array([0, 1]), array([0, 1])]
            Multi-output targets. An indicator matrix turns on multilabel
            estimation.

        classes : array, shape (n_classes, n_outputs)
Should specify "of int or string", and perhaps it should not be an array, to allow for different numbers of classes in each output.
            Data.

        y : (sparse) array-like, shape (n_samples, n_outputs)
            Multi-output targets. An indicator matrix turns on multilabel
It doesn't really turn on anything. Multilabel is binary multioutput by definition.
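To illustrate the point (example data made up): multilabel targets are just the binary special case of multi-output targets, so an indicator matrix does not "turn on" anything extra.

    import numpy as np

    y_multilabel = np.array([[0, 1, 1],
                             [1, 0, 1]])    # every column is binary (indicator matrix)
    y_multioutput = np.array([[0, 2, 1],
                              [1, 0, 3]])   # columns may have more than two classes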
class MultiOutputEstimator(six.with_metaclass(ABCMeta, BaseEstimator)):

    def __init__(self, estimator, n_jobs=1):
        self.estimator = estimator
        self.n_jobs = n_jobs

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """ Fit linear model with Stochastic Gradient Descent..
Huh?
        if first_time:
            estimator = clone(estimator)
        else:
            estimator = copy.copy(estimator)
This is an interesting implementation detail. I acknowledge that copying like this is the only easy way to maintain consistent behaviour whether or not parallelism is used. However, I think it should be a deepcopy if anything. And I suppose you should ensure this behaviour in tests: that across multiple calls to partial_fit the same estimator objects are not maintained.
I don't think deepcopy is required; it is a waste of memory. joblib.Parallel is going to take care of it and ensure no job interferes with another job's input.

    from sklearn.externals.joblib import Parallel, delayed  # joblib was vendored here at the time

    d = [0]
    a = [d, d, d]

    def change_value(x, i):
        x[0][0] = i
        return x

    w = Parallel(n_jobs=1, backend='threading')(
        delayed(change_value)(a, n) for n in xrange(3))
    w
    # [[0], [0], [0]]
    # [[1], [1], [1]]
    # [[2], [2], [2]]

    w = Parallel(n_jobs=2, backend='multiprocessing')(
        delayed(change_value)(a, n) for n in xrange(3))
    w
    # [[0], [0], [0]]
    # [[1], [1], [1]]
    # [[2], [2], [2]]

    w = []
    for n in xrange(3):
        w.append(change_value(a, n))
    w
    # [[[2], [2], [2]], [[2], [2], [2]], [[2], [2], [2]]]
And yeah, the test case needs to be updated.
joblib.Parallel will not take care of it in the n_jobs=1 case...
Sorry I take that back.
So if joblib is performing a copy regardless, are you sure we need this copy.copy?
Otherwise LGTM
class MultiOutputEstimator(six.with_metaclass(ABCMeta, BaseEstimator)):

    def __init__(self, estimator, n_jobs=1):
        self.estimator = estimator
        self.n_jobs = n_jobs

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """ Fit the model to data with Stochastic Gradient Descent.
Still, SGD?
    def __init__(self, estimator, n_jobs=1):
        super(MultiOutputRegressor, self).__init__(estimator, n_jobs)

    def partial_fit(self, X, y, sample_weight=None):
        """ Fit the model to data with Stochastic Gradient Descent..
SGD?
def test_mutli_output_classifiation_partial_fit_no_first_classes_exception():
    sgd_linear_clf = SGDClassifier(loss='log', random_state=1)
    multi_target_linear = MultiOutputClassifier(sgd_linear_clf)
    assert_raises_regex(ValueError, "classes must be passed on the first call to partial_fit.",
PEP8 line length
I looked over the tests yesterday, but can't recall if I checked for one that ensured that, regardless of …
class MultiOutputEstimator(six.with_metaclass(ABCMeta, BaseEstimator)):

    def __init__(self, estimator, n_jobs=1):
        self.estimator = estimator
        self.n_jobs = n_jobs

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """ Fit the model to data with SGD.
No, what I'm saying is that it's not always SGD. Why do you mention SGD??
Oh man... I was so obsessed with SGD. I'll delete that. I've been working heavily on SGD-related projects recently, sorry about that.
    def __init__(self, estimator, n_jobs=1):
        super(MultiOutputRegressor, self).__init__(estimator, n_jobs)

    def partial_fit(self, X, y, sample_weight=None):
        """ Fit the model to data with SGD.
No, what I'm saying is that it's not always SGD. Why do you mention SGD??
28031e0 to 59261e2 (compare)
Apart from nitpicks, LGTM.
Please add an "Enhancement" entry in whats_new.rst. Thanks!
class MultiOutputEstimator(six.with_metaclass(ABCMeta, BaseEstimator)):

    def __init__(self, estimator, n_jobs=1):
        self.estimator = estimator
        self.n_jobs = n_jobs

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y, classes=None, sample_weight=None):
        """ Fit the model to data.
Say "Incrementally fit". Also remove space between quotes and words.
Thanks a lot for the PR! A few minor comments and questions.
        if first_time:
            estimator = clone(estimator)
        else:
            estimator = copy.deepcopy(estimator)
Why? Sorry if this question was answered before...
Once the estimators are cloned, can they not be fitted in parallel? I don't understand why you need to copy them? Maybe I'm missing something...
It has been answered. It is to ensure fitting happens in different processes on different estimators, and partial_fit requires being called multiple times.
Yes, you will be partial-fitting one estimator per process, right? Unless you are using the same estimator in 2 processes, I don't see the problem here... Maybe I'm just completely blind to some detail. Would you be kind enough to clarify, please?
The way I see it, we have one estimator per target. Say there are 2 targets: we have est1 and est2. And if you give n_jobs>=2, est1 will partial_fit on process 1 and est2 on process 2, correct? Both of these are cloned from the original est instance and hence should train on the required batch of data without any side effects from parallelism...
Oh... yeah, by the time of the second fit I already have separate estimators, so no clone/copy is needed anymore.
Thanks for the catch 👍
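To summarise where the thread lands, a condensed sketch (simplified and illustrative, not the merged code verbatim; it reuses the _partial_fit_estimator helper sketched earlier in this conversation): clone one estimator per output on the first call, then keep reusing those now-separate fitted estimators on later calls, so no extra copy is needed.

    from sklearn.externals.joblib import Parallel, delayed  # joblib was vendored here at the time


    class MultiOutputEstimatorSketch(object):
        def __init__(self, estimator, n_jobs=1):
            self.estimator = estimator
            self.n_jobs = n_jobs

        def partial_fit(self, X, y, classes=None, sample_weight=None):
            first_time = not hasattr(self, 'estimators_')
            # clone happens inside the helper only when first_time is True;
            # afterwards the already-separate fitted estimators are reused
            self.estimators_ = Parallel(n_jobs=self.n_jobs)(
                delayed(_partial_fit_estimator)(
                    self.estimator if first_time else self.estimators_[i],
                    X, y[:, i],
                    classes[i] if classes is not None else None,
                    sample_weight, first_time)
                for i in range(y.shape[1]))
            return self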
@@ -44,6 +44,10 @@ New features
Enhancements
............

- :class:`multioutput.MultiOutputRegressor` and :class:`multioutput.MultiOutputClassifier`
  now support incremental fit using `partial_fit`.
"support online learning using partial_fit
" sounds sexy? (nitpick of the highest order. Feel free to ignore)
            Multi-output targets.

        classes : list of numpy arrays, shape (n_outputs)
            each array is unique classes for one output in str/int
Each (capital E)
        classes : list of numpy arrays, shape (n_outputs)
            each array is unique classes for one output in str/int
            Can be obtained by via
            `[np.unique(y[:, i]) for i in range(y.shape[1])]`, where y is the
Double backticks
        self : object
            Returns self.
        """

blank line can be removed
def test_multi_output_classification_partial_fit_parallelism():
    sgd_linear_clf = SGDClassifier(loss='log', random_state=1)
    mor = MultiOutputClassifier(sgd_linear_clf)
Why is n_jobs not set for the parallelism test?
        sgd_linear_clf.partial_fit(
            X[:half_index], y[:half_index, i], classes=classes[i])
        sgd_linear_clf.partial_fit(X[half_index:], y[half_index:, i])
        assert_equal(list(sgd_linear_clf.predict(X)), list(predictions[:, i]))
You can use assert_array_equal here...
    multi_target_linear = MultiOutputClassifier(sgd_linear_clf)

    # train the multi_target_linear and also get the predictions.
    half_index = int(X.shape[0] / 2)
You could use X.shape[0] // 2 instead of using int...
(If not already done, use from __future__ import division at the top of the file.)
    half_index = int(X.shape[0] / 2)
    multi_target_linear.partial_fit(
        X[:half_index], y[:half_index], classes=classes)
    multi_target_linear.partial_fit(X[half_index:], y[half_index:])
Would it be better to test that the predictions are the same for both multi_target_linear and sgd_linear_clf after each partial fit?
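A sketch of what that per-batch comparison could look like, reusing X, y, classes, and half_index from the surrounding test (illustrative only; assert_array_equal as used elsewhere in the test file):

    from sklearn.utils.testing import assert_array_equal

    # one reference single-target classifier per output column
    refs = [SGDClassifier(loss='log', random_state=1) for _ in range(y.shape[1])]
    batches = [slice(0, half_index), slice(half_index, None)]
    for b, batch in enumerate(batches):
        if b == 0:
            multi_target_linear.partial_fit(X[batch], y[batch], classes=classes)
        else:
            multi_target_linear.partial_fit(X[batch], y[batch])
        predictions = multi_target_linear.predict(X)
        for i, ref in enumerate(refs):
            if b == 0:
                ref.partial_fit(X[batch], y[batch, i], classes=classes[i])
            else:
                ref.partial_fit(X[batch], y[batch, i])
            # after every batch, each output of the wrapper should match a
            # single-target SGDClassifier trained on that column alone
            assert_array_equal(ref.predict(X), predictions[:, i])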
    clf = MultiOutputClassifier(sgd_linear_clf)
    clf.fit(X, y)
    X_test = [[1.5, 2.5, 3.5]]
    assert_almost_equal(clf.predict(X_test), clf_w.predict(X_test))
assert_array_almost_equal
@@ -15,13 +15,15 @@
# License: BSD 3 clause

import numpy as np
import copy
Can this be removed now?
With that I'm +1 for merge once CIs pass...
        sgd_linear_clf.partial_fit(
            X[:half_index], y[:half_index, i], classes=classes[i])
        sgd_linear_clf.partial_fit(X[half_index:], y[half_index:, i])
        assert_array_equal(sgd_linear_clf.predict(X), predictions[:, i])
Could you try and put both these blocks in a loop? It'd save us some redundancy in code.
(sorry for the nagging)
sure thing :)
…either, so I'm not sure it is the right thing to do to add that to online learners.
@yupbank Sorry indeed, forget what I said :).
Thanks a lot... I'm merging!
Thx @yupbank
* add partial_fit to multioupt module
* fix range in python3
* fix flake8
* fix the comments
* fix according to comments
* fix lint
* remove pytest
* fix ValueException message
* py 3.5 compatiable classes
* fix stuff
* fix according the comments
* remove used copy
* flake8..
* fix docs
* eventually, i use deepcopy to ensure the parallel
* lint..
* address final comment
* fix addressing the comments
* update confirmed separate estimators
* finally remove copy
* compact test
Reference Issue
Fixes: #8053
What does this implement/fix? Explain your changes.
add partial_fit
Any other comments?