clone should not really deepcopy constructor parameters

As of now, clone systematically triggers a copy of the input parameters:

>>> from sklearn.base import clone
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> cntvec = CountVectorizer(vocabulary={'g': 0, 'a': 1, 't': 2, 'c': 3})
>>> cntvec.vocabulary is clone(cntvec).vocabulary
False

This could be inefficient on large input data structures (e.g. for a vocabulary with 100k+ tokens instead of 4 as in the previous example).

Furthermore, it is forbidden for estimators to mutate the data structures passed as input params, and we actually have a hash-based check for this in our official tests:

>>> import numpy as np
>>> from sklearn.base import BaseEstimator
>>> from sklearn.utils.estimator_checks import check_estimators_overwrite_params
>>> class EvilMutationEstimator(BaseEstimator):
...     def __init__(self, a=np.ones(42)):
...         self.a = a
...     def fit(self, X, y=None):
...         self.a[:] = 0
...         return self
...
>>> check_estimators_overwrite_params('bad', EvilMutationEstimator)
Traceback (most recent call last):
  File "<ipython-input-35-fd40a6bd7830>", line 1, in <module>
    check_estimators_overwrite_params('bad', EvilMutationEstimator)
  File "/Users/ogrisel/code/scikit-learn/sklearn/utils/estimator_checks.py", line 1331, in check_estimators_overwrite_params
    % (name, param_name, original_value, new_value))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", line 817, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", line 1190, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/unittest/case.py", line 662, in fail
    raise self.failureException(msg)
AssertionError: '7f743b92849194794b6276898d494d6f' != '810ecd082b511abb9c3960f8f4adfee1'
- 7f743b92849194794b6276898d494d6f
+ 810ecd082b511abb9c3960f8f4adfee1
 : Estimator bad should not change or mutate  the parameter a from [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.  1.  1.  1.  1.] to [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.] during fit.
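
For readers unfamiliar with the idea, here is a rough stdlib-only sketch of what a hash-based mutation check does. It is not the code in `estimator_checks.py`: `param_digest`, `MutatingEstimator` and `mutated_params` are hypothetical names, and hashing the pickled value with md5 is an assumption made only to keep the example self-contained.

```python
import hashlib
import pickle


def param_digest(value):
    # Hypothetical stand-in for the hashing helper used by the real check:
    # here we simply take the md5 of the pickled value.
    return hashlib.md5(pickle.dumps(value)).hexdigest()


class MutatingEstimator:
    """Toy estimator (plain list instead of a numpy array) that
    overwrites its constructor parameter in place during fit."""

    def __init__(self, a):
        self.a = a

    def fit(self, X, y=None):
        self.a[:] = [0.0] * len(self.a)  # in-place mutation of the param
        return self


def mutated_params(est, X):
    # Hash every attribute set by __init__, fit, then hash again and
    # report the names whose digest changed.
    before = {name: param_digest(val) for name, val in vars(est).items()}
    est.fit(X)
    after = {name: param_digest(val) for name, val in vars(est).items()}
    return sorted(name for name in before if before[name] != after[name])


changed = mutated_params(MutatingEstimator(a=[1.0] * 42), X=None)
print(changed)
```

An estimator that leaves its params alone would yield an empty list here, which is the property the argument below relies on.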

So, as sklearn treats constructor params as immutable data structures, I really do not see the need for clone to do a deepcopy on them.
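
To make the proposal concrete, here is a minimal sketch of the two behaviours on a toy class. `TinyEstimator`, `deep_clone` and `shallow_clone` are hypothetical names used for illustration only; this is not how `sklearn.base.clone` is implemented, and it ignores nested estimators, which would presumably still need to be cloned recursively.

```python
import copy


class TinyEstimator:
    """Hypothetical stand-in for an estimator (not sklearn's BaseEstimator)."""

    def __init__(self, vocabulary=None):
        self.vocabulary = vocabulary

    def get_params(self):
        return {"vocabulary": self.vocabulary}


def deep_clone(est):
    # Behaviour described in this issue: every parameter is deep-copied.
    return type(est)(**copy.deepcopy(est.get_params()))


def shallow_clone(est):
    # Proposed behaviour: build a new, unfitted estimator that reuses
    # the very same parameter objects.
    return type(est)(**est.get_params())


vocab = {"g": 0, "a": 1, "t": 2, "c": 3}
est = TinyEstimator(vocabulary=vocab)

shares_after_deep = deep_clone(est).vocabulary is vocab
shares_after_shallow = shallow_clone(est).vocabulary is vocab
print(shares_after_deep, shares_after_shallow)
```

With a 100k+ token vocabulary the shallow variant would skip the copy entirely, which is only safe because estimators are not allowed to mutate their params.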

WDYT?
