[MRG] IterativeImputer: n_iter->max_iter #13061
So that's a problem of stopping too early rather than too late! Hmm... If
it's stopping too early, that basically means that it needs more burn-in
for the imputed values to find some equilibrium? So maybe what you want is
not tol, but something more like n_iter_no_decrease.
|
Haha, now you've got me convinced. I'll try that for now - it's what you had originally: setting tol with a default of 1e-3 fixes the |
Re: benchmarks. I'll do some tomorrow (it's 9pm here). Re: convergence warning. Can't hurt. I'll find an example and stick it in here. |
Might want to have that max (which is an infty-norm so can maybe use the
scipy norm implementation) divided by the norm of the non-missing values of
the original X to be invariant to scale.
… |
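A minimal sketch of that scale-invariant check, assuming the array names used in the PR (`Xt` for the current imputed matrix, `Xt_previous` for the previous round, `X` and `mask_missing_values` for the original data); this is an illustration, not the exact implementation:

import numpy as np

def has_converged(Xt, Xt_previous, X, mask_missing_values, tol=1e-3):
    # Largest absolute change between two imputation rounds
    # (the infinity norm of the flattened difference).
    inf_norm = np.max(np.abs(Xt - Xt_previous))
    # Normalize by the largest absolute observed (non-missing) value of the
    # original X so the tolerance does not depend on the scale of the data.
    scale = np.max(np.abs(X[~mask_missing_values]))
    return inf_norm < tol * scale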
@@ -511,7 +514,7 @@ class IterativeImputer(BaseEstimator, TransformerMixin):
``feat_idx`` is the current feature to be imputed,
``neighbor_feat_idx`` is the array of other features used to impute the
current feature, and ``predictor`` is the trained predictor used for
the imputation. Length is ``self.n_features_with_missing_ * n_iter``.
the imputation. Length is ``self.n_features_with_missing_ * max_iter``.

n_features_with_missing_ : int
    Number of features with missing values.
This should be another list item rather than a note in the above attribute description
sklearn/tests/test_impute.py
Outdated
verbose=1,
random_state=rng)
X_filled_100 = imputer.fit_transform(X_missing)
assert(len(imputer.imputation_sequence_) == d * 5)
So that the code is clear, you can check `n_iter_` directly.
Can also check that tol=0 results in n_iter_ == max_iter?
and remove the outer brackets
sklearn/impute.py
Outdated
the imputation. Length is ``self.n_features_with_missing_ * n_iter``.
the imputation. Length is ``self.n_features_with_missing_ *
self.n_iter_``. ``self.n_iter_`` is the number of iteration rounds that
actually occurred, taking early stopping into account.
sklearn/impute.py
Outdated
Maximum number of imputation rounds to perform before returning the
imputations computed during the final round. A round is a single
imputation of each feature with missing values. The stopping criterion
is met once abs(max(X_i - X_{i-1})) < tol. Note that early stopping is
Put the backticks around the formulation of the stopping criterion
sklearn/impute.py
Outdated
mus_too_high = mus > self._max_value
imputed_values[mus_too_high] = self._max_value
# the rest can be sampled without statistical issues
sample_flag = positive_sigmas & ~mus_too_low & ~mus_too_high
I would use the term mask instead of flag. Maybe inrange_mask
sklearn/impute.py
Outdated
self.n_iter_ = self.max_iter
if not self.sample_posterior:
    Xt_previous = np.copy(Xt)
for i_rnd in range(self.max_iter):
I would prefer a `while` loop: `while criterion > self.tol or self.n_iter_ < self.max_iter`. The reasoning is that we already see that we will have an early stopping mechanism.
You also need `if not self.sample_posterior:` so it gets messier. I would prefer to leave it as is.
The idiom I have come to enjoy is:

for self.n_iter_ in range(1, self.max_iter + 1):
    update model
    if stopping condition is met:
        break
else:
    warnings.warn(...)
(We have something similar in label_propagation.py, but there the stopping condition applies at the start of the loop)
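As a self-contained illustration of that for/else idiom (the per-round diffs and tolerance below are toy numbers standing in for the imputer's convergence check, not the PR's actual code):

import warnings

max_iter = 5
tol = 1e-3
diffs = [0.9, 0.5, 0.2, 5e-4, 1e-4]  # pretend change between rounds

for n_iter_ in range(1, max_iter + 1):
    # ... update the model for round n_iter_ here ...
    if diffs[n_iter_ - 1] < tol:  # stopping condition met
        break
else:
    # Runs only if the loop finished without hitting `break`,
    # i.e. the stopping condition was never met.
    warnings.warn('Early stopping criterion not reached.')

print(n_iter_)  # -> 4: the loop variable keeps the round at which we stopped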
sklearn/impute.py
Outdated
for i_rnd in range(self.n_iter):
self.n_iter_ = self.max_iter
if not self.sample_posterior:
    Xt_previous = np.copy(Xt)
sklearn/impute.py
Outdated
print('[IterativeImputer] Early stopping criterion '
      'reached.')
break
else:
I don't think that's true. I need to specify that Xt_previous = Xt.copy()
otherwise the subsequent diff will be 0. Try it out.
what I mean is

if not self.sample_posterior:
    inf_norm = np.linalg.norm(Xt - Xt_previous, ord=np.inf,
                              axis=None)
    if inf_norm < normalized_tol:
        self.n_iter_ = i_rnd + 1
        if self.verbose > 0:
            print('[IterativeImputer] Early stopping criterion '
                  'reached.')
        break
    Xt_previous = Xt.copy()

You don't need else since you will break in the if branch.
Oh! I thought you meant that the entire else branch was not needed. Yup, you're right. I'll fix it.
sklearn/impute.py
Outdated
return Xt

imputations_per_round = len(self.imputation_sequence_) // self.n_iter
imps_per_round = len(self.imputation_sequence_) // self.n_iter_
Leave the full name, it is more explicit. Split the line into two lines.
sklearn/tests/test_impute.py
Outdated
verbose=1,
random_state=rng)
X_filled_5 = imputer.fit_transform(X_missing)
assert_allclose(X_filled_100, X_filled_5, atol=1e-7)
You can add an assert on `n_iter_`. The reason is that we were missing a (+1) in some estimators when reporting the number of iterations. It could be quite useful.
Sorry, what does this mean? I added this:

imputer = IterativeImputer(max_iter=100,
                           tol=0,
                           sample_posterior=False,
                           verbose=1,
                           random_state=rng)
imputer.fit(X_missing)
assert imputer.max_iter == imputer.n_iter_
Yes, it is what I meant. Invert the assert order; usually the expected value comes last.
assert imputer.n_iter_ == imputer.max_iter
@jnothman You asked for some convergence plots. I'm plotting the inf norm and the l2 norm of the change between successive imputed matrices: the inf norm is `max(abs(X_t - X_{t-1}))` and the l2 norm is `sqrt(sum((X_t - X_{t-1})**2))`. Here are two plots. Convergence is quite rapid in both cases, but there are minor improvements in the case of California, while Boston stays flat. The code that generates these is below, but you can't run it because I had to define custom class variables.
|
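The plotting code referenced above is not reproduced here. As a rough, hypothetical way to approximate such convergence curves from the outside (without the custom class variables), one can refit with increasing max_iter, using tol=0 so early stopping never triggers, and compare consecutive results; the dataset and settings below are made up for illustration:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # needed in released versions
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
X[rng.rand(*X.shape) < 0.3] = np.nan  # knock out ~30% of the entries

previous = None
for k in range(1, 11):
    imputed = IterativeImputer(max_iter=k, tol=0, sample_posterior=False,
                               random_state=0).fit_transform(X)
    if previous is not None:
        inf_norm = np.max(np.abs(imputed - previous))   # change in round k
        l2_norm = np.linalg.norm(imputed - previous)
        print(k, inf_norm, l2_norm)
    previous = imputed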
I don't think it's so weird with sample_posterior... it's maybe even
converging on a reasonably stable distribution.
|
Great. I'll say this is ready for MRG reviews then. |
sklearn/impute.py
Outdated
Maximum number of imputation rounds to perform before returning the
imputations computed during the final round. A round is a single
imputation of each feature with missing values. The stopping criterion
is met once `abs(max(X_i - X_{i-1}))/abs(max(X[known_vals]))` < tol.
you might need to clarify that X_i is X at iteration i. Might be better to use t to index time?
@jnothman I've addressed your two comments. @glemaitre perhaps you'll take a second look? |
I thought about that pattern and gave up thinking that the |
@@ -879,7 +907,7 @@ def fit_transform(self, X, y=None):
self.initial_imputer_ = None
X, Xt, mask_missing_values = self._initial_imputation(X)

if self.n_iter == 0:
if self.max_iter == 0:
    return Xt
This line is not covered by a test (as previously). Could you add this case to the unit test?
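A hypothetical sketch of what such a test could look like (the data and names are assumptions, not the committed test):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # needed in released versions
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
X[0, 0] = np.nan

# With max_iter=0, only the initial (mean) imputation should be applied.
Xt = IterativeImputer(max_iter=0, initial_strategy='mean').fit_transform(X)
assert np.isclose(Xt[0, 0], np.nanmean(X[:, 0]))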
axis=None)
if inf_norm < normalized_tol:
    if self.verbose > 0:
        print('[IterativeImputer] Early stopping criterion '
We also do not test the printing. Turning `verbose` to 1 could be enough for a couple of tests. We will ensure that printing does not trigger an error.
sklearn/impute.py
Outdated
@@ -942,10 +985,10 @@ def transform(self, X):

X, Xt, mask_missing_values = self._initial_imputation(X)

if self.n_iter == 0:
if self.n_iter_ == 0:
This is also not tested. Could we force `n_iter_` and check that we have `X == Xt`?
@glemaitre I've added some tests to cover the cases you mentioned. |
That's the error:
|
Weird this passed locally. Will investigate...
…On Fri, Feb 8, 2019, 10:36 AM Guillaume Lemaitre ***@***.*** wrote:
That's the error:
[00:15:01] ================================== FAILURES ===================================
[00:15:01] _______________________ test_iterative_imputer_verbose ________________________
[00:15:01]
[00:15:01] def test_iterative_imputer_verbose():
[00:15:01] n = 10
[00:15:01] d = 3
[00:15:01] X = sparse_random_matrix(n, d, density=0.10).toarray()
[00:15:01] imputer = IterativeImputer(missing_values=0, max_iter=1, verbose=1)
[00:15:01] > imputer.fit(X)
[00:15:01]
[00:15:01] X = array([[0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.],
[00:15:01] [0., 0., 0.]])
[00:15:01] d = 3
[00:15:01] imputer = IterativeImputer(imputation_order='ascending', initial_strategy='mean',
[00:15:01] max_iter=1, max_value=None, m...earest_features=None, predictor=None, random_state=None,
[00:15:01] sample_posterior=False, tol=0.001, verbose=1)
[00:15:01] n = 10
[00:15:01]
[00:15:01] c:\python37-x64\lib\site-packages\sklearn\tests\test_impute.py:542:
[00:15:01] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[00:15:01] c:\python37-x64\lib\site-packages\sklearn\impute.py:1032: in fit
[00:15:01] self.fit_transform(X)
[00:15:01] c:\python37-x64\lib\site-packages\sklearn\impute.py:930: in fit_transform
[00:15:01] normalized_tol = self.tol * np.max(np.abs(X[~mask_missing_values]))
[00:15:01] c:\python37-x64\lib\site-packages\numpy\core\fromnumeric.py:2505: in amax
[00:15:01] initial=initial)
[00:15:01] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[00:15:01]
[00:15:01] obj = array([], dtype=float64), ufunc = <ufunc 'maximum'>, method = 'max'
[00:15:01] axis = None, dtype = None, out = None
[00:15:01] kwargs = {'initial': <no value>, 'keepdims': <no value>}, passkwargs = {}
[00:15:01]
[00:15:01] def _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs):
[00:15:01] passkwargs = {k: v for k, v in kwargs.items()
[00:15:01] if v is not np._NoValue}
[00:15:01]
[00:15:01] if type(obj) is not mu.ndarray:
[00:15:01] try:
[00:15:01] reduction = getattr(obj, method)
[00:15:01] except AttributeError:
[00:15:01] pass
[00:15:01] else:
[00:15:01] # This branch is needed for reductions like any which don't
[00:15:01] # support a dtype.
[00:15:01] if dtype is not None:
[00:15:01] return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
[00:15:01] else:
[00:15:01] return reduction(axis=axis, out=out, **passkwargs)
[00:15:01]
[00:15:01] > return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
[00:15:01] E ValueError: zero-size array to reduction operation maximum which has no identity
[00:15:01]
[00:15:01] axis = None
[00:15:01] dtype = None
[00:15:01] kwargs = {'initial': <no value>, 'keepdims': <no value>}
[00:15:01] method = 'max'
[00:15:01] obj = array([], dtype=float64)
[00:15:01] out = None
[00:15:01] passkwargs = {}
[00:15:01] ufunc = <ufunc 'maximum'>
|
Fascinating - it accidentally made a test matrix that had no known values. I guess I should add a check for that and return without doing anything because you can't make an inference if you have 0 info? |
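A minimal sketch of such a guard, factored as a standalone helper; the names follow the traceback above and are assumptions, not the committed fix:

import numpy as np

def normalized_tolerance(X, mask_missing_values, tol):
    # tol scaled by the largest absolute observed value; returns None when
    # every entry is missing, signalling the caller to skip the iterative
    # rounds and just return the initial imputation.
    observed = np.abs(X[~mask_missing_values])
    if observed.size == 0:
        return None
    return tol * np.max(observed)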
@glemaitre I fixed that issue and made a few more very small changes to prevent other problems. Plus one more test. |
Co-Authored-By: sergeyf <sergeyfeldman@gmail.com>
Co-Authored-By: sergeyf <sergeyfeldman@gmail.com>
Thanks, applied. |
@jnothman @glemaitre I renamed |
d = 3
X = sparse_random_matrix(n, d, density=0.10).toarray()
X = sparse_random_matrix(n, d, density=0.10, random_state=rng).toarray()
imputer = IterativeImputer(missing_values=0, max_iter=1, verbose=1)
Could you put verbose at 2? I think that we have a case where we want verbose > 1.
I did this two lines down (545):
imputer.verbose = 2
imputer.transform(X)
Nevermind, I'll just do it twice.
@glemaitre Ready to merge! |
Thanks once more @sergeyf |
This PR addresses a number of discussions, most recently in #11977.
The main purpose of this PR is to provide automatic early stopping. We are going to be using the early stopping rule that is used in the missForest R package: stop as soon as the difference between the newly imputed data matrix and the previous one increases for the first time.
This is a sensible criterion, and has the added benefit of not needing to specify a `tol`. Note that this criterion is not applied when `sample_posterior=True` because there is no steady state.
This PR does a few more things:
- `sample_posterior=True` now uses `truncnormal` to sample from the posterior.
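For reference, a truncated normal draw that respects user-supplied bounds can be made with scipy's truncnorm; this is a sketch with made-up example values, not the imputer's internals, and scipy.stats.truncnorm is assumed to be the helper in question:

import numpy as np
from scipy.stats import truncnorm

mu, sigma = 0.3, 0.8    # posterior mean and std for one missing entry
low, high = 0.0, 1.0    # user-supplied min_value / max_value bounds

# scipy expects the truncation points expressed in standard-normal units.
a, b = (low - mu) / sigma, (high - mu) / sigma
sample = truncnorm.rvs(a, b, loc=mu, scale=sigma,
                       random_state=np.random.RandomState(0))
print(sample)  # a draw guaranteed to lie inside [low, high]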