FIX numerical rounding issue causing test_poisson_zero_nodes to fail at random #18737


Merged

Conversation

@ogrisel (Member) commented Nov 2, 2020

test_poisson_zero_nodes is failing at random on test data more complex than usual.

Here is an example of the original failure observed in #18727:

2020-11-02T17:46:34.7932442Z ___________________________ test_poisson_zero_nodes ___________________________
2020-11-02T17:46:34.7932888Z [gw0] win32 -- Python 3.7.9 c:\miniconda\envs\testvenv\python.exe
2020-11-02T17:46:34.7933220Z 
2020-11-02T17:46:34.7933464Z     def test_poisson_zero_nodes():
2020-11-02T17:46:34.7933881Z         # Test that sum(y)=0 and therefore y_pred=0 is forbidden on nodes.
2020-11-02T17:46:34.7934298Z         X = [[0, 0], [0, 1], [0, 2], [0, 3],
2020-11-02T17:46:34.7934611Z              [1, 0], [1, 2], [1, 2], [1, 3]]
2020-11-02T17:46:34.7934969Z         y = [0, 0, 0, 0, 1, 2, 3, 4]
2020-11-02T17:46:34.7935489Z         # Note that X[:, 0] == 0 is a 100% indicator for y == 0. The tree can
2020-11-02T17:46:34.7935835Z         # easily learn that:
2020-11-02T17:46:34.7936209Z         reg = DecisionTreeRegressor(criterion="mse", random_state=1)
2020-11-02T17:46:34.7936520Z         reg.fit(X, y)
2020-11-02T17:46:34.7936870Z         assert np.amin(reg.predict(X)) == 0
2020-11-02T17:46:34.7937250Z         # whereas Poisson must predict strictly positive numbers
2020-11-02T17:46:34.7937669Z         reg = DecisionTreeRegressor(criterion="poisson", random_state=1)
2020-11-02T17:46:34.7937979Z         reg.fit(X, y)
2020-11-02T17:46:34.7938282Z         assert np.all(reg.predict(X) > 0)
2020-11-02T17:46:34.7938635Z     
2020-11-02T17:46:34.7938968Z         # Test additional dataset where something could go wrong.
2020-11-02T17:46:34.7939317Z         n_features = 10
2020-11-02T17:46:34.7939587Z         X, y = datasets.make_regression(
2020-11-02T17:46:34.7939973Z             effective_rank=n_features * 2 // 3, tail_strength=0.6,
2020-11-02T17:46:34.7940279Z             n_samples=1_000,
2020-11-02T17:46:34.7940586Z             n_features=n_features,
2020-11-02T17:46:34.7940918Z             n_informative=n_features * 2 // 3,
2020-11-02T17:46:34.7941596Z         )
2020-11-02T17:46:34.7941874Z         # some excess zeros
2020-11-02T17:46:34.7942137Z         y[(-1 < y) & (y < 0)] = 0
2020-11-02T17:46:34.7942470Z         # make sure the target is positive
2020-11-02T17:46:34.7942786Z         y = np.abs(y)
2020-11-02T17:46:34.7943105Z         reg = DecisionTreeRegressor(criterion='poisson', random_state=42)
2020-11-02T17:46:34.7943477Z         reg.fit(X, y)
2020-11-02T17:46:34.7943785Z >       assert np.all(reg.predict(X) > 0)
2020-11-02T17:46:34.7944091Z E       AssertionError: assert False
2020-11-02T17:46:34.7944894Z E        +  where False = <function all at 0x000002B7FC7ADDC8>(array([2.10771821e+00, 1.65428151e+00, 5.98524378e+00, 3.19025199e+00,\n       4.95470497e+00, 1.16949401e-01, 2.291628...2.90288054e+00, 4.19665946e+00, 3.03705704e+00,\n       2.05303557e+00, 1.43979061e+00, 6.61262364e+00, 6.29637090e-01]) > 0)
2020-11-02T17:46:34.7945967Z E        +    where <function all at 0x000002B7FC7ADDC8> = np.all
2020-11-02T17:46:34.7947272Z E        +    and   array([2.10771821e+00, 1.65428151e+00, 5.98524378e+00, 3.19025199e+00,\n       4.95470497e+00, 1.16949401e-01, 2.291628...2.90288054e+00, 4.19665946e+00, 3.03705704e+00,\n       2.05303557e+00, 1.43979061e+00, 6.61262364e+00, 6.29637090e-01]) = <bound method BaseDecisionTree.predict of DecisionTreeRegressor(criterion='poisson', random_state=42)>(array([[-0.01698191,  0.02468876,  0.00559544, ...,  0.00576349,\n         0.00226511, -0.0235138 ],\n       [ 0.0054320...711, -0.01739591],\n       [-0.02495476, -0.01743518,  0.00785444, ..., -0.00650216,\n         0.02237994,  0.00958636]]))
2020-11-02T17:46:34.7948882Z E        +      where <bound method BaseDecisionTree.predict of DecisionTreeRegressor(criterion='poisson', random_state=42)> = DecisionTreeRegressor(criterion='poisson', random_state=42).predict
2020-11-02T17:46:34.7949395Z 
2020-11-02T17:46:34.7949734Z X          = array([[-0.01698191,  0.02468876,  0.00559544, ...,  0.00576349,
2020-11-02T17:46:34.7951425Z          0.00226511, -0.0235138 ],
2020-11-02T17:46:34.7951887Z        [ 0.0054320...711, -0.01739591],
2020-11-02T17:46:34.7952430Z        [-0.02495476, -0.01743518,  0.00785444, ..., -0.00650216,
2020-11-02T17:46:34.7952797Z          0.02237994,  0.00958636]])
2020-11-02T17:46:34.7953123Z n_features = 10
2020-11-02T17:46:34.7953434Z reg        = DecisionTreeRegressor(criterion='poisson', random_state=42)
2020-11-02T17:46:34.7953910Z y          = array([2.10771821e+00, 1.65428151e+00, 5.98524378e+00, 3.19025199e+00,
2020-11-02T17:46:34.7954426Z        4.95470497e+00, 0.00000000e+00, 2.291628...2.90288054e+00, 4.19665946e+00, 3.03705704e+00,
2020-11-02T17:46:34.7954873Z        2.05303557e+00, 1.43979061e+00, 6.61262364e+00, 0.00000000e+00])

@ogrisel (Member, Author) commented Nov 2, 2020

Actually, this might be a real bug @lorentzenchr. We are never supposed to get negative values, whatever the seed of the dataset, right?

@ogrisel added this to the 0.24 milestone Nov 2, 2020
@ogrisel added the Bug label Nov 2, 2020
@lorentzenchr (Member)

@ogrisel That's right. Very concerning.

@lorentzenchr (Member) commented Nov 2, 2020

I can reproduce on my machine.
In the Poisson criterion, node_impurity returns and children_impurity sets INFINITY, while proxy_impurity_improvement returns -INFINITY, whenever sum(y) <= 0. This should prevent splitting wherever sum(y) <= 0, but apparently it doesn't.

The error could still be in the criterion or in the splitter logic. Maybe the splitter does not like INFINITY?
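
For readers not familiar with the criterion internals, here is a minimal Python sketch of the guard behavior described above (illustrative only: the real code is Cython in sklearn/tree/_criterion.pyx, and the real proxy drops constant terms for speed):

    import numpy as np

    def half_poisson_deviance(y_node):
        """Poisson node impurity, mean(y * log(y / y_mean)), with 0*log(0) == 0."""
        y_sum = y_node.sum()
        if y_sum <= 0:
            # A child whose target sum is <= 0 must look maximally bad.
            return np.inf
        y_mean = y_sum / len(y_node)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(y_node > 0, y_node * np.log(y_node / y_mean), 0.0)
        return terms.mean()

    def proxy_impurity_improvement(y_left, y_right):
        """Quantity the splitter maximizes over candidate splits."""
        if y_left.sum() <= 0 or y_right.sum() <= 0:
            return -np.inf
        return -(len(y_left) * half_poisson_deviance(y_left)
                 + len(y_right) * half_poisson_deviance(y_right))

For an all-zero child this returns -inf, so such a split should always lose against any valid alternative.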

@lorentzenchr (Member) commented Nov 2, 2020

I think the problem sits in proxy_impurity_improvement

cdef double proxy_impurity_improvement(self) nogil:

When I replace it with the code from children_impurity, i.e. without the algebraic simplifications and without dropping the constant terms, the test test_poisson_zero_nodes passes (but 2 other tests fail).

What do we do if we can't fix it in the next 2 days?


@glemaitre (Member)

What do we do if we can't fix it in the next 2 days?

We will revert it.

@NicolasHug (Member)

the test test_poisson_zero_nodes passes (but 2 other tests fail).

Which ones are the failing tests? Are these critical tests?

@lorentzenchr (Member)

When I replace it with the code from children_impurity, i.e. without the algebraic simplifications and throwing constant terms apart, the test test_poisson_zero_nodes passes (but 2 other tests fail).

Which one are the failing tests? Are these critical tests?

  • test_diabetes_underfit: could be fixed by increasing max_loss from 30 to 37.
  • test_sparse: no idea why it fails.
  • test_poisson_vs_mse: a dummy regressor, always predicting the mean, should not be better than a tree with the poisson criterion. If this test fails, something seems odd.
=========================================== FAILURES ============================================
_ test_diabetes_underfit[poisson-15-mean_poisson_deviance-30-DecisionTreeRegressor-DecisionTreeRegressor] _

name = 'DecisionTreeRegressor', Tree = <class 'sklearn.tree._classes.DecisionTreeRegressor'>
criterion = 'poisson', max_depth = 15
metric = <function mean_poisson_deviance at 0x7f9c400a1840>, max_loss = 30

    @skip_if_32bit
    @pytest.mark.parametrize("name, Tree", REG_TREES.items())
    @pytest.mark.parametrize(
        "criterion, max_depth, metric, max_loss",
        [("mse", 15, mean_squared_error, 60),
         ("mae", 20, mean_squared_error, 60),
         ("friedman_mse", 15, mean_squared_error, 60),
         ("poisson", 15, mean_poisson_deviance, 30)]
    )
    def test_diabetes_underfit(name, Tree, criterion, max_depth, metric, max_loss):
        # check consistency of trees when the depth and the number of features are
        # limited
    
        reg = Tree(
            criterion=criterion, max_depth=max_depth,
            max_features=6, random_state=0
        )
        reg.fit(diabetes.data, diabetes.target)
        loss = metric(diabetes.target, reg.predict(diabetes.data))
>       assert 0 < loss < max_loss
E       assert 36.42944729317141 < 30

sklearn/tree/tests/test_tree.py:311: AssertionError
_____________ test_sparse[check_sparse_criterion-sparse-mix-DecisionTreeRegressor] ______________

tree_type = 'DecisionTreeRegressor', dataset = 'sparse-mix'
check = <function check_sparse_criterion at 0x7f9c40898f28>

    @pytest.mark.parametrize("tree_type", SPARSE_TREES)
    @pytest.mark.parametrize("dataset",
                             ["sparse-pos", "sparse-neg", "sparse-mix", "zeros"])
    @pytest.mark.parametrize("check",
                             [check_sparse_parameters, check_sparse_criterion])
    def test_sparse(tree_type, dataset, check):
>       check(tree_type, dataset)

sklearn/tree/tests/test_tree.py:1492: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/tree/tests/test_tree.py:1482: in check_sparse_criterion
    "trees".format(tree))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

d = <sklearn.tree._tree.Tree object at 0x7f9c40fd8ac0>
s = <sklearn.tree._tree.Tree object at 0x7f9c40fd8b28>
message = 'DecisionTreeRegressor with dense and sparse format gave different trees'

    def assert_tree_equal(d, s, message):
        assert s.node_count == d.node_count, (
            "{0}: inequal number of node ({1} != {2})"
            "".format(message, s.node_count, d.node_count))
    
        assert_array_equal(d.children_right, s.children_right,
>                          message + ": inequal children_right")
E       AssertionError: 
E       Arrays are not equal
E       DecisionTreeRegressor with dense and sparse format gave different trees: inequal children_right
E       Mismatched elements: 5 / 7 (71.4%)
E       Max absolute difference: 7
E       Max relative difference: 7.
E        x: array([ 2, -1,  4, -1,  6, -1, -1], dtype=int64)
E        y: array([ 6,  3, -1,  5, -1, -1, -1], dtype=int64)

sklearn/tree/tests/test_tree.py:166: AssertionError
______________________________________ test_poisson_vs_mse ______________________________________

    def test_poisson_vs_mse():
        # For a Poisson distributed target, Poisson loss should give better results
        # than least squares measured in Poisson deviance as metric.
        # We have a similar test, test_poisson(), in
        # sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py
        # Note: Some fine tuning was needed to have metric_poi < metric_dummy on
        # the test set!
        rng = np.random.RandomState(42)
        n_train, n_test, n_features = 500, 500, 10
        X = datasets.make_low_rank_matrix(n_samples=n_train + n_test,
                                          n_features=n_features, random_state=rng)
        # We create a log-linear Poisson model and downscale coef as it will get
        # exponentiated.
        coef = rng.uniform(low=-2, high=2, size=n_features) / np.max(X, axis=0)
        y = rng.poisson(lam=np.exp(X @ coef))
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=n_test,
                                                            random_state=rng)
        # We prevent some overfitting by setting min_samples_split=10.
        tree_poi = DecisionTreeRegressor(criterion="poisson",
                                         min_samples_split=10,
                                         random_state=rng)
        tree_mse = DecisionTreeRegressor(criterion="mse",
                                         min_samples_split=10,
                                         random_state=rng)
    
        tree_poi.fit(X_train, y_train)
        tree_mse.fit(X_train, y_train)
        dummy = DummyRegressor(strategy="mean").fit(X_train, y_train)
    
        for X, y, val in [(X_train, y_train, "train"), (X_test, y_test, "test")]:
            metric_poi = mean_poisson_deviance(y, tree_poi.predict(X))
            # mse might produce non-positive predictions => clip
            metric_mse = mean_poisson_deviance(y, np.clip(tree_mse.predict(X),
                                                          1e-15, None))
            metric_dummy = mean_poisson_deviance(y, dummy.predict(X))
            # As MSE might correctly predict 0 in train set, its train score can
            # be better than Poisson. This is no longer the case for the test set.
            if val == "test":
                assert metric_poi < metric_mse
>           assert metric_poi < metric_dummy
E           assert 7.705533301893956 < 5.918121619355619

sklearn/tree/tests/test_tree.py:2063: AssertionError
==================== 3 failed, 616 passed, 47 warnings in 571.44s (0:09:31) =====================

@ogrisel (Member, Author) commented Nov 3, 2020

test_poisson_vs_mse. A dummy regressor, always predicting the mean, should not be better than a tree with poisson criterion. If this test fails something seems odd.

I agree this is fishy, but here we compare on the held-out test set. Maybe it's just a consequence of the finite size of the training and test sets?

Edit: actually, the way the test is written, we don't know whether it is the training set or the test set that causes the failure.

@ogrisel changed the title Fix the seed of test data in test_poisson_zero_nodes → [NoMRG] Test test_poisson_zero_nodes with a range of dataset seeds Nov 3, 2020
@lorentzenchr (Member) commented Nov 3, 2020

Here is what happens:

  • DepthFirstTreeBuilder.build(..) calls BestSplitter.node_split(..) here:
      splitter.node_split(impurity, &split, &n_constant_features)
  • BestSplitter.node_split(..) first determines the best split and then calculates the impurities of the left and right children for this best split here:
      self.criterion.reset()
      self.criterion.update(best.pos)
      self.criterion.children_impurity(&best.impurity_left,
                                       &best.impurity_right)
      best.improvement = self.criterion.impurity_improvement(
          impurity, best.impurity_left, best.impurity_right)
  • Finally, DepthFirstTreeBuilder.build(..) sets the value for the node here:
      splitter.node_value(tree.value + node_id * tree.value_stride)

The criterion internally uses a y_mean, and I would have expected that exactly the same value is set in splitter.node_value. But this is not the case.

Example Debug output:

Poisson.proxy_impurity_improvement(..):  y_mean_left = 1.181091337614762, y_mean_right = 1.1214961976297462  improvement = 0.4691274912474415  (start, pos, end) = (93, 94, 95)

Poisson.proxy_impurity_improvement(..):  y_mean_left = 1.181091337614762, y_mean_right = 1.1214961976297462  improvement = 0.4691274912474415  (start, pos, end) = (93, 94, 95)

Poisson.proxy_impurity_improvement(..):  y_mean_left = 1.181091337614762, y_mean_right = 1.1214961976297462  improvement = 0.4691274912474415  (start, pos, end) = (93, 94, 95)

Poisson.proxy_impurity_improvement(..):  y_mean_left = 1.181091337614762, y_mean_right = 1.1214961976297462  improvement = 0.4691274912474415  (start, pos, end) = (93, 94, 95)

Poisson.children_impurity(..):
    Poisson.poisson_loss(..):  y_mean = 1.181091337614762, p in range(93, 94)  loss = 0.0
    Poisson.poisson_loss(..):  y_mean = 1.1214961976297462, p in range(94, 95)  loss = 2.490221801276218e-16

BestSplitter.node_split(..):
  best.improvement = 7.713012593305286e-07, impurity = 0.0003856506296653888,   best.impurity_left = 0.0,   best.impurity_right = 2.490221801276218e-16

DepthFirstTreeBuilder.build(..):
  impurity = 0.0, improvement = 7.713012593305286e-07   node_value = 0.0
  node_id = 205, feature = 9, threshold = -0.03371891379356384

Why does it call Poisson.proxy_impurity_improvement 4 times on exactly the same samples/split proposals?
Edit: because multiple features are tested for the split.

@lorentzenchr (Member) commented Nov 3, 2020

Do we have a test that the y_mean implicitly used in the MSE criterion is the one that is finally set in splitter.node_value? On the training data, the average y of each leaf should be the prediction for that leaf, right?

The code is quite non-trivial. Maybe I just misunderstood part of it.
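
A quick sanity check along these lines can be written against the public API (a sketch, not an existing scikit-learn test; the dataset and parameters are arbitrary):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    reg = DecisionTreeRegressor(criterion="mse", max_depth=4, random_state=0)
    reg.fit(X, y)

    # Each training sample's prediction should equal the mean target of all
    # training samples that land in the same leaf.
    leaf_ids = reg.apply(X)
    for leaf in np.unique(leaf_ids):
        mask = leaf_ids == leaf
        np.testing.assert_allclose(reg.predict(X[mask]), y[mask].mean())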

@glemaitre (Member)

If I get it right, it means that at this node/leaf all samples have y=0, right?

@glemaitre (Member) commented Nov 4, 2020

@lorentzenchr if you wish, we could make a quick call. I could catch up with your debugging and invest some time helping to debug.

@glemaitre (Member)

I think this is only a numerical error: the y_mean <= 0 check is not triggered because we get a very small positive number close to zero.

y_mean=4.440892098500626e-16 start=95 end=98
Poisson.poisson_loss(..):  y_mean = 1.4802973661668753e-16, p in range(95, 98)  loss = 0.0

I will investigate a bit more to check the way we compute this mean to ensure that this is the case.

@glemaitre (Member)

I would not be surprised if we don't get exactly 0, because we compute sum_right by subtracting left from total:

sum_right[k] = sum_total[k] - sum_left[k]
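
The effect is easy to reproduce in pure Python (illustrative numbers, unrelated to the failing dataset):

    # Float addition is not associative: accumulating the same values in a
    # different order yields slightly different sums, so total - left can
    # leave a tiny positive residue where the exact result would be 0.
    sum_total = 0.1 + 0.2 + 0.3   # 0.6000000000000001
    sum_left = 0.3 + 0.2 + 0.1    # 0.6
    sum_right = sum_total - sum_left
    print(sum_right)              # 1.1102230246251565e-16, tiny but > 0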

@ogrisel (Member, Author) commented Nov 4, 2020

I think this is only a numerical error: the y_mean <= 0 check is not triggered because we get a very small positive number close to zero.

This is the check in scikit-learn/sklearn/tree/_criterion.pyx:poisson_loss, right? Indeed, that looks brittle for very small values of y_mean caused by rounding errors. Comparing to a precision-dependent eps looks safe. WDYT @lorentzenchr?

@glemaitre feel free to push directly into this PR to share your experiments to fix this.

@glemaitre self-requested a review November 4, 2020 14:20
@lorentzenchr
Copy link
Member

I think this is only a numerical error: the y_mean <= 0 check is not triggered because we get a very small positive number close to zero.

This is the check in scikit-learn/sklearn/tree/_criterion.pyx:poisson_loss, right? Indeed, that looks brittle for very small values of y_mean caused by rounding errors. Comparing to a precision-dependent eps looks safe. WDYT @lorentzenchr?

When writing this check, I didn't think about rounding errors in sum_right. Relaxing to some small positive value sounds good, ideally between 1 and 10 times np.finfo(dtype).eps.
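
In sketch form, the fix replaces the exact comparison with an epsilon-guarded one (names are illustrative; the actual change lives in the Cython criterion):

    import numpy as np

    EPSILON = np.finfo(np.float64).eps  # ~2.22e-16

    def node_mean_is_effectively_zero(y_sum, n_samples):
        # `y_mean <= 0` misses tiny positive residues such as 4.44e-16 left
        # by `sum_right = sum_total - sum_left`; comparing the mean against
        # machine epsilon catches those as well.
        return y_sum / n_samples <= EPSILON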

@glemaitre (Member)

I made the changes with 1 * EPS. Let's see if this is enough. It was working locally.

@ogrisel (Member, Author) commented Nov 4, 2020

You can also increase the range of seeds to 1000 or 10000 in this PR, just to make sure that we fixed the problem.
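
Such a sweep can be expressed as a parametrized variant of the failing test (a sketch assembled from the test code quoted above; the PR's actual test may differ):

    import numpy as np
    import pytest
    from sklearn import datasets
    from sklearn.tree import DecisionTreeRegressor

    @pytest.mark.parametrize("seed", range(1000))
    def test_poisson_zero_nodes_many_seeds(seed):
        n_features = 10
        X, y = datasets.make_regression(
            effective_rank=n_features * 2 // 3, tail_strength=0.6,
            n_samples=1_000, n_features=n_features,
            n_informative=n_features * 2 // 3, random_state=seed,
        )
        y[(-1 < y) & (y < 0)] = 0  # inject excess zeros
        y = np.abs(y)              # make the target non-negative
        reg = DecisionTreeRegressor(criterion="poisson", random_state=42)
        reg.fit(X, y)
        assert np.all(reg.predict(X) > 0)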

@glemaitre (Member)

Yep, but it was still failing for some seeds.

@glemaitre (Member)

Around 280 failures out of 10000 runs.

@glemaitre (Member)

It seems better now.

@ogrisel (Member, Author) left a comment

Nice, the fix looks right to me and the extensive seed test looks stable. We can now reduce the number of seeds and merge this if @lorentzenchr agrees.

Well done @glemaitre!

@lorentzenchr (Member) left a comment

LGTM after reducing the range in the test.
Great work @glemaitre, thank you very much!

@ogrisel changed the title [NoMRG] Test test_poisson_zero_nodes with a range of dataset seeds → [MRG] Test test_poisson_zero_nodes with a range of dataset seeds Nov 4, 2020
@ogrisel changed the title [MRG] Test test_poisson_zero_nodes with a range of dataset seeds → [MRG] Fix numerical rounding issue causing test_poisson_zero_nodes to fail at random Nov 4, 2020
@ogrisel (Member, Author) commented Nov 4, 2020

I cannot approve this PR because I created it, but +1 for merge.

@NicolasHug (Member) left a comment

@glemaitre (Member) left a comment

I approve my changes :)

@lorentzenchr changed the title [MRG] Fix numerical rounding issue causing test_poisson_zero_nodes to fail at random → FIX numerical rounding issue causing test_poisson_zero_nodes to fail at random Nov 4, 2020
@lorentzenchr merged commit f0e9d29 into scikit-learn:master Nov 4, 2020
@lorentzenchr (Member)

As I've introduced this bug 👀, I'm happy to merge the fix 😏 @glemaitre and @ogrisel 👏

@ogrisel deleted the rng-seed-test_poisson_zero_nodes branch November 4, 2020 23:59