DOC Add example showcasing HGBT regression #26991
Conversation
A first iteration.
# %%
# Notice energy transfer increases systematically during weekends.
#
# Effect of number of trees in HistGradientBoostingRegressor
Maybe we should change this section to highlight early stopping and finding a good pair of (max_iter, learning_rate).
I consider this important, as it is shown nowhere else and is usually the first step in training an HGBT.
If you need help here, just say so.
> Maybe we should change this section to highlight early stopping and finding a good pair of (max_iter, learning_rate).

I guess you mean doing a grid search over those two parameters and exploring the resulting n_iter_ attribute, right? Or what do you have in mind?
Grid search is overkill, and not even that helpful. Just choose one learning rate, fit the HGBT on the training set, and determine the best max_iter via early stopping with an explicit validation_fraction. From there on, use this max_iter and don't use early stopping anymore.
Once this PR and #27124 are merged, we can add cross-validation for this step.
I ran into the issue reported in #25460: early_stopping uses shuffle=True for the internal validation split, which is not the right thing to do when dealing with time series, as in this example.
Exactly. But for the time being, it is better to "learn" max_iter with the current implementation and early stopping than to rely on the defaults.
After all, max_iter and the learning rate are the most important parameters.
I would just add a comment that this is not optimal for time series at the moment.
Addressed in 9a486b8.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
I think I have addressed all the comments. Please let me know otherwise.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
This already looks good to me. I'm just wondering why we don't show the automatic way of dealing with categorical data. The variable day is actually a "category" variable. I wonder if we could simply modify the code and mention that we now have this way of detecting and encoding categorical variables internally.
_ = ax.legend()

# %%
# With just a few iterations, HGBT models can achieve convergence (see
I think we need a sentence describing what we see in the figure: 5 iterations are not enough to predict well, but at 50 we already do a good job.
> The variable day is actually a "category" variable.

The thing is that day is already ordinally encoded, so I don't think we can demo categorical variable support here.
It does not really matter: since this is a category column, the tree will treat it as such.
A potential thing we could do is replace the integers with the day names as strings at the beginning. I don't know if that eventually removes some boilerplate when plotting, because we would already have the names instead of remapping the integers to the days.
That said, I would be happy to merge this PR as-is, because it is a net improvement, and see if we can further improve it in a later increment.
Should I push my own green button?
No. @glemaitre, you did the latest review. Fine to merge?
Yes, fine to merge. @lorentzenchr, do you have further changes? (Just asking because the status is not approved on your side.)
I approved a month ago, and then you requested a review from me.
Whoops, my bad, I probably misclicked. Sorry.
(Sent by email on 22 Feb 2024, in reply to the notification from Christian Lorentzen:)
Merged #26991 into main.
Very happy about this! Congratulations to everyone involved. Nitpick: I would remove "advanced" from the title, as I think that many people should look at this example, not only advanced people. If y'all agree, I'll send a PR that does nothing else than remove this word 😀 😀
@GaelVaroquaux Please go on, and ping me for a quick review if you'd like.
Here it is: #28508. I was not true to my word :$ I removed a tiny bit more than "advanced" :) Thanks!!
Reference Issues/PRs
Fixes #26826. See also #21967 and #23746 on missing values documentation.
What does this implement/fix? Explain your changes.
This PR adds an example to:
Any other comments?
The original issue also suggests demoing support for categorical values, but we already have the Categorical Feature Support in Gradient Boosting example, which is only linked from the present example, as it is a very good example in itself.
We also have a Monotonic Constraints example, but it can be merged with the example from this PR.