FEA Callbacks base infrastructure + progress bars (Alternative to #27663) #28760

jeremiedbb · 2024-04-03T14:33:17Z

Alternative to #27663 based on feedback from the drafting meeting. I'm keeping both open for now for easier comparison.

github-actions · 2024-04-03T14:34:37Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: def2687. Link to the linter CI: here

jeremiedbb · 2024-06-07T16:45:35Z

@glemaitre and @adrinjalali I think this is ready for reviews. I implemented the changes discussed during some drafting meetings compared to #27663.

A quick summary.

From the point of view of a user:

Callback objects are available from sklearn.callback. In this PR, only ProgressBar is implemented.
Callbacks are registered on estimators using set_callbacks(ProgressBar()).
To enable the ProgressBar for all inner steps of a pipeline, use ProgressBar(max_estimator_depth=None). It is limited to depth=1 by default for performance reasons.
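
To make the user-facing flow above concrete, here is a minimal, self-contained sketch. It does not import the PR branch: `ToyEstimator`, `ToyProgressBar` and the `on_step` hook are hypothetical stand-ins, and only the names `set_callbacks` and `max_estimator_depth` come from the summary. Returning `self` from `set_callbacks` and accepting a single callback or a list are assumptions as well.

```python
class ToyProgressBar:
    """Hypothetical stand-in for the PR's ProgressBar callback."""

    def __init__(self, max_estimator_depth=1):
        # None would mean "no depth limit", mirroring the parameter in the summary.
        self.max_estimator_depth = max_estimator_depth
        self.events = []

    def on_step(self, estimator_name, step, n_steps):
        # Hypothetical hook: record progress instead of drawing a bar.
        self.events.append((estimator_name, step, n_steps))


class ToyEstimator:
    """Hypothetical estimator exposing a set_callbacks method."""

    def __init__(self, max_iter=3):
        self.max_iter = max_iter
        self._callbacks = []

    def set_callbacks(self, callbacks):
        # Accept a single callback or a list of callbacks (an assumption).
        if not isinstance(callbacks, list):
            callbacks = [callbacks]
        self._callbacks = callbacks
        return self  # returning self allows chaining (an assumption)

    def fit(self):
        for i in range(self.max_iter):
            # ... one iteration of real work would happen here ...
            for cb in self._callbacks:
                cb.on_step(type(self).__name__, i + 1, self.max_iter)
        return self


progress = ToyProgressBar()
ToyEstimator(max_iter=3).set_callbacks(progress).fit()
```

After the call, `progress.events` holds one entry per iteration, which is the information a real progress bar would render.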

From the point of view of a third party developer of estimators:

Callback support is enabled through the CallbackSupportMixin
A CallbackContext object must be created at the beginning of fit (using init_callback_context from the mixin). This object is then used to create sub-contexts for subtasks, evaluate the callback hooks, and eventually propagate the callbacks to sub-estimators.
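
As a rough illustration of the estimator-developer side, the sketch below creates a root context at the start of a toy `fit`, derives a sub-context per iteration, and evaluates hooks through it. `ToyContext`, `Recorder`, `subcontext`, `eval_hook` and the hook names are all hypothetical; the real `CallbackContext` API in the PR may differ.

```python
class ToyContext:
    """Hypothetical, simplified stand-in for the PR's CallbackContext."""

    def __init__(self, callbacks, task_name, parent=None):
        self.callbacks = callbacks
        self.task_name = task_name
        self.parent = parent
        self.children = []

    def subcontext(self, task_name):
        # Create a child context for a subtask and link it into the tree.
        child = ToyContext(self.callbacks, task_name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Task names from the root task down to this task.
        node, names = self, []
        while node is not None:
            names.append(node.task_name)
            node = node.parent
        return list(reversed(names))

    def eval_hook(self, hook_name):
        # Call the named hook on every registered callback.
        for cb in self.callbacks:
            getattr(cb, hook_name)(self)


class Recorder:
    """Hypothetical callback that logs the task path at each hook call."""

    def __init__(self):
        self.log = []

    def on_begin(self, ctx):
        self.log.append(("begin", ctx.path()))

    def on_iter_end(self, ctx):
        self.log.append(("iter_end", ctx.path()))

    def on_end(self, ctx):
        self.log.append(("end", ctx.path()))


def toy_fit(callbacks, max_iter=2):
    root = ToyContext(callbacks, "fit")  # created at the beginning of fit
    root.eval_hook("on_begin")
    for i in range(max_iter):
        sub = root.subcontext(f"iter {i}")
        # ... one iteration of real work would happen here ...
        sub.eval_hook("on_iter_end")
    root.eval_hook("on_end")
    return root


recorder = Recorder()
root = toy_fit([recorder])
```

Propagation to sub-estimators is left out of this sketch; it only shows the create, subdivide, evaluate cycle described in the summary.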

From the point of view of a third party developer of callbacks:

Callbacks must follow the CallbackProtocol protocol (3 hooks essentially).
Callbacks that should be propagated to inner estimators must follow an additional protocol.
The 3 hooks receive different args. Their signatures will be refined and better documented in follow-up PRs. One key arg is the node of the current task, a TaskNode object, which is useful for finding at which step the hook was called and acting on that.
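
A structural protocol with three hooks could look roughly like the sketch below. Only `_on_fit_begin(self, estimator, *, data)` appears in the diff quoted later in this thread; the names and signatures of the other two hooks, and `ToyCallbackProtocol` itself, are assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ToyCallbackProtocol(Protocol):
    """Hypothetical three-hook protocol (second and third hooks are assumed)."""

    def _on_fit_begin(self, estimator, *, data): ...

    def _on_fit_iter_end(self, estimator, task_node, **kwargs): ...

    def _on_fit_end(self): ...


class CountingCallback:
    """Satisfies the protocol structurally, without inheriting from it."""

    def __init__(self):
        self.n_iter = 0

    def _on_fit_begin(self, estimator, *, data):
        self.n_iter = 0

    def _on_fit_iter_end(self, estimator, task_node, **kwargs):
        self.n_iter += 1

    def _on_fit_end(self):
        pass


class NotACallback:
    """Implements only one of the three hooks."""

    def _on_fit_begin(self, estimator, *, data):
        pass


def is_callback(obj):
    # runtime_checkable protocols only check that the methods exist,
    # not that their signatures match.
    return isinstance(obj, ToyCallbackProtocol)
```

With this setup, `is_callback(CountingCallback())` is true and `is_callback(NotACallback())` is false, which is one way an estimator could validate what is passed to it.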

glemaitre

From what I recall of the meeting, this looks like what we discussed. I went through the tests to get a feel for the usage from different perspectives.

For me this is looking good.

@adrinjalali do you want to have a look at it?

glemaitre · 2024-07-25T09:45:16Z

sklearn/base.py

@@ -130,6 +130,10 @@ def _clone_parametrized(estimator, *, safe=True):

    params_set = new_object.get_params(deep=False)

+    # attach callbacks to the new estimator
+    if hasattr(estimator, "_skl_callbacks"):
+        new_object._skl_callbacks = clone(estimator._skl_callbacks, safe=False)


Reading this line, it makes me think that we should have a test in the callback file to check that clone does what it is supposed to do. Basically, we are making some assumptions here, and it would be great if our tests checked them.

I agree, but the only callback in this PR (progress bar) is never cloned because it's always propagated from an outer estimator. So it's not clear what exact behavior we want for clone.

That's why I'd rather clear that up when implementing the other callbacks and add the appropriate tests then.
(Note that I still need to define a clone somehow currently for tests that check for error messages, independent of the exact behavior of clone though).

I think it makes sense to have a test for equivalence of _skl_callbacks on objects in estimator_checks.py to make sure as we keep developing, things stay consistent, including third parties. I don't mind if that happens in a separate PR, before we merge into main
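
For discussion purposes, a check along these lines could be sketched as follows. It is self-contained and does not call scikit-learn: `toy_clone` mirrors the `hasattr(estimator, "_skl_callbacks")` branch from the diff, `copy.deepcopy` stands in for `clone(..., safe=False)`, and the notion of "equivalence" used here (same callback types, distinct objects) is only one possible choice, since the thread leaves the desired behavior open.

```python
import copy


class ToyEstimator:
    """Hypothetical estimator with a single constructor parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self):
        return {"alpha": self.alpha}


class ToyCallback:
    """Hypothetical callback object."""

    def __init__(self, label="cb"):
        self.label = label


def toy_clone(estimator):
    # Rebuild the estimator from its constructor parameters only.
    new_object = type(estimator)(**estimator.get_params())
    # Mirror the diff: carry callbacks over if the attribute exists.
    if hasattr(estimator, "_skl_callbacks"):
        # deepcopy is an assumption standing in for clone(..., safe=False).
        new_object._skl_callbacks = copy.deepcopy(estimator._skl_callbacks)
    return new_object


def check_clone_keeps_callbacks(estimator):
    """Return True if cloning preserves callbacks (types equal, objects distinct)."""
    cloned = toy_clone(estimator)
    original = getattr(estimator, "_skl_callbacks", None)
    copied = getattr(cloned, "_skl_callbacks", None)
    if original is None:
        return copied is None
    same_types = [type(c) for c in original] == [type(c) for c in copied]
    distinct_objects = all(a is not b for a, b in zip(original, copied))
    return same_types and distinct_objects


est = ToyEstimator(alpha=0.5)
est._skl_callbacks = [ToyCallback()]
```

Calling `check_clone_keeps_callbacks(est)` exercises the branch with callbacks, and calling it on a fresh `ToyEstimator()` exercises the branch without.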

glemaitre · 2024-07-25T09:54:03Z

sklearn/callback/_progressbar.py

+    max_estimator_depth : int, default=1
+        The maximum number of nested levels of estimators to display progress bars for.
+        By default, only the progress bars of the outermost estimator are displayed.
+        If set to None, all levels are displayed.


Maybe not in this PR, but in the branch targeting main we will need more documentation. Here, we are missing the attribute and probably an example usage.

I agree, examples will come in subsequent PRs. This param in particular is something that I'd like to improve though (see the AutoPropagatedProtocol). It can be improved in a subsequent PR as well, but definitely before merging in main.
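
Until such examples exist, the documented semantics of `max_estimator_depth` can be sketched as a small predicate. The helper name `shows_progress_bar` is hypothetical, and counting the outermost estimator as depth 0 is an assumption; the PR may count depth differently.

```python
def shows_progress_bar(estimator_depth, max_estimator_depth=1):
    """Return True if an estimator at this nesting depth gets a progress bar.

    Assumption: the outermost estimator has depth 0, so the default of 1
    keeps only that level.
    """
    if max_estimator_depth is None:
        # None means all levels are displayed.
        return True
    return estimator_depth < max_estimator_depth


# Hypothetical nesting: a search wrapping a pipeline wrapping a final estimator.
nesting = ["GridSearchCV", "Pipeline", "LogisticRegression"]
default_view = [name for depth, name in enumerate(nesting) if shows_progress_bar(depth)]
full_view = [name for depth, name in enumerate(nesting) if shows_progress_bar(depth, None)]
```

Here `default_view` keeps only the outermost estimator, while `full_view` keeps every level.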

sklearn/callback/tests/test_progressbar.py

adrinjalali

Having a look at this, I like the API for third party developers, but two things here are missing for me:

documentation: I feel the documentation is quite sparse. I would like to be able to understand what classes are for what purpose, what's a task, etc.
API: I feel everything which is not a dev API is private here, but used as a public attribute all around, which makes me uneasy. There should be a better distinction between things only the class itself touches, and things which should be used by outsider classes (yet inside callback infra)

adrinjalali · 2024-07-29T08:13:28Z

sklearn/callback/_callback_context.py

+class CallbackContext:
+    """Task level context for the callbacks.
+
+    This class is responsible for managing the callbacks and task tree of an estimator.


what's a task tree?

I've added more doc in the docstring of the TaskNode class. This class is public and has some public methods because a task node is passed to the callback hooks and can be used by a callback developer to perform different operations depending on the task at hand.
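
One way to picture a task tree is the sketch below: each node is a task (the whole fit, a fold, an iteration), with child nodes for its subtasks. `ToyTaskNode` and its attributes are hypothetical and much simpler than whatever `TaskNode` exposes in the PR.

```python
class ToyTaskNode:
    """Hypothetical, simplified stand-in for the PR's TaskNode."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = ToyTaskNode(name, parent=self)
        self.children.append(child)
        return child

    @property
    def depth(self):
        # The root task has depth 0; each level of nesting adds 1.
        return 0 if self.parent is None else self.parent.depth + 1

    @property
    def is_leaf(self):
        return not self.children

    def walk(self):
        # Depth-first traversal, parents before children.
        yield self
        for child in self.children:
            yield from child.walk()


# Task tree for a hypothetical 2-fold search over an iterative estimator.
root = ToyTaskNode("fit")
for fold in range(2):
    fold_node = root.add_child(f"fold {fold}")
    for it in range(2):
        fold_node.add_child(f"iter {it}")

leaves = [(node.name, node.depth) for node in root.walk() if node.is_leaf]
```

A callback receiving such a node can branch on `depth` or `is_leaf`, for example to act only at the end of the innermost iterations.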

jeremiedbb · 2024-07-29T11:03:11Z

I feel everything which is not a dev API is private here, but used as a public attribute all around,

Can you give me specific examples? I'm not sure what you're referring to.

adrinjalali · 2024-07-29T12:30:40Z

sklearn/callback/_base.py

+class CallbackProtocol(Protocol):
+    """Protocol for the callbacks"""
+
+    def _on_fit_begin(self, estimator, *, data):


for instance, why are these private?

I made these private because callback classes are public to the end user of scikit-learn, but their methods are not: they should only be implemented by callback developers and never called by anyone but scikit-learn internals.

I wanted to keep these methods from appearing in auto-completion in notebooks, for instance.

It's hard to differentiate 3 levels of privacy (maintainers, third-party devs and end users) with a binary marker, leading underscore or not 😄

jeremiedbb · 2025-04-01T15:36:12Z

I agree that CallbackContext is still missing documentation. However, this is the main, and really the only, object needed to implement callback support in an estimator. My goal is to make a subsequent PR with a detailed example of how to implement callback support in an estimator, which will explain what the CallbackContext class is and how to use it better than just adding more to the docstring. What do you think? To me, this is where the focus on docs should be the highest.

adrinjalali

On levels of abstraction: for third-party devs, at this point we use the __sklearn_method_name__ pattern for methods they're supposed to implement or override to modify behavior.

I don't think we should use _method_name much for things that are called outside the class itself (as in, we should really treat such names as private). But to be able to suggest alternatives, I'd need to better understand this part of the codebase.

Could you maybe add a README file to the callback folder, explaining conceptually the overview of what each object / file is supposed to do? That makes it easier for me to review this, and understand where the scoping challenges are.
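
For readers unfamiliar with the `__sklearn_method_name__` pattern mentioned above, here is a minimal sketch of how a library-side helper can defer to such a hook. `__sklearn_is_fitted__` is an existing scikit-learn developer hook; `toy_is_fitted`, the fallback rule and the two classes are hypothetical and only illustrate the dispatch, not scikit-learn's actual implementation.

```python
def toy_is_fitted(estimator):
    """Hypothetical helper that defers to a dunder hook when one exists."""
    hook = getattr(estimator, "__sklearn_is_fitted__", None)
    if hook is not None:
        # Third-party developers customize behavior by defining the hook.
        return bool(hook())
    # Fallback (an assumption for this sketch): look for public attributes
    # whose names end with an underscore.
    return any(
        key.endswith("_") and not key.startswith("_") for key in vars(estimator)
    )


class WithHook:
    """Third-party estimator that opts in through the dunder hook."""

    def __init__(self):
        self._ready = False

    def __sklearn_is_fitted__(self):
        return self._ready


class WithoutHook:
    """Estimator relying on the fallback rule."""

    def fit(self):
        self.coef_ = [0.0]
        return self
```

The hook is visibly part of a developer contract, yet it is unlikely to be mistaken by end users for a regular public method, which is the distinction the comment above is after.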

adrinjalali · 2025-04-28T09:46:57Z

sklearn/base.py

@@ -130,6 +130,10 @@ def _clone_parametrized(estimator, *, safe=True):

    params_set = new_object.get_params(deep=False)

+    # attach callbacks to the new estimator
+    if hasattr(estimator, "_skl_callbacks"):
+        new_object._skl_callbacks = clone(estimator._skl_callbacks, safe=False)


I think it makes sense to have a test for equivalence of _skl_callbacks on objects in estimator_checks.py to make sure as we keep developing, things stay consistent, including third parties. I don't mind if that happens in a separate PR, before we merge into main

jeremiedbb added 30 commits December 16, 2021 20:08

callback API

272e75f

cln nmf and test reconstruction attributes

584bdf7

cln snapshot + test snapshot + uuid for computation tree

bb32ff3

cln

7a1825d

black

3e3b25f

lint

26dbb69

wip

eb7b824

Merge branch 'master' into callback-api

9b913fd

class

f78442e

more tests

34bab15

cln

596a58e

wip

4f9363c

Merge remote-tracking branch 'upstream/main' into callback-api

030f68b

wip

35c5284

wip

115e184

wip

bdb4990

Merge remote-tracking branch 'upstream/main' into callback-api

d1bb5eb

wip

7a43c30

Merge remote-tracking branch 'upstream/main' into callback-api

573fd5d

wip

a218068

update poor_score

f794694

Merge remote-tracking branch 'upstream/main' into pr/jeremiedbb/22000

ab74f19

wip

37e569b

wip

d7208fa

Merge remote-tracking branch 'upstream/main' into pr/jeremiedbb/22000

774ff69

cln

b8ac1a5

Merge remote-tracking branch 'upstream/main' into pr/jeremiedbb/22000

e544cc4

wip

b644430

wip

3ab3d7f

wip

39c04cc

fix test progressbar

183b74e

jeremiedbb and others added 7 commits April 3, 2024 18:10

begin migrating tests

34ca96d

improve callback context API

a2d4975

Merge branch 'callbacks' into base_callbacks_2

bd18edf

merge

4f2f369

rework callback context internals + migrate tests

61cf977

lint

81824ce

iter

3fa6021

jeremiedbb added the No Changelog Needed label Jun 7, 2024

jeremiedbb added 3 commits June 7, 2024 18:51

add rich to pyproject.toml

82a9d13

fix docstrings and make callback hooks private for the end user

7d41b31

Merge branch 'callbacks' into base_callbacks_2

1e79b73

glemaitre self-requested a review July 25, 2024 08:21

glemaitre approved these changes Jul 25, 2024


adrinjalali reviewed Jul 29, 2024


glemaitre mentioned this pull request Feb 18, 2025

Add a progress bar to the randomized and grid search #30852

Closed

StefanieSenger mentioned this pull request Feb 19, 2025

Show progress during unfairness mitigation fairlearn/fairlearn#510

Open

jeremiedbb added 3 commits April 1, 2025 14:53

Merge branch 'callbacks' into base_callbacks_2

fc2136c

Merge branch 'callbacks' into base_callbacks_2

fff3301

more doc task node

d573c49

jeremiedbb added 4 commits April 1, 2025 17:40

lint

00e7306

add module

495733e

Merge branch 'callbacks' into base_callbacks_2

90d481c

api ref

def2687

adrinjalali reviewed Apr 28, 2025

