-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
API Proposal for Logging #17439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The issue with this is say you have a pipeline and you want to log operations. Passing this verbose handler to each of its components will be tiresome. Usually one desn't want logging at the estimator level but rather at the application level. i.e. enable logging handler globally with optionally a way to tune how much verbosity we want. |
Good point. In this case we do not need to accept I still think it is useful to have an integer for verbosity. |
Alternatively we could get the default handler in each module eg.
and
and allow users to further configure those handlers if they wish. The verbosity messages would go to |
Having to configure the handler manually will always be possible by grabbing the logger with a string: logger = logging.getLogger('sklearn.svm')
logger.setLoggingLevel(...) My concern is how this complicates how to start logging. Right now we start logging like this: hist = HistGradientBoostingRegressor(verbose=1) If we only have a logging api based option: logger = logging.getLogger('sklearn.ensemble')
logger.setLoggingLevel(logging.INFO)
handler = logging.StreamHandler(sys.stdout)
logger.addHandler(handler)
hist = HistGradientBoostingRegressor() What I propose is this: logger = logging.getLogger(__name__)
class HistGradientBoostingRegressor:
def fit(self, X, y):
if self.verbose > 0:
handler = logging.StreamHandler(sys.stdout)
logger.addHandler(handler)
# maybe even set logging level based on verbose? so |
yes, I think keeping a I think it would be useful to think of logging in terms of callbacks (#16925). This is for instance done that way in keras as far as I can tell e.g. here. This would mean that the logging logic is completely separated from the estimator. |
What I'm proposing is more like: # this adds a stdout handler as wel
sklearn.setLogLevel(logging.DEBUG) # or logging.INFO
# or
sklearn.setLogLevel(sklearn.DEBUG) # not a fan of this one
HistGradientBoostingRegressor(verbose=1) I would rather not force having a |
We already have a config mechanism, so it could be, # globally
from sklearn._callbacks import LoggingCallback
sklearn.set_config(logging=LoggingCallback())
# or
with sklearn.config_context(logging=LoggingCallback()):
...
# or locally
HistGradientBoostingRegressor(verbose=1) # registers the callback interally where the logging callback can accept a number of logging options, including,
I mean there is a bunch of things to configure for logging. So either we hardcode them and hide it from the user (although I'm not sure what should they be hardcoded to) or we need to allow some mechanism to be able to change them. How much of it we should expose I'm not sure. The important part however is now this logging callback is registered on fit in each estimator, and depending on both the |
My question is why would we need a |
Well you need 1 object to call in estimators for printing / logging. Logging exposes several such objects including formatter, stream handler, logger. So we would need some container around it, unless we want to have them in the root namespace, and expect users to monkeypatch/update that for customization. It also happens that do something like progress bars or convergence plots, would need callbacks in the same parts of the code where we do logging. So it makes sense to at least try re-using some of it. Also TBH, whatever we do with logging I'm sure there will be users who will say that it doesn't work with their use case (e.g. matplotlib/matplotlib#13264 discussion). For instance registering a logging handler in scikit-learn is apparently not OK #15122, but we have to do some of it for Also I think that just printing things to stdout in a loggingcallback and not using the logging module, might also be OK for most interactive usage e.g. https://keras.io/api/callbacks/progbar_logger/. Having one container that is not the logging module makes it easier to swap. |
Since none of the models currently use logging, maybe this is rather a stdout vs stderr issue. I'll try to investigate. |
@jeremiedbb with callback API, are we going to / can we deprecate |
Callbacks will be usable for logging but I don't think they'll replace It doesn't change much for cython since we already need to acquire the gil for existing |
By default writing to a logger would not go to STDOUT without a handler. One solution to this is to attach a handler when verbose is an integer and use the logger if it is passed in.
Edit: See also #78.
The text was updated successfully, but these errors were encountered: