8000 Please don't configure logging in the top-level __init__.py · Issue #15122 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Please don't configure logging in the top-level __init__.py #15122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mdickinson opened this issue Oct 3, 2019 · 8 comments
Closed

Please don't configure logging in the top-level __init__.py #15122

mdickinson opened this issue Oct 3, 2019 · 8 comments
Labels
Bug High Priority High priority issues and pull requests
Milestone

Comments

@mdickinson
Copy link

import sklearn creates a StreamHandler and attaches it to the sklearn logger:

logger.addHandler(logging.StreamHandler())

I'm not sure what the motivation for this is, but it's a deviation from the normal "best practices" for logging, namely that libraries should restrict themselves to issuing log messages, but let the application do all logging configuration (setting up handlers, changing logger levels, and the like). There's lots written about this elsewhere, but here's one relevant blog post: http://pieces.openpolitics.com/2012/04/python-logging-best-practices/

In practice, this caused a hard-to-diagnose bug in our IPython- and sklearn-using application (actually, in more than one such application):

  • At application start time, we start an IPython kernel. That kernel swaps out sys.stdout and sys.stderr for its own custom streams, which rely on a lot of fairly complicated machinery (extra threads, ZMQ streams, the asyncio event loop, etc.)
  • sklearn was imported while that IPython kernel was running.
  • The log handler created at import time then picked up IPython's custom sys.stderr stream instead of the usual one.
  • At application stop time, the IPython kernel and associated machinery were stopped.
  • At process exit time, the stream associated to the handler was flushed (by the logging module's shutdown function, which is registered as an atexit handler). Because the IPython machinery was no longer active, we got a hard-to-understand traceback.

If the intent of the handler is to suppress the "No logger configured ..." messages from the std. lib., perhaps a logging.NullHandler could be used for that purpose instead? I'm happy to create a PR for this if the proposed change sounds acceptable.

@mdickinson
Copy link
Author

See #9240 for discussion leading to the introduction of this log configuration.

@jnothman
Copy link
Member
jnothman commented Oct 3, 2019 via email

@rth
Copy link
Member
rth commented Oct 30, 2019

I think using logging overall makes sense, and https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library is an interesting read.

If I understand the problem correctly we would like to use logging but that means that messages below the level of WARNING (i.e. INFO and DEBUG) will be hidden from the user unless they manually set the logging level. For instance, to give an example of library with logging see matplotlib documentation.

That is very different from the current scikit-learn API where estimators have a verbosity parameter. Also sklearn.datasets do use logging, with a custom log handler, hence this issue.

My question is if we use the standard logger and don't manually set/unset logging level how can we make verbosity parameter work as before?

@jnothman
Copy link
Member
jnothman commented Oct 30, 2019 via email

@rth
Copy link
Member
rth commented Oct 30, 2019

Does than mean deprecating the verbose parameter altogether?

Or alternatively setting verbose parameter would then set the logging level for the logger from its module? There is then a question of what happens for the inter-module logging. Say TSNE(verbose=1) uses neighbors module for a large part of the runtime. You could just pass verbose parameter there, but then does that mean that the log level of sklearn.neighbors changed (e.g. if we call KNeighgborsClassifier from somewhere else)? unless we then reset the logging level each time an estimator is called..

Overall logging is nice application level; library level it sounds complicated.

@rth
Copy link
Member
rth commented Oct 30, 2019

Another data point, is that joblib is using logging for inter-process logging, but progress messages from the verbose parameter are still printed directly to stdout/stderr as far as I can tell.

@jnothman jnothman added the High Priority High priority issues and pull requests label Apr 27, 2020
@jnothman
Copy link
Member

I think we should avoid neglecting this for more releases. This bad-citizen code was inserted accidentally.

@jnothman jnothman added this to the 0.23 milestone Apr 27, 2020
@thomasjpfan
Copy link
Member

Resolved by #16451

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug High Priority High priority issues and pull requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants
0