8000 [MRG+2] Add sklearn show_versions() method by aboucaud · Pull Request #11596 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] Add sklearn show_versions() method #11596

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Jul 31, 2018

Conversation

aboucaud
Copy link
Contributor

PR that adds a show_versions() method convenient for debugging.

Reference Issues/PRs

Fixes #11522

What does this implement/fix? Explain your changes.

Fork of the pandas.show_versions().
It displays optionally the system BLAS information.

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Could you please provide an example of output (e.g. on your system)?

Also let's add a test that simply runs this. You can use the capsys fixture to capture output (see https://docs.pytest.org/en/2.8.7/capture.html#accessing-captured-output-from-a-test-function) and check that e.g. "numpy" key word is in the output.

("Cython", lambda mod: mod.__version__),
("pandas", lambda mod: mod.__version__),
("matplotlib", lambda mod: mod.__version__),
# ("sphinx", lambda mod: mod.__version__),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove all commented dependencies below.

@@ -0,0 +1,163 @@
"""
Utility methods to print system info for debugging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to make this module private _print_versions

@lesteve
Copy link
Member
lesteve commented Jul 17, 2018

You have a conflict in __init__.py it looks like.

@aboucaud
Copy link
Contributor Author
aboucaud commented Jul 17, 2018

Could you please provide an example on output (e.g. on your system)?

You mean here or in a docstring ?

What it looks like on my computer
System info
-----------
commit: 2d7567f9f7f89ece9b8f540d7b67fadb325ca9c1
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

BLAS info
---------
include_dirs: ['/usr/local/include']
language: c
define_macros: [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\\"3.10.1\\""')]
library_dirs: ['/usr/local/lib']
CBLAS libs: ['ptf77blas', 'ptcblas', 'atlas', 'ptf77blas', 'ptcblas']

Python libs info
----------------
pip: 10.0.1
setuptools: 40.0.0
numpy: 1.14.5
scipy: 1.1.0
Cython: 0.28.4
pandas: 0.23.3
matplotlib: None

You can use the capsys fixture to capture output

Thanks @rth, I'll have a look at it.

@aboucaud
Copy link
Contributor Author

You have a conflict in init.py it looks like.

That's because you're merging on master too fast for me :)
Solved !

@rth
Copy link
Member
rth commented Jul 17, 2018

Here, just so that reviewers could see the result without running the code.

@aboucaud
Copy link
Contributor Author

See above in the clickable part of the comment.

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aboucaud ! There are a few flake8 failures.

I'll add a separate comment about formatting below.

return dict(deps_blob)


def show_versions(as_json=False, with_blas=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a docstring here explaining what this function does.

Let's set with_blas=True by default. Actually I would even remove the with_blas parameter altogether, and always show this information. I can't see a use case where we wouldn't care about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for always putting the blas info

except:
blas_info = {'error': "Could not retrieve BLAS information"}

if (as_json):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for parenthesis.

This branch of the code is not covered, we should add a test for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove the parenthesis, this is not C ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tell that to pandas devs.. I know you have one in the room :-)

print(j)
else:
with codecs.open(as_json, "wb", encoding='utf8') as f:
json.dump(j, f, indent=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we actually need this. Maybe remove the as_json parameter altogether. WDYT @lesteve ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll just completely clean the code

cblas_libs, blas_info = get_blas_info()
blas_info['cblas_libs'] = cblas_libs
except:
blas_info = {'error': "Could not retrieve BLAS information"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'error' -> 'message'


# get full commit hash
commit = None
if os.path.isdir(".git") and os.path.isdir("sklearn"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code branch is currently not tested. I'm not sure we actually want it. It would be nice, not sure we should bother with it. @lesteve ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.

In the case they are not using a released version, then the person can very easily provide this info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok it will remove complexity altogether

@@ -21,6 +21,7 @@
from ..utils.fixes import _Sequence as Sequence
from .deprecation import deprecated
from .. import get_config
from ._print_versions import show_versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we wanted to import this in sklearn/__init__.py instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@rth
Copy link
Member
rth commented Jul 17, 2018

System info
commit: 2d7567f
python: 3.6.5.final.0
python-bits: 64

I would just use sys.version here similarly to what is suggested in http://scikit-learn.org/stable/developers/contributing.html#filing-bugs

>>> import sys; print("Python", sys.version)
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0]

It also gives information about GCC & whether conda is used.

OS: Darwin

similarly, replace this,

OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little

with the output of,

>>> python -c "import platform; print(platform.platform())"
Linux-4.11.6-gentoo-x86_64-Intel-R-_Core-TM-_i5-6200U_CPU_@_2.30GHz-with-gentoo-2.4.1

LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

I would remove this as non essential.

BLAS info
include_dirs: ['/usr/local/include']
language: c
define_macros: [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\"3.10.1\""')]
library_dirs: ['/usr/local/lib']
CBLAS libs: ['ptf77blas', 'ptcblas', 'atlas', 'ptf77blas', 'ptcblas']

I would remove the "language" and "include_dirs" keys. Then for "library_dirs" run ":".join on the list CBLAS libs run, ", ".join on the list. Finally make define macros print,

define_macros: HAVE_CBLAS=None, ATLAS_INFO="\\"3.10.1\\""

(or something similar without all the tuples/lists).

Python libs info
pip: 10.0.1
setuptools: 40.0.0
numpy: 1.14.5
scipy: 1.1.0
Cython: 0.28.4
pandas: 0.23.3
matplotlib: None

I don't think we care about matplotlib in most cases.

@lesteve might disagree with some of these comments.

Copy link
Member
@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments.


# get full commit hash
commit = None
if os.path.isdir(".git") and os.path.isdir("sklearn"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.

In the case they are not using a released version, then the person can very easily provide this info.

return dict(deps_blob)


def show_versions(as_json=False, with_blas=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for always putting the blas info

except:
blas_info = {'error': "Could not retrieve BLAS information"}

if (as_json):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And remove the parenthesis, this is not C ;-)

print(j)
else:
with codecs.open(as_json, "wb", encoding='utf8') as f:
json.dump(j, f, indent=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.

@@ -21,6 +21,7 @@
from ..utils.fixes import _Sequence as Sequence
from .deprecation import deprecated
from .. import get_config
from ._print_versions import show_versions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

(sysname, nodename, release,
version, machine, processor) = platform.uname()
blob.extend([
("python", '.'.join(map(str, sys.version_info))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember whether I commented already, but maybe sys.executable would be nice? This way we know whether it is a system/user/conda install. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rth mentioned sys.version for the same thing.

Which one is the best ? I don't have an opinion on this.

@aboucaud
Copy link
Contributor Author

thx for the useful comments @rth and @lesteve, I'll work on that tomorrow.

("scipy", lambda mod: mod.version.version),
("Cython", lambda mod: mod.__version__),
("pandas", lambda mod: mod.__version__),
("matplotlib", lambda mod: mod.__version__),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is pandas necessary as it's not a requirement for scikit-learn ?
And shouldn't we add scikit-learn's version ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, probably nice to have EED3 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas is often used together with matplotlib and would be nice to have to debug ColumnTransformer etc issues.

@aboucaud
Copy link
Contributor Author

latest output

System
------
    python: 3.6.5 (default, Mar 30 2018, 06:42:10)
executable: /Users/aboucaud/codes/python_venvs/sklearn-dev/bin/python3.6
  compiler: [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]
   machine: Darwin-16.7.0-x86_64-i386-64bit

BLAS
----
   message: Could not retrieve BLAS information

Python deps
-----------
       pip: 10.0.1
setuptools: 40.0.0
   sklearn: 0.20.dev0
     numpy: 1.14.5
     scipy: 1.1.0
    Cython: 0.28.4
    pandas: 0.23.3
matplotlib: None

@aboucaud
Copy link
Contributor Author

I don't understand why calling the builtin get_blas_info() sometimes raises an exception and sometimes works.

@aboucaud aboucaud changed the title [WIP] Add sklearn show_versions() method [MRG] Add sklearn show_versions() method Jul 18, 2018
@jeremiedbb
Copy link
Member

I guess we won't ask you to revert now, but in the future, you should not fix existing flake8 or pep8 errors.
I makes the review hard to follow.

@aboucaud
Copy link
Contributor Author

Well I did not..personally. I just have automatic whitespace trimming in my editor to avoid adding errors.

And if it happened, it means these errors were pushed on master recently.. so what is worse ? 😄

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be added to doc/modules/classes.rst

@aboucaud
Copy link
Contributor Author

This should probably be added to doc/modules/classes.rst

Done.

@aboucaud
Copy link
Contributor Author

Ready for review.
cc @lesteve @jakirkham @qinhanmin2014

Copy link
Member
@rth rth left a comment

Choose a reason for hiding this comment

The reason will be 10000 displayed to describe this comment to others. Learn more.

Minor comments below, otherwise LGTM. Thanks @aboucaud !

.. note::

This utility function is only available in scikit-learn v0.20+. For all
previous versions, one has to manually dig such information e.g. snippet::
Copy link
Member
@rth rth Jul 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For previous versions, one has to explicitly run::

     import platform; print(platform.platform())
     import sys; print("Python", sys.version)
     import numpy; print("NumPy", numpy.__version__)
     import scipy; print("SciPy", scipy.__version__)
     import sklearn; print("Scikit-Learn", sklearn.__version__)



def show_versions():
"Print useful debugging information to the terminal"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove "to the terminal"

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's name the module _show_versions.py, or just put it in sklearn/_config.py. "print" is unhelpful in the name

("setuptools", lambda mod: mod.__version__),
("sklearn", lambda mod: mod.__version__),
("numpy", lambda mod: mod.__version__),
("scipy", lambda mod: mod.version.version),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get why we can't use __version__ here and hence in all cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can and should...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It slipt off my head. I changed it to avoid the lambda function.

@amueller
Copy link
Member

I vaguely remember someone arguing above to remove matplotlib? I would have included it but no strong opinion. Seems fine to me and we can always iterate easily on this.

This is what this looks like for me btw

System
------
    python: 3.7.0 (default, Jun 28 2018, 13:15:42)  [GCC 7.2.0]
executable: /home/andy/anaconda3/envs/py37/bin/python
   machine: Linux-4.13.0-46-generic-x86_64-with-debian-stretch-sid

BLAS
----
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/andy/anaconda3/envs/py37/lib
cblas_libs: mkl_rt, pthread

Python deps
-----------
       pip: 10.0.1
setuptools: 39.2.0
   sklearn: 0.20.dev0
     numpy: 1.14.5
     scipy: 1.1.0
    Cython: 0.28.3
    pandas: 0.23.3

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

@amueller
Copy link
Member

This excludes the compiler version, though, right? Earlier the PR had compiler versions. Why did we want that removed? Seems helpful. I think the [GCC 7.2.0] is the compiler the python was compiled with, not the compiler that's available in that env (i.e. what scikit-learn would be compiled with).

@aboucaud
Copy link
Contributor Author
aboucaud commented Jul 24, 2018

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

I do share the feeling.

Both the name macros and the results SCIPY_MKL_H=None, HAVE_CBLAS=None are not very explicit ; but maybe enough for devs to debug.
I am not really competent regarding BLAS but @rth, @jakirkham or @ogrisel might have a better opinion.

This excludes the compiler version, though, right? Earlier the PR had compiler versions. Why did we want that removed? Seems helpful. I think the [GCC 7.2.0] is the compiler the python was compiled with, not the compiler that's available in that env (i.e. what scikit-learn would be compiled with).

The "compiler" was the Python one (split of the Python info into Python + Python compiler) which failed on Windows and is now included in the same line.

I'm not sure how I would have access to the compiler used to build scikit-learn.

@aboucaud
Copy link
Contributor Author

FYI I'll be afk until Tuesday 31st. Please take over if need be.

10000

@rth
Copy link
Member
rth commented Jul 27, 2018

Is it clear that that means I have mkl? I guess I'm not very good at reading these lol

Both the name macros and the results SCIPY_MKL_H=None, HAVE_CBLAS=None are not very explicit ; but maybe enough for devs to debug.

Yeah, it's not that explicit, mostly cblas_libs: mkl_rt, pthread is enough to make the determination. The advantage is this should work for all the systems where scikit-learn was installed (as the underlying function is used in setup.py). If we wanted to print something more user friendly unless we test it for different BLAS (which we probably won't do), there is no certainty that it won't fail in some configurations. So I think it might be preferable to leave it like this.

I'm not sure how I would have access to the compiler used to build scikit-learn.

We could store show_version result (including the BLAS setup) at build time in a dynamically generated Cython file as suggested in #11596 (comment) but I think we can add that in a subsequent PR. This should already address most cases, I believe.

WDYT?

@rth
Copy link
Member
rth commented Jul 27, 2018

We can add back matplotlib if you prefer..

@rth
Copy link
Member
rth commented Jul 27, 2018

To be more specific, it's possible to determine the most common BLAS implementations by matching cblas_libs contents against this dictionary but again there are cases where it will fail (other BLAS implementations, possibly Windows etc). https://github.com/tomMoral/loky/pull/135 might do a better job of it, but in any case I don't think it's worth the effort here , as we just want some debug information..

@jnothman
Copy link
Member
jnothman commented Jul 29, 2018 via email

Copy link
Member
@GaelVaroquaux GaelVaroquaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving this PR, and people should feel free to merge it. But if there is time, I'd like my comment about the code snippet to be addressed.

@@ -112,7 +112,7 @@ following rules before submitting:
`new algorithm requirements
<http://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms>`_.

- If you are submitting a bug report, we strongly encourage you to follow the guidelines in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's already been mentioned to you, but I am going to say it a second time: such unrelated changes are a very bad practice (they create merge conflicts, and destroy the usefulness of "git blame").

< 10000 td data-line-number="143" class="blob-num blob-num-addition">
@@ -140,6 +140,13 @@ feedback:
your **Python, scikit-learn, numpy, and scipy versions**. This information
can be found by running the following code snippet::

import sklearn; sklearn.show_versions()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to have this as a code snippet:

::

    >>> import sklearn
    >>> sklearn.show_versions() # doctests: +SKIP

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Add sklearn show_versions() method [MRG+2] Add sklearn show_versions() method Jul 29, 2018
@jnothman jnothman merged commit b21f9b4 into scikit-learn:master Jul 31, 2018
@jnothman
Copy link
Member
jnothman commented Jul 31, 2018 via email

@aboucaud
Copy link
Contributor Author
aboucaud commented Aug 1, 2018

@jnothman I just noticed I forgot to rename the test file according to the new filename
test_print_versions.py => test_show_versions.py
Shall I push a quick fix in a separate PR ?

@rth
Copy link
Member
rth commented Aug 1, 2018

@aboucaud If you have time. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sklearn.show_versions() similar to pandas.show_versions (with numpy blas binding info)
9 participants
0