-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Add sklearn show_versions() method #11596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Could you please provide an example of output (e.g. on your system)?
Also let's add a test that simply runs this. You can use the capsys fixture to capture output (see https://docs.pytest.org/en/2.8.7/capture.html#accessing-captured-output-from-a-test-function) and check that e.g. "numpy" key word is in the output.
sklearn/utils/print_versions.py
Outdated
("Cython", lambda mod: mod.__version__), | ||
("pandas", lambda mod: mod.__version__), | ||
("matplotlib", lambda mod: mod.__version__), | ||
# ("sphinx", lambda mod: mod.__version__), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove all commented dependencies below.
sklearn/utils/print_versions.py
Outdated
@@ -0,0 +1,163 @@ | |||
""" | |||
Utility methods to print system info for debugging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to make this module private _print_versions
You have a conflict in |
You mean here or in a docstring ? What it looks like on my computer
Thanks @rth, I'll have a look at it. |
That's because you're merging on master too fast for me :) |
Here, just so that reviewers could see the result without running the code. |
See above in the clickable part of the comment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aboucaud ! There are a few flake8 failures.
I'll add a separate comment about formatting below.
sklearn/utils/_print_versions.py
Outdated
return dict(deps_blob) | ||
|
||
|
||
def show_versions(as_json=False, with_blas=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a docstring here explaining what this function does.
Let's set with_blas=True
by default. Actually I would even remove the with_blas
parameter altogether, and always show this information. I can't see a use case where we wouldn't care about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for always putting the blas info
sklearn/utils/_print_versions.py
Outdated
except: | ||
blas_info = {'error': "Could not retrieve BLAS information"} | ||
|
||
if (as_json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for parenthesis.
This branch of the code is not covered, we should add a test for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remove the parenthesis, this is not C ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tell that to pandas devs.. I know you have one in the room :-)
sklearn/utils/_print_versions.py
Outdated
print(j) | ||
else: | ||
with codecs.open(as_json, "wb", encoding='utf8') as f: | ||
json.dump(j, f, indent=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced we actually need this. Maybe remove the as_json
parameter altogether. WDYT @lesteve ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll just completely clean the code
sklearn/utils/_print_versions.py
Outdated
cblas_libs, blas_info = get_blas_info() | ||
blas_info['cblas_libs'] = cblas_libs | ||
except: | ||
blas_info = {'error': "Could not retrieve BLAS information"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'error'
-> 'message'
sklearn/utils/_print_versions.py
Outdated
|
||
# get full commit hash | ||
commit = None | ||
if os.path.isdir(".git") and os.path.isdir("sklearn"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code branch is currently not tested. I'm not sure we actually want it. It would be nice, not sure we should bother with it. @lesteve ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.
In the case they are not using a released version, then the person can very easily provide this info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok it will remove complexity altogether
sklearn/utils/__init__.py
Outdated
@@ -21,6 +21,7 @@ | |||
from ..utils.fixes import _Sequence as Sequence | |||
from .deprecation import deprecated | |||
from .. import get_config | |||
from ._print_versions import show_versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we wanted to import this in sklearn/__init__.py
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
I would just use >>> import sys; print("Python", sys.version)
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] It also gives information about GCC & whether conda is used.
similarly, replace this,
with the output of, >>> python -c "import platform; print(platform.platform())"
Linux-4.11.6-gentoo-x86_64-Intel-R-_Core-TM-_i5-6200U_CPU_@_2.30GHz-with-gentoo-2.4.1
I would remove this as non essential.
I would remove the "language" and "include_dirs" keys. Then for "library_dirs" run
(or something similar without all the tuples/lists).
I don't think we care about matplotlib in most cases. @lesteve might disagree with some of these comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments.
sklearn/utils/_print_versions.py
Outdated
|
||
# get full commit hash | ||
commit = None | ||
if os.path.isdir(".git") and os.path.isdir("sklearn"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is really useful to include the git commit info. In my mind this is useful for users that come with a bug report. In 99.9% of the use cases they are using a released version.
In the case they are not using a released version, then the person can very easily provide this info.
sklearn/utils/_print_versions.py
Outdated
return dict(deps_blob) | ||
|
||
|
||
def show_versions(as_json=False, with_blas=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for always putting the blas info
sklearn/utils/_print_versions.py
Outdated
except: | ||
blas_info = {'error': "Could not retrieve BLAS information"} | ||
|
||
if (as_json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And remove the parenthesis, this is not C ;-)
sklearn/utils/_print_versions.py
Outdated
print(j) | ||
else: | ||
with codecs.open(as_json, "wb", encoding='utf8') as f: | ||
json.dump(j, f, indent=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of agree. This is useful for bug reports, I would not think of it as a programmatic way of accessing this information.
sklearn/utils/__init__.py
Outdated
@@ -21,6 +21,7 @@ | |||
from ..utils.fixes import _Sequence as Sequence | |||
from .deprecation import deprecated | |||
from .. import get_config | |||
from ._print_versions import show_versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
sklearn/utils/_print_versions.py
Outdated
(sysname, nodename, release, | ||
version, machine, processor) = platform.uname() | ||
blob.extend([ | ||
("python", '.'.join(map(str, sys.version_info))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember whether I commented already, but maybe sys.executable
would be nice? This way we know whether it is a system/user/conda install. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rth mentioned sys.version
for the same thing.
Which one is the best ? I don't have an opinion on this.
sklearn/utils/_print_versions.py
Outdated
("scipy", lambda mod: mod.version.version), | ||
("Cython", lambda mod: mod.__version__), | ||
("pandas", lambda mod: mod.__version__), | ||
("matplotlib", lambda mod: mod.__version__), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is pandas necessary as it's not a requirement for scikit-learn ?
And shouldn't we add scikit-learn's version ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, probably nice to have EED3 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas is often used together with matplotlib and would be nice to have to debug ColumnTransformer
etc issues.
latest output
|
I don't understand why calling the builtin |
I guess we won't ask you to revert now, but in the future, you should not fix existing flake8 or pep8 errors. |
Well I did not..personally. I just have automatic whitespace trimming in my editor to avoid adding errors. And if it happened, it means these errors were pushed on master recently.. so what is worse ? 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be added to doc/modules/classes.rst
Done. |
Ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be 10000 displayed to describe this comment to others. Learn more.
Minor comments below, otherwise LGTM. Thanks @aboucaud !
doc/developers/contributing.rst
Outdated
.. note:: | ||
|
||
This utility function is only available in scikit-learn v0.20+. For all | ||
previous versions, one has to manually dig such information e.g. snippet:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For previous versions, one has to explicitly run::
import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import sklearn; print("Scikit-Learn", sklearn.__version__)
sklearn/utils/_print_versions.py
Outdated
|
||
|
||
def show_versions(): | ||
"Print useful debugging information to the terminal" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove "to the terminal"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's name the module _show_versions.py, or just put it in sklearn/_config.py. "print" is unhelpful in the name
sklearn/utils/_print_versions.py
Outdated
("setuptools", lambda mod: mod.__version__), | ||
("sklearn", lambda mod: mod.__version__), | ||
("numpy", lambda mod: mod.__version__), | ||
("scipy", lambda mod: mod.version.version), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't get why we can't use __version__
here and hence in all cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can and should...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It slipt off my head. I changed it to avoid the lambda
function.
I vaguely remember someone arguing above to remove matplotlib? I would have included it but no strong opinion. Seems fine to me and we can always iterate easily on this. This is what this looks like for me btw
Is it clear that that means I have mkl? I guess I'm not very good at reading these lol |
This excludes the compiler version, though, right? Earlier the PR had compiler versions. Why did we want that removed? Seems helpful. I think the |
I do share the feeling. Both the name
The "compiler" was the Python one (split of the Python info into Python + Python compiler) which failed on Windows and is now included in the same line. I'm not sure how I would have access to the compiler used to build scikit-learn. |
FYI I'll be afk until Tuesday 31st. Please take over if need be. |
Yeah, it's not that explicit, mostly
We could store WDYT? |
We can add back matplotlib if you prefer.. |
To be more specific, it's possible to determine the most common BLAS implementations by matching |
I think we should just make sure this is merged before 0.20, and not
nitpick too much on details initially. We will, in time, discover its
shortcomings... as long as it's better than the status quo, we should
merge.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving this PR, and people should feel free to merge it. But if there is time, I'd like my comment about the code snippet to be addressed.
doc/developers/contributing.rst
Outdated
@@ -112,7 +112,7 @@ following rules before submitting: | |||
`new algorithm requirements | |||
<http://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms>`_. | |||
|
|||
- If you are submitting a bug report, we strongly encourage you to follow the guidelines in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's already been mentioned to you, but I am going to say it a second time: such unrelated changes are a very bad practice (they create merge conflicts, and destroy the usefulness of "git blame").
doc/developers/contributing.rst
Outdated
@@ -140,6 +140,13 @@ feedback: | |||
your **Python, scikit-learn, numpy, and scipy versions**. This information | |||
can be found by running the following code snippet:: | |||
|
|||
< 10000 td data-line-number="143" class="blob-num blob-num-addition"> | import sklearn; sklearn.show_versions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to have this as a code snippet:
::
>>> import sklearn
>>> sklearn.show_versions() # doctests: +SKIP
thank you!
|
@jnothman I just noticed I forgot to rename the test file according to the new filename |
@aboucaud If you have time. Thanks! |
PR that adds a
show_versions()
method convenient for debugging.Reference Issues/PRs
Fixes #11522
What does this implement/fix? Explain your changes.
Fork of the
pandas.show_versions()
.It displays optionally the system BLAS information.